
How to Stop ChatGPT Plugins from Scraping Your Website Content


ChatGPT (Chat Generative Pre-trained Transformer) has been a household name online since its launch in November 2022. Created by OpenAI, ChatGPT is a language model that uses deep learning techniques to respond naturally to user inputs.

The introduction of ChatGPT and similar artificial intelligence technologies has brought mixed feelings to the internet. On one side, there are users who deeply appreciate the technology and use it to become more productive and perform different tasks. On the other side, a group feels threatened that ChatGPT will take their jobs away.

However, our focus will not be on these two groups but on content creators and website owners. This article discusses why you may not want to give ChatGPT access to your website, introduces ChatGPT plugins, and explains how to stop these plugins from accessing your site.

What are ChatGPT plugins?


When ChatGPT was first introduced, content creators quickly produced a lot of content explaining how to use this new technology. The internet and social media spaces are filled with content on how to use ChatGPT. 

Meanwhile, developers at OpenAI and elsewhere have been working tirelessly to improve this technology. ChatGPT has gone through several versions; as of this writing, GPT-4 is the most recent model. Support for plugins is one of the most recent improvements.

ChatGPT plugins are tools or custom modules that can be integrated into this language model to extend its functionality. They are designed to work with ChatGPT, offer more personalized results, and improve the overall experience.

So far, OpenAI has created two plugins: a web browser and a code interpreter. It has also allowed selected developers to create third-party plugins based on its documentation.

How do ChatGPT plugins impact SEO and website owners? 

The “fair” use of website content has been a hot debate since the introduction of ChatGPT in late 2022. This debate is not new, as it has existed since the internet’s invention. 

Some website owners feel ChatGPT is “killing” all their SEO efforts. ChatGPT plugins can scrape content on the internet and give a response based on the data collected. The current ChatGPT browser plugin, for example, uses the Bing API to search the internet, summarize the answers, and link to the sources.

If the user is satisfied with the answer, they might not see the need to visit your website. Someone has used your content, but you earn neither ad revenue nor affiliate commissions from it. Users who want to learn more can still visit your website through the link provided.

Sad, right? Normally, if another website uses your content for research purposes, it should indicate it on its platform and link back to your site. 

How to use robots.txt to stop ChatGPT from accessing your website content

ChatGPT plugins use the ChatGPT-User bot. Unless instructed otherwise, the ChatGPT-User bot assumes it has permission to scrape content from your website. Note that the bot is not designed to crawl the web automatically; instead, it takes direct actions on behalf of ChatGPT users.

OpenAI’s official documentation explains how website owners can stop ChatGPT plugins from accessing their content. The only file you need to change is robots.txt.

To check if you have the robots.txt file on your website, add /robots.txt to your domain.

For instance, the robots.txt file for www.example.com can be found at www.example.com/robots.txt.
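If you manage several sites, you can build the robots.txt address programmatically. The minimal Python sketch below uses a hypothetical helper, robots_txt_url, of our own naming. It relies only on the fact that robots.txt lives at the root of each host, so the page’s path, query string, and fragment are dropped.

```python
from urllib.parse import urlsplit


def robots_txt_url(page_url: str) -> str:
    """Hypothetical helper: return the robots.txt address for page_url's host."""
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        # A bare "www.example.com" has no scheme, so urlsplit cannot
        # tell where the host name is.
        raise ValueError(f"expected an absolute URL such as https://..., got {page_url!r}")
    # robots.txt sits at the root of the host; the page's path, query
    # string, and fragment do not matter.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


print(robots_txt_url("https://www.example.com/blog/some-post?ref=home"))
# prints https://www.example.com/robots.txt
```

Opening the resulting address in a browser shows the file if it exists; a "not found" page usually means the site does not have one yet.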

How to open robots.txt file for editing

The approach depends on how your website is built. We will explore how to edit WordPress-based websites, custom-hosted websites, and Webflow websites:

WordPress-based website

You can use a plugin such as Yoast SEO. Follow these steps:

  • When logged in to your WordPress site, click “Yoast SEO” from the left menu.
  • Click on “Tools” on its drop-down menu.
  • Click on “File Editor”
  • If you already have a robots.txt file, you will see it there; click on it to start editing.

Custom hosted website

If you built your website from scratch, you can also edit its robots.txt file. Follow these steps:

  • Access the website’s files through an FTP client or a hosting control panel such as Plesk or cPanel.
  • Navigate to the root folder.
  • Create or open the robots.txt file there.

Webflow-based website

Follow these steps:

  • Log in to your Webflow account and open your site.
  • Go to “Settings”.
  • Click the “SEO” tab, then “Indexing”.
  • Find the robots.txt field, where you will add the rules in the next step.

Once you have found this file, you can block ChatGPT plugins. You can take one of two approaches:

  • Block the entire website: This tells ChatGPT-User not to access any part of your website. Open the robots.txt file and add the following two lines:
User-agent: ChatGPT-User
Disallow: /
  • Block sections of your website: If your website has several sections, you can allow ChatGPT plugins to access only some of them. List the directories to allow, then disallow everything else, keeping the Allow lines first because some parsers apply the first rule that matches:
User-agent: ChatGPT-User
Allow: /directory-1/
Allow: /directory-2/
Disallow: /

In this example, ChatGPT plugins, which identify themselves as ChatGPT-User, can access directory-1 and directory-2, but not any other section of your website.
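Before uploading your changes, you can check how a robots.txt parser reads them. This minimal sketch uses Python’s standard urllib.robotparser module, which is only one parser among many; it is not necessarily how ChatGPT-User itself evaluates the file. The partial-access rules are written with the Allow lines before a final "Disallow: /": Python’s parser applies the first rule whose path matches, while parsers following RFC 9309 pick the most specific match, and this order gives the same result in both.

```python
from urllib.robotparser import RobotFileParser

# Option 1: keep ChatGPT-User away from the whole site.
BLOCK_ALL = """\
User-agent: ChatGPT-User
Disallow: /
"""

# Option 2: allow two directories and block everything else.
# Allow lines come first so first-match and longest-match parsers agree.
ALLOW_SOME = """\
User-agent: ChatGPT-User
Allow: /directory-1/
Allow: /directory-2/
Disallow: /
"""


def can_fetch(robots_txt: str, user_agent: str, path: str) -> bool:
    """Report whether robots_txt permits user_agent to request path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network access
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for name, rules in (("BLOCK_ALL", BLOCK_ALL), ("ALLOW_SOME", ALLOW_SOME)):
        for agent, path in (("ChatGPT-User", "/directory-1/post.html"),
                            ("ChatGPT-User", "/blog/"),
                            ("Googlebot", "/blog/")):
            print(name, agent, path, can_fetch(rules, agent, path))
```

Googlebot stays allowed under both rule sets because neither names it and neither contains a catch-all "User-agent: *" group.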

Note: When you open robots.txt, don’t delete its existing contents; add the new lines at the end.
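If you maintain robots.txt with a script, the same rule applies: append, never overwrite. Here is a minimal Python sketch; has_group_for, append_rules, and update_robots_file are hypothetical helper names, and the code assumes the file uses the usual "User-agent: name" line format. It leaves the existing text untouched and makes no change if a ChatGPT-User group already exists.

```python
from pathlib import Path

# The block-everything group from the first option above.
CHATGPT_RULES = "User-agent: ChatGPT-User\nDisallow: /\n"


def has_group_for(robots_txt: str, agent: str) -> bool:
    """Return True if a User-agent line in robots_txt names agent (case-insensitive)."""
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0]  # drop trailing comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "user-agent" and value.strip().lower() == agent.lower():
            return True
    return False


def append_rules(existing: str, rules: str = CHATGPT_RULES, agent: str = "ChatGPT-User") -> str:
    """Return existing robots.txt text with rules added at the end."""
    if has_group_for(existing, agent):
        return existing  # already configured: leave the file exactly as it is
    if not existing.strip():
        return rules  # empty or missing file: the rules are the whole file
    # Keep every existing line verbatim, end it with a newline, then add a
    # blank line so the new group is separate from the earlier ones.
    body = existing if existing.endswith("\n") else existing + "\n"
    return body + "\n" + rules


def update_robots_file(path: Path) -> None:
    """Hypothetical helper: append the rules to the robots.txt file at path."""
    current = path.read_text(encoding="utf-8") if path.exists() else ""
    path.write_text(append_rules(current), encoding="utf-8")
```

For example, calling update_robots_file(Path("robots.txt")) from your site’s root folder would add the two block-everything lines while keeping whatever rules were already there.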

Should you block ChatGPT plugins from accessing your site?

The plagiarism and copyright debate will never end. As such, whether to allow the OpenAI bot to access your website is a personal choice. People will always have mixed reactions when a new technology is introduced. 

On one hand, you may feel that ChatGPT plugins deny you traffic, which translates to less money. On the other hand, OpenAI says on its website that its plugins will cite their sources when they pull data from third-party websites.

I typed the search query “What are ChatGPT plugins” into the new Microsoft Bing and got the results shown in this screenshot:

[Screenshot (2023-04-04): Bing results for “What are ChatGPT plugins”]

As you can see, the answer is summarized and cites five sources.

Frequently Asked Questions

What is the difference between ChatGPT plugins and third-party plugins?

ChatGPT supports two types of plugins: OpenAI’s own plugins and third-party plugins. As the names suggest, OpenAI’s own plugins were created by engineers at OpenAI.
So far, the team has created the web browser and code interpreter plugins.
Third-party plugins, on the other hand, are created by developers at other companies. These plugins are approved by OpenAI and add extra functionality to ChatGPT.

Can I access ChatGPT plugins on the free plan?

No. ChatGPT plugins are only available to ChatGPT Plus subscribers. Besides access to plugins, the paid plan offers faster response times, access even during peak hours, and priority access to new features.

What are web scrapers?

Web scrapers are scripts or programs that automate extracting data from the web. Often loosely called web crawlers, web scrapers visit websites, analyze their content, and extract relevant information. These programs can be used for market research, content aggregation, data mining, and price comparison.
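To make “extract relevant information” concrete, here is a minimal Python sketch of the extraction step only, using the standard html.parser module. The class and function names are hypothetical, and the sketch assumes the HTML has already been downloaded; a real scraper would also fetch pages and, if well behaved, check robots.txt first.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Collect the <title> text and every link target (href) in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.links: list[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors without a link target
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def summarize(html: str) -> tuple[str, list[str]]:
    """Extract (title, links) from an HTML string that was already downloaded."""
    parser = PageSummary()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.links


title, links = summarize(
    "<html><head><title>Demo page</title></head>"
    '<body><a href="/about">About</a> <a href="https://example.com/">Home</a></body></html>'
)
print(title, links)  # prints: Demo page ['/about', 'https://example.com/']
```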

Is web scraping legal?

The legality of web scraping is a hotly discussed topic that attracts divergent views. As a rule of thumb, scraping publicly available information for personal use is generally not illegal.
However, scraping for commercial gain or scraping copyrighted content can be illegal. Whether it is legal to scrape a website’s content depends on the nature of the content in question.

I have disabled ChatGPT plugins from crawling my website. Will this impact SEO? 

No. However, make sure you don’t block search engine bots such as Googlebot and Bingbot, which search engines use to crawl your content. You can also block other unwanted bots to reduce server load and protect your content from theft, keeping in mind that robots.txt is only honored by well-behaved bots.
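To double-check that a new rule hasn’t accidentally locked out search engines (for example, through a catch-all "User-agent: *" group), you can run a quick audit. This is a minimal Python sketch built on the standard urllib.robotparser module; blocked_search_bots is a hypothetical helper, and it checks only the Googlebot and Bingbot tokens, not every crawler those companies operate.

```python
from urllib.robotparser import RobotFileParser

# Product tokens of the main search engine crawlers mentioned above.
SEARCH_BOTS = ("Googlebot", "Bingbot")


def blocked_search_bots(robots_txt: str, path: str = "/") -> list[str]:
    """Return the search bots that robots_txt stops from fetching path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in SEARCH_BOTS if not parser.can_fetch(bot, path)]


# Blocking only ChatGPT-User leaves search engines untouched.
print(blocked_search_bots("User-agent: ChatGPT-User\nDisallow: /\n"))  # prints []
# A catch-all group blocks everyone, search engines included.
print(blocked_search_bots("User-agent: *\nDisallow: /\n"))  # prints ['Googlebot', 'Bingbot']
```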

Who can use ChatGPT? 

There is no restriction on who can use ChatGPT at the time of writing. You can go to the openai.com website, create a free account, and start using it. You can use this technology to write and interpret code, generate copy for websites and social media pages, and write poems, songs, and speeches. However, the quality of the output depends on the inputs you give this language model.

Conclusion 

We hope you now understand what ChatGPT plugins are, how they work, how to stop them from crawling your website, and what that implies. Change has far-reaching effects and will always split people into two groups: proponents and opponents.

Artificial intelligence has been with us for years, even though most people have never realized it. Examples of programs that use artificial intelligence include Siri, available on Apple devices, and Grammarly, a writing assistant that checks for grammatical errors and plagiarism.

The truth is that ChatGPT, Bard, and similar artificial intelligence tools are not going away any time soon. In our tests, ChatGPT with GPT-4 handled a variety of use cases. However, you must know how to guide such technologies, giving them the right inputs to get the desired outputs.
