Learning how to create and configure the robots.txt file will help you tell search engine spiders which parts of your website to crawl.
“Spiders” are simply search robots or bots (crawlers) that crawl all kinds of web pages, each looking for whatever that type of bot is designed to find. Bots can crawl everything from new content to changes made to link structures. But sometimes there are certain parts of your site that you want to hide from these bots, and that is when the robots.txt file is useful.
But before going straight to configuring it, it is worth knowing a little more about what the robots.txt file is and what it is for.
WHAT IS IT AND WHAT IS IT FOR?
The robots.txt file is actually a plain text file that can be created very easily in Notepad and saved under the name robots.txt.
It contains the instructions that search engine spiders read to decide which parts of a website they may crawl.
Thanks to this file, robots know which pages to crawl or index: it tells them which areas of the website they are allowed to enter and which areas they should stay out of.
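As a minimal illustrative sketch (the /private/ path is just a placeholder), a robots.txt that lets every bot crawl everything except one directory would contain only:
- User-agent: *
- Disallow: /private/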
Among the functionalities that the robots.txt file fulfills we have:
- Points search engines to your sitemap, facilitating web indexing.
- Prevents deleted URLs that give 404 errors from being crawled.
- Blocks specific bots from entering your website and accessing your files (see the example after this list).
- Denies search engines access to certain directories and pages on your site.
- Reduces the amount of resources consumed by the server.
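For instance, to shut a single crawler out of the whole site (the bot name here is hypothetical), you pair a specific User-agent line with a blanket Disallow:
- User-agent: BadBot
- Disallow: /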
One vital thing to keep in mind about this type of file is that some malicious bots can simply ignore it and try to access the information anyway. After all, it is a public file: you can see any website’s version just by adding /robots.txt to the end of its domain. So it is best not to use it to hide private information from search engines, since anyone who opens your file can see exactly what you are trying to hide.
CREATE THE ROBOTS.TXT FILE IN WORDPRESS
To create the robots.txt file in WordPress you just have to open a Notepad file and save it under the name robots.txt. After this, you simply upload it with cPanel or FileZilla to the root directory of the domain.
WordPress usually generates a basic version of this file on its own, but another option is to manage it through the Yoast SEO plugin: go to the “Tools” option, then click on “File Editor”, and from there you can create or modify robots.txt.
STEPS TO CONFIGURE ROBOTS.TXT FILE
There is no perfect general robots.txt setup that suits every site; in reality, each website will use this file to block whatever suits it best. However, you can download the standard robots.txt file for WordPress and modify it based on the following blocks:
In the #First block we remove the ability to crawl folders, feeds, tags, comments, searches and more.
Within this first block we allow all bots (User-agent: *) to access AJAX, while denying access to directories we do not want crawled, such as search pages or internal WordPress paths. In this block you will find the following directives to modify:
- User-agent: *
- Allow: /wp-admin/admin-ajax.php
- Disallow: /wp-login
- Disallow: /*/feed/
- Disallow: /*/trackback/
- Disallow: /wp-admin
- Disallow: /*/attachment/
- Disallow: *?replytocom
- Disallow: /author/
- Disallow: /tag/*/page/
- Disallow: /comments/
- Disallow: /tag/*/feed/
- Disallow: /xmlrpc.php
- Disallow: /*/*/*/feed.xml
- Disallow: /*?s=
- Disallow: /?attachment_id*
- Disallow: /search
In the #Second block we can unblock CSS and JS resources for Googlebot (the $ at the end of each pattern matches only URLs that end in .css or .js):
- #Second block
- User-Agent: Googlebot
- Allow: /*.css$
- Allow: /*.js$
And finally, in the #Third block we can add the URL of the XML sitemap file in order to guide the bots to the content they should crawl.
You can add one or more Sitemap lines, for example:
- Sitemap: http://www.tudominio.com/sitemap.xml
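If your site publishes several sitemaps (the file names below are placeholders), you can simply list them on consecutive lines:
- Sitemap: http://www.tudominio.com/sitemap-posts.xml
- Sitemap: http://www.tudominio.com/sitemap-pages.xml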
If you have doubts about whether your site has a sitemap, or about the URL that corresponds to it, go to Crawl – Sitemaps in Google Search Console. And if you don’t have one, you can create it with the Yoast SEO plugin.
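Putting the three blocks together, the skeleton of the finished file looks like this (abbreviated; use the full directive lists above for your own version):
- #First block
- User-agent: *
- Allow: /wp-admin/admin-ajax.php
- Disallow: /wp-admin
- #Second block
- User-Agent: Googlebot
- Allow: /*.css$
- Allow: /*.js$
- #Third block
- Sitemap: http://www.tudominio.com/sitemap.xml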
ROBOTS.TXT TESTER
Once you have finished creating your robots.txt file and saved it in the root directory of your site, the last step is to check that everything works and that the robots can reach the allowed pages of the website without problems. You can run this check with the Google Search Console tool: open Search Console and go to “Crawl” – “robots.txt Tester”. There you should see exactly what you placed in the robots.txt.
If you don’t see anything, just press the Submit button that appears in the third step, which requests that Google update its copy of the file. After pressing the button and requesting the update, simply click on the red “Test” button. If everything now works perfectly, an “Allowed” message will appear.
As a final tip about the robots.txt file, keep in mind that this type of file accepts only a few simple directives, which you can review in the Robots Exclusion Standard. Do not use directives other than the allowed ones, to avoid possible problems. And if you need an expert in programming and robots.txt files, at MarketBoom, the digital marketing agency in Sabadell, we have web and programming experts who will optimize every detail of your website 100%.