What is robots.txt and how do you verify your robots.txt file?
New bloggers and marketers often feel uneasy about robots.txt: it is an important file, but it can be confusing.
If you are a new blogger, you have probably come across robots.txt already. But do you know how to optimize it and how to verify it?
If you don’t know about robots.txt and how to verify it, don’t worry. I will explain both in this article.
Frankly speaking, robots.txt is an important file for bloggers, and if you know how to optimize it, you can get better results when search engines crawl and index your pages.
Many bloggers avoid touching robots.txt because they know a mistake in it can harm their site.
If you set up robots.txt the wrong way, you could lose all your current search traffic and the benefits of your rankings, because search engines may stop crawling your pages and those pages can drop out of the search results.
Even if you put a lot of effort into maintaining your site and writing user-friendly, SEO-friendly articles, a badly written robots.txt can keep you from getting the traffic those articles deserve.
Put simply, robots.txt can block pages and resources from search engines, telling them not to look at that content. If you have made a mistake in the file, you may be blocking content you actually want to be found.
On the other hand, an optimized robots.txt can improve your site’s indexability. That is why you should know what robots.txt is and how to optimize it.
So, let’s dive into the topic.
What is robots.txt?
Robots.txt is simply a plain text file in the root of your site that contains a set of instructions for search engine crawlers (also called bots or spiders), such as Googlebot. It tells those crawlers which pages they may crawl and which pages they should stay away from.
With it, you can block either part of your site or your whole site from search engines.
That’s what a robots.txt file is. It is an important file for any website when it comes to crawling and other SEO aspects, and if you put the wrong robots.txt on your site, you will have a bad time with search engines.
How robots.txt works
The way robots.txt works is fairly simple. Before a well-behaved search engine crawler fetches pages from your site, it reads your robots.txt file to find out what it is allowed to crawl. If your robots.txt allows crawling, Google’s crawlers will crawl your site; if it blocks them, they will skip the blocked pages and move on.
With robots.txt, you can stop Google from crawling some of your private pages so that their content doesn’t show up in Google. Keep in mind that robots.txt controls crawling, not indexing: a blocked URL can still appear in results (without a description) if other pages link to it.
Ex: If you block your “about us” page in the robots.txt file, Google can’t access that page, so it can’t read its content to show in search results.
Most sites have a robots.txt file. If you don’t know how to view one, here is how.
How to find robots.txt?
The robots.txt file always sits at the root of a domain. For example, this site’s file is at “http://www.tipsclear.com/robots.txt”.
To see the robots.txt file of any other site, replace the domain name in that URL with that site’s domain.
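Because the location is always the site root, you can work it out from any page URL. The sketch below is a hypothetical helper (`robots_url` and the sample page path are made up for illustration) that uses only Python’s standard `urllib.parse` module to keep the scheme and host and swap the path for `/robots.txt`.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that hosts page_url.

    robots.txt lives at the root of the host, so we keep the scheme and
    host (including any port) and replace the path with /robots.txt,
    dropping any query string or fragment.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Hypothetical page URL on this site.
    print(robots_url("http://www.tipsclear.com/what-is-robots-txt/?ref=home"))
    # prints http://www.tipsclear.com/robots.txt
```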
Basic robots.txt examples:
Here are two of the most basic robots.txt setups.
Allows full access to search engines.
User-agent: *
Disallow:
Blocks all search engines from the entire site.
User-agent: *
Disallow: /
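If you want to see how a crawler reads these two setups, Python’s standard library ships `urllib.robotparser`, which parses robots.txt rules and answers “may this bot fetch this URL?”. This is only a sketch: the helper `parse_rules` and the example.com URL are hypothetical, and Python’s parser implements the basic rules only, so it may differ from Google’s own parser in edge cases.

```python
from urllib.robotparser import RobotFileParser


def parse_rules(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# An empty Disallow value means "nothing is disallowed".
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

# "/" matches every path on the site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

if __name__ == "__main__":
    url = "http://www.example.com/blog/my-first-post/"
    for name, rules in (("allow all", ALLOW_ALL), ("block all", BLOCK_ALL)):
        allowed = parse_rules(rules).can_fetch("Googlebot", url)
        print(f"{name}: Googlebot may crawl {url}? {allowed}")
    # allow all -> True, block all -> False
```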
These are the two most frequently seen robots.txt files. Now let’s see how to make one.
How to make a robots.txt file?
Because robots.txt is a plain text file, you can create it in any basic text editor, such as Notepad, or in a code editor. Save it as robots.txt and upload it to the root directory of your site.
You can either type the rules yourself or copy and paste them from a template.
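If you prefer to generate the file from a script, here is a minimal sketch assuming Python 3.9+. The function `build_robots_txt`, its `{user_agent: [disallowed paths]}` input shape, and the example sitemap URL are all hypothetical; the script simply prints the file text.

```python
def build_robots_txt(groups: dict[str, list[str]], sitemaps: list[str]) -> str:
    """Build robots.txt text from {user_agent: [disallowed paths]}."""
    lines: list[str] = []
    for agent, disallowed in groups.items():
        lines.append(f"User-agent: {agent}")
        # An empty list becomes a bare "Disallow:" line, which allows everything.
        for path in disallowed or [""]:
            lines.append(f"Disallow: {path}".rstrip())
        lines.append("")  # a blank line separates groups
    lines.extend(f"Sitemap: {url}" for url in sitemaps)
    return "\n".join(lines).rstrip("\n") + "\n"


if __name__ == "__main__":
    # Hypothetical site: adjust the paths and sitemap URL to your own.
    print(build_robots_txt(
        {"*": ["/wp-admin/", "/cgi-bin/"]},
        ["http://www.example.com/sitemap.xml"],
    ), end="")
```

Save the printed output as robots.txt (plain text, UTF-8) and upload it to your site’s root.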
What you put in robots.txt is entirely up to you.
If you want to allow Google’s bots to access your complete site, use:
User-agent: *
Disallow:
This gives all bots access to all the files on your website, which means search engine bots can crawl each and every post and page on your site.
But if you want to block Google’s bots (and every other crawler) from your whole site, use:
User-agent: *
Disallow: /
This blocks all bots from all files. Google and other search engines will stop crawling your pages, so your site will effectively disappear from their search results.
This is not recommended for a live site, because search engines can’t crawl or properly index your content with this rule in place.
To get a clear picture of robots.txt, you need to know a few terms.
Important terms in robots.txt
User-agent
The User-agent line names the web crawler that the rules below it apply to.
Ex: User-agent: *
Here * is a wildcard that means “all user agents,” so the rules that follow apply to every crawler.
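A crawler follows only the group that best matches its own name, so a bot with its own group ignores the * group. The sketch below shows this with Python’s standard `urllib.robotparser`; the rules, the paths, and the choice of Bingbot as the “other” bot are hypothetical examples.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group only for Googlebot, one for every other bot.
RULES = """\
User-agent: Googlebot
Disallow: /no-google/

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

for bot in ("Googlebot", "Bingbot"):
    for path in ("/no-google/page.html", "/private/page.html"):
        allowed = parser.can_fetch(bot, "http://www.example.com" + path)
        print(f"{bot:10} {path:22} {'allowed' if allowed else 'blocked'}")
```

Googlebot is blocked only from /no-google/, because it obeys its own group and skips the * group; Bingbot has no group of its own, so it gets the * rules and is blocked only from /private/.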
Disallow
Disallow tells crawlers not to crawl the paths listed after it.
User-agent: *
Disallow: /wp-admin/
This tells all web crawlers not to crawl any URL whose path starts with /wp-admin/, that is, everything in the WordPress “wp-admin” folder.
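Disallow rules match by path prefix. This sketch uses Python’s standard `urllib.robotparser` to check a few hypothetical paths against the rule above: note that /wp-admin without the trailing slash is not covered by Disallow: /wp-admin/.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /wp-admin/"])

# Hypothetical paths on a hypothetical example.com site.
for path in ("/wp-admin/", "/wp-admin/options.php", "/wp-admin", "/blog/hello-world/"):
    allowed = parser.can_fetch("Googlebot", "http://www.example.com" + path)
    print(f"{path:25} {'allowed' if allowed else 'blocked'}")
```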
Allow
Allow is the directive used to permit access to a particular file or folder, typically one inside a folder you have disallowed.
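A common illustration on WordPress sites is blocking the admin folder while allowing the admin-ajax.php file inside it; treat the rules below as an example, not a requirement. One caveat for this sketch: Python’s `urllib.robotparser` applies the first matching rule in file order, while Google applies the most specific (longest) matching rule, so the Allow line is listed first to make both interpretations agree.

```python
from urllib.robotparser import RobotFileParser

# Example rules: block /wp-admin/ but allow one file inside it.
RULES = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

for path in ("/wp-admin/admin-ajax.php", "/wp-admin/options.php"):
    allowed = parser.can_fetch("Googlebot", "http://www.example.com" + path)
    print(f"{path:28} {'allowed' if allowed else 'blocked'}")
```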
Sitemap
If you include your sitemap in robots.txt, search engines can find it more easily, which helps them discover the pages on your site.
Ex: To point crawlers to your sitemaps, use the Sitemap directive with each sitemap’s full URL:
Sitemap: http://www.tipsclear.com/sitemap.xml
Sitemap: http://www.tipsclear.com/sitemap_index.xml
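Sitemap lines are independent of the User-agent groups, so every crawler that supports them sees them. As a sketch, Python’s `urllib.robotparser` (3.8 or later) exposes them through `site_maps()`; the Disallow rule here is only filler for the example.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /wp-admin/

Sitemap: http://www.tipsclear.com/sitemap.xml
Sitemap: http://www.tipsclear.com/sitemap_index.xml
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# site_maps() returns the Sitemap URLs as a list, or None if there are none.
print(parser.site_maps())
```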
How to verify robots.txt?
To verify your robots.txt, head over to Google Search Console, where you’ll find a tool for testing your robots.txt file.
If your robots.txt has any problems, such as rules it can’t understand, the tool flags them as errors or warnings, and it lets you test whether a specific URL is blocked. That way you can recheck and optimize your robots.txt.
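To check it, go to the Google Search Console dashboard and open Crawl > robots.txt Tester. (Newer versions of Search Console have replaced this tester with a robots.txt report under Settings.)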
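Before uploading a new robots.txt, you can also run a quick local check of the pages you care about most. This is a hypothetical helper (`find_blocked`, the draft rules, and the URLs are all made up) built on Python’s standard `urllib.robotparser`; it complements Search Console rather than replacing it, and Python’s matching may differ from Google’s in edge cases. It assumes Python 3.9+.

```python
from urllib.robotparser import RobotFileParser


def find_blocked(robots_text: str, user_agent: str, urls: list[str]) -> list[str]:
    """Return the URLs that robots_text stops user_agent from crawling."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# A draft with a mistake: "/blog" also matches every URL under /blog/.
DRAFT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /blog
"""

# Pages you definitely want crawled (hypothetical URLs).
IMPORTANT = [
    "http://www.example.com/",
    "http://www.example.com/blog/what-is-robots-txt/",
    "http://www.example.com/about-us/",
]

if __name__ == "__main__":
    for url in find_blocked(DRAFT, "Googlebot", IMPORTANT):
        print("Blocked:", url)
```

To check a live file instead of a draft, `RobotFileParser("https://your-site/robots.txt")` followed by `.read()` fetches and parses it.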
Optimized robots.txt
Sitemap: http://www.domainname.com/sitemap.xml
Sitemap: http://www.domainname.com/sitemap-image.xml
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /recommended/
Disallow: /comments/feed/
Disallow: /trackback/
Disallow: /index.php
Disallow: /xmlrpc.php
Disallow: /wp-content/plugins/
User-agent: NinjaBot
Allow: /
User-agent: Mediapartners-Google*
Allow: /
User-agent: Googlebot-Image
Allow: /wp-content/uploads/
User-agent: Adsbot-Google
Allow: /
User-agent: Googlebot-Mobile
Allow: /
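This is an example of an optimized robots.txt for a WordPress site. It keeps all crawlers out of the admin area, feeds, trackbacks, and script files such as xmlrpc.php, explicitly allows specific Google bots, and lists the sitemaps so crawlers can find your pages quickly. The sitemaps are placed at the top here so they are easy to spot; crawlers read Sitemap lines wherever they appear in the file.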
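To sanity-check a file like this, you can feed it to Python’s standard `urllib.robotparser` and test a few URLs. In this sketch the domain is a placeholder (example.com) and the post and image paths are hypothetical. The checks stick to Googlebot, Googlebot-Image, and AdsBot-Google, because Python’s simple user-agent matcher does not treat the trailing * in Mediapartners-Google* as a wildcard.

```python
from urllib.robotparser import RobotFileParser

# The optimized file from this article, with example.com as a placeholder domain.
OPTIMIZED = """\
Sitemap: http://www.example.com/sitemap.xml
Sitemap: http://www.example.com/sitemap-image.xml
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /recommended/
Disallow: /comments/feed/
Disallow: /trackback/
Disallow: /index.php
Disallow: /xmlrpc.php
Disallow: /wp-content/plugins/
User-agent: NinjaBot
Allow: /
User-agent: Mediapartners-Google*
Allow: /
User-agent: Googlebot-Image
Allow: /wp-content/uploads/
User-agent: Adsbot-Google
Allow: /
User-agent: Googlebot-Mobile
Allow: /
"""

parser = RobotFileParser()
parser.parse(OPTIMIZED.splitlines())

checks = [
    ("Googlebot", "/2024/01/hello-world/"),                 # normal post
    ("Googlebot", "/wp-admin/options.php"),                 # admin area
    ("Googlebot", "/xmlrpc.php"),                           # script file
    ("Googlebot-Image", "/wp-content/uploads/photo.jpg"),   # uploaded image
    ("AdsBot-Google", "/wp-admin/options.php"),             # has its own Allow: /
]
for bot, path in checks:
    allowed = parser.can_fetch(bot, "http://www.example.com" + path)
    print(f"{bot:16} {path:32} {'allowed' if allowed else 'blocked'}")
print(parser.site_maps())
```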
Conclusion:
That’s all about robots.txt. I hope you now have a clear view of what it is and how to verify it. If you have anything to say about this post, leave a comment below.