Tuesday, 15 July 2014

Add Custom Robots.txt File in Blogger

how to Add Custom Robots.txt File in Blogger

 

What is Robots.txt?

With every blog that you create/post on your site, a related Robots.txt file is auto-generated by Blogger. The purpose of this file is to inform incoming robots (spiders, crawlers etc. sent by search engines like Google, Yahoo) about your blog, its structure and to tell whether or not to crawl pages on your blog. You as a blogger would like certain pages of your site to be indexed and crawled by search engines, while others you might prefer not to be indexed, like a label page, demo page or any other irrelevant page.


How do they see Robots.txt?

Well, Robots.txt is the first thing these spiders view as soon as they reach your site. Your Robots.txt is like a hour flight attendant, that directs you to your seat and keep checking that you don't enter private areas. Therefore, all the incoming spiders would only index files that Robots.txt would tell to, keeping others saved from indexing.


Where is Robots.txt located?

You can easily view your Robots.txt file either on your browser by adding /robots.txt to your blog address like http://myblog.blogspot.com/robots.txt or by simply signing into your blog and choosing Settings > Search engine Preference > Crawlers and indexing and selecting Edit next to Custom robots.txt.


blogger custom robots


How Robots.txt does looks like?

If you haven't touched your robots.txt file yet, it should look something like this:

User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /search
Allow: /
Sitemap: http://myblog.blogspot.com/feeds/posts/default?orderby=UPDATED
Don't worry if it isn't colored or there isn't any line breaks in code, I colored it and placed line breaks so that you may understand what these words mean.

User-agent:Media partners-Google:
Mediapartners-Google is Google's AdSense robot that would often crawl your site looking for relevant ads to serve on your blog or site. If you disallow this option, they won't be able to see any ads on your specified posts or pages. Similarly, if you are not using Google AdSense ads on your site, simply remove both these lines.

User-agent: *
Those of you with little programming experience must have guessed the symbolic nature of character '*' (wildcard). For others, it specifies that this portion (and the lines beneath) is for all of you incoming spiders, robots, and crawlers.

Disallow: /search
Keyword Disallow, specifies the 'not to' do things for your blog. Add /search next to it, and that means you are guiding robots not to crawl the search pages /search results of your site. Therefore, a page result like http://myblog.blogspot.com/search/label/mylabel will never be crawled and indexed.

Allow: /
Keyword Allow specifies 'to do' things for your blog. Adding '/' means that the robot may crawl your homepage.

Sitemap:
Keyword Sitemap refers to our blogs sitemap; the given code here tells robots to index every new post. By specifying it with a link, we are optimizing it for efficient crawling for incoming guests, through which incoming robots will find path to our entire blog posts links, ensuring none of our posted blog posts will be left out from the SEO perspective.

However by default, the robot will index only 25 posts, so if you want to increase the number of index files, then replace the sitemap link with this one:

Sitemap: http://myblog.blogspot.com/atom.xml?redirect=false&start-index=1&max-results=500
And if you have more than 500 published posts, then you can use these two sitemaps like below:

Sitemap: http://myblog.blogspot.com/atom.xml?redirect=false&start-index=1&max-results=500
Sitemap: http://myblog.blogspot.com/atom.xml?redirect=false&start-index=500&max-results=1000


How to prevent posts/pages from being indexed and crawled?

In case you haven't yet discovered yourself, here is how to stop spiders from crawling and indexing particular pages or posts:

Disallow Particular Post

Disallow: /yyyy/mm/post-url.html
The /yyy/mm part specifies your blog posts publishing year and month and /post-url.html is the page you want them not to crawl. To prevent a post from being indexed/crawled simply copy the URL of your post that you want to exclude from indexing and remove the blog address from the beginning.

Disallow Particular Page

To disallow a particular page, you can use the same method as above. Just copy the page URL and remove your blog address from it, so that it will look something like this:

Disallow: /p/page-url.html


Adding Custom Robots.Txt to Blogger

Now let's see how exactly you can add Custom Robots.txt file in Blogger:

1. Sign in to you blogger account and click on your blog.
2. Go to Settings > Search Preferences > Crawlers and indexing.


blogger custom robots.txt

3. Select 'Edit' next to Custom robots.txt and check the 'Yes' check box.
4. Paste your code or make changes as per your needs.


custom robots.txt

5. Once you are done, press Save Changes button.
6. And congratulations, you are done!


How to see if changes are being made to Robots.txt?

As explained above, simply type your blog address in the url bar of your browser and add /robots.txt at the end of your url as you can see in this example below:

http://helplogger.blogspot.com/robots.txt
Once you visit the robots.txt file, you will see the code which you are using in your custom robots.txt file. See the below screenshot:


custom robots


Final Words:

Are we through then bloggers? Are you done adding the Custom Robots.txt in Blogger? It was easy, once you knew what those code words meant. If you couldn't get it for the first time, just go again through the tutorial and before long, you will be customizing your friends' robots.txt files.

 


Respected Readers:
As a 20 year old student, the only way I make income is through pocket money . Rearing and running costs of doing research on blogging has been difficult for me. We educate thousands of bloggers a week through our tutorials . to help us go forward with the same spirit, A contribution from your side will highly be appreciated.

Sifiso Nkwanyana

Author & Editor

Has laoreet percipitur ad. Vide interesset in mei, no his legimus verterem. Et nostrum imperdiet appellantur usu, mnesarchum referrentur id vim.

0 comments:

Post a Comment

Although every comment is appreciated, due to time limitations I might not be able to respond to everyone. Keep it in mind that we respect all comments by our readers but all spam comments and spam links will be deleted. Thanks for understanding!