Importance Of Using Robots.txt File

If you have a well-designed, well-optimized website with keyword-rich content to attract visitors and search engines, that is great, but you may still be missing something very important. Do you know what that is? It is the robots.txt file.
The robots.txt file is important because it tells spiders or crawlers which pages of a website they may crawl and which they may not. Sometimes people have pages on their website that they do not want in search results, and with a robots.txt file they can ask crawlers not to crawl or index those particular pages. Keep in mind, though, that robots.txt is only a request to well-behaved crawlers; it does not actually secure confidential data, because anyone can read the file and visit the listed URLs directly.
Before crawling a website or a webpage, search engine spiders or crawlers look for a special file called robots.txt. From this file they learn which web pages of the website should be crawled or indexed and which should be ignored.
The robots.txt file is a simple text file that must be placed in the root directory of a website, so that it is reachable at a URL like:
http://www.abc.com/robots.txt

Creating a Robots.txt File:
As mentioned above, the robots.txt file is a simple text file, and you can create it with a plain text editor such as Notepad. The entries in a robots.txt file are called "records".
A record holds the instructions for a particular search engine spider and has two fields: a User-agent line, where you mention the robot or spider name, and one or more Disallow lines, where you specify which pages or files should be ignored. For example:
User-agent: googlebot
Disallow: /cgi-bin/
In the example above, the robots.txt file allows "googlebot", the search engine spider of Google, to crawl every page of the website except the files in the "cgi-bin" directory. In other words, googlebot must ignore all files in the "cgi-bin" directory.
And if you enter the following instead:
User-agent: googlebot
Disallow: /support
Googlebot will not crawl any file from the "support" directory, because the robots.txt file instructs it not to.
If you leave the Disallow field blank, it tells googlebot that it may crawl all files of the website. In any case, you must include a Disallow field for every User-agent record.
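For example, a record along these lines, with the Disallow field left empty, permits googlebot to crawl the entire site:
User-agent: googlebot
Disallow: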
All the examples above applied only to googlebot, but if you want to give the same rules to the spiders of all other search engines, use an asterisk (*) instead of googlebot in the User-agent field. For example:
User-agent: *
Disallow: /cgi-bin/
In the example above, the * represents all search engine spiders, and the robots.txt file allows them to crawl every page of the website except the files in the "cgi-bin" directory. In other words, spiders from all search engines must ignore all files in the "cgi-bin" directory.
If you want to know the user agent names of other search engines, you can find them in your log files by checking for requests to robots.txt. Most of the time, all search engine spiders should be given the same rights; in that case, use User-agent: * as shown above.
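Putting it all together, a complete robots.txt file that gives googlebot its own record and a general record for every other spider might look something like this (the directory names here are only placeholders):
User-agent: googlebot
Disallow: /cgi-bin/
Disallow: /support

User-agent: *
Disallow: /cgi-bin/
Records are separated by a blank line, and a spider generally follows the record that most specifically matches its user agent name, so here googlebot would use the first record and all other spiders would use the second.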
