Thursday, June 19, 2008

How to Write a Robots.txt File

Here's How:

1. In a text editor, create a file named robots.txt. The name must be all lowercase, even if your Web pages are hosted on a Windows Web server. Save the file in the root directory of your Web site, so that robots can find it at the top level of your domain. For example:

http://searchengine--optimization.blogspot.com/robots.txt
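Because robots look for the file only at the root of a host, its address can be worked out from any page URL on the site. The Python sketch below shows that rule using only the standard library's urllib.parse module; the function name robots_url and the post path in the usage line are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    """Return the robots.txt URL for the site that hosts page_url.

    Robots look for the file only at the root of the host, so the path,
    query string and fragment of page_url are thrown away.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("http://searchengine--optimization.blogspot.com/2008/06/example-post.html"))
# prints http://searchengine--optimization.blogspot.com/robots.txt
```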

2. Each record in a robots.txt file has this format:

User-agent: robot
Disallow: files or directories
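If you generate robots.txt from a script rather than typing it by hand, each record is just a User-agent line followed by one or more Disallow lines. Here is a minimal Python sketch of that layout; the function make_record is a hypothetical helper, not part of any library.

```python
def make_record(user_agent, disallowed):
    """Build one robots.txt record: a User-agent line followed by Disallow lines.

    An empty list produces a single blank "Disallow:" line, which means the
    named robot may retrieve everything.
    """
    lines = [f"User-agent: {user_agent}"]
    if disallowed:
        lines.extend(f"Disallow: {path}" for path in disallowed)
    else:
        lines.append("Disallow:")
    return "\n".join(lines) + "\n"

print(make_record("*", ["/cgi-bin/", "/images/"]), end="")
```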

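3. You can use an asterisk (*) as a wildcard to indicate all robots. For example:
To specify all robots:

User-agent: *

The asterisk is only a wildcard when it stands alone, though. A partial wildcard like the one below, meant to specify all robots whose names start with the letter A, is not part of the robots.txt standard, so don't rely on it; list each robot by name instead:

User-agent: A*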
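If you want to check how a real parser reads these lines, Python's standard urllib.robotparser module is a convenient test bed. In the sketch below, the helper is_allowed and the robot names ExampleBot and AlphaBot are hypothetical. This parser treats only a lone * as the wildcard and handles a value such as A* as a plain robot name.

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt, agent, path):
    """Parse robots_txt with the standard library and ask whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

# A lone "*" record applies to every robot that has no record of its own.
everyone = "User-agent: *\nDisallow: /private/\n"
print(is_allowed(everyone, "Googlebot", "/private/a.html"))   # False
print(is_allowed(everyone, "ExampleBot", "/private/a.html"))  # False

# "A*" is not a pattern here, so it does not match a robot whose name starts with A.
prefix = "User-agent: A*\nDisallow: /private/\n"
print(is_allowed(prefix, "AlphaBot", "/private/a.html"))  # True
```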

4. The Disallow lines can specify files or directories.
To keep robots from crawling any files on the site:

Disallow: /

To keep robots away from the index.html file:

Disallow: /index.html
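Python's standard urllib.robotparser can confirm both Disallow examples. Disallow values are matched as prefixes of the requested path, so Disallow: /index.html blocks that file (and anything whose path starts the same way) but leaves other pages alone. The helper is_allowed and the robot name ExampleBot are hypothetical.

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt, agent, path):
    """Parse robots_txt with the standard library and ask whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

block_site = "User-agent: *\nDisallow: /\n"
block_index = "User-agent: *\nDisallow: /index.html\n"

print(is_allowed(block_site, "ExampleBot", "/about.html"))       # False: "/" covers every path
print(is_allowed(block_index, "ExampleBot", "/index.html"))      # False
print(is_allowed(block_index, "ExampleBot", "/about.html"))      # True
print(is_allowed(block_index, "ExampleBot", "/index.html.bak"))  # False: prefix match
```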

5. If you leave the Disallow line blank, all files can be retrieved. For example, you might want Googlebot to see everything on your site:

User-agent: Googlebot
Disallow:
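A quick check with Python's urllib.robotparser shows the difference between a blank Disallow line and Disallow: /. The helper is_allowed is hypothetical and the page path is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt, agent, path):
    """Parse robots_txt with the standard library and ask whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

# A blank Disallow value grants access to everything.
open_to_googlebot = "User-agent: Googlebot\nDisallow:\n"
print(is_allowed(open_to_googlebot, "Googlebot", "/any/page.html"))  # True

# For comparison, "Disallow: /" shuts the same robot out completely.
closed_to_googlebot = "User-agent: Googlebot\nDisallow: /\n"
print(is_allowed(closed_to_googlebot, "Googlebot", "/any/page.html"))  # False
```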

6. If you disallow a directory, all files below it are disallowed as well:

Disallow: /norobots/
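Under the hood this works because the Disallow value is compared against the start of each requested path. The sketch below uses Python's urllib.robotparser to show that everything under /norobots/ is blocked at any depth, while paths that merely look similar are not; the helper is_allowed, the robot name ExampleBot and the file names are hypothetical.

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt, agent, path):
    """Parse robots_txt with the standard library and ask whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

robots_txt = "User-agent: *\nDisallow: /norobots/\n"
print(is_allowed(robots_txt, "ExampleBot", "/norobots/page.html"))      # False
print(is_allowed(robots_txt, "ExampleBot", "/norobots/sub/deep.html"))  # False

# The rule is a prefix match on "/norobots/", so these paths are not covered:
print(is_allowed(robots_txt, "ExampleBot", "/norobots"))       # True
print(is_allowed(robots_txt, "ExampleBot", "/norobots.html"))  # True
```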

7. You can also use multiple Disallow lines for one User-agent to deny access to several areas:

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
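Each Disallow line in a record is checked in turn, and a path is refused if any of them matches. Here is a small check with Python's urllib.robotparser; the helper is_allowed, the robot name ExampleBot and the sample paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt, agent, path):
    """Parse robots_txt with the standard library and ask whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

robots_txt = "User-agent: *\nDisallow: /cgi-bin/\nDisallow: /images/\n"
for path in ["/cgi-bin/search", "/images/logo.gif", "/index.html"]:
    print(path, is_allowed(robots_txt, "ExampleBot", path))
# /cgi-bin/search False
# /images/logo.gif False
# /index.html True
```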

8. You can include comments in your robots.txt file by putting a pound sign (#) at the start of the line to be commented:

# Allow Googlebot anywhere
User-agent: Googlebot
Disallow:
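Robots discard comments before reading the rules. The hypothetical strip_comments function below is a minimal Python sketch of that step, following the convention in the original robots.txt draft specification that a line holding only a comment is ignored completely rather than treated as a blank line between records. It is an illustration, not the code any particular robot runs.

```python
def strip_comments(robots_txt):
    """Remove comments from robots_txt and return the remaining lines.

    A line that holds only a comment is dropped entirely, so it does not act
    as a blank line between records. Text after a '#' on any other line is
    also discarded, together with the spaces before it.
    """
    cleaned = []
    for line in robots_txt.splitlines():
        if line.lstrip().startswith("#"):
            continue  # whole-line comment: drop it
        cleaned.append(line.split("#", 1)[0].rstrip())
    return cleaned

example = "# Allow Googlebot anywhere\nUser-agent: Googlebot\nDisallow:\n"
print(strip_comments(example))  # ['User-agent: Googlebot', 'Disallow:']
```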

9. Under the robots.txt standard, a robot obeys the record that names it specifically and uses the wildcard (*) record only if no other record matches. For example, if you set up a record for Googlebot, Googlebot will ignore a wildcard record lower down in the file:

# Allow Googlebot anywhere
User-agent: Googlebot
Disallow:
# Allow no other bots on the site
User-agent: *
Disallow: /
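Feeding the file above to Python's urllib.robotparser shows the intended result: Googlebot may fetch anything and every other robot is refused. The helper is_allowed and the robot name ExampleBot are hypothetical. This particular parser checks every named record before falling back to *, so swapping the two records gives the same answers; putting the specific record first is still the tidier habit.

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt, agent, path):
    """Parse robots_txt with the standard library and ask whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

robots_txt = """# Allow Googlebot anywhere
User-agent: Googlebot
Disallow:
# Allow no other bots on the site
User-agent: *
Disallow: /
"""
print(is_allowed(robots_txt, "Googlebot", "/page.html"))   # True
print(is_allowed(robots_txt, "ExampleBot", "/page.html"))  # False

# Same two records in the opposite order: the "*" record is still only a fallback.
swapped = "User-agent: *\nDisallow: /\nUser-agent: Googlebot\nDisallow:\n"
print(is_allowed(swapped, "Googlebot", "/page.html"))  # True
```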

Tips:

1. Find robot User-agent names in your Web server logs.
2. Always match the capitalization of agent names, files, and directories exactly. File and directory paths are case-sensitive: if you disallow /IMAGES, robots will still crawl your /images folder.
3. Put your most specific directives first, and your more inclusive ones (with wildcards) last.
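The first two tips can be checked with a few lines of Python. The sketch below assumes your server writes the Apache "combined" log format, where the User-agent string is the last quoted field on each line; the sample log line, the helper names user_agents and is_allowed, and the robot name ExampleBot are hypothetical. It also shows that urllib.robotparser matches paths case-sensitively (it does compare robot names without regard to case, but matching the capitalization does no harm).

```python
import re
from collections import Counter
from urllib.robotparser import RobotFileParser

# Assumption: Apache "combined" log format, which ends with the quoted
# Referer and User-agent fields; this pattern captures the last quoted field.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')

def user_agents(log_lines):
    """Count how often each User-agent string appears in the log lines."""
    counts = Counter()
    for line in log_lines:
        match = LAST_QUOTED_FIELD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

def is_allowed(robots_txt, agent, path):
    """Parse robots_txt with the standard library and ask whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

sample_log = [
    '192.0.2.1 - - [19/Jun/2008:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 '
    '"-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
]
for agent, hits in user_agents(sample_log).most_common():
    print(hits, agent)

# Tip 2: paths are case-sensitive, so /IMAGES/ does not protect /images/.
robots_txt = "User-agent: *\nDisallow: /IMAGES/\n"
print(is_allowed(robots_txt, "ExampleBot", "/images/logo.gif"))  # True
print(is_allowed(robots_txt, "ExampleBot", "/IMAGES/logo.gif"))  # False
```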

For more detail, visit: http://webdesign.about.com/od/promotion/ht/htrobotstxt.htm
