Robots.txt file


Website owners use the /robots.txt file to give instructions about their site to web robots, using the Robots Exclusion Protocol.

How it works:

When a robot visits a website, it first checks for a robots.txt file in the root directory.
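
As a rough illustration of that lookup, the sketch below derives the robots.txt location from any page URL on the same site. The function name robots_txt_url and the example.com address are made up for this example; only the standard library is used.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the file is looked up at the root path.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page URL, used only for demonstration.
print(robots_txt_url("http://www.example.com/shop/index.html"))
# prints http://www.example.com/robots.txt
```

Note that the path, query string and fragment of the page URL are all dropped, so every page on a host maps to the same robots.txt.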


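User-agent: names the robot (search engine crawler/spider) that the rules which follow apply to; * means all robots.
Disallow: gives a file or directory path that the robot should not visit. Use one Disallow line per path.
Allow: gives a file or directory path that the robot may visit. It was not part of the original standard, so not all robots support it!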
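
To make the structure of these lines concrete, here is a minimal sketch of how a robot might group them into records. It is a simplification written for this note, not a complete parser: parse_records and the dictionary layout are hypothetical, and unknown fields are simply skipped.

```python
def parse_records(text):
    """Group robots.txt lines into records of user agents and their rules."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()  # field names are matched case-insensitively here
        if field == "user-agent":
            # A User-agent line that follows rules starts a new record.
            if current is None or current["rules"]:
                current = {"agents": [], "rules": []}
                records.append(current)
            current["agents"].append(value)
        elif field in ("allow", "disallow") and current is not None:
            current["rules"].append((field, value))
    return records


sample = "User-agent: *\nDisallow: /cgi-bin/\nAllow: /cgi-bin/public/\n"
for record in parse_records(sample):
    print(record["agents"], record["rules"])
```

Each record pairs one or more User-agent names with the Allow and Disallow paths listed under them.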


Two important things to be mindful about:

1. Robots can ignore your /robots.txt file, especially malware robots.

2. This file is publicly available. Anyone can see which sections of your website you don't want robots to visit.

Don't try to use /robots.txt to hide information.


To exclude all robots from the entire site (hides the website from search engines that obey the file):

User-agent: *
Disallow: /


To allow all robots complete access:

User-agent: *
Disallow:

(or just create an empty "/robots.txt" file, or don't use one at all!)
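
A quick way to check this is with Python's standard urllib.robotparser module. The sketch below compares a record with an empty Disallow value, an empty file, and a block-everything file; the helper can_fetch, the agent name ExampleBot and the path are invented for the example, and the result reflects how Python's parser behaves, which other robots may not match exactly.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt, path, agent="ExampleBot"):
    """Parse robots.txt text and report whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


EMPTY_DISALLOW = "User-agent: *\nDisallow:\n"  # empty value: nothing is excluded
EMPTY_FILE = ""                                # no rules at all
BLOCK_ALL = "User-agent: *\nDisallow: /\n"     # for contrast: everything excluded

for name, text in [("empty Disallow", EMPTY_DISALLOW),
                   ("empty file", EMPTY_FILE),
                   ("Disallow: /", BLOCK_ALL)]:
    print(name, can_fetch(text, "/any/page.html"))
# prints True for the first two variants and False for the last
```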


To exclude all robots from some directories on the server:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
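
The sketch below runs these rules through Python's standard urllib.robotparser to show which paths they cover. The agent name ExampleBot and the test paths are assumptions made for the example; Python's parser treats each Disallow value as a path prefix, which is why the trailing slash matters.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(path, agent="ExampleBot"):
    """Return True if the hypothetical robot may fetch path."""
    return parser.can_fetch(agent, path)


print(allowed("/index.html"))      # True: not under an excluded directory
print(allowed("/tmp/cache.dat"))   # False: starts with /tmp/
print(allowed("/cgi-bin/search"))  # False: starts with /cgi-bin/
print(allowed("/tmpfile.html"))    # True: /tmpfile.html does not start with /tmp/
```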


To exclude a single robot:

User-agent: MalwareBot
Disallow: /


To allow a single robot:

User-agent: Google
Disallow:

User-agent: *
Disallow: /
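
Here is a small check of the single-robot case using Python's standard urllib.robotparser. The rules are written out in full inside the code, with an empty Disallow line in the Google record. The agent names Googlebot and OtherBot are chosen for illustration; Python's parser matches the User-agent value as a case-insensitive substring of the robot's name, and real crawlers may match differently.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: Google
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# "Google" matches "Googlebot" by substring in this parser, so the first
# record applies and its empty Disallow excludes nothing.
print(parser.can_fetch("Googlebot", "/page.html"))  # True

# Any other robot falls through to the * record, which excludes everything.
print(parser.can_fetch("OtherBot", "/page.html"))   # False
```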


You can explicitly disallow some pages:

User-agent: *
Disallow: /~andrew/a.html
Disallow: /~andrew/b.html

Created on: Wednesday, November 9, 2011 by Andrew Sin