Robots.txt

  • Thread starter: Fred

Fred

Could any kind person show me EXACTLY what one should put in the robots.txt
file (Notepad) to ALLOW all search engine spiders to crawl and index the whole
website? Fred
 
http://www.searchengineworld.com/robots/robots_tutorial.htm
--
===
Tom "Pepper" Willett
Microsoft MVP - FrontPage
---
About FrontPage 2003:
http://office.microsoft.com/home/office.aspx?assetid=FX01085802
FrontPage 2003 Product Information:
http://www.microsoft.com/office/frontpage/prodinfo/default.mspx
Understanding FrontPage:
http://msdn.microsoft.com/office/understanding/frontpage/
FrontPage 2002 Server Extensions Support Center:
http://support.microsoft.com/default.aspx?scid=fh;en-us;fp10se
===
| Could any kind person show me EXACTLY what one should put on the Robots.txt
| (notepad) to ALLOW all search engine spiders to crawl and index the whole
| website. Fred
 
The purpose of a robots.txt file is to exclude robots from your site or from indexing certain
content. If you don't have one then your site is open to all robots.

--
==============================================
Thomas A. Rowe (Microsoft MVP - FrontPage)
WebMaster Resources(tm)

FrontPage Resources, WebCircle, MS KB Quick Links, etc.
==============================================
If you feel your current issue is a result of installing
a Service Pack or security update, please contact
Microsoft Product Support Services:
http://support.microsoft.com
If the problem can be shown to have been caused by a
security update, then there is usually no charge for the call.
==============================================
 
Exactly. If you want all searchbots to access and index your site freely:
don't configure a robots.txt
 
Hi,

You probably wouldn't want to let every spider go anywhere on your site. As a
first step, it's a good idea to block your /images directory just to save the
bandwidth. Also, there are a lot of spiders out there that will crawl you
without any benefit to you. It's a good idea to block them, both to save
bandwidth and to speed up your site for paying customers.
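
For anyone who wants to try that out before uploading: here is a rough sketch of such rules, checked with Python's standard urllib.robotparser module. The spider name "BadBot", the name "ExampleBot", and the www.example.com URLs are made up for illustration; substitute the user-agent strings you actually see in your logs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: "BadBot" is a made-up spider name used for illustration.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def can_crawl(agent, path):
    """Return True if the rules above let `agent` fetch `path`."""
    return parser.can_fetch(agent, "http://www.example.com" + path)


print(can_crawl("BadBot", "/index.htm"))            # False: blocked everywhere
print(can_crawl("ExampleBot", "/images/logo.gif"))  # False: /images/ is off limits
print(can_crawl("ExampleBot", "/index.htm"))        # True: everything else is open
```

This only shows what a well-behaved spider would conclude from the file; it does not stop a spider that ignores robots.txt.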
 
However, "bad" spiders generally do not honor the robots.txt file even when they read it.

--
Thomas A. Rowe (Microsoft MVP - FrontPage)
 
User-agent: *
Disallow:

These two lines will "disallow nothing", i.e., "allow everything" for SE
spiders.
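
If you'd like to confirm that for yourself, one way is to feed those two lines to Python's standard urllib.robotparser module. This is only a sketch: the helper name is_allowed, the spider name "ExampleBot", and the www.example.com URLs are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two-line "allow everything" file from the post above.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def is_allowed(robots_txt, agent, url):
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Every path should come back as allowed.
for path in ("/", "/images/logo.gif", "/cgi-bin/form.pl"):
    print(path, is_allowed(ALLOW_ALL, "ExampleBot", "http://www.example.com" + path))
```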

If you'd rather prevent spiders from indexing some of your directories
or files:

User-agent: *
Disallow: /error.htm
Disallow: /cgi-bin/

Each path is given relative to the site root and should start with a slash.
The first Disallow line above excludes an individual file from being crawled.
The second Disallow line excludes a directory off the root directory (cgi-bin).
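
A quick way to see which pages such rules would block is to run them through Python's standard urllib.robotparser module. Treat this as a sketch: the helper blocked_paths, the spider name "ExampleBot", the sample paths, and the www.example.com host are hypothetical, and the rules are written with a leading slash on every path.

```python
from urllib.robotparser import RobotFileParser

# Exclusion rules, one robots.txt line per list item.
RULES = [
    "User-agent: *",
    "Disallow: /error.htm",
    "Disallow: /cgi-bin/",
]


def blocked_paths(rules, paths, agent="ExampleBot"):
    """Return the subset of `paths` that `agent` is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(rules)
    return [
        p for p in paths
        if not parser.can_fetch(agent, "http://www.example.com" + p)
    ]


print(blocked_paths(RULES, ["/index.htm", "/error.htm", "/cgi-bin/form.pl"]))
# ['/error.htm', '/cgi-bin/form.pl']
```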
 