From Archiveteam
Revision as of 12:51, 10 May 2011 by Karlcow (talk | contribs) (Discussing the decision to ditch robots.txt)

This decision raises a few issues that have serious consequences.

  • Robots.txt is an established protocol. Changing its meaning will break users' expectations.
  • Robots.txt is a simple (dumb) protocol that targets indexing and harvesting bots without having to rely on user-agent sniffing.
  • Robots.txt is a very simple mechanism for managing a certain level of opacity.
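
To illustrate how little a well-behaved bot has to do, here is a minimal sketch using Python's standard urllib.robotparser module. The rules, the bot names (ExampleArchiveBot, OtherBot) and the example.org URLs are all hypothetical and only serve the illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real bot would fetch it from the site root.
ROBOTS_TXT = """\
User-agent: ExampleArchiveBot
Disallow: /private/

User-agent: *
Disallow: /drafts/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The bot identifies itself and asks; the server does no user-agent sniffing.
archive_private = parser.can_fetch("ExampleArchiveBot", "http://example.org/private/a.html")
archive_drafts = parser.can_fetch("ExampleArchiveBot", "http://example.org/drafts/a.html")
other_drafts = parser.can_fetch("OtherBot", "http://example.org/drafts/a.html")

print(archive_private, archive_drafts, other_drafts)
```

With these hypothetical rules, ExampleArchiveBot is refused /private/ but allowed /drafts/, because a bot that finds a group naming it uses that group instead of the `*` group. OtherBot falls back to the `*` group and is refused /drafts/. Compliance is entirely voluntary on the bot's side.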

The only way forward is not to ditch robots.txt but to create something better, and to develop tools and protocols that help people move forward. Let's give users better control. We could start with a W3C community group.

Issues to solve

  • Robots.txt exists only at the root of a website, which makes it unusable on a website with multiple owners. For example, it doesn't address the differences between and
  • Robots.txt makes the directory visible, which should not necessarily be the case: you have to reveal a path in order to hide it.
  • A protocol should help to target classes of user agents (browsers, bots, etc.) in such a way that a website can be public and at the same time not indexable.
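
The "reveal to hide" issue can be sketched in a few lines of Python. Since robots.txt is itself a public file, anyone can list the paths a site asks bots to avoid. The helper name disallowed_paths and the sample paths are hypothetical, and the parsing is deliberately simplified (it ignores which User-agent group a rule belongs to).

```python
from typing import List


def disallowed_paths(robots_txt: str) -> List[str]:
    """Return every non-empty Disallow value found in a robots.txt text."""
    paths = []
    for raw in robots_txt.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical file: the owner wants these paths left alone,
# yet has to publish them to say so.
SAMPLE = """\
User-agent: *
Disallow: /secret-project/   # not announced yet
Disallow: /~alice/drafts/
"""

revealed = disallowed_paths(SAMPLE)
print(revealed)
```

Any visitor, human or bot, who runs something like this against the public file learns exactly which directories the owner wanted to keep out of sight.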