Talk:Robots.txt

From Archiveteam
(Discussing the decision to ditch robots.txt)

Revision as of 13:26, 10 May 2011

There are a few issues with this decision, and they have serious consequences.

  • Robots.txt is an established protocol. Changing its meaning will break users' expectations.
  • Robots.txt is a simple (dumb) protocol that targets indexing and harvesting bots without relying on user-agent sniffing.
  • Robots.txt is a very simple mechanism for managing a certain level of opacity.
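As a rough sketch of the point about user-agent sniffing, here is how a compliant crawler might consult robots.txt, assuming Python's standard `urllib.robotparser` module and a hypothetical robots.txt that names a made-up `ExampleIndexer` bot. The crawler reads the file and checks its own user-agent token, so the server never has to guess who is asking.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
# "ExampleIndexer" is a made-up bot name; "*" matches every other agent.
ROBOTS_TXT = """\
User-agent: ExampleIndexer
Disallow: /private/

User-agent: *
Disallow:
"""

def allowed(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt above permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the file as read.
    parser.parse(ROBOTS_TXT.splitlines())
    # The crawler matches its own token against the User-agent groups.
    return parser.can_fetch(user_agent, url)

print(allowed("ExampleIndexer", "https://example.org/private/page.html"))  # False
print(allowed("SomeBrowser", "https://example.org/private/page.html"))     # True
```

The check happens entirely on the client side: a bot that ignores the file is not stopped, which is why robots.txt manages opacity rather than access.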

The only way forward is not to ditch robots.txt but to create something better, and to develop tools and protocols that help people move on. Let's give users better control. We could start this work in a W3C community group.

Issues to solve

  • Robots.txt exists only at the root of a website, which makes it unusable for a site with multiple owners. For example, it cannot express separate rules set by the owners of example.org/mike and example.org/suzie.
  • Robots.txt makes a directory visible even when that should not be necessary: to hide a path from crawlers, you must reveal it to everyone who reads the file.
  • A protocol should let a site target classes of user agents (browsers, bots, etc.) so that the site can be public and, at the same time, not indexable.
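The first two issues can be illustrated with a small Python sketch against a hypothetical robots.txt for example.org. The helper `robots_url` (hypothetical) shows that every page on the host maps to the same root file, so only whoever controls the root can set rules for /mike and /suzie. The helper `disclosed_paths` (also hypothetical, a simplistic line scanner rather than a full robots.txt parser) lists the paths the file publishes to any reader.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical robots.txt: Suzie wants her drafts hidden, but the only
# place to say so is the shared root file, which anyone can read.
ROBOTS_TXT = """\
User-agent: *
Disallow: /suzie/drafts/
"""

def robots_url(page_url: str) -> str:
    """Hypothetical helper: crawlers look for robots.txt only at the host root."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

def disclosed_paths(robots_txt: str) -> list[str]:
    """Hypothetical helper: list the non-empty Disallow paths the file reveals."""
    paths = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths

print(robots_url("https://example.org/mike/index.html"))   # https://example.org/robots.txt
print(robots_url("https://example.org/suzie/index.html"))  # https://example.org/robots.txt
print(disclosed_paths(ROBOTS_TXT))                         # ['/suzie/drafts/']
```

Both users end up sharing one file, and the path Suzie wanted to keep out of search engines is now listed in plain text: "reveal to hide".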

Karlcow