Difference between revisions of "List of websites excluded from the Wayback Machine"

From Archiveteam
(add revcode.se)
(Added MediaFire.com at the bottom. Should be sorted into the list automatically by user:JAABot. + adding introduction.)
Line 1: Line 1:
Webmasters keep the Wayback Machine out of their website in one of two ways: through a [[robots.txt]] exclusion of the ia_archiver user agent (“User-agent: ia_archiver” followed by “Disallow: /”) or through a manual exclusion request.
With the first, more common method, the Wayback Machine shows “This page cannot be crawled or displayed due to Robots.txt” when someone tries to access the site through it; with the second, it displays “This page has been excluded from the Wayback Machine”.
* https://www.11alive.com/
* https://www.12news.com/
Line 652: Line 656:
* https://www.zippyshare.com/
* http://zmx.jp/ <!--dead (Mar 8 2019)-->
* http://mediafire.com/robots.txt (robots.txt exclusion only) <!-- Maybe put robots.txt Archive enemies into a separate feud list?-->

Revision as of 13:48, 26 April 2019

Webmasters keep the Wayback Machine out of their website in one of two ways: through a robots.txt exclusion of the ia_archiver user agent (“User-agent: ia_archiver” followed by “Disallow: /”) or through a manual exclusion request.
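
As a rough illustration of the robots.txt rule described above, the following Python sketch uses the standard library's `urllib.robotparser` to check whether a given robots.txt body disallows the `ia_archiver` user agent. The helper name `blocks_ia_archiver`, the sample robots.txt bodies, and the `example.com` URL are assumptions made for this example; how the Wayback Machine itself evaluates robots.txt is not specified here.

```python
from urllib.robotparser import RobotFileParser


def blocks_ia_archiver(robots_txt: str, url: str = "https://example.com/") -> bool:
    """Return True if the given robots.txt text disallows ia_archiver for url.

    Hypothetical helper for illustration; it only applies the standard
    library's robots.txt parsing rules to text that is already in memory.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("ia_archiver", url)


# Sample robots.txt that excludes only the ia_archiver user agent.
EXCLUDING = """\
User-agent: ia_archiver
Disallow: /
"""

# Sample robots.txt that allows every crawler (empty Disallow value).
PERMISSIVE = """\
User-agent: *
Disallow:
"""

print(blocks_ia_archiver(EXCLUDING))   # True
print(blocks_ia_archiver(PERMISSIVE))  # False
```

The check runs on a string so that it needs no network access; fetching a live robots.txt would be a separate step.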

With the first, more common method, the Wayback Machine shows “This page cannot be crawled or displayed due to Robots.txt” when someone tries to access the site through it; with the second, it displays “This page has been excluded from the Wayback Machine”.
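
The two messages quoted above can be told apart mechanically. The sketch below is a minimal Python example that classifies the text of a Wayback Machine page that has already been retrieved; the names `Exclusion` and `classify_exclusion` are hypothetical, and the message strings are copied from this article as of this revision, so the wording actually served by the Wayback Machine may differ.

```python
from enum import Enum


class Exclusion(Enum):
    """Kinds of exclusion described in this article (names are hypothetical)."""
    ROBOTS_TXT = "robots.txt exclusion"
    MANUAL = "manual exclusion request"
    NONE = "no exclusion message found"


# Message texts as quoted in this article; assumed, not verified against
# the live Wayback Machine.
ROBOTS_MESSAGE = "This page cannot be crawled or displayed due to Robots.txt"
MANUAL_MESSAGE = "This page has been excluded from the Wayback Machine"


def classify_exclusion(page_text: str) -> Exclusion:
    """Classify already-fetched page text by which message it contains."""
    text = page_text.casefold()  # compare case-insensitively
    if ROBOTS_MESSAGE.casefold() in text:
        return Exclusion.ROBOTS_TXT
    if MANUAL_MESSAGE.casefold() in text:
        return Exclusion.MANUAL
    return Exclusion.NONE


sample = "Sorry. This page has been excluded from the Wayback Machine."
print(classify_exclusion(sample).value)  # manual exclusion request
```

A result of `Exclusion.NONE` only means that neither quoted message was found in the text; it does not show that the site is archived.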