Difference between revisions of "List of websites excluded from the Wayback Machine"

From Archiveteam
There are two ways webmasters can keep the Wayback Machine out of their website: through a [[robots.txt]] exclusion for the ia_archiver user agent (“User-agent: ia_archiver” followed by “Disallow: /”) or through a manual exclusion request.
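As a rough illustration of the robots.txt method, the sketch below checks the rule quoted above with Python's standard `urllib.robotparser`. The helper name `allows` and the example.com URL are hypothetical, and the sketch only shows how a generic robots.txt parser reads the rule; it makes no claim about how the Wayback Machine itself evaluates robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The exclusion rule described above, written as a robots.txt file.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def allows(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration; it parses the text locally
    and performs no network access.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain, not a site from this list.
    print(allows(ROBOTS_TXT, "ia_archiver", "https://example.com/page"))
    print(allows(ROBOTS_TXT, "SomeOtherBot", "https://example.com/page"))
```

Under this rule only the ia_archiver user agent is disallowed; other user agents are unaffected because the file has no wildcard entry.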
This page collects sites that are manually excluded from the Wayback Machine. When a site is manually excluded, attempting to access it through the Wayback Machine returns the error “This page has been excluded from the Wayback Machine”. This page does not track websites that disallow Internet Archive (IA) crawlers in their robots.txt file or otherwise block them.
 
With the first, more common method, trying to access the site through the Wayback Machine shows “This page cannot be crawled or displayed due to Robots.txt”; with the second, it shows “This page has been excluded from the Wayback Machine”.
 
This page collects only the latter cases.
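The two error messages make it possible to tell the exclusion types apart. The following is a minimal sketch, assuming the text of a Wayback Machine response has already been retrieved by some other means; the function name `classify_exclusion` and the labels it returns are hypothetical, and only the two message strings come from this page.

```python
from typing import Optional

# Error messages as quoted on this page.
ROBOTS_MESSAGE = "This page cannot be crawled or displayed due to Robots.txt"
MANUAL_MESSAGE = "This page has been excluded from the Wayback Machine"


def classify_exclusion(page_text: str) -> Optional[str]:
    """Classify a response body by which exclusion message it contains.

    Returns "manual", "robots", or None when neither message is found.
    Matching is case-insensitive and ignores differences in whitespace.
    """
    text = " ".join(page_text.split()).lower()
    if MANUAL_MESSAGE.lower() in text:
        return "manual"
    if ROBOTS_MESSAGE.lower() in text:
        return "robots"
    return None


if __name__ == "__main__":
    sample = "Sorry.\nThis page has been excluded from the Wayback Machine."
    print(classify_exclusion(sample))
```

Only responses classified as "manual" would belong on this list. The exact wording and markup the Wayback Machine serves may differ from the quoted strings, so a real check would need to be verified against live responses.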


* https://www.11alive.com/

Revision as of 14:31, 26 April 2019

This page collects sites that are manually excluded from the Wayback Machine. When a site is manually excluded, attempting to access it through the Wayback Machine returns the error “This page has been excluded from the Wayback Machine”. This page does not track websites that disallow Internet Archive (IA) crawlers in their robots.txt file or otherwise block them.