Difference between revisions of "User:Start"

From Archiveteam
m (→‎Items: test)
 
(4 intermediate revisions by the same user not shown)
Line 41: Line 41:
Searching <code>intitle:"index of /" inurl:"ftp"</code> on Google gives millions of results.

*[ftp://ftp.3drealms.com/ ftp://ftp.3drealms.com/] - 3D Realms
*[ftp://ftp.adobe.com/ ftp://ftp.adobe.com/] - Adobe
*[ftp://ftp.amanda.org/ ftp://ftp.amanda.org/] - Amanda Network Backup
*[http://staticky.com/mirrors/ftp.apple.com/developer/ http://staticky.com/mirrors/ftp.apple.com/developer/] - Apple's former developer FTP (mirror)
*[ftp://ftp.atari.com/ ftp://ftp.atari.com/] - Atari
*[http://ftp.blizzard.com/pub/ http://ftp.blizzard.com/pub/] - Blizzard (only works through HTTP)
*[ftp://ftp.mrunix.net/ ftp://ftp.mrunix.net/] - Borg: The Collective
Line 90: Line 92:


== blah blah blah ignore ==

=== Items ===
Line 103: Line 104:
* TODO: Scrape the Wayback Machine
* TODO: Scrape URLTeam dumps
* TODO: Scrape a list of subdomains from DNSdumpster.com (if applicable)

Latest revision as of 06:52, 29 November 2015

I like preserving the web.

I also go by Start+Select and Pressstart.

Archives

Website Crawls

Public HTTP/FTP Server List

Searching intitle:"index of /" inurl:"ftp" on Google gives millions of results.
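
The query above targets pages whose title contains "index of /", which is how many web servers title their auto-generated directory listings. As a rough sketch of the same check applied to a page that has already been downloaded, the following looks only at the `<title>` text. The names `TitleParser` and `looks_like_directory_listing` are made up for illustration, and matching on the title alone is an assumption that will miss listings with custom titles.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def looks_like_directory_listing(html: str) -> bool:
    """Return True if the page title contains 'index of /' (case-insensitive)."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return "index of /" in parser.title.strip().lower()
```

A page titled `Index of /pub` passes this check, while an ordinary landing page does not.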

blah blah blah ignore

Items

  • TODO: Scrape Google
  • TODO: Scrape Bing
  • TODO: Scrape DuckDuckGo
  • TODO: Scrape Twitter
  • TODO: Scrape Reddit
  • TODO: Scrape links from MediaWiki wikis
  • TODO: Scrape the Open Directory Project
  • TODO: Scrape the Common Crawl Index
  • TODO: Scrape the Wayback Machine
  • TODO: Scrape URLTeam dumps
  • TODO: Scrape a list of subdomains from DNSdumpster.com (if applicable)
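
Whatever the source, each of the TODO items above would end with the same step: pulling candidate links out of scraped text and reducing them to a list of distinct servers. A minimal sketch of that step, assuming the scraped material is available as plain text or wikitext; the function name `extract_servers` and the URL pattern are illustrative choices, not an existing tool.

```python
import re
from urllib.parse import urlsplit

# Assumed pattern: ftp://, http:// or https:// followed by anything that is
# not whitespace, a bracket, an angle bracket, or a quote. The bracket
# exclusion keeps wikitext links like [url label] from leaking a "]".
URL_PATTERN = re.compile(r"""(?:ftp|https?)://[^\s\[\]<>"']+""", re.IGNORECASE)


def extract_servers(text):
    """Return unique (scheme, hostname) pairs in order of first appearance."""
    seen = set()
    servers = []
    for match in URL_PATTERN.finditer(text):
        # Drop punctuation that commonly trails a URL in running prose.
        url = match.group(0).rstrip(".,;)")
        parts = urlsplit(url)
        if not parts.hostname:
            continue
        key = (parts.scheme.lower(), parts.hostname.lower())
        if key not in seen:
            seen.add(key)
            servers.append(key)
    return servers


if __name__ == "__main__":
    sample = (
        "*[ftp://ftp.adobe.com/ ftp://ftp.adobe.com/] - Adobe\n"
        "*[http://ftp.blizzard.com/pub/ http://ftp.blizzard.com/pub/] - Blizzard\n"
    )
    for scheme, host in extract_servers(sample):
        print(scheme, host)
```

Keying on scheme and hostname rather than the full URL keeps one entry per server even when many paths on the same host turn up, at the cost of dropping the path information.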