Difference between revisions of "User:Start"

From Archiveteam
(16 intermediate revisions by the same user not shown)
Line 1: Line 1:
I like preserving the web.
I like preserving the web.
I also go by Start+Select and Pressstart.


==Archives==
==Archives==
Line 5: Line 7:
*[https://archive.org/details/safeway.ca-panicgrab-20140707 safeway.ca]
*[https://archive.org/details/safeway.ca-panicgrab-20140707 safeway.ca]
*[https://archive.org/details/emulation-zone-archive Emulation Zone]
*[https://archive.org/details/emulation-zone-archive Emulation Zone]
*[https://archive.org/details/www.battleforthenet.com-panicgrab-20140718 Battle for the Net]
*Battle for the Net ([https://archive.org/details/www.battleforthenet.com-panicgrab-20140718 July 18, 2014], [https://archive.org/details/www.battleforthenet.com-panicgrab-20140912 September 12, 2014])
*[https://archive.org/details/theopeninter.net-panicgrab-20140718 The Open Internet]
*[https://archive.org/details/theopeninter.net-panicgrab-20140718 The Open Internet]
*[https://archive.org/details/startupsfornetneutrality.org-panicgrab-20140718 Startups for Net Neutrality]
*[https://archive.org/details/startupsfornetneutrality.org-panicgrab-20140718 Startups for Net Neutrality]
Line 20: Line 22:
*<nowiki>https://archive.org/details/bmf.*rustedmagick.com-cr-panicgrab-20140808</nowiki> (remove asterisk, spam filter doesn't like this link) - The Original Cutting Room Floor
*<nowiki>https://archive.org/details/bmf.*rustedmagick.com-cr-panicgrab-20140808</nowiki> (remove asterisk, spam filter doesn't like this link) - The Original Cutting Room Floor
*[https://archive.org/details/tppx.herokuapp.com-panicgrab-20140808 TPPX logs]
*[https://archive.org/details/tppx.herokuapp.com-panicgrab-20140808 TPPX logs]
*[https://archive.org/details/nintendo-warcs Misc. Nintendo sites]
*[https://archive.org/details/mojang.com-notch-panicgrab-20140912 mojang.com/notch]
*[https://archive.org/details/legowracers.4t2portfolio.co.uk-panicgrab-20141007 legowracers.4t2portfolio.co.uk]
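
Most of the item identifiers above appear to follow a <code>site-panicgrab-YYYYMMDD</code> pattern. A minimal sketch of splitting such an identifier into a site label and a grab date follows. The pattern is inferred from the links in this list and is not a documented convention, and the names <code>parse_panicgrab_id</code> and <code>item_url</code> are made up for illustration.

```python
import re
from datetime import datetime

# Assumed pattern, inferred from the identifiers listed above:
#   <site label>-panicgrab-<YYYYMMDD>
PANICGRAB_RE = re.compile(r"^(?P<site>.+)-panicgrab-(?P<stamp>\d{8})$")


def parse_panicgrab_id(identifier):
    """Return (site_label, grab_date), or None if the identifier does not match."""
    match = PANICGRAB_RE.match(identifier)
    if match is None:
        return None
    try:
        grab_date = datetime.strptime(match.group("stamp"), "%Y%m%d").date()
    except ValueError:
        # Eight digits that do not form a real calendar date.
        return None
    return match.group("site"), grab_date


def item_url(identifier):
    """Build the archive.org details URL in the form used by the links above."""
    return "https://archive.org/details/" + identifier
```

Identifiers that do not fit the pattern, such as <code>nintendo-warcs</code> or <code>emulation-zone-archive</code>, return <code>None</code> rather than a guess.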


==Website Crawls==
==Website Crawls==
Line 26: Line 31:
**[http://paste.archivingyoursh.it/vosoqudavo.avrasm google crawl]
**[http://paste.archivingyoursh.it/vosoqudavo.avrasm google crawl]
**[http://paste.archivingyoursh.it/dagacapovu.avrasm combined crawl]
**[http://paste.archivingyoursh.it/dagacapovu.avrasm combined crawl]
*[[Easel]]
**[http://paste.archivingyoursh.it/lojasegeke.avrasm bing crawl]
**[http://paste.archivingyoursh.it/warisukoka.avrasm google crawl]
**[http://paste.archivingyoursh.it/xitoxufuki.avrasm combined crawl]
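
The page does not say how each "combined crawl" list was produced. One plausible approach, sketched here as an assumption, is to merge the Bing and Google URL lists and drop duplicates while keeping first-seen order. The function name <code>merge_crawls</code> is hypothetical.

```python
def merge_crawls(*crawls):
    """Merge several iterables of URL lines into one de-duplicated list.

    Order follows first appearance; blank lines and surrounding
    whitespace are dropped. Assumption: each crawl is one URL per line.
    """
    seen = set()
    combined = []
    for crawl in crawls:
        for line in crawl:
            url = line.strip()
            if url and url not in seen:
                seen.add(url)
                combined.append(url)
    return combined
```

Because any iterable of lines works, open file objects for the two crawl lists can be passed in directly.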


==Public HTTP/FTP Server List==
==Public HTTP/FTP Server List==
Line 31: Line 41:
Searching <code>intitle:"index of /" inurl:"ftp"</code> on Google gives millions of results.
Searching <code>intitle:"index of /" inurl:"ftp"</code> on Google gives millions of results.


*[ftp://ftp.3drealms.com/ ftp://ftp.3drealms.com/] - 3D Realms
*[ftp://ftp.adobe.com/ ftp://ftp.adobe.com/] - Adobe
*[ftp://ftp.adobe.com/ ftp://ftp.adobe.com/] - Adobe
*[ftp://ftp.amanda.org/ ftp://ftp.amanda.org/] - Amanda Network Backup
*[ftp://ftp.amanda.org/ ftp://ftp.amanda.org/] - Amanda Network Backup
*[http://staticky.com/mirrors/ftp.apple.com/developer/ http://staticky.com/mirrors/ftp.apple.com/developer/] - Apple's former developer FTP (mirror)
*[http://staticky.com/mirrors/ftp.apple.com/developer/ http://staticky.com/mirrors/ftp.apple.com/developer/] - Apple's former developer FTP (mirror)
*[ftp://ftp.atari.com/ ftp://ftp.atari.com/] - Atari
*[http://ftp.blizzard.com/pub/ http://ftp.blizzard.com/pub/] - Blizzard (only works through HTTP)
*[http://ftp.blizzard.com/pub/ http://ftp.blizzard.com/pub/] - Blizzard (only works through HTTP)
*[ftp://ftp.mrunix.net/ ftp://ftp.mrunix.net/] - Borg: The Collective
*[ftp://ftp.mrunix.net/ ftp://ftp.mrunix.net/] - Borg: The Collective
Line 53: Line 65:
***[ftp://ftp.research.microsoft.com/downloads ftp://ftp.research.microsoft.com/downloads] - hidden directory
***[ftp://ftp.research.microsoft.com/downloads ftp://ftp.research.microsoft.com/downloads] - hidden directory
*[http://assets.minecraft.net/ http://assets.minecraft.net/] - Minecraft (no longer used)
*[http://assets.minecraft.net/ http://assets.minecraft.net/] - Minecraft (no longer used)
*[http://releases.mozilla.org/pub/mozilla.org/ http://releases.mozilla.org/pub/mozilla.org/] - Mozilla
*[ftp://ftp.mozilla.org/ ftp://ftp.mozilla.org/] - Mozilla
**[http://releases.mozilla.org/pub/mozilla.org/ http://releases.mozilla.org/pub/mozilla.org/]
**[http://download.cdn.mozilla.net/pub/ http://download.cdn.mozilla.net/pub/] - Mozilla (older software)
**[http://download.cdn.mozilla.net/pub/ http://download.cdn.mozilla.net/pub/] - Mozilla (older software)
*[ftp://ftp.ncftp.com/ ftp://ftp.ncftp.com/] - NcFTP
*[ftp://ftp.ncftp.com/ ftp://ftp.ncftp.com/] - NcFTP
Line 77: Line 90:
*[ftp://ftp.westwood.com/ ftp://ftp.westwood.com/] - Westwood
*[ftp://ftp.westwood.com/ ftp://ftp.westwood.com/] - Westwood
*[http://wdl2.winworldpc.com http://wdl2.winworldpc.com] - WinWorld
*[http://wdl2.winworldpc.com http://wdl2.winworldpc.com] - WinWorld
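
For the HTTP entries in this list, the directory contents can often be read straight from the "Index of /" page. The sketch below pulls entry links out of such a page using only the standard library. It assumes an Apache-style autoindex layout (relative links for entries, <code>?C=...</code> links for column sorting), which not every server above will use. <code>IndexLinkParser</code> and <code>list_index_entries</code> are illustrative names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IndexLinkParser(HTMLParser):
    """Collect the href of every <a> tag in a directory index page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def list_index_entries(base_url, html_text):
    """Return absolute URLs for the files and subdirectories in a listing."""
    parser = IndexLinkParser()
    parser.feed(html_text)
    entries = []
    for href in parser.hrefs:
        # Skip sort links, fragments, parent-directory links, and anything
        # that is absolute (site-rooted or carrying its own scheme).
        if href.startswith(("?", "#", "/", "../")) or "://" in href:
            continue
        entries.append(urljoin(base_url, href))
    return entries
```

Entries ending in <code>/</code> are subdirectories, so a crawler could call the function again on each of them to walk the tree.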
== blah blah blah ignore ==
=== Items ===
* TODO: Scrape Google
* TODO: Scrape Bing
* TODO: Scrape DuckDuckGo
* TODO: Scrape Twitter
* TODO: Scrape Reddit
* TODO: Scrape links from MediaWiki wikis
* TODO: Scrape the Open Directory Project
* TODO: Scrape the Common Crawl Index
* TODO: Scrape the Wayback Machine
* TODO: Scrape URLTeam dumps
* TODO: Scrape a list of subdomains from DNSdumpster.com (if applicable)
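
As a rough sketch of the Wayback Machine item, the following builds a query URL for the Wayback CDX server and extracts the captured URLs from its JSON response. The parameter choices (<code>matchType=domain</code>, <code>collapse=urlkey</code>, <code>fl=original</code>) are one reasonable configuration, not something this page specifies, and the function names are hypothetical. No request is sent here; fetching is left to whatever HTTP client is at hand.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(domain, limit=None):
    """Build a CDX query for every captured URL under a domain."""
    params = [
        ("url", domain),
        ("matchType", "domain"),  # include subdomains
        ("output", "json"),
        ("fl", "original"),       # only return the original URL field
        ("collapse", "urlkey"),   # one row per distinct URL
    ]
    if limit is not None:
        params.append(("limit", str(limit)))
    return CDX_ENDPOINT + "?" + urlencode(params)


def urls_from_cdx_rows(rows):
    """Extract URLs from a parsed JSON response.

    With output=json the first row is the header (field names),
    so it is skipped; each remaining row holds one URL.
    """
    return [row[0] for row in rows[1:]]
```

The resulting URL list could then be fed into the same de-duplication step used for the search engine crawls.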

Latest revision as of 06:52, 29 November 2015
