Difference between revisions of "User:Start"
I like preserving the web.
I also go by Start+Select and Pressstart.
==Archives==
*[https://archive.org/details/foxytunes.com-panicgrab-20130704 FoxyTunes]
*[https://archive.org/details/safeway.ca-panicgrab-20140707 safeway.ca]
*[https://archive.org/details/emulation-zone-archive Emulation Zone]
*Battle for the Net ([https://archive.org/details/www.battleforthenet.com-panicgrab-20140718 July 18, 2014], [https://archive.org/details/www.battleforthenet.com-panicgrab-20140912 September 12, 2014])
*[https://archive.org/details/theopeninter.net-panicgrab-20140718 The Open Internet]
*[https://archive.org/details/startupsfornetneutrality.org-panicgrab-20140718 Startups for Net Neutrality]
*[https://archive.org/details/net.net-panicgrab-20140718 net.net]
*[https://archive.org/details/wwdctimer.com-panicgrab-20140731 WWDC Timer]
*[https://archive.org/details/xn--19g.com-panicgrab-20140731 Option V Mac]
*[https://archive.org/details/chromercise.com-panicgrab-20140731 Chromercise]
*[https://archive.org/details/hiddenfromgoogle.com-panicgrab-20140731 Hidden From Google]
*[https://archive.org/details/orteil.dashnet.org-panicgrab-20140731 orteil.dashnet.org]
*[https://archive.org/details/pingus.seul.org-panicgrab-20140731 Pingus]
*[https://archive.org/details/tux4kids.alioth.debian.org-panicgrab-20140731 Tux4Kids]
*[https://archive.org/details/tuxkart.sourceforge.net-panicgrab-20140731 TuxKart]
*[https://archive.org/details/assets.minecraft.net-panicgrab-20140807 Minecraft Assets Server]
*<nowiki>https://archive.org/details/bmf.*rustedmagick.com-cr-panicgrab-20140808</nowiki> (remove asterisk, spam filter doesn't like this link) - The Original Cutting Room Floor
*[https://archive.org/details/tppx.herokuapp.com-panicgrab-20140808 TPPX logs]
*[https://archive.org/details/nintendo-warcs Misc. Nintendo sites]
*[https://archive.org/details/mojang.com-notch-panicgrab-20140912 mojang.com/notch]
*[https://archive.org/details/legowracers.4t2portfolio.co.uk-panicgrab-20141007 legowracers.4t2portfolio.co.uk]
==Website Crawls==
*cache.lego.com
**[http://paste.archivingyoursh.it/fawofacari.avrasm bing crawl]
**[http://paste.archivingyoursh.it/vosoqudavo.avrasm google crawl]
**[http://paste.archivingyoursh.it/dagacapovu.avrasm combined crawl]
*[[Easel]]
**[http://paste.archivingyoursh.it/lojasegeke.avrasm bing crawl]
**[http://paste.archivingyoursh.it/warisukoka.avrasm google crawl]
**[http://paste.archivingyoursh.it/xitoxufuki.avrasm combined crawl]
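Each site above has a bing crawl, a google crawl and a combined crawl. One way a combined list could be built from the two per-engine lists is sketched below. This is only an assumption about the process, not a record of how the pastes above were actually produced; the function name <code>combine_crawls</code> and the sample URLs are made up for illustration.

```python
def combine_crawls(*crawls):
    """Merge several URL lists into one, dropping blanks and duplicates.

    The order of first appearance is kept so the result is reproducible.
    """
    seen = set()
    combined = []
    for crawl in crawls:
        for line in crawl:
            url = line.strip()  # pasted lists often carry stray whitespace
            if url and url not in seen:
                seen.add(url)
                combined.append(url)
    return combined


# Hypothetical sample data standing in for the two search-engine lists.
bing = ["http://cache.lego.com/a.pdf", "http://cache.lego.com/b.pdf"]
google = ["http://cache.lego.com/b.pdf ", "", "http://cache.lego.com/c.pdf"]

print("\n".join(combine_crawls(bing, google)))
```

Keeping first-seen order rather than sorting is a design choice here: it makes it easy to see which source contributed a URL first.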
==Public HTTP/FTP Server List==
Searching <code>intitle:"index of /" inurl:"ftp"</code> on Google gives millions of results.
*[ftp://ftp.3drealms.com/ ftp://ftp.3drealms.com/] - 3D Realms
*[ftp://ftp.adobe.com/ ftp://ftp.adobe.com/] - Adobe
*[ftp://ftp.amanda.org/ ftp://ftp.amanda.org/] - Amanda Network Backup
*[http://staticky.com/mirrors/ftp.apple.com/developer/ http://staticky.com/mirrors/ftp.apple.com/developer/] - Apple's former developer FTP (mirror)
*[ftp://ftp.atari.com/ ftp://ftp.atari.com/] - Atari
*[http://ftp.blizzard.com/pub/ http://ftp.blizzard.com/pub/] - Blizzard (only works through HTTP)
*[ftp://ftp.mrunix.net/ ftp://ftp.mrunix.net/] - Borg: The Collective
*[http://media.codeweavers.com/ http://media.codeweavers.com/] - CodeWeavers
*[ftp://ftp.debian.org/ ftp://ftp.debian.org/] - Debian
*[ftp://ftp.eggheads.org/ ftp://ftp.eggheads.org/] - EggDrop
*[ftp://ftp.ea.com/ ftp://ftp.ea.com/] - Electronic Arts
**[http://largedownloads.ea.com http://largedownloads.ea.com] - Electronic Arts (large downloads)
*[ftp://ftp.gnu.org/ ftp://ftp.gnu.org/] - GNU
*[ftp://ftp.gnus.org/ ftp://ftp.gnus.org/] - GNUS
*[ftp://ftp.software.ibm.com/ ftp://ftp.software.ibm.com/] - IBM
*[ftp://ftp.idsoftware.com/ ftp://ftp.idsoftware.com/] - iD Software
*[ftp://ftp.isc.org/ ftp://ftp.isc.org/] - Internet Systems Consortium
*[ftp://ftp.kochmedia.com/ ftp://ftp.kochmedia.com/] - Koch Media
*[ftp://ftp.kernel.org/ ftp://ftp.kernel.org/] - Linux Kernel Archives
*[ftp://ftp.lyx.org/ ftp://ftp.lyx.org/] - LyX
*[ftp://ftp.microsoft.com/ ftp://ftp.microsoft.com/] - Microsoft (sometimes up, sometimes down)
**[ftp://ftp.research.microsoft.com/ ftp://ftp.research.microsoft.com/] - Microsoft Research
***[ftp://ftp.research.microsoft.com/downloads ftp://ftp.research.microsoft.com/downloads] - hidden directory
*[http://assets.minecraft.net/ http://assets.minecraft.net/] - Minecraft (no longer used)
*[ftp://ftp.mozilla.org/ ftp://ftp.mozilla.org/] - Mozilla
**[http://releases.mozilla.org/pub/mozilla.org/ http://releases.mozilla.org/pub/mozilla.org/]
**[http://download.cdn.mozilla.net/pub/ http://download.cdn.mozilla.net/pub/] - Mozilla (older software)
*[ftp://ftp.ncftp.com/ ftp://ftp.ncftp.com/] - NcFTP
*[ftp://ftp.netscape.com/ ftp://ftp.netscape.com/] - Netscape
*[ftp://ftp.oldskool.org/ ftp://ftp.oldskool.org/] - Oldskool PC Network
*[ftp://ftp.opera.com/pub/ ftp://ftp.opera.com/pub/] - Opera
**[http://get.geo.opera.com/ http://get.geo.opera.com/] - Opera (alt)
*[ftp://pingus.seul.org ftp://pingus.seul.org] - Pingus
*[ftp://ftp.pgpi.com/ ftp://ftp.pgpi.com/] - PGP
*[ftp://ftp.iso.pld-linux.org/ ftp://ftp.iso.pld-linux.org/] - PLD Linux
*[ftp://ftp.povray.org/ ftp://ftp.povray.org/] - POV-Ray
*[ftp://ftp.sangoma.com/ ftp://ftp.sangoma.com/] - Sangoma
*[ftp://ftp.scriptics.com/ ftp://ftp.scriptics.com/] - Scriptics
*[ftp://ftp.slackware.com/ ftp://ftp.slackware.com/] - Slackware Linux
*[http://download.sonymediasoftware.com/ http://download.sonymediasoftware.com/] - Sony Creative Software
*[ftp://ftp.sunet.se/ ftp://ftp.sunet.se/] - Sunet
*[ftp://ftp.suse.com/ ftp://ftp.suse.com/] - SUSE Linux
*[ftp://ftp.ubisoft.com/ ftp://ftp.ubisoft.com/] - Ubisoft
**[ftp://ftp.bluebyte.com/ ftp://ftp.bluebyte.com/] - Ubisoft Blue Byte
*[http://releases.ubuntu.com/ http://releases.ubuntu.com/] - Ubuntu
**[http://cdimage.ubuntu.com/ http://cdimage.ubuntu.com/] - "Unsupported Ubuntu Images"
*[ftp://ftp.snt.utwente.nl/ ftp://ftp.snt.utwente.nl/] - University of Twente
*[ftp://ftp.westwood.com/ ftp://ftp.westwood.com/] - Westwood
*[http://wdl2.winworldpc.com http://wdl2.winworldpc.com] - WinWorld
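The list above mixes FTP and HTTP hosts, and the two usually need different handling when mirroring. A rough sketch of pulling the hosts out of list lines like these and grouping them by scheme follows. The regular expression only assumes the <code>*[url label] - description</code> pattern used on this page, and the names <code>parse_entry</code> and <code>group_by_scheme</code> are hypothetical helpers, not part of any existing tool.

```python
import re
from urllib.parse import urlsplit

# One list line: bullets, "[url optional-label]", then an optional
# " - description". Only the first token inside the brackets is the URL.
ENTRY = re.compile(r"^\*+\s*\[(\S+?)(?:\s+[^\]]*)?\]\s*(?:-\s*(.*))?$")


def parse_entry(line):
    """Return (scheme, host, description) for one list line, or None."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    url = match.group(1)
    description = (match.group(2) or "").strip()
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, description


def group_by_scheme(lines):
    """Map each URL scheme to the hosts that use it, in list order."""
    groups = {}
    for line in lines:
        entry = parse_entry(line)
        if entry is not None:
            scheme, host, _ = entry
            groups.setdefault(scheme, []).append(host)
    return groups


# A few lines in the same shape as the list entries.
sample = [
    "*[ftp://ftp.gnu.org/ ftp://ftp.gnu.org/] - GNU",
    "*[http://ftp.blizzard.com/pub/ http://ftp.blizzard.com/pub/] - Blizzard (only works through HTTP)",
    "*[ftp://ftp.mozilla.org/] - Mozilla",
]

print(group_by_scheme(sample))
```

Lines that do not match the pattern are skipped rather than raising an error, so headings and plain text can be fed in along with the list.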
== blah blah blah ignore ==
=== Items ===
* TODO: Scrape Google
* TODO: Scrape Bing
* TODO: Scrape DuckDuckGo
* TODO: Scrape Twitter
* TODO: Scrape Reddit
* TODO: Scrape links from MediaWiki wikis
* TODO: Scrape the Open Directory Project
* TODO: Scrape the Common Crawl Index
* TODO: Scrape the Wayback Machine
* TODO: Scrape URLTeam dumps
* TODO: Scrape a list of subdomains from DNSdumpster.com (if applicable)
Latest revision as of 06:52, 29 November 2015