Difference between revisions of "User:Start"

From Archiveteam
(35 intermediate revisions by the same user not shown)
Line 1: Line 1:
I like preserving the web.


==Completed Projects==
I also go by Start+Select and Pressstart.


==Current Projects==
==Archives==
* Back up various FTP/HTTP servers
*[https://archive.org/details/foxytunes.com-panicgrab-20130704 FoxyTunes]
** Started:
*[https://archive.org/details/safeway.ca-panicgrab-20140707 safeway.ca]
*** Minecraft Assets Server
*[https://archive.org/details/emulation-zone-archive Emulation Zone]
** Not Started:
*Battle for the Net ([https://archive.org/details/www.battleforthenet.com-panicgrab-20140718 July 18, 2014], [https://archive.org/details/www.battleforthenet.com-panicgrab-20140912 September 12, 2014])
*** Microsoft FTP Server
*[https://archive.org/details/theopeninter.net-panicgrab-20140718 The Open Internet]
*** others will (eventually) be put here
*[https://archive.org/details/startupsfornetneutrality.org-panicgrab-20140718 Startups for Net Neutrality]
* Figure out how to back up data from [[IFTTT]] and [[Codecademy]].
*[https://archive.org/details/net.net-panicgrab-20140718 net.net]
** '''IFTTT'''
*[https://archive.org/details/wwdctimer.com-panicgrab-20140731 WWDC Timer]
*** recipes are stored at <nowiki>http://ifttt.com/recipes/{recipe id}</nowiki>
*[https://archive.org/details/xn--19g.com-panicgrab-20140731 Option V Mac]
*** your personal recipes are stored at <nowiki>https://ifttt.com/myrecipes/personal/{recipe id}</nowiki>
*[https://archive.org/details/chromercise.com-panicgrab-20140731 Chromercise]
*[https://archive.org/details/hiddenfromgoogle.com-panicgrab-20140731 Hidden From Google]
*[https://archive.org/details/orteil.dashnet.org-panicgrab-20140731 orteil.dashnet.org]
*[https://archive.org/details/pingus.seul.org-panicgrab-20140731 Pingus]
*[https://archive.org/details/tux4kids.alioth.debian.org-panicgrab-20140731 Tux4Kids]
*[https://archive.org/details/tuxkart.sourceforge.net-panicgrab-20140731 TuxKart]
*[https://archive.org/details/assets.minecraft.net-panicgrab-20140807 Minecraft Assets Server]
*<nowiki>https://archive.org/details/bmf.*rustedmagick.com-cr-panicgrab-20140808</nowiki> (remove asterisk, spam filter doesn't like this link) - The Original Cutting Room Floor
*[https://archive.org/details/tppx.herokuapp.com-panicgrab-20140808 TPPX logs]
*[https://archive.org/details/nintendo-warcs Misc. Nintendo sites]
*[https://archive.org/details/mojang.com-notch-panicgrab-20140912 mojang.com/notch]
*[https://archive.org/details/legowracers.4t2portfolio.co.uk-panicgrab-20141007 legowracers.4t2portfolio.co.uk]
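
Most of the archive.org links above appear to follow a <code>{site}-panicgrab-{YYYYMMDD}</code> identifier pattern, with slashes in the site path turned into dashes (for example <code>mojang.com/notch</code> becomes <code>mojang.com-notch</code>). The sketch below only reproduces that observed pattern; the function names <code>panicgrab_identifier</code> and <code>details_url</code> are hypothetical helpers, not part of any archive.org tooling.

```python
from datetime import date


def panicgrab_identifier(site: str, grabbed_on: date) -> str:
    """Build '<site>-panicgrab-<YYYYMMDD>' (pattern inferred from the list above)."""
    # Assumption: path slashes become dashes, as in 'mojang.com-notch'.
    slug = site.strip("/").replace("/", "-")
    return f"{slug}-panicgrab-{grabbed_on:%Y%m%d}"


def details_url(identifier: str) -> str:
    """Return the archive.org details page URL for an item identifier."""
    return f"https://archive.org/details/{identifier}"


if __name__ == "__main__":
    ident = panicgrab_identifier("foxytunes.com", date(2013, 7, 4))
    print(details_url(ident))
```

Not every item follows the pattern (<code>emulation-zone-archive</code> and <code>nintendo-warcs</code> do not), so this is a convenience for naming new grabs rather than a way to guess existing ones.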


==Future Projects==
==Website Crawls==
*cache.lego.com
**[http://paste.archivingyoursh.it/fawofacari.avrasm bing crawl]
**[http://paste.archivingyoursh.it/vosoqudavo.avrasm google crawl]
**[http://paste.archivingyoursh.it/dagacapovu.avrasm combined crawl]
 
*[[Easel]]
**[http://paste.archivingyoursh.it/lojasegeke.avrasm bing crawl]
**[http://paste.archivingyoursh.it/warisukoka.avrasm google crawl]
**[http://paste.archivingyoursh.it/xitoxufuki.avrasm combined crawl]
 
==Public HTTP/FTP Server List==
 
Searching <code>intitle:"index of /" inurl:"ftp"</code> on Google gives millions of results.
 
*[ftp://ftp.3drealms.com/ ftp://ftp.3drealms.com/] - 3D Realms
*[ftp://ftp.adobe.com/ ftp://ftp.adobe.com/] - Adobe
*[ftp://ftp.amanda.org/ ftp://ftp.amanda.org/] - Amanda Network Backup
*[http://staticky.com/mirrors/ftp.apple.com/developer/ http://staticky.com/mirrors/ftp.apple.com/developer/] - Apple's former developer FTP (mirror)
*[ftp://ftp.atari.com/ ftp://ftp.atari.com/] - Atari
*[http://ftp.blizzard.com/pub/ http://ftp.blizzard.com/pub/] - Blizzard (only works through HTTP)
*[ftp://ftp.mrunix.net/ ftp://ftp.mrunix.net/] - Borg: The Collective
*[http://media.codeweavers.com/ http://media.codeweavers.com/] - CodeWeavers
*[ftp://ftp.debian.org/ ftp://ftp.debian.org/] - Debian
*[ftp://ftp.eggheads.org/ ftp://ftp.eggheads.org/] - EggDrop
*[ftp://ftp.ea.com/ ftp://ftp.ea.com/] - Electronic Arts
**[http://largedownloads.ea.com http://largedownloads.ea.com] - Electronic Arts (large downloads)
*[ftp://ftp.gnu.org/ ftp://ftp.gnu.org/] - GNU
*[ftp://ftp.gnus.org/ ftp://ftp.gnus.org/] - GNUS
*[ftp://ftp.software.ibm.com/ ftp://ftp.software.ibm.com/] - IBM
*[ftp://ftp.idsoftware.com/ ftp://ftp.idsoftware.com/] - iD Software
*[ftp://ftp.isc.org/ ftp://ftp.isc.org/] - Internet Systems Consortium
*[ftp://ftp.kochmedia.com/ ftp://ftp.kochmedia.com/] - Koch Media
*[ftp://ftp.kernel.org/ ftp://ftp.kernel.org/] - Linux Kernel Archives
*[ftp://ftp.lyx.org/ ftp://ftp.lyx.org/] - LyX
*[ftp://ftp.microsoft.com/ ftp://ftp.microsoft.com/] - Microsoft (sometimes up, sometimes down)
**[ftp://ftp.research.microsoft.com/ ftp://ftp.research.microsoft.com/] - Microsoft Research
***[ftp://ftp.research.microsoft.com/downloads ftp://ftp.research.microsoft.com/downloads] - hidden directory
*[http://assets.minecraft.net/ http://assets.minecraft.net/] - Minecraft (no longer used)
*[ftp://ftp.mozilla.org/ ftp://ftp.mozilla.org/] - Mozilla
**[http://releases.mozilla.org/pub/mozilla.org/ http://releases.mozilla.org/pub/mozilla.org/]
**[http://download.cdn.mozilla.net/pub/ http://download.cdn.mozilla.net/pub/] - Mozilla (older software)
*[ftp://ftp.ncftp.com/ ftp://ftp.ncftp.com/] - NcFTP
*[ftp://ftp.netscape.com/ ftp://ftp.netscape.com/] - Netscape
*[ftp://ftp.oldskool.org/ ftp://ftp.oldskool.org/] - Oldskool PC Network
*[ftp://ftp.opera.com/pub/ ftp://ftp.opera.com/pub/] - Opera
**[http://get.geo.opera.com/ http://get.geo.opera.com/] - Opera (alt)
*[ftp://pingus.seul.org ftp://pingus.seul.org] - Pingus
*[ftp://ftp.pgpi.com/ ftp://ftp.pgpi.com/] - PGP
*[ftp://ftp.iso.pld-linux.org/ ftp://ftp.iso.pld-linux.org/] - PLD Linux
*[ftp://ftp.povray.org/ ftp://ftp.povray.org/] - POV-Ray
*[ftp://ftp.sangoma.com/ ftp://ftp.sangoma.com/] - Sangoma
*[ftp://ftp.scriptics.com/ ftp://ftp.scriptics.com/] - Scriptics
*[ftp://ftp.slackware.com/ ftp://ftp.slackware.com/] - Slackware Linux
*[http://download.sonymediasoftware.com/ http://download.sonymediasoftware.com/] - Sony Creative Software
*[ftp://ftp.sunet.se/ ftp://ftp.sunet.se/] - Sunet
*[ftp://ftp.suse.com/ ftp://ftp.suse.com/] - SUSE Linux
*[ftp://ftp.ubisoft.com/ ftp://ftp.ubisoft.com/] - Ubisoft
**[ftp://ftp.bluebyte.com/ ftp://ftp.bluebyte.com/] - Ubisoft Blue Byte
*[http://releases.ubuntu.com/ http://releases.ubuntu.com/] - Ubuntu
**[http://cdimage.ubuntu.com/ http://cdimage.ubuntu.com/] - "Unsupported Ubuntu Images"
*[ftp://ftp.snt.utwente.nl/ ftp://ftp.snt.utwente.nl/] - University of Twente
*[ftp://ftp.westwood.com/ ftp://ftp.westwood.com/] - Westwood
*[http://wdl2.winworldpc.com http://wdl2.winworldpc.com] - WinWorld
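
To hand the list above to a mirroring tool, it may help to pull the URL, scheme, nesting depth, and note out of each wikitext line. This is a minimal sketch assuming every entry looks like <code>*[url label] - note</code> (label and note optional); <code>parse_server_line</code> and the <code>LINK_LINE</code> pattern are hypothetical names made up for this example.

```python
import re
from urllib.parse import urlparse

# One or more '*' bullets, a bracketed external link, then an optional '- note'.
LINK_LINE = re.compile(r"^(\*+)\s*\[(\S+?)(?:\s+[^\]]*)?\]\s*(?:-\s*(.*))?$")


def parse_server_line(line):
    """Return a dict describing one list entry, or None if the line is not an entry."""
    match = LINK_LINE.match(line.strip())
    if match is None:
        return None
    stars, url, note = match.groups()
    return {
        "depth": len(stars),              # '*' = top level, '**' = sub-entry
        "scheme": urlparse(url).scheme,   # 'ftp' or 'http'
        "url": url,
        "note": (note or "").strip(),
    }


if __name__ == "__main__":
    sample = "*[ftp://ftp.3drealms.com/ ftp://ftp.3drealms.com/] - 3D Realms"
    print(parse_server_line(sample))
```

Grouping the results by <code>scheme</code> would separate the FTP hosts from the HTTP-only ones (such as the Blizzard entry), which typically need different download settings.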
 
== blah blah blah ignore ==
 
=== Items ===
* TODO: Scrape Google
* TODO: Scrape Bing
* TODO: Scrape DuckDuckGo
* TODO: Scrape Twitter
* TODO: Scrape Reddit
* TODO: Scrape links from MediaWiki wikis
* TODO: Scrape the Open Directory Project
* TODO: Scrape the Common Crawl Index
* TODO: Scrape the Wayback Machine
* TODO: Scrape URLTeam dumps
* TODO: Scrape a list of subdomains from DNSdumpster.com (if applicable)

Latest revision as of 06:52, 29 November 2015
