Difference between revisions of "Projects"

From Archiveteam
m (Reverted edits by Megalanya2 (talk) to last revision by Jscott)
 
(32 intermediate revisions by 11 users not shown)
{{Projects status}}
Here's where Archive Teamsters can list the '''projects''' they are currently working on and organize new projects.


= Projects =
This page should contain, or directly link to, almost all ArchiveTeam archiving endeavours, categorized.
Our [[Current Projects]] page.
* '''[[#Current projects|Current projects]]''': currently active, upcoming and recently finished grandiose ArchiveTeam projects.  (Extract of the next two categories.)
:''See also: [[:Category:In progress]].''
* '''[[#Warrior projects|Warrior projects]]''': projects that utilize(d) ArchiveTeam's distributed archiving system.
* '''[[#Manual projects 2|Manual projects]]''' that need(ed) much more effort than just pushing a button.
* '''[[#Small projects|Small projects]]''': small-scale website archiving projects usually done by a single individual.
* '''[[#Early projects|Early projects]]''': the first archiving endeavours from the dawn of ArchiveTeam, in a format nobody apparently dares (or is able) to touch.


== ArchiveTeam Warrior ==
(The box at the top counts only projects that have dedicated wiki pages; these numbers are incomplete and far from covering all the projects mentioned in the sections below.)
The [[ArchiveTeam Warrior]] is a virtual machine that will allow you to lend a hand on large archiving projects whenever they come up.


== ArchiveBot ==
If you know of a website in danger, let us know on [[IRC]]. If it's a larger site, please also mention it on the '''[[Deathwatch]]''' page. Once a decision has been made on IRC (or if none is needed), please help keep things documented and up to date by adding projects or updating their status:
* in the appropriate section(s),
* on the project's dedicated wiki page (if any),
* on [[Deathwatch]] and/or on [[Alive... OR ARE THEY]].


[[ArchiveBot]] is an IRC bot that automates archiving for smaller sites.
The box at the top is generated automatically from the projects' dedicated wiki pages, so it shouldn't be touched.


== Websites at risk ==
'''Important:''' The contents of the sections below are '''embedded''' from other pages, so don't edit the section or this page; use the "'''Edit this list'''" link instead! (It opens the corresponding page for editing; after saving, you'll be forwarded to the page containing only that list. Don't worry, you didn't delete the others.)
=== High Risk ===
{| class="wikitable"
! Website
! Closing date
! Project status
! User
! Archiving Status
! Details
! Archives
! Archive Date
! Archive Format
|-
| rowspan="3" | [[WinAmp]] [http://www.winamp.com/] [http://dev.winamp.com/] [http://forums.winamp.com/] [http://blog.winamp.com/] || rowspan="3" | 2013-12-20 || rowspan="3" | Closing || [[User:Arkiver]] || <span style="color:orange">In progress...</span> || Download full website and other domains ||  ||  || .warc.gz
|-
| [[Archivebot]] || <span style="color:green">Saved</span> || Downloaded website, dev and blog subdomains ||  ||  || .warc.gz
|-
| Various || <span style="color:green">Saved</span> || Downloaded website, forums, skins/plugins || [https://archive.org/search.php?query=winamp+warc] || 2013-11 || .warc.gz
|-
| [[jajah]] [http://jajah.com/] || 2014-01-31 || Closing || [[User:Arkiver]] || <span style="color:green">Saved</span> || Download full website || COMING || 2013-12-15 - 2013-12-16 || .warc.gz
|-
| [[widgetbox]] [http://www.widgetbox.com/] [http://support.widgetbox.com/] [http://blog.widgetbox.com/] [http://cdn.widgetbox.com/] [http://help.widgetbox.com/] [http://pub.widgetbox.com/] [http://files.widgetbox.com/] || 2014-03-28 || Closing || [[User:Arkiver]] || <span style="color:orange">In progress...</span> || Downloading all the websites ||  || 2013-12-19 - present || .warc.gz
|-
| rowspan="2" | [[ptch]] [http://ptch.com/] || rowspan="2" | 2014-01-02 || rowspan="2" | Closing || [[Archiveteam]] || <span style="color:orange">In progress...</span> || Downloading only the accounts and their contents ||  ||  || .warc.gz
|-
| [[User:Arkiver]] || <span style="color:green">Saved</span> || Downloading only main website, blog and help website || COMING || 2013-12-14 || .warc.gz
|-
| [[Quick.io]] [http://www.quik.io/] || 2013-12-31 || Closing || [[User:Arkiver]] || <span style="color:green">Saved</span> || Downloaded the main website and its subdomains || COMING || 2013-12-13 || .warc.gz
|-
| [[AOL Music]] [http://music.aol.com/] || 2013-??-?? || Closing || ? || <span style="color:orange">In progress...</span> ||  ||  ||  ||
|-
| rowspan="2" | [[Fileplanet]] [http://www.fileplanet.com/] || rowspan="2" | 2013-??-?? || rowspan="2" | Closing || [[Archiveteam]] || <span style="color:green">Saved</span> || Saved only files 00000-229999 || [[Fileplanet#Status]]  || 2012-05-08 - 2012-07-06 || .tar
|-
| [[User:Arkiver]] || <span style="color:orange">In progress...</span> || Downloading full website ||  ||  || .warc.gz
|-
| [[Google Video (Archive)]] [http://video.google.com/] || 2011-04-29 || Closing || ? || <span style="color:orange">In progress...</span> ||  ||  ||  ||
|-
| [[1UP.com]] [http://www.1up.com/] || 201?-??-?? || Closing || ? || <span style="color:orange">In progress...</span> ||  ||  ||  ||
|-
| [[UGO]] [http://www.ugo.com/] || 201?-??-?? || Closing || ? || <span style="color:orange">In progress...</span> ||  ||  ||  ||
|-
| [[GameSpy]] [http://www.gamespy.com/] || 201?-??-?? || Closing || ? || <span style="color:orange">In progress...</span> ||  ||  ||  ||
|-
| [[My Opera]] [http://my.opera.com/] || 2014-03-01 || Closing || [[User:Mithrandir]](?) || <span style="color:orange">In progress...</span> || Initial grab of files (6.2 GB) || [https://archive.org/details/files.myopera.com-initialgrab]  ||  || .warc.gz
|-
| [[TechNet]] [http://technet.microsoft.com/] || 201?-??-?? || Closing || [[User:Arkiver]] || <span style="color:orange">In progress...</span> || Downloading full website ||  ||  || .warc.gz
|-
| [[Warhammer Online: Age of Reckoning]] [http://www.warhammeronline.com/] || 2013-12-18 || Closing || [[User:Arkiver]] || <span style="color:green">Saved</span> || Downloading the full main website || COMING || 2013-12-04 - 2013-12-14 || .warc.gz
|-
| [[Wretch]] [http://www.wretch.cc/] || 2013-12-26 || Closing || [[Archiveteam]] || <span style="color:orange">In progress...</span> ||  ||  ||  ||
|}


=== Average Risk ===
= Current projects =
<div class="mw-collapsible" style="width:100%; background-color: #CCFFFF; border: 1px solid; padding: 5px">
Currently active team projects you can get involved in.
<!-- TO EDIT THE LIST, GO BACK AND CLICK "Edit this list". -->
<div class="mw-collapsible-content" style="width:100%">
'''<span class="plainlinks">[http://archiveteam.org/index.php?title=Current_Projects&action=edit Edit this list]</span>'''
{{:Current Projects}}
</div>
</div>


=== Low Risk ===
= Warrior projects =
<div class="mw-collapsible mw-collapsed" style="width:100%; background-color: #99FF99; border: 1px solid; padding: 5px">
ArchiveTeam's past, current and future Warrior projects with details, in a table form.
<!-- TO EDIT THE LIST, GO BACK AND CLICK "Edit this list". -->
<div class="mw-collapsible-content" style="width:100%">
'''<span class="plainlinks">[http://archiveteam.org/index.php?title=Warrior_projects&action=edit Edit this list]</span>'''
{{:Warrior projects}}
</div>
</div>


=== Important Websites ===
= Manual projects =
{| class="wikitable"
<div class="mw-collapsible mw-collapsed" style="width:100%; background-color: #CCFF99; border: 1px solid; padding: 5px">
! Website
Difficult, discussion-intensive, human-resource-intensive and audit projects.
! User
<!-- TO EDIT THE LIST, GO BACK AND CLICK "Edit this list". -->
! Archiving Status
<div class="mw-collapsible-content" style="width:100%">
! Details
'''<span class="plainlinks">[http://archiveteam.org/index.php?title=Manual_projects&action=edit Edit this list]</span>'''
! Archives
{{:Manual projects}}
! Archive Date
</div>
! Archive Format
</div>
|-
| [[Academic Earth]] [http://academicearth.org/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[Codecademy]] [http://www.codecademy.com/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[Delicious]] [https://delicious.com/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[Facebook]] [https://www.facebook.com/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[FanFiction]] [https://www.fanfiction.net/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[Google]] [https://www.google.nl/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[IFTTT]] [https://ifttt.com/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[infoAnarchy]] [http://www.infoanarchy.org/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[Internet Archive]] [https://archive.org/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[last.fm]] [http://www.last.fm/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[LiveJournal]] [http://www.livejournal.com/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| rowspan="2" | [[pastebin]] [http://pastebin.com/] || [[User:Arkiver]] || Aborted || Archiving capacity is better spent on other websites. || COMING || 2013-12-14 - 2013-12-17 || .warc.gz
|-
| [[User:joepie91]] || <span style="color:orange">In progress...</span> || Downloading newest pastes ||  ||  || .warc.gz
|-
| [[reddit]] [http://www.reddit.com/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[sourceforge]] [http://sourceforge.net/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[Twitter]] [https://twitter.com/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[WebCite]] [http://www.webcitation.org/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[the White House]] [http://www.whitehouse.gov/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[wikia]] [http://www.wikia.com/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[WikiLeaks]] [http://wikileaks.org/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|-
| [[WikipediA]] [http://www.wikipedia.org/] ||  || <span style="color:red">Not saved yet</span> ||  ||  ||  ||
|}


== Ideas for Projects ==
= Small projects =
:''See also [[Deathwatch]] and [[Alive... OR ARE THEY]].''
<div class="mw-collapsible mw-collapsed" style="width:100%; background-color: #FFCCFF; border: 1px solid; padding: 5px">
* Various image boards: not the short-lived 4chan clones, but the more permanent ones like www.zerochan.net (as of today it has over 1.6 million images, all easily accessible like this: www.zerochan.net/1627488), Pixiv.net and minitokyo.net
List of smaller website rescuing projects, usually done by single individuals.
* JoshW's video game music archive (links on http://hcs64.com/mboard/forum.php?showthread=26929). Not a "large" site, but many, many gigabytes of 7-Zipped WAVs.
<!-- TO EDIT THE LIST, GO BACK AND CLICK "Edit this list". -->
* Suggestion: An archive of .gif and .swf preloaders? [[User:Kuro|Kuro]] 19:49, 29 December 2009 (UTC)
<div class="mw-collapsible-content" style="width:100%">
**We can extract all the .gif files from the GeoCities archive and compare them using md5sum to discard dupes. [[User:Emijrp|Emijrp]] 19:58, 21 December 2010 (UTC)
'''<span class="plainlinks">[http://archiveteam.org/index.php?title=Small_projects&action=edit Edit this list]</span>'''
* '''Set up''' an FTP hub that AT members can access to upload and download finished projects.
{{:Small projects}}
** Internet Archive? jason created a section for Archive Team http://www.archive.org/details/archiveteam [[User:Emijrp|Emijrp]] 19:34, 4 June 2011 (UTC)
</div>
</div>


* Track the 100+ top [[twitter]] feeds, as designated by one of these idiot Twitter grading sites, and back up on a regular basis the top twitter people, for posterity.
= Early projects =
* '''[http://www.groklaw.net/ Groklaw]''' has a [http://www.groklaw.net/article.php?story=20090105033126835 project proposal] that we could help with. - [[User:Jscott|Jason]]
<div class="mw-collapsible mw-collapsed" style="width:100%; background-color: lightgray; border: 1px solid; padding: 5px">
** Now that Groklaw is dead, a mirror ought to be made soon. (Especially because their [http://groklaw.net/robots.txt robots.txt] blocks the Wayback Machine.) --[[User:Mithrandir|Mithrandir]] 20:28, 21 August 2013 (EDT)
List of ArchiveTeam's early endeavours, kept for historical interest and not edited.
* '''Archive''' the shutdown announcement pages on dead sites.
<!-- TO EDIT THE LIST, GO BACK AND CLICK "Edit this list". -->
** This is being done on every wiki page: the announcement is pasted in and, when possible, archived at WebCite. [[User:Emijrp|Emijrp]] 19:33, 4 June 2011 (UTC)
<div class="mw-collapsible-content" style="width:100%">
'''<span class="plainlinks">[http://archiveteam.org/index.php?title=Early_projects&action=edit Edit this list]</span>'''
{{:Early projects}}
</div>
</div>


* '''RSS Feed''' with death notices. - [[User:Jscott|Jason]]
** I'm taking a shot at this with [http://www.deaddyingdamned.com The Dead, the Dying & the Damned]. --[[User:Auguste|Auguste]] 14:34, 4 March 2011 (UTC)
* '''Twitter profile''' might be a good way to broadcast new site obituaries. - psicom
* '''[[TinyURL]]''' and similar services, scraping/backup - [[User:scumola|Steve]]
** Highlight services that at least allow exporting data ([[Diigo]] is the one I know of). Next "best": services that require registration and let you view your URLs or save them, e.g. as HTML ([[tr.im]]). Etc. --[[User:Jaakkoh|Jaakkoh]] 05:39, 4 April 2009 (UTC)
** see [[urlteam]]. [[User:Emijrp|Emijrp]] 19:33, 4 June 2011 (UTC)
* '''[http://symphony21.com/ Symphony]''' could [http://nick-dunn.co.uk/article/symphony-as-a-data-preservation-utility/ potentially be used] for archiving structured XML/RSS feeds to a relational database - [[User:nickdunn|Nick]]
* '''A Firefox plugin''' for redirecting users to our archive when they request a site that's been rescued. - ???
**Good idea; the problem is that the archives are not hosted like the originals, but packed. [[User:Emijrp|Emijrp]] 19:32, 4 June 2011 (UTC)
**Something like what you propose already exists: the [[wikipedia:MafiaaFire Redirector|MAFIAAFire Redirector]] (though it only redirects links from domains seized by governments to backup sites), so anyone who wants to take on this project could start by reviewing how that extension works. As for the files and pages not being hosted on a server like the original but packed: I read that [[wikipedia:Heritrix|Heritrix]] (the Internet Archive's web crawler) by default stores the web resources it crawls in an [[wikipedia:.arc|ARC]] archive. Perhaps something similar could be done, but using bzip2, 7z or RAR archives (or a combination of these) to manage a site's resources. --[[User:Swicher|Swicher]] 07:23, 27 July 2011 (UTC)
* Archives of MUD, MUSH, MOO game sites and related information.  They won't all be around forever. --[[User:Auguste|Auguste]] 13:59, 24 February 2011 (UTC)
** I'm keeping an eye out for, and archiving sites like [http://www.lambdamoo.info LambdaMOO.info], which are either closing down or may be at risk. --[[User:Auguste|Auguste]] 13:59, 24 February 2011 (UTC)
* [http://ytmnd.com YTMND] [[User:Zachera|Zachera]] 20:06, 25 March 2011 (UTC)
* [http://c2.com/cgi/wiki?WikiWikiWeb WikiWikiWeb] - The first wiki is still a valuable source of information on programming patterns and related topics. It's still active, though I'm not sure how active. It's been going since 1995, so it's got real historical value. Plus it's all text and wouldn't take much space. The owner, Ward Cunningham, might be amenable to providing a copy, so I'd suggest contacting him first.
** I've done this and linked the dump from [[WikiTeam]]. -- [[User:Ca7|Ca7]]
* Electronics datasheets: [http://alldatasheet.com this], [http://datasheetarchive.com this], [http://www.datasheetcatalog.com this] [http://www.htmldatasheet.com and this] for example. Many of these datasheets are already very hard to find (esp. for older and rarer parts, e.g. those required to emulate old computer systems) and the sites are often in China, Russia or other countries that might give problems in the future. Lots of data to grab, and many of these sites only have very slow bandwidth, so it might be good to start archiving them early. --[[User:Darkstar|Darkstar]] 23:47, 9 April 2011 (UTC)
* '''ElfQuest Comics'''. They've recently all been scanned (6500 pages+) and are available [http://www.elfquest.com/gallery/OnlineComics3.html here]. They're hidden behind a Flash-based viewer though so someone would first have to decompile that to get to the links. --[[User:Darkstar|Darkstar]] 20:55, 18 May 2011 (UTC)
**Working on finishing this up: all the images are downloaded, they just need to be packaged. [[User:Underscor|Underscor]] 22:35, 4 June 2011 (UTC)
* '''TechNet Archive''': [http://www.microsoft.com/technet/archive/default.mspx?mfr=true here] "Technical information about older versions of Microsoft products and technologies. This information is scheduled to be removed soon." --[[User:Marceloantonio1|Marceloantonio1]] 08:24, 9 June 2011 (UTC -3)
**TechNet, and its big cousin, MSDN, are already being archived by other sites. For example, {{url|1=http://betaarchive.com}} has archived a huge pile of them, including older ones from the late '90s.
* '''Usenet''': is it archived anywhere other than on Google's servers? How complex would it be to download the whole tree and put it somewhere as an archive? [[User:Nemo bis|Nemo bis]] 21:56, 6 July 2011 (UTC)
* '''[[Jux]]''' was going to get jammed on August 31, 2013, but not anymore. Still might be a good idea to keep them on the radar.
* Archive as many file servers (FTP and HTTP) as possible.
* '''[[Google Answers]]''' has not accepted new questions for a while, and it is debatable how long it will remain online.
* '''Newgrounds''' is one of the largest collections of Flash games and movies on the Internet. It would be a shame if it all disappeared.
* [[Yahoo!]] has decided to shut down more services, including [[Yahoo! Stars India]], [[Yahoo! Neighbors]], etc. These should be archived before they shut down. Also, yodel.yahoo.com seems to have been replaced by yahoo.tumblr.com, and should be archived too.
* Archive every [http://www.google.com/doodles/ Google Doodle].
* http://atheistpictures.com/
* Not sure if this belongs here, but I have an idea for developing a program that makes it easier to find links belonging to particular sites. What do I mean by this? In my experience with the [[Windows Live Spaces]] archiving (and other projects I've only looked at), a problem that seems to come up often is finding the links to the sites whose content is to be archived. For example, a Windows Live Space lived at whatever.spaces.live.com, and a video on Google Video at video.google.com/videoplay?docid=-[video ID number], so the question is: where do I find the links to the pages, videos, articles or anything else of site X, so that its contents can be archived afterwards? Perhaps the most obvious answer is to use the API of one or more search engines, but the [http://code.google.com/apis/ajaxsearch/documentation/reference.html Google Web Search API] is currently deprecated (besides being very limited), Yahoo's [http://developer.yahoo.com/search/siteexplorer/ Site Explorer API] apparently stops working on Sept. 15, and [http://msdn.microsoft.com/en-us/library/dd251020.aspx Bing's API] requires a registered AppId (I haven't checked other search engines; I mention these because they are the most used). Since the search engine APIs all come with problems for this project, I think a good solution would be [http://www.google.com/search?q=%28automating|automatic|automation|automatization%29+web+%28browsing|browser%29 web browser automation]: run the required searches on (almost) all search engines, go through all the results found, and store the corresponding links somewhere. Some may now be wondering: why use browser automation when the same thing can be done programmatically, by sending [[wikipedia:Hypertext Transfer Protocol#Request message|HTTP requests]] to the server and parsing the HTML of the results?
Answer: That's true, it can also be done, but there is a "small" problem: search engines like Google and Bing serve dynamic HTML, and the source code of their results pages looks like a mishmash of HTML and JavaScript that is hard to parse. Browser automation solves this, because the browser interprets the code received from the server and turns it into ordinary HTML in memory, which is then ready to be parsed. To illustrate this better, here is an example:
:[[File:Behavior of a dynamic page.PNG|thumb|left|Clicking on the picture can read a very detailed description of the four screenshots that compose this (besides being able to observe the image to full resolution)]]
:This approach also helps with maintainability and adaptability: with browser automation, all you have to specify is the search engine's results page, the search term (something like site:whatever.com, inurl:.whatever.com/ and so on), the tag that contains the result links, and which button is "Next". This cuts development and deployment time for each particular search engine, without writing much code. If anyone is still interested after that long explanation: of the browser automation tools I have read about, two caught my attention: [http://watir.com/ Watir] (written in Ruby, but cross-platform and multi-browser) and [http://seleniumhq.org/projects/remote-control/ Selenium Remote Control] (also cross-platform and multi-browser, but unlike Watir its API supports C#, Java, Perl, PHP, Python and Ruby). So if anyone wants to take on this project, either of these (or something similar) would be a good place to start. --[[User:Swicher|Swicher]] 09:41, 1 August 2011 (UTC)
* [http://www.harmonycentral.com Harmony Central] user-submitted reviews were around for over a decade and covered just about every musical instrument and related accessory sold commercially. Site updates have taken these offline, though the admins say the data still exists. As far as can be determined, Archive.org has few, if any, of these reviews. [http://www.harmonycentral.com/t5/Feedback/User-reviews/td-p/34660122 This thread] has the whole story. --[[User:Benbradley|Benbradley]] 20:41, 13 July 2013 (EDT)
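Emijrp's suggestion above for the GeoCities .gif files (compare them by MD5 and discard duplicates) can be sketched in Python with the standard `hashlib` module, which produces the same digests as `md5sum`. This is a minimal sketch, assuming the archive has already been unpacked into a local directory; the function names `md5_of_file` and `dedupe_gifs` are hypothetical, and keeping the first file in sorted path order as the canonical copy is an arbitrary choice.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file (the value md5sum prints), read in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dedupe_gifs(root: Path) -> dict[str, Path]:
    """Map each distinct MD5 digest to the first .gif (any letter case) with that content.

    Files whose digest has already been seen are treated as duplicates and skipped.
    """
    unique: dict[str, Path] = {}
    # sorted() makes the choice of the kept copy deterministic between runs
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() == ".gif":
            unique.setdefault(md5_of_file(path), path)
    return unique
```

For example, `dedupe_gifs(Path("geocities"))` would return one path per distinct image. Identical MD5 digests are treated as identical files, which is fine for deduplication in practice but is not a cryptographic guarantee.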
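Whether a site's robots.txt shuts out the Wayback Machine, as Mithrandir notes for Groklaw above, can be checked with Python's standard `urllib.robotparser`. This is a sketch under stated assumptions: the robots.txt text below is a made-up stand-in, not Groklaw's actual file, and it assumes the Wayback Machine obeys rules addressed to the `ia_archiver` user agent, as it historically did. `blocks_agent` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration; NOT Groklaw's real file.
SAMPLE_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /admin/
"""


def blocks_agent(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if the given robots.txt text forbids `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return not parser.can_fetch(agent, url)
```

With this sample, `blocks_agent(SAMPLE_ROBOTS_TXT, "ia_archiver", "http://groklaw.net/article.php")` is true, while other crawlers are only kept out of `/admin/`. To check a live site instead, `RobotFileParser` can fetch the file itself via `set_url()` and `read()`.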
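Swicher's link-discovery proposal above boils down to two steps: harvest result links from search engines (via browser automation such as Watir or Selenium), then keep only the links that belong to the site being archived. The second step needs no browser and can be sketched with Python's standard `urllib.parse`. The patterns below (any host under `spaces.live.com`, and `video.google.com/videoplay?docid=...`) come from the examples in the proposal; the function names `belongs_to_target` and `filter_links` are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def belongs_to_target(url: str) -> bool:
    """Return True if url points at content on one of the example target sites."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Windows Live Spaces: each space had its own subdomain, e.g. whatever.spaces.live.com
    if host.endswith(".spaces.live.com"):
        return True
    # Google Video: individual videos were addressed as /videoplay?docid=<ID>
    if host == "video.google.com" and parts.path == "/videoplay":
        return "docid" in parse_qs(parts.query)
    return False


def filter_links(urls):
    """Keep matching links, dropping duplicates while preserving their original order."""
    seen = set()
    kept = []
    for url in urls:
        if belongs_to_target(url) and url not in seen:
            seen.add(url)
            kept.append(url)
    return kept
```

In the proposed tool, the input list would come from the browser-automation step walking the search results; keeping the filtering separate means only the scraping part has to be adapted for each search engine.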
== Finished Projects ==
:''See also: [[:Category:Rescued Sites]].''
* [[User:Jscott|Jason]] founded the Archive Team ([http://archiveteam.org/index.php?title=Main_Page&diff=prev&oldid=3 see]).
* [[User:Bbot|bbot]] made [http://thepiratebay.org/user/archiveteam/ an archiveteam TPB user]. Get the password from him or Jason. (Not really a ''project'', per se.)
* '''[[User:Bbot|bbot]]''' has archived [[everything2]], and will continue to make further archives as more content is added.
* [[starwars.yahoo.com]] was successfully archived before it shut down in December 2009.
* '''[[User:Sdboyd|Scott]]''' has archived the [http://www.infoanarchy.org Infoanarchy wiki] site. -- The archive is complete and is at: [http://mirrors.sdboyd56.com/infoanarchy/ Infoanarchy wiki '''archive''']. A [http://sdboyd56.com/archives/infoanarchy_archive-201011.tar.gz 5.1 MB gzipped archive] of the wiki is also available. (The Infoanarchy wiki site was down for several months in the first part of 2011, but is back up as of May 2011. There is now very little content updating on the site.)
* '''[[User:Sdboyd|Scott]]''' has archived/mirrored The Cyberpunk Project. (You'll have to Google it - this wiki won't let me edit a page that includes the Russian TLD.) This Russian-based Website is inactive, and hasn't been updated or changed since April 2010. Most pages haven't been changed since 2007. How long will it stay online? Your guess is as good as mine... The mirror is available at: [http://mirrors.sdboyd56.com/cyberpunk_project/ The Cyberpunk Project Mirror].
* As reported on [http://www.boingboing.net/2010/04/29/all-of-gopherspace-a.html boingboing] by Cory Doctorow, all of [[Gopher]]space - scraped in 2007 - needs an archive home. Anybody have 15GB of spare hosted-server space for this project?
::I do, please contact me at admin@emuwiki.com to tell me what to do. [[User:EmuWikiAdmin|EmuWikiAdmin]] 15:17, 2 May 2010 (UTC)
::They are added to iBiblio http://torrent.ibiblio.org/search.php?query=gopher&submit=search [[User:Emijrp|Emijrp]] 11:34, 2 November 2010 (UTC)
::It was added to Internet Archive by Jason too http://www.archive.org/details/2007-gopher-mirror [[User:Emijrp|Emijrp]] 19:23, 4 June 2011 (UTC)
* The data being hosted in Kasabi was retrieved and uploaded to [http://www.archive.org/details/kasabi Internet Archive]. [[User:Edsu|Edsu]] 13:03, 19 July 2012 (EDT)
* '''[[Splinder]]''' is being copied before it shuts down in early 2012.
* '''[[MobileMe]]''' - me.com.  Closed June 30th, 2012.
* [[User:Start|Start]] grabbed [https://www.dropbox.com/s/iok7mgvyxm3rvfj/FoxyTunes.zip FoxyTunes] (it's less than 1MB!) right before it shut down.
== Other Projects ==
* '''[[FanFiction.Net]]''' is being pre-emptively archived.
* '''[[User:ip2k|seanp2k]]''' is running [http://somaseek.com somaseek.com] and tracking all the song history for all of the internet radio stations on [http://somafm.com somafm.com] since March 2010.
* '''[[User:Ross|Ross]]''' is interviewing the sites of 2008.
* '''[[User:LesOrchard|l.m.orchard]]''' is starting work on some self-hosted web apps that will migrate and archive data from other sites (e.g. [http://github.com/lmorchard/friendfeedarchiver FriendFeed], [http://github.com/lmorchard/memex/ Delicious]).
* '''[[User:Sungo|sungo]]''' is archiving etherpad.
* '''[[User:Tsp|Tsp]]''' is attempting to archive the stories from fanfiction.net and fictionpress.
* '''[[User:Emijrp|emijrp]]''' is a member of [[WikiTeam]] and is also downloading albums from [[Jamendo]]. You can find out more about his projects on his user page.
* '''[[User:jcbradley|Jean-Claude Bradley]]''' and '''[[User:romney|Andrew Lang]]''' are archiving the [http://onsbooks.wikispaces.com/ Open Notebook Science projects Reaction Attempts and the ONS Solubility Challenge].  This includes the lab notebooks and all associated raw data files.
* '''[[User:Hydriz|Hydriz]]''' is currently archiving all [http://dumps.wikimedia.org available dumps and downloads] generated by Wikimedia and uploading them to the Internet Archive (see [http://www.archive.org/details/wikimediadownloads collection]).
* '''[[User:Start|Start]]''' is archiving Emulation Zone.
== Dead Projects ==
* [[User:EmuWikiAdmin|EmuWikiAdmin]] created [http://www.emuwiki.com EmuWiki], a collection of all emulators, emulator documents, and hardware information that exists, regrouped in a referenced database.  Unfortunately, it [http://gbatemp.net/t230096-emuwiki-com-closes-down shut down] in May 2010 due to copyright issues.  A 20GB torrent of the site is apparently floating around somewhere.
== Tools ==
* [[Software]]
* [[httrack options]]
== See also ==
* [[Archives]]


{{Navigation pager

Latest revision as of 16:38, 17 January 2017

Projects status
Online (331) · Special cases (51) · Endangered (70) · Closing (16) · Offline (424)
Rescued Sites (498) · Self-Saved (17) · Partially Rescued Sites (213) · In Progress (43) · Upcoming (11) · Not Saved Yet (409) · On hiatus (12) · Lost Sites (91)
Unknown Status (65)

This page should contain, or directly link to, almost all ArchiveTeam archiving endeavours, categorized.

  • Current projects: currently active, upcoming and recently finished grandiose ArchiveTeam projects. (Extract of the next two categories.)
  • Warrior projects: projects that utilize(d) ArchiveTeam's distributed archiving system.
  • Manual projects that need(ed) much more effort than just pushing a button.
  • Small projects: small-scale website archiving projects usually done by a single individual.
  • Early projects: the first archiving endeavours from the dawn of ArchiveTeam, in a format nobody apparently dares (or is able) to touch.

(The box at the top counts only projects that have dedicated wiki pages; these numbers are incomplete and far from covering all the projects mentioned in the sections below.)

If you know of a website in danger, let us know on IRC. If it's a larger site, please also mention it on the Deathwatch page. Once a decision has been made on IRC (or if none is needed), please help keep things documented and up to date by adding projects or updating their status:

  • in the appropriate section(s),
  • on the project's dedicated wiki page (if any),
  • on Deathwatch and/or on Alive... OR ARE THEY.

The box at the top is generated automatically from the projects' dedicated wiki pages, so it shouldn't be touched.

Important: The contents of the sections below are embedded from other pages, so don't edit the section or this page; use the "Edit this list" link instead! (It opens the corresponding page for editing; after saving, you'll be forwarded to the page containing only that list. Don't worry, you didn't delete the others.)

Current projects

Currently active team projects you can get involved in.

Edit this list

Archive Team recruiting

Warrior-based projects

ArchiveTeam's Choice: DeviantArt

Short-term, urgent projects

  • DeviantArt: Archiving custom widgets, favorites, group affiliations, countdown timers, admin forums, and admin announcements. IRC Channel #devianttart (on hackint)

Medium-term projects

(none currently)

Long-term projects

An updated Warrior virtual appliance (v3.2) is now available with better support for newer projects that utilize wget-at.

Manual projects

  • ArchiveBot: For those with lots of disk space, bandwidth and long-term commitment. IRC Channel #archivebot (on hackint).
  • Codearchiver: Dumping and archival of source code repositories and associated version control systems. IRC Channel #codearchiver (on hackint).
  • Dead people: When people die, their webpages and/or social media might go "Poof!" due to unpaid fees and the like. IRC Channel #archiveteam (on hackint)
  • WikiTeam: Saving wiki dumps (XML) and their external links for the Wayback Machine (WARC), as well as exporting MediaWiki databases. Permanent effort; everyone can help (you choose the size of your downloads). IRC Channel #wikiteam (on hackint).

Upcoming & proposed projects

Recently finished projects

  • Taringa!: Shut down on 2024-03-24 with barely two weeks' lead time. IRC Channel #mataringa (on hackint).


On Hiatus

ArchiveTeam uses the hackint IRC network – ircs://irc.hackint.org:6697 (TLS required) – webchat: https://webirc.hackint.org/
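Connecting to hackint as described above means speaking IRC over TLS on port 6697. Most people will simply use an IRC client or the webchat, but a minimal Python sketch using only the standard library looks like this. The nickname is hypothetical, the helper names are illustrative, and the registration commands (`NICK`, `USER`, `JOIN`, `PONG`) are the generic IRC client messages from RFC 2812, not anything hackint-specific.

```python
import socket
import ssl
from urllib.parse import urlsplit

DEFAULT_TLS_PORT = 6697  # conventional IRC-over-TLS port, as in the ircs:// URL above


def parse_ircs_url(url: str) -> tuple[str, int]:
    """Split an ircs:// URL into (host, port); ircs means the connection must use TLS."""
    parts = urlsplit(url)
    if parts.scheme != "ircs" or not parts.hostname:
        raise ValueError(f"expected an ircs://host[:port] URL, got {url!r}")
    return parts.hostname, parts.port or DEFAULT_TLS_PORT


def registration_lines(nick: str) -> list[bytes]:
    """CRLF-terminated NICK/USER messages a client sends to register (RFC 2812)."""
    return [
        f"NICK {nick}\r\n".encode(),
        f"USER {nick} 0 * :{nick}\r\n".encode(),
    ]


def connect(url: str, nick: str, channel: str) -> None:
    """Connect over TLS, register, join `channel` after the 001 welcome, answer PINGs."""
    host, port = parse_ircs_url(url)
    context = ssl.create_default_context()  # verifies the server's certificate
    with socket.create_connection((host, port)) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            for message in registration_lines(nick):
                tls.sendall(message)
            buffer = b""
            while True:
                data = tls.recv(4096)
                if not data:  # server closed the connection
                    break
                buffer += data
                # IRC messages end with CRLF; keep any partial line for the next read
                while b"\r\n" in buffer:
                    raw_line, buffer = buffer.split(b"\r\n", 1)
                    line = raw_line.decode("utf-8", errors="replace")
                    print(line)
                    words = line.split(" ")
                    if words[0] == "PING":
                        tls.sendall(f"PONG {line[5:]}\r\n".encode())
                    elif len(words) > 1 and words[1] == "001":
                        tls.sendall(f"JOIN {channel}\r\n".encode())
```

Calling `connect("ircs://irc.hackint.org:6697", "hypothetical_nick", "#archiveteam")` would print server messages until the connection closes. The JOIN is deferred until the server's 001 welcome reply because servers generally reject channel joins before registration completes.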

Warrior projects

ArchiveTeam's past, current and future Warrior projects with details, in a table form.

Edit this list

Project IRC channel Status Began Finished Result Archive Location
Fotoalbum (script-only) #lookatthisfotograph (on hackint) Active
Google Sites (script-only) #nearlylostmygoogles (on hackint) Active
Github (script-only) #gitgud (on hackint) Active
Bitbucket (Mercurial repositories) #kickthebucket (on hackint) In Development
Reddit #shreddit (on hackint) In Development
Pastebin #pastalavista (on hackint) Active May 30, 2020
Google+ #googleminus (on EFnet) (abandoned) Downloads Finished March 5, 2019 April 2, 2019 Qualified Success archive
Flickr #flickrfckr (on hackint) Active January 9, 2019 archive
Tumblr #tumbledown (on hackint) Archive Posted December 8, 2018 December 17, 2018 Qualified Success archive
NUjij Archive Posted August 25, 2016 Success archive
Yahoo! Answers #noanswers (on hackint) Archive Posted August 21, 2016 archive
Orkut #throatkut (on EFnet) (abandoned) Archive Posted August 6, 2016 archive
Portalgraphics.net Archive Posted July 23, 2016 July 27, 2016 Success archive
DNS History #greatlookup (on EFnet) (abandoned) Aborted July 4, 2016 August 22, 2016 Failure
THOMAS Archive Posted July 3, 2016 July 5, 2016 Qualified Success archive
Coursera #cursera (on EFnet) (abandoned) Archive Posted June 26, 2016 June 30, 2016 Success archive
Olympe Downloads Finished June 5, 2016 June 6, 2016 Qualified Success
ZippCast Archive Posted June 3, 2016 June 10, 2016 Qualified Success archive
Arto Archive Posted May 8, 2016 June 29, 2016 Success archive
Bayimg Archive Posted April 28, 2016 archive
PDF 2016 #pdflush (on EFnet) (abandoned) Active April 8, 2016 archive
Virgin Media #virginsacrifice (on EFnet) (abandoned) Downloads Finished March 30, 2016 April 28, 2016 Qualified Success
LiveJournal #recordedjournal (on EFnet) (abandoned) Active March 12, 2016
GameTrailers #unhitchedtrailer (on EFnet) (abandoned) Archive Posted February 9, 2016 February 18, 2016 Qualified Success archive
Fotolog.com #fotologout (on EFnet) (abandoned) Active February 8, 2016 archive
Friends Reunited #friendsununited (on EFnet) (abandoned) Archive Posted February 5, 2016 February 26, 2016 Qualified Success archive
myVIP (script-only) #byevip (on EFnet) (abandoned) Archive Posted January 24, 2016 August 30, 2016 Success archive
MusicBrainz (external links) Archive Posted January 8, 2016 January 9, 2016 Success archive
OldFriends Archive Posted December 29, 2015 January 20, 2016 Success archive
Google Code #googlecodeblue (on EFnet) (abandoned) Active December 18, 2015 archive
Docstoc #docstop (on EFnet) (abandoned) Archive Posted November 24, 2015 December 1, 2015 Qualified Success archive
FTP (script-only) #effteepee (on hackint) Active November 30, 2015 archive
aDrive #bdrive (on EFnet) (abandoned) Archive Posted November 15, 2015 November 16, 2015 Qualified Success archive
Telenor personal websites #nohome (on EFnet) (abandoned) Archive Posted October 29, 2015 October 31, 2015 Qualified Success archive
WikiTeam (WARC format) #wikiteam (on hackint) Active October 26, 2015 archive
Yuku Active October 25, 2015 archive
GameFront #grillfront (on EFnet) (abandoned) Archive Posted October 20, 2015 April 29, 2016 Success archive
RuTracker #rutrasher (on EFnet) (abandoned) Archive Posted October 5, 2015 May 31, 2016 Success archive
Thingiverse Archive Posted September 23, 2015 January 24, 2016 Success archive
Skillfeed #skillessfeed (on EFnet) (abandoned) Archive Posted September 14, 2015 September 20, 2015 Success archive
Blingee #tragedee (on EFnet) (abandoned) Archive Posted August 16, 2015 October 8, 2015 Qualified Success archive
Google Moderator #moderhater (on EFnet) (abandoned) Archive Posted July 21, 2015 July 22, 2015 Success archive
Toshiba Support #toshibah (on EFnet) (abandoned) Archive Posted June 24, 2015 July 5, 2015 Success archive
Xfire Social Website #xfired (on EFnet) (abandoned) Archive Posted June 19, 2015 July 9, 2015 Qualified Success archive
Zoocasa #zoohouse (on EFnet) (abandoned) Archive Posted June 18, 2015 June 25, 2015 Success archive
SourceForge #coldstorage (on EFnet) (abandoned) Aborted June 17, 2015 June 19, 2015
Pomf.se #pomfret (on EFnet) (abandoned) Archive Posted June 9, 2015 June 17, 2015 Success archive
Google Baraza #bonanza (on EFnet) (abandoned) Archive Posted April 28, 2015 May 7, 2015 Success archive
Google Helpouts #helpus (on EFnet) (abandoned) Archive Posted April 16, 2015 April 21, 2015 Success archive
LayerVault #layersalt (on EFnet) (abandoned) Archive Posted April 6, 2015 April 11, 2015 Success archive
FriendFeed #humancentifeed (on EFnet) (abandoned) Archive Posted April 2, 2015 April 9, 2015 Qualified Success archive
Last.fm #lastchance.fm (on EFnet) (abandoned) Archive Posted March 30, 2015 August 28, 2015 Qualified Success archive
FurAffinity #iceking (on EFnet) (abandoned) Archive Posted March 26, 2015 June 15, 2015 Success archive
Madden GIFERATOR #jiferator (on EFnet) (abandoned) Archive Posted March 21, 2015 March 23, 2015 Success archive
RapidShare #rapidscare (on EFnet) (abandoned) Archive Posted March 20, 2015 March 29, 2015 Qualified Success archive
Trovebox #treasuretrove (on EFnet) (abandoned) Archive Posted March 14, 2015 June 27, 2015 Success archive
Google Business Sitebuilder #sitebreaker (on EFnet) (abandoned) Archive Posted March 9, 2015 March 10, 2015 Success archive
Blogger #frogger (on EFnet) (abandoned) Aborted February 25, 2015 May 6, 2015
TestFlight #crashed (on EFnet) (abandoned) Archive Posted February 13, 2015 February 25, 2015 Success archive
Cobook #cookbook (on EFnet) (abandoned) Archive Posted February 9, 2015 February 11, 2015 Success archive
Ovi Store #downlovi (on EFnet) (abandoned) Archive Posted February 3, 2015 February 15, 2015 Qualified Success archive
Inkblazers #inkerasers (on EFnet) (abandoned) Archive Posted January 18, 2015 January 31, 2015 Success archive
Brace.io #braceyourself (on EFnet) (abandoned) Archive Posted January 12, 2015 January 18, 2015 Success archive
Vstreamers #destreamers (on EFnet) (abandoned) Archive Posted January 6, 2015 January 10, 2015 Success archive
Nokia Memories #backtorubber (on EFnet) (abandoned) Archive Posted December 30, 2014 December 30, 2014 Success archive
Microsoft Clip Art #clipfart (on EFnet) (abandoned) Archive Posted December 23, 2014 December 29, 2014 Success archive
Roon #rooined (on EFnet) (abandoned) Archive Posted December 20, 2014 December 21, 2014 Success archive
ZipList #zipyourlips (on EFnet) (abandoned) Archive Posted December 2, 2014 December 4, 2014 Success archive
Viddy #viddiot (on EFnet) (abandoned) Archive Posted December 2, 2014 December 15, 2014 Success archive
Halo (Halo 2 & 3 stuff) #yolohalo (on EFnet) (abandoned) Archive Posted November 6, 2014 June 23, 2015 Success archive
GameMaker Sandbox Archive Posted October 15, 2014 October 19, 2014 Success archive
Qwiki #quickie (on EFnet) (abandoned) Archive Posted September 28, 2014 November 1, 2014 Qualified Success archive
Quizilla #fizzilla (on EFnet) (abandoned) Archive Posted September 4, 2014 October 1, 2014 Success archive
Ancestry.com #ancienthistory (on EFnet) (abandoned) Archive Posted September 19, 2014 November 5, 2014 Success archive
TwitPic #quitpic (on EFnet) (abandoned) Archive Posted September 4, 2014 January 2, 2015 Qualified Success archive
Verizon Personal Web Space #verizoff (on EFnet) (abandoned) Archive Posted September 2, 2014 October 1, 2014 Qualified Success archive
Swipnet #swiped (on EFnet) (abandoned) Archive Posted August 19, 2014 September 1, 2014 Success archive
Canv.as #canvas (on EFnet) (abandoned) Archive Posted August 11, 2014 August 12, 2014 Success archive
Twitch.tv #burnthetwitch (on EFnet) (abandoned) Archive Posted August 9, 2014 August 24, 2014 Qualified Success archive
Fotopedia #fotofinished (on EFnet) (abandoned) Archive Posted August 5, 2014 August 7, 2014 Success archive
Yahoo! Voices #shutup (on EFnet) (abandoned) Archive Posted July 28, 2014 July 31, 2014 Success archive
Justin.tv #justouttv (on EFnet) (abandoned) Archive Posted June 5, 2014 June 15, 2014 Success archive
Viddler #fiddler (on EFnet) (abandoned) Cancelled February 21, 2014 February 27, 2014 Qualified Success archive
Bebo #cockandballs (on EFnet) (abandoned) Hiatus February 18, 2014 archive
My Opera #fatlady (on EFnet) (abandoned) Archive Posted February 16, 2014 March 3, 2014 Success archive
Dogster #rawdogster (on EFnet) (abandoned) Archive Posted February 7, 2014 February 16, 2014 Success archive
Wretch & Yahoo! Blog #shipwretched (on EFnet) (abandoned) Archive Posted December 17, 2013 January 9, 2014 Qualified Success archives: Wretch, Yahoo Blog
Hyves #angerthehyve (on EFnet) (abandoned) Archive Posted November 10, 2013 December 2, 2013 Success archive
Blip.tv #blooper.tv (on EFnet) (abandoned) Archive Posted October 11, 2013 August 27, 2015 Qualified Success archive 1, archive 2
Zapd #crapd (on EFnet) (abandoned) Archive Posted October 1, 2013 October 8, 2013 Success archive
Xanga #jenga (on EFnet) (abandoned) Downloads Paused June 21, 2013 August 31, 2013 archive
Streetfiles.org #streetsoffire (on EFnet) (abandoned) Archive Posted April 28, 2013 April 30, 2013 Qualified Success archive
Yahoo! Upcoming #outgong (on EFnet) (abandoned) Archive Posted April 20, 2013 April 25, 2013 archive
Formspring #firespring (on EFnet) (abandoned) Archive Posted March 24, 2013 September 19, 2013 Success archive
Yahoo! Messages #BurnTheMessenger (on EFnet) (abandoned) Archive Posted March 20, 2013 March 31, 2013 archive
Storylane Archive Posted March 8, 2013 March 15, 2013 archive
Posterous #preposterous (on EFnet) (abandoned) Archive Posted February 23, 2013 June 29, 2013 archive
Xanga #jenga (on EFnet) (abandoned) Downloads Paused January 22, 2013 February 16, 2013 archive, user lookup, user list
Punchfork Archive Posted January 11, 2013 March 6, 2013 archive, user lookup
URLTeam #urlteam (on hackint) Active all releases
weblog.nl Archive Posted January 19, 2013 February 2, 2013 archive, user lookup
Yahoo! Blog #yahooblah (on EFnet) (abandoned) Archive Posted January 8, 2013 January 19, 2013 archive
GitHub Downloads Archive Posted December 13, 2012 December 17, 2012 Success archive, index
Daily Booth Archive Posted November 19, 2012 December 29, 2012 archive, user lookup
BT Internet Archive Posted October 10, 2012 November 2, 2012 Success archive
Webshots #webshots (on EFnet) (abandoned) Archive Posted October 4, 2012 November 18, 2012 archive, user lookup
City of Heroes Archive Posted September 3, 2012 December 1, 2012 Success archive
Cinch.FM Archive Posted August 20, 2012 August 22, 2012 Success archive
Tumblr (test project) Archive Posted August 9, 2012 August 19, 2012 archive (tar), archive (warc)
Picplz Archive Posted June 3, 2012 June 15, 2012 archive, user lookup, index
Tabblo Archive Posted May 23, 2012 May 26, 2012 Success archive, user lookup
FortuneCity #fortuneshitty (on EFnet) (abandoned) Archive Posted April 4, 2012 April 11, 2012 Qualified Success archive, user lookup
MobileMe Archive Posted April 3, 2012 August 8, 2012 Success archive, user lookup, index

Status

  • In Development: a future project
  • Active: start up a Warrior and join the fun; this one is in progress right now
  • Active (paused): not running currently, but stay tuned!
  • On Hold: project suspended indefinitely but not given up
  • Downloads Finished: we've finished downloading the data
  • Archived: the collected data has been properly archived
  • Archive Posted: the archive is available for download

Result

  • Success: downloaded all of the data and posted the archive publicly
  • Qualified Success: either we couldn't get all of the data, or the archive can't be made public
  • Failure: the site closed before we could download anything

Manual projects

Difficult, discussion-intensive, human-resource-intensive and audit projects.


Project IRC channel Description Status Started Finished Archives/Results
Yahoogroups-joiner #yahoosucks (on hackint) Filling out captchas to archive Yahoo Groups Active 2019-10-19 leaderboard
Project Newsletter #projectnewsletter (on hackint) Archiving all the email newsletters Active 2015-03-27
Woohoo #woohoo (on EFnet) (abandoned) Doing a census of all of Yahoo!'s products Active 2015-03-13 result
Froogle #froogle (on EFnet) (abandoned) Doing a census of all of Google's products Active 2015-03-13 result
INTERNETARCHIVE.BAK #internetarchive.bak (on hackint) Backing up the Internet Archive Active 2015-03-02 stats
ISP Hosting #webroasting (on hackint) Finding ISP web hosting services before the Grim Reaper finds them. Active 2014-12-30 see there
Project Valhalla #huntinggrounds (on hackint) Discussing where and how to store archives that are too big for the Internet Archive at the moment. Active 2014-09-18 see there
Audit2014 #auditteam (on hackint) We've uploaded a bunch of stuff. Let's go through the list and make sure it's categorized, has decent metadata, etc. Active 2014-07-16 list, the content
ArchiveBot #archivebot (on hackint) IRC bot designed to automate the archival of smaller websites Active 2013-09-06 archives, search
AOL #aohell (on hackint) Archiving the original AOL, not AOL's current website Active 2013-01-28 [1]
WikiTeam #wikiteam (on hackint) Exporting Mediawiki databases in XML dumps Active 2011-04-05 [2]
FTP #effteepee (on hackint) Downloading all the FTP sites Active e.g. [3]

Small projects

List of smaller website rescue projects, usually done by single individuals.


See also what's been crawled by ArchiveBot: browse here.

For Hungarian websites, see bzc6p's userpage.

You can also search http://archive.org using the keyword archiveteam, or browse directly in the Wayback Machine.

Website Site status Closure date Archiving status Archived by Started Finished Archives
Wikispot Closed 2015-07-27 Partially saved bzc6p 2015-06-30 2015-07-31 [4]
Pastebin Online In progress... joepie91 2014-09-09
TechNet Closing 2014-03-28 Partially saved Arkiver, Mithrandir, Darkstar
Widgetbox Closed 2014-09-30 Saved Arkiver 2013-12-19
Quick.io Closed 2013-12-31 Saved Arkiver 2013-12-13 2013-12-13
winamp.com Saved 2013-11 2013-11 [5]

Early projects

List of ArchiveTeam's early endeavours, kept for historical interest and no longer edited.


Historical content

This page or section is no longer actively edited, probably because the project was abandoned or the information is now collected elsewhere in a different form.

However, it is an important record of ArchiveTeam's early days and must be preserved. Merging it into another article would be difficult, and/or some of the information needed for a new format is missing.

Feel free to read it, but there is probably nothing to add now. If you resurrect the project or find a way to move this data somewhere new, you can remove this template.



Some archives available for download, created by Archive Team or by other volunteers and groups.

See also the Archive Team Collection at the Internet Archive.


Available for download

Title/Download link Description Size
Geocities - The PATCHED Torrent (IA) The popular web hosting service founded in 1994. It was closed by Yahoo! in 2009 641.4 GB
URL Shortener Backup Torrent v4 URLTeam compressed backups of various URL shorteners (README) 75 GB
URL Shortener Backup Torrent v3 outdated, use v4 URLTeam compressed backups of various URL shorteners (README) 50 GB
URL Shortener Backup Torrent v2 outdated, use v4 URLTeam compressed backups of various URL shorteners (README) 48 GB
URL Shortener Backup Torrent v1 outdated, use v4 URLTeam compressed backups of various URL shorteners (README) 41.1 GB
Papers from Philosophical Transactions of the Royal Society This archive contains 18,592 scientific publications totaling 33GiB, all from Philosophical Transactions of the Royal Society and which should be available to everyone at no cost, but most have previously only been made available at high prices through paywall gatekeepers like JSTOR. 32.48 GB
The May 2011 Calufa Twitter Scrape 90+ million tweets from more than 6 million users 14.9 GB
Internet Gopher Archive 2007 (IA) Archive of gopher sites 14.8 GB
Encyclopedia Dramatica January 2010 Mirror lulz 11.7 GB
The TEXTFILES.COM Time Capsule This collection comprises all the major text-based sets of the TEXTFILES.COM site 11 GB
Salon Table Talk Threads of this talk site +6.0 GB
Usenet Archive of UTZOO Tapes Collection of .TGZ files of very early USENET posted data 2.0 GB
Quux.org Gopher Mirror Collection 2006 (IA) This is a collection of mirrors maintained by gopher.quux.org. These mirrors were taken offline in 2006 due to bandwidth constraints 1.5 GB
full-history-linux.git.tar Git repository of the Linux kernel from 1991 to 2010 (details) 594 MB
Cheng-Caverlee-Lee September 2009 - January 2010 Twitter Scrape Almost 10 million tweets 425 MB
The 2010 Reddit Research Project Dataset on affinities of 60,000+ Reddit users, recorded in 2010 ~360 MB
Archive Team Starwars.Yahoo.Com Panic Download This is a panic download of the starwars.yahoo.com forums and profiles, made before Yahoo closed the site on December 15, 2009. It includes as many messages, profiles, and pages related to the site as could easily be brought in. ~250 MB
Social Structure of Facebook Networks Facebook Data Scrape Facebook data scrape related to the paper "The Social Structure of Facebook Networks", by Amanda L. Traud, Peter J. Mucha, Mason A. Porter 197 MB
Archive Team's Etherpad Time Capsule This archive contains roughly 6,400 Etherpads, in their final state 125 MB
WikiTeam archives Archives about wikis. See WikiTeam +100 MB
Archive Team Archive Team.org Site Rip from August 3, 2011 75 MB
Boing Boing Posts Archive (2000-2011) Two collections of Boing Boing postings provided by the cultural website boingboing.net on its 5th and 11th anniversaries 42 MB
Archive Team Quotes Database Backup Amusing snatches of conversation from IRC and other online gathering places 5 MB
Mirror of Revelation Passage Series Website wget of a small author's website. ~500 KB
Archive Team Powerblogs Shutdown Snapshot This is a 108-blog snapshot of the final month of Powerblogs, before their shutdown ?
BBC Closing Panic Archives Some BBC sites ?
stillflying.net A Firefly fan-fiction site that produced PDF scripts for the rest of season 1 and for season 2, as they might have been had Firefly not been canceled. 408.1 MB
Google Reader Text for 46M feeds, per-feed statistics, Reader Directory search results ~8,800 GB
Earbits Website, ~130,000 MP3s and metadata. ~650 GB
SciMag 38 million scientific articles ~28 TB
Google Video
Yahoo! Video

Archived but not available




The following three sections have been moved here without modification from the old Projects page.

Finished projects

This is a list of completed projects which do not have their own page on this wiki.

See Category:Rescued Sites for projects which do have their own page on this wiki.


  • (mirror | 4.5MB archive) The infoAnarchy wiki was archived by Scott.
    • infoAnarchy was down for several months in the first part of 2011, but was back up as of May 2011. The site now sees very little new content. As of 2014-06-02, infoAnarchy displays a "Revive infoanarchy.org blog & wiki" notice and a request for donations, suggesting it may not have a future; logged-in users also receive a "database is locked" message.
    • If there are future updates to that archive, they may be found at http://sdboyd56.com/archives/
    • FIXME - This archive has non-relative links, requiring it to be in /infoanarchy. It needs to be redone or edited to have relative links.
    • FIXME - This archive does not include the complete history, which is absolutely essential in this case, as significant editing history exists.
  • (mirror) The Cyberpunk Project was archived by Scott.
    • Note that this wiki does not allow the Russian TLD, so the URL must be edited before it can be visited.
    • Most pages have not changed since 2007, and the site has not been updated at all since April 2010.
    • FIXME - this mirror is incomplete, or its links are pointing to the live website.
  • (archive) Kasabi's data was retrieved and uploaded to archive.org by Edsu.
  • (archive) FoxyTunes was archived by Start
    • (it's less than 1MB!)
  • (archive) Emulation Zone was archived by Start
    • FIXME - vgaa.emulationzone.org-2014-0708.warc.gz got interrupted by a crash and needs to be re-archived

Other projects

Dead projects



Some more

You'll find traces of some other old projects on the historical IRC channel list: IRC/Old.

