Online (97) · Closing (14) · Offline (38)
Rescued Sites (27) · In progress (38) · Not saved yet (73) · Lost Sites (10)
Unknown status (142)
Here's where Archive Teamsters can list the projects they are currently working on and organize new projects.
- See also: Category:In progress.
Projects with BASH scripts that need more people running them
- MobileMe - me.com. Closing June 30th, 2012. ~200 TB to download.
- Splinder is being copied before it shuts down in early 2012.
- FanFiction.Net is being pre-emptively archived.
- seanp2k is running somaseek.com, which has been tracking the full song history of all the internet radio stations on somafm.com since March 2010.
- Ross is interviewing the sites of 2008.
- l.m.orchard is starting work on some self-hosted web apps that will migrate and archive data from other sites (e.g. FriendFeed, Delicious).
- sungo is archiving etherpad.
- Tsp is attempting to archive the stories from fanfiction.net and fictionpress.
- emijrp is a member of WikiTeam and is also downloading albums from Jamendo. You can learn more about his projects on his user page.
- Jean-Claude Bradley and Andrew Lang are archiving the Open Notebook Science projects Reaction Attempts and the ONS Solubility Challenge. This includes the lab notebooks and all associated raw data files.
- Hydriz is currently archiving all available dumps and downloads generated by Wikimedia and uploading them to the Internet Archive (see collection).
Ideas for Projects
- Suggestion: An archive of .gif and .swf preloaders? Kuro 19:49, 29 December 2009 (UTC)
- We can extract all the .gif files from the GeoCities archive and compare them using md5sum to discard dupes (a rough sketch of this dedup step is at the end of this list). Emijrp 19:58, 21 December 2010 (UTC)
- Set up an FTP hub which AT members can access to upload and download finished projects.
- Track the 100+ top Twitter feeds, as designated by one of these idiot Twitter grading sites, and back up the top Twitter people on a regular basis, for posterity.
- Groklaw has a project proposal that we could help with. - Jason
- Archive the shutdown announcement pages on dead sites.
- This is being done on every wiki page: the shutdown announcement is pasted in and, when possible, archived at WebCite. Emijrp 19:33, 4 June 2011 (UTC)
- RSS Feed with death notices. - Jason
- Twitter profile might be a good way to broadcast new site obituaries. - psicom
- TinyURL and similar services, scraping/backup (see the sketch at the end of this list) - Steve
- Symphony could potentially be used for archiving structured XML/RSS feeds to a relational database - Nick
- A Firefox plugin for redirecting users to our archive when they request a site that's been rescued. - ???
- Good idea; the problem is that the archives are not hosted in their original form, but packed. Emijrp 19:32, 4 June 2011 (UTC)
- Something like what you propose already exists: the MAFIAAFire Redirector (though it only redirects links from domains seized by governments to backup sites), so anyone who wants to take on this project could start by reviewing how that extension works. Although our files and pages are not hosted on a server in their original form but packed, I have read that Heritrix (the Internet Archive's web crawler) by default stores the web resources it crawls in an ARC archive; perhaps we could do something similar, but using bzip2, 7z, rar, or a combination of the above to manage a site's resources. --Swicher 07:23, 27 July 2011 (UTC)
- Archives of MUD, MUSH, MOO game sites and related information. They won't all be around forever. --Auguste 13:59, 24 February 2011 (UTC)
- YTMND Zachera 20:06, 25 March 2011 (UTC)
- WikiWikiWeb, the first wiki, is still a valuable source of information on programming patterns and related topics. It's still active, though I'm not sure how active. It's been going since 1995, so it has real historical value. Plus it's all text and wouldn't take much space. The owner, Ward Cunningham, might be amenable to providing a copy, so I'd suggest making contact first.
- Electronics datasheets: this, this, this and this for example. Many of these datasheets are already very hard to find (esp. for older and rarer parts, e.g. those required to emulate old computer systems) and the sites are often in China, Russia or other countries that might give problems in the future. Lots of data to grab, and many of these sites only have very slow bandwidth, so it might be good to start archiving them early. --Darkstar 23:47, 9 April 2011 (UTC)
- ElfQuest Comics. They've recently all been scanned (6500 pages+) and are available here. They're hidden behind a Flash-based viewer though so someone would first have to decompile that to get to the links. --Darkstar 20:55, 18 May 2011 (UTC)
- Working on getting this finished up, done downloading all the images, just have to package it up. Underscor 22:35, 4 June 2011 (UTC)
- TechNet Archive: here "Technical information about older versions of Microsoft products and technologies. This information is scheduled to be removed soon." --Marceloantonio1 08:24, 9 June 2011 (UTC -3)
- TechNet, and its big cousin MSDN, are already being archived by other sites; for example, one of them has archived a huge pile, including older versions from the late '90s.
- Usenet: is it archived anywhere other than on Google's servers? How complex would it be to download the whole tree and put it somewhere as an archive? Nemo bis 21:56, 6 July 2011 (UTC)
- Doing it this way also solves another issue, the maintainability and adaptability of the code: because it uses browser automation, all you have to do is indicate the search engine's results page, the search term (something like site:whatever.com, inurl:.whatever.com/ and so on), the tag that holds the result links, and which element is the "Next" button (this cuts development and implementation time for each particular search engine, without writing too much code). If anyone is still interested in the idea after that long explanation, I will add that of the browser automation tools I have read about, two caught my attention: Watir (written in Ruby, but cross-platform and multi-browser) and Selenium Remote Control (also cross-platform and multi-browser, but unlike the former its API supports C#, Java, Perl, PHP, Python and Ruby). Anyone who wants to take on this project can start with one of these (or something similar). --Swicher 09:41, 1 August 2011 (UTC)
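A minimal sketch of the kind of scraper Swicher describes, using the Python bindings for Selenium. The search URL, the search operator, and the CSS selectors for the result links and the "Next" button are placeholders and would have to be adapted to whichever search engine is actually targeted:

```python
# Sketch of the browser-automation scraper described above. Assumes the
# Selenium Python bindings and a local Firefox/geckodriver; the URL and
# the CSS selectors are placeholders, not a real engine's markup.
from urllib.parse import quote
from selenium import webdriver
from selenium.webdriver.common.by import By

def collect_result_links(search_term, max_pages=10):
    driver = webdriver.Firefox()
    links = set()
    try:
        # e.g. search_term = "site:whatever.com" -- hypothetical results page
        driver.get("https://www.example-search.com/search?q=" + quote(search_term))
        for _ in range(max_pages):
            # "a.result" is a placeholder selector for the result links
            for a in driver.find_elements(By.CSS_SELECTOR, "a.result"):
                href = a.get_attribute("href")
                if href:
                    links.add(href)
            # "a.next" is a placeholder selector for the "Next" button
            nxt = driver.find_elements(By.CSS_SELECTOR, "a.next")
            if not nxt:
                break
            nxt[0].click()
    finally:
        driver.quit()
    return links

if __name__ == "__main__":
    for url in sorted(collect_result_links("site:whatever.com")):
        print(url)
```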
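And a rough sketch of the md5-based dedup step Emijrp suggests for the GeoCities .gif idea above. The root directory is a placeholder, and nothing is deleted; duplicates are only reported:

```python
# Walk an extracted GeoCities dump, hash every .gif, and report
# duplicates by md5. "geocities-archive/" is a placeholder path.
import hashlib
import os

def md5_of(path, chunk_size=1 << 20):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicate_gifs(root):
    seen = {}          # md5 -> first path seen with that hash
    duplicates = []    # (duplicate path, original path)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".gif"):
                continue
            path = os.path.join(dirpath, name)
            digest = md5_of(path)
            if digest in seen:
                duplicates.append((path, seen[digest]))
            else:
                seen[digest] = path
    return duplicates

if __name__ == "__main__":
    for dup, original in find_duplicate_gifs("geocities-archive/"):
        print(f"{dup} duplicates {original}")
```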
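For the TinyURL idea above, backing up a shortener mostly means recording the short-code to long-URL mapping. A minimal standard-library sketch, assuming the service answers plain HEAD requests with a Location header; the sample codes are placeholders, and a real backup would enumerate the keyspace and throttle its requests:

```python
# Record tinyurl.com short-code -> long-URL mappings to a CSV file.
# The code list is a placeholder; HEAD support is an assumption.
import csv
import http.client

def resolve_tinyurl(code):
    """Return the long URL a tinyurl.com code redirects to, or None."""
    conn = http.client.HTTPSConnection("tinyurl.com", timeout=10)
    try:
        conn.request("HEAD", "/" + code)
        response = conn.getresponse()
        return response.getheader("Location")
    finally:
        conn.close()

if __name__ == "__main__":
    sample_codes = ["2tx", "example"]  # placeholder short codes
    with open("tinyurl-backup.csv", "w", newline="") as out:
        writer = csv.writer(out)
        for code in sample_codes:
            writer.writerow([code, resolve_tinyurl(code)])
```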
- See also: Category:Rescued Sites.
- Jason founded the Archive Team (see).
- bbot made an archiveteam TPB user. Get the password from him or Jason. (Not really a project, per se.)
- bbot has archived everything2, and will continue to make further archives as more content is added.
- starwars.yahoo.com was successfully archived before it shut down in December 2009.
- Scott has archived the Infoanarchy wiki site. -- The archive is complete and is at: Infoanarchy wiki archive. A 5.1 MB gzipped archive of the wiki is also available. (The Infoanarchy wiki site was down for several months in the first part of 2011, but is back up as of May 2011. There is now very little content updating on the site.)
- Scott has archived/mirrored The Cyberpunk Project. (You'll have to Google it - this wiki won't let me edit a page that includes the Russian TLD.) This Russian-based Website is inactive, and hasn't been updated or changed since April 2010. Most pages haven't been changed since 2007. How long will it stay online? Your guess is as good as mine... The mirror is available at: The Cyberpunk Project Mirror.
- As reported on boingboing by Cory Doctorow, all of Gopherspace - scraped in 2007 - needs an archive home. Anybody have 15GB of spare hosted-server space for this project?
- I do, please contact me at firstname.lastname@example.org to tell me what to do. EmuWikiAdmin 15:17, 2 May 2010 (UTC)
- They have been added to iBiblio: http://torrent.ibiblio.org/search.php?query=gopher&submit=search Emijrp 11:34, 2 November 2010 (UTC)
- It was also added to the Internet Archive by Jason: http://www.archive.org/details/2007-gopher-mirror Emijrp 19:23, 4 June 2011 (UTC)
- EmuWikiAdmin created EmuWiki, a collection of all the emulators, emulator documents, and hardware information in existence, gathered into a referenced database. Unfortunately, it shut down in May 2010 due to copyright issues. A 20GB torrent of the site is apparently floating around somewhere.