Alive... OR ARE THEY

From Archiveteam
Revision as of 18:49, 27 October 2009 by Muscleman (talk | contribs) (Undo revision 1141 by Ertyu (Talk))

Like many sites before them, these places indicate a sunny outlook, a clean bill of health and a total sense of "all systems go". But as we've found out from those many sites before them, fortunes can change overnight.

Archive Team considers these sites specifically of interest because they solicit so much content, contain so many works and projects by a wide group of people, or have the internet particularly dependent on them. Consider this a fire drill: know what you can do to get your data off these sites and back it up for later.

Sites

  • Wikia (www.wikia.com), the for-pay arm of Wikipedia (just kidding, it's a different company, but shares a lot of people), is a repository of directed wikis that aren't subject to wikipolitics, many of them intense and completist. It'd be bad for them to go away.
  • fanfiction.net represents many thousands of user-generated stories, essays and huge amounts of work.
  • SourceForge is a critical repository of open source code, information, and webpages. It is mirrored and maintained, but there are sure to be parts that are neither.
  • Facebook seems stable at the moment.
  • Friendfeed is a happy clam who recently shacked up with Facebook.
  • Google wants you to think they will be here forever.
  • Twitter is tweaking away.
  • Wikipedia will surely be here forever and ever! Fortunately, we don't have to take their word for it, as they offer dumps of the data minus the photos. However, no one has verified that Wikipedia can actually be restored from these dumps. If disaster strikes, we could discover a serious problem.
  • Delicious loves to change their API, which has a side effect of making it difficult to back up.
  • whitehouse.gov is up and running for #44, but the live site dropped all info for #43 (see also: kottke and Read Write Web). The #43 site is available at http://georgewbush-whitehouse.archives.gov/ thanks to the Presidential Records Act. We also want to watch out for site changes and disappeared pages that were embarrassing or whatnot.
  • Infoanarchy is functioning again. Might be worth backing up, though. For months, a simple database error that could have been fixed with one command KO'd the site unexpectedly, taking a wealth of P2P information down with it. [1]
  • LiveJournal fired a bunch of US-based developers, but is still serving from its new (presumably cheaper) data center in Montana.
  • Last.fm is being cloned by free software developers in the form of Libre.fm. They have a tool, Lastscrape, which can get all your listening data out into a tab-delimited text file.
  • WikiLeaks is a valuable site that will be making enemies.
  • Archive.org seems stable at the moment, but its 2 petabytes of data aren't mirrored anywhere else and the code for their system isn't open source. That makes them a single point of failure for a large amount of the web's history. Why should there be only one internet archive?
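
As a small fire-drill exercise for the Wikipedia dumps mentioned above, here is a minimal sketch of a sanity check on a MediaWiki XML export. It only confirms that the file parses and counts the pages and the pages with no revision text, which is far short of proving a full restore works. The function name count_pages and the sample data are made up for illustration, and the sketch assumes an uncompressed XML export with the usual <page>, <revision> and <text> elements.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the versioned export namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def count_pages(xml_stream):
    """Stream a MediaWiki XML export; return (pages, pages_without_text).

    Hypothetical helper for illustration: a parse check, not a restore test.
    """
    pages = 0
    empty = 0
    # iterparse reads incrementally, so the whole dump is never parsed in one go.
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        pages += 1
        texts = [e for e in elem.iter() if local_name(e.tag) == "text"]
        if not any((t.text or "").strip() for t in texts):
            empty += 1
        elem.clear()  # drop the page's children to keep memory use down
    return pages, empty


# Tiny made-up sample standing in for a real dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>A</title><revision><text>hello</text></revision></page>
  <page><title>B</title><revision><text></text></revision></page>
</mediawiki>"""

if __name__ == "__main__":
    pages, empty = count_pages(io.BytesIO(SAMPLE))
    print(f"{pages} pages, {empty} without text")
```

For a real dump, pass an open binary file object (decompressed first if needed) instead of the in-memory sample. A truncated or corrupt file raises xml.etree.ElementTree.ParseError, which is itself a useful thing to find out before disaster strikes.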