https://wiki.archiveteam.org/api.php?action=feedcontributions&user=Bbot&feedformat=atomArchiveteam - User contributions [en]2024-03-28T09:47:51ZUser contributionsMediaWiki 1.37.1https://wiki.archiveteam.org/index.php?title=Everything2&diff=7647Everything22012-04-24T20:58:13Z<p>Bbot: Updated torrent link</p>
<hr />
<div>{{Infobox project<br />
| title = EVERYTHING2.COM<br />
| description = <br />
| URL = http://everything2.com/<br />
| project_status = {{online}}<br />
| archiving_status = {{rescued}}<br />
}}<br />
<br />
'''EVERYTHING2.COM''' is a kind of proto-wiki, dating from 1999. Never as popular as [[wikipedia]], it still has about 2 million pages, many of which show up nowhere else on the internet.<br />
<br />
The torrent of the first two million nodes is [http://bbot.org/everything2-2M-v2.tbz.torrent here,] (magnet:?xt=urn:btih:3da079e5932acdacfdf183ff6de1698b6f1a24b7&dn=Everything2+%280-2M%29&tr=http%3A%2F%2Fdenis.stalker.h3q.com%3A6969%2Fannounce) and a blog post on the tedious technical details is [http://bbot.org/blog/archives/2011/01/17/more_fun_with_wget/ here].<br />
<br />
Since E2 is an operational site, additional backups will be made as more content is added.<br />
<br />
{{Navigation box}}</div>Bbothttps://wiki.archiveteam.org/index.php?title=User:Bbot&diff=7565User:Bbot2012-04-09T21:55:15Z<p>Bbot: updated key</p>
<hr />
<div>I'm [http://bbot.org/ bbot]. Not Bbot, as mediawiki would have you think.<br />
<br />
==Who are you?==<br />
<br />
I just said I'm bbot, man. Pay attention.<br />
<br />
==Yeah, well, what's your public key?==<br />
<br />
My [http://bbot.org/publickey.asc public key]'s fingerprint is:<br />
<br />
pub 4096R/3BF0717D 2010-12-09 Samuel Bierwagen (z)<br />
Primary key fingerprint: 71DF 47A4 DFA5 0604 3A86 FCC2 4FA1 7276 3BF0 717D<br />
<br />
If that link is nonfunctional, you can obtain it from any quality keyserver.</div>Bbothttps://wiki.archiveteam.org/index.php?title=Wget&diff=5708Wget2011-06-02T05:16:57Z<p>Bbot: </p>
<hr />
<div>[http://www.gnu.org/software/wget/ GNU Wget] is a free utility for non-interactive download of files from the Web. Using Wget, it is possible to grab a large chunk of data, or mirror an entire website with its complete directory tree, using a single command. In the tool belt of the renegade archivist, Wget tends to get an awful lot of use. (Note: Some people prefer to use [http://curl.haxx.se/ cURL]. If it can back up data, it's useful.)<br />
<br />
This guide will not attempt to explain all possible uses of Wget; rather, this is intended to be a concise introduction to using Wget, specifically geared towards using the tool to archive data such as podcasts, PDF documents, or entire websites. Issues such as using Wget to circumvent user-agent checks or robots.txt restrictions will be outlined as well.<br />
<br />
== Mirroring a website ==<br />
<br />
When you run something like this:<br />
<pre><br />
wget http://icanhascheezburger.com/<br />
</pre><br />
...Wget will just grab the first page it hits, usually something like index.html. If you give it the -m flag:<br />
<pre><br />
wget -m http://icanhascheezburger.com/<br />
</pre><br />
...then Wget will happily slurp down anything within reach of its greedy claws, putting files in a complete directory structure. Go make a sandwich or something.<br />
<br />
You'll probably want to pair -m with -c (which tells Wget to continue partially-complete downloads) and -b (which tells wget to fork to the background, logging to wget-log).<br />
<br />
If you want to grab everything in a specific directory - say, the SICP directory on the mitpress web site - use the -np flag:<br />
<pre><br />
wget -mbc -np http://mitpress.mit.edu/sicp<br />
</pre><br />
<br />
This will tell Wget to not go up the directory tree, only downwards.<br />
<br />
== User-agents and robots.txt ==<br />
<br />
By default, Wget plays nicely with a website's robots.txt. This can lead to situations where Wget won't grab anything, since the robots.txt disallows Wget.<br />
<br />
To avoid this: first, you should try using the --user-agent option:<br />
<pre><br />
wget -mbc --user-agent="" http://website.com/<br />
</pre><br />
This instructs Wget to not send any user agent string at all. Another option for this is:<br />
<pre><br />
wget -mbc -e robots=off http://website.com/<br />
</pre><br />
...which tells Wget to ignore robots.txt directives altogether.<br />
<br />
You can add --wait=1 to pause a second between requests, to be nice to the server.<br />
<br />
== Compression ==<br />
<br />
Wget doesn't use compression by default! This can make a big difference when you're downloading easily compressible data, like human-language HTML text, but doesn't help at all when downloading material that is already compressed, like JPEG or PNG files. To enable compression, use:<br />
<pre><br />
wget --header="accept-encoding: gzip" http://website.com/<br />
</pre><br />
This will produce a file (if the remote server supports gzip compression) that uses the .html extension, but is actually gzip-encoded, which can be confusing.<br />
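A quick way to sort this out afterwards is to check for the gzip magic bytes and decompress in place. This sketch (not part of the original guide) fabricates its own gzip-encoded index.html so it is self-contained; in practice you would point it at the files Wget saved:

```shell
# Sketch: detect a gzip-encoded page saved with an .html extension and
# unpack it. index.html is a stand-in for whatever file Wget produced;
# here we fabricate one so the example runs on its own.
printf '<html>hello</html>\n' | gzip > index.html

# gzip streams always begin with the magic bytes 1f 8b
if [ "$(head -c 2 index.html | od -An -tx1 | tr -d ' \t')" = "1f8b" ]; then
    mv index.html index.html.gz
    gunzip index.html.gz    # leaves a plain-text index.html behind
fi
cat index.html
```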
<br />
Any vaguely modern server can sustain thousands of simultaneous text downloads, with video or large images being the big ticket items. But sites using outdated hardware, or run by habitual whiners, will complain when a site scraping uses 200 megabytes of transfer when it could have used 100.<br />
<br />
== Tricks and Traps ==<br />
<br />
* A standard methodology to prevent scraping of websites is to block access via user agent string. Wget is a good web citizen and identifies itself. Renegade archivists are not good web citizens in this sense. The '''--user-agent''' option will allow you to act like something else.<br />
* Some websites are actually aggregates of multiple machines and subdomains, working together. (For example, a site called ''dyingwebsite.com'' will have additional machines like ''download.dyingwebsite.com'' or ''mp3.dyingwebsite.com'') To account for this, add the following options: '''-H -Ddomain.com'''<br />
<br />
== Wget for Windows ==<br />
Windows users can download [http://gnuwin32.sourceforge.net/packages/wget.htm Wget for Windows], part of the [http://gnuwin32.sourceforge.net/ GNUWin32 project]. After installation, you will probably want to add it to your Path so that you can run it directly from the command prompt instead of specifying its absolute file path (i.e. "wget" instead of "C:\Program Files\GNUWin32\bin\wget.exe").<br />
<br />
These are the instructions for Windows 7 users. Prior versions should be relatively similar.<br />
#Install Wget<br />
#Right-click My Computer and select Properties<br />
#Select Advanced System Settings from the left<br />
#Click the Environment Variables button in the bottom-right corner<br />
#Under System Variables, find the Path variable and click Edit<br />
#Carefully insert the path to Wget's bin folder followed by a semi-colon. Getting this wrong could cause some nasty system problems<br />
#*Your Wget path should be inserted like this: C:\Program Files\GnuWin32\bin;<br />
#When done, click OK through all the dialog boxes you opened<br />
#The changes should apply immediately under Windows 7. Older versions may require a reboot<br />
#To test the settings, open a command prompt and enter "wget"<br />
<br />
== Parallel downloading ==<br />
http://keramida.wordpress.com/2010/01/19/parallel-downloads-with-python-and-gnu-wget/<br />
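The same effect can be sketched with plain xargs, without Python. urls.txt is a hypothetical file with one URL per line, and echo stands in for wget so the example runs without touching the network:

```shell
# Sketch: fan a URL list out over 4 parallel workers with xargs -P.
# urls.txt is hypothetical; "echo wget -c" is a dry run -- drop the
# echo to actually download.
printf 'http://example.com/a\nhttp://example.com/b\n' > urls.txt
xargs -n 1 -P 4 echo wget -c < urls.txt
```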
<br />
== Essays and Reading on the Use of WGET ==<br />
<br />
* [http://lifehacker.com/software/top/geek-to-live--mastering-wget-161202.php Mastering WGET] by Gina Trapani<br />
* [http://psung.blogspot.com/2008/06/using-wget-or-curl-to-download-web.html Using Wget or curl to download web sites for archival] by Phil Sung<br />
* [http://linux.about.com/od/commands/l/blcmdl1_wget.htm about.com Wget] list of commands<br />
* [http://www.delorie.com/gnu/docs/wget/wget.html#SEC_Top GNU Wget manual]<br />
<br />
[[Category:Tools]]</div>Bbothttps://wiki.archiveteam.org/index.php?title=Wget&diff=5707Wget2011-06-02T05:10:47Z<p>Bbot: added note on compression</p>
<hr />
<div>[http://www.gnu.org/software/wget/ GNU Wget] is a free utility for non-interactive download of files from the Web. Using Wget, it is possible to grab a large chunk of data, or mirror an entire website with its complete directory tree, using a single command. In the tool belt of the renegade archivist, Wget tends to get an awful lot of use. (Note: Some people prefer to use [http://curl.haxx.se/ cURL]. If it can back up data, it's useful.)<br />
<br />
This guide will not attempt to explain all possible uses of Wget; rather, this is intended to be a concise introduction to using Wget, specifically geared towards using the tool to archive data such as podcasts, PDF documents, or entire websites. Issues such as using Wget to circumvent user-agent checks or robots.txt restrictions will be outlined as well.<br />
<br />
== Mirroring a website ==<br />
<br />
When you run something like this:<br />
<pre><br />
wget http://icanhascheezburger.com/<br />
</pre><br />
...Wget will just grab the first page it hits, usually something like index.html. If you give it the -m flag:<br />
<pre><br />
wget -m http://icanhascheezburger.com/<br />
</pre><br />
...then Wget will happily slurp down anything within reach of its greedy claws, putting files in a complete directory structure. Go make a sandwich or something.<br />
<br />
You'll probably want to pair -m with -c (which tells Wget to continue partially-complete downloads) and -b (which tells wget to fork to the background, logging to wget-log).<br />
<br />
If you want to grab everything in a specific directory - say, the SICP directory on the mitpress web site - use the -np flag:<br />
<pre><br />
wget -mbc -np http://mitpress.mit.edu/sicp<br />
</pre><br />
<br />
This will tell Wget to not go up the directory tree, only downwards.<br />
<br />
== User-agents and robots.txt ==<br />
<br />
By default, Wget plays nicely with a website's robots.txt. This can lead to situations where Wget won't grab anything, since the robots.txt disallows Wget.<br />
<br />
To avoid this: first, you should try using the --user-agent option:<br />
<pre><br />
wget -mbc --user-agent="" http://website.com/<br />
</pre><br />
This instructs Wget to not send any user agent string at all. Another option for this is:<br />
<pre><br />
wget -mbc -e robots=off http://website.com/<br />
</pre><br />
...which tells Wget to ignore robots.txt directives altogether.<br />
<br />
You can add --wait=1 to pause a second between requests, to be nice to the server.<br />
<br />
== Tricks and Traps ==<br />
<br />
* A standard methodology to prevent scraping of websites is to block access via user agent string. Wget is a good web citizen and identifies itself. Renegade archivists are not good web citizens in this sense. The '''--user-agent''' option will allow you to act like something else.<br />
* Some websites are actually aggregates of multiple machines and subdomains, working together. (For example, a site called ''dyingwebsite.com'' will have additional machines like ''download.dyingwebsite.com'' or ''mp3.dyingwebsite.com'') To account for this, add the following options: '''-H -Ddomain.com'''<br />
<br />
== Compression ==<br />
<br />
Wget doesn't use compression by default! This can make a big difference when you're downloading easily compressible data, like human-language HTML text, but doesn't help at all when downloading material that is already compressed, like JPEG or PNG files. To enable compression, use:<br />
<pre><br />
wget --header="accept-encoding: gzip" http://website.com/<br />
</pre><br />
<br />
== Wget for Windows ==<br />
Windows users can download [http://gnuwin32.sourceforge.net/packages/wget.htm Wget for Windows], part of the [http://gnuwin32.sourceforge.net/ GNUWin32 project]. After installation, you will probably want to add it to your Path so that you can run it directly from the command prompt instead of specifying its absolute file path (i.e. "wget" instead of "C:\Program Files\GNUWin32\bin\wget.exe").<br />
<br />
These are the instructions for Windows 7 users. Prior versions should be relatively similar.<br />
#Install Wget<br />
#Right-click My Computer and select Properties<br />
#Select Advanced System Settings from the left<br />
#Click the Environment Variables button in the bottom-right corner<br />
#Under System Variables, find the Path variable and click Edit<br />
#Carefully insert the path to Wget's bin folder followed by a semi-colon. Getting this wrong could cause some nasty system problems<br />
#*Your Wget path should be inserted like this: C:\Program Files\GnuWin32\bin;<br />
#When done, click OK through all the dialog boxes you opened<br />
#The changes should apply immediately under Windows 7. Older versions may require a reboot<br />
#To test the settings, open a command prompt and enter "wget"<br />
<br />
== Parallel downloading ==<br />
http://keramida.wordpress.com/2010/01/19/parallel-downloads-with-python-and-gnu-wget/<br />
<br />
== Essays and Reading on the Use of WGET ==<br />
<br />
* [http://lifehacker.com/software/top/geek-to-live--mastering-wget-161202.php Mastering WGET] by Gina Trapani<br />
* [http://psung.blogspot.com/2008/06/using-wget-or-curl-to-download-web.html Using Wget or curl to download web sites for archival] by Phil Sung<br />
* [http://linux.about.com/od/commands/l/blcmdl1_wget.htm about.com Wget] list of commands<br />
* [http://www.delorie.com/gnu/docs/wget/wget.html#SEC_Top GNU Wget manual]<br />
<br />
[[Category:Tools]]</div>Bbothttps://wiki.archiveteam.org/index.php?title=WebCite&diff=4650WebCite2011-04-27T23:49:47Z<p>Bbot: detailed</p>
<hr />
<div>{{Infobox project<br />
| title = WebCite<br />
| image = WebCite 1303510291663.png<br />
| description = <br />
| URL = {{url|1=http://www.webcitation.org/}}<br />
| project_status = {{online}}<br />
| archiving_status = {{nosavedyet}}<br />
}}<br />
<br />
'''WebCite''' is an archiving site.<br />
<br />
Unlike most of the targets on AT's hitlist, WebCite is a nonprofit consortium of about a hundred scholarly journals and universities, as well as Wikipedia and Archive.org, with a specific mandate to preserve submitted content indefinitely. WebCite is so bulletproof that Archive Team uses it on this very page, and elsewhere, to archive content.<br />
<br />
But Archive Team trusts no man nor consortium! Fortunately, Webcite has a convenient online form where you can [http://www.webcitation.org/mailform apply to host a mirror.]<br />
<br />
== External links ==<br />
* {{url|1=http://www.webcitation.org/|2=WebCite}}<br />
* {{url|1=http://www.webcitation.org/faq}}<br />
<br />
{{Navigation box}}</div>Bbothttps://wiki.archiveteam.org/index.php?title=Everything2&diff=2236Everything22011-01-17T14:54:03Z<p>Bbot: </p>
<hr />
<div>{{Infobox project<br />
| title = EVERYTHING2.COM<br />
| description = <br />
| URL = http://everything2.com/<br />
| project_status = {{online}}<br />
| archiving_status = {{rescued}}<br />
}}<br />
<br />
'''EVERYTHING2.COM''' is a kind of proto-wiki, dating from 1999. Never as popular as [[wikipedia]], it still has about 2 million pages, many of which show up nowhere else on the internet.<br />
<br />
The torrent of the first two million nodes is [http://thepiratebay.org/torrent/6108859 here,] (magnet:?xt=urn:btih:3da079e5932acdacfdf183ff6de1698b6f1a24b7&dn=Everything2+%280-2M%29&tr=http%3A%2F%2Fdenis.stalker.h3q.com%3A6969%2Fannounce) and a blog post on the tedious technical details is [http://bbot.org/blog/archives/2011/01/17/more_fun_with_wget/ here].<br />
<br />
Since E2 is an operational site, additional backups will be made as more content is added.<br />
<br />
{{Navigation box}}</div>Bbothttps://wiki.archiveteam.org/index.php?title=Everything2&diff=2235Everything22011-01-17T14:53:21Z<p>Bbot: </p>
<hr />
<div>{{Infobox project<br />
| title = EVERYTHING2.COM<br />
| description = <br />
| URL = http://everything2.com/<br />
| project_status = {{online}}<br />
| archiving_status = {{complete}}<br />
}}<br />
<br />
'''EVERYTHING2.COM''' is a kind of proto-wiki, dating from 1999. Never as popular as [[wikipedia]], it still has about 2 million pages, many of which show up nowhere else on the internet.<br />
<br />
The torrent of the first two million nodes is [http://thepiratebay.org/torrent/6108859 here,] (magnet:?xt=urn:btih:3da079e5932acdacfdf183ff6de1698b6f1a24b7&dn=Everything2+%280-2M%29&tr=http%3A%2F%2Fdenis.stalker.h3q.com%3A6969%2Fannounce) and a blog post on the tedious technical details is [http://bbot.org/blog/archives/2011/01/17/more_fun_with_wget/ here].<br />
<br />
Since E2 is an operational site, additional backups will be made as more content is added.<br />
<br />
{{Navigation box}}</div>Bbothttps://wiki.archiveteam.org/index.php?title=Everything2&diff=2234Everything22011-01-17T14:52:42Z<p>Bbot: Added torrent link to completed archive.</p>
<hr />
<div>{{Infobox project<br />
| title = EVERYTHING2.COM<br />
| description = <br />
| URL = http://everything2.com/<br />
| project_status = {{online}}<br />
| archiving_status = {{complete}}<br />
}}<br />
<br />
'''EVERYTHING2.COM''' is a kind of proto-wiki, dating from 1999. Never as popular as [[wikipedia]], it still has about 2 million pages, many of which show up nowhere else on the internet.<br />
<br />
The torrent of the first two million nodes is [http://thepiratebay.org/torrent/6108859 here,] (magnet:?xt=urn:btih:3da079e5932acdacfdf183ff6de1698b6f1a24b7&dn=Everything2+%280-2M%29&tr=http%3A%2F%2Fdenis.stalker.h3q.com%3A6969%2Fannounce) and a blog post on the tedious technical details is [http://bbot.org/blog/archives/2011/01/17/more_fun_with_wget/ here].<br />
<br />
Since E2 is an operational site, additional backups will be made as more content is added.<br />
<br />
{{Navigation box}}</div>Bbothttps://wiki.archiveteam.org/index.php?title=Everything2&diff=2076Everything22010-12-31T15:25:59Z<p>Bbot: updated</p>
<hr />
<div>{{Infobox project<br />
| title = EVERYTHING2.COM<br />
| description = <br />
| URL = http://everything2.com/<br />
| project_status = {{online}}<br />
| archiving_status = {{in progress}}<br />
}}<br />
<br />
'''EVERYTHING2.COM''' is a kind of proto-wiki, dating from 1999. Never as popular as [[wikipedia]], it still has about 2 million pages, many of which show up nowhere else on the internet.<br />
<br />
Since everything2.com is nonprofit and shows no signs of shutting down in the near future, I limited wget to one process and one page per second. Since I started on December 21st, I should be at the two million mark by January 13th. As of 1809h on the 22nd, I'm at node 71000.<br />
<br />
02010/12/31 0725h: node 920964<br />
<br />
Since everything2.com URLs are of the form http://everything2.com/index.pl?node_id=NUMBER, it's fairly easy to increment NUMBER, and thus download everything on the site without having to follow links.<br />
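That enumeration is trivial to script. A sketch of the idea (the range and the commented-out delay are illustrative, not the exact values used for the crawl):

```shell
# Sketch: generate sequential node URLs from the pattern above. For a
# real crawl you would feed each URL to wget and sleep between requests;
# here we just print the first five so the example is self-contained.
for id in $(seq 1 5); do
    echo "http://everything2.com/index.pl?node_id=$id"
    # real run: wget "http://everything2.com/index.pl?node_id=$id"; sleep 1
done
```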
<br />
{{Navigation box}}</div>Bbothttps://wiki.archiveteam.org/index.php?title=Projects&diff=2052Projects2010-12-23T05:28:48Z<p>Bbot: added e2</p>
<hr />
<div>Here's where Archive Teamsters can list the '''projects''' they are currently working on and organize new projects.<br />
<br />
== Active Projects ==<br />
<br />
* '''[[User:Jscott|Jason Scott]]''' is running [http://www.textfiles.com Textfiles.com] and archiving a ton of things.<br />
* '''[[User:ip2k|seanp2k]]''' is running [http://somaseek.com somaseek.com] and tracking all the song history for all of the internet radio stations on [http://somafm.com somafm.com] since March 2010.<br />
* '''[[User:Ross|Ross]]''' is interviewing the sites of 2008.<br />
* '''[[User:LesOrchard|l.m.orchard]]''' is starting work on some self-hosted web apps that will migrate and archive from other sites. (ie. [http://github.com/lmorchard/friendfeedarchiver FriendFeed], [http://github.com/lmorchard/memex/ Delicious])<br />
* '''[[starwars.yahoo.com]]''' taken down in December 2009<br />
* '''[[User:Sungo|sungo]]''' is archiving etherpad.<br />
* '''[[User:Sdboyd|Scott]]''' has archived the [http://www.infoanarchy.org Infoanarchy wiki] site. -- The archive is complete and is at: [http://mirrors.sdboyd56.com/infoanarchy/ Infoanarchy wiki '''archive''']. A few pages are occasionally updated on the Infoanarchy wiki, so they will be updated on the archive every couple of months.<br />
* '''[[User:Tsp|Tsp]]''' is attempting to archive the stories from fanfiction.net and fictionpress.<br />
* '''[[User:EmuWikiAdmin|EmuWikiAdmin]]''' is creating a collection of all emulators, emulator documents, and hardware information that exists, regrouped in a referenced database. [http://www.emuwiki.com EmuWiki.com - The Encyclopedia of Emulation].<br />
* '''[[User:Emijrp|emijrp]]''' is attempting to archive dumps and related info about [[wikis]]: [[Wikipedia]], [[Wikia]], [[Citizendium]], some minor encyclopedias like [[Enciclopedia Libre]] or [[Wikanda]], etc. Also, downloading albums from [[Jamendo]]. You can know more about his projects in his userpage.<br />
* '''[[User:Bbot|bbot]]''' is archiving [[everything2]].<br />
<br />
== Ideas for Projects ==<br />
<br />
* Suggestion: An archive of .gif and .swf preloaders? [[User:Kuro|Kuro]] 19:49, 29 December 2009 (UTC)<br />
**We can extract all the .gif files from the GeoCities archive and compare them using md5sum to discard dupes. [[User:Emijrp|Emijrp]] 19:58, 21 December 2010 (UTC)<br />
* '''Set up''' an FTP hub which AT members can access and up/down finished projects.<br />
* Track the 100+ top [[twitter]] feeds, as designated by one of these idiot Twitter grading sites, and back up on a regular basis the top twitter people, for posterity.<br />
* '''[http://www.groklaw.net/ Groklaw]''' has a [http://www.groklaw.net/article.php?story=20090105033126835 project proposal] that we could help with. - [[User:Jscott|Jason]]<br />
* '''Archive''' the shutdown announcement pages on dead sites.<br />
* '''RSS Feed''' with death notices. - [[User:Jscott|Jason]]<br />
* '''Twitter profile''' might be a good way to broadcast new site obituaries. - psicom<br />
* '''[[TinyURL]]''' and similar services, scraping/backup - [[User:scumola|Steve]]<br />
** highlight services that at least allow exporting data ([[Diigo]] that I know of). Next "best" - services that have registration and enable viewing your URLs / saving them by e.g. saving as HTML ([[tr.im]]). Etc. --[[User:Jaakkoh|Jaakkoh]] 05:39, 4 April 2009 (UTC)<br />
* '''[http://symphony21.com/ Symphony]''' could [http://nick-dunn.co.uk/article/symphony-as-a-data-preservation-utility/ potentially be used] for archiving structured XML/RSS feeds to a relational database - [[User:nickdunn|Nick]]<br />
* '''A Firefox plugin''' for redirecting users to our archive when they request a site that's been rescued. - ???<br />
* As reported on [http://www.boingboing.net/2010/04/29/all-of-gopherspace-a.html boingboing] by Cory Doctorow, all of [[Gopher]]space - scraped in 2007 - needs an archive home. Anybody have 15GB of spare hosted-server space for this project?<br />
::I do, please contact me at admin@emuwiki.com to tell me what to do. [[User:EmuWikiAdmin|EmuWikiAdmin]] 15:17, 2 May 2010 (UTC)<br />
::They are added to iBiblio http://torrent.ibiblio.org/search.php?query=gopher&submit=search [[User:Emijrp|Emijrp]] 11:34, 2 November 2010 (UTC)<br />
<br />
== Finished Projects ==<br />
<br />
* [[User:Jscott|Jason]] founded the Archive Team.<br />
* [[User:Bbot|bbot]] made [http://thepiratebay.org/user/archiveteam/ an archiveteam TPB user]. Get the password from him or Jason. (Not really a ''project'', per se.)<br />
<br />
== Tools ==<br />
* [[Software]]<br />
* [[httrack options]]<br />
<br />
== See also ==<br />
* [[Archives]]<br />
<br />
{{Navigation box}}</div>Bbothttps://wiki.archiveteam.org/index.php?title=Talk:Everything2&diff=2051Talk:Everything22010-12-23T05:25:39Z<p>Bbot: Created page with 'I'm downloading e2 slowly, to keep from eating all their bandwidth, and because this is not a time critical project. So there's really no need for anyone else to help out, unless…'</p>
<hr />
<div>I'm downloading e2 slowly, to keep from eating all their bandwidth, and because this is not a time-critical project. So there's really no need for anyone else to help out, unless they block all my IPs. [[User:Bbot|Bbot]] 05:25, 23 December 2010 (UTC)</div>Bbothttps://wiki.archiveteam.org/index.php?title=Everything2&diff=2050Everything22010-12-23T02:13:48Z<p>Bbot: Added navigation box</p>
<hr />
<div>{{Infobox project<br />
| title = EVERYTHING2.COM<br />
| description = <br />
| URL = http://everything2.com/<br />
| project_status = {{online}}<br />
| archiving_status = {{in progress}}<br />
}}<br />
<br />
'''EVERYTHING2.COM''' is a kind of proto-wiki, dating from 1999. Never as popular as [[wikipedia]], it still has about 2 million pages, many of which show up nowhere else on the internet.<br />
<br />
Since everything2.com is nonprofit and shows no signs of shutting down in the near future, I limited wget to one process and one page per second. Since I started on December 21st, I should be at the two million mark by January 13th. As of 1809h on the 22nd, I'm at node 71000.<br />
<br />
Since everything2.com URLs are of the form http://everything2.com/index.pl?node_id=NUMBER, it's fairly easy to increment NUMBER, and thus download everything on the site without having to follow links.<br />
<br />
{{Navigation box}}</div>Bbothttps://wiki.archiveteam.org/index.php?title=Everything2&diff=2049Everything22010-12-23T02:11:54Z<p>Bbot: Created page</p>
<hr />
<div>{{Infobox project<br />
| title = EVERYTHING2.COM<br />
| description = <br />
| URL = http://everything2.com/<br />
| project_status = {{online}}<br />
| archiving_status = {{in progress}}<br />
}}<br />
<br />
'''EVERYTHING2.COM''' is a kind of proto-wiki, dating from 1999. Never as popular as [[wikipedia]], it still has about 2 million pages, many of which show up nowhere else on the internet.<br />
<br />
Since everything2.com is nonprofit and shows no signs of shutting down in the near future, I limited wget to one process and one page per second. Since I started on December 21st, I should be at the two million mark by January 13th. As of 1809h on the 22nd, I'm at node 71000.<br />
<br />
Since everything2.com URLs are of the form http://everything2.com/index.pl?node_id=NUMBER, it's fairly easy to increment NUMBER, and thus download everything on the site without having to follow links.</div>Bbothttps://wiki.archiveteam.org/index.php?title=Wget&diff=1753Wget2010-12-10T14:23:07Z<p>Bbot: -e robots, not -erobots</p>
<hr />
<div>[http://www.gnu.org/software/wget/ GNU Wget] is a free utility for non-interactive download of files from the Web. Using wget, it is possible to grab a large chunk of data, or mirror an entire website with its complete directory tree, with a single command. In the tool belt of the renegade archivist, Wget tends to get an awful lot of use. (Note: Some people prefer to use [http://curl.haxx.se/ cURL])<br />
<br />
This guide will not attempt to explain all possible uses of wget; rather, this is intended to be a concise introduction to using wget, specifically geared towards using the tool to archive data such as podcasts, PDFs, or entire websites. Issues such as using wget to circumvent user-agent checks or robots.txt restrictions will be outlined as well.<br />
<br />
== Mirroring a website ==<br />
<br />
When you run something like this:<br />
<pre><br />
wget http://icanhascheezburger.com/<br />
</pre><br />
...wget will just grab the first page it hits, usually something like index.html. If you give it the -m flag:<br />
<pre><br />
wget -m http://icanhascheezburger.com/<br />
</pre><br />
...then wget will happily slurp down anything within reach of its greedy claws, putting files in a complete directory structure. Go make a sandwich or something.<br />
<br />
You'll probably want to pair -m with -c (which tells wget to continue partially-complete downloads) and -b (which tells wget to fork to the background, logging to wget-log).<br />
<br />
If you want to grab everything in a specific directory - say, the SICP directory on the mitpress web site - use the -np flag:<br />
<pre><br />
wget -mbc -np http://mitpress.mit.edu/sicp<br />
</pre><br />
<br />
This will tell wget to not go up the directory tree, only downwards.<br />
<br />
== User-agents and robots.txt ==<br />
<br />
By default, wget plays nicely with a website's robots.txt. This can lead to situations where wget won't grab anything, since the robots.txt disallows wget.<br />
<br />
To avoid this: first, you should try using the --user-agent option:<br />
<pre><br />
wget -mbc --user-agent="" http://website.com/<br />
</pre><br />
This instructs wget to not send any user agent string at all. Another option for this is:<br />
<pre><br />
wget -mbc -e robots=off http://website.com/<br />
</pre><br />
...which tells wget to ignore robots.txt directives altogether.<br />
<br />
You can add --wait=1 to pause a second between requests, to be nice to the server.<br />
<br />
== Tricks and Traps ==<br />
<br />
* A standard methodology to prevent scraping of websites is to block access via user agent string. Wget is a good web citizen and identifies itself. Renegade archivists are not good web citizens in this sense. The '''--user-agent''' option will allow you to act like something else.<br />
* Some websites are actually aggregates of multiple machines and subdomains, working together. (For example, a site called ''dyingwebsite.com'' will have additional machines like ''download.dyingwebsite.com'' or ''mp3.dyingwebsite.com'') To account for this, add the following options: '''-H -Ddomain.com'''<br />
<br />
== Parallel downloading ==<br />
http://keramida.wordpress.com/2010/01/19/parallel-downloads-with-python-and-gnu-wget/<br />
<br />
== Essays and Reading on the Use of WGET ==<br />
<br />
* [http://lifehacker.com/software/top/geek-to-live--mastering-wget-161202.php Mastering WGET] by Gina Trapani<br />
* [http://psung.blogspot.com/2008/06/using-wget-or-curl-to-download-web.html Using wget or curl to download web sites for archival] by Phil Sung<br />
* [http://linux.about.com/od/commands/l/blcmdl1_wget.htm about.com Wget] list of commands<br />
<br />
[[Category:Tools]]</div>Bbothttps://wiki.archiveteam.org/index.php?title=User:Bbot&diff=1248User:Bbot2009-11-04T17:18:43Z<p>Bbot: Dicking around</p>
<hr />
<div>I'm [http://bbot.org/ bbot]. Not Bbot, as mediawiki would have you think.<br />
<br />
==Who are you?==<br />
<br />
I just said I'm bbot, man. Pay attention.<br />
<br />
==Yeah, well, what's your public key?==<br />
<br />
My [http://bbot.org/publickey.asc public key]'s fingerprint is:<br />
<br />
pub 4096R/5503C075 2009-05-11 Samuel Bierwagen <bbot@bbot.org><br />
Primary key fingerprint: FCDF 0C25 8ACD 7C3F 36D2 72C8 6AA0 B5B5 5503 C075<br />
<br />
If that link is nonfunctional, you can obtain it from any quality keyserver.</div>Bbothttps://wiki.archiveteam.org/index.php?title=User:Bbot&diff=774User:Bbot2009-05-11T22:26:26Z<p>Bbot: fff</p>
<hr />
<div>I'm [http://bbot.org/ bbot]. Not Bbot, as mediawiki would have you think.<br />
<br />
My [http://bbot.org/publickey.asc public key]'s fingerprint is:<br />
<br />
pub 4096R/5503C075 2009-05-11 Samuel Bierwagen <bbot@bbot.org><br />
Primary key fingerprint: FCDF 0C25 8ACD 7C3F 36D2 72C8 6AA0 B5B5 5503 C075<br />
<br />
If that link is nonfunctional, you can obtain it from any quality keyserver.</div>Bbothttps://wiki.archiveteam.org/index.php?title=User:Bbot&diff=773User:Bbot2009-05-11T22:23:09Z<p>Bbot: new key</p>
<hr />
<div>I'm [http://bbot.org/ bbot]. Not Bbot, as mediawiki would have you think.<br />
<br />
My [http://bbot.org/publickey.asc public key]'s fingerprint is:<br />
<br />
pub 4096R/5503C075 2009-05-11 Samuel Bierwagen <bbot@bbot.org><br />
Primary key fingerprint: FCDF 0C25 8ACD 7C3F 36D2 72C8 6AA0 B5B5 5503 C075<br />
<br />
If that link is nonfunctional, you can obtain it from any quality keyserver.</div>Bbothttps://wiki.archiveteam.org/index.php?title=TEMP:Columned_frontpage_prototype&diff=758TEMP:Columned frontpage prototype2009-05-01T17:37:04Z<p>Bbot: Abridging.</p>
<hr />
<div><center><br />
===History is our future.===<br />
</center><br />
<br />
<center><br />
''And we've been trashing our history''<br />
</center><br />
<br />
<!-- Current events. --><br />
{| id="mp-upper" style="margin:0 0 0 0; background:none;"<br />
| class="MainPageBG" style="width:45%; border:1px solid #cef2e0; background:#f5fffa; vertical-align:top; color:#000;" |<br />
{| id="mp-left" cellpadding="2" cellspacing="5" style="width:100%; vertical-align:top; background:#f5fffa;"<br />
! <h2 id="mp-tfa-h2" style="margin:0; background:#cef2e0; font-size:120%; font-weight:bold; border:1px solid #a3bfb1; text-align:left; color:#000; padding:0.2em 0.4em;">What's going down</h2><br />
|-<br />
| style="color:#000;" | <div id="mp-tfa">{{Archiveteam:Current events}}</div><br />
|}<br />
<!-- Site information. --><br />
| class="MainPageBG" style="width:55%; border:1px solid #cedff2; background:#f5faff; vertical-align:top;"|<br />
{| id="mp-right" cellpadding="2" cellspacing="5" style="width:100%; vertical-align:top; background:#f5faff;"<br />
! <h2 id="mp-itn-h2" style="margin:0; background:#cedff2; font-size:120%; font-weight:bold; border:1px solid #a3b0bf; text-align:left; color:#000; padding:0.2em 0.4em;">About Archive Team</h2><br />
|-<br />
| style="color:#000;" | <div id="mp-itn"><br />
The Archive Team saves shit that companies abandon.<br />
<br />
Feel free to join us on the [[IRC_Channel|IRC channel]]! We're on the EFnet network in a channel called '''#archiveteam''', where we say truly awful things.<br />
<br />
* [[Deathwatch]] is a list of dying sites.<br />
<br />
* [[Fire Drill]] is a list of sites that ''appear'' healthy.<br />
<br />
* [[Who We Are]] and how you can join us.<br />
<br />
* [[Projects]] is a list of Archive Team projects.<br />
<br />
* [[Philosophy]] describes the ideas behind our work.<br />
<br />
''DIY Data Rescue''<br />
<br />
* [[Introduction|The Introduction]] is an overview of archiving methods.<br />
<br />
* The [[Frequently Asked Questions|FAQ]] is where we answer common questions.<br />
<br />
* [[Why Back Up?]] Because they don't care about you.<br />
<br />
* [[Software]] is all about the tools you need to save your stuff. <br />
<br />
* [[Formats]] concerns formats, and how they die.<br />
<br />
* [[Storage Media]] is about where, what, and how to get it.<br />
<br />
* [[Recommended Reading]] links to other sites.</div><br />
|}<br />
|}<br />
<!-- Decolumnifier --><br />
<div id="mp-other" style="padding-top:4px; padding-bottom:2px;"><br />
[[Image:Archiveteam.jpg|center|400px]]<br />
</div></div>Bbothttps://wiki.archiveteam.org/index.php?title=TEMP:Columned_frontpage_prototype&diff=757TEMP:Columned frontpage prototype2009-05-01T17:13:58Z<p>Bbot: Betterfying. Also: I should be kept away from web colors.</p>
<hr />
<div><center><br />
===History is our future.===<br />
</center><br />
<br />
<center><br />
''And we've been trashing our history''<br />
</center><br />
<br />
<!-- Current events. --><br />
{| id="mp-upper" style="margin:0 0 0 0; background:none;"<br />
| class="MainPageBG" style="width:45%; border:1px solid #cef2e0; background:#f5fffa; vertical-align:top; color:#000;" |<br />
{| id="mp-left" cellpadding="2" cellspacing="5" style="width:100%; vertical-align:top; background:#f5fffa;"<br />
! <h2 id="mp-tfa-h2" style="margin:0; background:#cef2e0; font-size:120%; font-weight:bold; border:1px solid #a3bfb1; text-align:left; color:#000; padding:0.2em 0.4em;">What's going down</h2><br />
|-<br />
| style="color:#000;" | <div id="mp-tfa">{{Archiveteam:Current events}}</div><br />
|}<br />
<!-- Site information. --><br />
| class="MainPageBG" style="width:55%; border:1px solid #cedff2; background:#f5faff; vertical-align:top;"|<br />
{| id="mp-right" cellpadding="2" cellspacing="5" style="width:100%; vertical-align:top; background:#f5faff;"<br />
! <h2 id="mp-itn-h2" style="margin:0; background:#cedff2; font-size:120%; font-weight:bold; border:1px solid #a3b0bf; text-align:left; color:#000; padding:0.2em 0.4em;">About Archive Team</h2><br />
|-<br />
| style="color:#000;" | <div id="mp-itn"><br />
The Archive Team saves shit that companies abandon.<br />
<br />
Feel free to join us on the [[IRC_Channel|IRC channel]]! We're on the EFnet network in a channel called '''#archiveteam''', where we say truly awful things.<br />
<br />
* [[Who We Are]] and how you can join us.<br />
<br />
* [[Deathwatch]] is where we track the walking dead. Websites, not zombies.<br />
<br />
* [[Fire Drill]] is where we keep track of sites that seem fine but a lot depends on them.<br />
<br />
* [[Projects]] is to keep track of AT endeavors.<br />
<br />
* [[Philosophy]] describes the ideas underpinning our work.<br />
<br />
''DIY Data Rescue''<br />
<br />
* [[Introduction|The Introduction]] is an overview of basic archiving methods.<br />
<br />
* [[Frequently Asked Questions]] is where we answer common questions.<br />
<br />
* [[Why Back Up?]] Because they don't care about you.<br />
<br />
* [[Software]] is all about the tools you need to save your stuff. <br />
<br />
* [[Formats]] concerns formats, and how they die.<br />
<br />
* [[Storage Media]] is about where to get it, what to get, and how to use it.<br />
<br />
* [[Recommended Reading]] links to other sites for further information.</div><br />
|}<br />
|}<br />
<!-- Decolumnifier --><br />
<div id="mp-other" style="padding-top:4px; padding-bottom:2px;"><br />
[[Image:Archiveteam.jpg|center|400px]]<br />
</div></div>Bbothttps://wiki.archiveteam.org/index.php?title=Archiveteam:Current_events&diff=756Archiveteam:Current events2009-05-01T16:55:36Z<p>Bbot: updating</p>
<hr />
<div>*'''May 1st:''' We got [http://tech.slashdot.org/article.pl?sid=09/04/27/2252227 slashdotted!] This has roughly doubled the number of people in the [[IRC_Channel]], which you should totally join. There is now a [[Geocities_FAQ|FAQ]] up about the [[Geocities|Geocities project.]]<br />
<br />
*'''April 28th:''' [[Geocities]] is dying!</div>Bbothttps://wiki.archiveteam.org/index.php?title=User_talk:Jscott&diff=755User talk:Jscott2009-05-01T16:50:08Z<p>Bbot: Front page prototype, more whining.</p>
<hr />
<div>Jason, figured might initiate dialogue regarding design here. Of course we'd like someone professional to do this, but it might be good to look into what's out there as well.<br />
<br />
[http://meta.wikimedia.org/wiki/Gallery_of_user_styles some wiki styles]<br />
<br />
--[[User:Ross|Ross]] 16:37, 9 January 2009 (UTC)<br />
<br />
== testing how this works? ==<br />
<br />
confused. <br />
<br />
anyways, here's what my students said as they walked by me editing the wiki today at lunch:<br />
<br />
"That' looks boring."<br />
<br />
and<br />
<br />
"Is that Wikipedia?"<br />
<br />
HAHA. we should definitely get on the redesign.<br />
<br />
== 500 Internal Server Error ==<br />
<br />
What the fuck is with all of them? ''Seriously''. [[User:Bbot|Bbot]] 01:40, 29 April 2009 (UTC)<br />
<br />
Good question. I will see if the logs say anything. --[[User:Jscott|Jscott]] 15:29, 29 April 2009 (UTC)<br />
<br />
== Front page prototype, more whining ==<br />
<br />
I [[TEMP:Columned frontpage prototype|made a columned version of the front page]]. The about the site column needs to be abridged a bit more, and the current events column is actually [[Archiveteam:Current_events]], and needs to be greatly expanded. It also is a ''direct'' copy of the wikipedia front page, and could stand to be a bit less pastel-ly.<br />
<br />
But it works, and is already more elegant at conveying information than the current front page.<br />
<br />
Also! I get 500 errors whenever I try to add a vanity picture to my userpage. My enormous ego demands an end to these errors! --[[User:Bbot|Bbot]] 16:50, 1 May 2009 (UTC)</div>Bbothttps://wiki.archiveteam.org/index.php?title=TEMP:Columned_frontpage_prototype&diff=754TEMP:Columned frontpage prototype2009-05-01T16:42:11Z<p>Bbot: Right, that's better.</p>
<hr />
<div><!-- Current events. --><br />
{| id="mp-upper" style="margin:0 0 0 0; background:none;"<br />
| class="MainPageBG" style="width:45%; border:1px solid #cef2e0; background:#f5fffa; vertical-align:top; color:#000;" |<br />
{| id="mp-left" cellpadding="2" cellspacing="5" style="width:100%; vertical-align:top; background:#f5fffa;"<br />
! <h2 id="mp-tfa-h2" style="margin:0; background:#cef2e0; font-size:120%; font-weight:bold; border:1px solid #a3bfb1; text-align:left; color:#000; padding:0.2em 0.4em;">What's going down</h2><br />
|-<br />
| style="color:#000;" | <div id="mp-tfa">{{Archiveteam:Current events}}</div><br />
|}<br />
<!-- Site information. --><br />
| class="MainPageBG" style="width:55%; border:1px solid #cedff2; background:#f5faff; vertical-align:top;"|<br />
{| id="mp-right" cellpadding="2" cellspacing="5" style="width:100%; vertical-align:top; background:#f5faff;"<br />
! <h2 id="mp-itn-h2" style="margin:0; background:#cedff2; font-size:120%; font-weight:bold; border:1px solid #a3b0bf; text-align:left; color:#000; padding:0.2em 0.4em;">In the news</h2><br />
|-<br />
| style="color:#000;" | <div id="mp-itn">History is our future.<br />
<br />
''And we've been trashing our history''<br />
<br />
The Archive Team is dedicated to preserving the contents of "web 2.0" sites that shut down without returning their users' data.<br />
<br />
Feel free to join us on the [[IRC_Channel|IRC channel]]! We're on the EFnet network in a channel called '''#archiveteam''', where we say truly awful things.<br />
<br />
* [[Who We Are]] and how you can join our cause!<br />
<br />
* [[Deathwatch]] is where we keep track of sites that are sickly, dying or dead.<br />
<br />
* [[Fire Drill]] is where we keep track of sites that seem fine but a lot depends on them.<br />
<br />
* [[Projects]] is to keep track of AT endeavors.<br />
<br />
* [[Philosophy]] describes the ideas underpinning our work.<br />
<br />
''DIY Data Rescue''<br />
<br />
* [[Introduction|The Introduction]] is an overview of basic archiving methods.<br />
<br />
* [[Why Back Up?]] Because they don't care about you.<br />
<br />
* [[Software]] will assist you in regaining control of your data by providing tools for information backup, archiving and distribution. <br />
<br />
* [[Formats]] will familiarise you with the various data formats, and how to ensure your files will be readable in the future.<br />
<br />
* [[Storage Media]] is about where to get it, what to get, and how to use it.<br />
<br />
* [[Recommended Reading]] links to other sites for further information.<br />
<br />
* [[Frequently Asked Questions]] is where we answer common questions.</div><br />
|}<br />
|}<br />
<!-- Decolumnifier --><br />
<div id="mp-other" style="padding-top:4px; padding-bottom:2px;"><br />
[[Image:Archiveteam.jpg|center|400px]]<br />
</div></div>Bbothttps://wiki.archiveteam.org/index.php?title=TEMP:Columned_frontpage_prototype&diff=753TEMP:Columned frontpage prototype2009-05-01T16:32:54Z<p>Bbot: Trying to hack together a columned version of the front page</p>
<hr />
<div><!-- Current events. --><br />
{| id="mp-upper" style="margin:0 0 0 0; background:none;"<br />
| class="MainPageBG" style="width:55%; border:1px solid #cef2e0; background:#f5fffa; vertical-align:top; color:#000;" |<br />
{| id="mp-left" cellpadding="2" cellspacing="5" style="width:100%; vertical-align:top; background:#f5fffa;"<br />
! <h2 id="mp-tfa-h2" style="margin:0; background:#cef2e0; font-size:120%; font-weight:bold; border:1px solid #a3bfb1; text-align:left; color:#000; padding:0.2em 0.4em;">What's going down</h2><br />
|-<br />
| style="color:#000;" | <div id="mp-tfa">{{Archiveteam:Current events}}</div><br />
<br />
<!-- Site information. --><br />
| class="MainPageBG" style="width:45%; border:1px solid #cedff2; background:#f5faff; vertical-align:top;"|<br />
{| id="mp-right" cellpadding="2" cellspacing="5" style="width:100%; vertical-align:top; background:#f5faff;"<br />
! <h2 id="mp-itn-h2" style="margin:0; background:#cedff2; font-size:120%; font-weight:bold; border:1px solid #a3b0bf; text-align:left; color:#000; padding:0.2em 0.4em;">In the news</h2><br />
|-<br />
| style="color:#000;" | <div id="mp-itn">History is our future.<br />
<br />
''And we've been trashing our history''<br />
<br />
This website is intended to be an offloading point and information depot for a number of archiving projects, all related to saving websites or data that is in danger of being lost. Besides serving as a hub for team-based pulling down and mirroring of data, this site will provide advice on managing your own data and rescuing it from the brink of destruction.<br />
<br />
Feel free to join us on the [[IRC_Channel|IRC channel]]! We're on the EFnet network in a channel called '''#archiveteam''', where we say truly awful things.<br />
<br />
* [[Who We Are]] and how you can join our cause!<br />
<br />
* [[Deathwatch]] is where we keep track of sites that are sickly, dying or dead.<br />
<br />
* [[Fire Drill]] is where we keep track of sites that seem fine but a lot depends on them.<br />
<br />
* [[Projects]] is to keep track of AT endeavors.<br />
<br />
* [[Philosophy]] describes the ideas underpinning our work.<br />
<br />
''DIY Data Rescue''<br />
<br />
* [[Introduction|The Introduction]] is an overview of basic archiving methods.<br />
<br />
* [[Why Back Up?]] Because they don't care about you.<br />
<br />
* [[Software]] will assist you in regaining control of your data by providing tools for information backup, archiving and distribution. <br />
<br />
* [[Formats]] will familiarise you with the various data formats, and how to ensure your files will be readable in the future.<br />
<br />
* [[Storage Media]] is about where to get it, what to get, and how to use it.<br />
<br />
* [[Recommended Reading]] links to other sites for further information.<br />
<br />
* [[Frequently Asked Questions]] is where we answer common questions.</div><br />
<br />
<br />
<center><br />
=== EVERYONE IS ALL HOT AND BOTHERED BY [[Geocities|GEOCITIES]]. CLICK [[Geocities|HERE]] FOR INFO. ===<br />
</center><br />
<br />
<br />
=== HISTORY IS OUR FUTURE ===<br />
''And we've been trashing our history''<br />
<br />
This website is intended to be an offloading point and information depot for a number of archiving projects, all related to saving websites or data that is in danger of being lost. Besides serving as a hub for team-based pulling down and mirroring of data, this site will provide advice on managing your own data and rescuing it from the brink of destruction.<br />
<br />
Feel free to join us on the [[IRC_Channel|IRC channel]]! We're on the EFnet network in a channel called '''#archiveteam''', where we say truly awful things.<br />
<br />
===What's here===<br />
<br />
''Archive Team''<br />
<br />
* [[Who We Are]] and how you can join our cause!<br />
<br />
* [[Deathwatch]] is where we keep track of sites that are sickly, dying or dead.<br />
<br />
* [[Fire Drill]] is where we keep track of sites that seem fine but a lot depends on them.<br />
<br />
* [[Projects]] is to keep track of AT endeavors.<br />
<br />
* [[Philosophy]] describes the ideas underpinning our work.<br />
<br />
''DIY Data Rescue''<br />
<br />
* [[Introduction|The Introduction]] is an overview of basic archiving methods.<br />
<br />
* [[Why Back Up?]] Because they don't care about you.<br />
<br />
* [[Software]] will assist you in regaining control of your data by providing tools for information backup, archiving and distribution. <br />
<br />
* [[Formats]] will familiarise you with the various data formats, and how to ensure your files will be readable in the future.<br />
<br />
* [[Storage Media]] is about where to get it, what to get, and how to use it.<br />
<br />
* [[Recommended Reading]] links to other sites for further information.<br />
<br />
* [[Frequently Asked Questions]] is where we answer common questions.<br />
<br />
The site is still very new. Please be patient with the missing bits or help us fill them in.<br />
<br />
<br />
[[Image:Archiveteam.jpg|center|300px]]</div>Bbothttps://wiki.archiveteam.org/index.php?title=User:Bbot&diff=752User:Bbot2009-05-01T16:21:45Z<p>Bbot: </p>
<hr />
<div>I'm [http://bbot.org/ bbot]. Not Bbot, as mediawiki would have you think.</div>Bbothttps://wiki.archiveteam.org/index.php?title=IRC_Channel&diff=546IRC Channel2009-04-29T15:55:18Z<p>Bbot: formatting</p>
<hr />
<div>The official archive team IRC channel is #archiveteam @ irc.efnet.org.<br />
<br />
Starring:<br />
*[[User:Jscott|Jscott]] as @SketchCow!<br />
*[[User:LesOrchard|LesOrchard]] as @lmorchard!<br />
*[[User:Morbus_Iff|Morbus_Iff]] as @MorbusIff!<br />
*[[User:Bbot|bbot]] as bierwagen, since EFnet services stole "bbot"!<br />
*[[User:Cassilda|Cassilda]] as Cassilda!<br />
*[[User:Liam|Liam]] as Inky!<br />
*[[User:Scumola|Scumola]] as swebbs 1 through 3!<br />
*[[User:Soult|Soult]] as soultcer!<br />
*[[User:Mattl|Mattl]] as mattl!<br />
*[[User:geneb|geneb]] as geneb!</div>Bbothttps://wiki.archiveteam.org/index.php?title=User_talk:Jscott&diff=538User talk:Jscott2009-04-29T01:40:18Z<p>Bbot: Whining</p>
<hr />
<div>Jason, figured might initiate dialogue regarding design here. Of course we'd like someone professional to do this, but it might be good to look into what's out there as well.<br />
<br />
[http://meta.wikimedia.org/wiki/Gallery_of_user_styles some wiki styles]<br />
<br />
--[[User:Ross|Ross]] 16:37, 9 January 2009 (UTC)<br />
<br />
== testing how this works? ==<br />
<br />
confused. <br />
<br />
anyways, here's what my students said as they walked by me editing the wiki today at lunch:<br />
<br />
"That' looks boring."<br />
<br />
and<br />
<br />
"Is that Wikipedia?"<br />
<br />
HAHA. we should definitely get on the redesign.<br />
<br />
== 500 Internal Server Error ==<br />
<br />
What the fuck is with all of them? ''Seriously''. [[User:Bbot|Bbot]] 01:40, 29 April 2009 (UTC)</div>Bbothttps://wiki.archiveteam.org/index.php?title=User:Bbot&diff=537User:Bbot2009-04-29T01:18:19Z<p>Bbot: </p>
<hr />
<div>I'm [http://bbot.org/ bbot]. Not Bbot, as mediawiki would have you think.<br />
[[Image:Bbot.jpg]]<br />
Look at that smiley son of a bitch.</div>Bbothttps://wiki.archiveteam.org/index.php?title=File:Bbot.jpg&diff=536File:Bbot.jpg2009-04-29T01:14:55Z<p>Bbot: That's bbot, right there.</p>
<hr />
<div>That's bbot, right there.</div>Bbothttps://wiki.archiveteam.org/index.php?title=IRC:bierwagen&diff=535IRC:bierwagen2009-04-29T00:59:17Z<p>Bbot: Suggested by Sevens.</p>
<hr />
<div>#REDIRECT [[User:Bbot]]</div>Bbothttps://wiki.archiveteam.org/index.php?title=IRC_Channel&diff=524IRC Channel2009-04-28T18:21:10Z<p>Bbot: more</p>
<hr />
<div>The official archive team IRC channel is #archiveteam @ irc.efnet.org.<br />
<br />
Starring:<br />
*[[User:Jscott|Jscott]] as @SketchCow!<br />
*[[User:LesOrchard|LesOrchard]] as @lmorchard!<br />
*[[User:Morbus_Iff|Morbus_Iff]] as @MorbusIff!<br />
*[[User:Bbot|bbot]] as bierwagen, since EFnet services stole "bbot"!<br />
*[[User:Cassilda|Cassilda]] as Cassilda!<br />
*[[User:Liam|Liam]] as Inky!<br />
*[[User:Scumola|Scumola]] as swebbs 1 through 3!<br />
*[[User:Soult|Soult]] as soultcer!</div>Bbothttps://wiki.archiveteam.org/index.php?title=Main_Page&diff=523Main Page2009-04-28T18:20:06Z<p>Bbot: </p>
<hr />
<div>[[Image:Archiveteam.jpg|center|300px]]<br />
<br />
<br />
=== HISTORY IS OUR FUTURE ===<br />
''And we've been trashing our history''<br />
<br />
This website is intended to be an offloading point and information depot for a number of archiving projects, all related to saving websites or data that is in danger of being lost. Besides serving as a hub for team-based pulling down and mirroring of data, this site will provide advice on managing your own data and rescuing it from the brink of destruction.<br />
<br />
Feel free to join us on the [[IRC_Channel|IRC channel]]! We're on the EFnet network in a channel called '''#archiveteam''', where we say truly awful things.<br />
<br />
===What's here===<br />
<br />
''Archive Team''<br />
<br />
* [[Who We Are]] and how you can join our cause!<br />
<br />
* [[Deathwatch]] is where we keep track of sites that are sickly, dying or dead.<br />
<br />
* [[Fire Drill]] is where we keep track of sites that seem fine but a lot depends on them.<br />
<br />
* [[Projects]] is to keep track of AT endeavors.<br />
<br />
* [[Philosophy]] describes the ideas underpinning our work.<br />
<br />
''DIY Data Rescue''<br />
<br />
* [[Introduction|The Introduction]] is an overview of basic archiving methods.<br />
<br />
* [[Why Back Up?]] Because they don't care about you.<br />
<br />
* [[Software]] will assist you in regaining control of your data by providing tools for information backup, archiving and distribution. <br />
<br />
* [[Formats]] will familiarise you with the various data formats, and how to ensure your files will be readable in the future.<br />
<br />
* [[Storage Media]] is about where to get it, what to get, and how to use it.<br />
<br />
* [[Recommended Reading]] links to other sites for further information.<br />
<br />
* [[Frequently Asked Questions]] is where we answer common questions.<br />
<br />
The site is still very new. Please be patient with the missing bits or help us fill them in.</div>Bbothttps://wiki.archiveteam.org/index.php?title=IRC_Channel&diff=522IRC Channel2009-04-28T18:17:04Z<p>Bbot: added more nicks</p>
<hr />
<div>The official archive team IRC channel is #archiveteam @ irc.efnet.org.<br />
<br />
Starring:<br />
*[[User:Jscott|Jscott]] as @SketchCow!<br />
*[[User:LesOrchard|LesOrchard]] as @lmorchard!<br />
*[[User:Morbus_Iff|Morbus_Iff]] as @MorbusIff!<br />
*[[User:Bbot|bbot]] as bierwagen, since EFnet services stole "bbot"!<br />
*[[User:Cassilda|Cassilda]] as Cassilda!<br />
*[[User:Liam|Liam]] as Inky!<br />
*[[User:Scumola|Scumola]] as swebbs 1 through 3!</div>Bbothttps://wiki.archiveteam.org/index.php?title=IRC_Channel&diff=521IRC Channel2009-04-28T18:10:36Z<p>Bbot: all right, here we go</p>
<hr />
<div>The official archive team IRC channel is #archiveteam @ irc.efnet.org.<br />
<br />
Starring:<br />
*[[User:Jscott|Jscott]] as @SketchCow!<br />
*[[User:LesOrchard|LesOrchard]] as @lmorchard!<br />
*[[User:Bbot|bbot]] as bierwagen, since EFnet services stole "bbot"!</div>Bbothttps://wiki.archiveteam.org/index.php?title=IRC_Channel&diff=520IRC Channel2009-04-28T18:00:26Z<p>Bbot: frigging mediawiki formatting</p>
<hr />
<div>The official archive team IRC channel is #archiveteam @ irc.efnet.org.<br />
<br />
Starring:<br />
[[User:Jscott]] as @SketchCow<br />
[[User:LesOrchard]] as @lmorchard<br />
[[User:Bbot User:bbot]] as bierwagen, since EFnet services stole "bbot"</div>Bbothttps://wiki.archiveteam.org/index.php?title=IRC_Channel&diff=519IRC Channel2009-04-28T17:58:48Z<p>Bbot: Added users, address</p>
<hr />
<div>The official archive team IRC channel is #archiveteam @ irc.efnet.org.<br />
<br />
Starring:<br />
[User:Jscott] as @SketchCow<br />
[User:LesOrchard] as @lmorchard<br />
[User:Bbot User:bbot] as bierwagen, since EFnet services stole "bbot"</div>Bbothttps://wiki.archiveteam.org/index.php?title=Deathwatch&diff=518Deathwatch2009-04-28T17:44:48Z<p>Bbot: added snarking</p>
<hr />
<div>__NOTOC__<br />
The Deathwatch is meant to be a central indicator of websites and networks that are shutting down, or to serve as an indicator of what happened to particular sites that shut down quickly. New sites should be added in chronological order, newest death date first. Forward-looking death dates should be added to the first list only. Sites large enough to warrant additional information will receive a dedicated page, linked from here.<br />
<br />
=== Pining for the Fjords ===<br />
<br />
* Shock! Repeat Offender '''[[Yahoo]]''' has announced that it will close '''[[Geocities]]''' "later this year...We'll send you more details this summer." [http://help.yahoo.com/l/us/yahoo/geocities/geocities-05.html]<br />
* '''Microsoft Encarta''', the online encyclopedia with a 15+ year history, is being shut down. The US version will shut down on October 31, 2009 and the Japanese version on December 31, 2009. [http://www.reuters.com/article/CMPTRS/idUSLV28230720090331] <br />
* '''[http://www.coghead.com Coghead]''', " a web-based service for building and hosting custom online database applications and a software as a platform ‘utility computing’ company", announced it had closed up on February 20, 2009, and that the site would go down permanently on April 20, 2009. [http://blogs.zdnet.com/collaboration/?p=349]. <br />
* '''[http://www.videosift.com Videosift]''' had a combination database and backup failure, losing: "All votes, ever. All member usernames who registered later than around 12 months ago. All member rankings. Your member profile info (e.g., bio, favorite sift, etc.), if any. All activity that happened on the site yesterday, March 11." This is unlikely to kill the site, but an awful lot of data was lost.<br />
* Going to call this one before it even starts, friends: [https://www.legacylocker.com/ Legacy Locker] promises lifetime control of your data and return of your data to loved ones for just $300 for "lifetime", or $30/year. [http://www.washingtonpost.com/wp-dyn/content/article/2009/03/10/AR2009031001211.html] Archive Team says to just say No.<br />
* Archive Team is declaring '''[[Yahoo]]''' no longer a trustable entity. Prove us different, Yahooligans.<br />
[[Image:HP upline goes offline.jpg|right|300px|Did we say upline? We meant offline.]]<br />
* It doesn't get more ironic than this: '''[https://www.upline.com/ Upline]''', a HP-owned online backup service, is being shut down.[http://news.cnet.com/8301-17939_109-10173136-2.html?part=rss&subj=news&tag=2547-1_3-0-5] ''They almost immediately turned off the backup process,'' and then announced all your restorable data would go offline on March 31, roughly 30 days after announcement. Surprise!<br />
* '''[[Yahoo_Briefcase|Yahoo Briefcase]]''', a positively ancient site run by Yahoo that provided you with 25 free megabytes of storage space for your junk, sent a mail to what were likely years-old contact addresses to tell them they had a little more than a month to get their files out, March 30, 2009. After that, the files would be deleted. What, Yahoo doesn't have a spare memory stick to store what must be the amount of files in this service for the next year?<br />
* '''[http://seattlepi.nwsource.com/ The Seattle Post-Intelligencer]''' is [http://seattlepi.nwsource.com/business/395463_newspapersale10.html up for sale] and if it doesn't find a buyer by March 10, 2009, the print will stop after 146 years. [http://www.thenewstribune.com/news/columnists/zeeck/story/591181.html] Initially, reports indicated it would shut down the website as well as the paper, but a plan was apparently in place to run a "skeleton crew" on an internet-only site. An activist group is trying to motivate a buyer. [http://seattletimes.nwsource.com/html/localnews/2008708649_apwaseattlenewspapersale2ndldwritethru.html] <br />
* '''[http://www.scoopt.com/ Scoopt]''', a "citizen journalism" site run by Getty images to allow the uploading of images by citizen journalists and the chance to be licensed to news organizations, announced they would no longer take any new imagery after February 6, 2009, and will shut down completely on March 6, 2009. Some content uploaders "may" be contacted about being absorbed into the main Getty site.<br />
[[Image:20090227.jpg|right|300px]]<br />
* '''The [http://www.rockymountainnews.com/ Rocky Mountain News]''' has shut down as of February 27, 2009. [http://www.rockymountainnews.com/news/2009/feb/26/rocky-mountain-news-closes-friday-final-edition/] We're watching to see what happens with the website (and the material, and the newspaper itself). With a 150 year history, there's a lot of backstory, and how this chronicler of history will end up, so too will many others. There is an excellent documentary about the last days of the Rocky Mountain News [http://www.vimeo.com/3390739 here].<br />
* '''Several Google services''' have announced that they will be shutting down. [http://www.readwriteweb.com/archives/google_giveth_and_it_taketh_away.php]<br />
<br />
*'''Electronic Gaming Monthly''' has recently shut its doors. [http://multiplayerblog.mtv.com/2009/01/06/egm-closed-ziff-lays-off-30/]<br />
*'''[http://culture11.com/home Culture11]''' ran out of money.[http://www.patrolmag.com/scanner/1263/culture11-is-over]<br />
*'''Filefront.com''' is closing up shop [http://farewell.filefront.com/]. The site will be suspended on March 30, 2009. 1.5 Million files and 48+ TB of space gone just like that. '''UPDATE''' As of April 2, 2009, it looks like there may have been an 11th hour reprieve for Filefront. According to a message reportedly from the original founders of the service [http://welcome.filefront.com/], the site has been re-acquired by them in order to prevent its proposed shuttering. The announcement was posted on April 1st, leading some to speculate that it was all an April Fools' Day hoax (an allegation denied on the website). Time will tell.<br />
<br />
=== Dead as a Doornail ===<br />
<br />
====2009====<br />
* '''[http://furl.net/ Furl]''' was a social bookmarking service that had been around since 2004. It was acquired by [http://diigo.com/ Diigo] (announced on March 9), allowed people to opt into transferring their bookmarks to Diigo, and shut down on April 17. [http://blog.diigo.com/2009/03/16/welcome-furl-users/ Diigo blog post]; [http://www.techcrunch.com/2009/03/09/diigo-buys-web-page-clipping-service-furl-away-from-looksmart/ Techcrunch post].<br />
* '''[http://www.spiralfrog.com Spiralfrog]''', "a FREE service that lets you download over 3 million songs and videos, legally and safely", pulled up stakes in the night and completely shut down on March 20, 2009. [http://arstechnica.com/web/news/2009/03/ad-based-music-service-spiralfrog-croaks.ars] Things looked so promising in 2006: [http://arstechnica.com/old/content/2006/08/7611.ars] Oh, and sadly, all your music you downloaded from them will stop working within 30 days or less. [http://arstechnica.com/old/content/2007/09/spiralfrog-debuts-with-free-ad-supported-music-downloads.ars]<br />
* '''[[Lycos Europe]]''' shut down their '''Tripod''' hosting service on February 28, 2009. [http://www.washingtonpost.com/wp-dyn/content/article/2009/01/18/AR2009011800224.html] [http://www.paidcontent.co.uk/entry/419-lycos-europe-killing-tripod-customers-warned-to-back-up/] Note that Lycos Europe are distinct from Lycos.com. '''[[Lycos Europe]]''' is also shuttering the social networking site '''Jubii''' as of February 15, 2009. [http://www.techcrunch.com/2009/01/18/lycos-kills-jubii-while-theyre-at-it/] A Danish version of the site will remain open for the time being.<br />
* '''Windows Live''' shut down the '''MSN Groups''' on February 23. They extended their original date from February 21st to give Group owners the weekend to prepare. [http://windowslivewire.spaces.live.com/Blog/cns!2F7EB29B42641D59!34861.entry?sa=503427140]<br />
* '''Home of the Underdogs''' went under on Feb 9th[http://flashofsteel.com/index.php/2009/02/13/rip-hotu/]. There has been some passed along words by the site's owner, now working at an NGO, that an attempt to bring it back may happen. (She definitely has backups of the site.) A community-driven effort to revive the site is currently underway [http://www.hotud.org]. As of March 25, 2009, the original reviews are available and searchable, but the file repository has not yet been restored.<br />
* '''[http://ma.gnolia.com/ ma.gnolia.com]''' had a catastrophic disk corruption/failure on January 31, 2009. From the message on the main site: ''"As I evaluate recovery options, I can't provide a certain timeline or prognosis as to when or to what degree Ma.gnolia or your bookmarks will return; only that this process will take days, not hours."'' Ma.gnolia had an excellent export feature... hope you used it and did the backups they didn't!<br />
* '''[http://dominomag.com/ Domino Magazine]''', a style/interior design magazine, announced that they were shutting down on January 28, 2009. [http://mydecofile.dominomag.com/ My Deco File], one of the site's heavily used social bookmarking features (somewhat like delicious for images) will remain up for a few weeks to allow users to save their stuff.<br />
* '''Yahoo Pets''' was shut down and redirected with absolutely no notice around January 27, 2009. [http://blog.dogster.com/2009/01/28/yahoo-quietly-shutters-yahoo-pets-grin/]<br />
* '''[[totse]].com''' [http://www.totse.com/ closed its doors] on January 17, 2009. As of Jan 20th, a mirror [http://totse.danladds.com/ exists], alongside a [http://totse.danladds.com/text/ repository of the totse text files].<br />
* '''[[Ficlets]].com''' (owned by AOL) has announced they are closing on January 15, 2009. [http://www.peopleconnectionblog.com/2008/12/02/ficlets-will-be-shut-down-permanently/]<br />
* '''[[Circavie]].com''' (owned by AOL) has announced they are closing on January 15, 2009. [http://www.peopleconnectionblog.com/2008/12/03/circavie-will-be-shut-down-permanently/]<br />
* '''[[Co.mments]].com''' closed down on January 11, 2009.<br />
* '''[[AOL_Pictures|AOL Pictures]]''' said so long on January 9, 2009. To their credit, you can still yank your stuff into other photo services until June of 2009. (At least, according to their goodbye letter.)<br />
<br />
====2008====<br />
<br />
* [http://blogs.zdnet.com/BTL/?p=11227 Overview of 2008 Technology News]<br />
<br />
''Biggest Botched Shutdowns of 2008''<br />
* '''[http://www.peopleconnectionblog.com/2008/11/06/hometown-has-been-shutdown AOL Hometown]''' (owned by AOL) was officially killed on October 31, 2008. [http://ascii.textfiles.com/archives/1617 Jason wrote about it.]<br />
* '''Digitalrailroad.net''', a photo hosting site, gave their users a 24-hour eviction notice on October 27, 2008. They shut down 10 hours after the 24-hour notice. [http://news.cnet.com/8301-17939_109-10078042-2.html]<br />
<br />
''Other deaths of 2008''<br />
<br />
* '''[http://www.lively.com/goodbye.html Lively]''', a 3D Avatar space experiment, was killed in a really crappy way by Google on December 31, 2008.<br />
* '''[http://pingmag.jp/ Pingmag]''', the magazine from Tokyo about "Designing and Making things," simultaneously rang in the new year and checked out of existence on December 31, 2008.<br />
* '''[http://blog.mixwit.com/ Mixwit]''' said goodbye on December 27, 2008. [http://news.cnet.com/8301-17939_109-10126057-2.html]<br />
* '''[http://www.castlecops.com/ Castle Cops]''' put away their badges on December 23, 2008. [http://www.idf50.co.uk/clubhouse/computer-room/15996-castle-cops-closed-down.html]<br />
* '''[[Google Research Datasets]]''', shut down on December 19(?), 2008. [http://blog.wired.com/wiredscience/2008/12/googlescienceda.html]<br />
* '''Flip.com''', a social network for teenage girls, shut down on December 16, 2008. Users were advised to print out their digital scrapbooks as backups. [http://news.cnet.com/8301-1023_3-10112021-93.html]<br />
* '''[http://pownce.com/ Pownce]''' was closed on December 15, 2008.<br />
* '''[http://getsatisfaction.com/iwantsandy/topics/a_fork_in_the_road_an_important_announcement_about_i_want_sandy I Want Sandy]''' [http://www.webcitation.org/5eFA58kqN (WEBCITE)] was shut down on December 8, 2008. A lot of people complained about this one, while others thanked the site for shutting down and wished the founder well! <br />
* '''[http://live.yahoo.com/ Yahoo Live!]''' died on December 3, 2008. [http://news.cnet.com/8301-13515_3-10081486-26.html]<br />
* '''[http://ourworld.cs.com/sfrederick2/index.htm?f=fs Compuserve OurWorld]''' slipped into history on October 31, 2008.<br />
* '''[http://blogrush.com BlogRush.com]''' failed to provide bloggers with the traffic they so desperately desired, and the creator admitted on October 29, 2008 that his 4AM idea may not have been so brilliant. [http://mashable.com/2008/10/29/blogrush-shutdown/]<br />
* '''[http://wallop.com/ Wallop]''', Microsoft's attempt at starting a social network, died on September 18, 2008. All that remains is a few Facebook apps. [http://news.cnet.com/8301-13577_3-10041856-36.html] [http://www.techcrunch.com/2008/09/15/wallop-takes-a-leap-into-the-deadpool/]<br />
* '''Virtual Magic Kingdom''' [http://www.intercot.com/discussion/showthread.php?t=130548 closed its gates] on May 21, 2008. [http://www.virtualworldsnews.com/2008/04/disneys-virtual.html] The heartbreak and anguish over this move were amazing, and a warning sign to any family-oriented site that encourages families to join up.<br />
* '''[http://jam.bbc.co.uk/ BBC Jam]''' was [http://news.bbc.co.uk/2/hi/uk_news/education/6449619.stm suspended] March 20, 2007 and [http://www.guardian.co.uk/media/2008/feb/28/bbc.digitalmedia will not be coming back].<br />
* '''[http://en.wikipedia.org/wiki/Think_Secret Think Secret]''' was killed by Apple and shut down on February 14, 2008. [http://blog.wired.com/business/2007/12/apple-and-think.html]<br />
* '''Uber.com''' was a social blog site that died. [http://news.cnet.com/8301-13577_3-10052301-36.html]<br />
* '''Social.fm''' couldn't stand up to Last.fm, and died. [http://news.cnet.com/8301-13577_3-10005554-36.html]<br />
* '''Brijit.com''', a news aggregation site, closed on May 15, 2008. It might be closed for good. [http://news.cnet.com/8301-13577_3-9945059-36.html]<br />
<br />
====2007====<br />
<br />
''Deaths of 2007''<br />
<br />
* '''[http://oink.cd/ OiNK's Pink Palace]''' Music BitTorrent tracker site with a huge user community which cared greatly about digital content and music. Would have been a great resource for the industry to research. Shut down October 23, 2007. [http://www.wired.com/entertainment/music/news/2007/10/oink]<br />
<br />
=== Other Endangered Species ===<br />
<br />
* '''MUDs (Multi User Dungeons)''' are [http://www.offworld.com/2009/01/mud-history-going-down-wikis-m.html losing their history].<br />
* [http://www.astronautix.com Encyclopedia Astronautica] is the most comprehensive collection of the history of space travel. '''Period.''' Seriously, the official NASA history folks will refer you to this website if they can't answer your questions. However, Mark Wade (the sole creator/maintainer) abandoned his blog at the end of 2007, and the Encyclopedia has not been updated since May of 2008, despite much happening in the space exploration world since then.<br />
* '''All of the 1UP Network''' and related properties were bought by UGO recently, and should be watched carefully. [http://multiplayerblog.mtv.com/2009/01/06/egm-closed-ziff-lays-off-30/]<br />
<br />
=== Just When You Least Expect It ===<br />
<br />
* Archive Team keeps a list of [[Fire_Drill|Healthy Sites]] that could be fine today and not so hot tomorrow. We focus on ways to back your personal data off these sites so you don't put yourself at unnecessary risk.<br />
<br />
=== Other Sites Remember the Dead ===<br />
<br />
* [http://www.disobey.com/ghostsites/ Ghost Sites of the Web] by Steve Baldwin. [http://www.disobey.com/ghostsites/atom.xml RSS Feed]<br />
* [http://itdied.com/ It Died] by Glenn Fleishman. [http://itdied.com/atom.xml RSS Feed].<br />
* [http://www.techcrunch.com/tag/deadpool/ Techcrunch's Deadpool] is an excellent archive of stories about site closings.<br />
<br />
=== Tragic ===<br />
<br />
* [http://news.cnet.com/8301-13578_3-10029798-38.html "Russia Web site owner killed after arrest" - article at CNET News]<br />
<br />
=== Humorous ===<br />
<br />
* [http://www.nzherald.co.nz/lifestyle/news/article.cfm?c_id=6&objectid=10448650 "Dating website's miscalculated publicity attempt" - article at New Zealand Herald]<br />
<br />
=== Eleventh Hour Reprieves ===<br />
<br />
* '''[[JPG Magazine]]''' announced it would shut down on January 5, 2009 [http://jpgmag.com/blog/2009/01/jpg_magazine_says_goodbye.html], but the site [http://jpgmag.com/blog/2009/02/an_exciting_future_for_jpg.html lives on under new ownership]. Feel free to download the [http://thepiratebay.org/torrent/4624703/ torrent].</div>Bbothttps://wiki.archiveteam.org/index.php?title=Projects&diff=517Projects2009-04-28T16:41:55Z<p>Bbot: TPB user</p>
<hr />
<div>Here's where Archive Teamsters can list the projects they are currently working on and organize new projects.<br />
<br />
== Active Projects ==<br />
<br />
* '''[[User:Jscott|Jason Scott]]''' is running [http://www.textfiles.com Textfiles.com] and archiving a ton of things.<br />
* '''[[User:Ross|Ross]]''' is interviewing the sites of 2008.<br />
* '''[[User:LesOrchard|l.m.orchard]]''' is starting work on some self-hosted web apps that will migrate and archive from other sites. (ie. [http://github.com/lmorchard/friendfeedarchiver FriendFeed], [http://github.com/lmorchard/memex/ Delicious])<br />
<br />
== Ideas for Projects ==<br />
<br />
* '''Set up''' an FTP hub which AT members can access and up/down finished projects.<br />
* Track the 100+ top twitter feeds, as designated by one of these idiot Twitter grading sites, and back up on a regular basis the top twitter people, for posterity.<br />
* '''[http://www.groklaw.net/ Groklaw]''' has a [http://www.groklaw.net/article.php?story=20090105033126835 project proposal] that we could help with. - [[User:Jscott|Jason]]<br />
* '''Archive''' the shutdown announcement pages on dead sites.<br />
* '''RSS Feed''' with death notices. - [[User:Jscott|Jason]]<br />
* '''Twitter profile''' might be a good way to broadcast new site obituaries. - psicom<br />
* '''[[TinyURL]]''' and similar services, scraping/backup - [[User:scumola|Steve]]<br />
** highlight services that at least allow exporting data ([[Diigo]] that I know of). Next "best" - services that have registration and enable viewing your URL / saving them by e.g. saving as HTML ([[tr.im]]). Etc. --[[User:Jaakkoh|Jaakkoh]] 05:39, 4 April 2009 (UTC)<br />
* '''[http://symphony21.com/ Symphony]''' could [http://nick-dunn.co.uk/article/symphony-as-a-data-preservation-utility/ potentially be used] for archiving structured XML/RSS feeds to a relational database - [[User:nickdunn|Nick]]<br />
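The [[TinyURL]] scraping idea above could be sketched as a walk of the short-code keyspace, resolving each code to its target URL. This is only an illustrative sketch: the digits-plus-lowercase alphabet, the code lengths, and the `resolve` helper's error handling are assumptions, and TinyURL's real behavior (rate limits, responses for unused codes) is not confirmed here.

```python
import itertools
import string
import urllib.request

# Assumed TinyURL code alphabet: digits then lowercase letters (36 symbols).
ALPHABET = string.digits + string.ascii_lowercase

def codes(length):
    """Yield every candidate short code of the given length: '00', '01', ... 'zz'."""
    for combo in itertools.product(ALPHABET, repeat=length):
        yield "".join(combo)

def resolve(code):
    """Follow the redirect for one code and return the final URL, or None.

    Network call -- hypothetical usage; a real scraper would need polite
    delays and handling for 404s on unallocated codes."""
    try:
        with urllib.request.urlopen("http://tinyurl.com/" + code) as resp:
            return resp.geturl()
    except Exception:
        return None
```

A backup run would then loop over `codes(n)` for increasing `n` and record each `(code, resolve(code))` pair that succeeds.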
<br />
== Finished Projects ==<br />
<br />
* [[User:Jscott|Jason]] founded the Archive Team.<br />
* [[User:Bbot|bbot]] made [http://thepiratebay.org/user/archiveteam/ an archiveteam TPB user]. Get the password from him or Jason. (Not really a ''project'', per se.)</div>Bbothttps://wiki.archiveteam.org/index.php?title=User_talk:Bbot&diff=516User talk:Bbot2009-04-28T16:39:01Z<p>Bbot: Boo.</p>
<hr />
<div>Hey, jerkface! Why are you such a jerk? [[User:Bbot|Bbot]] 16:39, 28 April 2009 (UTC)</div>Bbothttps://wiki.archiveteam.org/index.php?title=User:Bbot&diff=515User:Bbot2009-04-28T16:37:30Z<p>Bbot: Tagfail, again.</p>
<hr />
<div>I'm [http://bbot.org/ bbot]. Not Bbot, as mediawiki would have you think.</div>Bbothttps://wiki.archiveteam.org/index.php?title=User:Bbot&diff=514User:Bbot2009-04-28T16:37:06Z<p>Bbot: Creating my user page.</p>
<hr />
<div>I'm [[http://bbot.org/ bbot]]. Not Bbot, as mediawiki would have you think.</div>Bbothttps://wiki.archiveteam.org/index.php?title=GeoCities&diff=513GeoCities2009-04-28T16:35:25Z<p>Bbot: </p>
<hr />
<div>'''Geocities''' was a once very popular web hosting service founded in 1994 and purchased by [[Yahoo]] in 1999. In April 2009 Yahoo announced they would be closing Geocities "later this year".<br />
<br />
== Press review ==<br />
<br />
: [http://arstechnica.com/web/news/2009/04/geocities-to-close-after-15-years-of-aesthetic-awesomeness.ars Ars Technica]: Started in 1994, Geocities was like the Facebook to Angelfire's MySpace—competing webpage services that '''allowed over-enthused HTML newbies to create artfully horrific webpages to represent themselves in the early days of the Internet'''.<br />
<br />
: [http://www.fool.com/investing/high-growth/2009/04/24/razing-yahoos-geocities.aspx fool.com]: As anyone who has surfed through GeoCities over the years will tell you, an '''Internet without GeoCities is like a world of celluloid without Keanu Reeves flicks'''. The absence of GeoCities won't create a cultural void. Few will miss its passing. It's loaded mostly with hobbyist tribute pages, authored by penny-pinching cybersurfers who put up with primitive tools and gaudy ads in exchange for free hosting. Many of the pages were created years ago, and abandoned like bunny rabbits after Easter Sunday, Ugg boots after winter, and anything Reeves did after the first Matrix movie.<br />
<br />
: [http://www.techcrunch.com/2009/04/23/yahoo-quietly-pulls-the-plug-on-geocities/ TechCrunch]: One of the pioneers of web-hosting sites, GeoCities gave users personal publishing tools and created “neighborhoods” within its web platform for users to be able to create pages, add a picture, text, a guest book and a website counter. '''Long before MySpace, Geocities was known as a place where teenagers, college students, and eventually others could impose their own garish taste upon the rest of the world.'''<br />
<br />
: [http://www.pcworld.com/article/163765/so_long_geocities_we_forgot_you_still_existed.html PC World]: Of the 12 remaining GeoCities users, only one was available for comment. "Holy crap!" said the user, a red-faced fellow named Strong Bad. "'''The scroll buttons and animated GIFs on that site were unbeatable.'''"<br />
<br />
=== Archiveteam mentionings ===<br />
<br />
: [http://tech.slashdot.org/article.pl?sid=09/04/27/2252227 Slashdot]: jamie found this note from Jason Scott, who organizes the Archive Team. They are busy downloading as much of Geocities as they can before it vanishes from the Net after Yahoo pulled the plug.<br />
<br />
: [http://www.reddit.com/r/reddit.com/comments/8fn2u/bring_bandwidth_and_disks_help_me_save_geocities/ reddit.com]<br />
<br />
== Saving Geocities ==<br />
[[Image:Uf009617.gif|center]]<br />
<br />
=== Resources ===<br />
* [http://www.textfiles.com/geocities/WORKSHOP/ URL lists]<br />
* [http://www.textfiles.com/geocities/STUFF/ Neighborhood lists and other stuff]<br />
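Those URL lists could be fed to a simple mirroring loop. A minimal sketch, assuming a plain one-URL-per-line list file (the `geocities_urls.txt` filename is a placeholder): map each URL to a local path, then fetch it.

```python
import os
import urllib.parse
import urllib.request

def local_path(url, root="mirror"):
    """Map a URL to a filesystem path under `root`, e.g.
    http://www.geocities.com/foo/bar.html -> mirror/www.geocities.com/foo/bar.html"""
    parts = urllib.parse.urlparse(url)
    path = parts.path.lstrip("/") or "index.html"
    if path.endswith("/"):
        path += "index.html"   # directory URLs get an index filename
    return os.path.join(root, parts.netloc, path)

def mirror(list_file="geocities_urls.txt"):
    """Fetch every URL in the list to its local path (network call; sketch only)."""
    with open(list_file) as fh:
        for url in (line.strip() for line in fh):
            if not url:
                continue
            dest = local_path(url)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            urllib.request.urlretrieve(url, dest)
```

In practice the actual downloads were done with tools like wget; this only illustrates the shape of a list-driven grab.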
<br />
=== Users involved ===<br />
* [[User:Jscott]], Joey paulprote and many others are downloading the main www.geocities.com stuff.<br />
* [[User:Soult]] is downloading ''de.geocities.com'' at [http://seron.dyndns.org:8080/]<br />
* [[User:Bbot]] is mirroring downloaded content.</div>Bbothttps://wiki.archiveteam.org/index.php?title=Archiveteam:Current_events&diff=512Archiveteam:Current events2009-04-28T16:32:58Z<p>Bbot: New page: Geocities is dying!</p>
<hr />
<div>[[Geocities]] is dying!</div>Bbothttps://wiki.archiveteam.org/index.php?title=Wikipedia&diff=511Wikipedia2009-04-28T16:29:32Z<p>Bbot: Added dumps info</p>
<hr />
<div>For once, a site that recognizes the importance of third-party backups! They have a [http://download.wikipedia.org/ main downloads page] from which you can get XML dumps from [http://download.wikipedia.org/backup-index.html individual wikis].<br />
<br />
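Grabbing a dump from that downloads page can be scripted. A sketch, assuming the `{wiki}-{date}-pages-articles.xml.bz2` naming seen in filenames like enwiki-20080312-pages-articles.xml.bz2, and a `/{wiki}/{date}/` directory layout -- check the live downloads page for the actual current structure.

```python
import urllib.request

BASE = "http://download.wikipedia.org"  # the downloads host named above

def dump_url(wiki, date):
    """Build the URL for a pages-articles dump, assuming the
    {wiki}-{date}-pages-articles.xml.bz2 naming pattern."""
    name = "%s-%s-pages-articles.xml.bz2" % (wiki, date)
    return "%s/%s/%s/%s" % (BASE, wiki, date, name)

def fetch_dump(wiki, date, dest=None):
    """Download the dump to `dest` (network call; sketch only)."""
    url = dump_url(wiki, date)
    dest = dest or url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, dest)
    return dest
```

For example, `fetch_dump("enwiki", "20080312")` would attempt to retrieve the same article dump that circulated as a torrent.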
There's an old article dump (2008/03/12) [http://thepiratebay.org/torrent/4794236/enwiki-20080312-pages-articles.xml.bz2 up on the pirate bay], from the [http://thepiratebay.org/user/archiveteam/ archiveteam TPB account].</div>Bbothttps://wiki.archiveteam.org/index.php?title=Alive..._OR_ARE_THEY&diff=510Alive... OR ARE THEY2009-04-28T16:24:51Z<p>Bbot: Updated with link to wikipedia article.</p>
<hr />
<div>Like many sites before them, these places indicate a sunny outlook, a clean bill of health and a total sense of "all systems go". But as we've found out from those many sites before them, fortunes can change overnight.<br />
<br />
Archive Team considers these sites specifically of interest because they solicit so much content, contain so many works and projects by a wide group of people, or have the internet particularly dependent on them. Consider this a fire drill: know what you can do to get your data off these sites and back them off for later.<br />
<br />
=== Sites ===<br />
<br />
* '''[[Facebook]]''' seems stable at the moment.<br />
* '''[[Friendfeed]]''' is a happy clam.<br />
* '''[[Google]]''' wants you to think they will be here forever.<br />
* '''[[Twitter]]''' is tweaking away.<br />
* '''[[Wikipedia]]''' will surely be here forever and ever! Fortunately, we don't have to take their word for it.<br />
* '''[[Delicious]]''' loves to change their API, which has a side effect of making it difficult to back up.<br />
* '''[[whitehouse.gov]]''' is up and running for #44, <s>but we've lost all info for #43. (See also: [http://www.kottke.org/09/01/old-whitehousegov-down-the-memory-hole kottke] and [http://www.readwriteweb.com/archives/whitehousegov_president_web_presence.php Read Write Web].)</s> and #43 is available at http://georgewbush-whitehouse.archives.gov/ thanks to the [http://kitenet.net/~joey/blog/entry/ephemera_vs_the_law/ Presidential Records Act]. We also want to watch out for site changes / disappeared pages that were embarrassing or whatnot.<br />
* '''[http://www.infoanarchy.org Infoanarchy]''' The site is functioning again. Might be worth backing up, though. For months, a simple database error that could be fixed with one command KO'd this site unexpectedly with a wealth of P2P information lost. [http://eng.anarchopedia.org/infoAnarchy]<br />
* '''[[LiveJournal]]''' fired a bunch of US-based developers, but is still serving from its new (presumably cheaper) data center in Montana.</div>Bbothttps://wiki.archiveteam.org/index.php?title=Wikipedia&diff=509Wikipedia2009-04-28T16:21:43Z<p>Bbot: Fixing my broken ass links</p>
<hr />
<div>There's an old article dump (2008/03/12) [http://thepiratebay.org/torrent/4794236/enwiki-20080312-pages-articles.xml.bz2 up on the pirate bay], from the [http://thepiratebay.org/user/archiveteam/ archiveteam TPB account].</div>Bbothttps://wiki.archiveteam.org/index.php?title=Wikipedia&diff=508Wikipedia2009-04-28T16:20:52Z<p>Bbot: Added TPB dump</p>
<hr />
<div>There's an old article dump (2008/03/12) [[http://thepiratebay.org/torrent/4794236/enwiki-20080312-pages-articles.xml.bz2 up on the pirate bay]], from the {{http://thepiratebay.org/user/archiveteam/ archiveteam TPB account}}.</div>Bbothttps://wiki.archiveteam.org/index.php?title=Alive..._OR_ARE_THEY&diff=392Alive... OR ARE THEY2009-02-19T16:02:43Z<p>Bbot: </p>
<hr />
<div>Like many sites before them, these places indicate a sunny outlook, a clean bill of health and a total sense of "all systems go". But as we've found out from those many sites before them, fortunes can change overnight.<br />
<br />
Archive Team considers these sites specifically of interest because they solicit so much content, contain so many works and projects by a wide group of people, or have the internet particularly dependent on them. Consider this a fire drill: know what you can do to get your data off these sites and back them off for later.<br />
<br />
=== Sites ===<br />
<br />
* '''[[Facebook]]''' seems stable at the moment.<br />
* '''[[Friendfeed]]''' is a happy clam.<br />
* '''[[Google]]''' wants you to think they will be here forever.<br />
* '''[[Twitter]]''' is tweaking away.<br />
* '''[http://en.wikipedia.org Wikipedia]''' will surely be here forever and ever! Fortunately, they provide PHP dumps and static HTML dumps on their [http://download.wikipedia.org/ downloads site].<br />
* '''[[Delicious]]''' loves to change their API, which has a side effect of making it difficult to back up.<br />
* '''[[whitehouse.gov]]''' is up and running for #44, but we've lost all info for #43. (See also: [http://www.kottke.org/09/01/old-whitehousegov-down-the-memory-hole kottke] and [http://www.readwriteweb.com/archives/whitehousegov_president_web_presence.php Read Write Web].) We also want to watch out for site changes / disappeared pages that were embarrassing or whatnot.<br />
* '''[http://www.infoanarchy.org Infoanarchy]''' The site is functioning again. Might be worth backing up, though. For months, a simple database error that could be fixed with one command KO'd this site unexpectedly with a wealth of P2P information lost. [http://eng.anarchopedia.org/infoAnarchy]</div>Bbothttps://wiki.archiveteam.org/index.php?title=Alive..._OR_ARE_THEY&diff=391Alive... OR ARE THEY2009-02-19T16:00:37Z<p>Bbot: </p>
<hr />
<div>Like many sites before them, these places indicate a sunny outlook, a clean bill of health and a total sense of "all systems go". But as we've found out from those many sites before them, fortunes can change overnight.<br />
<br />
Archive Team considers these sites specifically of interest because they solicit so much content, contain so many works and projects by a wide group of people, or have the internet particularly dependent on them. Consider this a fire drill: know what you can do to get your data off these sites and back them off for later.<br />
<br />
=== Sites ===<br />
<br />
* '''[[Facebook]]''' seems stable at the moment.<br />
* '''[[Friendfeed]]''' is a happy clam.<br />
* '''[[Google]]''' wants you to think they will be here forever.<br />
* '''[[Twitter]]''' is tweaking away.<br />
* '''[http://en.wikipedia.org Wikipedia]''' will surely be here forever and ever! Fortunately, they provide PHP dumps and static HTML dumps on their [[http://download.wikipedia.org/ downloads site]].<br />
* '''[[Delicious]]''' loves to change their API, which has a side effect of making it difficult to back up.<br />
* '''[[whitehouse.gov]]''' is up and running for #44, but we've lost all info for #43. (See also: [http://www.kottke.org/09/01/old-whitehousegov-down-the-memory-hole kottke] and [http://www.readwriteweb.com/archives/whitehousegov_president_web_presence.php Read Write Web].) We also want to watch out for site changes / disappeared pages that were embarrassing or whatnot.<br />
* '''[http://www.infoanarchy.org Infoanarchy]''' The site is functioning again. Might be worth backing up, though. For months, a simple database error that could be fixed with one command KO'd this site unexpectedly with a wealth of P2P information lost. [http://eng.anarchopedia.org/infoAnarchy]</div>Bbot