So this one time in Google Video...
On April 15, Google sent e-mail to everyone who had uploaded video to the Google Video site, informing them that all user content would be deleted in roughly 30 days. Google also announced that after 14 days (on April 29), the videos would no longer be available for viewing.
The Internet Archive stepped in with an offer to host the downloaded data, providing dozens of terabytes of space for sorting things out before the videos were added to the stacks and made available online. Team members began synchronizing their collections in earnest; archive.org also launched a parallel downloading operation, and both groups shared their DOCID discoveries.
One week in, Google announced that they were no longer doing any of this: Google Video would stay up indefinitely, and Google would add migration tools to move Google Video uploads into users' YouTube accounts.
A Brief History
Within days of the announcement, Jason Scott had thrown together a script, "googlegargle," to automatically download videos identified by scraping links. Volunteers fed huge lists of scraped DOCIDs to this script, in some cases more than 25,000 at a time, in an attempt to download the linked videos. Shortly thereafter, the large lists were broken into smaller chunks, and people claimed one or more chunks on the wiki. Even so, multiple people were still likely to download the same videos, something the team was keen to avoid given the impending cutoff date. Efforts were made to create a sqlite3 database against which individuals could deduplicate their DOCID lists; then Alex Buie created "listerine," a centrally coordinated, distributed processing system akin to SETI@Home. The listerine client would ask his central server for a video identifier, download that video, then report it as finished. It was a fire-and-forget solution for the scores of volunteer downloaders. With this new weapon, the Archive Team was saving Google Video at a rate of 5 terabytes per day. Work also continued on search tools that scraped keywords, subjects and related videos, so that every video, no matter how obscure, would be found and added to the central database.
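For readers curious how a listerine-style coordinator avoids duplicate downloads, here is a minimal sketch. It is not Alex Buie's actual code: the class `WorkQueue`, the `videos` table, its `status` values, and the `run_client` loop are all hypothetical, and everything runs in one process instead of over the network. It combines the two ideas described above: a sqlite3 table keyed on DOCID (so duplicates from overlapping scrape lists are dropped) and an atomic "claim" step so that no two clients are handed the same video.

```python
import sqlite3


class WorkQueue:
    """Hypothetical listerine-style coordinator: each DOCID is handed out at most once."""

    def __init__(self, path=":memory:"):
        # isolation_level=None puts the connection in autocommit mode,
        # so we can issue BEGIN/COMMIT ourselves in claim_next().
        self.conn = sqlite3.connect(path, isolation_level=None)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS videos ("
            " docid TEXT PRIMARY KEY,"
            " status TEXT NOT NULL DEFAULT 'pending')"
        )

    def add_docids(self, docids):
        """Insert scraped DOCIDs, ignoring ones already known; return how many were new."""
        before = self.conn.total_changes
        self.conn.executemany(
            "INSERT OR IGNORE INTO videos (docid) VALUES (?)",
            ((d,) for d in docids),
        )
        return self.conn.total_changes - before

    def claim_next(self):
        """Atomically mark one pending DOCID as claimed and return it (None if none left)."""
        # BEGIN IMMEDIATE takes the write lock up front, so two clients
        # cannot both read the same pending row before either updates it.
        self.conn.execute("BEGIN IMMEDIATE")
        try:
            row = self.conn.execute(
                "SELECT docid FROM videos WHERE status = 'pending' ORDER BY rowid LIMIT 1"
            ).fetchone()
            if row is not None:
                self.conn.execute(
                    "UPDATE videos SET status = 'claimed' WHERE docid = ?", row
                )
            self.conn.execute("COMMIT")
        except Exception:
            self.conn.execute("ROLLBACK")
            raise
        return row[0] if row else None

    def mark_done(self, docid):
        """Record that a claimed video has been downloaded."""
        self.conn.execute("UPDATE videos SET status = 'done' WHERE docid = ?", (docid,))


def run_client(queue, download):
    """Hypothetical fire-and-forget client: claim, download, report, until nothing is left."""
    count = 0
    while (docid := queue.claim_next()) is not None:
        download(docid)  # stand-in for the real video fetch
        queue.mark_done(docid)
        count += 1
    return count
```

A volunteer only ever calls something like `run_client(queue, fetch_video)`, where `fetch_video` is whatever (hypothetical) function saves one video to disk; deduplication happens entirely on the coordinator side. One simplification to note: a client that crashes leaves its row stuck at `claimed`, so a real system would also need a timeout or re-queue step.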
By the time of Google's capitulation, our team had downloaded over 1 million videos, totaling 18TB. Against a reported total of 2.5 to 2.8 million videos, that meant roughly 36 to 40 percent of Google Video was already preserved.
Google Cries Uncle
Archiveteam and Archive.org continue to download Google Videos, of course, but at a much slower pace and without pulling in dozens of volunteers.
- April 16 - Boing Boing: • • •
- April 17 - Read Write Web: • • •
- April 18 - Wired: • • •
- April 18 - Laughing Squid: • • •
- April 19 - 404 Tech Support: • • •
- April 20 - Slashdot: • • •
- April 20 - Blog: • • •
- April 23 - Emu Console Exploit News: • • • (I know, sensory overload. But they got the first scoop on the conclusion and they attribute it to us!)