Difference between revisions of "Google Video"

From Archiveteam
(two brackets.)
m (→‎Google Cries Uncle: Mentioning video title of popular “Google Videos” video 7664206256212725581: “Aircrash Investigations-United Airlines Flight 232”.)
(19 intermediate revisions by 9 users not shown)
Line 3: Line 3:
So this one time in Google Video...
So this one time in Google Video...


On April 15, Google sent e-mail to anyone who had uploaded video to the Google Video site, informing them that all user content was to be deleted in roughly 30 days. They also announced that after 14 days (to April 29th), they would no longer make the videos available for viewing.
On April 15 2011, Google sent e-mail to anyone who had uploaded video to the Google Video site, informing them that all user content was to be deleted in roughly 30 days. They also announced that after 14 days (to April 29th 2011), they would no longer make the videos available for viewing.


Archive Team whipped into action and inspired a cluster of archivists to attempt to download and preserve the whole of Google Video for suffering mankind. Over the course of a few short days the team and technologies evolved from a brute force 'download everything alphabetically' approach to a sophisticated DOCID scraping operation, with keyword and related video searches producing a list of some 2.5 - 2.8 million DOCID's. These were then handed off to a distributed job management system - listerine - which assigned downloads to volunteers from around the world.  
Archive Team whipped into action and inspired a cluster of archivists to attempt to download and preserve the whole of Google Video for suffering mankind. Over the course of a few short days the team and technologies evolved from a brute force 'download everything alphabetically' approach to a sophisticated DOCID scraping operation, with keyword and related video searches producing a list of some 2.5 - 2.8 million DOCID's. These were then handed off to a distributed job management system - listerine - which assigned downloads to volunteers from around the world.  


[[archive.org|The Internet Archive]] stepped in with an offer to host the downloading data, providing dozens of terabytes of space to sort things out before they would be added to the stacks and provided online. Team members began synchronizing their collections in earnest; archive.org also set off on a parallel downloading operation, and both groups shared their docid discoveries.
[http://www.archive.org/index.php The Internet Archive] stepped in with an offer to host the downloading data, providing dozens of terabytes of space to sort things out before they would be added to the stacks and provided online. Team members began synchronizing their collections in earnest; archive.org also set off on a parallel downloading operation, and both groups shared their docid discoveries.


In a couple of days, 18TB of verified video data had been downloaded and the team was on schedule to mirror the entire Google Video archive.
In a couple of days, 18TB of verified video data had been downloaded and the team was on schedule to mirror the entire Google Video archive.


One week in, Google {{url|1=http://googlewebmastercentral.blogspot.com/2011/04/update-on-google-video-finding-easier.html|2=announced}} they were no longer doing any of this, and were going to keep Google Video up indefinitely, as well as adding migration tools to move YouTube videos into user accounts.
One week in (Friday, April 22, 2011), Google announced<ref name=announcement20110422>{{url|1=http://googlewebmastercentral.blogspot.com/2011/04/update-on-google-video-finding-easier.html}}</ref> they were no longer doing any of this, and were going to keep Google Video up indefinitely, as well as adding migration tools to move YouTube videos into user accounts.


* [[Google Video (Archive)|Archive of the Google Video Project]]
* [[Google Video (Archive)|Archive of the Google Video Project]]
* [[Google Video Warroom|Archive of the Google Video Warroom]]
* [[Google Video Warroom|Archive of the Google Video Warroom]]
 
* [https://archive.org/details/googlevideo2011 Collection of WARCs at IA] (not directly downloadable, but presumably findable via the Wayback Machine)
== A Brief History ==
== A Brief History ==
Within days of the announcement, [[User:Jscott|Jason Scott]] had thrown together a script, "googlegargle," to automatically download videos identified by scraping links. Volunteers would feed huge lists of scraped DOCID's to this script - in some cases more than 25,000 at a time - in an attempt to download the linked videos. Shortly thereafter, the large lists were broken apart into smaller chunks and people would register a claim to one or more on the wiki. Despite this, there was still a great likelihood of multiple individuals downloading the same videos, something the team were keen to avoid given the impending cutoff date. Efforts were made to create a sqlite3 database against which individuals could deduplicate their DOCID data; then [[User:Underscor|Alex Buie]] created "listerine," a centrally coordinated, distributed processing system akin to [http://en.wikipedia.org/wiki/SETI@home SETI@Home]. The listerine client would ask his central server for a video identifier, download it, then report it as finished. It was a fire-and-forget solution for the scores of volunteer downloaders. With this new weapon, The Archive Team was saving Google Video at the rate of 5 terabytes per day. Work continued on search technologies to scrape keywords, subjects and related videos to ensure every video, no matter how obscure, would be found and added to the central database.
Within days of the announcement, [[User:Jscott|Jason Scott]] had thrown together a script, "googlegargle," to automatically download videos identified by scraping links. Volunteers would feed huge lists of scraped DOCID's to this script - in some cases more than 25,000 at a time - in an attempt to download the linked videos. Shortly thereafter, the large lists were broken apart into smaller chunks and people would register a claim to one or more on the wiki. Despite this, there was still a great likelihood of multiple individuals downloading the same videos, something the team were keen to avoid given the impending cutoff date. Efforts were made to create a sqlite3 database against which individuals could deduplicate their DOCID data; then [[User:Underscor|Alex Buie]] created "listerine," a centrally coordinated, distributed processing system akin to [http://en.wikipedia.org/wiki/SETI@home SETI@Home]. The listerine client would ask his central server for a video identifier, download it, then report it as finished. It was a fire-and-forget solution for the scores of volunteer downloaders. With this new weapon, The Archive Team was saving Google Video at the rate of 5 terabytes per day. Work continued on search technologies to scrape keywords, subjects and related videos to ensure every video, no matter how obscure, would be found and added to the central database.
Line 27: Line 27:
In response to the persistent criticism and contacts from users, Google Video (technically, YouTube, as the engineers were now part of YouTube) announced that they were removing the deletion date of April 29th, adding a "Migrate to Youtube" function which would push videos to a linked YouTube account (without the time limit restriction) and intending to automatically transition the full back catalog of videos into YouTube. Meanwhile, they have said they will not be removing any user data, whatsoever. A complete victory!
In response to the persistent criticism and contacts from users, Google Video (technically, YouTube, as the engineers were now part of YouTube) announced that they were removing the deletion date of April 29th, adding a "Migrate to Youtube" function which would push videos to a linked YouTube account (without the time limit restriction) and intending to automatically transition the full back catalog of videos into YouTube. Meanwhile, they have said they will not be removing any user data, whatsoever. A complete victory!


Archiveteam and Archive.org continue to download Google Videos, of course, but at a much slower rate and without pulling in dozens of people.  
Archiveteam and Archive.org continued to download Google Videos, of course, but at a much slower rate and without pulling in dozens of people.
 
As of 2017, all old video.google.com URLs are broken and don't redirect to any target, even for popular videos such as {{URL|2=“Aircrash Investigations-United Airlines Flight 232”|1=http://video.google.com/videohosted?docid=7664206256212725581}} found on [https://web.archive.org/web/20090702180608/http://video.google.com/ an archived main page].
 
See also [[wikipedia:Google Videos#Termination of video hosting]].


== Press ==
== Press ==


* April 16 - Boing Boing: {{url|1=http://boingboing.net/submit/2011/04/help-archive-team-save-google-video-content-from-the-abyss.html|2=Help Archive Team save Google Video content from the abyss}}
* April 16 2011 - Boing Boing: {{url|1=http://boingboing.net/submit/2011/04/help-archive-team-save-google-video-content-from-the-abyss.html|2=Help Archive Team save Google Video content from the abyss}}
* April 17 - Read Write Web: {{url|1=http://www.readwriteweb.com/archives/as_google_video_shuts_its_doors_heres_how_to_save.php#more|2=As Google Video Shuts Its Doors, Here's How to Save the Content}}
* April 17 2011 - Read Write Web: {{url|1=http://www.readwriteweb.com/archives/as_google_video_shuts_its_doors_heres_how_to_save.php#more|2=As Google Video Shuts Its Doors, Here's How to Save the Content}}
* April 18 - Wired: {{url|1=http://www.wired.co.uk/news/archive/2011-04/18/google-video-termination|2=Technology Archivists step in as Google Video shuts down for good}}
* April 18 2011 - Wired: {{url|1=http://www.wired.co.uk/news/archive/2011-04/18/google-video-termination|2=Technology Archivists step in as Google Video shuts down for good}}
* April 18 - Laughing Squid: {{url|1=http://laughingsquid.com/archive-team-trying-to-download-google-video-before-it-shuts-down/|2=Archive Team Is Trying To Download Google Video Before It Shuts Down}}
* April 18 2011 - Laughing Squid: {{url|1=http://laughingsquid.com/archive-team-trying-to-download-google-video-before-it-shuts-down/|2=Archive Team Is Trying To Download Google Video Before It Shuts Down}}
* April 19 - 404 Tech Support: {{url|1=http://www.404techsupport.com/2011/04/19/google-video-is-shutting-down-and-one-teams-effort-to-save-the-content/|2=Google Video Is Shutting Down and One Team’s Effort To Save the Content}}
* April 19 2011 - 404 Tech Support: {{url|1=http://www.404techsupport.com/2011/04/19/google-video-is-shutting-down-and-one-teams-effort-to-save-the-content/|2=Google Video Is Shutting Down and One Team’s Effort To Save the Content}}
* April 20 - Slashdot: {{url|1=http://slashdot.org/submission/1535202/Google-Video-Race-Against-Time-Goes-Distributed|2=Google Video Effort Goes Distributed}}
* April 20 2011 - Slashdot: {{url|1=http://slashdot.org/submission/1535202/Google-Video-Race-Against-Time-Goes-Distributed|2=Google Video Effort Goes Distributed}}
* April 20 - Blog: {{url|1=http://nicalderton.com/blog/GoogleGrape/|2=Google Video Effort Goes Distributed}}
* April 20 2011 - Blog: {{url|1=http://nicalderton.com/blog/GoogleGrape/|2=Google Video Effort Goes Distributed}}
* April 23 - Emu Console Exploit News: {{url|1=http://emuconsoleexploitnews.blogspot.com/2011/04/archive-team-won-google-is-going-to.html|2=The Archive Team WON: Google is going to migrate Google Videos to Youtube!}} <small>(I know, sensory overload. But they got the first scoop on the conclusion and they attribute it to us!)</small>
* April 23 2011 - Emu Console Exploit News: {{url|1=http://emuconsoleexploitnews.blogspot.com/2011/04/archive-team-won-google-is-going-to.html|2=The Archive Team WON: Google is going to migrate Google Videos to Youtube!}} <small>(I know, sensory overload. But they got the first scoop on the conclusion and they attribute it to us!)</small>


[[Image:01713.jpg|center|frame|300px|So, what did we learn here? ..don't do it again?]]
[[Image:01713.jpg|center|frame|300px|So, what did we learn here? ..don't do it again?]]
{{Navigation box}}


[[Category:Google]]
[[Category:Google]]
[[Category:Video hostings]]
[[Category:Video hosting]]

Revision as of 20:07, 12 May 2019

So this one time in Google Video...

On April 15, 2011, Google sent an e-mail to everyone who had uploaded video to the Google Video site, informing them that all user content was to be deleted in roughly 30 days. They also announced that after 14 days (on April 29, 2011), the videos would no longer be available for viewing.

Archive Team whipped into action and inspired a cluster of archivists to attempt to download and preserve the whole of Google Video for suffering mankind. Over the course of a few short days the team and technologies evolved from a brute-force 'download everything alphabetically' approach to a sophisticated DOCID scraping operation, with keyword and related-video searches producing a list of some 2.5 to 2.8 million DOCIDs. These were then handed off to a distributed job management system, listerine, which assigned downloads to volunteers from around the world.

The Internet Archive stepped in with an offer to host the downloaded data, providing dozens of terabytes of space to sort things out before the videos would be added to the stacks and made available online. Team members began synchronizing their collections in earnest; archive.org also set off on a parallel downloading operation, and both groups shared their DOCID discoveries.

In a couple of days, 18TB of verified video data had been downloaded and the team was on schedule to mirror the entire Google Video archive.

One week in (Friday, April 22, 2011), Google announced[1] that they were no longer doing any of this and were going to keep Google Video up indefinitely, as well as adding migration tools to move Google Video uploads into users' YouTube accounts.

A Brief History

Within days of the announcement, Jason Scott had thrown together a script, "googlegargle," to automatically download videos identified by scraping links. Volunteers would feed huge lists of scraped DOCIDs to this script, in some cases more than 25,000 at a time, in an attempt to download the linked videos. Shortly thereafter, the large lists were broken apart into smaller chunks, and people would register a claim to one or more chunks on the wiki. Despite this, there was still a great likelihood of multiple individuals downloading the same videos, something the team was keen to avoid given the impending cutoff date. Efforts were made to create a sqlite3 database against which individuals could deduplicate their DOCID data; then Alex Buie created "listerine," a centrally coordinated, distributed processing system akin to SETI@Home. The listerine client would ask Alex's central server for a video identifier, download that video, then report it as finished. It was a fire-and-forget solution for the scores of volunteer downloaders. With this new weapon, Archive Team was saving Google Video at the rate of 5 terabytes per day. Work continued on search technologies to scrape keywords, subjects and related videos to ensure every video, no matter how obscure, would be found and added to the central database.
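
The claim, download, report cycle described above can be illustrated with a small sketch. This is not listerine's actual code or protocol: the `Coordinator` class, the `run_client` function and the in-memory queue are all hypothetical names chosen for illustration, and a real deployment would talk to a server over the network rather than call methods directly.

```python
import threading
from collections import deque
from typing import Callable, Iterable, List, Optional


class Coordinator:
    """Hypothetical stand-in for a central server: hands out each DOCID once."""

    def __init__(self, docids: Iterable[str]) -> None:
        # dict.fromkeys drops duplicate DOCIDs while keeping first-seen order,
        # which is the deduplication step the scraped lists needed.
        self._pending = deque(dict.fromkeys(docids))
        self._claimed = {}   # docid -> volunteer currently downloading it
        self._done = set()   # docids reported as finished
        self._lock = threading.Lock()

    def claim(self, volunteer: str) -> Optional[str]:
        """Assign the next unclaimed DOCID, or None when nothing is left."""
        with self._lock:
            if not self._pending:
                return None
            docid = self._pending.popleft()
            self._claimed[docid] = volunteer
            return docid

    def report_done(self, volunteer: str, docid: str) -> None:
        """Mark a DOCID finished; only the volunteer who claimed it may do so."""
        with self._lock:
            if self._claimed.get(docid) != volunteer:
                raise ValueError(f"{docid} is not claimed by {volunteer}")
            del self._claimed[docid]
            self._done.add(docid)

    def done_count(self) -> int:
        with self._lock:
            return len(self._done)


def run_client(coordinator: Coordinator, volunteer: str,
               download: Callable[[str], None]) -> List[str]:
    """Fire-and-forget loop: claim, download, report, until the queue is empty."""
    fetched = []
    while (docid := coordinator.claim(volunteer)) is not None:
        download(docid)  # assumption: the caller supplies the real fetch logic
        coordinator.report_done(volunteer, docid)
        fetched.append(docid)
    return fetched
```

Because every DOCID leaves the pending queue the moment it is claimed, two volunteers cannot be handed the same video, which is the duplication problem the wiki claim lists could not fully prevent. The sketch omits anything a real system would likely need, such as re-queuing claims from volunteers who disappear.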

By the time of Google's capitulation, our team had downloaded over 1 million videos totalling 18TB. Against a reported total of 2.5 to 2.8 million videos, Google Video was already roughly 40% preserved.

Google Cries Uncle

In response to the persistent criticism and contacts from users, Google Video (technically, YouTube, as the engineers were now part of YouTube) announced that they were removing the deletion date of April 29th, adding a "Migrate to Youtube" function which would push videos to a linked YouTube account (without the time limit restriction) and intending to automatically transition the full back catalog of videos into YouTube. Meanwhile, they have said they will not be removing any user data, whatsoever. A complete victory!

Archiveteam and Archive.org continued to download Google Videos, of course, but at a much slower rate and without pulling in dozens of people.

As of 2017, all old video.google.com URLs are broken and don't redirect to any target, even for popular videos such as “Aircrash Investigations-United Airlines Flight 232” found on an archived main page.

See also wikipedia:Google Videos#Termination of video hosting.

Press

So, what did we learn here? ..don't do it again?