Difference between revisions of "Google Video Warroom"
Revision as of 11:01, 17 January 2017
Google Video | |
URL | http://video.google.com |
Status | Offline on 2011-04-29[1] |
Archiving status | Saved! |
Archiving type | Unknown |
IRC channel | #archiveteam-bs (on hackint) |
"Gentlemen. You can't fight in here. This is the War Room!"'
If you want to help archive Google Video, get some machines running and join us in IRC (EFNet #archiveteam / #googlegrape)
The automatic scripts only work on FreeBSD, Linux, Solaris, Windows, OS X, and Cygwin.
Anyone can help out, but we would *really* appreciate it if you used a *NIX system rather than Windows. If you nevertheless choose to pursue the Magical World of Windows, please make sure that what you collect is not damaged as a consequence of running on a Windows system.
In any case, the first thing to do is add your name/nickname to this list, along with the storage and bandwidth you have available.
Seed Lists
Please send any new seedlists to underscor on IRC, rather than embarking on them yourself. He'll add them to the listerine queue.
- Original Lists: http://199.48.254.90/at/seeds/
Custom searches
- PLEASE add your custom searches and their details to this table!
- Word suggestions: public domain, subtitles
- Words already in the table or added to the BOINC client: conference, hack, wiki, linux, creative commons, part, interview, documentary, talk, brain, civilization, evolution, future, language, literature, mind, money, neurolinguistic, singularity
Years
1900, 1901, 1902, 1903, 1904, 1905, 1906, 1907, 1908, 1909, 1910, 1911, 1912, 1913, 1914, 1915, 1916, 1917, 1918, 1919, 1920, 1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929, 1930, 1931, 1932, 1933, 1934, 1935, 1936, 1937, 1938, 1939, 1940, 1941, 1942, 1943, 1944, 1945, 1946, 1947, 1948, 1949, 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999
Countries
AFGHANISTAN, ÅLAND+ISLANDS, ALBANIA, ALGERIA, AMERICAN+SAMOA, ANDORRA, ANGOLA, ANGUILLA, ANTARCTICA, ANTIGUA+AND+BARBUDA, ARGENTINA, ARMENIA, ARUBA, AUSTRALIA, AUSTRIA, AZERBAIJAN, BAHAMAS, BAHRAIN, BANGLADESH, BARBADOS, BELARUS, BELGIUM, BELIZE, BENIN, BERMUDA, BHUTAN, BOLIVIA,+PLURINATIONAL+STATE+OF, BONAIRE,+SAINT+EUSTATIUS+AND+SABA, BOSNIA+AND+HERZEGOVINA, BOTSWANA, BOUVET+ISLAND, BRAZIL, BRITISH+INDIAN+OCEAN+TERRITORY, BRUNEI+DARUSSALAM, BULGARIA, BURKINA+FASO, BURUNDI, CAMBODIA, CAMEROON, CANADA, CAPE+VERDE, CAYMAN+ISLANDS, CENTRAL+AFRICAN+REPUBLIC, CHAD, CHILE, CHINA, CHRISTMAS+ISLAND, COCOS+(KEELING)+ISLANDS, COLOMBIA, COMOROS, CONGO, CONGO, COOK+ISLANDS, COSTA+RICA, CÔTE+D'IVOIRE, CROATIA, CUBA, CURAÇAO, CYPRUS, CZECH+REPUBLIC, DENMARK, DJIBOUTI, DOMINICA, DOMINICAN+REPUBLIC, ECUADOR, EGYPT, EL+SALVADOR, EQUATORIAL+GUINEA, ERITREA, ESTONIA, ETHIOPIA, FALKLAND+ISLANDS+(MALVINAS), FAROE+ISLANDS, FIJI, FINLAND, FRANCE, FRENCH+GUIANA, FRENCH+POLYNESIA, FRENCH+SOUTHERN+TERRITORIES, GABON, GAMBIA, GEORGIA, GERMANY, GHANA, GIBRALTAR, GREECE, GREENLAND, GRENADA, GUADELOUPE, GUAM, GUATEMALA, GUERNSEY, GUINEA, GUINEA-BISSAU, GUYANA, HAITI, HEARD+ISLAND+AND+MCDONALD+ISLANDS, HOLY+SEE+(VATICAN+CITY+STATE), HONDURAS, HONG+KONG, HUNGARY, ICELAND, INDIA, INDONESIA, IRAN, IRAQ, IRELAND, ISLE+OF+MAN, ISRAEL, ITALY, JAMAICA, JAPAN, JERSEY, JORDAN, KAZAKHSTAN, KENYA, KIRIBATI, KOREA, KUWAIT, KYRGYZSTAN, LAO, LATVIA, LEBANON, LESOTHO, LIBERIA, LIBYAN+ARAB+JAMAHIRIYA, LIECHTENSTEIN, LITHUANIA, LUXEMBOURG, MACAO, MACEDONIA, MADAGASCAR, MALAWI, MALAYSIA, MALDIVES, MALI, MALTA, MARSHALL+ISLANDS, MARTINIQUE, MAURITANIA, MAURITIUS, MAYOTTE, MEXICO, MICRONESIA,+FEDERATED+STATES+OF, MOLDOVA, MONACO, MONGOLIA, MONTENEGRO, MONTSERRAT, MOROCCO, MOZAMBIQUE, MYANMAR, NAMIBIA, NAURU, NEPAL, NETHERLANDS, NEW+CALEDONIA, NEW+ZEALAND, NICARAGUA, NIGER, NIGERIA, NIUE, NORFOLK+ISLAND, NORTHERN+MARIANA+ISLANDS, NORWAY, OMAN, PAKISTAN, PALAU, PALESTINIAN+TERRITORY,+OCCUPIED, PANAMA, 
PAPUA+NEW+GUINEA, PARAGUAY, PERU, PHILIPPINES, PITCAIRN, POLAND, PORTUGAL, PUERTO+RICO, QATAR, RÉUNION, ROMANIA, RUSSIAN+FEDERATION, RWANDA, SAINT+BARTHÉLEMY, SAINT+HELENA,+ASCENSION+AND+TRISTAN+DA+CUNHA, SAINT+KITTS+AND+NEVIS, SAINT+LUCIA, SAINT+MARTIN+(FRENCH+PART), SAINT+PIERRE+AND+MIQUELON, SAINT+VINCENT+AND+THE+GRENADINES, SAMOA, SAN+MARINO, SAO+TOME+AND+PRINCIPE, SAUDI+ARABIA, SENEGAL, SERBIA, SEYCHELLES, SIERRA+LEONE, SINGAPORE, SINT+MAARTEN+(DUTCH+PART), SLOVAKIA, SLOVENIA, SOLOMON+ISLANDS, SOMALIA, SOUTH+AFRICA, SOUTH+GEORGIA+AND+THE+SOUTH+SANDWICH+ISLANDS, SPAIN, SRI+LANKA, SUDAN, SURINAME, SVALBARD+AND+JAN+MAYEN, SWAZILAND, SWEDEN, SWITZERLAND, SYRIA, TAIWAN, TAJIKISTAN, TANZANIA, THAILAND, TIMOR-LESTE, TOGO, TOKELAU, TONGA, TRINIDAD+AND+TOBAGO, TUNISIA, TURKEY, TURKMENISTAN, TURKS+AND+CAICOS+ISLANDS, TUVALU, UGANDA, UKRAINE, UNITED+ARAB+EMIRATES, UNITED+KINGDOM, UNITED+STATES, UNITED+STATES+MINOR+OUTLYING+ISLANDS, URUGUAY, UZBEKISTAN, VANUATU, VENEZUELA, VIETNAM, VIRGIN+ISLANDS,+BRITISH, VIRGIN+ISLANDS,+U.S., WALLIS+AND+FUTUNA, WESTERN+SAHARA, YEMEN, ZAMBIA, ZIMBABWE
Progress
The following table describes the outcome of various seedlists. For the latest Listerine statistics, see #Get_Involved_With_Listerine.
Legend
Uploaded to Archive.org | |
Done/Complete with no errors | |
Done/Complete with errors | |
In progress | |
Partially claimed and in progress | |
Not claimed | |
Moved to listerine | |
Unknown status (If you know please edit) |
Seed list | Videos (lines) | Downloaders | Progress and SIZE |
---|---|---|---|
seed_videos_rhistory | 6949 | Jade Falcon | 7 chunks with 1000 videos each ndurner: aa Jade Falcon: downloading... |
seed_videos_ecology | 890 | crackbab1 | |
seed_videos_meme | 996 | yipdw | Done (12 GB), bad IDs: -7139586667055487256, 744578668610845478, 9027107881335248661 |
seed_videos_defcon | 822 | ndurner | done |
seed_videos_ml_documentary_dedupe | 1975 | Lightblb, Papyrus, NomDuClavier | 3 completed chunks of 4 (4 claimed) Lightblb: aa (Complete:38GB With 1 Fail -> Rsync:Done) Papyrus: ab NomDuClavier: ac (complete), ad (complete) |
seed_videos_ml_lecture_dedupe | 1898 | Lightblb, gribozavr, kn100 | 3 completed chunks of 4 (4 claimed) Lightblb: aa ab (Done: 65G With 2 Failed -> Rsyncing) gribozavr: ad (rsync done, 28Gb) |
seed_videos_ml_atheism_dedupe | 698 | norc, Mqrius | 2 complete of 2 norc: ab done (16G), Mqrius: aa done (41GB). |
seed_videos_l_interview_dedupe | 986 | Pentium100, wgfreewill | aa - Done (136GB) Pentium100: ab - Done (66.7GB) |
seed_videos_evolution_dedupe (Long&Medium) | 1742 | Jade Falcon | downloading... |
seed_videos_talk_dedupe (Long&Medium) | 1795 | Jade Falcon | downloading... |
seed_videos_money_dedupe (Long&Medium) | 1824 | leftfield | |
seed_videos_civilization_dedupe (Long&Medium) | 471 | leftfield | done one broken docid -4727094082505590423 |
seed_videos_2_a | 25,761 | swebb | 61G, 3718/25761 files done (4/19/2011) 89G, 5579/25761 files done (4/20/2011) |
seed_videos_2_k | 19,266 (24,242) | Lightblb, ARc[Clone, crackbab1, Pentium100, Mqrius, arketype, Darkstar | 49 chunks completed of 49 Lightblb: aa ab ac ad ae (Done: 69GB -> Rsync: Done) |
seed_videos_2_l | 22,641 | ndurner, wgfreewill | Split 46 chunks of 500 videos each ndurner: aa done; wgfreewill - More than a TB, rsync to archive.org. |
seed_videos_2_m | 24,465 | Jade Falcon | Jade:Done. 506G, 305 error'ed IDs. Rsyncing. |
seed_videos_2_o | 25,049 | travelinlibrarian | Split 51 chunks of 500 videos each travelinlibrarian 376/1-500 |
seed_videos_2_p | 23,713 | oli, Xentac, db48x, otro, Mqrius, Pentium100, Darkstar, ryan__, nstrom | 46 complete of 48 chunks (all 48 claimed) oli: aa to ah (complete, 90GB) - RSYNCING |
seed_videos_2_q | 17,727 | DoubleJ | Done (165GB) and uploaded to IA; 2 bad IDs: -3522777020956111862, 1920882098876352864 |
seed_videos_2_t | 25,301 | businux | Split 51 chunks of 500 videos each; 961/25,301 (3.79%), 33GB; LietKynes going backwards, 50 threads, 310GB already |
seed_videos_2_u | 23,528 | barbich, negge | 48 chunks complete of 48 barbich: finished 0 to 29 (100% done, 370G) |
seed_videos_2_w | 21,732 | nickmoorman | Split: 0 chunks completed of 34 (34 claimed) nickmoorman: aa ab ac ad ae af ag ah ai aj |
seed_videos_2_x | 19,733 | ksh | 100% / 78GB. Need to check for errors! |
seed_videos_2_y | 20,965 | negge | Done (216GB) |
seed_videos_2_z | 18,877 | flare | Currently in progress (38% - 104GiB) |
seed_videos_a | 1000 | Dr.Sweety | Done (84G). 9 DocIDs with 404. |
seed_videos_a_related | This list contains errors | Dr.Sweety | Done, 44G total. ~1097 out of 1284 seem to be DocIDs, rest is text. Half of the DocIDs are broken (see "Broken DocIDs" for some examples, a complete list is here http://piratepad.net/b8VbxXCVPG). What about the errors, will there be an updated list? |
seed_videos_b | 999 | bjwebb | 651/999 |
seed_videos_c | 981 | dnova | Uploaded to Archive.org (40.2GB) |
seed_videos_d | 999 | NomDuClavier | complete |
seed_videos_e | 999 | NomDuClavier | complete |
seed_videos_f | 999 | DoubleJ | Done (25GB); uploaded to IA w/subtitles |
seed_videos_g | 999 | dnova | Uploaded to Archive.org (30.9GB) one bad id=7751522177274361392 |
seed_videos_h | 999 | ARc[Clone | Done |
seed_videos_i | 999 | DeCarabas | Done (58 GB) |
seed_videos_j | 999 | joethehuman | Done (36.7 GB) |
seed_videos_k | 999 | aggroskater | Done (28.7 GB) one bad ID: -4784504756717962046 |
seed_videos_l | 999 | yipdw | Uploaded |
seed_videos_m | 999 | TJ__ | Done (34.7GB) |
seed_videos_n | 999 | ndurner | Done (38 GB) |
seed_videos_o | 999 | com_lab, grelbar (list) | ~38GB (com_lab) already uploaded, ~24GB(grelbar) |
seed_videos_p | 999 | Pneu | |
seed_videos_q | 996 | NomDuClavier | Done (~24Gb) |
seed_videos_r | 996 | Pentium | Done (26.5GB), two bad IDs (-6997682955012239023, -5475489738249304784) |
seed_videos_s | 999 | Pentium | Done (48.9GB), two bad IDs (2103424227166759427, -8954969329395485241) |
seed_videos_t | 999 | joethehuman | Done with errors below (56.8 GB) |
seed_videos_u | 999 | perfinion, 0xDEADBEEF, norc | 0xDEADBEEF 516/1000 24GB. norc 500-1000 done, 24GB. Perfinion done, 44GB. |
seed_videos_v | 999 | masterme1 | 497/999 (~28GB) |
seed_videos_w | 1000 | com_lab | Done (~5.7GB) |
seed_videos_x | 1000 | Dark-Star | Done (~33GB) |
seed_videos_y | 1000 | beremat | Done (~61.01GB) |
seed_videos_z | 1000 | ksh | Done (27GB) |
"microelectronics", "circuit+design", "microprocessor", "chiptune", "electrical+engineering", "hardware+hacking", "unboxing", "demoscene", |
1267 | dnova | Uploaded to Archive.org (33.9GB) |
"transistor", "tonawanda", "micron", "gallium", "nanometer", "femtosecond", "qubit", "integrated+circuit" |
343 | dnova | Uploaded to Archive.org (7.1GB) |
"singularity" | 174 | db48x | completed, 12.57GB (list created at 8am UTC April 18th 2011) |
"Feynman" | 28 | db48x | completed, 2.20GB (list created at 9am UTC April 18th 2011) |
"police" | 998 | lutostag | done, ~33GB (list created at 8am UTC April 18th 2011) |
"eliezer" | 150 (1000) | norc | uploaded, 6.8G (list created at 8am UTC April 18th 2011) |
"obama" | 1000 | ryan__ | 302/1000 as of 04-19-2011 00:51 EDT (still WIP) (list created at 8am UTC April 18th 2011) |
"cia" | 999 | ndurner | 800 (list created at 8am UTC April 18th 2011) |
"charlie" | 1000 | ryan__ | 120/1000 as of 04-19-2011 00:51 EDT (still WIP) (list created at 8am UTC April 18th 2011) |
IDs from the metafilter thread | 28 | db48x | completed, 6.17GB (list created at 9am UTC April 18th 2011) |
IDs from the reddit thread | 106 | ndurner | done (list created at 9am UTC April 18th 2011) |
"rare" | ~3100 | Darkstar | done (~70gb) |
"vintage" | |||
"commercial" | |||
"douglas adams", "richard dawkins", "charles darwin" |
NomDuClavier | 513 videos, done (one de-duped list for the 3 terms) | |
"australia history" "indigenous aboriginal australia" |
1659 | oli | complete - RSYNCING |
"linux" | 1641 | xtat | Done, 70GB, 8 failures |
"Bugs Bunny" | 153 | stack,wgfreewill | Done, 2.7GB |
"rodney mullen" | 176 | com_lab | Done, 1.7GB |
"tech talks" | 562 | tahu | completed, 562 videos, 47GB, 2011-04-20 22:07:31 UTC |
"rick astley" | 17 | db48x | completed, 272.8MB (grabbed 13:00 UTC April 18th 2011) |
"CERN" | 912 | vled | Done |
multiple: "michio kaku", "brian cox", "vernor vinge", "carl sagan", "simon singh" | 176 | NomDuClavier | done |
"intel", "amd" | 1547 | leftfield | done 21.5GB one broken docid -712494279917239419 |
"foia" | 89 | com_lab | Done, 4.1GB |
"creative commons" | 1000 (968 d/d) | aikidork | Uploaded |
"TED" | 1000 | vled | w/ problems |
"programming" | 1546 | Xentac | In Progress |
"military", "army", "navy", "air force", "marine corps" | 3108 | tj__ & ksh | Done (18GB + unknown) |
"fiddle", "banjo", "old time music" | 921 | RJL20 | Done |
"silent+film" | 1000 | dericed | In Progress |
"industrial" | 1584 | Archive242 | In Progress |
(pretty much) every valid GV link on MetaFilter | 1675 | RJL20 | Done |
http://hubpages.com/hub/The_Best_of_GoogleVideo | 122 | Lightblb | Done: 7.1GB - 55 Failed - Rsync Done. |
a few Olympics 1980 videos | 4 | gribozavr | Rsync done |
"kurzweil" | 61 | NomDuClavier | Completed |
"human+rights" | 2943 | witness.org,dericed | In Progress |
"the+netherlands", "nederland" |
1650 | NomDuClavier | In Progress |
Total | >324,788 | Archive Team | >2.24 TB (Apr. 19, 11:37:13 UTC) |
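The chunk names used in the table (aa, ab, ac, ...) and list names such as seed_videos_2_k_at match the default two-letter suffixes that the standard split utility produces. Whether the chunks were actually made this way is an assumption. The sketch below only shows one way to cut a large seed list into 500-line chunks. The helper name chunk_seedlist and the demo file names are made up for illustration.

```shell
# Split a seed list into chunks of N lines named <list>_aa, <list>_ab, ...
# chunk_seedlist is a hypothetical helper, not one of the project's scripts.
chunk_seedlist() {
    list=$1
    per_chunk=${2:-500}
    split -l "$per_chunk" "$list" "${list}_"
}

# Demo: build a fake list of 1200 docids in a scratch directory and split it.
workdir=$(mktemp -d)
i=1
while [ "$i" -le 1200 ]; do
    echo "docid=$i"
    i=$((i + 1))
done > "$workdir/seed_videos_demo"

chunk_seedlist "$workdir/seed_videos_demo" 500
ls "$workdir"
```

Each downloader can then claim one or more chunk files and report progress per chunk, as in the rows above.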
DocID Errors
The following table is a list of all the video document IDs that did not work.
DocID | Title | list |
---|---|---|
-4313176927520589553 | Ferrari 320 km/h SelMcKenzie | seed_videos_h |
710915802292429594 | Triple H-Best Pedigree Ever | seed_videos_h |
919675995190477263 | 404s | seed_videos_h |
-7433458566080701467 | 404s | seed_videos_2_k |
7476314005948269525 | Tan Tay Du Ky 2 tap 1 phan 2 | seed_videos_2_k |
1310034078921227326 | Presentatie H. van Garderen | seed_videos_h |
-8196546459051063200 | Ethiopia - Ethiopian Talk Show - Dr. Kinfe M Kassaye | seed_videos_m |
6012309833489564165 | I'm gonna miss you forever | seed_videos_m |
1006201176909432045 | Nick "KNUCKLEHEAD" Thomas Learning to Ride A KX 65 | seed_videos_2_k_br |
9013618753646293166 | TooSexii | seed_videos_m |
4607644763702261746 | Most Haunted | seed_videos_m |
910327017359455024 | 404s | seed_videos_2_k_br |
-3505183273546479430 | Top 10 Dunkers in Slam Dunk Contest History by www.todonba.mx.kz | seed_videos_2_k_bu |
515155312540224448 | Prof. Stephen Berk - The Six Day War -- (Only downloads 106MB & manual seek fails) | seed_videos_m |
8233620694803027158 | Tien Kiem Ky Hiep 12a | seed_videos_2_k_bs |
-7026671761719496982 | KV Kortrijk - Virton: kans Vervaeke | seed_videos_2_k_bo |
4744936758707683681 | 404s | seed_videos_2_k_bo |
-4138015874145288917 | Irvine City Council Regular Meeting -- content too short (expected 880173643 bytes and served 871) | seed_videos_2_k_bo |
1751753922865083288 | Lou Dobbs - Bill Gates Testifies to Senate: Part 2 | seed_videos_h |
-1847242336625060764 | 404s | seed_videos_h |
-840074924615574683 | H.O.T. TV EPISODE 7 | seed_videos_h |
5450039563312738134 | | seed_videos_2_o |
2740779495236816438 | | seed_videos_2_o |
8240553330007645065 | 404 | "rick astley" |
2776148046666235174 | 404 | seed_videos_d |
4641809537228296381 | 404 | seed_videos_ |
-4718427583805445551 | 404 | seed_videos_e |
5588388288256218328 | 404 | seed_videos_d |
-1413491257698089214 | Redirects to http://www.khou.com/news/119535529.html | seed_videos_a_related |
1895753595163256038 | Redirects to http://tv.sky.com/martina-my-toughest-opponent | seed_videos_a_related |
-4941694769105315227 | Redirects to http://saratoga-north.ynn.com/content/headlines/524274/governor-visit-s-nation-s-capitol/ | seed_videos_a_related |
-7773409926173229653 | Redirects to http://www.zacks.com/commentary/15486/Value+Stock+Picks-August+24,+2010 | seed_videos_a_related |
7391058183663855490 | Redirects to http://www.ebaumsworld.com/video/watch/81158874/ | seed_videos_a_related |
-4381742157481868130 | Redirects to http://arcade.modemhelp.net/play-3613-Stealing_A_Van.html | seed_videos_a_related |
-1554641026467581780 | Redirects to http://s167.photobucket.com/albums/u158/browneydgurl1212/?action=view&current=meganstealinghashbrown.mp4 | seed_videos_a_related |
2353616771034791644 | Redirects to http://berkshires.ynn.com/content/headlines/523405/glens-falls-woman-accused-of-stealing-a-cat-from-pet-store/ | seed_videos_a_related |
9195455606734953941 | Redirects to http://abcnews.go.com/ThisWeek/video/roundtable-tragedy-tucson-12575675 | seed_videos_a_related |
9150764031039845836 | Redirects to http://www.ebaumsworld.com/video/watch/81298536/ | seed_videos_a_related |
9111781772616747857 | Redirects to http://abcnews.go.com/Politics/video/stephen-colbert-testifies-house-hearing-illegal-farm-workers-11718759 | seed_videos_a_related |
9106424136068226425 | Redirects to http://www.gameswelt.de/videos/videos/10349-Warhammer_Online_-_Home_Movie_Ever_Forward.html | seed_videos_a_related |
9106312808616607793 | Redirects to http://video.google.com/videoplay?docid=9106312808616607793 | seed_videos_a_related |
-423230311474262633 | | seed_videos_2_k_at |
-1989250447613793254 | | seed_videos_2_k_at |
-1717591024529167847 | | seed_videos_2_k_au |
-1893715945421217990 | | seed_videos_2_k_aw |
98954701061936704 | | seed_videos_2_k_az |
-857514171338089705 | 871B instead of 9.9MB | seed_videos_2_k_az |
187959010149993716 | | seed_videos_2_k_az |
-3761310108351243571 | | seed_videos_2_k_az |
-5034671686367848138 | Umar Kalim breaks it all content too short | seed_videos_2_k_bh |
3687153060611498767 | Picnic Tables at CiCo content too short | seed_videos_2_k_bj |
1010610140821179600 | | seed_videos_2_k_bf |
1272139449455901373 | | seed_videos_2_k_bi |
2154847967655726343 | | seed_videos_2_k_bj |
2453599535490760149 | | seed_videos_2_k_bl |
2525371248363122880 | | seed_videos_2_k_bf |
-3761310108351243571 | | seed_videos_2_k_bh |
4549148983829940555 | 404s | seed_videos_2_k_bi |
7051814862620931463 | | seed_videos_2_k_bh |
-7353344548521134361 | | seed_videos_2_k_bl |
-817434969229495880 | | seed_videos_2_k_bh |
8335036545639007262 | | seed_videos_2_k_bh |
-8653635503491974486 | | seed_videos_2_k_bh |
-970580050717025709 | | seed_videos_2_k_bg |
-3891054104657374974 | | seed_videos_2_k_bb |
-5401734107040161313 | | seed_videos_2_k_bb |
-6540216432023094075 | | seed_videos_2_k_bb |
-1165561225258043258 | L'universo elegante parte 1 | seed_videos_l |
1922748009661857239 | 4/8 - L'histoire secrète du pétrole - Le temps des premiers craquements | seed_videos_l |
300163955057959602 | 6/8 - L'histoire secrète du pétrole - Le temps des magouilles | seed_videos_l |
-7110898118644169273 | Beppe Grillo e l'inceneritore | seed_videos_l |
-7942619273555709195 | Le monde selon Monsanto - Arte FR | seed_videos_l |
8543705644990106023 | José Bové à Aubagne le 7 Février. | seed_videos_l |
2781869234442161475 | 404 | seed_videos_2_k_ap |
3684594607388096414 | 404 | seed_videos_2_k_ap |
4857427355245773332 | 404 | seed_videos_2_wap |
4818927167565306511 | 404 | seed_videos_2_wap |
-7139586667055487256 | Cadru 4 : Une mission du roi Even lui même? | meme |
744578668610845478 | Massieux délire (saut à poil) | meme |
9027107881335248661 | 404 | meme |
712494279917239419 | Unavailable - Charlie Rose - Red Wine & Mice / Andy Grove & Richard Tedlow | intel amd |
-4770095342392663956 | Trailer Park Boys - S03E08 - A Sh*t Leopard Can't Change Its Spots | seed_videos_t |
http://pastebin.com/LhR0vDFu | "Content Unavailable" or 404s | seed_videos_2_x |
-2183089322473530253 | EOF | army seed list |
7899609783711363184 | EOF | army seed list |
-8998613917213332529 | EOF | army seed list |
-4784504756717962046 | EOF ; visiting 2007 K-FROG Cares Golf Classic - Part 4: Pat Green Concert shows "video is not currently available" message | seed_videos_k |
7282734499247419085 | Papell Studio Samba Serenade Printed Silk Georgette Pants - Item: 129-160 | from listerine |
1551984263748100534 | ALLAMA TALIB JAUHARI - NASHTAR PARK KARACHI 2006 (PART-III) | from listerine |
2769128814553569958 | Laguna_Beach__-_Season_3_-_Episode_15_-_16.avi | from listerine |
3368393825136501633 | Magic Kingdom Hearts | from listerine |
-4534051497958455065 | Naruto Shippuuden 10 Fuuin Jutsu - Genryuu Kyuu Fuujin | from listerine |
-2661405767136566167 | marché aux animaux à Douz | from listerine |
-4129568891134205061 | 浙江化工廠釋放毒瓦斯 居民抗議遭鎮壓 | seed_videos_2_p |
-863669053556310192 | silencio | seed_videos_2_p |
1529854584895362082 | Dédicuce à ma Turtle Que Je Nadloveme !! | seed_videos_2_p |
-1190862519877917483 | Reportaje | seed_videos_2_p |
777223614374448946 | | seed_videos_2_pan |
-3753237639401264919 | | seed_videos_2_pan |
513998298993769213 | | seed_videos_2_pan |
4197907857130732658 | | seed_videos_2_pan |
-7209518661908939846 | | seed_videos_2_pan |
1936036414289617481 | | seed_videos_2_pan |
1231628683306604703 | | seed_videos_2_pan |
8391426573583714670 | | seed_videos_2_pao |
-5030624673313016595 | | seed_videos_2_pao |
2797125101537296652 | | seed_videos_2_pao |
1231628683306604703 | | seed_videos_2_pao |
765639190728070873 | | seed_videos_2_pap |
3106095225664799618 | | seed_videos_2_pap |
3824729866360231334 | | seed_videos_2_pap |
-1011278591250373536 | | seed_videos_2_paq |
5017038353295770271 | | seed_videos_2_paq |
-2103962498187129713 | | seed_videos_2_par |
-1920063529943044649 | | seed_videos_2_par |
-8842656122683618628 | | seed_videos_2_par |
3980781378957129624 | | seed_videos_2_par |
3168333365786153885 | | seed_videos_2_par |
-850263308777060275 | | seed_videos_2_par |
-2739776417348844007 | | seed_videos_2_par |
-3693490165652585623 | | seed_videos_2_par |
-4421953779802914087 | | seed_videos_2_par |
-4985191518265705146 | | seed_videos_2_par |
-5030272711619967323 | | seed_videos_2_par |
-7480760343548282696 | | seed_videos_2_par |
-8507025902579487785 | | seed_videos_2_par |
-8565673568506246688 | | seed_videos_2_par |
7948280818830462878 | | seed_videos_2_par |
7111518386861929818 | | seed_videos_2_par |
5414116161601449115 | | seed_videos_2_par |
4453387956996456150 | | seed_videos_2_par |
3484019002795418536 | | seed_videos_2_par |
2599414351734791684 | | seed_videos_2_par |
981037964378644131 | | seed_videos_2_par |
503478249453792411 | | seed_videos_2_par |
-626427952319840934 | | seed_videos_2_pas |
6692782035853741408 | | seed_videos_2_pas |
-8104722695725517962 | | seed_videos_2_pas |
6603725717674618753 | | seed_videos_2_pas |
-6885426254291916923 | | seed_videos_2_pas |
8878306115268123242 | | seed_videos_2_pas |
2664598798454107069 | | seed_videos_2_pas |
-1130301863313429407 | | seed_videos_2_pas |
6383722209898652464 | | seed_videos_2_pas |
1410624060530577390 | | seed_videos_2_pat |
1100175904848145330 | | seed_videos_2_pat |
6421364272580349095 | | seed_videos_2_pat |
3243976296567942326 | | seed_videos_2_pat |
2856723628413664723 | | seed_videos_2_pau |
-6684370625181545902 | | seed_videos_2_pau |
-9112039128971736721 | | seed_videos_2_pau |
-5134977928545797502 | | seed_videos_2_pau |
491463814477878191 | | listerine |
8027332670412780967 | | listerine |
-8620028295602605989 | | listerine |
6793949560762919914 | unavailable | listerine |
-4337343993095627162 | | listerine |
-4246080235264001426 | youtube-dl errors with "unable to extract title" but video plays in browser | listerine |
Deduplication For Those Not Using Listerine
To avoid downloading videos that have already been downloaded by others:
- check if you have SQLite installed ("which sqlite3")
- download the gv-dedup scripts
- initialize a fresh database with "./gv-list-create.sh"
- download all seed lists on this page (plus the cherry picks) and import them with "./gv-list-import.sh seed_file" (or "find seeds/* -exec ./gv-list-import.sh {} \;")
- invoke "./gv-list-dedup.sh seed_videos_foo > list" to filter already downloaded videos from your custom seed list
- also import your custom seed file with "./gv-list-import.sh list"
A pre-filled database is available.
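The steps above boil down to a set difference: keep only the docids in your list that no imported list already contains. The gv-dedup scripts do this with SQLite, and their internals are not reproduced here. The sketch below illustrates the same idea with sort and comm instead. The function name dedup_against and the demo files are hypothetical.

```shell
# Print the docids from a candidate list that appear in none of the known lists.
# Illustrative stand-in for the idea behind gv-list-dedup.sh, not its actual code.
# Usage: dedup_against candidate_list known_list [known_list ...]
dedup_against() {
    candidate=$1
    shift
    known=$(mktemp)
    mine=$(mktemp)
    LC_ALL=C sort -u "$@" > "$known"
    LC_ALL=C sort -u "$candidate" > "$mine"
    # comm -23 keeps the lines that occur only in the first (sorted) file
    LC_ALL=C comm -23 "$mine" "$known"
    rm -f "$known" "$mine"
}

# Demo with made-up docids in a scratch directory.
demo=$(mktemp -d)
printf 'docid=1\ndocid=2\n' > "$demo/seed_videos_a"
printf 'docid=3\n' > "$demo/seed_videos_b"
printf 'docid=2\ndocid=3\ndocid=-4\n' > "$demo/custom"
dedup_against "$demo/custom" "$demo/seed_videos_a" "$demo/seed_videos_b"
```

For the full set of seed lists the SQLite route is likely the better choice, since the database can be shared pre-filled. The text-tool version is mainly useful for a quick check between two or three lists.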
Tools
Youtube-DL
- http://rg3.github.com/youtube-dl/download.html
- python youtube-dl googlevideourl
DocID scripts
Scraping by dates uploaded:
Check to see which dates have already been scraped at:
GoogleGargle
Aria2c (APT)
- apt-add-repository ppa:t-tujikawa/ppa
- apt-get update
- apt-get install aria2
Aria2c (RPM)
Fedora and CentOS have RPMs available.
- yum install aria2
Searcher
Bash script to search for terms on Google Video, includes dedupe and ability to restrict search by video length.
predict-download-size
Bash script to read a docid list and find out the total size of the listed videos. Requires youtube-dl, curl.
Subtitles
Some videos have subtitles which haven't been included in the download script (yet). I've created a fairly basic script which retrieves all available subtitles and stores them into the correct folder. You just need perl and a seed list (saved as "list"). You can also run it in an empty dir if you're afraid that it will mess with the videos you have downloaded so far (probably a good idea as I didn't do extensive tests yet). Once the subtitles have been downloaded, just run a "rsync -avP $subtitle_directory $video_directory" to transfer the subtitles to the corresponding video.
You may grab the script at http://piratepad.net/K7wZRrxvoU. Feel free to modify it.
--- For some reason it sometimes saves the file under a different name than what it outputs to the console, tested on Debian 6 -Pentium100 -> This has been corrected, the problem arose whenever there were spaces in the filename.
--- Google will return a 503 if it feels like it's queried by a bot (http://www.google.com/support/websearch/bin/answer.py?hl=en&answer=86640). I have modified the script to pause for 60 seconds after 100 queries, hope that this will suffice. If not, you can either tweak the $PAUSE_AFTER or the actual pause duration in the script. Also, the script will now download multiple subtitles for one video (it didn't do that before, sorry!). -Dr.Sweety
Saving Individual Videos
The seed files do not currently include all videos, so you might want to save precious videos explicitly. To do that, add IDs (found in the docid parameter of the video URL) to the "list" file in the same directory as the script, for example:
docid=1545969803753962248
docid=1598207563000425446
docid=-1679753730105404298
and start ./googlegargle
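If you are collecting IDs from pasted URLs, a small helper can pull out the docid token in the format the "list" file uses. This is a minimal sketch: the function name docid_from_url is made up, and the extra &hl=en parameter in the example URL is only there to show that trailing parameters are ignored.

```shell
# Extract the first "docid=<number>" token from a Google Video URL.
# docid_from_url is a hypothetical helper, not part of googlegargle.
docid_from_url() {
    printf '%s\n' "$1" | grep -o 'docid=-\{0,1\}[0-9][0-9]*' | head -n 1
}

# Prints the docid token for the given URL.
docid_from_url 'http://video.google.com/videoplay?docid=-1679753730105404298&hl=en'

# To queue a video, append the token to the "list" file next to the script:
#   docid_from_url "$url" >> list
```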
To request a video, add it to this list: http://piratepad.net/gvspecificrequests
If you download something from that list, add its docid to http://piratepad.net/TL7KDN8821 so that others won't download those videos a second time.
Custom Keyword Searches
Linux
If you want to grab videos by your own custom keyword search term, you can use this script.
Alternatively, you can use this command:
SEARCH='my+search+term';for i in `seq 0 10 990 `;do curl -A "AT, Bitches" "http://www.google.com/search?q=$SEARCH+site:video.google.com&hl=en&safe=off&tbm=vid&start=$i&sa=N"|grep -o "docid=[0-9-]*"|sort -u|tee -a seed_videos_$SEARCH;done
Change "my+search+term" to your search term, and remember to use a plus sign instead of spaces (and to url encode the text for other special characters).
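Encoding the search term by hand is easy to get wrong, so here is a rough sketch of a helper that turns spaces into plus signs and percent-encodes every byte other than letters, digits, and - . _ ~. The function name encode_search_term is hypothetical, and the sketch assumes an od that accepts -An -v -tx1 (GNU and BSD both do).

```shell
# Encode a search term for use in the SEARCH variable:
# spaces become "+", unreserved characters pass through, everything else is %XX.
# encode_search_term is a hypothetical helper, not part of the project's scripts.
encode_search_term() {
    printf '%s' "$1" | od -An -v -tx1 | tr -s ' ' '\n' | while read -r hex; do
        [ -n "$hex" ] || continue
        case $hex in
            20) printf '+' ;;
            # - . _ ~ 0-9 A-Z a-z are emitted unchanged (hex byte -> octal escape)
            2d|2e|5f|7e|3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a])
                printf "\\$(printf '%o' "0x$hex")" ;;
            *)  printf '%%%s' "$(printf '%s' "$hex" | tr 'a-f' 'A-F')" ;;
        esac
    done
    echo
}

SEARCH=$(encode_search_term 'old time music')
echo "seed_videos_$SEARCH"
```

The encoded value can be dropped into the SEARCH variable of the one-liner above in place of a hand-written term.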
Mac Bash Command
Uses jot instead of seq:
SEARCH='my+search+term';for i in `jot - 0 990 10 `;do curl -A "AT, Bitches" "http://www.google.com/search?q=$SEARCH+site:video.google.com&hl=en&safe=off&tbm=vid&start=$i&sa=N"|grep -o "docid=[0-9-]*"|sort -u|tee -a seed_videos_$SEARCH;done
Alternatively, you can get seq (and lots of other useful stuff) by installing the macports coreutils package: sudo port install coreutils. Commands are prefixed with a 'g', so seq is called gseq, but you may of course symlink it so you don't have to modify your scripts.
Searches Undertaken
Since we want to minimize overlap, here are some search terms that are already being downloaded, along with the name of the downloader:
- Darkstar: "rare", "vintage", "commercial"
- NomDuClavier: "douglas adams", "richard dawkins", "charles darwin", "michio kaku", "brian cox", "vernor vinge", "carl sagan", "simon singh"
- oli: "australia history"
- dnova: "microelectronics"
- Lightblb: "documentary" (medium and long videos), "lecture" (medium and long videos), "atheism" (medium & long), "interview" (long), talk (medium & long), brain (medium & long), civilization (medium & long), evolution (medium & long), future (medium & long), language (medium & long), literature (medium & long), mind (medium & long), money (medium & long), neurolinguistic (medium & long), singularity (medium & long)
- ttuttle: "astronomy"
- crackbab1: "ecology"
- tj__: "army"
- r00s: "dokumentation" (medium, long)
Also check the specificrequest PiratePad under Cherry Picking on this page.
Troubleshooting
- /usr/bin/aria2c: unrecognized option '--max-connection-per-server=16'
- The Aria version available in many linux distributions is not up to date and will throw errors.
- To fix this, remove the option from the googlegargle script line starting with "ARIAOPTIONS="
- User 'negge' on IRC reports the following ARIA command line works for Debian Squeeze with ext4 filesystem,
- --max-overall-download-limit=1024M --file-allocation=falloc --max-connection-per-server=4 --min-split-size=1M --log-level=notice --remote-time=true
- or for ext3 on Debian Squeeze,
- --max-overall-download-limit=1024M --file-allocation=prealloc --max-connection-per-server=4 --min-split-size=1M --log-level=notice --remote-time=true
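Before editing ARIAOPTIONS, it may help to check whether the installed aria2c knows an option at all. The sketch below searches the full help text, which aria2c prints for --help=#all. It assumes that a supported option always appears in that text, and the helper name aria2c_supports is made up.

```shell
# Return success if the given aria2c binary lists the option in its full help text.
# aria2c_supports is a hypothetical helper; the second argument defaults to "aria2c".
aria2c_supports() {
    opt=$1
    bin=${2:-aria2c}
    "$bin" '--help=#all' 2>/dev/null | grep -q -F -e "$opt"
}

if aria2c_supports --max-connection-per-server; then
    echo "--max-connection-per-server looks supported"
else
    echo "--max-connection-per-server not found; remove it from ARIAOPTIONS"
fi
```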