Difference between revisions of "Google Groups Files"

From Archiveteam

Revision as of 18:27, 20 June 2011

Google is challenging Archive Team (AT) again...

This notice appears on Google Groups pages:


Zipped versions of the pages and files associated with this group will be available for download until August 31, 2011. After this date, this feature and the zip file downloads will be turned off permanently.


A script is available that searches Google Groups directories and downloads the ZIP files of individual groups. The script uses a Google App Engine hosted app for coordination.

Status



2011-06-04:

directories: new: 59926, done: 25585
groups: new: 552075, done: 70804

2011-06-14:

directories: done: 76080
groups: done: 163749

2011-06-16:

directories: NEW: 80942, PROCESSING: 7, DONE_DIR: 81644
groups: NEW: 880230, PROCESSING: 97, ERROR: 5825, ADULT: 3893, DONE_GRP: 172148

2011-06-20:

directories: NEW: 71364, PROCESSING: 6, DONE_DIR: 101141
groups: NEW: 778080, PROCESSING: 290, ERROR: 10249, ADULT: 4177, DONE_GRP: 298122
completion rate: directories: 170/hr, groups: 2213/hr
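
As a rough check against the August 31, 2011 cutoff, the remaining time can be estimated from the 2011-06-20 figures. The sketch below is only arithmetic on the numbers above. The helper name eta_hours is made up for illustration, and it assumes the NEW counts and the hourly rates stay constant, which they probably will not, since discovery keeps adding groups to the queue.

```shell
#!/bin/sh
# Rough ETA from the 2011-06-20 status figures.
# Assumption: queue sizes and rates stay constant (they will not in practice).

# eta_hours REMAINING RATE_PER_HOUR -> whole hours, rounded up
eta_hours() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

dir_hours=$(eta_hours 71364 170)     # directories: NEW / rate
grp_hours=$(eta_hours 778080 2213)   # groups: NEW / rate

# Convert hours to days, again rounding up
echo "directories: ${dir_hours} h (~$(( (dir_hours + 23) / 24 )) days)"
echo "groups: ${grp_hours} h (~$(( (grp_hours + 23) / 24 )) days)"
```

With these inputs the estimate comes to about 420 hours (18 days) for directories and 352 hours (15 days) for groups, not counting entries in PROCESSING or ERROR.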

Script

Requirements

(ba)sh, wget, grep, curl

Usage

  • Normal operation
./ggroups_zipdl.sh
  • Discover only (no downloads to store)
./ggroups_zipdl.sh discover
  • Download only (no discovery of new groups)
./ggroups_zipdl.sh download
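
The three invocations above suggest a single optional mode argument. The following is a minimal sketch of how such an argument could be dispatched in a shell script. It is not taken from ggroups_zipdl.sh; the function name select_steps and the step names it prints are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of mode selection; NOT the real ggroups_zipdl.sh.

# select_steps MODE -> prints the steps that would run for that mode
select_steps() {
    case "${1:-}" in
        "")       echo "discover download" ;;  # normal operation: both steps
        discover) echo "discover" ;;           # discovery only
        download) echo "download" ;;           # download only
        *)        echo "usage: ggroups_zipdl.sh [discover|download]" >&2
                  return 1 ;;
    esac
}

# Show what each documented mode would select
for mode in "" discover download; do
    printf '%s -> %s\n' "${mode:-(none)}" "$(select_steps "$mode")"
done
```

Rejecting unknown arguments with a usage message keeps a mistyped mode from silently falling back to normal operation.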

Issues

-