Starwars.yahoo.com

From Archiveteam
Revision as of 20:10, 23 December 2009 by Scumola (talk | contribs)

Problems encountered:

  • Yahoo returns error 999 after about 30 minutes of fetching from a single IP address. We used two approaches to get around this:
    • Tor (slow as molasses, but it worked), collected using httrack
    • Multiple IPs (fast, but needs a large pool of IP addresses), collected using wget
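
The multiple-IP approach can be illustrated with a minimal sketch. This is not the script used for the collection (that was done with wget, and how addresses were rotated is not recorded here). The function name `fetch_with_rotation`, the `fetch` callable, and the example addresses are all hypothetical; the only detail taken from above is that a blocked IP shows up as status 999.

```python
from itertools import cycle

BLOCKED_STATUS = 999  # status Yahoo returned once an IP had been fetching too long


def fetch_with_rotation(urls, source_ips, fetch):
    """Fetch each URL, moving to the next source IP whenever a 999 is seen.

    `fetch(url, source_ip)` is a hypothetical callable returning (status, body).
    Returns (results, failed): results maps url -> (status, body) for every
    non-999 response; failed lists URLs that got a 999 from every source IP.
    """
    ips = cycle(source_ips)
    current = next(ips)
    results, failed = {}, []
    for url in urls:
        # Give each source IP at most one attempt per URL.
        for _ in range(len(source_ips)):
            status, body = fetch(url, current)
            if status != BLOCKED_STATUS:
                results[url] = (status, body)
                break
            current = next(ips)  # this IP looks blocked, try the next one
        else:
            failed.append(url)
    return results, failed
```

In practice `fetch` might wrap a wget call that selects the outgoing address with wget's `--bind-address` option; whether the original collection did it that way is an assumption.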

The tarballs in the archive reflect both archiving methods:

-rw-r--r--  1 root   root   228855239 Dec 15 13:35 starwars.yahoo.com-goekesmi-raw.tar.bz2
-rw-r--r--  1 root   root    36529217 Dec 20 15:53 starwars.yahoo.com-tor.tar.bz2