Difference between revisions of "Starwars.yahoo.com"
Revision as of 20:10, 23 December 2009
Problems encountered:
- Yahoo returns error 999 after about 30 minutes of fetching from a single IP address. We used two approaches to get around this:
  - Tor (slow as molasses, but worked), collected using httrack
  - multiple IPs (fast, but needs a large pool of IP addresses), collected using wget
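The multiple-IP workaround can be sketched roughly as follows. This is only an illustration, not the httrack or wget setup that was actually used: `fetch_all`, the `routes` list, and the injected `fetch(url, route)` callable are all hypothetical names, and a "route" stands for whatever gives a request a different source address (another local IP, a proxy, a Tor circuit). The only detail taken from the notes above is that a blocked address gets status 999.

```python
def fetch_all(urls, routes, fetch, blocked_status=999):
    """Fetch every URL, switching to the next route when one is blocked.

    `fetch(url, route)` is a caller-supplied function (hypothetical here)
    that returns a (status, body) tuple for one request.
    """
    if not routes:
        raise ValueError("at least one route is required")

    results = {}
    current = 0  # index of the route currently in use
    for url in urls:
        attempts = 0
        while True:
            status, body = fetch(url, routes[current])
            if status != blocked_status:
                results[url] = (status, body)
                break
            # This route is blocked: move on and retry the same URL.
            attempts += 1
            if attempts >= len(routes):
                raise RuntimeError(f"all routes blocked for {url}")
            current = (current + 1) % len(routes)
    return results
```

A blocked route is not retried until the rotation wraps around to it again, which loosely matches the observation that the block is tied to a single IP. A real crawl would probably also want delays between requests, which this sketch leaves out.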
The tarballs in the archive reflect both archiving methods:
 -rw-r--r-- 1 root root 228855239 Dec 15 13:35 starwars.yahoo.com-goekesmi-raw.tar.bz2
 -rw-r--r-- 1 root root  36529217 Dec 20 15:53 starwars.yahoo.com-tor.tar.bz2