
Edit.This
A screenshot of the EditThis.info home page, taken on 27 May 2012
URL: http://editthis.info
Status: Online!
Archiving status: Not saved yet
Archiving type: Unknown
IRC channel: #wikiteam (on hackint)

Edit.This is a wikifarm.

For a list of wikis hosted on this wikifarm, see: https://code.google.com/p/wikiteam/source/browse/trunk/listsofwikis

This farm is quite hard to archive because of:

  • old software (MediaWiki 1.15) with several weirdnesses, both at the application and web server level (such as the directory structure, URL rewrites, and l10n in the MediaWiki namespace);
  • slow servers (even after they fixed their robots.txt);
  • very strict captcha and throttling (with unhelpful status codes);
  • a number of wikis taken over by spam since 2012 or earlier.

The owner clearly has not had time to manage it properly for several years now.

The best results for completing a download with launcher.py and dumpgenerator.py have been achieved with the following (sketched below):

  • add --exnamespaces=8,9 to skip the MediaWiki and MediaWiki talk namespaces;
  • use the API with --delay=60, and wait 120 s between each wiki.

If you lower the delay too much, or forget to sleep between certain kinds of requests, you can easily fall into a loop of endless 503 errors and never get out of it: every request, failed or not, counts against the throttle.
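For illustration, here is a minimal Python sketch of such a loop. It assumes the dumpgenerator.py option forms shown above (--api, --xml, --images, --delay, --exnamespaces) and a hypothetical text file listing one api.php URL per wiki; it is only an outline of the idea, not the launcher.py implementation itself.

  import subprocess
  import time

  # Hypothetical list of EditThis wikis, one api.php URL per line.
  WIKI_LIST = "editthis.info-apis.txt"

  with open(WIKI_LIST) as f:
      apis = [line.strip() for line in f if line.strip()]

  for api in apis:
      # --delay=60 throttles requests within a single wiki dump;
      # --exnamespaces=8,9 skips the MediaWiki and MediaWiki talk namespaces.
      subprocess.run([
          "python", "dumpgenerator.py",
          "--api=" + api,
          "--xml", "--images",
          "--delay=60",
          "--exnamespaces=8,9",
      ])
      # Sleep 120 s between wikis so the farm-wide throttle is not tripped
      # into the endless-503 loop described above.
      time.sleep(120)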

See also

External links