Difference between revisions of "EditThis"
Revision as of 09:38, 25 January 2014
Edit.This
A screenshot of the EditThis.info home page taken on 27 May 2012
URL | http://editthis.info
Status | Online!
Archiving status | Not saved yet
Archiving type | Unknown
IRC channel | #wikiteam (on hackint)
Edit.This is a wikifarm.
For a list of wikis hosted in this wikifarm see: https://code.google.com/p/wikiteam/source/browse/trunk/listsofwikis
This farm is quite hard to archive because of:
- old software (MediaWiki 1.15) with several quirks, at both the application and the webserver level (such as directory structure, URL rewrites, and l10n in the MediaWiki namespace);
- slow servers (even after they fixed their robots.txt);
- very strict captcha and throttling (with unhelpful status codes);
- a number of wikis taken over by spam since 2012 or earlier.
The owner clearly has not had time to manage it properly for several years now.
The best results for completing a download with launcher.py and dumpgenerator.py have been obtained with the following settings:
- add --exnamespaces=8 (this excludes namespace 8, the MediaWiki namespace);
- use the API with --delay=60, and wait 120 s between wikis.
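The settings above can be sketched as a small driver loop. This is only an illustration, not how launcher.py itself is written: the function names, the use of `--xml`, and the API URLs in the usage note are assumptions; only `--exnamespaces=8`, `--delay=60`, the use of the API, and the 120 s pause come from the notes above.

```python
import subprocess
import time

DELAY_SECONDS = 60         # value passed to --delay, as recommended above
PAUSE_BETWEEN_WIKIS = 120  # seconds to wait between two wikis

def build_command(api_url):
    """Return the dumpgenerator.py argument list for one wiki (hypothetical helper)."""
    return [
        "python", "dumpgenerator.py",
        "--api=" + api_url,           # use the API rather than index.php
        "--xml",                      # assumption: an XML dump is wanted
        "--exnamespaces=8",           # skip the MediaWiki namespace
        "--delay=" + str(DELAY_SECONDS),
    ]

def archive_all(api_urls, run=subprocess.call, sleep=time.sleep):
    """Dump each wiki in turn, pausing between wikis; returns exit codes by URL."""
    results = {}
    for i, url in enumerate(api_urls):
        if i > 0:
            sleep(PAUSE_BETWEEN_WIKIS)  # no pause needed before the first wiki
        results[url] = run(build_command(url))
    return results
```

Calling `archive_all` with a list of API URLs taken from the wiki list linked above would run one dump per wiki; the `run` and `sleep` parameters exist only so the loop can be exercised without touching the network.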
If you lower the delay too much, or forget to sleep between some kinds of requests, you can easily enter a loop of endless 503 errors and never get out of it, because every request, failed or not, counts toward the throttle.
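Because failed requests also count toward the throttle, retrying immediately after a 503 only prolongs the block. One way to avoid that is to sleep before every attempt and to back off for much longer after a 503. The sketch below is a hypothetical helper, not part of dumpgenerator.py; the `fetch` callable and the 600 s cooldown are assumptions chosen for illustration.

```python
import time

def fetch_with_cooldown(fetch, url, delay=60, cooldown=600,
                        max_attempts=5, sleep=time.sleep):
    """Call fetch(url) until it stops returning 503, sleeping before each attempt.

    fetch is any callable returning a (status_code, body) tuple.
    """
    wait = delay
    for _ in range(max_attempts):
        sleep(wait)                # always pause, even before the first request
        status, body = fetch(url)
        if status != 503:
            return status, body
        wait = cooldown            # the failed request counted too: wait longer
    raise RuntimeError("still throttled after %d attempts: %s" % (max_attempts, url))
```

Giving up after a fixed number of attempts keeps a throttled wiki from stalling the whole run; it can be retried later once the server has cooled down.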
See also
External links