Difference between revisions of "EditThis"
Revision as of 18:23, 21 July 2016
EditThis | |
A screen shot of the EditThis.info home page taken on 27 May 2012 | |
URL | http://editthis.info |
Status | Online! |
Archiving status | Not saved yet |
Archiving type | Unknown |
IRC channel | #wikiteam (on hackint) |
EditThis is a wikifarm. According to our estimates, it hosts over 1,300 wikis (list: https://github.com/WikiTeam/wikiteam/blob/master/listsofwikis/mediawiki/editthis.info).
This farm is quite hard to archive, because of:
- old software (MediaWiki 1.15) with several quirks, both at application and web server level (such as directory structure, URL rewrites, and l10n in the MediaWiki namespace);
- slow servers (even after they fixed their robots.txt);
- very strict captcha and throttling (with unhelpful status codes);
- a number of wikis taken over by spam since 2012 or earlier.
The owner clearly has not had time to manage it properly for several years now.
The best results for completing a download with launcher.py and dumpgenerator.py have been obtained with the following settings:
- add --exnamespaces=8,9;
- use the API with --delay=60, and wait 120 s between each wiki.
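The settings above can be sketched as a small Python wrapper that builds one dumpgenerator.py command per wiki and pauses between wikis. This is only an illustration, not the way launcher.py itself works: the `python2` interpreter name, the `--xml` flag (a text-only dump) and the example API URL are assumptions, and `build_command` and `archive_all` are hypothetical helper names.

```python
import subprocess
import time

EXCLUDED_NAMESPACES = "8,9"   # from the text: namespaces 8 and 9 (MediaWiki, MediaWiki talk)
REQUEST_DELAY_S = 60          # from the text: --delay=60
PAUSE_BETWEEN_WIKIS_S = 120   # from the text: wait 120 s between each wiki


def build_command(api_url):
    """Return the dumpgenerator.py argument list for one wiki."""
    return [
        "python2", "dumpgenerator.py",   # assumption: run under Python 2
        "--api=" + api_url,
        "--xml",                         # assumption: an XML text dump is wanted
        "--delay=%d" % REQUEST_DELAY_S,
        "--exnamespaces=" + EXCLUDED_NAMESPACES,
    ]


def archive_all(api_urls, run=subprocess.call, sleep=time.sleep):
    """Dump each wiki in turn, sleeping between wikis (not after the last one)."""
    results = {}
    for i, url in enumerate(api_urls):
        if i > 0:
            sleep(PAUSE_BETWEEN_WIKIS_S)
        results[url] = run(build_command(url))   # exit code of the dump run
    return results


if __name__ == "__main__":
    # Hypothetical URL; the real api.php location of each wiki must be checked.
    print(" ".join(build_command("http://editthis.info/examplewiki/api.php")))
```

The `run` and `sleep` parameters are injectable so the loop can be dry-run without launching any process or actually waiting.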
If you lower the delay too much, or forget to sleep between some kinds of requests, you can easily end up in an endless loop of 503 errors with no way out, because each request, failed or not, counts toward the throttle.
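Because failed requests also count toward the throttle, retrying immediately after a 503 only makes things worse. A minimal sketch of the alternative, assuming a hypothetical `fetch(url)` callable that returns the HTTP status code, is to sleep after every attempt and to wait much longer after a 503. The 600 s cool-down and the limit of three attempts are illustrative guesses, not measured values for EditThis.

```python
import time


def fetch_with_cooldown(fetch, url, delay_s=60, cooldown_s=600,
                        max_attempts=3, sleep=time.sleep):
    """Call fetch(url) and always sleep afterwards, longer after a 503.

    fetch is any callable returning an HTTP status code (hypothetical helper).
    Returns the last status seen; gives up after max_attempts 503 responses.
    """
    for attempt in range(1, max_attempts + 1):
        status = fetch(url)
        if status != 503:
            sleep(delay_s)               # normal spacing before the next request
            return status
        sleep(cooldown_s * attempt)      # throttled: back off longer each time
    return 503
```

Giving up after a few attempts, instead of looping forever, leaves the decision to resume later with the caller.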