EditThis

From Archiveteam
Revision as of 09:38, 25 January 2014

Edit.This
EditThis logo
A screen shot of the EditThis.info home page taken on 27 May 2012
URL: http://editthis.info
Status: Online!
Archiving status: Not saved yet
Archiving type: Unknown
IRC channel: #wikiteam (on hackint)

Edit.This is a wikifarm.

For a list of wikis hosted in this wikifarm, see: https://code.google.com/p/wikiteam/source/browse/trunk/listsofwikis

== Backups ==
This farm is quite hard to archive because of:
* old software (MediaWiki 1.15) with several quirks at both the application and web-server levels (directory structure, URL rewrites, l10n in the MediaWiki namespace);
* slow servers (even after they fixed their robots.txt);
* very strict captcha and throttling (with unhelpful status codes);
* the number of wikis taken over by spam since 2012 or earlier.

The owner has clearly not had time to manage it properly for several years now.

The best results for completing a download with launcher.py and dumpgenerator.py have been achieved with the following:
* add <code>--exnamespaces=8</code>;
* use the API with <code>--delay=60</code>, and wait 120 s between each wiki.
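Put together, a single-wiki invocation might look like the sketch below. The wiki URL is a placeholder, and the <code>--api</code>, <code>--xml</code>, and <code>--images</code> options are assumed from WikiTeam's usual dumpgenerator.py interface; check <code>--help</code> in your copy before relying on them.

```shell
# Hypothetical example: the wiki URL is a placeholder, and the extra
# options (--api, --xml, --images) are assumed from WikiTeam's usual
# dumpgenerator.py interface.
wiki="http://editthis.info/examplewiki"
cmd="python dumpgenerator.py --api=${wiki}/api.php --xml --images --delay=60 --exnamespaces=8"
echo "$cmd"   # run with: eval "$cmd", then sleep 120 before the next wiki
```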

If you lower the delay too much, or forget to sleep between certain kinds of requests, you can easily enter an endless loop of 503 errors and never get out of it, because every request, failed or not, counts toward the throttle.
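That failure mode can be sketched as a retry loop. The following is a minimal, hypothetical illustration (not code from dumpgenerator.py) in which a 503 triggers a much longer back-off instead of an immediate retry, precisely because the failed attempt itself counted toward the throttle:

```python
import time

def fetch_with_throttle(request, delay=60, backoff=120, max_attempts=5):
    """Call request() politely. Wait `delay` seconds before every
    attempt; after a 503, wait `backoff` extra seconds, since the
    failed request also counted toward the server-side throttle."""
    for _ in range(max_attempts):
        time.sleep(delay)
        status, body = request()
        if status == 503:
            time.sleep(backoff)  # let the throttle window clear
            continue
        return body
    raise RuntimeError("still throttled after %d attempts" % max_attempts)
```

With <code>delay=60</code> and <code>backoff=120</code> this roughly matches the timings above; retrying a 503 immediately would keep feeding the throttle and never escape it.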

== See also ==

== External links ==