Archiveteam - User contributions [en]
Contributions feed for user Hydriz, retrieved 2024-03-29T09:35:44Z (MediaWiki 1.37.1)
https://wiki.archiveteam.org/api.php?action=feedcontributions&user=Hydriz&feedformat=atom

WikiTeam, revision of 2016-10-03T14:13:42Z by Hydriz (edit summary: /* Wikifarms */ Miraheze)
https://wiki.archiveteam.org/index.php?title=WikiTeam&diff=26433
<hr />
<div>{{Infobox project<br />
| title = WikiTeam<br />
| image = Wikiteam.jpg<br />
| description = WikiTeam, we preserve wikis<br />
| URL = [https://github.com/WikiTeam/wikiteam wikiteam github], manual for now, check [https://wikiapiary.com/wiki/Category:Website_not_archived not archived wikis on wikiapiary]<br />
| project_status = {{online}} (at least some of them)<br />
| source = [https://github.com/Archiveteam/wikis-grab wikis-grab]<br />
| tracker = [http://tracker.archiveteam.org/wikis/ wiki tracker]<br />
| archiving_status = {{inprogress}}<br />
| irc = wikiteam<br />
}}<br />
<br />
'''WikiTeam''' software is a set of tools for archiving wikis. They work on [[MediaWiki]] wikis, but we want to expand to other wiki engines. As of January 2016, WikiTeam has preserved more than 27,000 stand-alone wikis.<br />
<br />
You can check [https://archive.org/details/wikiteam our collection] at [[Internet Archive]], the [https://github.com/WikiTeam/wikiteam source code] in [[GitHub]] and some [https://wikiapiary.com/wiki/Websites/WikiTeam lists of wikis by status] in [[WikiApiary]].<br />
<br />
== Current status ==<br />
<br />
The total number of MediaWiki wikis is unknown, but some estimates exist.<br />
<br />
According to [[WikiApiary]], which is the most up-to-date database, there are 21,369 independent wikis (1,508 are semantic) and 4,554 in wikifarms.<ref>[https://wikiapiary.com/wiki/Websites Websites] - WikiApiary</ref> However, it doesn't include [[Wikia]]'s 400,000+ wikis, and the coverage of independent wikis can certainly be improved.<br />
<br />
According to Pavlo's list generated in December 2008, there are 20,000 wikis.<ref>[http://cs.brown.edu/~pavlo/mediawiki/ Pavlo's list of wikis] ([http://www.cs.brown.edu/~pavlo/mediawiki/mediawikis.csv mediawiki.csv]) ([https://github.com/WikiTeam/wikiteam/blob/master/listsofwikis/mediawiki/mediawikis_pavlo.csv backup])</ref> This list was imported into WikiApiary.<br />
<br />
According to [[WikiIndex]], there are 20,698 wikis.<ref>[http://wikiindex.org/Special:Statistics WikiIndex Statistics]</ref> The URLs in this project were added to WikiApiary in the past too.<br />
<br />
A number of [[#Wikifarms|wikifarms]] have vanished and about 150 are still online.<ref>[https://wikiapiary.com/wiki/Farm:Farms Wikifarms]</ref><ref>[https://en.wikipedia.org/wiki/Comparison_of_wiki_hosting_services Comparison of wiki hosting services]</ref><ref>[http://wikiindex.org/Category:WikiFarm Category:WikiFarm]</ref><br />
<br />
Most wikis are small, containing about 100 pages or less, but there are some very large wikis:<ref>[http://meta.wikimedia.org/wiki/List_of_largest_wikis List of largest wikis]</ref><ref>[http://s23.org/wikistats/largest_html.php?th=15000&lines=500 List of largest wikis in the world]</ref><br />
* By number of pages: Wikimedia Commons (40 million), English Wikipedia (37 million), DailyWeeKee (35 million), WikiBusiness (22 million) and Wikidata (19 million).<br />
* By number of files: Wikimedia Commons (28 million), English Wikipedia (800,000).<br />
<br />
The oldest dumps are probably some 2001 dumps of Wikipedia when it used UseModWiki.<ref>[https://dumps.wikimedia.org/archive/ Wikimedia Downloads Historical Archives]</ref><ref>[http://dumps.wikimedia.org/nostalgiawiki Dump] of [http://nostalgia.wikipedia.org/ Nostalgia], an ancient version of Wikipedia from 2001</ref><br />
<br />
As of November 2015, our collection at Internet Archive holds dumps for 27,420 wikis (including independent, wikifarm wikis, some packages of wikis and Wiki[pm]edia).<ref>[https://archive.org/details/wikiteam WikiTeam collection] at Internet Archive</ref><br />
<br />
== Wikifarms ==<br />
<br />
There are also wikifarms with hundreds of wikis. Here we only create pages for the wikifarms about which we have special information that we don't want to lose (such as archiving history and tips). For a full list, please use the WikiApiary [https://wikiapiary.com/wiki/Farm:Main_Page wikifarms main page].<br />
<br />
Before backing up a wikifarm, try to update the list of wikis for it. There are [https://github.com/WikiTeam/wikiteam/tree/master/listsofwikis/mediawiki Python scripts to generate those lists] for many wikifarms.<br />
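The scripts differ per wikifarm, but the general idea is the same: collect candidate wiki URLs and confirm that each one exposes a working MediaWiki API. Below is a minimal, hypothetical sketch of that step (not one of the repository's scripts; it assumes you already have a plain-text <tt>domains.txt</tt> with one domain per line and only uses the standard <tt>action=query&meta=siteinfo</tt> API call):<br />
<pre><br />
# Hypothetical helper, not part of the WikiTeam repository: probe candidate<br />
# domains and keep those whose api.php answers like a MediaWiki API.<br />
import requests<br />
<br />
def probe(domain):<br />
    """Return a working api.php URL for the domain, or None."""<br />
    for path in ("/w/api.php", "/api.php", "/wiki/api.php"):  # common layouts<br />
        url = "http://" + domain + path<br />
        try:<br />
            r = requests.get(url, params={"action": "query", "meta": "siteinfo", "format": "json"}, timeout=30)<br />
            if r.ok and "query" in r.json():<br />
                return url<br />
        except (requests.RequestException, ValueError):<br />
            pass<br />
    return None<br />
<br />
with open("domains.txt") as f, open("wikis.txt", "w") as out:<br />
    for domain in (line.strip() for line in f if line.strip()):<br />
        api = probe(domain)<br />
        if api:<br />
            out.write(api + "\n")<br />
</pre><br />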
<br />
{| class="wikitable sortable plainlinks" style="text-align: center;"<br />
! width=140px | Wikifarm !! width=80px | Wikis !! Status !! width=80px | Dumps !! Comments<br />
|-<br />
| [[Battlestar Wiki]] ([http://battlestarwiki.org site]) || 8 || {{green|Online}} || 0<ref>[https://archive.org/search.php?query=battlestarwikiorg%20subject%3Awikiteam battlestarwikiorg - dumps]</ref> || <br />
|-<br />
| [[BluWiki]] ([http://wayback.archive.org/web/20090301060338/http://bluwiki.com/go/Main_Page site]) || ? || {{red|Offline}} || ~20<ref>[https://archive.org/search.php?query=bluwiki%20subject%3Awikiteam bluwiki - dumps]</ref> || <br />
|-<br />
| [[Communpedia]] ([https://wikiapiary.com/wiki/Communpedia_%28ru%29 site]) || 5 || {{yellow|Unstable}} || 4<ref>[https://archive.org/search.php?query=subject%3A%22Comunpedia%22%20OR%20subject%3A%22Communpedia%22%20OR%20subject%3A%22kommynistru%22 communpedia - dumps]</ref> || <br />
|-<br />
| [[EditThis]] ([http://editthis.info site]) || 1,350<ref>[https://github.com/WikiTeam/wikiteam/blob/master/listsofwikis/mediawiki/editthis.info editthis.info - list of wikis]</ref> || {{yellow|Unstable}} || 1,307+ (IA: 1,297<ref>[https://archive.org/search.php?query=editthisinfo%20subject%3Awikiteam editthis.info - dumps]</ref>) || Most dumps were done in 2014. This wikifarm is not well covered in WikiApiary.<ref>[https://wikiapiary.com/wiki/Farm:EditThis Farm:EditThis]</ref><br />
|-<br />
| [[elwiki.com]] ([https://web.archive.org/web/20070917110429/http://www.elwiki.com/ site]) || Unknown<ref>[https://github.com/WikiTeam/wikiteam/blob/master/listsofwikis/mediawiki/elwiki.com elwiki.com - list of wikis]</ref> || {{red|Offline}} || None<ref>[https://archive.org/search.php?query=elwiki%20subject%3Awikiteam elwiki.com - dumps]</ref> || Last seen online in 2008.<ref>[https://web.archive.org/web/20080221125135/http://www.elwiki.com/ We're sorry about the downtime we've been having lately]</ref> There are no dumps; the content is presumably lost. Perhaps [https://web.archive.org/web/form-submit.jsp?type=prefixquery&url=http://elwiki.com/ some pages] are in the Wayback Machine.<br />
|-<br />
| [[Miraheze]] ([https://meta.miraheze.org site]) || 687 || {{green|Online}} || 685 || Non-profit. Dumps were made in September 2016.<br />
|-<br />
| [[Neoseeker.com]] ([https://neowiki.neoseeker.com site])|| 229<ref>[https://github.com/WikiTeam/wikiteam/blob/master/listsofwikis/mediawiki/neoseeker.com neoseeker.com - list of wikis]</ref> || {{green|Online}} || 159<ref>[https://archive.org/search.php?query=neoseeker+subject%3Awikiteam neoseeker.com - dumps]</ref> || Check why there are dozens of wikis without dumps.<br />
|-<br />
| [[Orain]] ([https://meta.orain.org site]) || 425<ref>[https://raw.githubusercontent.com/WikiTeam/wikiteam/master/listsofwikis/mediawiki/orain.org orain.com - list of wikis]</ref> || {{red|Offline}} || ~380<ref>[https://archive.org/search.php?query=orain%20subject%3Awikiteam orain - dumps]</ref><ref>[https://archive.org/details/wikifarm-orain.org-20130824 Orain wikifarm dump (August 2013)]</ref> || Last seen online in September 2015. Dumps were made in August 2013, January 2014 and August 2015.<br />
|-<br />
| [[Referata]] ([http://www.referata.com site]) || 156<ref>[https://github.com/WikiTeam/wikiteam/blob/master/listsofwikis/mediawiki/referata.com referata.com - list of wikis]</ref> || {{green|Online}} || ~80<ref>[https://archive.org/search.php?query=referata%20subject%3Awikiteam referata.com - dumps]</ref><ref>[https://archive.org/details/referata.com-20111204 Referata wikifarm dump 20111204]</ref><ref>[https://archive.org/details/wikifarm-referata.com-20130824 Referata wikifarm dump (August 2013)]</ref> || Check why there are dozens of wikis without dumps.<br />
|-<br />
| [[ScribbleWiki]] ([http://scribblewiki.com site]) || 119<ref>[https://github.com/WikiTeam/wikiteam/blob/master/listsofwikis/mediawiki/scribblewiki.com scribblewiki.com - list of wikis]</ref> || {{red|Offline}} || None<ref>[https://archive.org/search.php?query=scribblewiki%20subject%3Awikiteam scribblewiki.com - dumps]</ref> || Last seen online in 2008.<ref>[https://web.archive.org/web/20080404093502/http://scribblewiki.com/main.php What is ScribbleWiki?]</ref> There are no dumps; the content is presumably lost. Perhaps [https://web.archive.org/web/form-submit.jsp?type=prefixquery&url=http://scribblewiki.com/ some pages] are in the Wayback Machine.<br />
|-<br />
| [[ShoutWiki]] ([http://www.shoutwiki.com site]) || 1,879<ref>[https://github.com/WikiTeam/wikiteam/blob/master/listsofwikis/mediawiki/shoutwiki.com shoutwiki.com - list of wikis]</ref> || {{green|Online}} || ~1,300<ref>[https://archive.org/search.php?query=shoutwiki%20subject%3Awikiteam shoutwiki.com - dumps]</ref><ref>[http://www.archive.org/details/shoutwiki.com ShoutWiki wikifarm dump]</ref> || Check why there are dozens of wikis without dumps.<br />
|-<br />
| [[Sourceforge]] || ? || {{green|Online}} || 315<ref>[https://archive.org/search.php?query=sourceforge%20subject%3Awikiteam sourceforge - dumps]</ref> || <br />
|-<br />
| [[TropicalWikis]] ([http://tropicalwikis.com site]) || 187<ref>[https://github.com/WikiTeam/wikiteam/blob/master/listsofwikis/mediawiki/tropicalwikis.com tropicalwikis.com - list of wikis]</ref> || {{red|Offline}} || 152<ref>[https://archive.org/search.php?query=tropicalwikis%20subject%3Awikiteam tropicalwikis.com - dumps]</ref> || Killed off in November 2013. Allegedly pending a move to [[Orain]] (which later went offline too). Data from February 2013 and earlier was saved.<br />
|-<br />
| [[Wik.is]] ([http://wik.is site]) || ? || {{red|Offline}} || ? || Non-MediaWiki.<br />
|-<br />
| [[Wiki-Site]] ([http://www.wiki-site.com site]) || 5,839<ref>[https://github.com/WikiTeam/wikiteam/blob/master/listsofwikis/mediawiki/wiki-site.com wiki-site.com - list of wikis]</ref> || {{green|Online}} || 367 || No uploaded dumps yet.<br />
|-<br />
| [[Wikia]] ([http://www.wikia.com site]) || 400,000<ref>[https://github.com/WikiTeam/wikiteam/blob/master/listsofwikis/mediawiki/wikia.com wikia.com - list of wikis]</ref> || {{green|Online}} || ~34,000<ref>[https://archive.org/details/wikia_dump_20121204 Wikia wikis data dumps]</ref> || [http://community.wikia.com/wiki/Help:Database_download Help:Database download], [https://github.com/Wikia/app/tree/dev/extensions/wikia/WikiFactory/Dumps Their dumping code]<br />
|-<br />
| [[WikiHub]] ([http://wikihub.ssu.lt site]) || ? || {{red|Offline}} || 7<ref>[https://archive.org/details/wikifarm-wikihub.ssu.lt-20131110 wikihub - dumps]</ref> || <br />
|-<br />
| [[Wiki.Wiki]] ([https://wiki.wiki site]) || 100<ref>[https://raw.githubusercontent.com/WikiTeam/wikiteam/master/listsofwikis/mediawiki/wiki.wiki wiki.wiki - list of wikis]</ref> || {{green|Online}} || ? || <br />
|-<br />
| [[Wikkii]] ([https://web.archive.org/web/20140621054654/http://wikkii.com/wiki/Free_Wiki_Hosting site]) || 3,267 || {{red|Offline}} || 1,300<ref>[https://archive.org/search.php?query=wikkii%20subject%3Awikiteam wikki.com - dumps]</ref> || <br />
|-<br />
| [[YourWiki.net]] ([https://web.archive.org/web/20100124003107/http://www.yourwiki.net/wiki/YourWiki site]) || ? || {{red|Offline}} || ? || <br />
|}<br />
<br />
== Wikis to archive ==<br />
<br />
Please [https://wikiapiary.com/wiki/Special:FormEdit/Website add a wiki to WikiApiary] if you want someone to archive it sooner or later; or tell us on the #wikiteam channel if it's particularly urgent. Remember that there are thousands of wikis we don't even know about yet.<br />
<br />
[https://github.com/WikiTeam/wikiteam/wiki/Tutorial You can help] by downloading wikis yourself. If you don't know where to start, pick a [https://wikiapiary.com/wiki/Category:Website_not_archived wiki which has not been archived yet] from the lists on WikiApiary. Also, you can edit those pages to link existing dumps! You'll help others focus their work.<br />
<br />
Examples of huge wikis:<br />
<br />
* '''[[Wikipedia]]''' - arguably the largest and one of the oldest wikis on the planet. It offers public backups (also for sister projects): http://dumps.wikimedia.org<br />
** They have some mirrors but not many.<br />
** The transfer of the dumps to the Internet Archive is automated and is currently managed by [[User:Hydriz|Hydriz]].<br />
<br />
* '''[[Wikimedia Commons]]''' - a wiki of media files available for free usage. It offers public backups: http://dumps.wikimedia.org<br />
** But there is no image dump available, only the image descriptions<br />
** So we made it! http://archive.org/details/wikimediacommons<br />
<br />
* '''[[Wikia]]''' - a website that allows the creation and hosting of wikis. Doesn't make regular backups.<br />
<br />
We're trying to decide which [https://groups.google.com/forum/#!topic/wikiteam-discuss/TxzfrkN4ohA other wiki engines] to work on: suggestions needed!<br />
<br />
== Tools and source code ==<br />
=== Official WikiTeam tools ===<br />
* [https://github.com/WikiTeam/wikiteam WikiTeam in GitHub]<br />
* '''[https://raw.githubusercontent.com/WikiTeam/wikiteam/master/dumpgenerator.py dumpgenerator.py] to download MediaWiki wikis:''' <tt>python dumpgenerator.py --api=http://archiveteam.org/api.php --xml --images</tt><br />
* [https://raw.githubusercontent.com/WikiTeam/wikiteam/master/wikipediadownloader.py wikipediadownloader.py] to download Wikipedia dumps from download.wikimedia.org: <tt>python wikipediadownloader.py</tt><br />
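A couple of <tt>dumpgenerator.py</tt> command lines that come up often in practice (flag names are from recent versions of the script; run it with <tt>--help</tt> if your copy differs):<br />
<pre><br />
# Full history plus images, with a polite delay between requests<br />
python dumpgenerator.py --api=http://archiveteam.org/api.php --xml --images --delay=5<br />
<br />
# Resume an interrupted dump inside an existing dump directory (directory name is just an example)<br />
python dumpgenerator.py --api=http://archiveteam.org/api.php --xml --images --resume --path=archiveteamorg-wikidump<br />
</pre><br />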
<br />
=== Other ===<br />
* [http://dl.dropbox.com/u/63233/Wikitravel/Source%20Code%20and%20tools/Source%20Code%20and%20tools.7z Scripts of a guy who saved Wikitravel]<br />
* [http://www.communitywiki.org/en/BackupThisWiki OddMuseWiki backup]<br />
* UseModWiki: use wget/curl and [http://www.usemod.com/cgi-bin/wiki.pl?WikiPatches/RawMode raw mode] (might have a different URL scheme, like [http://meatballwiki.org/wiki/action=browse&id=TheTippingPoint&raw=1 this]); see the sketch after this list<br />
** Some wikis: [[UseMod:SiteList]]<br />
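A rough shell sketch of the raw-mode approach for a UseModWiki site, using the Meatball URL scheme linked above (the page list file and the exact query string are assumptions; they vary per installation):<br />
<pre><br />
# Assumes a file "pages.txt" with one page name per line<br />
while read page; do<br />
    wget -O "${page}.txt" "http://meatballwiki.org/wiki/action=browse&id=${page}&raw=1"<br />
    sleep 2   # be gentle with small servers<br />
done < pages.txt<br />
</pre><br />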
<br />
== Wiki dumps ==<br />
<br />
Most of our dumps are in the [http://www.archive.org/details/wikiteam wikiteam collection at the Internet Archive]. If you want an item to land there, just upload it to the "opensource" collection and remember to add the "wikiteam" keyword; it will be moved at some point. When you've uploaded enough wikis, you'll probably be made a collection admin to save others the effort of moving your stuff.<br />
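For example, with the <tt>ia</tt> command-line tool from the internetarchive Python package (the item name, file names and title here are made up; the important part is the "wikiteam" subject keyword):<br />
<pre><br />
ia upload examplewiki.org-20161003-wikidump \<br />
    examplewiki.org-20161003-history.xml.7z examplewiki.org-20161003-images.zip \<br />
    --metadata="subject:wikiteam" \<br />
    --metadata="title:Wiki dump of examplewiki.org (2016-10-03)"<br />
</pre><br />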
<br />
For a manually curated list, [https://github.com/WikiTeam/wikiteam/wiki/Available-Backups visit the download section] on GitHub.<br />
<br />
There is another collection of MediaWiki dumps hosted [http://mirrors.sdboyd56.com/WikiTeam/index.html here] on [http://www.archiveteam.org/index.php?title=User:Sdboyd Scott's] website.<br />
<br />
=== Tips ===<br />
Some tips:<br />
* When downloading Wikipedia/Wikimedia Commons dumps, pages-meta-history.xml.7z and pages-meta-history.xml.bz2 contain the same data, but the 7z file is usually smaller (better compression ratio), so use 7z.<br />
* To download a mass of wikis with N parallel threads, just <code>split</code> your full <code>$list</code> into N chunks, then start N instances of <code>launcher.py</code> ([https://github.com/WikiTeam/wikiteam/wiki/Tutorial#Download_a_list_of_wikis tutorial]), one for each list; see the sketch after this list<br />
** If you want to upload dumps as they're ready and clean up your storage: at the same time, in a separate window or screen, run a loop of the kind <code>while true; do ./uploader.py $list --prune-directories --prune-wikidump; sleep 12h; done;</code> (the <code>sleep</code> ensures each run has something to do). <br />
** If you want to go advanced and run really ''many'' instances, use <code>tmux</code>[http://blog.hawkhost.com/2010/07/02/tmux-%E2%80%93-the-terminal-multiplexer-part-2/]! Every now and then, attach to the tmux session and look (<code>ctrl-b f</code>) for windows stuck in "is wrong", "is slow" or "......" loops, or which are inactive[http://unix.stackexchange.com/questions/78093/how-can-i-make-tmux-monitor-a-window-for-inactivity]. Even with a couple of cores you can run a hundred instances; just make sure to have enough disk space for the occasional huge ones (tens of GB).<br />
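A condensed sketch of that workflow (the file names and the chunk count are examples, not fixed conventions):<br />
<pre><br />
# Run inside an existing tmux session<br />
split -n l/10 -d mylist.txt chunk.          # chunk.00 ... chunk.09<br />
<br />
for f in chunk.*; do                        # one launcher.py instance per chunk<br />
    tmux new-window -n "$f" "python launcher.py $f"<br />
done<br />
<br />
# In another window: upload finished dumps and free the disk every 12 hours<br />
while true; do ./uploader.py mylist.txt --prune-directories --prune-wikidump; sleep 12h; done<br />
</pre><br />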
<br />
=== BitTorrent downloads ===<br />
You can download and seed the torrents from the archive.org collection. Every item has a "Torrent" link.<br />
<br />
=== Old mirrors ===<br />
<span class="plainlinks"><br />
# [https://sourceforge.net/projects/wikiteam/files/ Sourceforge] (also mirrored to another 26 mirrors)<br />
# [http://www.archive.org/details/WikiTeamMirror Internet Archive] ([http://ia700705.us.archive.org/16/items/WikiTeamMirror/ direct link] to directory)<br />
</span><br />
<br />
=== Recursive ===<br />
<br />
We also have dumps for our coordination wikis:<br />
* [[ArchiveTeam wiki]] ([https://archive.org/details/wiki-archiveteamorg 2014-03-26])<br />
* [[WikiApiary]] ([https://archive.org/details/wiki-wikiapiarycom_w 2015-03-25])<br />
<br />
== Restoring wikis ==<br />
<br />
Anyone can restore a wiki using its XML dump and images.<br />
<br />
Wikis.cc is [https://www.wikis.cc/wiki/Wikis_recuperados restoring some sites].<br />
<br />
== References ==<br />
<references/><br />
<br />
== External links ==<br />
* http://wikiindex.org - A lot of wikis to save<br />
* http://wiki1001.com/ offline?<br />
<br />
* http://s23.org/wikistats/<br />
* http://en.wikipedia.org/wiki/Comparison_of_wiki_farms<br />
* http://en.wikipedia.org/wiki/User:Emijrp/Wikipedia_Archive<br />
* http://blog.shoutwiki.com/<br />
* http://wikiheaven.blogspot.com/<br />
<br />
{{wikis}}<br />
<br />
[[Category:Archive Team]]<br />
[[Category:Wikis| ]]</div>

Miraheze, revision of 2016-10-03T14:11:33Z by Hydriz (edit summary: Create new page with details on archive)
https://wiki.archiveteam.org/index.php?title=Miraheze&diff=26432
<hr />
<div>{{Infobox project<br />
| title = Miraheze<br />
| description = wiki hosting service<br />
| URL = https://meta.miraheze.org<br />
| project_status = {{online}}<br />
| archiving_status = {{partiallysaved}}<br />
| irc = wikiteam<br />
}}<br />
<br />
'''Miraheze''' is a non-profit MediaWiki wikifarm, similar to [[Orain]], that was set up in August 2015. As of September 2016, they host about 1,500 wikis (including private ones).<br />
<br />
== Hosting ==<br />
Based on their Finance page, they operate on a rather tight budget, with hosting provided by RamNode.<ref>[https://meta.miraheze.org/wiki/Finance Finance - Miraheze Meta]</ref> As of September 2016, their whole cluster sits on approximately 12 CPU cores, 7 GB RAM and ~1 TB of disk space<ref>[https://ganglia.miraheze.org/?p=2&c=Miraheze Miraheze Ganglia]</ref>, hardly enough to survive a DDoS attack (which they have experienced multiple times before). The finances seem to suggest that they have enough assets to survive for about a year without any growth.<br />
<br />
The wikifarm also hosts [https://allthetropes.org All The Tropes], the largest wiki on the farm.<br />
<br />
== Backups ==<br />
Miraheze makes their own backups of their services regularly to an offsite server (provided by Backupsy).<ref>[https://meta.miraheze.org/wiki/Backups Backups - Miraheze Meta]</ref> [[WikiTeam]] also managed to grab about 685 public wikis hosted on its cluster in September 2016.<ref>[https://archive.org/details/wikifarm-miraheze.org-20160930 Archive.org item]</ref><br />
<br />
== See also ==<br />
* [[WikiTeam#Wikifarms]]<br />
<br />
== References ==<br />
<references /></div>

WikiTeam, revision of 2015-10-17T13:29:56Z by Hydriz (edit summary: /* Wikis to archive */ nope, it's really automated now)
https://wiki.archiveteam.org/index.php?title=WikiTeam&diff=24430
<hr />
<div>{{Infobox project<br />
| title = WikiTeam<br />
| image = Wikiteam.jpg<br />
| description = WikiTeam, we preserve wikis<br />
| URL = https://github.com/WikiTeam/wikiteam<br />
| project_status = {{online}} (at least some of them)<br />
| tracker = manual for now, check [https://wikiapiary.com/wiki/Category:Website_not_archived not archived wikis on wikiapiary]<br />
| archiving_status = {{inprogress}}<br />
| irc = wikiteam<br />
}}<br />
<br />
'''WikiTeam''' software is a set of tools for archiving wikis. They work on [[MediaWiki]] wikis, but we want to expand to other wiki engines. As of October 2015, WikiTeam has preserved more than 27,000 stand-alone wikis.<br />
<br />
You can check [https://archive.org/details/wikiteam our collection] at [[Internet Archive]], the [https://github.com/WikiTeam/wikiteam source code] in [[GitHub]] and some [https://wikiapiary.com/wiki/Websites/WikiTeam lists of wikis by status] in [[WikiApiary]].<br />
<br />
== Current status ==<br />
<br />
The total number of MediaWiki wikis is unknown, but some estimates exist.<br />
<br />
According to [[WikiApiary]], which is the most up-to-date database, there are 21,369 independent wikis (1,508 are semantic) and 4,554 in wikifarms.<ref>[https://wikiapiary.com/wiki/Websites Websites] - WikiApiary</ref> However, it doesn't include [[Wikia]]'s 400,000+ wikis, and the coverage of independent wikis can certainly be improved.<br />
<br />
According to Pavlo's list generated in December 2008, there are 20,000 wikis.<ref>[http://cs.brown.edu/~pavlo/mediawiki/ Pavlo's list of wikis] ([http://www.cs.brown.edu/~pavlo/mediawiki/mediawikis.csv mediawiki.csv]) ([https://github.com/WikiTeam/wikiteam/blob/master/listsofwikis/mediawiki/mediawikis_pavlo.csv backup])</ref> This list was imported into WikiApiary.<br />
<br />
According to [[WikiIndex]], there are 20,698 wikis.<ref>[http://wikiindex.org/Special:Statistics WikiIndex Statistics]</ref><br />
<br />
A number of [[#Wikifarms|wikifarms]] have vanished and about 150 are still online.<ref>[https://wikiapiary.com/wiki/Farm:Farms Wikifarms]</ref><ref>[https://en.wikipedia.org/wiki/Comparison_of_wiki_hosting_services Comparison of wiki hosting services]</ref><ref>[http://wikiindex.org/Category:WikiFarm Category:WikiFarm]</ref><br />
<br />
Most wikis are small, containing about 100 pages or less, but there are some very large wikis:<ref>[http://meta.wikimedia.org/wiki/List_of_largest_wikis List of largest wikis]</ref><ref>[http://s23.org/wikistats/largest_html.php?th=15000&lines=500 List of largest wikis in the world]</ref><br />
* By number of pages: Wikimedia Commons (40 million), English Wikipedia (37 million), DailyWeeKee (35 million), WikiBusiness (22 million) and Wikidata (19 million).<br />
* By number of files: Wikimedia Commons (28 million), English Wikipedia (800,000).<br />
<br />
The oldest dumps are probably some 2001 dumps of Wikipedia when it used UseModWiki.<ref>[https://dumps.wikimedia.org/archive/ Wikimedia Downloads Historical Archives]</ref><ref>[http://dumps.wikimedia.org/nostalgiawiki Dump] of [http://nostalgia.wikipedia.org/ Nostalgia], an ancient version of Wikipedia from 2001</ref><br />
<br />
As of October 2015, our collection at Internet Archive holds dumps for 27,398 wikis (including independent, wikifarm wikis, some packages of wikis and Wiki[pm]edia).<ref>[https://archive.org/details/wikiteam WikiTeam collection] at Internet Archive</ref><br />
<br />
== Wikifarms ==<br />
<br />
There are also wikifarms with hundreds of wikis. Here we only create pages for those we have some special information about that we don't want to lose (like archiving history and tips). For a full list, please use WikiApiary [https://wikiapiary.com/wiki/Farm:Main_Page wikifarms main page].<br />
<br />
Before backing up a wikifarm, try to update the list of wikis for it. There are [https://github.com/WikiTeam/wikiteam/tree/master/listsofwikis/mediawiki Python scripts to generate those lists] for many wikifarms.<br />
<br />
{| class="wikitable sortable plainlinks" style="text-align: center;"<br />
! width=140px | Wikifarm !! width=80px | Wikis !! Status !! width=80px | Dumps !! Comments<br />
|-<br />
| [[Battlestar Wiki]] ([http://battlestarwiki.org site]) || 8 || {{green|Online}} || ? || <br />
|-<br />
| [[BluWiki]] ([http://wayback.archive.org/web/20090301060338/http://bluwiki.com/go/Main_Page site]) || ? || {{red|Offline}} || ? || <br />
|-<br />
| [[Edit.This]] ([http://editthis.info site]) || 1,350<ref>[https://github.com/WikiTeam/wikiteam/blob/master/listsofwikis/mediawiki/editthis.info editthis.info - list of wikis]</ref> || {{yellow|Unstable}} || 1307+ (IA: 1,297<ref>[https://archive.org/search.php?query=editthis%20subject%3Awikiteam editthis.info - dumps]</ref>) || Most dumps were done in 2014. This wikifarm is not well covered in WikiApiary.<ref>[https://wikiapiary.com/wiki/Farm:EditThis Farm:EditThis]</ref><br />
|-<br />
| [[elwiki.com]] ([https://web.archive.org/web/20070917110429/http://www.elwiki.com/ site]) || Unknown<ref>[https://github.com/WikiTeam/wikiteam/blob/master/listsofwikis/mediawiki/elwiki.com elwiki.com - list of wikis]</ref> || {{red|Offline}} || None<ref>[https://archive.org/search.php?query=elwiki%20subject%3Awikiteam elwiki.com - dumps]</ref> || Last seen online in 2008.<ref>[https://web.archive.org/web/20080221125135/http://www.elwiki.com/ We're sorry about the downtime we've been having lately]</ref> There are no dumps; the content is presumably lost. Perhaps [https://web.archive.org/web/form-submit.jsp?type=prefixquery&url=http://elwiki.com/ some pages] are in the Wayback Machine.<br />
|-<br />
| [[Miraheze]] ([https://meta.miraheze.org site]) || ? || {{green|Online}} || ? || <br />
|-<br />
| [[Neoseeker.com]] ([https://neowiki.neoseeker.com site])|| 229<ref>[https://github.com/WikiTeam/wikiteam/blob/master/listsofwikis/mediawiki/neoseeker.com neoseeker.com - list of wikis]</ref> || {{green|Online}} || 159<ref>[https://archive.org/search.php?query=neoseeker+subject%3Awikiteam neoseeker.com - dumps]</ref> || Check why there are dozens of wikis without dump.<br />
|-<br />
| [[Orain]] ([https://meta.orain.org site]) || 425<ref>[https://raw.githubusercontent.com/WikiTeam/wikiteam/master/listsofwikis/mediawiki/orain.org orain.com - list of wikis]</ref> || {{red|Offline}} || ~380<ref>[https://archive.org/search.php?query=orain%20subject%3Awikiteam orain - dumps]</ref><ref>[https://archive.org/details/wikifarm-orain.org-20130824 Orain wikifarm dump (August 2013)]</ref> || Last seen online in September 2015. Dumps were made in August 2013, January 2014 and August 2015.<br />
|-<br />
| [[Referata]] ([http://www.referata.com site]) || 156<ref>[https://github.com/WikiTeam/wikiteam/blob/master/listsofwikis/mediawiki/referata.com referata.com - list of wikis]</ref> || {{green|Online}} || ~80<ref>[https://archive.org/search.php?query=referata%20subject%3Awikiteam referata.com - dumps]</ref><ref>[https://archive.org/details/referata.com-20111204 Referata wikifarm dump 20111204]</ref><ref>[https://archive.org/details/wikifarm-referata.com-20130824 Referata wikifarm dump (August 2013)]</ref> || Check why there are dozens of wikis without dump.<br />
|-<br />
| [[ScribbleWiki]] ([http://scribblewiki.com site]) || 119<ref>[https://github.com/WikiTeam/wikiteam/blob/master/listsofwikis/mediawiki/scribblewiki.com scribblewiki.com - list of wikis]</ref> || {{red|Offline}} || None<ref>[https://archive.org/search.php?query=scribblewiki%20subject%3Awikiteam scribblewiki.com - dumps]</ref> || Last seen online in 2008.<ref>[https://web.archive.org/web/20080404093502/http://scribblewiki.com/main.php What is ScribbleWiki?]</ref> There are no dumps; the content is presumably lost. Perhaps [https://web.archive.org/web/form-submit.jsp?type=prefixquery&url=http://scribblewiki.com/ some pages] are in the Wayback Machine.<br />
|-<br />
| [[ShoutWiki]] ([http://www.shoutwiki.com site]) || 1,879<ref>[https://github.com/WikiTeam/wikiteam/blob/master/listsofwikis/mediawiki/shoutwiki.com shoutwiki.com - list of wikis]</ref> || {{green|Online}} || ~1,300<ref>[https://archive.org/search.php?query=shoutwiki%20subject%3Awikiteam shoutwiki.com - dumps]</ref><ref>[http://www.archive.org/details/shoutwiki.com ShoutWiki wikifarm dump]</ref> || Check why there are dozens of wikis without dump.<br />
|-<br />
| [[Sourceforge]] || ? || {{green|Online}} || 315<ref>[https://archive.org/search.php?query=sourceforge%20subject%3Awikiteam sourceforge - dumps]</ref> || <br />
|-<br />
| [[TropicalWikis]] ([http://tropicalwikis.com site]) || 187<ref>[https://github.com/WikiTeam/wikiteam/blob/master/listsofwikis/mediawiki/tropicalwikis.com tropicalwikis.com - list of wikis]</ref> || {{red|Offline}} || 152<ref>[https://archive.org/search.php?query=tropicalwikis%20subject%3Awikiteam tropicalwikis.com - dumps]</ref> || Killed off in November 2013. Allegedly pending move to [[Orain]] (which became offline too). Data from February 2013 and earlier saved.<br />
|-<br />
| [[Wik.is]] ([http://wik.is site]) || ? || {{red|Offline}} || ? || Non-MediaWiki.<br />
|-<br />
| [[Wiki-Site]] ([http://www.wiki-site.com site]) || 5,839<ref>[https://github.com/WikiTeam/wikiteam/blob/master/listsofwikis/mediawiki/wiki-site.com wiki-site.com - list of wikis]</ref> || {{green|Online}} || 367 || No uploaded dumps yet.<br />
|-<br />
| [[Wikia]] ([http://www.wikia.com site]) || 400,000<ref>[https://github.com/WikiTeam/wikiteam/blob/master/listsofwikis/mediawiki/wikia.com wikia.com - list of wikis]</ref> || {{green|Online}} || ~34,000<ref>[https://archive.org/details/wikia_dump_20121204 Wikia wikis data dumps]</ref> || [http://community.wikia.com/wiki/Help:Database_download Help:Database download], [https://github.com/Wikia/app/tree/dev/extensions/wikia/WikiFactory/Dumps Their dumping code]<br />
|-<br />
| [[WikiHub]] ([http://wikihub.ssu.lt site]) || ? || {{red|Offline}} || 7<ref>[https://archive.org/details/wikifarm-wikihub.ssu.lt-20131110 wikihub - dumps]</ref> || <br />
|-<br />
| [[Wiki.Wiki]] ([https://wiki.wiki site]) || 100<ref>[https://raw.githubusercontent.com/WikiTeam/wikiteam/master/listsofwikis/mediawiki/wiki.wiki wiki.wiki - list of wikis]</ref> || {{green|Online}} || ? || <br />
|-<br />
| [[Wikkii]] ([https://web.archive.org/web/20140621054654/http://wikkii.com/wiki/Free_Wiki_Hosting site]) || 3,267 || {{red|Offline}} || 1,300<ref>[https://archive.org/search.php?query=wikkii%20subject%3Awikiteam wikki.com - dumps]</ref> || <br />
|-<br />
| [[YourWiki.net]] ([https://web.archive.org/web/20100124003107/http://www.yourwiki.net/wiki/YourWiki site]) || ? || {{red|Offline}} || ? || <br />
|}<br />
<br />
== Wikis to archive ==<br />
<br />
Please [https://wikiapiary.com/wiki/Special:FormEdit/Website add a wiki to WikiApiary] if you want someone to archive it sooner or later; or tell us on the #wikiteam channel if it's particularly urgent. Remember that there are thousands of wikis we don't even know about yet.<br />
<br />
[https://github.com/WikiTeam/wikiteam/wiki/Tutorial You can help] downloading wikis yourself. If you don't know where to start, pick a [https://wikiapiary.com/wiki/Category:Website_not_archived wiki which was not archived yet] from the lists on WikiApiary. Also, you can edit those pages to link existing dumps! You'll help others focus their work.<br />
<br />
Examples of huge wikis:<br />
<br />
* '''[[Wikipedia]]''' - arguably the largest and one of the oldest wikis on the planet. It offers public backups (also for sister projects): http://dumps.wikimedia.org<br />
** They have some mirrors but not many.<br />
** The transfer of the dumps to the Internet Archive is automated and is currently managed by [[User:Hydriz|Hydriz]].<br />
<br />
* '''[[Wikimedia Commons]]''' - a wiki of media files available for free usage. It offers public backups: http://dumps.wikimedia.org<br />
** But there is no image dump available, only the image descriptions<br />
** So we made it! http://archive.org/details/wikimediacommons<br />
<br />
* '''[[Wikia]]''' - a website that allows the creation and hosting of wikis. Doesn't make regular backups.<br />
<br />
We're trying to decide which [https://groups.google.com/forum/#!topic/wikiteam-discuss/TxzfrkN4ohA other wiki engines] to work on: suggestions needed!<br />
<br />
== Tools and source code ==<br />
=== Official WikiTeam tools ===<br />
* [https://github.com/WikiTeam/wikiteam WikiTeam in GitHub]<br />
* '''[https://raw.githubusercontent.com/WikiTeam/wikiteam/master/dumpgenerator.py dumpgenerator.py] to download MediaWiki wikis:''' <tt>python dumpgenerator.py --api=http://archiveteam.org/api.php --xml --images</tt><br />
* [https://raw.githubusercontent.com/WikiTeam/wikiteam/master/wikipediadownloader.py wikipediadownloader.py] to download Wikipedia dumps from download.wikimedia.org: <tt>python wikipediadownloader.py</tt><br />
<br />
=== Other ===<br />
* [http://dl.dropbox.com/u/63233/Wikitravel/Source%20Code%20and%20tools/Source%20Code%20and%20tools.7z Scripts of a guy who saved Wikitravel]<br />
* [http://www.communitywiki.org/en/BackupThisWiki OddMuseWiki backup]<br />
* UseModWiki: use wget/curl and [http://www.usemod.com/cgi-bin/wiki.pl?WikiPatches/RawMode raw mode] (might have a different URL scheme, like [http://meatballwiki.org/wiki/action=browse&id=TheTippingPoint&raw=1 this])<br />
** Some wikis: [[UseMod:SiteList]]<br />
<br />
== Wiki dumps ==<br />
<br />
Most of our dumps are in the [http://www.archive.org/details/wikiteam wikiteam collection at the Internet Archive]. If you want an item to land there, just upload it in "opensource" collection and remember the "WikiTeam" keyword, it will be moved at some point. When you've uploaded enough wikis, you'll probably be made a collection admin to save others the effort to move your stuff.<br />
<br />
For a manually curated list, [https://github.com/WikiTeam/wikiteam/wiki/Available-Backups visit the download section] on GitHub.<br />
<br />
There is another site of MediaWiki dumps located [http://mirrors.sdboyd56.com/WikiTeam/index.html here] on [http://www.archiveteam.org/index.php?title=User:Sdboyd Scott's] website.<br />
<br />
=== Tips ===<br />
Some tips:<br />
* When downloading Wikipedia/Wikimedia Commons dumps, pages-meta-history.xml.7z and pages-meta-history.xml.bz2 contain the same data, but the 7z file is usually smaller (better compression ratio), so use 7z.<br />
* To download a mass of wikis with N parallel threads, just <code>split</code> your full <code>$list</code> into N chunks, then start N instances of <code>launcher.py</code> ([https://github.com/WikiTeam/wikiteam/wiki/Tutorial#Download_a_list_of_wikis tutorial]), one for each list<br />
** If you want to upload dumps as they're ready and clean up your storage: at the same time, in a separate window or screen, run a loop of the kind <code>while true; do ./uploader.py $list --prune-directories --prune-wikidump; sleep 12h; done;</code> (the <code>sleep</code> ensures each run has something to do). <br />
** If you want to go advanced and run really ''many'' instances, use <code>tmux</code>[http://blog.hawkhost.com/2010/07/02/tmux-%E2%80%93-the-terminal-multiplexer-part-2/]! Every now and then, attach to the tmux session and look (<code>ctrl-b f</code>) for windows stuck in "is wrong", "is slow" or "......" loops, or which are inactive[http://unix.stackexchange.com/questions/78093/how-can-i-make-tmux-monitor-a-window-for-inactivity]. Even with a couple of cores you can run a hundred instances; just make sure to have enough disk space for the occasional huge ones (tens of GB).<br />
<br />
=== BitTorrent downloads ===<br />
You can download and seed the torrents from the archive.org collection. Every item has a "Torrent" link.<br />
<br />
=== Old mirrors ===<br />
<span class="plainlinks"><br />
# [https://sourceforge.net/projects/wikiteam/files/ Sourceforge] (also mirrored to another 26 mirrors)<br />
# [http://www.archive.org/details/WikiTeamMirror Internet Archive] ([http://ia700705.us.archive.org/16/items/WikiTeamMirror/ direct link] to directory)<br />
</span><br />
<br />
=== Recursive ===<br />
<br />
We also have dumps for our coordination wikis:<br />
* ArchiveTeam wiki ([https://archive.org/details/wiki-archiveteamorg 2014-03-26])<br />
* WikiApiary ([https://archive.org/details/wiki-wikiapiarycom_w 2015-03-25])<br />
<br />
== References ==<br />
<references/><br />
<br />
== External links ==<br />
* http://wikiindex.org - A lot of wikis to save<br />
* http://wiki1001.com/ offline?<br />
<br />
* http://s23.org/wikistats/<br />
* http://en.wikipedia.org/wiki/Comparison_of_wiki_farms<br />
* http://en.wikipedia.org/wiki/User:Emijrp/Wikipedia_Archive<br />
* http://blog.shoutwiki.com/<br />
* http://wikiheaven.blogspot.com/<br />
<br />
<br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Archive Team]]</div>

Wikimedia Commons, revision of 2014-12-27T14:57:49Z by Hydriz (no edit summary)
https://wiki.archiveteam.org/index.php?title=Wikimedia_Commons&diff=21200
<hr />
<div>{{Infobox project<br />
| title = Wikimedia Commons<br />
| image = Commons screenshot.png<br />
| description = Wikimedia Commons mainpage on 2010-12-13<br />
| URL = http://commons.wikimedia.org<br />
| project_status = {{online}}<br />
| archiving_status = {{inprogress}}<br />
}}<br />
'''Wikimedia Commons''' is a database of freely usable media files with more than 10 million files (when it held 6.8M files, the size was 6.6TB).<br />
<br />
Current size (based on January 18, 2012 estimate): '''13.3TB''', old versions '''881GB'''<br />
<br />
An auto-generated size table is available [https://commons.wikimedia.org/wiki/Special:MediaStatistics here] (current versions only)<br />
<br />
== Archiving process ==<br />
<br />
=== Tools ===<br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/commonsdownloader.py Download script] (Python)<br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/commonschecker.py Checker script] (Python)<br />
* [http://toolserver.org/~emijrp/commonsarchive/ Feed lists] (from 2004-09-07 to 2008-12-31; more coming soon)<br />
<br />
=== How-to ===<br />
Download the script and the feed lists (unpack them; each is a .csv file) into the same directory. Then run:<br />
* python commonsdownloader.py 2005-01-01 2005-01-10 [to download that 10-day range; it generates zip files by day and a .csv for every day]<br />
<br />
Don't forget the 30th and 31st days of the relevant months, and February 29th in leap years.<br />
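One way to avoid missing days is to let Python's calendar module spell the ranges out for you (a throwaway helper, not part of the WikiTeam repository):<br />
<pre><br />
# Print one commonsdownloader.py command per month, with the correct last day<br />
# (handles 28/29/30/31-day months automatically).<br />
import calendar<br />
<br />
for year in range(2004, 2009):<br />
    for month in range(1, 13):<br />
        last_day = calendar.monthrange(year, month)[1]<br />
        print("python commonsdownloader.py %d-%02d-01 %d-%02d-%02d"<br />
              % (year, month, year, month, last_day))<br />
</pre><br />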
<br />
To verify the downloaded data, use the checker script:<br />
* python commonschecker.py 2005-01-01 2005-01-10 [to check that 10-day range; it works on the .zip and .csv files, not the original folders]<br />
<br />
=== Tools required ===<br />
If you are downloading on a fresh server (e.g. a default virtual machine), you need to install <tt>zip</tt> (Ubuntu: <tt>apt-get install zip</tt>).<br />
<br />
Python should already be installed on your server; if not, just install it!<br />
<br />
The script also depends on <tt>curl</tt> and <tt>wget</tt>, which should be installed on your server by default...<br />
<br />
=== Volunteers ===<br />
<br />
:'''''Please wait until we do some tests. There is probably a bug with long filenames.'''''<br />
<br />
{| class="wikitable"<br />
! Nick !! Start date !! End date !! Images !! Size !! Revision !! Status !! Notes<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2004-09-07 || 2005-06-30 || ? || ? || r643 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Check:<br />October 2004: [http://p.defau.lt/?j9Glz5ExKheNKXIaGtCOXQ]<br />November 2004: [http://p.defau.lt/?HPhH5E6LF2JYsd6vWAJk_w]<br />December 2004: [http://p.defau.lt/?EKVceBcekqKV0Zm8MTUORw]<br />January 2005: [http://p.defau.lt/?clYrnISJmvh7mh3yQBr_Tw]<br />February 2005: [http://p.defau.lt/?lMHkHMslwgqf_jOnRFT6FA]<br />March 2005: [http://p.defau.lt/?A_8Sd6BxVv_KDKtRrvL3Vg] (2005-03-23 - 2005-03-31 was downloaded differently, so its not available for checking)<br />April 2005: [http://p.defau.lt/?CEAUG6NRJ3FmdLvi1Uha0g]<br />May 2005: [http://p.defau.lt/?biJwakLb81mdrINVQYERxA]<br />June 2005: [http://p.defau.lt/?Ueuv51SG_dCChiDsRYbK1A]<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2005-07-01 || 2005-12-31 || ? || ? || r643 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Check:<br />July 2005: [http://p.defau.lt/?Y3sVIK6OWKxW5Gs3BwbGtQ]<br />August 2005: [http://p.defau.lt/?s0kxuwB9DRugPLnLAXDxWQ]<br />September 2005: [http://p.defau.lt/?vggA7OYHbyY3dtxDui6BCQ]<br />October 2005: [http://p.defau.lt/?T_0resTl7qjJs_c5bvdw7Q]<br />November 2005: [http://p.defau.lt/?WoBL_VYsoCDyOnVqJD_j9w]<br />December 2005: [http://p.defau.lt/?B0yOIcf16qSscxm3tLn1Fg]<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-01-01 || 2006-01-10 || 13198 || 4.8GB || r349 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-01-11 || 2006-06-30 || ? || ? || r349 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-07-01 || 2006-12-31 || ? || ? || r643 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Check:<br />July 2006: http://p.defau.lt/?IcMnwkx_j4H09FE_9iVgkQ<br />August 2006: http://p.defau.lt/?EmsKDtM0RXaysFNEABXJCQ<br />September 2006: http://p.defau.lt/?KBZVE9rJ9hdz4DiKnegnUw<br />October 2006: http://p.defau.lt/?f3F85TyqHtdY0LhpQk_m1w<br />November 2006: http://p.defau.lt/?VZwhzt_2doA_Z3c65_JkXg<br />December 2006: http://p.defau.lt/?Ms_TgrcyGDL_0oZQgKCNmw<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2007-01-01 || 2007-12-31 || ? || ? || r349 || ''Downloading'' || Check:<br />[http://p.defau.lt/?Xd3HIsjWEvpOW4LykpA1SA January 2007]<br />[http://p.defau.lt/?AEEN0fKRzawfC_8x2kBg5A February 2007]<br />[http://p.defau.lt/?hiRjWTKKArP2iZP8_Ti6Fg March 2007]<br />[http://p.defau.lt/?_CtoP2saoRWMFCQDoMb91w April 2007]<br />[http://p.defau.lt/?vqWYYL8qVPuC9ZDQ60pO3g May 2007]<br />[http://p.defau.lt/?8gU8m9B2grwFoUh_Py1uFA June 2007]<br />[http://p.defau.lt/?tFg2kXsZ0TqummGyuOgj6Q July 2007]<br />
|}<br />
<br />
=== Errors ===<br />
* oi_archive_name empty fields: http://commons.wikimedia.org/wiki/File:Nl-scheikundig.ogg<br />
* broken file links: http://commons.wikimedia.org/wiki/File:SMS_Bluecher.jpg#filehistory<br />
* [http://code.google.com/p/wikiteam/issues/detail?id=45 Issue 45]: 2005-03-23, 2005-08-08, 2005-09-12, 2005-09-18, 2005-09-25, 2005-11-18, 2006-02-05, 2006-02-11, 2006-02-25, 2006-03-10, 2006-03-23, 2006-04-21, 2006-04-25, 2006-05-01, 2006-07-13, 2006-07-30, 2006-08-02, 2006-08-05, 2006-08-13, 2006-09-12, 2006-10-22, 2006-10-26, 2006-11-23, 2006-12-06, 2006-12-13, 2006-12-17.<br />
* Also issue 45: 2007-01-01, 2007-01-06, 2007-01-14, 2007-01-15, 2007-02-06, 2007-02-13, 2007-02-22, 2007-02-26, 2007-03-07, 2007-03-13, 2007-03-25, 2007-03-30, 2007-04-12, 2007-04-14, 2007-04-20, 2007-05-04, 2007-05-08, 2007-05-10, 2007-05-29, 2007-06-05, 2007-06-22.<br />
<br />
I'm going to file a bug in bugzilla.<br />
<br />
=== Uploading ===<br />
'''UPLOAD''' using the format: wikimediacommons-<year><month><br />
<br />
E.g. wikimediacommons-200601 for January 2006 grab.<br />
<br />
If you can, add it to the WikiTeam collection, or else just tag it with the wikiteam keyword and it will be added later on.<br />
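As an illustration, uploading a finished month with the internetarchive Python library might look like this (the identifier follows the naming scheme above; the file pattern and metadata values are just an example, and <tt>ia configure</tt> must have been run beforehand):<br />
<pre><br />
# Sketch only: upload one month's zips and csv logs to the item wikimediacommons-200601<br />
import glob<br />
from internetarchive import upload<br />
<br />
item = "wikimediacommons-200601"<br />
files = sorted(glob.glob("2006-01-*.zip")) + sorted(glob.glob("2006-01-*.csv"))<br />
upload(item, files=files,<br />
       metadata={"title": "Wikimedia Commons files uploaded in January 2006",<br />
                 "subject": "wikiteam"})<br />
</pre><br />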
<br />
== Other dumps ==<br />
There is no public dump of all images. [[WikiTeam]] is working on a scraper (see section above).<br />
<br />
Pictures of the Year (best ones):<br />
* [http://download.wikimedia.org/other/poty/poty2006.zip 2006] ([http://burnbit.com/torrent/177023/poty2006_zip torrent]) ([http://www.archive.org/details/poty2006 IA])<br />
* [http://download.wikimedia.org/other/poty/poty2007.zip 2007] ([http://burnbit.com/torrent/177024/poty2007_zip torrent]) ([http://www.archive.org/details/poty2007 IA])<br />
* [http://download.wikimedia.org/other/poty/2009 2009] ([http://www.archive.org/details/poty2009 IA])<br />
* [http://download.wikimedia.org/other/poty/2010 2010] ([http://www.archive.org/details/poty2010 IA])<br />
<br />
== Featured images ==<br />
<br />
Wikimedia Commons contains a lot of [http://commons.wikimedia.org/wiki/Category:Featured_pictures_on_Wikimedia_Commons high-quality images].<br />
<br />
[[File:Featured pictures on Wikimedia Commons - Wikimedia Commons 1294011879617.png|500px]]<br />
<br />
== Size stats ==<br />
Combined image sizes hosted in Wikimedia Commons sorted by month.<br />
<pre><br />
date sum(img_size) in bytes<br />
2003-1 1360188<br />
2004-10 637349207<br />
2004-11 726517177<br />
2004-12 1503501023<br />
2004-9 188850959<br />
2005-1 1952816194<br />
2005-10 17185495206<br />
2005-11 9950998969<br />
2005-12 11430418722<br />
2005-2 3118680401<br />
2005-3 3820401370<br />
2005-4 5476827971<br />
2005-5 10998180401<br />
2005-6 7160629133<br />
2005-7 9206024659<br />
2005-8 12591218859<br />
2005-9 14060418086<br />
2006-1 15433548270<br />
2006-10 33574470896<br />
2006-11 34231957288<br />
2006-12 30607951770<br />
2006-2 14952310277<br />
2006-3 19415486302<br />
2006-4 23041609453<br />
2006-5 29487911752<br />
2006-6 29856352192<br />
2006-7 32257412994<br />
2006-8 50940607926<br />
2006-9 37624697336<br />
2007-1 40654722866<br />
2007-10 89872715966<br />
2007-11 81975793043<br />
2007-12 75515001911<br />
2007-2 39452895714<br />
2007-3 53706627561<br />
2007-4 72917771224<br />
2007-5 72944518827<br />
2007-6 63504951958<br />
2007-7 76230887667<br />
2007-8 91290158697<br />
2007-9 100120203171<br />
2008-1 84582810181<br />
2008-10 122360827827<br />
2008-11 116290099578<br />
2008-12 126446332364<br />
2008-2 77416420840<br />
2008-3 89120317630<br />
2008-4 98180062150<br />
2008-5 117840970706<br />
2008-6 100352888576<br />
2008-7 128266650486<br />
2008-8 130452484462<br />
2008-9 120247362867<br />
2009-1 127226957021<br />
2009-10 345591510325<br />
2009-11 197991117397<br />
2009-12 228003186895<br />
2009-2 125819024255<br />
2009-3 273597778760<br />
2009-4 212175602700<br />
2009-5 191651496603<br />
2009-6 195998789357<br />
2009-7 241366758346<br />
2009-8 262927838267<br />
2009-9 184963508476<br />
2010-1 226919138307<br />
2010-2 191615007774<br />
2010-3 216425793739<br />
2010-4 312177184245<br />
2010-5 312240110181<br />
2010-6 283374261868<br />
2010-7 362175217639<br />
2010-8 172072631498<br />
</pre><br />
<br />
== See also ==<br />
* [[Wikipedia]], as some Wikipedias have enabled the local upload form; English Wikipedia contains about 800,000 images, many of them under fair use<br />
<br />
== External links ==<br />
* http://commons.wikimedia.org<br />
* [http://dumps.wikimedia.org/other/poty/ Picture of the Year archives]<br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Image hosting services]]<br />
[[Category:Wikis]]</div>

Wikimedia Commons, revision of 2012-10-03T09:32:46Z by Hydriz (edit summary: /* Errors */ Adding first lest I lose the tracking file)
https://wiki.archiveteam.org/index.php?title=Wikimedia_Commons&diff=8886
<hr />
<div>{{Infobox project<br />
| title = Wikimedia Commons<br />
| image = Commons screenshot.png<br />
| description = Wikimedia Commons mainpage on 2010-12-13<br />
| URL = http://commons.wikimedia.org<br />
| project_status = {{online}}<br />
| archiving_status = {{inprogress}}<br />
}}<br />
'''Wikimedia Commons''' is a database of freely usable media files with more than 10 million files (when it held 6.8M files, the size was 6.6TB).<br />
<br />
Current size (based on January 18, 2012 estimate): '''13.3TB''', old versions '''881GB'''<br />
<br />
== Archiving process ==<br />
<br />
=== Tools ===<br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/commonsdownloader.py Download script] (Python)<br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/commonschecker.py Checker script] (Python)<br />
* [http://toolserver.org/~emijrp/commonsarchive/ Feed lists] (from 2004-09-07 to 2008-12-31; more coming soon)<br />
<br />
=== How-to ===<br />
Download the script and the feed lists (unpack it, it is a .csv file) in the same directory. Then run:<br />
* python commonsdownloader.py 2005-01-01 2005-01-10 [to download that 10 days range; it generates zip files by day and a .csv for every day]<br />
<br />
Don't forget 30th days and 31st days on some months. Also, February 29th in some years.<br />
<br />
To verify the download data use the checker script:<br />
* python commonschecker.py 2005-01-01 2005-01-10 [to check that 10 days range; it works on the .zip and .csv files, not the original folders]<br />
<br />
=== Tools required ===<br />
If you are downloading on a fresh server (e.g. a default virtual machine), you need to install <tt>zip</tt> (Ubuntu: <tt>apt-get install zip</tt>).<br />
<br />
Python should already be installed on your server; if not, just install it!<br />
<br />
The script also depends on <tt>curl</tt> and <tt>wget</tt>, which should be installed on your server by default...<br />
<br />
=== Volunteers ===<br />
<br />
:'''''Please, wait until we do some tests. Probably, long filenames bug.'''''<br />
<br />
{| class="wikitable"<br />
! Nick !! Start date !! End date !! Images !! Size !! Revision !! Status !! Notes<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2004-09-07 || 2005-06-30 || ? || ? || r643 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Check:<br />October 2004: [http://p.defau.lt/?j9Glz5ExKheNKXIaGtCOXQ]<br />November 2004: [http://p.defau.lt/?HPhH5E6LF2JYsd6vWAJk_w]<br />December 2004: [http://p.defau.lt/?EKVceBcekqKV0Zm8MTUORw]<br />January 2005: [http://p.defau.lt/?clYrnISJmvh7mh3yQBr_Tw]<br />February 2005: [http://p.defau.lt/?lMHkHMslwgqf_jOnRFT6FA]<br />March 2005: [http://p.defau.lt/?A_8Sd6BxVv_KDKtRrvL3Vg] (2005-03-23 - 2005-03-31 was downloaded differently, so its not available for checking)<br />April 2005: [http://p.defau.lt/?CEAUG6NRJ3FmdLvi1Uha0g]<br />May 2005: [http://p.defau.lt/?biJwakLb81mdrINVQYERxA]<br />June 2005: [http://p.defau.lt/?Ueuv51SG_dCChiDsRYbK1A]<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2005-07-01 || 2005-12-31 || ? || ? || r643 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Check:<br />July 2005: [http://p.defau.lt/?Y3sVIK6OWKxW5Gs3BwbGtQ]<br />August 2005: [http://p.defau.lt/?s0kxuwB9DRugPLnLAXDxWQ]<br />September 2005: [http://p.defau.lt/?vggA7OYHbyY3dtxDui6BCQ]<br />October 2005: [http://p.defau.lt/?T_0resTl7qjJs_c5bvdw7Q]<br />November 2005: [http://p.defau.lt/?WoBL_VYsoCDyOnVqJD_j9w]<br />December 2005: [http://p.defau.lt/?B0yOIcf16qSscxm3tLn1Fg]<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-01-01 || 2006-01-10 || 13198 || 4.8GB || r349 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-01-11 || 2006-06-30 || ? || ? || r349 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-07-01 || 2006-12-31 || ? || ? || r643 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Check:<br />July 2006: http://p.defau.lt/?IcMnwkx_j4H09FE_9iVgkQ<br />August 2006: http://p.defau.lt/?EmsKDtM0RXaysFNEABXJCQ<br />September 2006: http://p.defau.lt/?KBZVE9rJ9hdz4DiKnegnUw<br />October 2006: http://p.defau.lt/?f3F85TyqHtdY0LhpQk_m1w<br />November 2006: http://p.defau.lt/?VZwhzt_2doA_Z3c65_JkXg<br />December 2006: http://p.defau.lt/?Ms_TgrcyGDL_0oZQgKCNmw<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2007-01-01 || 2007-12-31 || ? || ? || r349 || ''Downloading'' || Check:<br />[http://p.defau.lt/?Xd3HIsjWEvpOW4LykpA1SA January 2007]<br />[http://p.defau.lt/?AEEN0fKRzawfC_8x2kBg5A February 2007]<br />[http://p.defau.lt/?hiRjWTKKArP2iZP8_Ti6Fg March 2007]<br />[http://p.defau.lt/?_CtoP2saoRWMFCQDoMb91w April 2007]<br />[http://p.defau.lt/?vqWYYL8qVPuC9ZDQ60pO3g May 2007]<br />[http://p.defau.lt/?8gU8m9B2grwFoUh_Py1uFA June 2007]<br />[http://p.defau.lt/?tFg2kXsZ0TqummGyuOgj6Q July 2007]<br />
|}<br />
<br />
=== Errors ===<br />
* oi_archive_name empty fields: http://commons.wikimedia.org/wiki/File:Nl-scheikundig.ogg<br />
* broken file links: http://commons.wikimedia.org/wiki/File:SMS_Bluecher.jpg#filehistory<br />
* [http://code.google.com/p/wikiteam/issues/detail?id=45 Issue 45]: 2005-03-23, 2005-08-08, 2005-09-12, 2005-09-18, 2005-09-25, 2005-11-18, 2006-02-05, 2006-02-11, 2006-02-25, 2006-03-10, 2006-03-23, 2006-04-21, 2006-04-25, 2006-05-01, 2006-07-13, 2006-07-30, 2006-08-02, 2006-08-05, 2006-08-13, 2006-09-12, 2006-10-22, 2006-10-26, 2006-11-23, 2006-12-06, 2006-12-13, 2006-12-17.<br />
* Also issue 45: 2007-01-01, 2007-01-06, 2007-01-14, 2007-01-15, 2007-02-06, 2007-02-13, 2007-02-22, 2007-02-26, 2007-03-07, 2007-03-13, 2007-03-25, 2007-03-30, 2007-04-12, 2007-04-14, 2007-04-20, 2007-05-04, 2007-05-08, 2007-05-10, 2007-05-29, 2007-06-05, 2007-06-22.<br />
<br />
I'm going to file a bug in bugzilla.<br />
<br />
=== Uploading ===<br />
'''UPLOAD''' using the format: wikimediacommons-<year><month><br />
<br />
E.g. wikimediacommons-200601 for January 2006 grab.<br />
<br />
If you can, add it into the WikiTeam collection, or else just tag it with the wikiteam keyword, and it will be added in later on.<br />
<br />
== Other dumps ==<br />
There is no public dump of all images. [[WikiTeam]] is working on a scraper (see section above).<br />
<br />
Pictures of the Year (best ones):<br />
* [http://download.wikimedia.org/other/poty/poty2006.zip 2006] ([http://burnbit.com/torrent/177023/poty2006_zip torrent]) ([http://www.archive.org/details/poty2006 IA])<br />
* [http://download.wikimedia.org/other/poty/poty2007.zip 2007] ([http://burnbit.com/torrent/177024/poty2007_zip torrent]) ([http://www.archive.org/details/poty2007 IA])<br />
* [http://download.wikimedia.org/other/poty/2009 2009] ([http://www.archive.org/details/poty2009 IA])<br />
* [http://download.wikimedia.org/other/poty/2010 2010] ([http://www.archive.org/details/poty2010 IA])<br />
<br />
== Featured images ==<br />
<br />
Wikimedia Commons contains a lot of [http://commons.wikimedia.org/wiki/Category:Featured_pictures_on_Wikimedia_Commons high-quality images].<br />
<br />
[[File:Featured pictures on Wikimedia Commons - Wikimedia Commons 1294011879617.png|500px]]<br />
<br />
== Size stats ==<br />
Combined image sizes hosted in Wikimedia Commons sorted by month.<br />
<pre><br />
date sum(img_size) in bytes<br />
2003-1 1360188<br />
2004-10 637349207<br />
2004-11 726517177<br />
2004-12 1503501023<br />
2004-9 188850959<br />
2005-1 1952816194<br />
2005-10 17185495206<br />
2005-11 9950998969<br />
2005-12 11430418722<br />
2005-2 3118680401<br />
2005-3 3820401370<br />
2005-4 5476827971<br />
2005-5 10998180401<br />
2005-6 7160629133<br />
2005-7 9206024659<br />
2005-8 12591218859<br />
2005-9 14060418086<br />
2006-1 15433548270<br />
2006-10 33574470896<br />
2006-11 34231957288<br />
2006-12 30607951770<br />
2006-2 14952310277<br />
2006-3 19415486302<br />
2006-4 23041609453<br />
2006-5 29487911752<br />
2006-6 29856352192<br />
2006-7 32257412994<br />
2006-8 50940607926<br />
2006-9 37624697336<br />
2007-1 40654722866<br />
2007-10 89872715966<br />
2007-11 81975793043<br />
2007-12 75515001911<br />
2007-2 39452895714<br />
2007-3 53706627561<br />
2007-4 72917771224<br />
2007-5 72944518827<br />
2007-6 63504951958<br />
2007-7 76230887667<br />
2007-8 91290158697<br />
2007-9 100120203171<br />
2008-1 84582810181<br />
2008-10 122360827827<br />
2008-11 116290099578<br />
2008-12 126446332364<br />
2008-2 77416420840<br />
2008-3 89120317630<br />
2008-4 98180062150<br />
2008-5 117840970706<br />
2008-6 100352888576<br />
2008-7 128266650486<br />
2008-8 130452484462<br />
2008-9 120247362867<br />
2009-1 127226957021<br />
2009-10 345591510325<br />
2009-11 197991117397<br />
2009-12 228003186895<br />
2009-2 125819024255<br />
2009-3 273597778760<br />
2009-4 212175602700<br />
2009-5 191651496603<br />
2009-6 195998789357<br />
2009-7 241366758346<br />
2009-8 262927838267<br />
2009-9 184963508476<br />
2010-1 226919138307<br />
2010-2 191615007774<br />
2010-3 216425793739<br />
2010-4 312177184245<br />
2010-5 312240110181<br />
2010-6 283374261868<br />
2010-7 362175217639<br />
2010-8 172072631498<br />
</pre><br />
<br />
== See also ==<br />
* [[Wikipedia]], as some Wikipedias have enabled the local upload form; English Wikipedia contains about 800,000 images, many of them under fair use<br />
<br />
== External links ==<br />
* http://commons.wikimedia.org<br />
* [http://dumps.wikimedia.org/other/poty/ Picture of the Year archives]<br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Image hostings]]<br />
[[Category:Wikis]]</div>

Wikimedia Commons, revision of 2012-09-02T03:39:19Z by Hydriz (edit summary: /* Volunteers */ +July 2007)
https://wiki.archiveteam.org/index.php?title=Wikimedia_Commons&diff=8838
<hr />
<div>{{Infobox project<br />
| title = Wikimedia Commons<br />
| image = Commons screenshot.png<br />
| description = Wikimedia Commons mainpage on 2010-12-13<br />
| URL = http://commons.wikimedia.org<br />
| project_status = {{online}}<br />
| archiving_status = {{inprogress}}<br />
}}<br />
'''Wikimedia Commons''' is a database of freely usable media files, now holding more than 10 million of them (when it held 6.8 million files, their combined size was 6.6TB).<br />
<br />
Current size (based on a January 18, 2012 estimate): '''13.3TB''', plus '''881GB''' of old file versions<br />
<br />
== Archiving process ==<br />
<br />
=== Tools ===<br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/commonsdownloader.py Download script] (Python)<br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/commonschecker.py Checker script] (Python)<br />
* [http://toolserver.org/~emijrp/commonsarchive/ Feed lists] (from 2004-09-07 to 2008-12-31; more coming soon)<br />
<br />
=== How-to ===<br />
Download the script and the feed lists into the same directory (unpack each list; it is a .csv file). Then run:<br />
* python commonsdownloader.py 2005-01-01 2005-01-10 [to download that 10-day range; it generates a zip file and a .csv for every day]<br />
<br />
Don't forget that some months have a 30th and a 31st day, and that some years have a February 29th.<br />
<br />
To verify the downloaded data, use the checker script:<br />
* python commonschecker.py 2005-01-01 2005-01-10 [to check that 10-day range; it works on the .zip and .csv files, not the original folders]<br />
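<br />
Both scripts take a start date and an end date, so a whole month can be driven from a small wrapper. A minimal sketch (assuming both scripts and the unpacked feed .csv files sit in the current directory; the archive_month helper is our own illustration, not part of the WikiTeam tools) that also takes care of 30/31-day months and leap years:<br />
<pre><br />
import calendar<br />
import subprocess<br />
<br />
def archive_month(year, month):<br />
    # monthrange() returns (weekday of the 1st, number of days in the month),<br />
    # so 30/31-day months and leap years are handled automatically.<br />
    days = calendar.monthrange(year, month)[1]<br />
    first = "%d-%02d-01" % (year, month)<br />
    last = "%d-%02d-%02d" % (year, month, days)<br />
    # Download the whole month, then verify the generated .zip/.csv files.<br />
    subprocess.check_call(["python", "commonsdownloader.py", first, last])<br />
    subprocess.check_call(["python", "commonschecker.py", first, last])<br />
<br />
archive_month(2005, 1)  # downloads and checks 2005-01-01 .. 2005-01-31<br />
</pre><br />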
<br />
=== Tools required ===<br />
If you are downloading on a freshly installed server (e.g. a default virtual machine), you need to install <tt>zip</tt> (Ubuntu: <tt>apt-get install zip</tt>).<br />
<br />
Python should already be installed on your server; if not, install it.<br />
<br />
The scripts also depend on <tt>curl</tt> and <tt>wget</tt>, which should be installed on your server by default.<br />
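<br />
A quick pre-flight check, sketched here in Python as an illustration (it is not part of the WikiTeam scripts), that the external tools the downloader shells out to are actually present:<br />
<pre><br />
import subprocess<br />
<br />
# `which` exits non-zero when a tool is not on the PATH.<br />
missing = [tool for tool in ("zip", "curl", "wget")<br />
           if subprocess.call(["which", tool]) != 0]<br />
if missing:<br />
    raise SystemExit("Missing dependencies: " + ", ".join(missing))<br />
</pre><br />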
<br />
=== Volunteers ===<br />
<br />
:'''''Please wait until we finish some tests. There is probably a bug with long filenames.'''''<br />
<br />
{| class="wikitable"<br />
! Nick !! Start date !! End date !! Images !! Size !! Revision !! Status !! Notes<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2004-09-07 || 2005-06-30 || ? || ? || r643 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Check:<br />October 2004: [http://p.defau.lt/?j9Glz5ExKheNKXIaGtCOXQ]<br />November 2004: [http://p.defau.lt/?HPhH5E6LF2JYsd6vWAJk_w]<br />December 2004: [http://p.defau.lt/?EKVceBcekqKV0Zm8MTUORw]<br />January 2005: [http://p.defau.lt/?clYrnISJmvh7mh3yQBr_Tw]<br />February 2005: [http://p.defau.lt/?lMHkHMslwgqf_jOnRFT6FA]<br />March 2005: [http://p.defau.lt/?A_8Sd6BxVv_KDKtRrvL3Vg] (2005-03-23 - 2005-03-31 was downloaded differently, so its not available for checking)<br />April 2005: [http://p.defau.lt/?CEAUG6NRJ3FmdLvi1Uha0g]<br />May 2005: [http://p.defau.lt/?biJwakLb81mdrINVQYERxA]<br />June 2005: [http://p.defau.lt/?Ueuv51SG_dCChiDsRYbK1A]<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2005-07-01 || 2005-12-31 || ? || ? || r643 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Check:<br />July 2005: [http://p.defau.lt/?Y3sVIK6OWKxW5Gs3BwbGtQ]<br />August 2005: [http://p.defau.lt/?s0kxuwB9DRugPLnLAXDxWQ]<br />September 2005: [http://p.defau.lt/?vggA7OYHbyY3dtxDui6BCQ]<br />October 2005: [http://p.defau.lt/?T_0resTl7qjJs_c5bvdw7Q]<br />November 2005: [http://p.defau.lt/?WoBL_VYsoCDyOnVqJD_j9w]<br />December 2005: [http://p.defau.lt/?B0yOIcf16qSscxm3tLn1Fg]<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-01-01 || 2006-01-10 || 13198 || 4.8GB || r349 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-01-11 || 2006-06-30 || ? || ? || r349 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-07-01 || 2006-12-31 || ? || ? || r643 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Check:<br />July 2006: http://p.defau.lt/?IcMnwkx_j4H09FE_9iVgkQ<br />August 2006: http://p.defau.lt/?EmsKDtM0RXaysFNEABXJCQ<br />September 2006: http://p.defau.lt/?KBZVE9rJ9hdz4DiKnegnUw<br />October 2006: http://p.defau.lt/?f3F85TyqHtdY0LhpQk_m1w<br />November 2006: http://p.defau.lt/?VZwhzt_2doA_Z3c65_JkXg<br />December 2006: http://p.defau.lt/?Ms_TgrcyGDL_0oZQgKCNmw<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2007-01-01 || 2007-12-31 || ? || ? || r349 || ''Downloading'' || Check:<br />[http://p.defau.lt/?Xd3HIsjWEvpOW4LykpA1SA January 2007]<br />[http://p.defau.lt/?AEEN0fKRzawfC_8x2kBg5A February 2007]<br />[http://p.defau.lt/?hiRjWTKKArP2iZP8_Ti6Fg March 2007]<br />[http://p.defau.lt/?_CtoP2saoRWMFCQDoMb91w April 2007]<br />[http://p.defau.lt/?vqWYYL8qVPuC9ZDQ60pO3g May 2007]<br />[http://p.defau.lt/?8gU8m9B2grwFoUh_Py1uFA June 2007]<br />[http://p.defau.lt/?tFg2kXsZ0TqummGyuOgj6Q July 2007]<br />
|}<br />
<br />
=== Errors ===<br />
* oi_archive_name empty fields: http://commons.wikimedia.org/wiki/File:Nl-scheikundig.ogg<br />
* broken file links: http://commons.wikimedia.org/wiki/File:SMS_Bluecher.jpg#filehistory<br />
* [http://code.google.com/p/wikiteam/issues/detail?id=45 Issue 45]: 2005-03-23, 2005-08-08, 2005-09-12, 2005-09-18, 2005-09-25, 2005-11-18, 2006-02-05, 2006-02-11, 2006-02-25, 2006-03-10, 2006-03-23, 2006-04-21, 2006-04-25, 2006-05-01, 2006-07-13, 2006-07-30, 2006-08-02, 2006-08-05, 2006-08-13, 2006-09-12, 2006-10-22, 2006-10-26, 2006-11-23, 2006-12-06, 2006-12-13, 2006-12-17.<br />
* Also issue 45: 2007-01-01, 2007-01-06, 2007-01-14, 2007-01-15, 2007-02-06, 2007-02-13, 2007-02-22, 2007-02-26.<br />
<br />
I'm going to file a bug in Bugzilla.<br />
<br />
=== Uploading ===<br />
'''UPLOAD''' using the format: wikimediacommons-<year><month><br />
<br />
E.g. wikimediacommons-200601 for the January 2006 grab.<br />
<br />
If you can, add it to the WikiTeam collection; otherwise just tag it with the wikiteam keyword and it will be added later.<br />
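<br />
Deriving the identifier from the dump date and uploading can be scripted. A hypothetical sketch using the internetarchive Python library (the library, the per-day file names and the metadata fields are our assumptions; the archive.org web uploader or any other route works just as well):<br />
<pre><br />
from internetarchive import upload<br />
<br />
def item_name(year, month):<br />
    # wikimediacommons-<year><month>, e.g. wikimediacommons-200601<br />
    return "wikimediacommons-%d%02d" % (year, month)<br />
<br />
identifier = item_name(2006, 1)<br />
upload(identifier,<br />
       files=["2006-01-01.zip", "2006-01-01.csv"],  # one zip and one .csv per day<br />
       metadata={"subject": "wikiteam",              # the wikiteam keyword<br />
                 "title": "Wikimedia Commons grab " + identifier})<br />
</pre><br />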
<br />
== Other dumps ==<br />
There is no public dump of all images. [[WikiTeam]] is working on a scraper (see section above).<br />
<br />
Pictures of the Year (best ones):<br />
* [http://download.wikimedia.org/other/poty/poty2006.zip 2006] ([http://burnbit.com/torrent/177023/poty2006_zip torrent]) ([http://www.archive.org/details/poty2006 IA])<br />
* [http://download.wikimedia.org/other/poty/poty2007.zip 2007] ([http://burnbit.com/torrent/177024/poty2007_zip torrent]) ([http://www.archive.org/details/poty2007 IA])<br />
* [http://download.wikimedia.org/other/poty/2009 2009] ([http://www.archive.org/details/poty2009 IA])<br />
* [http://download.wikimedia.org/other/poty/2010 2010] ([http://www.archive.org/details/poty2010 IA])<br />
<br />
== Featured images ==<br />
<br />
Wikimedia Commons contains a lot of [http://commons.wikimedia.org/wiki/Category:Featured_pictures_on_Wikimedia_Commons high-quality images].<br />
<br />
[[File:Featured pictures on Wikimedia Commons - Wikimedia Commons 1294011879617.png|500px]]<br />
<br />
== Size stats ==<br />
Combined size of the images hosted on Wikimedia Commons, grouped by month.<br />
<pre><br />
date sum(img_size) in bytes<br />
2003-1 1360188<br />
2004-10 637349207<br />
2004-11 726517177<br />
2004-12 1503501023<br />
2004-9 188850959<br />
2005-1 1952816194<br />
2005-10 17185495206<br />
2005-11 9950998969<br />
2005-12 11430418722<br />
2005-2 3118680401<br />
2005-3 3820401370<br />
2005-4 5476827971<br />
2005-5 10998180401<br />
2005-6 7160629133<br />
2005-7 9206024659<br />
2005-8 12591218859<br />
2005-9 14060418086<br />
2006-1 15433548270<br />
2006-10 33574470896<br />
2006-11 34231957288<br />
2006-12 30607951770<br />
2006-2 14952310277<br />
2006-3 19415486302<br />
2006-4 23041609453<br />
2006-5 29487911752<br />
2006-6 29856352192<br />
2006-7 32257412994<br />
2006-8 50940607926<br />
2006-9 37624697336<br />
2007-1 40654722866<br />
2007-10 89872715966<br />
2007-11 81975793043<br />
2007-12 75515001911<br />
2007-2 39452895714<br />
2007-3 53706627561<br />
2007-4 72917771224<br />
2007-5 72944518827<br />
2007-6 63504951958<br />
2007-7 76230887667<br />
2007-8 91290158697<br />
2007-9 100120203171<br />
2008-1 84582810181<br />
2008-10 122360827827<br />
2008-11 116290099578<br />
2008-12 126446332364<br />
2008-2 77416420840<br />
2008-3 89120317630<br />
2008-4 98180062150<br />
2008-5 117840970706<br />
2008-6 100352888576<br />
2008-7 128266650486<br />
2008-8 130452484462<br />
2008-9 120247362867<br />
2009-1 127226957021<br />
2009-10 345591510325<br />
2009-11 197991117397<br />
2009-12 228003186895<br />
2009-2 125819024255<br />
2009-3 273597778760<br />
2009-4 212175602700<br />
2009-5 191651496603<br />
2009-6 195998789357<br />
2009-7 241366758346<br />
2009-8 262927838267<br />
2009-9 184963508476<br />
2010-1 226919138307<br />
2010-2 191615007774<br />
2010-3 216425793739<br />
2010-4 312177184245<br />
2010-5 312240110181<br />
2010-6 283374261868<br />
2010-7 362175217639<br />
2010-8 172072631498<br />
</pre><br />
<br />
== See also ==<br />
* [[Wikipedia]]: some Wikipedias have the local upload form enabled; the English Wikipedia contains about 800,000 images, many of them under fair use<br />
<br />
== External links ==<br />
* http://commons.wikimedia.org<br />
* [http://dumps.wikimedia.org/other/poty/ Picture of the Year archives]<br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Image hostings]]<br />
[[Category:Wikis]]</div>Hydrizhttps://wiki.archiveteam.org/index.php?title=WikiTeam&diff=8743WikiTeam2012-08-16T12:33:15Z<p>Hydriz: </p>
<hr />
<div><center><big>'''We save wikis, from Wikipedia to the tiniest wikis'''<br/>[http://code.google.com/p/wikiteam/downloads/list?can=1 130+ wikis saved to date]</big></center><br />
{{Infobox project<br />
| title = WikiTeam<br />
| image = Wikiteam.jpg<br />
| description = WikiTeam, a set of tools for wiki preservation and a repository of wikis<br />
| URL = http://code.google.com/p/wikiteam<br />
| project_status = {{online}}<br />
| archiving_status = {{inprogress}}<br />
| irc = wikiteam<br />
}}<br />
<br />
Welcome to '''WikiTeam'''. A '''wiki''' is a website that allows the creation and editing of any number of interlinked web pages, generally used to store information on a specific subject or subjects. Editing is done in an ordinary web browser, using either a simplified markup language (for example, wikitext) or a WYSIWYG (what-you-see-is-what-you-get) text editor.<br />
<br />
Examples of huge wikis:<br />
* '''[[Wikipedia]]''' - arguably the largest and one of the oldest wikis on the planet. It offers public backups: http://dumps.wikimedia.org<br />
* '''[[Wikimedia Commons]]''' - a wiki of media files available for free use. It offers public backups: http://dumps.wikimedia.org<br />
** But there is no image dump available, only the image descriptions<br />
* '''[[Wikia]]''' - a website that allows the creation and hosting of wikis. It offers public backups: http://wiki-stats.wikia.com<br />
<br />
There are also '''[[List of wikifarms|several wikifarms]]''' with hundreds of wikis.<br />
<br />
Most of the wikis don't offer public backups. How bad!<br />
<br />
== Tools and source code ==<br />
=== Official WikiTeam tools ===<br />
* [http://code.google.com/p/wikiteam/ WikiTeam Google Code repository]<br />
* '''[http://code.google.com/p/wikiteam/source/browse/trunk/dumpgenerator.py dumpgenerator.py] to download MediaWiki wikis:''' <tt>python dumpgenerator.py --api=http://archiveteam.org/api.php --xml --images</tt><br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/wikipediadownloader.py wikipediadownloader.py] to download Wikipedia dumps from download.wikimedia.org: <tt>python wikipediadownloader.py</tt><br />
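<br />
If you have many wikis to preserve, the dumpgenerator.py command above can be driven from a small Python loop. A rough sketch (the wikis-to-save.txt list file is our own convention; only the documented --api, --xml and --images flags are used):<br />
<pre><br />
import subprocess<br />
<br />
# One MediaWiki api.php URL per line, e.g. taken from the TODO lists below.<br />
with open("wikis-to-save.txt") as f:<br />
    wikis = [line.strip() for line in f if line.strip()]<br />
<br />
for api in wikis:<br />
    # Full-history XML dump plus all images for each wiki.<br />
    subprocess.call(["python", "dumpgenerator.py",<br />
                     "--api=" + api, "--xml", "--images"])<br />
</pre><br />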
<br />
=== Other ===<br />
* [http://dl.dropbox.com/u/63233/Wikitravel/Source%20Code%20and%20tools/Source%20Code%20and%20tools.7z Scripts of a guy who saved Wikitravel]<br />
* [http://www.communitywiki.org/en/BackupThisWiki OddMuseWiki backup]<br />
* UseModWiki: use wget/curl and [http://www.usemod.com/cgi-bin/wiki.pl?WikiPatches/RawMode raw mode] (might have a different URL scheme, like [http://meatballwiki.org/wiki/action=browse&id=TheTippingPoint&raw=1 this])<br />
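<br />
UseModWiki raw mode can be fetched with anything that speaks HTTP. A tiny Python 2 sketch (the URL is just the MeatballWiki example linked above; other UseModWiki sites use a different URL scheme):<br />
<pre><br />
import urllib2<br />
<br />
# MeatballWiki-style raw-mode URL; adjust the scheme for other UseModWiki sites.<br />
url = "http://meatballwiki.org/wiki/action=browse&id=TheTippingPoint&raw=1"<br />
raw_text = urllib2.urlopen(url).read()<br />
open("TheTippingPoint.txt", "w").write(raw_text)<br />
</pre><br />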
<br />
{{-}}<br />
<br />
== Wiki dumps ==<br />
For a more detailed list, [http://code.google.com/p/wikiteam/downloads/list?can=1 visit the download section] on Google Code.<br />
<br />
Another mirror of MediaWiki dumps is available [http://mirrors.sdboyd56.com/WikiTeam/index.html here] on [http://www.archiveteam.org/index.php?title=User:Sdboyd Scott's] website. More dumps are available as a collection at the [http://www.archive.org/details/wikiteam Internet Archive].<br />
<br />
TODO lists:<br />
* [[WikiTeam/Sites using MediaWiki (English)]]<br />
* [[WikiTeam/Sites using MediaWiki (Multilingual)]]<br />
* Backup your favorite wikis or leave the URL [[Talk:WikiTeam|here]].<br />
<br />
{| class="wikitable"<br />
| colspan=2 | '''Legend'''<br />
|-<br />
| style="background: lightgreen" |&nbsp;&nbsp;&nbsp;&nbsp;<br />
| Good<br />
|-<br />
| style="background: lightyellow" |&nbsp;&nbsp;&nbsp;&nbsp;<br />
| Could be better<br />
|-<br />
| style="background: lightcoral" |&nbsp;&nbsp;&nbsp;&nbsp;<br />
| Bad<br />
|-<br />
| &nbsp;&nbsp;&nbsp;&nbsp;<br />
| Unknown<br />
|-<br />
|}<br />
{| class="wikitable" border=1 width=99% style="text-align: center;"<br />
! Wiki !! Wiki is online? !! Dumps available? (official or home-made) !! Comments/Details !! Saved by us? Who? Where?<br />
|-<br />
| [http://s23.org/wikistats/anarchopedias_html.php Anarchopedias] || style="background: lightgreen" | Yes || style="background: lightyellow" | Official: no. Home-made: [http://www.mediafire.com/file/t73az9cwhzco2wb/Anarchopedia_Jun2011.7z Yes] || - || idiolect<br />
|-<br />
| [http://archiveteam.org Archive Team Wiki] || style="background: lightgreen" | Yes || style="background: lightyellow" | Official: no. Home-made: [http://code.google.com/p/wikiteam/downloads/list?can=1&q=archiveteam yes] || - || WikiTeam <br />
|-<br />
| Bulbapedia || style="background: lightgreen" | Yes || style="background: lightcoral" | Official: no. Home-made: no || - || dr-spangle is working on it with a self-built PHP downloader<br />
|-<br />
| [[Citizendium]] || style="background: lightgreen" | Yes || style="background: lightyellow" | Official: [http://en.citizendium.org/wiki/CZ:Downloads daily] (no full history). Home-made: [[Citizendium|yes]], April 2011 || style="background: lightyellow" | No image dumps available || -<br />
|-<br />
| [http://s23.org/wikistats/editthis_html.php EditThis] || style="background: lightgreen" | Yes || style="background: lightyellow" | Official: no. Home-made: in progress || - || - <br />
|-<br />
| enciclopedia.us.es || style="background: lightgreen" | Yes || style="background: lightcoral" | Official: no. Home-made: no || style="background: lightyellow" | Sysop sent me page text sql tables || emijrp<br />
|-<br />
| [[Encyclopedia Dramatica]] || style="background: lightcoral" | No || style="background: lightyellow" | Official: no. Home-made: partial || style="background: lightyellow" | WebEcology Project Article Dump (~9000 Articles)<br />Most of the Images probably Lost || -<br />
|-<br />
| [[Encyclopedia Dramatica|Encyclopedia Dramatica.ch]]<br />(new ED) || style="background: lightgreen" | Yes || Official: ? Home-made: ? || style="background: lightyellow" | Slowly being rebuilt from old sources.<br />Should be up for a while but for who knows how long? || -<br />
|-<br />
| [http://s23.org/wikistats/gentoo_html.php Gentoo wikis] || style="background: lightgreen" | Yes || style="background: lightyellow" | Official: no. Home-made: [http://code.google.com/p/wikiteam/downloads/list?can=1&q=gentoo yes] || - || WikiTeam<br />
|-<br />
| GNUpedia || style="background: lightcoral" | No || style="background: lightcoral" | Official: no. Home-made: no || style="background: lightcoral" | No database. This "wiki encyclopedia" was only HTML pages. Only ~3 articles were sent to the mailing list. After that, the project was closed || -<br />
|-<br />
| [[MeatBall]] || style="background: lightgreen" | Yes || style="background: lightyellow" | Official: no. Home-made: [http://mirrors.sdboyd56.com/WikiTeam/meatball_wiki-20110706.7z yes] ([http://code.google.com/p/wikiteam/downloads/detail?name=meatball_wiki-20110706.7z mirror]) || style="background: lightyellow" | No histories, no xml format || SDBoyd <br />
|-<br />
| [http://s23.org/wikistats/metapedias_html.php Metapedia] || style="background: lightgreen" | Yes || style="background: lightcoral" | Official: ?. Home-made: no || - || -<br />
|-<br />
| [http://s23.org/wikistats/scoutwiki_html.php Neoseeker] aka Scout wikis || style="background: lightgreen" | Yes || style="background: lightcoral" | Official: ?. Home-made: no || - || -<br />
|-<br />
| [[Nupedia]] || style="background: lightcoral" | No || style="background: lightyellow" | Official: ?. Home-made: Yes, saved from IA || - || - <br />
|-<br />
| OmegaWiki || style="background: lightgreen" | Yes || style="background: lightgreen" | Official: [http://www.omegawiki.org/Development daily] || - || - <br />
|-<br />
| OpenStreetMap || style="background: lightgreen" | Yes || Official: Yes. Home-made: no || - || - <br />
|-<br />
| [http://s23.org/wikistats/opensuse_html.php OpenSUSE wikis] || style="background: lightgreen" | Yes || style="background: lightyellow" | Official: no. Home-made: [http://code.google.com/p/wikiteam/downloads/list?can=1&q=opensuse yes] || - || Hydriz<br />
|-<br />
| OSDev || style="background: lightgreen" | Yes || style="background: lightgreen" | Official: [http://wiki.osdev.org/OSDev_Wiki:About weekly] || - || Not yet<br />
|-<br />
| [http://tvtropes.org TV Tropes] || style="background: lightgreen" | Yes || style="background: lightyellow" | Official: No Unofficial: In progress || style="background: lightyellow" | No dump mechanism, using wget -nc -r -p -l 0 -np -w 45 -E -k -T 10 -nv -x "http://tvtropes.org" || DoubleJ<br />
|-<br />
| [http://s23.org/wikistats/uncyclomedia_html.php Uncyclomedias] || style="background: lightgreen" | Yes || - ||style="background: lightyellow" |Some dumps available on [http://download.uncyc.org download.uncyc.org]. Korean and Russian are not here as those are independently-hosted. || -<br />
|-<br />
| Wikanda || style="background: lightgreen" | Yes || style="background: lightyellow" | Official: no. Home-made: [http://code.google.com/p/wikiteam/downloads/list?can=1&q=wikanda yes] || - || emijrp<br />
|-<br />
| [[Wikia]] || style="background: lightgreen" | Yes || style="background: lightyellow" | Official: [http://wiki-stats.wikia.com/ on demand] || style="background: lightyellow" | No image dumps available || Not yet <br />
|-<br />
| [http://wikifur.com WikiFur] || style="background: lightgreen" | Yes || style="background: lightgreen" | Official: [http://dumps.wikifur.com/ yes] || style="background: lightyellow" | No image dumps available || Not yet <br />
|-<br />
| WikiHow || - || - || - || -<br />
|-<br />
| [[Wikimedia Commons]] || style="background: lightgreen" | Yes || style="background: lightgreen" | Official: [http://dumps.wikimedia.org/commonswiki/latest/ periodically] || style="background: lightyellow" | No image dumps available || Not yet <br />
|- <br />
| [[Wikipedia]] || style="background: lightgreen" | Yes || style="background: lightgreen" | Official: [http://dumps.wikimedia.org/backup-index.html periodically] || style="background: lightgreen" | [http://ftpmirror.your.org/pub/wikimedia/imagedumps/tarballs/fulls/ Yes] || Not yet <br />
|-<br />
| [http://s23.org/wikistats/wikisite_html.php Wiki-site.com] || - || - || - || -<br />
|-<br />
| WikiTravel || style="background: lightgreen" | Yes || style="background: lightyellow" | Official: [http://wikitravel.org/en/Wikitravel:Database_dump not yet]. Home-made: [http://code.google.com/p/wikiteam/downloads/list?can=1&q=wikitravel yes], another one from [http://dl.dropbox.com/u/63233/Wikitravel/Complete%20zip/WikitravelComplete14-June-2010.7z 2010-06-14] || - || WikiTeam <br />
|-<br />
| WikiWikiWeb || style="background: lightgreen" | Yes || style="background: lightyellow" | Home-made: [http://www.multiupload.com/BGGCFUHOE7 yes] || - || Ca7 <br />
|-<br />
| [http://co-forum.de/ (o:forum] || style="background: lightgreen" | Yes || style="background: lightcoral" | No || - || Not yet; still figuring out how<br />
|-<br />
| [http://www.wikiwiki.de/newwiki/pmwiki.php WikiWiki.de] || style="background: lightgreen" | Yes || style="background: lightcoral" | No || - || Not yet; still figuring out how<br />
|-<br />
| [http://www.wikiservice.at/gruender/wiki.cgi?action=HomePage GruenderWiki] || style="background: lightgreen" | Yes || style="background: lightcoral" | No || - || Not yet; still figuring out how<br />
|}<br />
<br />
=== Tips ===<br />
Some tips:<br />
* When downloading Wikipedia/Wikimedia Commons dumps, pages-meta-history.xml.7z and pages-meta-history.xml.bz2 contain the same data, but the 7z file is usually smaller (better compression ratio), so use 7z; a small illustrative sketch of this preference follows below.<br />
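<br />
The following is a minimal, unofficial sketch of that preference: given a listing of dump filenames, it drops a .bz2 file whenever the same dump is also offered as .7z. The filenames are only examples.<br />
<br />
<pre><br />
# Prefer the .7z copy of a dump whenever both compressions are listed.<br />
def prefer_7z(filenames):<br />
    names = set(filenames)<br />
    keep = []<br />
    for name in filenames:<br />
        if name.endswith('.bz2') and name[:-len('.bz2')] + '.7z' in names:<br />
            continue  # skip the larger bzip2 twin<br />
        keep.append(name)<br />
    return keep<br />
<br />
print(prefer_7z(['pages-meta-history.xml.7z', 'pages-meta-history.xml.bz2']))<br />
# ['pages-meta-history.xml.7z']<br />
</pre><br />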
<br />
=== BitTorrent downloads ===<br />
A feed of BitTorrent downloads is available for the latest files posted to the [http://code.google.com/p/wikiteam/downloads/list WikiTeam Google Code Downloads].<br />
* [http://pipes.yahoo.com/lobstor/google_code_torrent?_render=rss&project=wikiteam WikiTeam Torrent Feed] (pipes.yahoo.com)<br />
Files under 1 MB are blocked on the service generating these torrents (Burnbit.com), so not every file is available as a torrent. There may be some delay after a file is uploaded before the torrent appears on the feed. You can subscribe to this feed in your BitTorrent client for automatic downloads (this has been tested successfully in µTorrent on Windows).<br />
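<br />
For scripted monitoring rather than a BitTorrent client, the sketch below polls the same feed and prints the available torrent links. It assumes the third-party <tt>feedparser</tt> module is installed and is not an official WikiTeam tool.<br />
<br />
<pre><br />
# Poll the WikiTeam torrent feed and list the available torrents.<br />
import feedparser  # third-party: pip install feedparser<br />
<br />
FEED_URL = ('http://pipes.yahoo.com/lobstor/google_code_torrent'<br />
            '?_render=rss&project=wikiteam')<br />
<br />
for entry in feedparser.parse(FEED_URL).entries:<br />
    print(entry.title, entry.link)<br />
</pre><br />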
<br />
=== Mirrors ===<br />
<span class="plainlinks"><br />
# [https://sourceforge.net/projects/wikiteam/files/ Sourceforge] (also mirrored to another 26 mirrors)<br />
# [http://www.archive.org/details/WikiTeamMirror Internet Archive] ([http://ia700705.us.archive.org/16/items/WikiTeamMirror/ direct link] to directory)<br />
</span><br />
<br />
== Closing/In danger ==<br />
* Gentoo wikis: Error 503 Service Unavailable as of 2011-04-06 http://s23.org/wikistats/gentoo_html.php<br />
** Again up. [http://code.google.com/p/wikiteam/downloads/list?can=1&q=gentoo Saved]! [[User:Emijrp|Emijrp]] 21:30, 10 April 2011 (UTC)<br />
<br />
== See also ==<br />
* [[List of wikifarms]]<br />
<br />
== External links ==<br />
* http://wikiindex.org - A lot of wikis to save<br />
* http://wiki1001.com/ offline?<br />
* http://www.cs.brown.edu/~pavlo/mediawiki/mediawikis.csv - 20,000 wikis<br />
* http://meta.wikimedia.org/wiki/List_of_largest_wikis<br />
* http://s23.org/wikistats/<br />
* http://en.wikipedia.org/wiki/Comparison_of_wiki_farms<br />
* http://en.wikipedia.org/wiki/User:Emijrp/Wikipedia_Archive<br />
* http://blog.shoutwiki.com/<br />
* http://wikiheaven.blogspot.com/<br />
* [http://s23.org/wikistats/largest_html.php?th=15000&lines=500 List of largest wikis in the world]<br />
* [http://dumps.wikimedia.org/nostalgiawiki Dump] of [http://nostalgia.wikipedia.org/ Nostalgia], an ancient version of Wikipedia from 2001<br />
* http://code.google.com/p/wikiteam/downloads/list?can=1 many dumps<br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Archive Team]]</div>Hydrizhttps://wiki.archiveteam.org/index.php?title=Citizendium&diff=8726Citizendium2012-08-12T03:05:29Z<p>Hydriz: </p>
<hr />
<div>{{Infobox project<br />
| title = Citizendium<br />
| image = Welcome to Citizendium - Citizendium 1292887672746.png<br />
| description = Citizendium mainpage in 2010-12-21<br />
| URL = http://citizendium.org<br />
| project_status = {{online}}<br />
| archiving_status = {{saved}}<br />
}}<br />
<br />
The '''Citizendium''' is a [[wiki]]-based on-line encyclopedia that constantly "pursues the highest standards of writing, reliability, and comprehensiveness". It is basically Wikipedia with higher standards.<br />
<br />
The latest complete dump was created on 2012-08-09 and is available on the [http://archive.org/details/wiki-encitizendiumorg Internet Archive].<br />
<br />
== April 2011 dump ==<br />
Using the Wikiteam script, I made a dump of the full histories up to April 2011. They are available through [http://www.megaupload.com/?d=BO69BM9E Megaupload] and [http://www.archive.org/details/Citizendium2011-04-14Wikidump Internet Archive]. —[[User:Tom Morris|Tom Morris]] 12:07, 18 April 2011 (UTC)<br />
<br />
== External links ==<br />
* http://citizendium.org<br />
* Dumps available (but only the last version of the page, not the whole history): http://en.citizendium.org/wiki/CZ:Downloads<br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Wikis]]</div>Hydrizhttps://wiki.archiveteam.org/index.php?title=Google_Code&diff=8725Google Code2012-08-12T02:59:14Z<p>Hydriz: some expansion</p>
<hr />
<div>{{Infobox project<br />
| title = Google Code<br />
| image = Google_Code_1303511937361.png<br />
| description = <br />
| URL = {{url|1=http://code.google.com|2=Google Code}}<br />
| project_status = {{online}}<br />
| archiving_status = {{nosavedyet}}<br />
}}<br />
<br />
'''Google Code''' (AKA Project Hosting) is a software repository hosting service owned by [[Google]]. It hosts only open source software released under an open source license.<ref>[https://code.google.com/p/support/wiki/FAQ#Hosting_Your_Open_Source_Project_on_Google_Code FAQ - support - Project Hosting on Google Code FAQ - User support for Google Project Hosting - Google Project Hosting]</ref><br />
<br />
Google Code allows people to commit their code to a Subversion (SVN), Git or Mercurial repository. It has a downloads section where people can upload their software packages (with a quota limit of 4 GB, which can be increased upon request) and a wiki for projects to document their work. There is also an issue tracker for bugs in the project's software.<br />
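<br />
As an illustration of the repository side, the sketch below checks out a project's Subversion trunk using the standard <tt>PROJECT.googlecode.com/svn/trunk/</tt> URL pattern; "wikiteam" is just an example project name, and a local svn client is assumed.<br />
<br />
<pre><br />
# Check out a Google Code project's SVN trunk (example project: wikiteam).<br />
import subprocess<br />
<br />
project = 'wikiteam'<br />
url = 'http://%s.googlecode.com/svn/trunk/' % project<br />
subprocess.check_call(['svn', 'checkout', url, project])<br />
</pre><br />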
<br />
{{expand}}<br />
<br />
== Vital signs ==<br />
Looks relatively stable.<br />
<br />
== References ==<br />
<references /><br />
<br />
== External links ==<br />
* {{url|1=http://code.google.com|2=Google Code}}<br />
<br />
{{Navigation box}}</div>Hydrizhttps://wiki.archiveteam.org/index.php?title=GitHub&diff=8724GitHub2012-08-12T02:50:01Z<p>Hydriz: update!</p>
<hr />
<div>{{Infobox project<br />
| title = GitHub<br />
| logo = GitHub_logo.png<br />
| image = GitHub 1303511667338.png<br />
| description = A screen shot of the GitHub home page taken on 22 April 2011.<br />
| URL = {{url|1=https://github.com/|2=GitHub}}<br />
| project_status = {{online}}<br />
| archiving_status = {{nosavedyet}}<br />
}}<br />
<br />
'''GitHub''' is a software repository hosting service powered by Git. It does not seem to have any site issues and routinely reports full uptime (see [http://status.github.com/ site status]). Things look pretty sunny at the moment, but if disaster strikes, archiving the private repositories would be a problem.<br />
<br />
As of 12 August 2012, there are 1,963,652 people hosting over 3,460,582 repositories. Of these, [https://github.com/search?type=Repositories&q=fork%3Atrue 1,117,147 public repositories] are forks, which greatly reduces the amount of data required to archive the site.<br />
<br />
{{expand}}<br />
<br />
== Backup tools ==<br />
<br />
"git clone" is the simplest one. However, it does not get some project data that is not stored in git, including issue reports, comments, pull requests. <br />
<br />
[http://github.com/joeyh/github-backup github-backup] runs in a git repository and chases down that information, <br />
committing it to a "github" branch. It also chases down the forks and efficiently downloads them as well.<br />
<br />
[http://www.githubarchive.org/ githubarchive.org] is creating an archive of the GitHub "timeline", that is, all events such as git pushes, forks, created issues, pull requests, and so on.<br />
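<br />
As a rough illustration of what "git clone" alone misses, the sketch below clones a repository and saves its open issues from the GitHub v3 API. OWNER and REPO are placeholders, the request is unauthenticated and unpaginated, and for a real backup the github-backup tool above is the better choice.<br />
<br />
<pre><br />
# Clone a repository and save its open issue metadata (illustration only).<br />
import json<br />
import subprocess<br />
import urllib2  # Python 2; use urllib.request on Python 3<br />
<br />
OWNER, REPO = 'example-owner', 'example-repo'  # placeholders<br />
<br />
subprocess.check_call(['git', 'clone',<br />
                       'https://github.com/%s/%s.git' % (OWNER, REPO)])<br />
<br />
issues = json.load(urllib2.urlopen(<br />
    'https://api.github.com/repos/%s/%s/issues' % (OWNER, REPO)))<br />
with open('%s-issues.json' % REPO, 'w') as f:<br />
    json.dump(issues, f, indent=2)<br />
</pre><br />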
<br />
== External links ==<br />
* {{url|1=https://github.com/|2=GitHub}}<br />
<br />
{{Navigation box}}</div>Hydrizhttps://wiki.archiveteam.org/index.php?title=Wikimedia_Commons&diff=8723Wikimedia Commons2012-08-11T16:24:00Z<p>Hydriz: February 2007 also done</p>
<hr />
<div>{{Infobox project<br />
| title = Wikimedia Commons<br />
| image = Commons screenshot.png<br />
| description = Wikimedia Commons mainpage on 2010-12-13<br />
| URL = http://commons.wikimedia.org<br />
| project_status = {{online}}<br />
| archiving_status = {{inprogress}}<br />
}}<br />
'''Wikimedia Commons''' is a database of more than 10 million freely usable media files (when it held 6.8M files, the total size was 6.6TB).<br />
<br />
Current size (based on January 18, 2012 estimate): '''13.3TB''', old versions '''881GB'''<br />
<br />
== Archiving process ==<br />
<br />
=== Tools ===<br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/commonsdownloader.py Download script] (Python)<br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/commonschecker.py Checker script] (Python)<br />
* [http://toolserver.org/~emijrp/commonsarchive/ Feed lists] (from 2004-09-07 to 2008-12-31; more coming soon)<br />
<br />
=== How-to ===<br />
Download the script and the feed lists (unpack them; each is a .csv file) into the same directory. Then run:<br />
* python commonsdownloader.py 2005-01-01 2005-01-10 [to download that 10-day range; it generates a zip file and a .csv for every day]<br />
<br />
Don't forget the 30th and 31st of longer months, and February 29th in leap years (the driver sketch after the checker command below handles these dates automatically).<br />
<br />
To verify the downloaded data use the checker script:<br />
* python commonschecker.py 2005-01-01 2005-01-10 [to check that 10-day range; it works on the .zip and .csv files, not the original folders]<br />
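<br />
A small, unofficial driver sketch for long periods: it walks the calendar in 10-day chunks and calls commonsdownloader.py for each one, so month lengths and leap days are handled by the standard library instead of by hand. It assumes the script accepts date arguments exactly as shown above.<br />
<br />
<pre><br />
# Run commonsdownloader.py over a long period in 10-day chunks.<br />
import datetime<br />
import subprocess<br />
<br />
start = datetime.date(2005, 1, 1)<br />
stop = datetime.date(2005, 12, 31)<br />
step = datetime.timedelta(days=9)  # inclusive 10-day ranges<br />
<br />
current = start<br />
while current <= stop:<br />
    chunk_end = min(current + step, stop)<br />
    subprocess.check_call(['python', 'commonsdownloader.py',<br />
                           current.isoformat(), chunk_end.isoformat()])<br />
    current = chunk_end + datetime.timedelta(days=1)<br />
</pre><br />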
<br />
=== Tools required ===<br />
If downloading on a brand-new server (e.g. a freshly provisioned virtual machine), you need to install <tt>zip</tt> (Ubuntu: <tt>apt-get install zip</tt>).<br />
<br />
Python should already be installed on your server; if not, just install it!<br />
<br />
The scripts also depend on <tt>curl</tt> and <tt>wget</tt>, which should be installed on your server by default.<br />
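<br />
A quick, optional sanity check (not part of the WikiTeam scripts) that the external tools mentioned above are actually on the PATH:<br />
<br />
<pre><br />
# Report whether zip, curl and wget are available before starting a grab.<br />
from distutils.spawn import find_executable<br />
<br />
for tool in ('zip', 'curl', 'wget'):<br />
    path = find_executable(tool)<br />
    print('%s: %s' % (tool, path if path else 'MISSING - install it first'))<br />
</pre><br />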
<br />
=== Volunteers ===<br />
<br />
:'''''Please wait until we do some tests. There is probably a bug with long filenames.'''''<br />
<br />
{| class="wikitable"<br />
! Nick !! Start date !! End date !! Images !! Size !! Revision !! Status !! Notes<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2004-09-07 || 2005-06-30 || ? || ? || r643 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Check:<br />October 2004: [http://p.defau.lt/?j9Glz5ExKheNKXIaGtCOXQ]<br />November 2004: [http://p.defau.lt/?HPhH5E6LF2JYsd6vWAJk_w]<br />December 2004: [http://p.defau.lt/?EKVceBcekqKV0Zm8MTUORw]<br />January 2005: [http://p.defau.lt/?clYrnISJmvh7mh3yQBr_Tw]<br />February 2005: [http://p.defau.lt/?lMHkHMslwgqf_jOnRFT6FA]<br />March 2005: [http://p.defau.lt/?A_8Sd6BxVv_KDKtRrvL3Vg] (2005-03-23 - 2005-03-31 was downloaded differently, so its not available for checking)<br />April 2005: [http://p.defau.lt/?CEAUG6NRJ3FmdLvi1Uha0g]<br />May 2005: [http://p.defau.lt/?biJwakLb81mdrINVQYERxA]<br />June 2005: [http://p.defau.lt/?Ueuv51SG_dCChiDsRYbK1A]<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2005-07-01 || 2005-12-31 || ? || ? || r643 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Check:<br />July 2005: [http://p.defau.lt/?Y3sVIK6OWKxW5Gs3BwbGtQ]<br />August 2005: [http://p.defau.lt/?s0kxuwB9DRugPLnLAXDxWQ]<br />September 2005: [http://p.defau.lt/?vggA7OYHbyY3dtxDui6BCQ]<br />October 2005: [http://p.defau.lt/?T_0resTl7qjJs_c5bvdw7Q]<br />November 2005: [http://p.defau.lt/?WoBL_VYsoCDyOnVqJD_j9w]<br />December 2005: [http://p.defau.lt/?B0yOIcf16qSscxm3tLn1Fg]<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-01-01 || 2006-01-10 || 13198 || 4.8GB || r349 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-01-11 || 2006-06-30 || ? || ? || r349 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-07-01 || 2006-12-31 || ? || ? || r643 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Check:<br />July 2006: http://p.defau.lt/?IcMnwkx_j4H09FE_9iVgkQ<br />August 2006: http://p.defau.lt/?EmsKDtM0RXaysFNEABXJCQ<br />September 2006: http://p.defau.lt/?KBZVE9rJ9hdz4DiKnegnUw<br />October 2006: http://p.defau.lt/?f3F85TyqHtdY0LhpQk_m1w<br />November 2006: http://p.defau.lt/?VZwhzt_2doA_Z3c65_JkXg<br />December 2006: http://p.defau.lt/?Ms_TgrcyGDL_0oZQgKCNmw<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2007-01-01 || 2007-12-31 || ? || ? || r349 || ''Downloading'' || Check:<br />[http://p.defau.lt/?Xd3HIsjWEvpOW4LykpA1SA January 2007]<br />[http://p.defau.lt/?AEEN0fKRzawfC_8x2kBg5A February 2007]<br />
|}<br />
<br />
=== Errors ===<br />
* oi_archive_name empty fields: http://commons.wikimedia.org/wiki/File:Nl-scheikundig.ogg<br />
* broken file links: http://commons.wikimedia.org/wiki/File:SMS_Bluecher.jpg#filehistory<br />
* [http://code.google.com/p/wikiteam/issues/detail?id=45 Issue 45]: 2005-03-23, 2005-08-08, 2005-09-12, 2005-09-18, 2005-09-25, 2005-11-18, 2006-02-05, 2006-02-11, 2006-02-25, 2006-03-10, 2006-03-23, 2006-04-21, 2006-04-25, 2006-05-01, 2006-07-13, 2006-07-30, 2006-08-02, 2006-08-05, 2006-08-13, 2006-09-12, 2006-10-22, 2006-10-26, 2006-11-23, 2006-12-06, 2006-12-13, 2006-12-17.<br />
* Also issue 45: 2007-01-01, 2007-01-06, 2007-01-14, 2007-01-15, 2007-02-06, 2007-02-13, 2007-02-22, 2007-02-26.<br />
<br />
I'm going to file a bug in bugzilla.<br />
<br />
=== Uploading ===<br />
'''UPLOAD''' using the format: wikimediacommons-<year><month><br />
<br />
E.g. wikimediacommons-200601 for the January 2006 grab.<br />
<br />
If you can, add it to the WikiTeam collection; otherwise just tag it with the wikiteam keyword and it will be added later on.<br />
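<br />
A one-liner sketch of the naming convention, in case you script the uploads (the helper function name is made up):<br />
<br />
<pre><br />
# Build the item name for a monthly grab, e.g. wikimediacommons-200601.<br />
def item_name(year, month):<br />
    return 'wikimediacommons-%04d%02d' % (year, month)<br />
<br />
print(item_name(2006, 1))  # wikimediacommons-200601<br />
</pre><br />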
<br />
== Other dumps ==<br />
There is no public dump of all images. [[WikiTeam]] is working on a scraper (see section above).<br />
<br />
Pictures of the Year (best ones):<br />
* [http://download.wikimedia.org/other/poty/poty2006.zip 2006] ([http://burnbit.com/torrent/177023/poty2006_zip torrent]) ([http://www.archive.org/details/poty2006 IA])<br />
* [http://download.wikimedia.org/other/poty/poty2007.zip 2007] ([http://burnbit.com/torrent/177024/poty2007_zip torrent]) ([http://www.archive.org/details/poty2007 IA])<br />
* [http://download.wikimedia.org/other/poty/2009 2009] ([http://www.archive.org/details/poty2009 IA])<br />
* [http://download.wikimedia.org/other/poty/2010 2010] ([http://www.archive.org/details/poty2010 IA])<br />
<br />
== Featured images ==<br />
<br />
Wikimedia Commons contains a lot of [http://commons.wikimedia.org/wiki/Category:Featured_pictures_on_Wikimedia_Commons images of high quality].<br />
<br />
[[File:Featured pictures on Wikimedia Commons - Wikimedia Commons 1294011879617.png|500px]]<br />
<br />
== Size stats ==<br />
Combined size of the images hosted on Wikimedia Commons, grouped by month.<br />
<pre><br />
date sum(img_size) in bytes<br />
2003-1 1360188<br />
2004-10 637349207<br />
2004-11 726517177<br />
2004-12 1503501023<br />
2004-9 188850959<br />
2005-1 1952816194<br />
2005-10 17185495206<br />
2005-11 9950998969<br />
2005-12 11430418722<br />
2005-2 3118680401<br />
2005-3 3820401370<br />
2005-4 5476827971<br />
2005-5 10998180401<br />
2005-6 7160629133<br />
2005-7 9206024659<br />
2005-8 12591218859<br />
2005-9 14060418086<br />
2006-1 15433548270<br />
2006-10 33574470896<br />
2006-11 34231957288<br />
2006-12 30607951770<br />
2006-2 14952310277<br />
2006-3 19415486302<br />
2006-4 23041609453<br />
2006-5 29487911752<br />
2006-6 29856352192<br />
2006-7 32257412994<br />
2006-8 50940607926<br />
2006-9 37624697336<br />
2007-1 40654722866<br />
2007-10 89872715966<br />
2007-11 81975793043<br />
2007-12 75515001911<br />
2007-2 39452895714<br />
2007-3 53706627561<br />
2007-4 72917771224<br />
2007-5 72944518827<br />
2007-6 63504951958<br />
2007-7 76230887667<br />
2007-8 91290158697<br />
2007-9 100120203171<br />
2008-1 84582810181<br />
2008-10 122360827827<br />
2008-11 116290099578<br />
2008-12 126446332364<br />
2008-2 77416420840<br />
2008-3 89120317630<br />
2008-4 98180062150<br />
2008-5 117840970706<br />
2008-6 100352888576<br />
2008-7 128266650486<br />
2008-8 130452484462<br />
2008-9 120247362867<br />
2009-1 127226957021<br />
2009-10 345591510325<br />
2009-11 197991117397<br />
2009-12 228003186895<br />
2009-2 125819024255<br />
2009-3 273597778760<br />
2009-4 212175602700<br />
2009-5 191651496603<br />
2009-6 195998789357<br />
2009-7 241366758346<br />
2009-8 262927838267<br />
2009-9 184963508476<br />
2010-1 226919138307<br />
2010-2 191615007774<br />
2010-3 216425793739<br />
2010-4 312177184245<br />
2010-5 312240110181<br />
2010-6 283374261868<br />
2010-7 362175217639<br />
2010-8 172072631498<br />
</pre><br />
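<br />
Note that the rows above are ordered by the month string rather than chronologically. The sketch below re-sorts them and prints a running total; it assumes the table has been saved to a plain-text file named commons_sizes.txt (header line included), which is only an example filename.<br />
<br />
<pre><br />
# Re-sort the month/size table chronologically and keep a running total.<br />
rows = []<br />
with open('commons_sizes.txt') as f:<br />
    next(f)  # skip the header line<br />
    for line in f:<br />
        month, size = line.split()<br />
        year, mon = month.split('-')<br />
        rows.append((int(year), int(mon), int(size)))<br />
<br />
total = 0<br />
for year, mon, size in sorted(rows):<br />
    total += size<br />
    print('%d-%02d  %15d  (running total: %d bytes)' % (year, mon, size, total))<br />
</pre><br />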
<br />
== See also ==<br />
* [[Wikipedia]]: some Wikipedias have the local upload form enabled; English Wikipedia contains about 800,000 images, many of them under fair use<br />
<br />
== External links ==<br />
* http://commons.wikimedia.org<br />
* [http://dumps.wikimedia.org/other/poty/ Picture of the Year archives]<br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Image hostings]]<br />
[[Category:Wikis]]</div>Hydrizhttps://wiki.archiveteam.org/index.php?title=Citizendium&diff=8722Citizendium2012-08-11T15:47:55Z<p>Hydriz: some updates about dumps</p>
<hr />
<div>{{Infobox project<br />
| title = Citizendium<br />
| image = Welcome to Citizendium - Citizendium 1292887672746.png<br />
| description = Citizendium mainpage in 2010-12-21<br />
| URL = http://citizendium.org<br />
| project_status = {{online}}<br />
| archiving_status = {{saved}}<br />
}}<br />
<br />
The '''Citizendium''' is a [[wiki]]-based on-line encyclopedia that constantly "pursues the highest standards of writing, reliability, and comprehensiveness". It is basically Wikipedia with higher standards.<br />
<br />
The latest complete dump was created on 2012-04-15 and is available on the [http://archive.org/details/wiki-encitizendiumorg Internet Archive]. An even newer dump, created on 2012-08-09, is complete but has yet to be published on the Internet Archive.<br />
<br />
== April 2011 dump ==<br />
Using the Wikiteam script, I made a dump of the full histories up to April 2011. They are available through [http://www.megaupload.com/?d=BO69BM9E Megaupload] and [http://www.archive.org/details/Citizendium2011-04-14Wikidump Internet Archive]. —[[User:Tom Morris|Tom Morris]] 12:07, 18 April 2011 (UTC)<br />
<br />
== External links ==<br />
* http://citizendium.org<br />
* Dumps available (but only the last version of the page, not the whole history): http://en.citizendium.org/wiki/CZ:Downloads<br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Wikis]]</div>Hydrizhttps://wiki.archiveteam.org/index.php?title=Wikimedia_Commons&diff=8721Wikimedia Commons2012-08-11T15:07:48Z<p>Hydriz: January 2007 done</p>
<hr />
<div>{{Infobox project<br />
| title = Wikimedia Commons<br />
| image = Commons screenshot.png<br />
| description = Wikimedia Commons mainpage on 2010-12-13<br />
| URL = http://commons.wikimedia.org<br />
| project_status = {{online}}<br />
| archiving_status = {{inprogress}}<br />
}}<br />
'''Wikimedia Commons''' is a database of freely usable media files with more than 10 million files (when it held 6.8M files, the size was 6.6TB).<br />
<br />
Current size (based on January 18, 2012 estimate): '''13.3TB''', old versions '''881GB'''<br />
<br />
== Archiving process ==<br />
<br />
=== Tools ===<br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/commonsdownloader.py Download script] (Python)<br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/commonschecker.py Checker script] (Python)<br />
* [http://toolserver.org/~emijrp/commonsarchive/ Feed lists] (from 2004-09-07 to 2008-12-31; more coming soon)<br />
<br />
=== How-to ===<br />
Download the script and the feed lists (unpack them; each is a .csv file) into the same directory. Then run:<br />
* python commonsdownloader.py 2005-01-01 2005-01-10 [to download that 10-day range; it generates a zip file and a .csv for every day]<br />
<br />
Don't forget the 30th and 31st of longer months, and February 29th in leap years.<br />
<br />
To verify the downloaded data use the checker script:<br />
* python commonschecker.py 2005-01-01 2005-01-10 [to check that 10-day range; it works on the .zip and .csv files, not the original folders]<br />
<br />
=== Tools required ===<br />
If downloading on a brand-new server (e.g. a freshly provisioned virtual machine), you need to install <tt>zip</tt> (Ubuntu: <tt>apt-get install zip</tt>).<br />
<br />
Python should already be installed on your server; if not, just install it!<br />
<br />
The scripts also depend on <tt>curl</tt> and <tt>wget</tt>, which should be installed on your server by default.<br />
<br />
=== Volunteers ===<br />
<br />
:'''''Please, wait until we do some tests. Probably, long filenames bug.'''''<br />
<br />
{| class="wikitable"<br />
! Nick !! Start date !! End date !! Images !! Size !! Revision !! Status !! Notes<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2004-09-07 || 2005-06-30 || ? || ? || r643 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Check:<br />October 2004: [http://p.defau.lt/?j9Glz5ExKheNKXIaGtCOXQ]<br />November 2004: [http://p.defau.lt/?HPhH5E6LF2JYsd6vWAJk_w]<br />December 2004: [http://p.defau.lt/?EKVceBcekqKV0Zm8MTUORw]<br />January 2005: [http://p.defau.lt/?clYrnISJmvh7mh3yQBr_Tw]<br />February 2005: [http://p.defau.lt/?lMHkHMslwgqf_jOnRFT6FA]<br />March 2005: [http://p.defau.lt/?A_8Sd6BxVv_KDKtRrvL3Vg] (2005-03-23 - 2005-03-31 was downloaded differently, so its not available for checking)<br />April 2005: [http://p.defau.lt/?CEAUG6NRJ3FmdLvi1Uha0g]<br />May 2005: [http://p.defau.lt/?biJwakLb81mdrINVQYERxA]<br />June 2005: [http://p.defau.lt/?Ueuv51SG_dCChiDsRYbK1A]<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2005-07-01 || 2005-12-31 || ? || ? || r643 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Check:<br />July 2005: [http://p.defau.lt/?Y3sVIK6OWKxW5Gs3BwbGtQ]<br />August 2005: [http://p.defau.lt/?s0kxuwB9DRugPLnLAXDxWQ]<br />September 2005: [http://p.defau.lt/?vggA7OYHbyY3dtxDui6BCQ]<br />October 2005: [http://p.defau.lt/?T_0resTl7qjJs_c5bvdw7Q]<br />November 2005: [http://p.defau.lt/?WoBL_VYsoCDyOnVqJD_j9w]<br />December 2005: [http://p.defau.lt/?B0yOIcf16qSscxm3tLn1Fg]<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-01-01 || 2006-01-10 || 13198 || 4.8GB || r349 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-01-11 || 2006-06-30 || ? || ? || r349 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-07-01 || 2006-12-31 || ? || ? || r643 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Check:<br />July 2006: http://p.defau.lt/?IcMnwkx_j4H09FE_9iVgkQ<br />August 2006: http://p.defau.lt/?EmsKDtM0RXaysFNEABXJCQ<br />September 2006: http://p.defau.lt/?KBZVE9rJ9hdz4DiKnegnUw<br />October 2006: http://p.defau.lt/?f3F85TyqHtdY0LhpQk_m1w<br />November 2006: http://p.defau.lt/?VZwhzt_2doA_Z3c65_JkXg<br />December 2006: http://p.defau.lt/?Ms_TgrcyGDL_0oZQgKCNmw<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2007-01-01 || 2007-12-31 || ? || ? || r349 || ''Downloading'' || Check:<br />[http://p.defau.lt/?Xd3HIsjWEvpOW4LykpA1SA January 2007]<br />
|}<br />
<br />
=== Errors ===<br />
* oi_archive_name empty fields: http://commons.wikimedia.org/wiki/File:Nl-scheikundig.ogg<br />
* broken file links: http://commons.wikimedia.org/wiki/File:SMS_Bluecher.jpg#filehistory<br />
* [http://code.google.com/p/wikiteam/issues/detail?id=45 Issue 45]: 2005-03-23, 2005-08-08, 2005-09-12, 2005-09-18, 2005-09-25, 2005-11-18, 2006-02-05, 2006-02-11, 2006-02-25, 2006-03-10, 2006-03-23, 2006-04-21, 2006-04-25, 2006-05-01, 2006-07-13, 2006-07-30, 2006-08-02, 2006-08-05, 2006-08-13, 2006-09-12, 2006-10-22, 2006-10-26, 2006-11-23, 2006-12-06, 2006-12-13, 2006-12-17.<br />
<br />
I'm going to file a bug in bugzilla.<br />
<br />
=== Uploading ===<br />
'''UPLOAD''' using the format: wikimediacommons-<year><month><br />
<br />
E.g. wikimediacommons-200601 for January 2006 grab.<br />
<br />
If you can, add it into the WikiTeam collection, or else just tag it with the wikiteam keyword, and it will be added in later on.<br />
<br />
== Other dumps ==<br />
There is no public dump of all images. [[WikiTeam]] is working on a scraper (see section above).<br />
<br />
Pictures of the Year (best ones):<br />
* [http://download.wikimedia.org/other/poty/poty2006.zip 2006] ([http://burnbit.com/torrent/177023/poty2006_zip torrent]) ([http://www.archive.org/details/poty2006 IA])<br />
* [http://download.wikimedia.org/other/poty/poty2007.zip 2007] ([http://burnbit.com/torrent/177024/poty2007_zip torrent]) ([http://www.archive.org/details/poty2007 IA])<br />
* [http://download.wikimedia.org/other/poty/2009 2009] ([http://www.archive.org/details/poty2009 IA])<br />
* [http://download.wikimedia.org/other/poty/2010 2010] ([http://www.archive.org/details/poty2010 IA])<br />
<br />
== Featured images ==<br />
<br />
Wikimedia Commons contains a lot of [http://commons.wikimedia.org/wiki/Category:Featured_pictures_on_Wikimedia_Commons images of high quality].<br />
<br />
[[File:Featured pictures on Wikimedia Commons - Wikimedia Commons 1294011879617.png|500px]]<br />
<br />
== Size stats ==<br />
Combined image sizes hosted in Wikimedia Commons sorted by month.<br />
<pre><br />
date sum(img_size) in bytes<br />
2003-1 1360188<br />
2004-10 637349207<br />
2004-11 726517177<br />
2004-12 1503501023<br />
2004-9 188850959<br />
2005-1 1952816194<br />
2005-10 17185495206<br />
2005-11 9950998969<br />
2005-12 11430418722<br />
2005-2 3118680401<br />
2005-3 3820401370<br />
2005-4 5476827971<br />
2005-5 10998180401<br />
2005-6 7160629133<br />
2005-7 9206024659<br />
2005-8 12591218859<br />
2005-9 14060418086<br />
2006-1 15433548270<br />
2006-10 33574470896<br />
2006-11 34231957288<br />
2006-12 30607951770<br />
2006-2 14952310277<br />
2006-3 19415486302<br />
2006-4 23041609453<br />
2006-5 29487911752<br />
2006-6 29856352192<br />
2006-7 32257412994<br />
2006-8 50940607926<br />
2006-9 37624697336<br />
2007-1 40654722866<br />
2007-10 89872715966<br />
2007-11 81975793043<br />
2007-12 75515001911<br />
2007-2 39452895714<br />
2007-3 53706627561<br />
2007-4 72917771224<br />
2007-5 72944518827<br />
2007-6 63504951958<br />
2007-7 76230887667<br />
2007-8 91290158697<br />
2007-9 100120203171<br />
2008-1 84582810181<br />
2008-10 122360827827<br />
2008-11 116290099578<br />
2008-12 126446332364<br />
2008-2 77416420840<br />
2008-3 89120317630<br />
2008-4 98180062150<br />
2008-5 117840970706<br />
2008-6 100352888576<br />
2008-7 128266650486<br />
2008-8 130452484462<br />
2008-9 120247362867<br />
2009-1 127226957021<br />
2009-10 345591510325<br />
2009-11 197991117397<br />
2009-12 228003186895<br />
2009-2 125819024255<br />
2009-3 273597778760<br />
2009-4 212175602700<br />
2009-5 191651496603<br />
2009-6 195998789357<br />
2009-7 241366758346<br />
2009-8 262927838267<br />
2009-9 184963508476<br />
2010-1 226919138307<br />
2010-2 191615007774<br />
2010-3 216425793739<br />
2010-4 312177184245<br />
2010-5 312240110181<br />
2010-6 283374261868<br />
2010-7 362175217639<br />
2010-8 172072631498<br />
</pre><br />
<br />
== See also ==<br />
* [[Wikipedia]]: some Wikipedias have the local upload form enabled; English Wikipedia contains about 800,000 images, many of them under fair use<br />
<br />
== External links ==<br />
* http://commons.wikimedia.org<br />
* [http://dumps.wikimedia.org/other/poty/ Picture of the Year archives]<br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Image hostings]]<br />
[[Category:Wikis]]</div>Hydrizhttps://wiki.archiveteam.org/index.php?title=Wikimedia_Commons&diff=8719Wikimedia Commons2012-08-10T14:08:21Z<p>Hydriz: </p>
<hr />
<div>{{Infobox project<br />
| title = Wikimedia Commons<br />
| image = Commons screenshot.png<br />
| description = Wikimedia Commons mainpage on 2010-12-13<br />
| URL = http://commons.wikimedia.org<br />
| project_status = {{online}}<br />
| archiving_status = {{inprogress}}<br />
}}<br />
'''Wikimedia Commons''' is a database of freely usable media files with more than 10 million files (when it held 6.8M files, the size was 6.6TB).<br />
<br />
Current size (based on January 18, 2012 estimate): '''13.3TB''', old versions '''881GB'''<br />
<br />
== Archiving process ==<br />
<br />
=== Tools ===<br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/commonsdownloader.py Download script] (Python)<br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/commonschecker.py Checker script] (Python)<br />
* [http://toolserver.org/~emijrp/commonsarchive/ Feed lists] (from 2004-09-07 to 2008-12-31; more coming soon)<br />
<br />
=== How-to ===<br />
Download the script and the feed lists (unpack them; each is a .csv file) into the same directory. Then run:<br />
* python commonsdownloader.py 2005-01-01 2005-01-10 [to download that 10-day range; it generates a zip file and a .csv for every day]<br />
<br />
Don't forget the 30th and 31st of longer months, and February 29th in leap years.<br />
<br />
To verify the downloaded data use the checker script:<br />
* python commonschecker.py 2005-01-01 2005-01-10 [to check that 10-day range; it works on the .zip and .csv files, not the original folders]<br />
<br />
=== Tools required ===<br />
If downloading on a brand-new server (e.g. a freshly provisioned virtual machine), you need to install <tt>zip</tt> (Ubuntu: <tt>apt-get install zip</tt>).<br />
<br />
Python should already be installed on your server; if not, just install it!<br />
<br />
The scripts also depend on <tt>curl</tt> and <tt>wget</tt>, which should be installed on your server by default.<br />
<br />
=== Volunteers ===<br />
<br />
:'''''Please, wait until we do some tests. Probably, long filenames bug.'''''<br />
<br />
{| class="wikitable"<br />
! Nick !! Start date !! End date !! Images !! Size !! Revision !! Status !! Notes<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2004-09-07 || 2005-06-30 || ? || ? || r643 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Check:<br />October 2004: [http://p.defau.lt/?j9Glz5ExKheNKXIaGtCOXQ]<br />November 2004: [http://p.defau.lt/?HPhH5E6LF2JYsd6vWAJk_w]<br />December 2004: [http://p.defau.lt/?EKVceBcekqKV0Zm8MTUORw]<br />January 2005: [http://p.defau.lt/?clYrnISJmvh7mh3yQBr_Tw]<br />February 2005: [http://p.defau.lt/?lMHkHMslwgqf_jOnRFT6FA]<br />March 2005: [http://p.defau.lt/?A_8Sd6BxVv_KDKtRrvL3Vg] (2005-03-23 - 2005-03-31 was downloaded differently, so its not available for checking)<br />April 2005: [http://p.defau.lt/?CEAUG6NRJ3FmdLvi1Uha0g]<br />May 2005: [http://p.defau.lt/?biJwakLb81mdrINVQYERxA]<br />June 2005: [http://p.defau.lt/?Ueuv51SG_dCChiDsRYbK1A]<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2005-07-01 || 2005-12-31 || ? || ? || r643 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Check:<br />July 2005: [http://p.defau.lt/?Y3sVIK6OWKxW5Gs3BwbGtQ]<br />August 2005: [http://p.defau.lt/?s0kxuwB9DRugPLnLAXDxWQ]<br />September 2005: [http://p.defau.lt/?vggA7OYHbyY3dtxDui6BCQ]<br />October 2005: [http://p.defau.lt/?T_0resTl7qjJs_c5bvdw7Q]<br />November 2005: [http://p.defau.lt/?WoBL_VYsoCDyOnVqJD_j9w]<br />December 2005: [http://p.defau.lt/?B0yOIcf16qSscxm3tLn1Fg]<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-01-01 || 2006-01-10 || 13198 || 4.8GB || r349 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-01-11 || 2006-06-30 || ? || ? || r349 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-07-01 || 2006-12-31 || ? || ? || r643 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Check:<br />July 2006: http://p.defau.lt/?IcMnwkx_j4H09FE_9iVgkQ<br />August 2006: http://p.defau.lt/?EmsKDtM0RXaysFNEABXJCQ<br />September 2006: http://p.defau.lt/?KBZVE9rJ9hdz4DiKnegnUw<br />October 2006: http://p.defau.lt/?f3F85TyqHtdY0LhpQk_m1w<br />November 2006: http://p.defau.lt/?VZwhzt_2doA_Z3c65_JkXg<br />December 2006: http://p.defau.lt/?Ms_TgrcyGDL_0oZQgKCNmw<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2007-01-01 || 2007-12-31 || ? || ? || r349 || ''Downloading'' ||<br />
|}<br />
<br />
=== Errors ===<br />
* oi_archive_name empty fields: http://commons.wikimedia.org/wiki/File:Nl-scheikundig.ogg<br />
* broken file links: http://commons.wikimedia.org/wiki/File:SMS_Bluecher.jpg#filehistory<br />
* [http://code.google.com/p/wikiteam/issues/detail?id=45 Issue 45]: 2005-03-23, 2005-08-08, 2005-09-12, 2005-09-18, 2005-09-25, 2005-11-18, 2006-02-05, 2006-02-11, 2006-02-25, 2006-03-10, 2006-03-23, 2006-04-21, 2006-04-25, 2006-05-01, 2006-07-13, 2006-07-30, 2006-08-02, 2006-08-05, 2006-08-13, 2006-09-12, 2006-10-22, 2006-10-26, 2006-11-23, 2006-12-06, 2006-12-13, 2006-12-17.<br />
<br />
I'm going to file a bug in bugzilla.<br />
<br />
=== Uploading ===<br />
'''UPLOAD''' using the format: wikimediacommons-<year><month><br />
<br />
E.g. wikimediacommons-200601 for January 2006 grab.<br />
<br />
If you can, add it into the WikiTeam collection, or else just tag it with the wikiteam keyword, and it will be added in later on.<br />
<br />
== Other dumps ==<br />
There is no public dump of all images. [[WikiTeam]] is working on a scraper (see section above).<br />
<br />
Pictures of the Year (best ones):<br />
* [http://download.wikimedia.org/other/poty/poty2006.zip 2006] ([http://burnbit.com/torrent/177023/poty2006_zip torrent]) ([http://www.archive.org/details/poty2006 IA])<br />
* [http://download.wikimedia.org/other/poty/poty2007.zip 2007] ([http://burnbit.com/torrent/177024/poty2007_zip torrent]) ([http://www.archive.org/details/poty2007 IA])<br />
* [http://download.wikimedia.org/other/poty/2009 2009] ([http://www.archive.org/details/poty2009 IA])<br />
* [http://download.wikimedia.org/other/poty/2010 2010] ([http://www.archive.org/details/poty2010 IA])<br />
<br />
== Featured images ==<br />
<br />
Wikimedia Commons contains a lot of [http://commons.wikimedia.org/wiki/Category:Featured_pictures_on_Wikimedia_Commons images of high quality].<br />
<br />
[[File:Featured pictures on Wikimedia Commons - Wikimedia Commons 1294011879617.png|500px]]<br />
<br />
== Size stats ==<br />
Combined image sizes hosted in Wikimedia Commons sorted by month.<br />
<pre><br />
date sum(img_size) in bytes<br />
2003-1 1360188<br />
2004-10 637349207<br />
2004-11 726517177<br />
2004-12 1503501023<br />
2004-9 188850959<br />
2005-1 1952816194<br />
2005-10 17185495206<br />
2005-11 9950998969<br />
2005-12 11430418722<br />
2005-2 3118680401<br />
2005-3 3820401370<br />
2005-4 5476827971<br />
2005-5 10998180401<br />
2005-6 7160629133<br />
2005-7 9206024659<br />
2005-8 12591218859<br />
2005-9 14060418086<br />
2006-1 15433548270<br />
2006-10 33574470896<br />
2006-11 34231957288<br />
2006-12 30607951770<br />
2006-2 14952310277<br />
2006-3 19415486302<br />
2006-4 23041609453<br />
2006-5 29487911752<br />
2006-6 29856352192<br />
2006-7 32257412994<br />
2006-8 50940607926<br />
2006-9 37624697336<br />
2007-1 40654722866<br />
2007-10 89872715966<br />
2007-11 81975793043<br />
2007-12 75515001911<br />
2007-2 39452895714<br />
2007-3 53706627561<br />
2007-4 72917771224<br />
2007-5 72944518827<br />
2007-6 63504951958<br />
2007-7 76230887667<br />
2007-8 91290158697<br />
2007-9 100120203171<br />
2008-1 84582810181<br />
2008-10 122360827827<br />
2008-11 116290099578<br />
2008-12 126446332364<br />
2008-2 77416420840<br />
2008-3 89120317630<br />
2008-4 98180062150<br />
2008-5 117840970706<br />
2008-6 100352888576<br />
2008-7 128266650486<br />
2008-8 130452484462<br />
2008-9 120247362867<br />
2009-1 127226957021<br />
2009-10 345591510325<br />
2009-11 197991117397<br />
2009-12 228003186895<br />
2009-2 125819024255<br />
2009-3 273597778760<br />
2009-4 212175602700<br />
2009-5 191651496603<br />
2009-6 195998789357<br />
2009-7 241366758346<br />
2009-8 262927838267<br />
2009-9 184963508476<br />
2010-1 226919138307<br />
2010-2 191615007774<br />
2010-3 216425793739<br />
2010-4 312177184245<br />
2010-5 312240110181<br />
2010-6 283374261868<br />
2010-7 362175217639<br />
2010-8 172072631498<br />
</pre><br />
<br />
== See also ==<br />
* [[Wikipedia]]: some Wikipedias have the local upload form enabled; English Wikipedia contains about 800,000 images, many of them under fair use<br />
<br />
== External links ==<br />
* http://commons.wikimedia.org<br />
* [http://dumps.wikimedia.org/other/poty/ Picture of the Year archives]<br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Image hostings]]<br />
[[Category:Wikis]]</div>Hydrizhttps://wiki.archiveteam.org/index.php?title=Wikimedia_Commons&diff=8712Wikimedia Commons2012-08-09T09:42:03Z<p>Hydriz: </p>
<hr />
<div>{{Infobox project<br />
| title = Wikimedia Commons<br />
| image = Commons screenshot.png<br />
| description = Wikimedia Commons mainpage on 2010-12-13<br />
| URL = http://commons.wikimedia.org<br />
| project_status = {{online}}<br />
| archiving_status = {{inprogress}}<br />
}}<br />
'''Wikimedia Commons''' is a database of freely usable media files with more than 10 million files (when it held 6.8M files, the size was 6.6TB).<br />
<br />
Current size (based on January 18, 2012 estimate): '''13.3TB''', old versions '''881GB'''<br />
<br />
== Archiving process ==<br />
<br />
=== Tools ===<br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/commonsdownloader.py Download script] (Python)<br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/commonschecker.py Checker script] (Python)<br />
* [http://toolserver.org/~emijrp/commonsarchive/ Feed lists] (from 2004-09-07 to 2007-12-31; more coming soon)<br />
<br />
=== How-to ===<br />
Download the script and the feed lists (unpack them; each is a .csv file) into the same directory. Then run:<br />
* python commonsdownloader.py 2005-01-01 2005-01-10 [to download that 10-day range; it generates a zip file and a .csv for every day]<br />
<br />
Don't forget the 30th and 31st of longer months, and February 29th in leap years.<br />
<br />
To verify the downloaded data use the checker script:<br />
* python commonschecker.py 2005-01-01 2005-01-10 [to check that 10-day range; it works on the .zip and .csv files, not the original folders]<br />
<br />
=== Tools required ===<br />
If downloading on a brand-new server (e.g. a freshly provisioned virtual machine), you need to install <tt>zip</tt> (Ubuntu: <tt>apt-get install zip</tt>).<br />
<br />
Python should already be installed on your server; if not, just install it!<br />
<br />
The scripts also depend on <tt>curl</tt> and <tt>wget</tt>, which should be installed on your server by default.<br />
<br />
=== Volunteers ===<br />
<br />
:'''''Please, wait until we do some tests. Probably, long filenames bug.'''''<br />
<br />
{| class="wikitable"<br />
! Nick !! Start date !! End date !! Images !! Size !! Revision !! Status !! Notes<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2004-09-07 || 2005-06-30 || ? || ? || r643 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Check:<br />October 2004: [http://p.defau.lt/?j9Glz5ExKheNKXIaGtCOXQ]<br />November 2004: [http://p.defau.lt/?HPhH5E6LF2JYsd6vWAJk_w]<br />December 2004: [http://p.defau.lt/?EKVceBcekqKV0Zm8MTUORw]<br />January 2005: [http://p.defau.lt/?clYrnISJmvh7mh3yQBr_Tw]<br />February 2005: [http://p.defau.lt/?lMHkHMslwgqf_jOnRFT6FA]<br />March 2005: [http://p.defau.lt/?A_8Sd6BxVv_KDKtRrvL3Vg] (2005-03-23 - 2005-03-31 was downloaded differently, so its not available for checking)<br />April 2005: [http://p.defau.lt/?CEAUG6NRJ3FmdLvi1Uha0g]<br />May 2005: [http://p.defau.lt/?biJwakLb81mdrINVQYERxA]<br />June 2005: [http://p.defau.lt/?Ueuv51SG_dCChiDsRYbK1A]<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2005-07-01 || 2005-12-31 || ? || ? || r643 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Check:<br />July 2005: [http://p.defau.lt/?Y3sVIK6OWKxW5Gs3BwbGtQ]<br />August 2005: [http://p.defau.lt/?s0kxuwB9DRugPLnLAXDxWQ]<br />September 2005: [http://p.defau.lt/?vggA7OYHbyY3dtxDui6BCQ]<br />October 2005: [http://p.defau.lt/?T_0resTl7qjJs_c5bvdw7Q]<br />November 2005: [http://p.defau.lt/?WoBL_VYsoCDyOnVqJD_j9w]<br />December 2005: [http://p.defau.lt/?B0yOIcf16qSscxm3tLn1Fg]<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-01-01 || 2006-01-10 || 13198 || 4.8GB || r349 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-01-11 || 2006-06-30 || ? || ? || r349 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-07-01 || 2006-12-31 || ? || ? || r643 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Check:<br />July 2006: http://p.defau.lt/?IcMnwkx_j4H09FE_9iVgkQ<br />August 2006: http://p.defau.lt/?EmsKDtM0RXaysFNEABXJCQ<br />September 2006: http://p.defau.lt/?KBZVE9rJ9hdz4DiKnegnUw<br />October 2006: http://p.defau.lt/?f3F85TyqHtdY0LhpQk_m1w<br />November 2006: http://p.defau.lt/?VZwhzt_2doA_Z3c65_JkXg<br />December 2006: http://p.defau.lt/?Ms_TgrcyGDL_0oZQgKCNmw<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2007-01-01 || 2007-12-31 || ? || ? || r349 || ''Downloading'' ||<br />
|}<br />
<br />
=== Errors ===<br />
* oi_archive_name empty fields: http://commons.wikimedia.org/wiki/File:Nl-scheikundig.ogg<br />
* broken file links: http://commons.wikimedia.org/wiki/File:SMS_Bluecher.jpg#filehistory<br />
* [http://code.google.com/p/wikiteam/issues/detail?id=45 Issue 45]: 2005-03-23, 2005-08-08, 2005-09-12, 2005-09-18, 2005-09-25, 2005-11-18, 2006-02-05, 2006-02-11, 2006-02-25, 2006-03-10, 2006-03-23, 2006-04-21, 2006-04-25, 2006-05-01, 2006-07-13, 2006-07-30, 2006-08-02, 2006-08-05, 2006-08-13, 2006-09-12, 2006-10-22, 2006-10-26, 2006-11-23, 2006-12-06, 2006-12-13, 2006-12-17.<br />
<br />
I'm going to file a bug in bugzilla.<br />
<br />
=== Uploading ===<br />
'''UPLOAD''' using the format: wikimediacommons-<year><month><br />
<br />
E.g. wikimediacommons-200601 for January 2006 grab.<br />
<br />
If you can, add it into the WikiTeam collection, or else just tag it with the wikiteam keyword, and it will be added in later on.<br />
<br />
== Other dumps ==<br />
There is no public dump of all images. [[WikiTeam]] is working on a scraper (see section above).<br />
<br />
Pictures of the Year (best ones):<br />
* [http://download.wikimedia.org/other/poty/poty2006.zip 2006] ([http://burnbit.com/torrent/177023/poty2006_zip torrent]) ([http://www.archive.org/details/poty2006 IA])<br />
* [http://download.wikimedia.org/other/poty/poty2007.zip 2007] ([http://burnbit.com/torrent/177024/poty2007_zip torrent]) ([http://www.archive.org/details/poty2007 IA])<br />
* [http://download.wikimedia.org/other/poty/2009 2009] ([http://www.archive.org/details/poty2009 IA])<br />
* [http://download.wikimedia.org/other/poty/2010 2010] ([http://www.archive.org/details/poty2010 IA])<br />
<br />
== Featured images ==<br />
<br />
Wikimedia Commons contains a lot of [http://commons.wikimedia.org/wiki/Category:Featured_pictures_on_Wikimedia_Commons images of high quality].<br />
<br />
[[File:Featured pictures on Wikimedia Commons - Wikimedia Commons 1294011879617.png|500px]]<br />
<br />
== Size stats ==<br />
Combined image sizes hosted in Wikimedia Commons sorted by month.<br />
<pre><br />
date sum(img_size) in bytes<br />
2003-1 1360188<br />
2004-10 637349207<br />
2004-11 726517177<br />
2004-12 1503501023<br />
2004-9 188850959<br />
2005-1 1952816194<br />
2005-10 17185495206<br />
2005-11 9950998969<br />
2005-12 11430418722<br />
2005-2 3118680401<br />
2005-3 3820401370<br />
2005-4 5476827971<br />
2005-5 10998180401<br />
2005-6 7160629133<br />
2005-7 9206024659<br />
2005-8 12591218859<br />
2005-9 14060418086<br />
2006-1 15433548270<br />
2006-10 33574470896<br />
2006-11 34231957288<br />
2006-12 30607951770<br />
2006-2 14952310277<br />
2006-3 19415486302<br />
2006-4 23041609453<br />
2006-5 29487911752<br />
2006-6 29856352192<br />
2006-7 32257412994<br />
2006-8 50940607926<br />
2006-9 37624697336<br />
2007-1 40654722866<br />
2007-10 89872715966<br />
2007-11 81975793043<br />
2007-12 75515001911<br />
2007-2 39452895714<br />
2007-3 53706627561<br />
2007-4 72917771224<br />
2007-5 72944518827<br />
2007-6 63504951958<br />
2007-7 76230887667<br />
2007-8 91290158697<br />
2007-9 100120203171<br />
2008-1 84582810181<br />
2008-10 122360827827<br />
2008-11 116290099578<br />
2008-12 126446332364<br />
2008-2 77416420840<br />
2008-3 89120317630<br />
2008-4 98180062150<br />
2008-5 117840970706<br />
2008-6 100352888576<br />
2008-7 128266650486<br />
2008-8 130452484462<br />
2008-9 120247362867<br />
2009-1 127226957021<br />
2009-10 345591510325<br />
2009-11 197991117397<br />
2009-12 228003186895<br />
2009-2 125819024255<br />
2009-3 273597778760<br />
2009-4 212175602700<br />
2009-5 191651496603<br />
2009-6 195998789357<br />
2009-7 241366758346<br />
2009-8 262927838267<br />
2009-9 184963508476<br />
2010-1 226919138307<br />
2010-2 191615007774<br />
2010-3 216425793739<br />
2010-4 312177184245<br />
2010-5 312240110181<br />
2010-6 283374261868<br />
2010-7 362175217639<br />
2010-8 172072631498<br />
</pre><br />
<br />
== See also ==<br />
* [[Wikipedia]]: some Wikipedias have the local upload form enabled; English Wikipedia contains about 800,000 images, many of them under fair use<br />
<br />
== External links ==<br />
* http://commons.wikimedia.org<br />
* [http://dumps.wikimedia.org/other/poty/ Picture of the Year archives]<br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Image hostings]]<br />
[[Category:Wikis]]</div>Hydrizhttps://wiki.archiveteam.org/index.php?title=ShoutWiki&diff=7996ShoutWiki2012-06-07T11:48:44Z<p>Hydriz: </p>
<hr />
<div>{{Infobox project<br />
| title = ShoutWiki<br />
| logo = ShoutWiki blocktext.png<br />
| image = <br />
| description = <br />
| URL = http://shoutwiki.com<br />
| project_status = {{online}}<br />
| archiving_status = {{saved}} (397 wikis, August 2011)<br />
| irc = wikiteam<br />
}}<br />
<br />
'''ShoutWiki''' is a [[wikifarm]]. It hosts about 800 wikis.<br />
<br />
For a list of wikis hosted in this wikifarm see: https://code.google.com/p/wikiteam/source/browse/trunk/listsofwikis<br />
<br />
== May 2012 update ==<br />
An update was sent out to all ShoutWiki wiki founders:<br />
<blockquote><br />
<p>Dear ShoutWiki user,</p><br />
<br />
<p>I am writing to you because our database tells us that you have at some point in the past, created a wiki on ShoutWiki.com through the CreateWiki interface. As some of you may be aware, there have been a number of human created issues involving the ShoutWiki site over the period of the last few years. Firstly, I wish to apologize for those issues caused by my predecessors.</p><br />
<br />
<p>Secondly, as you may know, we’re in the process of getting the site back online. Currently there are around 20 databases on the server, instead of the 500 which the master database believes we have. According to my latest predecessor, we have a backup of all contents of the previous server, wail. The point of this e-mail is simple, I would like to know how many of you have wikis, and which of those that you would like to imported to the new server. I will warn you first however, that these contents may be out of date, there are contents of 300-odd wikis at archive.org that I can attempt to reimport if its significantly newer than the database import.</p><br />
<br />
<p>I also apologise for any of you recieving this e-mail multiple times, as this is taken directly from the CreateWiki creation log database, and I am rushing to get this important e-mail to you. We will be working on fine tuning these “mailing lists” in future, including removing anyone that requested a wiki to be deleted, as you may still be getting this e-mail currently.</p><br />
<br />
<p>Thank you for your support,</p><br />
<p>– Lewis Cawte</p><br />
<p>Chief Technical Officer, ShoutWiki</p><br />
</blockquote><br />
<br />
== Backups ==<br />
* http://www.archive.org/details/shoutwiki.com (August 2011, 397 wikis)<br />
* This page says there are 774 wikis http://shoutwiki.com/wiki/ShoutWiki_Hub:About<br />
<br />
* [[ShoutWiki/Twitter account]] grab ([https://twitter.com/#!/ShoutWiki @ShoutWiki])<br />
<br />
== See also ==<br />
* [[List of wikifarms]]<br />
<br />
== External links ==<br />
* http://shoutwiki.com<br />
<br />
{{Navigation box}}</div>Hydrizhttps://wiki.archiveteam.org/index.php?title=Wikimedia_Commons&diff=7965Wikimedia Commons2012-05-31T04:19:57Z<p>Hydriz: update</p>
<hr />
<div>{{Infobox project<br />
| title = Wikimedia Commons<br />
| image = Commons screenshot.png<br />
| description = Wikimedia Commons mainpage on 2010-12-13<br />
| URL = http://commons.wikimedia.org<br />
| project_status = {{online}}<br />
| archiving_status = {{inprogress}}<br />
}}<br />
'''Wikimedia Commons''' is a database of freely usable media files with more than 10 million files (when it held 6.8M files, the size was 6.6TB).<br />
<br />
Current size (based on January 18, 2012 estimate): '''13.3TB''', old versions '''881GB'''<br />
<br />
== Archiving process ==<br />
<br />
=== Tools ===<br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/commonsdownloader.py Download script] (Python)<br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/commonschecker.py Checker script] (Python)<br />
* [http://toolserver.org/~emijrp/commonsarchive/ Feed lists] (from 2004-09-07 to 2006-12-31; more coming soon)<br />
<br />
=== How-to ===<br />
Download the script and the feed lists (unpack them; each is a .csv file) into the same directory. Then run:<br />
* python commonsdownloader.py 2005-01-01 2005-01-10 [to download that 10-day range; it generates a zip file and a .csv for every day]<br />
<br />
Don't forget the 30th and 31st of longer months, and February 29th in leap years.<br />
<br />
To verify the downloaded data use the checker script:<br />
* python commonschecker.py 2005-01-01 2005-01-10 [to check that 10-day range; it works on the .zip and .csv files, not the original folders]<br />
<br />
=== Tools required ===<br />
If downloading on a brand-new server (e.g. a freshly provisioned virtual machine), you need to install <tt>zip</tt> (Ubuntu: <tt>apt-get install zip</tt>).<br />
<br />
Python should already be installed on your server; if not, just install it!<br />
<br />
The scripts also depend on <tt>curl</tt> and <tt>wget</tt>, which should be installed on your server by default.<br />
<br />
=== Volunteers ===<br />
<br />
:'''''Please, wait until we do some tests. Probably, long filenames bug.'''''<br />
<br />
{| class="wikitable"<br />
! Nick !! Start date !! End date !! Images !! Size !! Revision !! Status !! Notes<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2004-09-07 || 2005-06-30 || ? || ? || r643 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Check:<br />October 2004: [http://p.defau.lt/?j9Glz5ExKheNKXIaGtCOXQ]<br />November 2004: [http://p.defau.lt/?HPhH5E6LF2JYsd6vWAJk_w]<br />December 2004: [http://p.defau.lt/?EKVceBcekqKV0Zm8MTUORw]<br />January 2005: [http://p.defau.lt/?clYrnISJmvh7mh3yQBr_Tw]<br />February 2005: [http://p.defau.lt/?lMHkHMslwgqf_jOnRFT6FA]<br />March 2005: [http://p.defau.lt/?A_8Sd6BxVv_KDKtRrvL3Vg] (2005-03-23 - 2005-03-31 was downloaded differently, so its not available for checking)<br />April 2005: [http://p.defau.lt/?CEAUG6NRJ3FmdLvi1Uha0g]<br />May 2005: [http://p.defau.lt/?biJwakLb81mdrINVQYERxA]<br />June 2005: [http://p.defau.lt/?Ueuv51SG_dCChiDsRYbK1A]<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2005-07-01 || 2005-12-31 || ? || ? || r643 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Check:<br />July 2005: [http://p.defau.lt/?Y3sVIK6OWKxW5Gs3BwbGtQ]<br />August 2005: [http://p.defau.lt/?s0kxuwB9DRugPLnLAXDxWQ]<br />September 2005: [http://p.defau.lt/?vggA7OYHbyY3dtxDui6BCQ]<br />October 2005: [http://p.defau.lt/?T_0resTl7qjJs_c5bvdw7Q]<br />November 2005: [http://p.defau.lt/?WoBL_VYsoCDyOnVqJD_j9w]<br />December 2005: [http://p.defau.lt/?B0yOIcf16qSscxm3tLn1Fg]<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-01-01 || 2006-01-10 || 13198 || 4.8GB || r349 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-01-11 || 2006-06-30 || ? || ? || r349 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-07-01 || 2006-12-31 || ? || ? || r643 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Check:<br />July 2006: http://p.defau.lt/?IcMnwkx_j4H09FE_9iVgkQ<br />August 2006: http://p.defau.lt/?EmsKDtM0RXaysFNEABXJCQ<br />September 2006: http://p.defau.lt/?KBZVE9rJ9hdz4DiKnegnUw<br />October 2006: http://p.defau.lt/?f3F85TyqHtdY0LhpQk_m1w<br />November 2006: http://p.defau.lt/?VZwhzt_2doA_Z3c65_JkXg<br />December 2006: http://p.defau.lt/?Ms_TgrcyGDL_0oZQgKCNmw<br />
|}<br />
<br />
=== Errors ===<br />
* oi_archive_name empty fields: http://commons.wikimedia.org/wiki/File:Nl-scheikundig.ogg<br />
* broken file links: http://commons.wikimedia.org/wiki/File:SMS_Bluecher.jpg#filehistory<br />
* [http://code.google.com/p/wikiteam/issues/detail?id=45 Issue 45]: 2005-03-23, 2005-08-08, 2005-09-12, 2005-09-18, 2005-09-25, 2005-11-18, 2006-02-05, 2006-02-11, 2006-02-25, 2006-03-10, 2006-03-23, 2006-04-21, 2006-04-25, 2006-05-01, 2006-07-13, 2006-07-30, 2006-08-02, 2006-08-05, 2006-08-13, 2006-09-12, 2006-10-22, 2006-10-26, 2006-11-23, 2006-12-06, 2006-12-13, 2006-12-17.<br />
<br />
I'm going to file a bug in bugzilla.<br />
<br />
=== Uploading ===<br />
'''UPLOAD''' using the format: wikimediacommons-<year><month><br />
<br />
E.g. wikimediacommons-200601 for January 2006 grab.<br />
<br />
If you can, add it into the WikiTeam collection, or else just tag it with the wikiteam keyword, and it will be added in later on.<br />
<br />
== Other dumps ==<br />
There is no public dump of all images. [[WikiTeam]] is working on a scraper (see section above).<br />
<br />
Pictures of the Year (best ones):<br />
* [http://download.wikimedia.org/other/poty/poty2006.zip 2006] ([http://burnbit.com/torrent/177023/poty2006_zip torrent]) ([http://www.archive.org/details/poty2006 IA])<br />
* [http://download.wikimedia.org/other/poty/poty2007.zip 2007] ([http://burnbit.com/torrent/177024/poty2007_zip torrent]) ([http://www.archive.org/details/poty2007 IA])<br />
* [http://download.wikimedia.org/other/poty/2009 2009] ([http://www.archive.org/details/poty2009 IA])<br />
* [http://download.wikimedia.org/other/poty/2010 2010] ([http://www.archive.org/details/poty2010 IA])<br />
<br />
== Featured images ==<br />
<br />
Wikimedia Commons contains a lot of [http://commons.wikimedia.org/wiki/Category:Featured_pictures_on_Wikimedia_Commons images of high quality].<br />
<br />
[[File:Featured pictures on Wikimedia Commons - Wikimedia Commons 1294011879617.png|500px]]<br />
<br />
== Size stats ==<br />
Combined image sizes hosted in Wikimedia Commons sorted by month.<br />
<pre><br />
date sum(img_size) in bytes<br />
2003-1 1360188<br />
2004-10 637349207<br />
2004-11 726517177<br />
2004-12 1503501023<br />
2004-9 188850959<br />
2005-1 1952816194<br />
2005-10 17185495206<br />
2005-11 9950998969<br />
2005-12 11430418722<br />
2005-2 3118680401<br />
2005-3 3820401370<br />
2005-4 5476827971<br />
2005-5 10998180401<br />
2005-6 7160629133<br />
2005-7 9206024659<br />
2005-8 12591218859<br />
2005-9 14060418086<br />
2006-1 15433548270<br />
2006-10 33574470896<br />
2006-11 34231957288<br />
2006-12 30607951770<br />
2006-2 14952310277<br />
2006-3 19415486302<br />
2006-4 23041609453<br />
2006-5 29487911752<br />
2006-6 29856352192<br />
2006-7 32257412994<br />
2006-8 50940607926<br />
2006-9 37624697336<br />
2007-1 40654722866<br />
2007-10 89872715966<br />
2007-11 81975793043<br />
2007-12 75515001911<br />
2007-2 39452895714<br />
2007-3 53706627561<br />
2007-4 72917771224<br />
2007-5 72944518827<br />
2007-6 63504951958<br />
2007-7 76230887667<br />
2007-8 91290158697<br />
2007-9 100120203171<br />
2008-1 84582810181<br />
2008-10 122360827827<br />
2008-11 116290099578<br />
2008-12 126446332364<br />
2008-2 77416420840<br />
2008-3 89120317630<br />
2008-4 98180062150<br />
2008-5 117840970706<br />
2008-6 100352888576<br />
2008-7 128266650486<br />
2008-8 130452484462<br />
2008-9 120247362867<br />
2009-1 127226957021<br />
2009-10 345591510325<br />
2009-11 197991117397<br />
2009-12 228003186895<br />
2009-2 125819024255<br />
2009-3 273597778760<br />
2009-4 212175602700<br />
2009-5 191651496603<br />
2009-6 195998789357<br />
2009-7 241366758346<br />
2009-8 262927838267<br />
2009-9 184963508476<br />
2010-1 226919138307<br />
2010-2 191615007774<br />
2010-3 216425793739<br />
2010-4 312177184245<br />
2010-5 312240110181<br />
2010-6 283374261868<br />
2010-7 362175217639<br />
2010-8 172072631498<br />
</pre><br />
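<br />
For a quick sanity check, the monthly totals above can be summed; this is a small sketch assuming the listing has been saved to a hypothetical file named <tt>sizes.txt</tt> (header line included):<br />
<pre><br />
# Sum the per-month byte counts (column 2), skipping the header row.<br />
awk 'NR > 1 { total += $2 } END { printf "%.1f TB\n", total / 1e12 }' sizes.txt<br />
</pre><br />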
<br />
== See also ==<br />
* [[Wikipedia]]: some Wikipedias have enabled the local upload form; the English Wikipedia contains about 800,000 images, many of them under fair use<br />
<br />
== External links ==<br />
* http://commons.wikimedia.org<br />
* [http://dumps.wikimedia.org/other/poty/ Picture of the Year archives]<br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Image hostings]]<br />
[[Category:Wikis]]</div>Hydrizhttps://wiki.archiveteam.org/index.php?title=Tabblo&diff=7948Tabblo2012-05-28T08:33:14Z<p>Hydriz: /* Downloading ZIPs */</p>
<hr />
<div>{{Infobox project<br />
| title = Tabblo<br />
| logo = Tabblo logo.png<br />
| image = Tabblo-com.png<br />
| URL = {{url|1=http://www.tabblo.com/}}<br />
| project_status = {{closing}}<br />
| archiving_status = {{inprogress}}<br />
| tracker = http://tabb.heroku.com<br />
| source = https://github.com/ArchiveTeam/tabblo-grab<br />
}}<br />
<br />
A post called [http://nedbatchelder.com/blog/201201/goodbye_tabblo.html Goodbye Tabblo] by Ned Batchelder (former Tabblo employee).<br />
<br />
== Tabblo Lifeboat ==<br />
<br />
Ned Batchelder (former Tabblo employee) wrote [https://bitbucket.org/ned/lifeboat Tabblo Lifeboat], a Python script that helps users to download their tabblos.<br />
<br />
== How to help archiving ==<br />
<br />
<b>Easy option: You can also do this with the ArchiveTeam Warrior, a virtual machine you can download from [http://archive.org/details/archiveteam-warrior]. Install the appliance, boot and choose the Tabblo project from the menu.</b><br />
<br />
There is a distributed download script that gets usernames from a tracker and downloads the data.<br />
<br />
Make sure you are on Linux and that you have curl, git, and a recent version of Bash. Your system must also be able to compile wget with the Lua extensions.<br />
<br />
<ul><br />
<li>Get the code: <pre>git clone git://github.com/ArchiveTeam/tabblo-grab.git</pre></li><br />
<li>Get and compile the latest version of wget-warc-lua: <pre>./get-wget-warc-lua.sh</pre></li><br />
<li>Think of a nickname for yourself (preferably use your IRC name).</li><br />
<li>Run the download script with <pre>./seesaw.sh "<YOURNICK>"</pre></li><br />
<li>To stop the script gracefully, run <pre>touch STOP</pre> in the script's working directory. It will finish the current task and stop.</li><br />
</ul><br />
<br />
=== OS X ===<br />
<br />
Note that these instructions require [http://mxcl.github.com/homebrew/ Homebrew]<br />
<br />
<ul><br />
<li><code>brew tap ArchiveTeam/tools</code></li><br />
<li><code>brew install tabblo</code></li><br />
<li><code>cd `brew --prefix tabblo`</code></li><br />
<li>Run the download script with <pre>./seesaw.sh "<YOURNICK>"</pre></li><br />
<li>To stop the script gracefully, run <pre>touch STOP</pre> in the script's working directory. It will finish the current task and stop.</li><br />
</ul><br />
<br />
=== Notes ===<br />
<ul><br />
<li>Compiling wget-warc will require dev packages for the various libraries that it needs. Most questions have been about gnutls; install the gnutls-devel or gnutls-dev package with your favorite package manager. You'll also need the liblua library (liblua5.1-0-dev on Ubuntu) or lua-devel on RPM based distributions.</li><br />
<li>Downloading one user's data can take between 10 seconds and a few hours.</li><br />
<li>The data for one user is equally varied, from a few kB to several MB.</li><br />
<li>The downloaded data will be saved in the <code>./data/</code> subdirectory.</li><br />
<li>Download speeds from Tabblo.com are not that high. You can run multiple clients to speed things up.</li><br />
</ul><br />
<br />
== Downloading ZIPs ==<br />
<br />
There is a script to download the Tabblo ZIP files. (These include pictures and text, but no comments, profile pages, et cetera.) The script downloads a range of 1,000 Tabblos and uploads the ZIP files to Archive.org. For example, see [http://archive.org/details/archiveteam-tabblo-0 the first range].<br />
<br />
To participate:<br />
<br />
<ol><br />
<li>Get the code from [https://github.com/ArchiveTeam/tabblo-grab]. You need Bash and Curl to run it.</li><br />
<li>Claim one or more ranges (each range includes up to 1,000 Tabblos, so try claiming one or two ranges first). Add your name to the table below.</li><br />
<li>Run the script: <code>./dld-tabblo-zip.sh $RANGE</code>, e.g. <code>./dld-tabblo-zip.sh 12</code> to download and upload Tabblos 12,000 to 12,999.</li><br />
</ol><br />
<br />
To speed things up, a range can be divided into 10 parts (of 100 Tabblos each), so you can download several parts at the same time. For example:<br />
<pre><br />
for i in 0 1 2 3 4 5 6 7 8 9 ; do<br />
./dld-tabblo-zip.sh $RANGE $i &<br />
done<br />
</pre><br />
<br />
After the script has finished, rerun it to check that everything was downloaded and uploaded successfully.<br />
<br />
{| class="wikitable"<br />
|-<br />
! Ranges<br />
! Downloader<br />
! Status<br />
|-<br />
| 0 - 9<br />
| alard<br />
| Done<br />
|-<br />
| 10<br />
| underscor<br />
| Downloading<br />
|-<br />
| 11 - 15<br />
| bsmith093<br />
| Done<br />
|-<br />
| 16 - 99<br />
| alard<br />
| Done<br />
|-<br />
| 100 - 499<br />
| alard<br />
| Done<br />
|-<br />
| 500 - 549<br />
| underscor<br />
| Downloading<br />
|-<br />
| 550 - 599<br />
| Short<br />
| Downloading<br />
|-<br />
| 600 - 699<br />
| Hydriz<br />
| Done<br />
|-<br />
| 700 - 720<br />
| closure<br />
| Done<br />
|-<br />
| 721 - 999<br />
| Short<br />
| Downloading<br />
|-<br />
| 1000 - 1099<br />
| closure<br />
| Done<br />
|-<br />
| 1100 - 1399<br />
| alard<br />
| Done<br />
|-<br />
| 1400 - 1499<br />
| Hydriz<br />
| Done<br />
|-<br />
| 1500 - 1845<br />
| alard<br />
| Downloading<br />
|-<br />
| 1846 - 1850<br />
| Wait...<br />
| This is the newest range, please download other ranges first<br />
|-<br />
|}<br />
<br />
== Site structure ==<br />
<br />
=== Tabblos ===<br />
<br />
Tabblos have a URL of the form <code><nowiki>http://www.tabblo.com/studio/stories/view/#ID#/</nowiki></code>, where <code>#ID#</code> is the numeric id of the tabblo. Tabblos are numbered sequentially; the last number at the time of writing is 1843370.<br />
<br />
A tabblo consists of one HTML page with some text and one or more images. You can click on an image to get a larger version, but apart from that larger image it won't give you more than what is on the tabblo page. Most tabblos have comments, which are included in the page's HTML.<br />
<br />
Running <code>wget --page-requisites</code> on a tabblo URL will probably save all available information.<br />
<br />
From the Tabblo Lifeboat we learn that Tabblo offers a nice way to download a tabblo as a zip file, which also gives you the original photo files. Download URL: <code><nowiki>http://www.tabblo.com/studio/stories/zip/#ID#/?orig=1</nowiki></code>. You have to log in before you can download this zip file (but once you're logged in you can download <i>any</i> tabblo, not just your own).<br />
<br />
There's one other catch: the zip download will fail at first. The first time you download it you'll get an incomplete zip file; the next time you try you'll get a little bit more. Repeat the download until you get the complete zip file. (This probably has something to do with caching.)<br />
<br />
In conclusion, to download a tabblo we'll probably want to do something like this:<br />
<pre><br />
wget --page-requisites --warc-file tabblo http://www.tabblo.com/studio/stories/view/#ID#/<br />
while ! unzip -t all.zip ; do<br />
wget -O all.zip --header="Cookie: tabblosesh=###" http://www.tabblo.com/studio/stories/zip/#ID#/?orig=1<br />
done<br />
</pre><br />
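<br />
As a rough illustration only (a hypothetical wrapper, not the project's actual grab script), the recipe above could be looped over a range of tabblo IDs, retrying the zip until it tests clean; the ID range and the session cookie are placeholders:<br />
<pre><br />
# Hypothetical sketch: fetch the page, then keep retrying the zip until<br />
# it unpacks cleanly (see the caching note above).<br />
for id in $(seq 1843000 1843370) ; do<br />
  wget --page-requisites --warc-file "tabblo-$id" "http://www.tabblo.com/studio/stories/view/$id/"<br />
  until wget -O "$id.zip" --header="Cookie: tabblosesh=###" "http://www.tabblo.com/studio/stories/zip/$id/?orig=1" && unzip -t "$id.zip" ; do<br />
    sleep 1<br />
  done<br />
done<br />
</pre><br />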
<br />
=== Users ===<br />
<br />
'''TODO''' The user pages (e.g. http://www.tabblo.com/studio/person/chilla/) have everything you'd expect from a social network: comments, photos, friends, favorites, messages.<br />
<br />
<ul><br />
<li>profile page: <code>http://www.tabblo.com/studio/person/chilla/</code></li><br />
<li>tabblos: <code>http://www.tabblo.com/studio/view/tabblos/mrsfabulous/</code></li><br />
<li>favorites: <code>http://www.tabblo.com/studio/view/favorites/Candlepower</code></li><br />
</ul></div>Hydrizhttps://wiki.archiveteam.org/index.php?title=Wikimedia_Commons&diff=7891Wikimedia Commons2012-05-26T12:52:21Z<p>Hydriz: /* Errors */ +1</p>
<hr />
<div>{{Infobox project<br />
| title = Wikimedia Commons<br />
| image = Commons screenshot.png<br />
| description = Wikimedia Commons mainpage on 2010-12-13<br />
| URL = http://commons.wikimedia.org<br />
| project_status = {{online}}<br />
| archiving_status = {{inprogress}}<br />
}}<br />
'''Wikimedia Commons''' is a database of more than 10 million freely usable media files (when it held 6.8 million files, their combined size was 6.6 TB).<br />
<br />
Current size (based on a January 18, 2012 estimate): '''13.3 TB''', plus '''881 GB''' of old file versions.<br />
<br />
== Archiving process ==<br />
<br />
=== Tools ===<br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/commonsdownloader.py Download script] (Python)<br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/commonschecker.py Checker script] (Python)<br />
* [http://toolserver.org/~emijrp/commonsarchive/ Feed lists] (from 2004-09-07 to 2006-12-31; more coming soon)<br />
<br />
=== How-to ===<br />
Download the script and the feed lists (unpack them; each feed list is a .csv file) into the same directory. Then run:<br />
* python commonsdownloader.py 2005-01-01 2005-01-10 [to download that 10-day range; it generates a zip file and a .csv for each day]<br />
<br />
Don't forget the 30th and 31st in the months that have them, and February 29th in leap years.<br />
<br />
To verify the downloaded data, use the checker script:<br />
* python commonschecker.py 2005-01-01 2005-01-10 [to check that 10-day range; it works on the .zip and .csv files, not the original folders] (a combined sketch follows below)<br />
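<br />
A combined sketch, assuming both scripts and the unpacked feed lists are in the current directory; it downloads and then verifies July 2005 one day at a time (one-day ranges are an assumption here; any range the scripts accept should work):<br />
<pre><br />
# Hypothetical wrapper: download and verify July 2005, day by day.<br />
for day in $(seq -w 1 31) ; do<br />
  python commonsdownloader.py "2005-07-$day" "2005-07-$day"<br />
  python commonschecker.py "2005-07-$day" "2005-07-$day"<br />
done<br />
</pre><br />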
<br />
=== Tools required ===<br />
If you are downloading from a freshly installed server (e.g. a default virtual machine), you may need to install <tt>zip</tt> (Ubuntu: <tt>apt-get install zip</tt>).<br />
<br />
Python should already be installed on your server; if not, install it.<br />
<br />
The scripts also depend on <tt>curl</tt> and <tt>wget</tt>, which should be installed on your server by default.<br />
<br />
=== Volunteers ===<br />
<br />
:'''''Please, wait until we do some tests. Probably, long filenames bug.'''''<br />
<br />
{| class="wikitable"<br />
! Nick !! Start date !! End date !! Images !! Size !! Revision !! Status !! Notes<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2004-09-07 || 2005-06-30 || ? || ? || r643 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Check:<br />October 2004: [http://p.defau.lt/?j9Glz5ExKheNKXIaGtCOXQ]<br />November 2004: [http://p.defau.lt/?HPhH5E6LF2JYsd6vWAJk_w]<br />December 2004: [http://p.defau.lt/?EKVceBcekqKV0Zm8MTUORw]<br />January 2005: [http://p.defau.lt/?clYrnISJmvh7mh3yQBr_Tw]<br />February 2005: [http://p.defau.lt/?lMHkHMslwgqf_jOnRFT6FA]<br />March 2005: [http://p.defau.lt/?A_8Sd6BxVv_KDKtRrvL3Vg] (2005-03-23 - 2005-03-31 was downloaded differently, so it's not available for checking)<br />April 2005: [http://p.defau.lt/?CEAUG6NRJ3FmdLvi1Uha0g]<br />May 2005: [http://p.defau.lt/?biJwakLb81mdrINVQYERxA]<br />June 2005: [http://p.defau.lt/?Ueuv51SG_dCChiDsRYbK1A]<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2005-07-01 || 2005-12-31 || ? || ? || r643 || ''Downloading'' ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-01-01 || 2006-01-10 || 13198 || 4.8GB || r349 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-01-11 || 2006-06-30 || ? || ? || r349 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-07-01 || 2006-12-31 || ? || ? || r643 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Check:<br />July 2006: http://p.defau.lt/?IcMnwkx_j4H09FE_9iVgkQ<br />August 2006: http://p.defau.lt/?EmsKDtM0RXaysFNEABXJCQ<br />September 2006: http://p.defau.lt/?KBZVE9rJ9hdz4DiKnegnUw<br />October 2006: http://p.defau.lt/?f3F85TyqHtdY0LhpQk_m1w<br />November 2006: http://p.defau.lt/?VZwhzt_2doA_Z3c65_JkXg<br />December 2006: http://p.defau.lt/?Ms_TgrcyGDL_0oZQgKCNmw<br />
|}<br />
<br />
=== Errors ===<br />
* oi_archive_name empty fields: http://commons.wikimedia.org/wiki/File:Nl-scheikundig.ogg<br />
* broken file links: http://commons.wikimedia.org/wiki/File:SMS_Bluecher.jpg#filehistory<br />
* [http://code.google.com/p/wikiteam/issues/detail?id=45 Issue 45]: 2005-03-23, 2006-02-05, 2006-02-11, 2006-02-25, 2006-03-10, 2006-03-23, 2006-04-21, 2006-04-25, 2006-05-01, 2006-07-13, 2006-07-30, 2006-08-02, 2006-08-05, 2006-08-13, 2006-09-12, 2006-10-22, 2006-10-26, 2006-11-23, 2006-12-06, 2006-12-13, 2006-12-17.<br />
<br />
I'm going to file a bug in Bugzilla.<br />
<br />
=== Uploading ===<br />
'''UPLOAD''' using the format: wikimediacommons-<year><month><br />
<br />
E.g. wikimediacommons-200601 for the January 2006 grab.<br />
<br />
If you can, add the item to the WikiTeam collection; otherwise just tag it with the wikiteam keyword and it will be added later.<br />
<br />
== Other dumps ==<br />
There is no public dump of all images. [[WikiTeam]] is working on a scraper (see section above).<br />
<br />
Pictures of the Year (best ones):<br />
* [http://download.wikimedia.org/other/poty/poty2006.zip 2006] ([http://burnbit.com/torrent/177023/poty2006_zip torrent]) ([http://www.archive.org/details/poty2006 IA])<br />
* [http://download.wikimedia.org/other/poty/poty2007.zip 2007] ([http://burnbit.com/torrent/177024/poty2007_zip torrent]) ([http://www.archive.org/details/poty2007 IA])<br />
* [http://download.wikimedia.org/other/poty/2009 2009] ([http://www.archive.org/details/poty2009 IA])<br />
* [http://download.wikimedia.org/other/poty/2010 2010] ([http://www.archive.org/details/poty2010 IA])<br />
<br />
== Featured images ==<br />
<br />
Wikimedia Commons contains a lot of [http://commons.wikimedia.org/wiki/Category:Featured_pictures_on_Wikimedia_Commons high-quality images].<br />
<br />
[[File:Featured pictures on Wikimedia Commons - Wikimedia Commons 1294011879617.png|500px]]<br />
<br />
== Size stats ==<br />
Combined size of the images hosted on Wikimedia Commons, grouped by upload month.<br />
<pre><br />
date sum(img_size) in bytes<br />
2003-1 1360188<br />
2004-10 637349207<br />
2004-11 726517177<br />
2004-12 1503501023<br />
2004-9 188850959<br />
2005-1 1952816194<br />
2005-10 17185495206<br />
2005-11 9950998969<br />
2005-12 11430418722<br />
2005-2 3118680401<br />
2005-3 3820401370<br />
2005-4 5476827971<br />
2005-5 10998180401<br />
2005-6 7160629133<br />
2005-7 9206024659<br />
2005-8 12591218859<br />
2005-9 14060418086<br />
2006-1 15433548270<br />
2006-10 33574470896<br />
2006-11 34231957288<br />
2006-12 30607951770<br />
2006-2 14952310277<br />
2006-3 19415486302<br />
2006-4 23041609453<br />
2006-5 29487911752<br />
2006-6 29856352192<br />
2006-7 32257412994<br />
2006-8 50940607926<br />
2006-9 37624697336<br />
2007-1 40654722866<br />
2007-10 89872715966<br />
2007-11 81975793043<br />
2007-12 75515001911<br />
2007-2 39452895714<br />
2007-3 53706627561<br />
2007-4 72917771224<br />
2007-5 72944518827<br />
2007-6 63504951958<br />
2007-7 76230887667<br />
2007-8 91290158697<br />
2007-9 100120203171<br />
2008-1 84582810181<br />
2008-10 122360827827<br />
2008-11 116290099578<br />
2008-12 126446332364<br />
2008-2 77416420840<br />
2008-3 89120317630<br />
2008-4 98180062150<br />
2008-5 117840970706<br />
2008-6 100352888576<br />
2008-7 128266650486<br />
2008-8 130452484462<br />
2008-9 120247362867<br />
2009-1 127226957021<br />
2009-10 345591510325<br />
2009-11 197991117397<br />
2009-12 228003186895<br />
2009-2 125819024255<br />
2009-3 273597778760<br />
2009-4 212175602700<br />
2009-5 191651496603<br />
2009-6 195998789357<br />
2009-7 241366758346<br />
2009-8 262927838267<br />
2009-9 184963508476<br />
2010-1 226919138307<br />
2010-2 191615007774<br />
2010-3 216425793739<br />
2010-4 312177184245<br />
2010-5 312240110181<br />
2010-6 283374261868<br />
2010-7 362175217639<br />
2010-8 172072631498<br />
</pre><br />
<br />
== See also ==<br />
* [[Wikipedia]]: some Wikipedias have enabled the local upload form; the English Wikipedia contains about 800,000 images, many of them under fair use<br />
<br />
== External links ==<br />
* http://commons.wikimedia.org<br />
* [http://dumps.wikimedia.org/other/poty/ Picture of the Year archives]<br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Image hostings]]<br />
[[Category:Wikis]]</div>Hydrizhttps://wiki.archiveteam.org/index.php?title=Wikimedia_Commons&diff=7890Wikimedia Commons2012-05-26T12:51:42Z<p>Hydriz: /* Volunteers */ major update</p>
<hr />
<div>{{Infobox project<br />
| title = Wikimedia Commons<br />
| image = Commons screenshot.png<br />
| description = Wikimedia Commons mainpage on 2010-12-13<br />
| URL = http://commons.wikimedia.org<br />
| project_status = {{online}}<br />
| archiving_status = {{inprogress}}<br />
}}<br />
'''Wikimedia Commons''' is a database of more than 10 million freely usable media files (when it held 6.8 million files, their combined size was 6.6 TB).<br />
<br />
Current size (based on a January 18, 2012 estimate): '''13.3 TB''', plus '''881 GB''' of old file versions.<br />
<br />
== Archiving process ==<br />
<br />
=== Tools ===<br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/commonsdownloader.py Download script] (Python)<br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/commonschecker.py Checker script] (Python)<br />
* [http://toolserver.org/~emijrp/commonsarchive/ Feed lists] (from 2004-09-07 to 2006-12-31; more coming soon)<br />
<br />
=== How-to ===<br />
Download the script and the feed lists (unpack them; each feed list is a .csv file) into the same directory. Then run:<br />
* python commonsdownloader.py 2005-01-01 2005-01-10 [to download that 10-day range; it generates a zip file and a .csv for each day]<br />
<br />
Don't forget the 30th and 31st in the months that have them, and February 29th in leap years.<br />
<br />
To verify the downloaded data, use the checker script:<br />
* python commonschecker.py 2005-01-01 2005-01-10 [to check that 10-day range; it works on the .zip and .csv files, not the original folders]<br />
<br />
=== Tools required ===<br />
If you are downloading from a freshly installed server (e.g. a default virtual machine), you may need to install <tt>zip</tt> (Ubuntu: <tt>apt-get install zip</tt>).<br />
<br />
Python should already be installed on your server; if not, install it.<br />
<br />
The scripts also depend on <tt>curl</tt> and <tt>wget</tt>, which should be installed on your server by default.<br />
<br />
=== Volunteers ===<br />
<br />
:'''''Please, wait until we do some tests. Probably, long filenames bug.'''''<br />
<br />
{| class="wikitable"<br />
! Nick !! Start date !! End date !! Images !! Size !! Revision !! Status !! Notes<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2004-09-07 || 2005-06-30 || ? || ? || r643 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Check:<br />October 2004: [http://p.defau.lt/?j9Glz5ExKheNKXIaGtCOXQ]<br />November 2004: [http://p.defau.lt/?HPhH5E6LF2JYsd6vWAJk_w]<br />December 2004: [http://p.defau.lt/?EKVceBcekqKV0Zm8MTUORw]<br />January 2005: [http://p.defau.lt/?clYrnISJmvh7mh3yQBr_Tw]<br />February 2005: [http://p.defau.lt/?lMHkHMslwgqf_jOnRFT6FA]<br />March 2005: [http://p.defau.lt/?A_8Sd6BxVv_KDKtRrvL3Vg] (2005-03-23 - 2005-03-31 was downloaded differently, so it's not available for checking)<br />April 2005: [http://p.defau.lt/?CEAUG6NRJ3FmdLvi1Uha0g]<br />May 2005: [http://p.defau.lt/?biJwakLb81mdrINVQYERxA]<br />June 2005: [http://p.defau.lt/?Ueuv51SG_dCChiDsRYbK1A]<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2005-07-01 || 2005-12-31 || ? || ? || r643 || ''Downloading'' ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-01-01 || 2006-01-10 || 13198 || 4.8GB || r349 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-01-11 || 2006-06-30 || ? || ? || r349 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-07-01 || 2006-12-31 || ? || ? || r643 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Check:<br />July 2006: http://p.defau.lt/?IcMnwkx_j4H09FE_9iVgkQ<br />August 2006: http://p.defau.lt/?EmsKDtM0RXaysFNEABXJCQ<br />September 2006: http://p.defau.lt/?KBZVE9rJ9hdz4DiKnegnUw<br />October 2006: http://p.defau.lt/?f3F85TyqHtdY0LhpQk_m1w<br />November 2006: http://p.defau.lt/?VZwhzt_2doA_Z3c65_JkXg<br />December 2006: http://p.defau.lt/?Ms_TgrcyGDL_0oZQgKCNmw<br />
|}<br />
<br />
=== Errors ===<br />
* oi_archive_name empty fields: http://commons.wikimedia.org/wiki/File:Nl-scheikundig.ogg<br />
* broken file links: http://commons.wikimedia.org/wiki/File:SMS_Bluecher.jpg#filehistory<br />
* [http://code.google.com/p/wikiteam/issues/detail?id=45 Issue 45]: 2006-02-05, 2006-02-11, 2006-02-25, 2006-03-10, 2006-03-23, 2006-04-21, 2006-04-25, 2006-05-01, 2006-07-13, 2006-07-30, 2006-08-02, 2006-08-05, 2006-08-13, 2006-09-12, 2006-10-22, 2006-10-26, 2006-11-23, 2006-12-06, 2006-12-13, 2006-12-17.<br />
<br />
I'm going to file a bug in Bugzilla.<br />
<br />
=== Uploading ===<br />
'''UPLOAD''' using the format: wikimediacommons-<year><month><br />
<br />
E.g. wikimediacommons-200601 for the January 2006 grab.<br />
<br />
If you can, add the item to the WikiTeam collection; otherwise just tag it with the wikiteam keyword and it will be added later.<br />
<br />
== Other dumps ==<br />
There is no public dump of all images. [[WikiTeam]] is working on a scraper (see section above).<br />
<br />
Pictures of the Year (best ones):<br />
* [http://download.wikimedia.org/other/poty/poty2006.zip 2006] ([http://burnbit.com/torrent/177023/poty2006_zip torrent]) ([http://www.archive.org/details/poty2006 IA])<br />
* [http://download.wikimedia.org/other/poty/poty2007.zip 2007] ([http://burnbit.com/torrent/177024/poty2007_zip torrent]) ([http://www.archive.org/details/poty2007 IA])<br />
* [http://download.wikimedia.org/other/poty/2009 2009] ([http://www.archive.org/details/poty2009 IA])<br />
* [http://download.wikimedia.org/other/poty/2010 2010] ([http://www.archive.org/details/poty2010 IA])<br />
<br />
== Featured images ==<br />
<br />
Wikimedia Commons contains a lot of [http://commons.wikimedia.org/wiki/Category:Featured_pictures_on_Wikimedia_Commons high-quality images].<br />
<br />
[[File:Featured pictures on Wikimedia Commons - Wikimedia Commons 1294011879617.png|500px]]<br />
<br />
== Size stats ==<br />
Combined size of the images hosted on Wikimedia Commons, grouped by upload month.<br />
<pre><br />
date sum(img_size) in bytes<br />
2003-1 1360188<br />
2004-10 637349207<br />
2004-11 726517177<br />
2004-12 1503501023<br />
2004-9 188850959<br />
2005-1 1952816194<br />
2005-10 17185495206<br />
2005-11 9950998969<br />
2005-12 11430418722<br />
2005-2 3118680401<br />
2005-3 3820401370<br />
2005-4 5476827971<br />
2005-5 10998180401<br />
2005-6 7160629133<br />
2005-7 9206024659<br />
2005-8 12591218859<br />
2005-9 14060418086<br />
2006-1 15433548270<br />
2006-10 33574470896<br />
2006-11 34231957288<br />
2006-12 30607951770<br />
2006-2 14952310277<br />
2006-3 19415486302<br />
2006-4 23041609453<br />
2006-5 29487911752<br />
2006-6 29856352192<br />
2006-7 32257412994<br />
2006-8 50940607926<br />
2006-9 37624697336<br />
2007-1 40654722866<br />
2007-10 89872715966<br />
2007-11 81975793043<br />
2007-12 75515001911<br />
2007-2 39452895714<br />
2007-3 53706627561<br />
2007-4 72917771224<br />
2007-5 72944518827<br />
2007-6 63504951958<br />
2007-7 76230887667<br />
2007-8 91290158697<br />
2007-9 100120203171<br />
2008-1 84582810181<br />
2008-10 122360827827<br />
2008-11 116290099578<br />
2008-12 126446332364<br />
2008-2 77416420840<br />
2008-3 89120317630<br />
2008-4 98180062150<br />
2008-5 117840970706<br />
2008-6 100352888576<br />
2008-7 128266650486<br />
2008-8 130452484462<br />
2008-9 120247362867<br />
2009-1 127226957021<br />
2009-10 345591510325<br />
2009-11 197991117397<br />
2009-12 228003186895<br />
2009-2 125819024255<br />
2009-3 273597778760<br />
2009-4 212175602700<br />
2009-5 191651496603<br />
2009-6 195998789357<br />
2009-7 241366758346<br />
2009-8 262927838267<br />
2009-9 184963508476<br />
2010-1 226919138307<br />
2010-2 191615007774<br />
2010-3 216425793739<br />
2010-4 312177184245<br />
2010-5 312240110181<br />
2010-6 283374261868<br />
2010-7 362175217639<br />
2010-8 172072631498<br />
</pre><br />
<br />
== See also ==<br />
* [[Wikipedia]]: some Wikipedias have enabled the local upload form; the English Wikipedia contains about 800,000 images, many of them under fair use<br />
<br />
== External links ==<br />
* http://commons.wikimedia.org<br />
* [http://dumps.wikimedia.org/other/poty/ Picture of the Year archives]<br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Image hostings]]<br />
[[Category:Wikis]]</div>Hydrizhttps://wiki.archiveteam.org/index.php?title=Tabblo&diff=7880Tabblo2012-05-25T10:10:32Z<p>Hydriz: /* Downloading ZIPs */</p>
<hr />
<div>{{Infobox project<br />
| title = Tabblo<br />
| image = Tabblo-com.png<br />
| URL = {{url|1=http://www.tabblo.com/}}<br />
| project_status = {{closing}}<br />
| archiving_status = {{inprogress}}<br />
| tracker = http://tabb.heroku.com<br />
| source = https://github.com/ArchiveTeam/tabblo-grab<br />
}}<br />
<br />
A post called [http://nedbatchelder.com/blog/201201/goodbye_tabblo.html Goodbye Tabblo] by Ned Batchelder (former Tabblo employee).<br />
<br />
== Tabblo Lifeboat ==<br />
<br />
Ned Batchelder (former Tabblo employee) wrote [https://bitbucket.org/ned/lifeboat Tabblo Lifeboat], a Python script that helps users to download their tabblos.<br />
<br />
== How to help archiving ==<br />
<br />
<b>Easy option: You can also do this with the ArchiveTeam Warrior, a virtual machine you can download from [http://archive.org/details/archiveteam-warrior]. Install the appliance, boot and choose the Tabblo project from the menu.</b><br />
<br />
There is a distributed download script that gets usernames from a tracker and downloads the data.<br />
<br />
Make sure you are on Linux and that you have curl, git, and a recent version of Bash. Your system must also be able to compile wget with the Lua extensions.<br />
<br />
<ul><br />
<li>Get the code: <pre>git clone git://github.com/ArchiveTeam/tabblo-grab.git</pre></li><br />
<li>Get and compile the latest version of wget-warc-lua: <pre>./get-wget-warc-lua.sh</pre></li><br />
<li>Think of a nickname for yourself (preferably use your IRC name).</li><br />
<li>Run the download script with <pre>./seesaw.sh "<YOURNICK>"</pre></li><br />
<li>To stop the script gracefully, run <pre>touch STOP</pre> in the script's working directory. It will finish the current task and stop.</li><br />
</ul><br />
<br />
=== OS X ===<br />
<br />
Note that these instructions require [http://mxcl.github.com/homebrew/ Homebrew]<br />
<br />
<ul><br />
<li><code>brew tap ArchiveTeam/tools</code></li><br />
<li><code>brew install tabblo</code></li><br />
<li><code>cd `brew --prefix tabblo`</code></li><br />
<li>Run the download script with <pre>./seesaw.sh "<YOURNICK>"</pre></li><br />
<li>To stop the script gracefully, run <pre>touch STOP</pre> in the script's working directory. It will finish the current task and stop.</li><br />
</ul><br />
<br />
=== Notes ===<br />
<ul><br />
<li>Compiling wget-warc will require dev packages for the various libraries that it needs. Most questions have been about gnutls; install the gnutls-devel or gnutls-dev package with your favorite package manager. You'll also need the liblua library (liblua5.1-0-dev on Ubuntu).</li><br />
<li>Downloading one user's data can take between 10 seconds and a few hours.</li><br />
<li>The data for one user is equally varied, from a few kB to several MB.</li><br />
<li>The downloaded data will be saved in the <code>./data/</code> subdirectory.</li><br />
<li>Download speeds from Tabblo.com are not that high. You can run multiple clients to speed things up.</li><br />
</ul><br />
<br />
== Downloading ZIPs ==<br />
<br />
There is a script to download the Tabblo ZIP files. (These include pictures and text, but no comments, profile pages, et cetera.) The script downloads a range of 1,000 Tabblos and uploads the ZIP files to Archive.org. For example, see [http://archive.org/details/archiveteam-tabblo-0 the first range].<br />
<br />
To participate:<br />
<br />
<ol><br />
<li>Get the code from [https://github.com/ArchiveTeam/tabblo-grab]. You need Bash and Curl to run it.</li><br />
<li>Claim one or more ranges (each range includes up to 1,000 Tabblos, so try claiming one or two ranges first). Add your name to the table below.</li><br />
<li>Run the script: <code>./dld-tabblo-zip.sh $RANGE</code>, e.g. <code>./dld-tabblo-zip.sh 12</code> to download and upload Tabblos 12,000 to 12,999.</li><br />
</ol><br />
<br />
To speed things up, a range can be divided into 10 parts (of 100 Tabblos each), so you can download several parts at the same time. For example:<br />
<pre><br />
for i in 0 1 2 3 4 5 6 7 8 9 ; do<br />
./dld-tabblo-zip.sh $RANGE $i &<br />
done<br />
</pre><br />
<br />
After the script has finished, rerun it to check that everything was downloaded and uploaded successfully.<br />
<br />
{| class="wikitable"<br />
|-<br />
! Ranges<br />
! Downloader<br />
! Status<br />
|-<br />
| 0 - 9<br />
| alard<br />
| Done<br />
|-<br />
| 10<br />
| underscor<br />
| Downloading<br />
|-<br />
| 11 - 15<br />
| bsmith093<br />
| Done<br />
|-<br />
| 16 - 99<br />
| alard<br />
| Downloading<br />
|-<br />
| 100 - 499<br />
| alard<br />
| Done<br />
|-<br />
| 500 - 549<br />
| underscor<br />
| Downloading<br />
|-<br />
| 550 - 599<br />
| Short<br />
| Downloading<br />
|-<br />
| 600 - 699<br />
| Hydriz<br />
| Done<br />
|-<br />
| 700 - 720<br />
| closure<br />
| Done<br />
|-<br />
| 721 - 999<br />
| Short<br />
| Downloading<br />
|-<br />
| 1000 - 1099<br />
| closure<br />
| Downloading<br />
|-<br />
| 1100 - 1399<br />
| alard<br />
| Downloading<br />
|-<br />
| 1400 - 1499<br />
| Hydriz<br />
| Downloading<br />
|-<br />
| 1500 - 1845<br />
| <br />
| Unclaimed<br />
|-<br />
| 1846 - 1850<br />
| Wait...<br />
| This is the newest range, please download other ranges first<br />
|-<br />
|}<br />
<br />
== Site structure ==<br />
<br />
=== Tabblos ===<br />
<br />
Tabblos have a URL of the form <code><nowiki>http://www.tabblo.com/studio/stories/view/#ID#/</nowiki></code>, where <code>#ID#</code> is the numeric id of the tabblo. Tabblos are numbered sequentially; the last number at the time of writing is 1843370.<br />
<br />
A tabblo consists of one HTML page with some text and one or more images. You can click on an image to get a larger version, but apart from that larger image it won't give you more than what is on the tabblo page. Most tabblos have comments, which are included in the page's HTML.<br />
<br />
Running <code>wget --page-requisites</code> on a tabblo URL will probably save all available information.<br />
<br />
From the Tabblo Lifeboat we learn that Tabblo offers a nice way to download a tabblo as a zip file, which also gives you the original photo files. Download URL: <code><nowiki>http://www.tabblo.com/studio/stories/zip/#ID#/?orig=1</nowiki></code>. You have to log in before you can download this zip file (but once you're logged in you can download <i>any</i> tabblo, not just your own).<br />
<br />
There's one other catch: the zip download will fail at first. The first time you download it you'll get an incomplete zip file; the next time you try you'll get a little bit more. Repeat the download until you get the complete zip file. (This probably has something to do with caching.)<br />
<br />
In conclusion, to download a tabblo we'll probably want to do something like this:<br />
<pre><br />
wget --page-requisites --warc-file tabblo http://www.tabblo.com/studio/stories/view/#ID#/<br />
while ! unzip -t all.zip ; do<br />
wget -O all.zip --header="Cookie: tabblosesh=###" http://www.tabblo.com/studio/stories/zip/#ID#/?orig=1<br />
done<br />
</pre><br />
<br />
=== Users ===<br />
<br />
'''TODO''' The user pages (e.g. http://www.tabblo.com/studio/person/chilla/) have everything you'd expect from a social network: comments, photos, friends, favorites, messages.<br />
<br />
<ul><br />
<li>profile page: <code>http://www.tabblo.com/studio/person/chilla/</code></li><br />
<li>tabblos: <code>http://www.tabblo.com/studio/view/tabblos/mrsfabulous/</code></li><br />
<li>favorites: <code>http://www.tabblo.com/studio/view/favorites/Candlepower</code></li><br />
</ul></div>Hydrizhttps://wiki.archiveteam.org/index.php?title=Tabblo&diff=7877Tabblo2012-05-25T02:51:06Z<p>Hydriz: /* Downloading ZIPs */</p>
<hr />
<div>{{Infobox project<br />
| title = Tabblo<br />
| image = Tabblo-com.png<br />
| URL = {{url|1=http://www.tabblo.com/}}<br />
| project_status = {{closing}}<br />
| archiving_status = {{inprogress}}<br />
| tracker = http://tabb.heroku.com<br />
| source = https://github.com/ArchiveTeam/tabblo-grab<br />
}}<br />
<br />
A post called [http://nedbatchelder.com/blog/201201/goodbye_tabblo.html Goodbye Tabblo] by Ned Batchelder (former Tabblo employee).<br />
<br />
== Tabblo Lifeboat ==<br />
<br />
Ned Batchelder (former Tabblo employee) wrote [https://bitbucket.org/ned/lifeboat Tabblo Lifeboat], a Python script that helps users to download their tabblos.<br />
<br />
== How to help archiving ==<br />
<br />
<b>Easy option: You can also do this with the ArchiveTeam Warrior, a virtual machine you can download from [http://archive.org/details/archiveteam-warrior]. Install the appliance, boot and choose the Tabblo project from the menu.</b><br />
<br />
There is a distributed download script that gets usernames from a tracker and downloads the data.<br />
<br />
Make sure you are on Linux and that you have curl, git, and a recent version of Bash. Your system must also be able to compile wget with the Lua extensions.<br />
<br />
<ul><br />
<li>Get the code: <pre>git clone git://github.com/ArchiveTeam/tabblo-grab.git</pre></li><br />
<li>Get and compile the latest version of wget-warc-lua: <pre>./get-wget-warc-lua.sh</pre></li><br />
<li>Think of a nickname for yourself (preferably use your IRC name).</li><br />
<li>Run the download script with <pre>./seesaw.sh "<YOURNICK>"</pre></li><br />
<li>To stop the script gracefully, run <pre>touch STOP</pre> in the script's working directory. It will finish the current task and stop.</li><br />
</ul><br />
<br />
=== OS X ===<br />
<br />
Note that these instructions require [http://mxcl.github.com/homebrew/ Homebrew]<br />
<br />
<ul><br />
<li><code>brew tap ArchiveTeam/tools</code></li><br />
<li><code>brew install tabblo</code></li><br />
<li><code>cd `brew --prefix tabblo`</code></li><br />
<li>Run the download script with <pre>./seesaw.sh "<YOURNICK>"</pre></li><br />
<li>To stop the script gracefully, run <pre>touch STOP</pre> in the script's working directory. It will finish the current task and stop.</li><br />
</ul><br />
<br />
=== Notes ===<br />
<ul><br />
<li>Compiling wget-warc will require dev packages for the various libraries that it needs. Most questions have been about gnutls; install the gnutls-devel or gnutls-dev package with your favorite package manager. You'll also need the liblua library (liblua5.1-0-dev on Ubuntu).</li><br />
<li>Downloading one user's data can take between 10 seconds and a few hours.</li><br />
<li>The data for one user is equally varied, from a few kB to several MB.</li><br />
<li>The downloaded data will be saved in the <code>./data/</code> subdirectory.</li><br />
<li>Download speeds from Tabblo.com are not that high. You can run multiple clients to speed things up.</li><br />
</ul><br />
<br />
== Downloading ZIPs ==<br />
<br />
There is a script to download the Tabblo ZIP files. (These include pictures and text, but no comments, profile pages, et cetera.) The script downloads a range of 1,000 Tabblos and uploads the ZIP files to Archive.org. For example, see [http://archive.org/details/archiveteam-tabblo-0 the first range].<br />
<br />
To participate:<br />
<br />
<ol><br />
<li>Get the code from [https://github.com/ArchiveTeam/tabblo-grab]. You need Bash and Curl to run it.</li><br />
<li>Claim one or more ranges (each range includes up to 1,000 Tabblos, so try claiming one or two ranges first). Add your name to the table below.</li><br />
<li>Run the script: <code>./dld-tabblo-zip.sh $RANGE</code>, e.g. <code>./dld-tabblo-zip.sh 12</code> to download and upload Tabblos 12,000 to 12,999.</li><br />
</ol><br />
<br />
To speed things up, a range can be divided into 10 parts (of 100 Tabblos each), so you can download several parts at the same time. For example:<br />
<pre><br />
for i in 0 1 2 3 4 5 6 7 8 9 ; do<br />
./dld-tabblo-zip.sh $RANGE $i &<br />
done<br />
</pre><br />
<br />
After the script has finished, rerun it to check that everything was downloaded and uploaded successfully.<br />
<br />
{| class="wikitable"<br />
|-<br />
! Ranges<br />
! Downloader<br />
! Status<br />
|-<br />
| 0 - 9<br />
| alard<br />
| Done<br />
|-<br />
| 10<br />
| underscor<br />
| Downloading<br />
|-<br />
| 11 - 15<br />
| bsmith093<br />
| Done<br />
|-<br />
| 16 - 99<br />
| alard<br />
| Downloading<br />
|-<br />
| 100 - 499<br />
| alard<br />
| Done<br />
|-<br />
| 500 - 549<br />
| underscor<br />
| Downloading<br />
|-<br />
| 550 - 599<br />
| Short<br />
| Downloading<br />
|-<br />
| 600 - 699<br />
| Hydriz<br />
| Done<br />
|-<br />
| 700 - 720<br />
| closure<br />
| Done<br />
|-<br />
| 721 - 999<br />
| Short<br />
| Downloading<br />
|-<br />
| 1000 - 1099<br />
| closure<br />
| Downloading<br />
|-<br />
| 1100 - 1399<br />
| alard<br />
| Downloading<br />
|-<br />
| 1400 - 1599<br />
| Hydriz<br />
| Downloading<br />
|-<br />
| 1600 - 1845<br />
| <br />
| Unclaimed<br />
|-<br />
| 1846 - 1850<br />
| Wait...<br />
| This is the newest range, please download other ranges first<br />
|-<br />
|}<br />
<br />
== Site structure ==<br />
<br />
=== Tabblos ===<br />
<br />
Tabblos have a URL of the form <code><nowiki>http://www.tabblo.com/studio/stories/view/#ID#/</nowiki></code>, where <code>#ID#</code> is the numeric id of the tabblo. Tabblos are numbered sequentially; the last number at the time of writing is 1843370.<br />
<br />
A tabblo consists of one HTML page with some text and one or more images. You can click on an image to get a larger version, but apart from that larger image it won't give you more than what is on the tabblo page. Most tabblos have comments, which are included in the page's HTML.<br />
<br />
Running <code>wget --page-requisites</code> on a tabblo URL will probably save all available information.<br />
<br />
From the Tabblo Lifeboat we learn that Tabblo offers a nice way to download a tabblo as a zip file, which also gives you the original photo files. Download URL: <code><nowiki>http://www.tabblo.com/studio/stories/zip/#ID#/?orig=1</nowiki></code>. You have to log in before you can download this zip file (but once you're logged in you can download <i>any</i> tabblo, not just your own).<br />
<br />
There's one other catch: the zip download will fail at first. The first time you download it you'll get an incomplete zip file; the next time you try you'll get a little bit more. Repeat the download until you get the complete zip file. (This probably has something to do with caching.)<br />
<br />
In conclusion, to download a tabblo we'll probably want to do something like this:<br />
<pre><br />
wget --page-requisites --warc-file tabblo http://www.tabblo.com/studio/stories/view/#ID#/<br />
while ! unzip -t all.zip ; do<br />
wget -O all.zip --header="Cookie: tabblosesh=###" http://www.tabblo.com/studio/stories/zip/#ID#/?orig=1<br />
done<br />
</pre><br />
<br />
=== Users ===<br />
<br />
'''TODO''' The user pages (e.g. http://www.tabblo.com/studio/person/chilla/) have everything you'd expect from a social network: comments, photos, friends, favorites, messages.<br />
<br />
<ul><br />
<li>profile page: <code>http://www.tabblo.com/studio/person/chilla/</code></li><br />
<li>tabblos: <code>http://www.tabblo.com/studio/view/tabblos/mrsfabulous/</code></li><br />
<li>favorites: <code>http://www.tabblo.com/studio/view/favorites/Candlepower</code></li><br />
</ul></div>Hydrizhttps://wiki.archiveteam.org/index.php?title=Tabblo&diff=7876Tabblo2012-05-25T02:49:38Z<p>Hydriz: /* Downloading ZIPs */</p>
<hr />
<div>{{Infobox project<br />
| title = Tabblo<br />
| image = Tabblo-com.png<br />
| URL = {{url|1=http://www.tabblo.com/}}<br />
| project_status = {{closing}}<br />
| archiving_status = {{inprogress}}<br />
| tracker = http://tabb.heroku.com<br />
| source = https://github.com/ArchiveTeam/tabblo-grab<br />
}}<br />
<br />
A post called [http://nedbatchelder.com/blog/201201/goodbye_tabblo.html Goodbye Tabblo] by Ned Batchelder (former Tabblo employee).<br />
<br />
== Tabblo Lifeboat ==<br />
<br />
Ned Batchelder (former Tabblo employee) wrote [https://bitbucket.org/ned/lifeboat Tabblo Lifeboat], a Python script that helps users to download their tabblos.<br />
<br />
== How to help archiving ==<br />
<br />
<b>Easy option: You can also do this with the ArchiveTeam Warrior, a virtual machine you can download from [http://archive.org/details/archiveteam-warrior]. Install the appliance, boot and choose the Tabblo project from the menu.</b><br />
<br />
There is a distributed download script that gets usernames from a tracker and downloads the data.<br />
<br />
Make sure you are on Linux and that you have curl, git, and a recent version of Bash. Your system must also be able to compile wget with the Lua extensions.<br />
<br />
<ul><br />
<li>Get the code: <pre>git clone git://github.com/ArchiveTeam/tabblo-grab.git</pre></li><br />
<li>Get and compile the latest version of wget-warc-lua: <pre>./get-wget-warc-lua.sh</pre></li><br />
<li>Think of a nickname for yourself (preferably use your IRC name).</li><br />
<li>Run the download script with <pre>./seesaw.sh "<YOURNICK>"</pre></li><br />
<li>To stop the script gracefully, run <pre>touch STOP</pre> in the script's working directory. It will finish the current task and stop.</li><br />
</ul><br />
<br />
=== OS X ===<br />
<br />
Note that these instructions require [http://mxcl.github.com/homebrew/ Homebrew]<br />
<br />
<ul><br />
<li><code>brew tap ArchiveTeam/tools</code></li><br />
<li><code>brew install tabblo</code></li><br />
<li><code>cd `brew --prefix tabblo`</code></li><br />
<li>Run the download script with <pre>./seesaw.sh "<YOURNICK>"</pre></li><br />
<li>To stop the script gracefully, run <pre>touch STOP</pre> in the script's working directory. It will finish the current task and stop.</li><br />
</ul><br />
<br />
=== Notes ===<br />
<ul><br />
<li>Compiling wget-warc will require dev packages for the various libraries that it needs. Most questions have been about gnutls; install the gnutls-devel or gnutls-dev package with your favorite package manager. You'll also need the liblua library (liblua5.1-0-dev on Ubuntu).</li><br />
<li>Downloading one user's data can take between 10 seconds and a few hours.</li><br />
<li>The data for one user is equally varied, from a few kB to several MB.</li><br />
<li>The downloaded data will be saved in the <code>./data/</code> subdirectory.</li><br />
<li>Download speeds from Tabblo.com are not that high. You can run multiple clients to speed things up.</li><br />
</ul><br />
<br />
== Downloading ZIPs ==<br />
<br />
There is a script to download the Tabblo ZIP files. (These include pictures and text, but no comments, profile pages, et cetera.) The script downloads a range of 1,000 Tabblos and uploads the ZIP files to Archive.org. For example, see [http://archive.org/details/archiveteam-tabblo-0 the first range].<br />
<br />
To participate:<br />
<br />
<ol><br />
<li>Get the code from [https://github.com/ArchiveTeam/tabblo-grab]. You need Bash and Curl to run it.</li><br />
<li>Claim one or more ranges (each range includes up to 1,000 Tabblos, so try claiming one or two ranges first). Add your name to the table below.</li><br />
<li>Run the script: <code>./dld-tabblo-zip.sh $RANGE</code>, e.g. <code>./dld-tabblo-zip.sh 12</code> to download and upload Tabblos 12,000 to 12,999.</li><br />
</ol><br />
<br />
To speed things up, a range can be divided into 10 parts (of 100 Tabblos each), so you can download several parts at the same time. For example:<br />
<pre><br />
for i in 0 1 2 3 4 5 6 7 8 9 ; do<br />
./dld-tabblo-zip.sh $RANGE $i &<br />
done<br />
</pre><br />
<br />
After the script has finished, rerun it to check that everything was downloaded and uploaded successfully.<br />
<br />
{| class="wikitable"<br />
|-<br />
! Ranges<br />
! Downloader<br />
! Status<br />
|-<br />
| 0 - 9<br />
| alard<br />
| Done<br />
|-<br />
| 10<br />
| underscor<br />
| Downloading<br />
|-<br />
| 11 - 15<br />
| bsmith093<br />
| Done<br />
|-<br />
| 16 - 99<br />
| alard<br />
| Downloading<br />
|-<br />
| 100 - 499<br />
| alard<br />
| Done<br />
|-<br />
| 500 - 549<br />
| underscor<br />
| Downloading<br />
|-<br />
| 550 - 599<br />
| Short<br />
| Downloading<br />
|-<br />
| 600 - 699<br />
| Hydriz<br />
| Done<br />
|-<br />
| 700 - 720<br />
| closure<br />
| Done<br />
|-<br />
| 721 - 999<br />
| Short<br />
| Downloading<br />
|-<br />
| 1000 - 1099<br />
| closure<br />
| Downloading<br />
|-<br />
| 1100 - 1399<br />
| alard<br />
|<br />
|-<br />
| 1400 - 1599<br />
| Hydriz<br />
| Downloading<br />
|-<br />
| 1600 - 1845<br />
| <br />
| Unclaimed<br />
|-<br />
| 1846 - 1850<br />
| Wait...<br />
| This is the newest range, please download other ranges first<br />
|-<br />
|}<br />
<br />
== Site structure ==<br />
<br />
=== Tabblos ===<br />
<br />
Tabblos have a URL of the form <code><nowiki>http://www.tabblo.com/studio/stories/view/#ID#/</nowiki></code>, where <code>#ID#</code> is the numeric id of the tabblo. Tabblos are numbered sequentially; the last number at the time of writing is 1843370.<br />
<br />
A tabblo consists of one HTML page with some text and one or more images. You can click on an image to get a larger version, but apart from that larger image it won't give you more than what is on the tabblo page. Most tabblos have comments, which are included in the page's HTML.<br />
<br />
Running <code>wget --page-requisites</code> on a tabblo URL will probably save all available information.<br />
<br />
From the Tabblo Lifeboat we learn that Tabblo offers a nice way to download a tabblo as a zip file, which also gives you the original photo files. Download URL: <code><nowiki>http://www.tabblo.com/studio/stories/zip/#ID#/?orig=1</nowiki></code>. You have to log in before you can download this zip file (but once you're logged in you can download <i>any</i> tabblo, not just your own).<br />
<br />
There's one other catch: the zip download will fail at first. The first time you download it you'll get an incomplete zip file; the next time you try you'll get a little bit more. Repeat the download until you get the complete zip file. (This probably has something to do with caching.)<br />
<br />
In conclusion, to download a tabblo we'll probably want to do something like this:<br />
<pre><br />
wget --page-requisites --warc-file tabblo http://www.tabblo.com/studio/stories/view/#ID#/<br />
while ! unzip -t all.zip ; do<br />
wget -O all.zip --header="Cookie: tabblosesh=###" http://www.tabblo.com/studio/stories/zip/#ID#/?orig=1<br />
done<br />
</pre><br />
<br />
=== Users ===<br />
<br />
'''TODO''' The user pages (e.g. http://www.tabblo.com/studio/person/chilla/) have everything you'd expect from a social network: comments, photos, friends, favorites, messages.<br />
<br />
<ul><br />
<li>profile page: <code>http://www.tabblo.com/studio/person/chilla/</code></li><br />
<li>tabblos: <code>http://www.tabblo.com/studio/view/tabblos/mrsfabulous/</code></li><br />
<li>favorites: <code>http://www.tabblo.com/studio/view/favorites/Candlepower</code></li><br />
</ul></div>Hydrizhttps://wiki.archiveteam.org/index.php?title=Tabblo&diff=7875Tabblo2012-05-25T02:46:37Z<p>Hydriz: /* Downloading ZIPs */ done</p>
<hr />
<div>{{Infobox project<br />
| title = Tabblo<br />
| image = Tabblo-com.png<br />
| URL = {{url|1=http://www.tabblo.com/}}<br />
| project_status = {{closing}}<br />
| archiving_status = {{inprogress}}<br />
| tracker = http://tabb.heroku.com<br />
| source = https://github.com/ArchiveTeam/tabblo-grab<br />
}}<br />
<br />
A post called [http://nedbatchelder.com/blog/201201/goodbye_tabblo.html Goodbye Tabblo] by Ned Batchelder (former Tabblo employee).<br />
<br />
== Tabblo Lifeboat ==<br />
<br />
Ned Batchelder (former Tabblo employee) wrote [https://bitbucket.org/ned/lifeboat Tabblo Lifeboat], a Python script that helps users to download their tabblos.<br />
<br />
== How to help archiving ==<br />
<br />
<b>Easy option: You can also do this with the ArchiveTeam Warrior, a virtual machine you can download from [http://archive.org/details/archiveteam-warrior]. Install the appliance, boot and choose the Tabblo project from the menu.</b><br />
<br />
There is a distributed download script that gets usernames from a tracker and downloads the data.<br />
<br />
Make sure you are on Linux and that you have curl, git, and a recent version of Bash. Your system must also be able to compile wget with the Lua extensions.<br />
<br />
<ul><br />
<li>Get the code: <pre>git clone git://github.com/ArchiveTeam/tabblo-grab.git</pre></li><br />
<li>Get and compile the latest version of wget-warc-lua: <pre>./get-wget-warc-lua.sh</pre></li><br />
<li>Think of a nickname for yourself (preferably use your IRC name).</li><br />
<li>Run the download script with <pre>./seesaw.sh "<YOURNICK>"</pre></li><br />
<li>To stop the script gracefully, run <pre>touch STOP</pre> in the script's working directory. It will finish the current task and stop.</li><br />
</ul><br />
<br />
=== OS X ===<br />
<br />
Note that these instructions require [http://mxcl.github.com/homebrew/ Homebrew]<br />
<br />
<ul><br />
<li><code>brew tap ArchiveTeam/tools</code></li><br />
<li><code>brew install tabblo</code></li><br />
<li><code>cd `brew --prefix tabblo`</code></li><br />
<li>Run the download script with <pre>./seesaw.sh "<YOURNICK>"</pre></li><br />
<li>To stop the script gracefully, run <pre>touch STOP</pre> in the script's working directory. It will finish the current task and stop.</li><br />
</ul><br />
<br />
=== Notes ===<br />
<ul><br />
<li>Compiling wget-warc will require dev packages for the various libraries that it needs. Most questions have been about gnutls; install the gnutls-devel or gnutls-dev package with your favorite package manager. You'll also need the liblua library (liblua5.1-0-dev on Ubuntu).</li><br />
<li>Downloading one user's data can take between 10 seconds and a few hours.</li><br />
<li>The data for one user is equally varied, from a few kB to several MB.</li><br />
<li>The downloaded data will be saved in the <code>./data/</code> subdirectory.</li><br />
<li>Download speeds from Tabblo.com are not that high. You can run multiple clients to speed things up.</li><br />
</ul><br />
<br />
== Downloading ZIPs ==<br />
<br />
There is a script to download the Tabblo ZIP files. (These include pictures and text, but no comments, profile pages, et cetera.) The script downloads a range of 1,000 Tabblos and uploads the ZIP files to Archive.org. For example, see [http://archive.org/details/archiveteam-tabblo-0 the first range].<br />
<br />
To participate:<br />
<br />
<ol><br />
<li>Get the code from [https://github.com/ArchiveTeam/tabblo-grab]. You need Bash and Curl to run it.</li><br />
<li>Claim one or more ranges (each range includes up to 1,000 Tabblos, so try claiming one or two ranges first). Add your name to the table below.</li><br />
<li>Run the script: <code>./dld-tabblo-zip.sh $RANGE</code>, e.g. <code>./dld-tabblo-zip.sh 12</code> to download and upload Tabblos 12,000 to 12,999.</li><br />
</ol><br />
<br />
To speed things up, a range can be divided into 10 parts (of 100 Tabblos each), so you can download several parts at the same time. For example:<br />
<pre><br />
for i in 0 1 2 3 4 5 6 7 8 9 ; do<br />
./dld-tabblo-zip.sh $RANGE $i &<br />
done<br />
</pre><br />
<br />
After the script has finished, rerun it to check that everything was downloaded and uploaded successfully.<br />
<br />
{| class="wikitable"<br />
|-<br />
! Ranges<br />
! Downloader<br />
! Status<br />
|-<br />
| 0 - 9<br />
| alard<br />
| Done<br />
|-<br />
| 10<br />
| underscor<br />
| Downloading<br />
|-<br />
| 11 - 15<br />
| bsmith093<br />
| Done<br />
|-<br />
| 16 - 99<br />
| alard<br />
| Downloading<br />
|-<br />
| 100 - 499<br />
| alard<br />
| Done<br />
|-<br />
| 500 - 549<br />
| underscor<br />
| Downloading<br />
|-<br />
| 550 - 599<br />
| Short<br />
| Downloading<br />
|-<br />
| 600 - 699<br />
| Hydriz<br />
| Done<br />
|-<br />
| 700 - 720<br />
| closure<br />
| Done<br />
|-<br />
| 721 - 999<br />
| Short<br />
| Downloading<br />
|-<br />
| 1000 - 1099<br />
| closure<br />
| Downloading<br />
|-<br />
| 1100 - 1399<br />
| alard<br />
| <br />
|-<br />
| 1400 - 1845<br />
| <br />
| Unclaimed<br />
|-<br />
| 1846 - 1850<br />
| Wait...<br />
| This is the newest range, please download other ranges first<br />
|-<br />
|}<br />
<br />
== Site structure ==<br />
<br />
=== Tabblos ===<br />
<br />
Tabblos have an url of the form <code><nowiki>http://www.tabblo.com/studio/stories/view/#ID#/</nowiki></code>, where <code>#ID#</code> is the numeric id of the tabblo. Tabblos are numbered sequentially, the last number at the time of writing is 1843370.<br />
<br />
A tabblo consists of one HTML page with some text and one or more images. You can click on the images to get a large version, but apart from the larger image that won't give you more than is on the tabblo page. Most tabblos have comments, which are included in the page's HTML.<br />
<br />
Running <code>wget --page-requisites</code> on a tabblo url will probably save all available information.<br />
<br />
From the Tabblo Lifeboat we learn that Tabblo offers a nice way to download a tabblo in a zip file. This zip file will also give you the original photo files. Download url: <code><nowiki>http://www.tabblo.com/studio/stories/zip/#ID#/?orig=1</nowiki></code>. You have to log in before you can download this zip file (but once you're in you can download <i>any</i> tabblo, not just your own).<br />
<br />
There's one other catch: the zip download will fail first. The first time you download it you'll get an incomplete zip file, the next time you try it you'll get a little bit more. Repeat downloading until you get the complete zip file. (Probably has something to do with caching.)<br />
<br />
In conclusion, to download a tabblo we'll probably want to do something like this:<br />
<pre><br />
wget --page-requisites --warc-file tabblo http://www.tabblo.com/studio/stories/view/#ID#/<br />
while ! unzip -t all.zip ; do<br />
wget -O all.zip --header="Cookie: tabblosesh=###" http://www.tabblo.com/studio/stories/zip/#ID#/?orig=1<br />
done<br />
</pre><br />
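<br />
The loop above never gives up, so a deleted or permanently broken tabblo would make it spin forever. A possible refinement is to cap the number of retries; the limit of 20 below is an arbitrary assumption, not something the site documents:<br />
<pre><br />
# retry the zip download, but stop after 20 attempts (arbitrary cap)<br />
tries=0<br />
until unzip -t all.zip || [ "$tries" -ge 20 ] ; do<br />
  wget -O all.zip --header="Cookie: tabblosesh=###" "http://www.tabblo.com/studio/stories/zip/#ID#/?orig=1"<br />
  tries=$((tries + 1))<br />
done<br />
</pre><br />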
<br />
=== Users ===<br />
<br />
'''TODO''' The user pages (e.g. http://www.tabblo.com/studio/person/chilla/) have everything you'd expect from a social network: comments, photos, friends, favorites, messages.<br />
<br />
<ul><br />
<li>profile page: <code>http://www.tabblo.com/studio/person/chilla/</code></li><br />
<li>tabblos: <code>http://www.tabblo.com/studio/view/tabblos/mrsfabulous/</code></li><br />
<li>favorites: <code>http://www.tabblo.com/studio/view/favorites/Candlepower</code></li><br />
</ul></div>Hydrizhttps://wiki.archiveteam.org/index.php?title=ShoutWiki&diff=7874ShoutWiki2012-05-25T02:31:42Z<p>Hydriz: </p>
<hr />
<div>{{Infobox project<br />
| title = ShoutWiki<br />
| logo = ShoutWiki blocktext.png<br />
| image = <br />
| description = <br />
| URL = http://shoutwiki.com<br />
| project_status = {{offline}}[https://twitter.com/#!/shoutwiki/status/176056717807845376]<br />
| archiving_status = {{saved}} (397 wikis, August 2011)<br />
| irc = wikiteam<br />
}}<br />
<br />
'''ShoutWiki''' is a [[wikifarm]]. It hosts about 800 wikis.<br />
<br />
For a list of wikis hosted on this wikifarm, see: https://code.google.com/p/wikiteam/source/browse/trunk/listsofwikis<br />
<br />
== May 2012 update ==<br />
An update was sent out to all ShoutWiki wiki founders:<br />
<blockquote><br />
<p>Dear ShoutWiki user,</p><br />
<br />
<p>I am writing to you because our database tells us that you have at some point in the past, created a wiki on ShoutWiki.com through the CreateWiki interface. As some of you may be aware, there have been a number of human created issues involving the ShoutWiki site over the period of the last few years. Firstly, I wish to apologize for those issues caused by my predecessors.</p><br />
<br />
<p>Secondly, as you may know, we’re in the process of getting the site back online. Currently there are around 20 databases on the server, instead of the 500 which the master database believes we have. According to my latest predecessor, we have a backup of all contents of the previous server, wail. The point of this e-mail is simple, I would like to know how many of you have wikis, and which of those that you would like to imported to the new server. I will warn you first however, that these contents may be out of date, there are contents of 300-odd wikis at archive.org that I can attempt to reimport if its significantly newer than the database import.</p><br />
<br />
<p>I also apologise for any of you recieving this e-mail multiple times, as this is taken directly from the CreateWiki creation log database, and I am rushing to get this important e-mail to you. We will be working on fine tuning these “mailing lists” in future, including removing anyone that requested a wiki to be deleted, as you may still be getting this e-mail currently.</p><br />
<br />
<p>Thank you for your support,</p><br />
<p>– Lewis Cawte</p><br />
<p>Chief Technical Officer, ShoutWiki</p><br />
</blockquote><br />
<br />
== Backups ==<br />
* http://www.archive.org/details/shoutwiki.com (August 2011, 397 wikis)<br />
* This page says there are 774 wikis http://shoutwiki.com/wiki/ShoutWiki_Hub:About<br />
<br />
* [[ShoutWiki/Twitter account]] grab ([https://twitter.com/#!/ShoutWiki @ShoutWiki])<br />
<br />
== See also ==<br />
* [[List of wikifarms]]<br />
<br />
== External links ==<br />
* http://shoutwiki.com<br />
<br />
{{Navigation box}}</div>Hydrizhttps://wiki.archiveteam.org/index.php?title=Tabblo&diff=7868Tabblo2012-05-24T11:14:12Z<p>Hydriz: </p>
<hr />
<div>{{Infobox project<br />
| title = Tabblo<br />
| image = Tabblo-com.png<br />
| URL = {{url|1=http://www.tabblo.com/}}<br />
| project_status = {{closing}}<br />
| archiving_status = {{inprogress}}<br />
| tracker = http://tabb.heroku.com<br />
| source = https://github.com/ArchiveTeam/tabblo-grab<br />
}}<br />
<br />
See [http://nedbatchelder.com/blog/201201/goodbye_tabblo.html Goodbye Tabblo], a post by Ned Batchelder (a former Tabblo employee).<br />
<br />
== Tabblo Lifeboat ==<br />
<br />
Ned Batchelder (former Tabblo employee) wrote [https://bitbucket.org/ned/lifeboat Tabblo Lifeboat], a Python script that helps users to download their tabblos.<br />
<br />
== How to help archiving ==<br />
<br />
<b>Easy option: You can also do this with the ArchiveTeam Warrior, a virtual machine you can download from [http://archive.org/details/archiveteam-warrior]. Install the appliance, boot it, and choose the Tabblo project from the menu.</b><br />
<br />
There is a distributed download script that gets usernames from a tracker and downloads the data.<br />
<br />
Make sure you are on Linux and that you have curl, git, and a recent version of Bash. Your system must also be able to compile wget with the Lua extensions.<br />
<br />
<ul><br />
<li>Get the code: <pre>git clone git://github.com/ArchiveTeam/tabblo-grab.git</pre></li><br />
<li>Get and compile the latest version of wget-warc-lua: <pre>./get-wget-warc-lua.sh</pre></li><br />
<li>Think of a nickname for yourself (preferably use your IRC name).</li><br />
<li>Run the download script with <pre>./seesaw.sh "<YOURNICK>"</pre></li><br />
<li>To stop the script gracefully, run <pre>touch STOP</pre> in the script's working directory. It will finish the current task and stop.</li><br />
</ul><br />
<br />
=== OS X ===<br />
<br />
Note that these instructions require [http://mxcl.github.com/homebrew/ Homebrew]<br />
<br />
<ul><br />
<li><code>brew tap ArchiveTeam/tools</code></li><br />
<li><code>brew install tabblo</code></li><br />
<li><code>cd `brew --prefix tabblo`</code></li><br />
<li>Run the download script with <pre>./seesaw.sh "<YOURNICK>"</pre></li><br />
<li>To stop the script gracefully, run <pre>touch STOP</pre> in the script's working directory. It will finish the current task and stop.</li><br />
</ul><br />
<br />
=== Notes ===<br />
<ul><br />
<li>Compiling wget-warc will require dev packages for the various libraries that it needs. Most questions have been about gnutls; install the gnutls-devel or gnutls-dev package with your favorite package manager. You'll also need the liblua library (liblua5.1-0-dev on Ubuntu).</li><br />
<li>Downloading one user's data can take between 10 seconds and a few hours.</li><br />
<li>The data for one user is equally varied, from a few kB to several MB.</li><br />
<li>The downloaded data will be saved in the <code>./data/</code> subdirectory.</li><br />
<li>Download speeds from Tabblo.com are not that high. You can run multiple clients to speed things up (see the sketch after this list).</li><br />
</ul><br />
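<br />
A minimal way to run several clients side by side is to give each one its own checkout, since these instructions say nothing about whether instances can safely share one working directory (the client count of 3 and the directory names are arbitrary assumptions):<br />
<pre><br />
# start three independent clients, each in its own clone of tabblo-grab<br />
for i in 1 2 3 ; do<br />
  ( git clone git://github.com/ArchiveTeam/tabblo-grab.git "client-$i" &&<br />
    cd "client-$i" && ./get-wget-warc-lua.sh && ./seesaw.sh "<YOURNICK>" ) &<br />
done<br />
wait<br />
</pre><br />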
<br />
== Downloading ZIPs ==<br />
<br />
There is a script to download the Tabblo ZIP files. (This includes pictures and text, but no comments, profile pages et cetera.) The script downloads a range of 1000 Tabblos and uploads the ZIP files to Archive.org. For example, see [http://archive.org/details/archiveteam-tabblo-0 the first range].<br />
<br />
To participate:<br />
<br />
<ol><br />
<li>Get the code from [https://github.com/ArchiveTeam/tabblo-grab]. You need Bash and Curl to run it.</li><br />
<li>Claim one or more ranges (each range includes up to 1,000 Tabblos, so try claiming one or two ranges first). Add your name to the table below.</li><br />
<li>Run the script: <code>./dld-tabblo-zip.sh $RANGE</code>, e.g. <code>./dld-tabblo-zip.sh 12</code> to download and upload Tabblos 12,000 to 12,999.</li><br />
</ol><br />
<br />
To speed things up a range can be divided in 10 parts (of 100 Tabblos each), so you can download several parts at the same time. For example:<br />
<pre><br />
for i in 0 1 2 3 4 5 6 7 8 9 ; do<br />
./dld-tabblo-zip.sh $RANGE $i &<br />
done<br />
</pre><br />
<br />
Once you've run the script once, rerun it to check if everything was down- and uploaded successfully.<br />
<br />
{| class="wikitable"<br />
|-<br />
! Ranges<br />
! Downloader<br />
! Status<br />
|-<br />
| 0 - 9<br />
| alard<br />
| Done<br />
|-<br />
| 10<br />
| underscor<br />
| Downloading<br />
|-<br />
| 11 - 15<br />
| bsmith093<br />
| Done<br />
|-<br />
| 16 - 99<br />
| <br />
| Unclaimed<br />
|-<br />
| 100 - 499<br />
| alard<br />
| Downloading<br />
|-<br />
| 500 - 549<br />
| underscor<br />
| Downloading<br />
|-<br />
| 550 - 599<br />
| Short<br />
| Downloading<br />
|-<br />
| 600 - 699<br />
| Hydriz<br />
| Downloading<br />
|-<br />
| 700 - 720<br />
| closure<br />
| Done<br />
|-<br />
| 721 - 999<br />
| Short<br />
| Downloading<br />
|-<br />
| 1000 - 1099<br />
| closure<br />
| Downloading<br />
|-<br />
| 1100 - 1845<br />
| <br />
| Unclaimed<br />
|-<br />
| 1846 - 1850<br />
| Wait...<br />
| This is the newest range, please download other ranges first<br />
|-<br />
|}<br />
<br />
== Site structure ==<br />
<br />
=== Tabblos ===<br />
<br />
Tabblos have an url of the form <code><nowiki>http://www.tabblo.com/studio/stories/view/#ID#/</nowiki></code>, where <code>#ID#</code> is the numeric id of the tabblo. Tabblos are numbered sequentially, the last number at the time of writing is 1843370.<br />
<br />
A tabblo consists of one HTML page with some text and one or more images. You can click on the images to get a large version, but apart from the larger image that won't give you more than is on the tabblo page. Most tabblos have comments, which are included in the page's HTML.<br />
<br />
Running <code>wget --page-requisites</code> on a tabblo url will probably save all available information.<br />
<br />
From the Tabblo Lifeboat we learn that Tabblo offers a nice way to download a tabblo in a zip file. This zip file will also give you the original photo files. Download url: <code><nowiki>http://www.tabblo.com/studio/stories/zip/#ID#/?orig=1</nowiki></code>. You have to log in before you can download this zip file (but once you're in you can download <i>any</i> tabblo, not just your own).<br />
<br />
There's one other catch: the zip download will fail first. The first time you download it you'll get an incomplete zip file, the next time you try it you'll get a little bit more. Repeat downloading until you get the complete zip file. (Probably has something to do with caching.)<br />
<br />
Conclusion, to download a tabblo we'll probably want to do something like this:<br />
<pre><br />
wget --page-requisites --warc-file tabblo http://www.tabblo.com/studio/stories/view/#ID#/<br />
while ! unzip -t all.zip ; do<br />
wget -O all.zip --header="Cookie: tabblosesh=###" http://www.tabblo.com/studio/stories/zip/#ID#/?orig=1<br />
done<br />
</pre><br />
<br />
=== Users ===<br />
<br />
'''TODO''' The user pages (e.g. http://www.tabblo.com/studio/person/chilla/) have everything you'd expect from a social network: comments, photos, friends, favorites, messages.<br />
<br />
<ul><br />
<li>profile page: <code>http://www.tabblo.com/studio/person/chilla/</code></li><br />
<li>tabblos: <code>http://www.tabblo.com/studio/view/tabblos/mrsfabulous/</code></li><br />
<li>favorites: <code>http://www.tabblo.com/studio/view/favorites/Candlepower</code></li><br />
</ul></div>Hydrizhttps://wiki.archiveteam.org/index.php?title=Tabblo&diff=7850Tabblo2012-05-23T15:39:50Z<p>Hydriz: links :)</p>
<hr />
<div>{{Infobox project<br />
| title = Tabblo<br />
| image = Tabblo-com.png<br />
| URL = {{url|1=http://www.tabblo.com/}}<br />
| project_status = {{closing}} ?<br />
| archiving_status = {{nosavedyet}}<br />
| tracker = http://tabb.heroku.com<br />
| source = https://github.com/ArchiveTeam/tabblo-grab<br />
}}<br />
<br />
A post called [http://nedbatchelder.com/blog/201201/goodbye_tabblo.html Goodbye Tabblo] by Ned Batchelder (former Tabblo employee).<br />
<br />
== Tabblo Lifeboat ==<br />
<br />
Ned Batchelder (former Tabblo employee) wrote [https://bitbucket.org/ned/lifeboat Tabblo Lifeboat], a Python script that helps users to download their tabblos.<br />
<br />
== How to help archiving ==<br />
<br />
<b>Easy option: You can also do this with the ArchiveTeam Warrior, a virtual machine you can download from [http://archive.org/details/archiveteam-warrior]. Install the appliance, boot and choose the Tabblo project from the menu.</b><br />
<br />
There is a distributed download script that gets usernames from a tracker and downloads the data.<br />
<br />
Make sure you are on Linux, that you have curl, git, a recent version of Bash. Your system must also be able to compile wget with the Lua extensions.<br />
<br />
<ul><br />
<li>Get the code: <pre>git clone git://github.com/ArchiveTeam/tabblo-grab.git</pre></li><br />
<li>Get and compile the latest version of wget-warc-lua: <pre>./get-wget-warc-lua.sh</pre></li><br />
<li>Think of a nickname for yourself (preferably use your IRC name).</li><br />
<li>Run the download script with <pre>./seesaw.sh "<YOURNICK>"</pre></li><br />
<li>To stop the script gracefully, run <pre>touch STOP</pre> in the script's working directory. It will finish the current task and stop.</li><br />
</ul><br />
<br />
=== Notes ===<br />
<ul><br />
<li>Compiling wget-warc will require dev packages for the various libraries that it needs. Most questions have been about gnutls; install the gnutls-devel or gnutls-dev package with your favorite package manager. You'll also need the liblua library (liblua5.1-0-dev on Ubuntu).</li><br />
<li>Downloading one user's data can take between 10 seconds and a few hours.</li><br />
<li>The data for one user is equally varied, from a few kB to several GB.</li><br />
<li>The downloaded data will be saved in the <code>./data/</code> subdirectory.</li><br />
<li>Download speeds from Tabblo.com are not that high. You can run multiple clients to speed things up.</li><br />
</ul><br />
<br />
== Downloading ZIPs ==<br />
<br />
There is a script to download the Tabblo ZIP files. (This includes pictures and text, but no comments, profile pages et cetera.) The script downloads a range of 1000 Tabblos and uploads the ZIP files to Archive.org. For example, see [http://archive.org/details/archiveteam-tabblo-0 the first range].<br />
<br />
To participate:<br />
<br />
<ol><br />
<li>Get the code from [https://github.com/ArchiveTeam/tabblo-grab]. You need Bash and Curl to run it.</li><br />
<li>Claim one or more ranges (each range includes up to 1,000 Tabblos, so try claiming one or two ranges first). Add your name to the table below.</li><br />
<li>Run the script: <code>./dld-tabblo-zip.sh $RANGE</code>, e.g. <code>./dld-tabblo-zip.sh 12</code> to download and upload Tabblos 12,000 to 12,999.</li><br />
</ol><br />
<br />
To speed things up a range can be divided in 10 parts (of 100 Tabblos each), so you can download several parts at the same time. For example:<br />
<pre><br />
for i in 0 1 2 3 4 5 6 7 8 9 ; do<br />
./dld-tabblo-zip.sh $RANGE $i &<br />
done<br />
</pre><br />
<br />
Once you've run the script once, rerun it to check if everything was down- and uploaded successfully.<br />
<br />
{| class="wikitable"<br />
|-<br />
! Ranges<br />
! Downloader<br />
! Status<br />
|-<br />
| 0 - 9<br />
| alard<br />
| Downloading<br />
|-<br />
| 10<br />
| underscor<br />
| Downloading<br />
|-<br />
| 11 - 15<br />
| bsmith093<br />
| done<br />
|-<br />
| 100 - 499<br />
| alard<br />
| Downloading<br />
|-<br />
| 500 - 549<br />
| underscor<br />
| Downloading<br />
|-<br />
| 550 - 599<br />
| Short<br />
| Downloading<br />
|-<br />
| 600 - 699<br />
| Hydriz<br />
| Downloading<br />
|-<br />
| 700 - 1845<br />
|<br />
| Unclaimed<br />
|-<br />
| 1846 - 1850<br />
| Wait...<br />
| This is the newest range, please download other ranges first<br />
|-<br />
|}<br />
<br />
== Site structure ==<br />
<br />
=== Tabblos ===<br />
<br />
Tabblos have an url of the form <code><nowiki>http://www.tabblo.com/studio/stories/view/#ID#/</nowiki></code>, where <code>#ID#</code> is the numeric id of the tabblo. Tabblos are numbered sequentially, the last number at the time of writing is 1843370.<br />
<br />
A tabblo consists of one HTML page with some text and one or more images. You can click on the images to get a large version, but apart from the larger image that won't give you more than is on the tabblo page. Most tabblos have comments, which are included in the page's HTML.<br />
<br />
Running <code>wget --page-requisites</code> on a tabblo url will probably save all available information.<br />
<br />
From the Tabblo Lifeboat we learn that Tabblo offers a nice way to download a tabblo in a zip file. This zip file will also give you the original photo files. Download url: <code><nowiki>http://www.tabblo.com/studio/stories/zip/#ID#/?orig=1</nowiki></code>. You have to log in before you can download this zip file (but once you're in you can download <i>any</i> tabblo, not just your own).<br />
<br />
There's one other catch: the zip download will fail first. The first time you download it you'll get an incomplete zip file, the next time you try it you'll get a little bit more. Repeat downloading until you get the complete zip file. (Probably has something to do with caching.)<br />
<br />
Conclusion, to download a tabblo we'll probably want to do something like this:<br />
<pre><br />
wget --page-requisites --warc-file tabblo http://www.tabblo.com/studio/stories/view/#ID#/<br />
while ! unzip -t all.zip ; do<br />
wget -O all.zip --header="Cookie: tabblosesh=###" http://www.tabblo.com/studio/stories/zip/#ID#/?orig=1<br />
done<br />
</pre><br />
<br />
=== Users ===<br />
<br />
'''TODO''' The user pages (e.g. http://www.tabblo.com/studio/person/chilla/) have everything you'd expect from a social network: comments, photos, friends, favorites, messages.<br />
<br />
<ul><br />
<li>profile page: <code>http://www.tabblo.com/studio/person/chilla/</code></li><br />
<li>tabblos: <code>http://www.tabblo.com/studio/view/tabblos/mrsfabulous/</code></li><br />
<li>favorites: <code>http://www.tabblo.com/studio/view/favorites/Candlepower</code></li><br />
</ul></div>Hydrizhttps://wiki.archiveteam.org/index.php?title=Tabblo&diff=7846Tabblo2012-05-23T10:23:24Z<p>Hydriz: /* Downloading ZIPs */ +me</p>
<hr />
<div>{{Infobox project<br />
| title = Tabblo<br />
| image = Tabblo-com.png<br />
| URL = {{url|1=http://www.tabblo.com/}}<br />
| project_status = {{closing}} ?<br />
| archiving_status = {{nosavedyet}}<br />
}}<br />
<br />
A post called [http://nedbatchelder.com/blog/201201/goodbye_tabblo.html Goodbye Tabblo] by Ned Batchelder (former Tabblo employee).<br />
<br />
== Tabblo Lifeboat ==<br />
<br />
Ned Batchelder (former Tabblo employee) wrote [https://bitbucket.org/ned/lifeboat Tabblo Lifeboat], a Python script that helps users to download their tabblos.<br />
<br />
== Downloading ZIPs ==<br />
<br />
There is a script to download the Tabblo ZIP files. (This includes pictures and text, but no comments, profile pages et cetera.) The script downloads a range of 1000 Tabblos and uploads the ZIP files to Archive.org. For example, see [http://archive.org/details/archiveteam-tabblo-0 the first range].<br />
<br />
To participate:<br />
<br />
<ol><br />
<li>Get the code from [https://github.com/ArchiveTeam/tabblo-grab]. You need Bash and Curl to run it.</li><br />
<li>Claim one or more ranges (each range includes up to 1,000 Tabblos, so try claiming one or two ranges first). Add your name to the table below.</li><br />
<li>Run the script: <code>./dld-tabblo-zip.sh $RANGE</code>, e.g. <code>./dld-tabblo-zip.sh 12</code> to download and upload Tabblos 12,000 to 12,999.</li><br />
</ol><br />
<br />
To speed things up a range can be divided in 10 parts (of 100 Tabblos each), so you can download several parts at the same time. For example:<br />
<pre><br />
for i in 0 1 2 3 4 5 6 7 8 9 ; do<br />
./dld-tabblo-zip.sh $RANGE $i &<br />
done<br />
</pre><br />
<br />
Once you've run the script once, rerun it to check if everything was down- and uploaded successfully.<br />
<br />
{| class="wikitable"<br />
|-<br />
! Ranges<br />
! Downloader<br />
! Status<br />
|-<br />
| 0 - 9<br />
| alard<br />
| Downloading<br />
|-<br />
| 10<br />
| underscor<br />
| Downloading<br />
|-<br />
| 11 - 15<br />
| bsmith093<br />
| done<br />
|-<br />
| 100 - 499<br />
| alard<br />
| Downloading<br />
|-<br />
| 500 - 549<br />
| underscor<br />
| Downloading<br />
|-<br />
| 550 - 599<br />
| Short<br />
| Downloading<br />
|-<br />
| 600 - 699<br />
| Hydriz<br />
| Downloading<br />
|-<br />
| 700 - 1845<br />
|<br />
| Unclaimed<br />
|-<br />
| 1846 - 1850<br />
| Wait...<br />
| This is the newest range, please download other ranges first<br />
|-<br />
|}<br />
<br />
== Site structure ==<br />
<br />
=== Tabblos ===<br />
<br />
Tabblos have an url of the form <code><nowiki>http://www.tabblo.com/studio/stories/view/#ID#/</nowiki></code>, where <code>#ID#</code> is the numeric id of the tabblo. Tabblos are numbered sequentially, the last number at the time of writing is 1843370.<br />
<br />
A tabblo consists of one HTML page with some text and one or more images. You can click on the images to get a large version, but apart from the larger image that won't give you more than is on the tabblo page. Most tabblos have comments, which are included in the page's HTML.<br />
<br />
Running <code>wget --page-requisites</code> on a tabblo url will probably save all available information.<br />
<br />
From the Tabblo Lifeboat we learn that Tabblo offers a nice way to download a tabblo in a zip file. This zip file will also give you the original photo files. Download url: <code><nowiki>http://www.tabblo.com/studio/stories/zip/#ID#/?orig=1</nowiki></code>. You have to log in before you can download this zip file (but once you're in you can download <i>any</i> tabblo, not just your own).<br />
<br />
There's one other catch: the zip download will fail first. The first time you download it you'll get an incomplete zip file, the next time you try it you'll get a little bit more. Repeat downloading until you get the complete zip file. (Probably has something to do with caching.)<br />
<br />
Conclusion, to download a tabblo we'll probably want to do something like this:<br />
<pre><br />
wget --page-requisites --warc-file tabblo http://www.tabblo.com/studio/stories/view/#ID#/<br />
while ! unzip -t all.zip ; do<br />
wget -O all.zip --header="Cookie: tabblosesh=###" http://www.tabblo.com/studio/stories/zip/#ID#/?orig=1<br />
done<br />
</pre><br />
<br />
=== Users ===<br />
<br />
'''TODO''' The user pages (e.g. http://www.tabblo.com/studio/person/chilla/) have everything you'd expect from a social network: comments, photos, friends, favorites, messages.<br />
<br />
<ul><br />
<li>profile page: <code>http://www.tabblo.com/studio/person/chilla/</code></li><br />
<li>tabblos: <code>http://www.tabblo.com/studio/view/tabblos/mrsfabulous/</code></li><br />
<li>favorites: <code>http://www.tabblo.com/studio/view/favorites/Candlepower</code></li><br />
</ul></div>Hydrizhttps://wiki.archiveteam.org/index.php?title=Wikimedia_Commons&diff=7840Wikimedia Commons2012-05-22T15:08:16Z<p>Hydriz: /* Volunteers */ There is no other range currently...</p>
<hr />
<div>{{Infobox project<br />
| title = Wikimedia Commons<br />
| image = Commons screenshot.png<br />
| description = Wikimedia Commons mainpage on 2010-12-13<br />
| URL = http://commons.wikimedia.org<br />
| project_status = {{online}}<br />
| archiving_status = {{inprogress}}<br />
}}<br />
'''Wikimedia Commons''' is a database of freely usable media files with more than 10 million files (when it held 6.8M files, the size was 6.6TB).<br />
<br />
Current size (based on January 18, 2012 estimate): '''13.3TB''', old versions '''881GB'''<br />
<br />
== Archiving process ==<br />
<br />
=== Tools ===<br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/commonsdownloader.py Download script] (Python)<br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/commonschecker.py Checker script] (Python)<br />
* [http://toolserver.org/~emijrp/commonsarchive/ Feed lists] (from 2004-09-07 to 2006-12-31; more coming soon)<br />
<br />
=== How-to ===<br />
Download the script and the feed lists (unpack it; it is a .csv file) into the same directory. Then run:<br />
* python commonsdownloader.py 2005-01-01 2005-01-10 [to download that 10 days range; it generates zip files by day and a .csv for every day]<br />
<br />
Don't forget the 30th and 31st days of some months, and February 29th in leap years.<br />
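<br />
One way to avoid getting those day counts wrong is to let <tt>date</tt> compute the month boundaries and feed whole months to the downloader. This is only a sketch under the assumption that GNU date is available; it is not part of the WikiTeam tooling:<br />
<pre><br />
# download all of 2005, one calendar month at a time (GNU date assumed)<br />
for m in 01 02 03 04 05 06 07 08 09 10 11 12 ; do<br />
  first="2005-$m-01"<br />
  last=$(date -d "$first +1 month -1 day" +%Y-%m-%d)<br />
  python commonsdownloader.py "$first" "$last"<br />
done<br />
</pre><br />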
<br />
To verify the download data use the checker script:<br />
* python commonschecker.py 2005-01-01 2005-01-10 [to check that 10 days range; it works on the .zip and .csv files, not the original folders]<br />
<br />
=== Tools required ===<br />
If you are downloading on a freshly set up server (e.g. a default virtual machine), you need to install <tt>zip</tt> first (Ubuntu: <tt>apt-get install zip</tt>).<br />
<br />
Python should already be installed on your server; if not, just install it.<br />
<br />
The scripts also depend on <tt>curl</tt> and <tt>wget</tt>, which should be installed on your server by default.<br />
<br />
=== Volunteers ===<br />
<br />
:'''''Please wait until we do some tests; there is probably a bug with long filenames.'''''<br />
<br />
{| class="wikitable"<br />
! Nick !! Start date !! End date !! Images !! Size !! Revision !! Status !! Notes<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2004-09-07 || 2005-06-30 || ? || ? || r643 || ''Downloading'' ||<br />
|-<br />
| [[User:db48x|db48x]] || 2005-07-01 || 2005-12-31 || ? || ? || || ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-01-01 || 2006-01-10 || 13198 || 4.8GB || r349 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-01-11 || 2006-06-30 || ? || ? || r349 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-07-01 || 2006-12-31 || ? || ? || r643 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Check:<br />July 2006: http://p.defau.lt/?IcMnwkx_j4H09FE_9iVgkQ<br />August 2006: http://p.defau.lt/?EmsKDtM0RXaysFNEABXJCQ<br />September 2006: http://p.defau.lt/?KBZVE9rJ9hdz4DiKnegnUw<br />October 2006: http://p.defau.lt/?f3F85TyqHtdY0LhpQk_m1w<br />November 2006: http://p.defau.lt/?VZwhzt_2doA_Z3c65_JkXg<br />December 2006: http://p.defau.lt/?Ms_TgrcyGDL_0oZQgKCNmw<br />
|}<br />
<br />
=== Errors ===<br />
* oi_archive_name empty fields: http://commons.wikimedia.org/wiki/File:Nl-scheikundig.ogg<br />
* broken file links: http://commons.wikimedia.org/wiki/File:SMS_Bluecher.jpg#filehistory<br />
* [http://code.google.com/p/wikiteam/issues/detail?id=45 Issue 45]: 2006-02-05, 2006-02-11, 2006-02-25, 2006-03-10, 2006-03-23, 2006-04-21, 2006-04-25, 2006-05-01, 2006-07-13, 2006-07-30, 2006-08-02, 2006-08-05, 2006-08-13, 2006-09-12, 2006-10-22, 2006-10-26, 2006-11-23, 2006-12-06, 2006-12-13, 2006-12-17.<br />
<br />
I'm going to file a bug in bugzilla.<br />
<br />
=== Uploading ===<br />
'''UPLOAD''' using the format: wikimediacommons-<year><month><br />
<br />
E.g. wikimediacommons-200601 for January 2006 grab.<br />
<br />
If you can, add it to the WikiTeam collection; otherwise just tag it with the wikiteam keyword and it will be added later on.<br />
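<br />
For example, with the internetarchive command-line client (using this particular tool, and the zip name below, are assumptions for illustration; any uploader that can set the item identifier and a keyword works):<br />
<pre><br />
# upload the January 2006 grab; the zip name is a placeholder for whatever the downloader produced<br />
ia upload wikimediacommons-200601 2006-01-01.zip --metadata="subject:wikiteam"<br />
</pre><br />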
<br />
== Other dumps ==<br />
There is no public dump of all images. [[WikiTeam]] is working on a scraper (see section above).<br />
<br />
Pictures of the Year (best ones):<br />
* [http://download.wikimedia.org/other/poty/poty2006.zip 2006] ([http://burnbit.com/torrent/177023/poty2006_zip torrent]) ([http://www.archive.org/details/poty2006 IA])<br />
* [http://download.wikimedia.org/other/poty/poty2007.zip 2007] ([http://burnbit.com/torrent/177024/poty2007_zip torrent]) ([http://www.archive.org/details/poty2007 IA])<br />
* [http://download.wikimedia.org/other/poty/2009 2009] ([http://www.archive.org/details/poty2009 IA])<br />
* [http://download.wikimedia.org/other/poty/2010 2010] ([http://www.archive.org/details/poty2010 IA])<br />
<br />
== Featured images ==<br />
<br />
Wikimedia Commons contains a lot of [http://commons.wikimedia.org/wiki/Category:Featured_pictures_on_Wikimedia_Commons images of high quality].<br />
<br />
[[File:Featured pictures on Wikimedia Commons - Wikimedia Commons 1294011879617.png|500px]]<br />
<br />
== Size stats ==<br />
Combined size of images hosted on Wikimedia Commons, by month.<br />
<pre><br />
date sum(img_size) in bytes<br />
2003-1 1360188<br />
2004-10 637349207<br />
2004-11 726517177<br />
2004-12 1503501023<br />
2004-9 188850959<br />
2005-1 1952816194<br />
2005-10 17185495206<br />
2005-11 9950998969<br />
2005-12 11430418722<br />
2005-2 3118680401<br />
2005-3 3820401370<br />
2005-4 5476827971<br />
2005-5 10998180401<br />
2005-6 7160629133<br />
2005-7 9206024659<br />
2005-8 12591218859<br />
2005-9 14060418086<br />
2006-1 15433548270<br />
2006-10 33574470896<br />
2006-11 34231957288<br />
2006-12 30607951770<br />
2006-2 14952310277<br />
2006-3 19415486302<br />
2006-4 23041609453<br />
2006-5 29487911752<br />
2006-6 29856352192<br />
2006-7 32257412994<br />
2006-8 50940607926<br />
2006-9 37624697336<br />
2007-1 40654722866<br />
2007-10 89872715966<br />
2007-11 81975793043<br />
2007-12 75515001911<br />
2007-2 39452895714<br />
2007-3 53706627561<br />
2007-4 72917771224<br />
2007-5 72944518827<br />
2007-6 63504951958<br />
2007-7 76230887667<br />
2007-8 91290158697<br />
2007-9 100120203171<br />
2008-1 84582810181<br />
2008-10 122360827827<br />
2008-11 116290099578<br />
2008-12 126446332364<br />
2008-2 77416420840<br />
2008-3 89120317630<br />
2008-4 98180062150<br />
2008-5 117840970706<br />
2008-6 100352888576<br />
2008-7 128266650486<br />
2008-8 130452484462<br />
2008-9 120247362867<br />
2009-1 127226957021<br />
2009-10 345591510325<br />
2009-11 197991117397<br />
2009-12 228003186895<br />
2009-2 125819024255<br />
2009-3 273597778760<br />
2009-4 212175602700<br />
2009-5 191651496603<br />
2009-6 195998789357<br />
2009-7 241366758346<br />
2009-8 262927838267<br />
2009-9 184963508476<br />
2010-1 226919138307<br />
2010-2 191615007774<br />
2010-3 216425793739<br />
2010-4 312177184245<br />
2010-5 312240110181<br />
2010-6 283374261868<br />
2010-7 362175217639<br />
2010-8 172072631498<br />
</pre><br />
<br />
== See also ==<br />
* [[Wikipedia]]: some Wikipedias have enabled the local upload form; English Wikipedia contains about 800,000 images, a lot of them under fair use<br />
<br />
== External links ==<br />
* http://commons.wikimedia.org<br />
* [http://dumps.wikimedia.org/other/poty/ Picture of the Year archives]<br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Image hostings]]<br />
[[Category:Wikis]]</div>Hydrizhttps://wiki.archiveteam.org/index.php?title=Tabblo&diff=7823Tabblo2012-05-21T13:44:11Z<p>Hydriz: and URL is wrong...</p>
<hr />
<div>{{Infobox project<br />
| title = Tabblo<br />
| image = Tabblo-com.png<br />
| URL = {{url|1=http://www.tabblo.com/}}<br />
| project_status = {{closing}} ?<br />
| archiving_status = {{nosavedyet}}<br />
}}<br />
<br />
A post called [http://nedbatchelder.com/blog/201201/goodbye_tabblo.html Goodbye Tabblo] by Ned Batchelder (former Tabblo employee).<br />
<br />
== Tabblo Lifeboat ==<br />
<br />
Ned Batchelder (former Tabblo employee) wrote [https://bitbucket.org/ned/lifeboat Tabblo Lifeboat], a Python script that helps users to download their tabblos.<br />
<br />
== Downloading ZIPs ==<br />
<br />
There is a script to download the Tabblo ZIP files. (This includes pictures and text, but no comments, profile pages et cetera.) The script downloads a range of 1000 Tabblos and uploads the ZIP files to Archive.org. For example, see [http://archive.org/details/archiveteam-tabblo-0 the first range].<br />
<br />
To participate:<br />
<br />
<ol><br />
<li>Get the code from [https://github.com/ArchiveTeam/tabblo-grab]. You need Bash and Curl to run it.</li><br />
<li>Claim one or more ranges (each range includes up to 1,000 Tabblos, so try claiming one or two ranges first).</li><br />
<li>Run the script: <code>./dld-tabblo-zip.sh $RANGE</code>, e.g. <code>./dld-tabblo-zip.sh 12</code> to download and upload Tabblos 12,000 to 12,999.</li><br />
</ol><br />
<br />
To speed things up a range can be divided in 10 parts (of 100 Tabblos each), so you can download several parts at the same time. For example:<br />
<pre><br />
for i in 0 1 2 3 4 5 6 7 8 9 ; do<br />
./dld-tabblo-zip.sh $RANGE $i &<br />
done<br />
</pre><br />
<br />
Once you've run the script once, rerun it to check if everything was down- and uploaded successfully.<br />
<br />
{| class="wikitable"<br />
|-<br />
! Ranges<br />
! Downloader<br />
! Status<br />
|-<br />
| 0 - 9<br />
| alard<br />
| Downloading<br />
|-<br />
| 10 - 1845<br />
|<br />
| Unclaimed<br />
|-<br />
| 1846 - 1850<br />
| Wait...<br />
| This is the newest range, please download other ranges first<br />
|-<br />
|}<br />
<br />
== Site structure ==<br />
<br />
=== Tabblos ===<br />
<br />
Tabblos have an url of the form <code><nowiki>http://www.tabblo.com/studio/stories/view/#ID#/</nowiki></code>, where <code>#ID#</code> is the numeric id of the tabblo. Tabblos are numbered sequentially, the last number at the time of writing is 1843370.<br />
<br />
A tabblo consists of one HTML page with some text and one or more images. You can click on the images to get a large version, but apart from the larger image that won't give you more than is on the tabblo page. Most tabblos have comments, which are included in the page's HTML.<br />
<br />
Running <code>wget --page-requisites</code> on a tabblo url will probably save all available information.<br />
<br />
From the Tabblo Lifeboat we learn that Tabblo offers a nice way to download a tabblo in a zip file. This zip file will also give you the original photo files. Download url: <code><nowiki>http://www.tabblo.com/studio/stories/zip/#ID#/?orig=1</nowiki></code>. You have to log in before you can download this zip file (but once you're in you can download <i>any</i> tabblo, not just your own).<br />
<br />
There's one other catch: the zip download will fail first. The first time you download it you'll get an incomplete zip file, the next time you try it you'll get a little bit more. Repeat downloading until you get the complete zip file. (Probably has something to do with caching.)<br />
<br />
Conclusion, to download a tabblo we'll probably want to do something like this:<br />
<pre><br />
wget --page-requisites --warc-file tabblo http://www.tabblo.com/studio/stories/view/#ID#/<br />
while ! unzip -t all.zip ; do<br />
wget -O all.zip --header="Cookie: tabblosesh=###" http://www.tabblo.com/studio/stories/zip/#ID#/?orig=1<br />
done<br />
</pre><br />
<br />
=== Users ===<br />
<br />
'''TODO''' The user pages (e.g. http://www.tabblo.com/studio/person/chilla/) have everything you'd expect from a social network: comments, photos, friends, favorites, messages.<br />
<br />
<ul><br />
<li>profile page: <code>http://www.tabblo.com/studio/person/chilla/</code></li><br />
<li>tabblos: <code>http://www.tabblo.com/studio/view/tabblos/mrsfabulous/</code></li><br />
<li>favorites: <code>http://www.tabblo.com/studio/view/favorites/Candlepower</code></li><br />
</ul></div>Hydrizhttps://wiki.archiveteam.org/index.php?title=Tabblo&diff=7822Tabblo2012-05-21T13:43:12Z<p>Hydriz: fix template</p>
<hr />
<div>{{Infobox project<br />
| title = Tabblo<br />
| image = Tabblo-com.png<br />
| URL = {{url|1=http://www.tablo.com/}}<br />
| project_status = {{closing}} ?<br />
| archiving_status = {{nosavedyet}}<br />
}}<br />
<br />
A post called [http://nedbatchelder.com/blog/201201/goodbye_tabblo.html Goodbye Tabblo] by Ned Batchelder (former Tabblo employee).<br />
<br />
== Tabblo Lifeboat ==<br />
<br />
Ned Batchelder (former Tabblo employee) wrote [https://bitbucket.org/ned/lifeboat Tabblo Lifeboat], a Python script that helps users to download their tabblos.<br />
<br />
== Downloading ZIPs ==<br />
<br />
There is a script to download the Tabblo ZIP files. (This includes pictures and text, but no comments, profile pages et cetera.) The script downloads a range of 1000 Tabblos and uploads the ZIP files to Archive.org. For example, see [http://archive.org/details/archiveteam-tabblo-0 the first range].<br />
<br />
To participate:<br />
<br />
<ol><br />
<li>Get the code from [https://github.com/ArchiveTeam/tabblo-grab]. You need Bash and Curl to run it.</li><br />
<li>Claim one or more ranges (each range includes up to 1,000 Tabblos, so try claiming one or two ranges first).</li><br />
<li>Run the script: <code>./dld-tabblo-zip.sh $RANGE</code>, e.g. <code>./dld-tabblo-zip.sh 12</code> to download and upload Tabblos 12,000 to 12,999.</li><br />
</ol><br />
<br />
To speed things up a range can be divided in 10 parts (of 100 Tabblos each), so you can download several parts at the same time. For example:<br />
<pre><br />
for i in 0 1 2 3 4 5 6 7 8 9 ; do<br />
./dld-tabblo-zip.sh $RANGE $i &<br />
done<br />
</pre><br />
<br />
Once you've run the script once, rerun it to check if everything was down- and uploaded successfully.<br />
<br />
{| class="wikitable"<br />
|-<br />
! Ranges<br />
! Downloader<br />
! Status<br />
|-<br />
| 0 - 9<br />
| alard<br />
| Downloading<br />
|-<br />
| 10 - 1845<br />
|<br />
| Unclaimed<br />
|-<br />
| 1846 - 1850<br />
| Wait...<br />
| This is the newest range, please download other ranges first<br />
|-<br />
|}<br />
<br />
== Site structure ==<br />
<br />
=== Tabblos ===<br />
<br />
Tabblos have an url of the form <code><nowiki>http://www.tabblo.com/studio/stories/view/#ID#/</nowiki></code>, where <code>#ID#</code> is the numeric id of the tabblo. Tabblos are numbered sequentially, the last number at the time of writing is 1843370.<br />
<br />
A tabblo consists of one HTML page with some text and one or more images. You can click on the images to get a large version, but apart from the larger image that won't give you more than is on the tabblo page. Most tabblos have comments, which are included in the page's HTML.<br />
<br />
Running <code>wget --page-requisites</code> on a tabblo url will probably save all available information.<br />
<br />
From the Tabblo Lifeboat we learn that Tabblo offers a nice way to download a tabblo in a zip file. This zip file will also give you the original photo files. Download url: <code><nowiki>http://www.tabblo.com/studio/stories/zip/#ID#/?orig=1</nowiki></code>. You have to log in before you can download this zip file (but once you're in you can download <i>any</i> tabblo, not just your own).<br />
<br />
There's one other catch: the zip download will fail first. The first time you download it you'll get an incomplete zip file, the next time you try it you'll get a little bit more. Repeat downloading until you get the complete zip file. (Probably has something to do with caching.)<br />
<br />
Conclusion, to download a tabblo we'll probably want to do something like this:<br />
<pre><br />
wget --page-requisites --warc-file tabblo http://www.tabblo.com/studio/stories/view/#ID#/<br />
while ! unzip -t all.zip ; do<br />
wget -O all.zip --header="Cookie: tabblosesh=###" http://www.tabblo.com/studio/stories/zip/#ID#/?orig=1<br />
done<br />
</pre><br />
<br />
=== Users ===<br />
<br />
'''TODO''' The user pages (e.g. http://www.tabblo.com/studio/person/chilla/) have everything you'd expect from a social network: comments, photos, friends, favorites, messages.<br />
<br />
<ul><br />
<li>profile page: <code>http://www.tabblo.com/studio/person/chilla/</code></li><br />
<li>tabblos: <code>http://www.tabblo.com/studio/view/tabblos/mrsfabulous/</code></li><br />
<li>favorites: <code>http://www.tabblo.com/studio/view/favorites/Candlepower</code></li><br />
</ul></div>Hydrizhttps://wiki.archiveteam.org/index.php?title=Wikimedia_Commons&diff=7813Wikimedia Commons2012-05-21T08:41:37Z<p>Hydriz: /* Errors */ adding 2</p>
<hr />
<div>{{Infobox project<br />
| title = Wikimedia Commons<br />
| image = Commons screenshot.png<br />
| description = Wikimedia Commons mainpage on 2010-12-13<br />
| URL = http://commons.wikimedia.org<br />
| project_status = {{online}}<br />
| archiving_status = {{inprogress}}<br />
}}<br />
'''Wikimedia Commons''' is a database of freely usable media files with more than 10 million files (when it held 6.8M files, the size was 6.6TB).<br />
<br />
Current size (based on January 18, 2012 estimate): '''13.3TB''', old versions '''881GB'''<br />
<br />
== Archiving process ==<br />
<br />
=== Tools ===<br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/commonsdownloader.py Download script] (Python)<br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/commonschecker.py Checker script] (Python)<br />
* [http://toolserver.org/~emijrp/commonsarchive/ Feed lists] (from 2004-09-07 to 2006-12-31; more coming soon)<br />
<br />
=== How-to ===<br />
Download the script and the feed lists (unpack it, it is a .csv file) in the same directory. Then run:<br />
* python commonsdownloader.py 2005-01-01 2005-01-10 [to download that 10 days range; it generates zip files by day and a .csv for every day]<br />
<br />
Don't forget 30th days and 31st days on some months. Also, February 29th in some years.<br />
<br />
To verify the download data use the checker script:<br />
* python commonschecker.py 2005-01-01 2005-01-10 [to check that 10 days range; it works on the .zip and .csv files, not the original folders]<br />
<br />
=== Tools required ===<br />
If downloading using a very new server (i.e. a default virtual machine), you got to download <tt>zip</tt> (Ubuntu: <tt>apt-get install zip</tt>)<br />
<br />
Python should be already installed on your server, if not then just install it!<br />
<br />
Also has a dependency on <tt>curl</tt> and <tt>wget</tt>, which should be installed on your server by default...<br />
<br />
=== Volunteers ===<br />
<br />
:'''''Please, wait until we do some tests. Probably, long filenames bug.'''''<br />
<br />
{| class="wikitable"<br />
! Nick !! Start date !! End date !! Images !! Size !! Revision !! Status !! Notes<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2004-09-07 || 2005-06-30 || ? || || r643 || ''Downloading'' ||<br />
|-<br />
| [[User:db48x|db48x]] || 2005-07-01 || 2005-12-31 || ? || ? || || ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-01-01 || 2006-01-10 || 13198 || 4.8GB || r349 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-01-11 || 2006-06-30 || ? || ? || r349 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-07-01 || 2006-12-31 || ? || ? || r643 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Check:<br />July 2006: http://p.defau.lt/?IcMnwkx_j4H09FE_9iVgkQ<br />August 2006: http://p.defau.lt/?EmsKDtM0RXaysFNEABXJCQ<br />September 2006: http://p.defau.lt/?KBZVE9rJ9hdz4DiKnegnUw<br />October 2006: http://p.defau.lt/?f3F85TyqHtdY0LhpQk_m1w<br />November 2006: http://p.defau.lt/?VZwhzt_2doA_Z3c65_JkXg<br />December 2006: http://p.defau.lt/?Ms_TgrcyGDL_0oZQgKCNmw<br />
|-<br />
| ? || ? || ? || ? || ? || ? || ? ||<br />
|-<br />
| ? || ? || ? || ? || ? || ? || ? ||<br />
|-<br />
| ? || ? || ? || ? || ? || ? || ? ||<br />
|}<br />
<br />
=== Errors ===<br />
* oi_archive_name empty fields: http://commons.wikimedia.org/wiki/File:Nl-scheikundig.ogg<br />
* broken file links: http://commons.wikimedia.org/wiki/File:SMS_Bluecher.jpg#filehistory<br />
* [http://code.google.com/p/wikiteam/issues/detail?id=45 Issue 45]: 2006-02-05, 2006-02-11, 2006-02-25, 2006-03-10, 2006-03-23, 2006-04-21, 2006-04-25, 2006-05-01, 2006-07-13, 2006-07-30, 2006-08-02, 2006-08-05, 2006-08-13, 2006-09-12, 2006-10-22, 2006-10-26, 2006-11-23, 2006-12-06, 2006-12-13, 2006-12-17.<br />
<br />
I'm going to file a bug in bugzilla.<br />
<br />
=== Uploading ===<br />
'''UPLOAD''' using the format: wikimediacommons-<year><month><br />
<br />
E.g. wikimediacommons-200601 for January 2006 grab.<br />
<br />
If you can, add it into the WikiTeam collection, or else just tag it with the wikiteam keyword, and it will be added in later on.<br />
<br />
== Other dumps ==<br />
There is no public dump of all images. [[WikiTeam]] is working on a scraper (see section above).<br />
<br />
Pictures of the Year (best ones):<br />
* [http://download.wikimedia.org/other/poty/poty2006.zip 2006] ([http://burnbit.com/torrent/177023/poty2006_zip torrent]) ([http://www.archive.org/details/poty2006 IA])<br />
* [http://download.wikimedia.org/other/poty/poty2007.zip 2007] ([http://burnbit.com/torrent/177024/poty2007_zip torrent]) ([http://www.archive.org/details/poty2007 IA])<br />
* [http://download.wikimedia.org/other/poty/2009 2009] ([http://www.archive.org/details/poty2009 IA])<br />
* [http://download.wikimedia.org/other/poty/2010 2010] ([http://www.archive.org/details/poty2010 IA])<br />
<br />
== Featured images ==<br />
<br />
Wikimedia Commons contains a lot [http://commons.wikimedia.org/wiki/Category:Featured_pictures_on_Wikimedia_Commons images of high quality].<br />
<br />
[[File:Featured pictures on Wikimedia Commons - Wikimedia Commons 1294011879617.png|500px]]<br />
<br />
== Size stats ==<br />
Combined image sizes hosted in Wikimedia Commons sorted by month.<br />
<pre><br />
date sum(img_size) in bytes<br />
2003-1 1360188<br />
2004-10 637349207<br />
2004-11 726517177<br />
2004-12 1503501023<br />
2004-9 188850959<br />
2005-1 1952816194<br />
2005-10 17185495206<br />
2005-11 9950998969<br />
2005-12 11430418722<br />
2005-2 3118680401<br />
2005-3 3820401370<br />
2005-4 5476827971<br />
2005-5 10998180401<br />
2005-6 7160629133<br />
2005-7 9206024659<br />
2005-8 12591218859<br />
2005-9 14060418086<br />
2006-1 15433548270<br />
2006-10 33574470896<br />
2006-11 34231957288<br />
2006-12 30607951770<br />
2006-2 14952310277<br />
2006-3 19415486302<br />
2006-4 23041609453<br />
2006-5 29487911752<br />
2006-6 29856352192<br />
2006-7 32257412994<br />
2006-8 50940607926<br />
2006-9 37624697336<br />
2007-1 40654722866<br />
2007-10 89872715966<br />
2007-11 81975793043<br />
2007-12 75515001911<br />
2007-2 39452895714<br />
2007-3 53706627561<br />
2007-4 72917771224<br />
2007-5 72944518827<br />
2007-6 63504951958<br />
2007-7 76230887667<br />
2007-8 91290158697<br />
2007-9 100120203171<br />
2008-1 84582810181<br />
2008-10 122360827827<br />
2008-11 116290099578<br />
2008-12 126446332364<br />
2008-2 77416420840<br />
2008-3 89120317630<br />
2008-4 98180062150<br />
2008-5 117840970706<br />
2008-6 100352888576<br />
2008-7 128266650486<br />
2008-8 130452484462<br />
2008-9 120247362867<br />
2009-1 127226957021<br />
2009-10 345591510325<br />
2009-11 197991117397<br />
2009-12 228003186895<br />
2009-2 125819024255<br />
2009-3 273597778760<br />
2009-4 212175602700<br />
2009-5 191651496603<br />
2009-6 195998789357<br />
2009-7 241366758346<br />
2009-8 262927838267<br />
2009-9 184963508476<br />
2010-1 226919138307<br />
2010-2 191615007774<br />
2010-3 216425793739<br />
2010-4 312177184245<br />
2010-5 312240110181<br />
2010-6 283374261868<br />
2010-7 362175217639<br />
2010-8 172072631498<br />
</pre><br />
<br />
== See also ==<br />
* [[Wikipedia]], some Wikipedias have enabled the local upload form, English Wikipedia contains about 800000 images, a lot of under fair use<br />
<br />
== External links ==<br />
* http://commons.wikimedia.org<br />
* [http://dumps.wikimedia.org/other/poty/ Picture of the Year archives]<br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Image hostings]]<br />
[[Category:Wikis]]</div>Hydrizhttps://wiki.archiveteam.org/index.php?title=Wikimedia_Commons&diff=7812Wikimedia Commons2012-05-21T08:29:14Z<p>Hydriz: </p>
<hr />
<div>{{Infobox project<br />
| title = Wikimedia Commons<br />
| image = Commons screenshot.png<br />
| description = Wikimedia Commons mainpage on 2010-12-13<br />
| URL = http://commons.wikimedia.org<br />
| project_status = {{online}}<br />
| archiving_status = {{inprogress}}<br />
}}<br />
'''Wikimedia Commons''' is a database of freely usable media files with more than 10 million files (when it held 6.8M files, the size was 6.6TB).<br />
<br />
Current size (based on January 18, 2012 estimate): '''13.3TB''', old versions '''881GB'''<br />
<br />
== Archiving process ==<br />
<br />
=== Tools ===<br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/commonsdownloader.py Download script] (Python)<br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/commonschecker.py Checker script] (Python)<br />
* [http://toolserver.org/~emijrp/commonsarchive/ Feed lists] (from 2004-09-07 to 2006-12-31; more coming soon)<br />
<br />
=== How-to ===<br />
Download the script and the feed lists (unpack it, it is a .csv file) in the same directory. Then run:<br />
* python commonsdownloader.py 2005-01-01 2005-01-10 [to download that 10 days range; it generates zip files by day and a .csv for every day]<br />
<br />
Don't forget 30th days and 31st days on some months. Also, February 29th in some years.<br />
<br />
To verify the download data use the checker script:<br />
* python commonschecker.py 2005-01-01 2005-01-10 [to check that 10 days range; it works on the .zip and .csv files, not the original folders]<br />
<br />
=== Tools required ===<br />
If downloading using a very new server (i.e. a default virtual machine), you got to download <tt>zip</tt> (Ubuntu: <tt>apt-get install zip</tt>)<br />
<br />
Python should be already installed on your server, if not then just install it!<br />
<br />
Also has a dependency on <tt>curl</tt> and <tt>wget</tt>, which should be installed on your server by default...<br />
<br />
=== Volunteers ===<br />
<br />
:'''''Please, wait until we do some tests. Probably, long filenames bug.'''''<br />
<br />
{| class="wikitable"<br />
! Nick !! Start date !! End date !! Images !! Size !! Revision !! Status !! Notes<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2004-09-07 || 2005-06-30 || ? || || r643 || ''Downloading'' ||<br />
|-<br />
| [[User:db48x|db48x]] || 2005-07-01 || 2005-12-31 || ? || ? || || ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-01-01 || 2006-01-10 || 13198 || 4.8GB || r349 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-01-11 || 2006-06-30 || ? || ? || r349 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-07-01 || 2006-12-31 || ? || ? || r643 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Check:<br />July 2006: http://p.defau.lt/?IcMnwkx_j4H09FE_9iVgkQ<br />August 2006: http://p.defau.lt/?EmsKDtM0RXaysFNEABXJCQ<br />September 2006: http://p.defau.lt/?KBZVE9rJ9hdz4DiKnegnUw<br />October 2006: http://p.defau.lt/?f3F85TyqHtdY0LhpQk_m1w<br />November 2006: http://p.defau.lt/?VZwhzt_2doA_Z3c65_JkXg<br />December 2006: http://p.defau.lt/?Ms_TgrcyGDL_0oZQgKCNmw<br />
|-<br />
| ? || ? || ? || ? || ? || ? || ? ||<br />
|-<br />
| ? || ? || ? || ? || ? || ? || ? ||<br />
|-<br />
| ? || ? || ? || ? || ? || ? || ? ||<br />
|}<br />
<br />
=== Errors ===<br />
* oi_archive_name empty fields: http://commons.wikimedia.org/wiki/File:Nl-scheikundig.ogg<br />
* broken file links: http://commons.wikimedia.org/wiki/File:SMS_Bluecher.jpg#filehistory<br />
* [http://code.google.com/p/wikiteam/issues/detail?id=45 Issue 45]: 2006-02-05, 2006-02-11, 2006-02-25, 2006-03-10, 2006-03-23, 2006-04-21, 2006-04-25, 2006-05-01, 2006-07-13, 2006-07-30, 2006-08-02, 2006-08-05, 2006-08-13, 2006-09-12, 2006-10-22, 2006-10-26, 2006-11-23, 2006-12-06.<br />
<br />
I'm going to file a bug in bugzilla.<br />
<br />
=== Uploading ===<br />
'''UPLOAD''' using the format: wikimediacommons-<year><month><br />
<br />
E.g. wikimediacommons-200601 for January 2006 grab.<br />
<br />
If you can, add it into the WikiTeam collection, or else just tag it with the wikiteam keyword, and it will be added in later on.<br />
<br />
== Other dumps ==<br />
There is no public dump of all images. [[WikiTeam]] is working on a scraper (see section above).<br />
<br />
Pictures of the Year (best ones):<br />
* [http://download.wikimedia.org/other/poty/poty2006.zip 2006] ([http://burnbit.com/torrent/177023/poty2006_zip torrent]) ([http://www.archive.org/details/poty2006 IA])<br />
* [http://download.wikimedia.org/other/poty/poty2007.zip 2007] ([http://burnbit.com/torrent/177024/poty2007_zip torrent]) ([http://www.archive.org/details/poty2007 IA])<br />
* [http://download.wikimedia.org/other/poty/2009 2009] ([http://www.archive.org/details/poty2009 IA])<br />
* [http://download.wikimedia.org/other/poty/2010 2010] ([http://www.archive.org/details/poty2010 IA])<br />
<br />
== Featured images ==<br />
<br />
Wikimedia Commons contains a lot [http://commons.wikimedia.org/wiki/Category:Featured_pictures_on_Wikimedia_Commons images of high quality].<br />
<br />
[[File:Featured pictures on Wikimedia Commons - Wikimedia Commons 1294011879617.png|500px]]<br />
<br />
== Size stats ==<br />
Combined image sizes hosted in Wikimedia Commons sorted by month.<br />
<pre><br />
date sum(img_size) in bytes<br />
2003-1 1360188<br />
2004-10 637349207<br />
2004-11 726517177<br />
2004-12 1503501023<br />
2004-9 188850959<br />
2005-1 1952816194<br />
2005-10 17185495206<br />
2005-11 9950998969<br />
2005-12 11430418722<br />
2005-2 3118680401<br />
2005-3 3820401370<br />
2005-4 5476827971<br />
2005-5 10998180401<br />
2005-6 7160629133<br />
2005-7 9206024659<br />
2005-8 12591218859<br />
2005-9 14060418086<br />
2006-1 15433548270<br />
2006-10 33574470896<br />
2006-11 34231957288<br />
2006-12 30607951770<br />
2006-2 14952310277<br />
2006-3 19415486302<br />
2006-4 23041609453<br />
2006-5 29487911752<br />
2006-6 29856352192<br />
2006-7 32257412994<br />
2006-8 50940607926<br />
2006-9 37624697336<br />
2007-1 40654722866<br />
2007-10 89872715966<br />
2007-11 81975793043<br />
2007-12 75515001911<br />
2007-2 39452895714<br />
2007-3 53706627561<br />
2007-4 72917771224<br />
2007-5 72944518827<br />
2007-6 63504951958<br />
2007-7 76230887667<br />
2007-8 91290158697<br />
2007-9 100120203171<br />
2008-1 84582810181<br />
2008-10 122360827827<br />
2008-11 116290099578<br />
2008-12 126446332364<br />
2008-2 77416420840<br />
2008-3 89120317630<br />
2008-4 98180062150<br />
2008-5 117840970706<br />
2008-6 100352888576<br />
2008-7 128266650486<br />
2008-8 130452484462<br />
2008-9 120247362867<br />
2009-1 127226957021<br />
2009-10 345591510325<br />
2009-11 197991117397<br />
2009-12 228003186895<br />
2009-2 125819024255<br />
2009-3 273597778760<br />
2009-4 212175602700<br />
2009-5 191651496603<br />
2009-6 195998789357<br />
2009-7 241366758346<br />
2009-8 262927838267<br />
2009-9 184963508476<br />
2010-1 226919138307<br />
2010-2 191615007774<br />
2010-3 216425793739<br />
2010-4 312177184245<br />
2010-5 312240110181<br />
2010-6 283374261868<br />
2010-7 362175217639<br />
2010-8 172072631498<br />
</pre><br />
<br />
== See also ==<br />
* [[Wikipedia]], some Wikipedias have enabled the local upload form, English Wikipedia contains about 800000 images, a lot of under fair use<br />
<br />
== External links ==<br />
* http://commons.wikimedia.org<br />
* [http://dumps.wikimedia.org/other/poty/ Picture of the Year archives]<br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Image hostings]]<br />
[[Category:Wikis]]</div>Hydrizhttps://wiki.archiveteam.org/index.php?title=Wikimedia_Commons&diff=7811Wikimedia Commons2012-05-21T08:26:05Z<p>Hydriz: /* Volunteers */ update</p>
<hr />
<div>{{Infobox project<br />
| title = Wikimedia Commons<br />
| image = Commons screenshot.png<br />
| description = Wikimedia Commons mainpage on 2010-12-13<br />
| URL = http://commons.wikimedia.org<br />
| project_status = {{online}}<br />
| archiving_status = {{inprogress}}<br />
}}<br />
'''Wikimedia Commons''' is a database of freely usable media files with more than 10 million files (when it held 6.8M files, the size was 6.6TB).<br />
<br />
Current size (based on January 18, 2012 estimate): '''13.3TB''', old versions '''881GB'''<br />
<br />
== Archiving process ==<br />
<br />
=== Tools ===<br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/commonsdownloader.py Download script] (Python)<br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/commonschecker.py Checker script] (Python)<br />
* [http://toolserver.org/~emijrp/commonsarchive/ Feed lists] (from 2004-09-07 to 2006-12-31; more coming soon)<br />
<br />
=== How-to ===<br />
Download the script and the feed lists (unpack them; each is a .csv file) into the same directory. Then run:<br />
* python commonsdownloader.py 2005-01-01 2005-01-10 [to download that 10-day range; it generates zip files by day and a .csv for every day]<br />
<br />
Don't forget that some months have a 30th and a 31st day, and that some years have a February 29th.<br />
<br />
To verify the downloaded data, use the checker script:<br />
* python commonschecker.py 2005-01-01 2005-01-10 [to check that 10-day range; it works on the .zip and .csv files, not the original folders]<br />
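<br />
For long stretches it can help to drive both scripts over a list of short date ranges. Here is a minimal Python sketch of that loop, using the standard library's <tt>datetime</tt> so that 28/29/30/31-day months and leap years are handled automatically; it assumes both scripts sit in the current directory and take the start and end dates exactly as in the commands above.<br />
<br />
<pre><br />
import subprocess<br />
from datetime import date, timedelta<br />
<br />
def run_range(start, end, chunk_days=10):<br />
    # Walk from start to end (inclusive) in chunks of chunk_days days,<br />
    # calling the downloader and then the checker on each chunk.<br />
    day = start<br />
    while day <= end:<br />
        last = min(day + timedelta(days=chunk_days - 1), end)<br />
        args = [day.isoformat(), last.isoformat()]<br />
        subprocess.check_call(["python", "commonsdownloader.py"] + args)<br />
        subprocess.check_call(["python", "commonschecker.py"] + args)<br />
        day = last + timedelta(days=1)<br />
<br />
# Example: everything uploaded in the first quarter of 2005.<br />
run_range(date(2005, 1, 1), date(2005, 3, 31))<br />
</pre><br />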
<br />
=== Tools required ===<br />
If you are downloading on a freshly provisioned server (e.g. a default virtual machine), you will need to install <tt>zip</tt> (Ubuntu: <tt>apt-get install zip</tt>).<br />
<br />
Python should already be installed on your server; if not, install it.<br />
<br />
The scripts also depend on <tt>curl</tt> and <tt>wget</tt>, which should be installed on your server by default.<br />
<br />
=== Volunteers ===<br />
<br />
:'''''Please wait until we do some tests; there is probably a bug with long filenames.'''''<br />
<br />
{| class="wikitable"<br />
! Nick !! Start date !! End date !! Images !! Size !! Revision !! Status !! Notes<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2004-09-07 || 2005-06-30 || ? || || r643 || ''Downloading'' ||<br />
|-<br />
| [[User:db48x|db48x]] || 2005-07-01 || 2005-12-31 || ? || ? || || ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-01-01 || 2006-01-10 || 13198 || 4.8GB || r349 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-01-11 || 2006-06-30 || ? || ? || r349 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Skipped 2006-02-05, 2006-02-11, 2006-02-25, 2006-03-10, 2006-03-23, 2006-04-21, 2006-04-25, 2006-05-01 (see [http://code.google.com/p/wikiteam/issues/detail?id=45 issue 45])<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-07-01 || 2006-12-31 || ? || ? || r643 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Check:<br />July 2006: http://p.defau.lt/?IcMnwkx_j4H09FE_9iVgkQ<br />August 2006: http://p.defau.lt/?EmsKDtM0RXaysFNEABXJCQ<br />September 2006: http://p.defau.lt/?KBZVE9rJ9hdz4DiKnegnUw<br />October 2006: http://p.defau.lt/?f3F85TyqHtdY0LhpQk_m1w<br />November 2006: http://p.defau.lt/?VZwhzt_2doA_Z3c65_JkXg<br />December 2006: http://p.defau.lt/?Ms_TgrcyGDL_0oZQgKCNmw<br />
|-<br />
| ? || ? || ? || ? || ? || ? || ? ||<br />
|-<br />
| ? || ? || ? || ? || ? || ? || ? ||<br />
|-<br />
| ? || ? || ? || ? || ? || ? || ? ||<br />
|}<br />
<br />
=== Errors ===<br />
* oi_archive_name empty fields: http://commons.wikimedia.org/wiki/File:Nl-scheikundig.ogg<br />
* broken file links: http://commons.wikimedia.org/wiki/File:SMS_Bluecher.jpg#filehistory<br />
<br />
I'm going to file a bug in bugzilla.<br />
<br />
=== Uploading ===<br />
'''UPLOAD''' using the format: wikimediacommons-<year><month><br />
<br />
E.g. wikimediacommons-200601 for January 2006 grab.<br />
<br />
If you can, add it into the WikiTeam collection, or else just tag it with the wikiteam keyword, and it will be added in later on.<br />
<br />
== Other dumps ==<br />
There is no public dump of all images. [[WikiTeam]] is working on a scraper (see section above).<br />
<br />
Pictures of the Year (best ones):<br />
* [http://download.wikimedia.org/other/poty/poty2006.zip 2006] ([http://burnbit.com/torrent/177023/poty2006_zip torrent]) ([http://www.archive.org/details/poty2006 IA])<br />
* [http://download.wikimedia.org/other/poty/poty2007.zip 2007] ([http://burnbit.com/torrent/177024/poty2007_zip torrent]) ([http://www.archive.org/details/poty2007 IA])<br />
* [http://download.wikimedia.org/other/poty/2009 2009] ([http://www.archive.org/details/poty2009 IA])<br />
* [http://download.wikimedia.org/other/poty/2010 2010] ([http://www.archive.org/details/poty2010 IA])<br />
<br />
== Featured images ==<br />
<br />
Wikimedia Commons contains a lot of [http://commons.wikimedia.org/wiki/Category:Featured_pictures_on_Wikimedia_Commons high-quality images].<br />
<br />
[[File:Featured pictures on Wikimedia Commons - Wikimedia Commons 1294011879617.png|500px]]<br />
<br />
== Size stats ==<br />
Combined size of the images hosted on Wikimedia Commons, grouped by month.<br />
<pre><br />
date sum(img_size) in bytes<br />
2003-1 1360188<br />
2004-10 637349207<br />
2004-11 726517177<br />
2004-12 1503501023<br />
2004-9 188850959<br />
2005-1 1952816194<br />
2005-10 17185495206<br />
2005-11 9950998969<br />
2005-12 11430418722<br />
2005-2 3118680401<br />
2005-3 3820401370<br />
2005-4 5476827971<br />
2005-5 10998180401<br />
2005-6 7160629133<br />
2005-7 9206024659<br />
2005-8 12591218859<br />
2005-9 14060418086<br />
2006-1 15433548270<br />
2006-10 33574470896<br />
2006-11 34231957288<br />
2006-12 30607951770<br />
2006-2 14952310277<br />
2006-3 19415486302<br />
2006-4 23041609453<br />
2006-5 29487911752<br />
2006-6 29856352192<br />
2006-7 32257412994<br />
2006-8 50940607926<br />
2006-9 37624697336<br />
2007-1 40654722866<br />
2007-10 89872715966<br />
2007-11 81975793043<br />
2007-12 75515001911<br />
2007-2 39452895714<br />
2007-3 53706627561<br />
2007-4 72917771224<br />
2007-5 72944518827<br />
2007-6 63504951958<br />
2007-7 76230887667<br />
2007-8 91290158697<br />
2007-9 100120203171<br />
2008-1 84582810181<br />
2008-10 122360827827<br />
2008-11 116290099578<br />
2008-12 126446332364<br />
2008-2 77416420840<br />
2008-3 89120317630<br />
2008-4 98180062150<br />
2008-5 117840970706<br />
2008-6 100352888576<br />
2008-7 128266650486<br />
2008-8 130452484462<br />
2008-9 120247362867<br />
2009-1 127226957021<br />
2009-10 345591510325<br />
2009-11 197991117397<br />
2009-12 228003186895<br />
2009-2 125819024255<br />
2009-3 273597778760<br />
2009-4 212175602700<br />
2009-5 191651496603<br />
2009-6 195998789357<br />
2009-7 241366758346<br />
2009-8 262927838267<br />
2009-9 184963508476<br />
2010-1 226919138307<br />
2010-2 191615007774<br />
2010-3 216425793739<br />
2010-4 312177184245<br />
2010-5 312240110181<br />
2010-6 283374261868<br />
2010-7 362175217639<br />
2010-8 172072631498<br />
</pre><br />
<br />
== See also ==<br />
* [[Wikipedia]]: some Wikipedias have enabled the local upload form; the English Wikipedia contains about 800,000 images, a lot of them under fair use<br />
<br />
== External links ==<br />
* http://commons.wikimedia.org<br />
* [http://dumps.wikimedia.org/other/poty/ Picture of the Year archives]<br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Image hostings]]<br />
[[Category:Wikis]]</div>Hydrizhttps://wiki.archiveteam.org/index.php?title=Etherpad&diff=7740Etherpad2012-05-14T09:51:37Z<p>Hydriz: some expansion</p>
<hr />
<div>'''Etherpad''' is a web-based collaborative real-time editor, allowing authors to simultaneously edit a text document, and see all of the participants' edits in real-time, with the ability to display each author's text in their own colour.<br />
<br />
At the end of 2009 Google, wanting to remove a competitor, bought Etherpad to feed its own project, Google Wave (now defunct). Under community pressure, Google open-sourced the codebase behind Etherpad, sparking many new clones of Etherpad.<br />
<br />
== Archives ==<br />
* [http://www.archive.org/details/archiveteam-etherpad-timecapsule Archive Team's Etherpad Time Capsule]<br />
<br />
== External links ==<br />
* http://etherpad.org<br />
<br />
{{expand}}<br />
<br />
{{Navigation box}}</div>Hydrizhttps://wiki.archiveteam.org/index.php?title=Ustream&diff=7739Ustream2012-05-14T09:47:19Z<p>Hydriz: </p>
<hr />
<div>{{Infobox project<br />
| title = USTREAM<br />
| image = Need screenshot.png<br />
| description = USTREAM, You're On. Free LIVE VIDEO Streaming, Online Broadcasts. Create webcasts, live stream videos on the Internet. Live streaming videos, TV shows<br />
| URL = http://www.ustream.tv<br />
| project_status = {{online}}<br />
| archiving_status = {{nosavedyet}}<br />
}}<br />
<br />
USTREAM is a free live video streaming website, providing live streams as opposed to the pre-recorded videos of [[Youtube]].<br />
<br />
http://textt.net/mapi/20101018201937 http://www.webcitation.org/67bbmpzFF<br />
<br />
{{navigation box}}</div>Hydrizhttps://wiki.archiveteam.org/index.php?title=File:Need_screenshot.png&diff=7738File:Need screenshot.png2012-05-14T09:46:15Z<p>Hydriz: Placeholder image for projects that don't have a screenshot :)
Feel free to make a better image! Was created out of random using Paint.</p>
<hr />
<div>Placeholder image for projects that don't have a screenshot :)<br />
<br />
Feel free to make a better image! Was created out of random using Paint.</div>Hydrizhttps://wiki.archiveteam.org/index.php?title=Ustream&diff=7737Ustream2012-05-14T09:40:29Z<p>Hydriz: Adding infobox</p>
<hr />
<div>{{Infobox project<br />
| title = USTREAM<br />
| image = <br />
| description = USTREAM, You're On. Free LIVE VIDEO Streaming, Online Broadcasts. Create webcasts, live stream videos on the Internet. Live streaming videos, TV shows<br />
| URL = http://www.ustream.tv<br />
| project_status = {{online}}<br />
| archiving_status = {{nosavedyet}}<br />
}}<br />
<br />
USTREAM is a free live video streaming website, providing live streams as opposed to the pre-recorded videos of [[Youtube]].<br />
<br />
http://textt.net/mapi/20101018201937 http://www.webcitation.org/67bbmpzFF<br />
<br />
{{navigation box}}</div>Hydrizhttps://wiki.archiveteam.org/index.php?title=MobileMe&diff=7736MobileMe2012-05-14T09:34:50Z<p>Hydriz: Adding navigation box</p>
<hr />
<div>{{Infobox project<br />
| title = MobileMe<br />
| image = Screenshot-MobileMe_Sign_In_-_Google_Chrome.png<br />
| URL = {{url|1=https://me.com/}}<br />
| project_status = {{closing}} on June 30, 2012<br />
| archiving_status = {{inprogress}}<br />
| tracker = http://memac.heroku.com/<br />
}}<br />
<br />
Apple's MobileMe will close on June 30, 2012.<br />
<br />
From the [http://en.wikipedia.org/wiki/MobileMe Wikipedia page]:<br />
<br />
<blockquote><br />
<p><br />
MobileMe (formerly .Mac and iTools) is a subscription-based collection of online services and software offered by Apple Inc. Originally launched on January 5, 2000, as iTools, a free collection of Internet-based services for users of Mac OS 9, Apple relaunched it as .Mac on July 17, 2002, when it became a paid subscription service primarily designed for users of Mac OS X. Apple relaunched the service again as MobileMe at WWDC 2008 on July 9, 2008, now targeting Mac OS X, Windows, iPad, iPhone, and iPod Touch users.<br />
</p><br />
<p><br />
On February 24, 2011, Apple discontinued offering MobileMe through its retail stores. The MobileMe retail boxes are also not offered through resellers anymore. Apple is also no longer accepting new subscribers for MobileMe. At the WWDC 2011, on June 6, Apple announced it will launch iCloud in the Northern Hemisphere Autumn 2011, which will replace MobileMe for new users. MobileMe itself will continue to function until June 30, 2012, at which point the service will no longer be available, although users are encouraged to migrate to iCloud before that date.<br />
</p><br />
</blockquote><br />
<br />
[http://www.apple.com/mobileme/transition.html Apple.com/MobileMe shutdown notice] ([http://www.webcitation.org/626XlUEck webcite mirror])<br />
<br />
[http://support.apple.com/kb/HT4597 Apple Support - Frequently asked questions about the MobileMe transition and iCloud] ([http://www.webcitation.org/626XlUEck webcite mirror])<br />
<br />
<br />
== How to help archiving ==<br />
<br />
There is a distributed download script that gets usernames from a tracker and downloads the data.<br />
<br />
Make sure you are on Linux and that you have curl, git, and a recent version of Bash. Your system must also be able to compile wget.<br />
<br />
<ul><br />
<li>Get the code: <pre>git clone git://github.com/ArchiveTeam/mobileme-grab.git</pre></li><br />
<li>Get and compile the latest version of wget-warc: <pre>./get-wget-warc.sh</pre></li><br />
<li>Think of a nickname for yourself (preferably use your IRC name).</li><br />
<li>Run the download script with <pre>./dld-client.sh "<YOURNICK>"</pre></li><br />
<li>To stop the script gracefully, run <pre>touch STOP</pre> in the script's working directory. It will finish the current task and stop.</li><br />
</ul><br />
<br />
=== Notes ===<br />
<ul><br />
<li>Compiling wget-warc will require dev packages for the various libraries that it needs. Most questions have been about gnutls; install the gnutls-devel or gnutls-dev package with your favorite package manager.</li><br />
<li>Downloading one user's data can take between 10 seconds and a few hours.</li><br />
<li>The data for one user is equally varied, from a few kB to several GB.</li><br />
<li>The downloaded data will be saved in the <code>./data/</code> subdirectory.</li><br />
<li>Download speeds from me.com are not that high. You can run multiple clients to speed things up.</li><br />
</ul><br />
<br />
=== Errors ===<br />
* If you keep getting errors such as ERROR (3) when running <tt>./dld-client.sh</tt>, just forget about that username for now and rerun the script; someone will retry those users closer to the closing date. There should be more information about this here, but there isn't any, and this is the best workaround I know of.<br />
* ERROR (3) is a "File I/O error" from wget. I've gotten one so far. Couldn't tell you what causes it though.<br />
<pre> - Running wget --mirror (at least 8721 files)... ERROR (3).<br />
Error downloading from web.me.com.<br />
Error downloading 'ikkeisasaki'.<br />
</pre><br />
* The funny thing is that the log seems to imply wget completed just fine:<br />
<pre><br />
[archive@centosdevbox web.me.com]$ pwd<br />
/home/archive/mobileme-grab/data/i/ik/ikk/ikkeisasaki/web.me.com<br />
[archive@centosdevbox web.me.com]$ tail -n 3 wget.log<br />
FINISHED --2012-03-23 20:00:31--<br />
Total wall clock time: 27m 34s<br />
Downloaded: 7748 files, 239M in 6m 43s (607 KB/s)<br />
</pre><br />
* However, the number of files downloaded doesn't line up with the "at least 8721 files" estimate.--[[User:Aggroskater|Aggroskater]] 20:28, 24 March 2012 (EDT)<br />
<br />
== Uploading your data ==<br />
<br />
To upload the data you've downloaded, run the <code>./upload-finished.sh</code> script. For example, run this in your script directory: <code>./upload-finished.sh YOURNICK</code><br />
<br />
Once a user is successfully uploaded, it is moved to the <code>data/uploaded/</code> subdirectory. If you need to clear disk space you can remove things that are in that directory.<br />
<br />
It's generally safe to run the upload script while your download scripts are running; it will only upload users that are finished.<br />
<br />
=== Archive status ===<br />
There is a status board available [http://memac.heroku.com/ here].<br />
<br />
You can see the upload progress on [http://www.archive.org/search.php?query=ArchiveTeam%20MobileMe archive.org]. <br />
<br />
== Seesaw: a combined download/upload script ==<br />
Instead of <code>dld-client.sh</code>, which only downloads and requires you to upload later, you can run the seesaw script. It downloads one user, uploads it to the repository, and removes it from your computer before downloading the next user.<br />
<br />
<pre>./seesaw.sh "<YOURNICK>"</pre><br />
<br />
== Archive directly to archive.org ==<br />
*To reduce overhead, another script has been developed to package users and upload them to archive.org directly via [http://www.archive.org/help/abouts3.txt s3 interface]. Users are put in archives of at least 10 GiB (and max 10 GiB + size of the last downloaded user), collected in items of 40 archives each.<br />
*We've solved some technical issues about upload rate and are currently looking for help to scale up.<br />
*If you have at least 1.5 MiB/s of upload speed capacity (i.e. at least 1.5 MiB/s full duplex), this is the solution for you. A single instance of the <code>seesaw-s3</code> script is able to use all such bandwidth, because MobileMe and IA (only with this script) are quick enough.<br />
**Don't run more instances than your bandwidth supports, or the IA servers will suffer from the excessive number of connections. If you have 3 MiB/s, use two instances, and so on.<br />
**You'll also need at least 20-30 GiB of disk space for each instance for minimum security (it could be much more if you bump into a very big user whose size is added to the 10 GiB limit).<br />
<br />
Ask alard on our IRC channel if you want a copy of the script and start archiving faster than ever!<br />
<br />
== Site structure ==<br />
<br />
(Copied from Wikipedia) There are public subdomain access points to each MobileMe member's individual account functions. These provide direct public web access to each MobileMe user's account, via direct links to each function: Gallery, Public folder, published website, and published calendars (not currently available). See list: <br />
<br />
*''<nowiki>http://www.me.com</nowiki>'' – member login.<br />
*''<nowiki>http://gallery.me.com/</nowiki>'''<username>''''' – member public photo/video Gallery.<br />
*''<nowiki>http://public.me.com/</nowiki>'''<username>''''' – member Public folder access.<br />
*''<nowiki>http://web.me.com/</nowiki>'''<username>''''' – member Website access.<br />
*''<nowiki>http://ical.me.com/</nowiki>'''<username>'''/'''<calendar name>''''' – member individual calendar publishing. In the older system, many calendars could be published at the same time. In the current iteration of MobileMe, there is no calendar publishing available.<br />
<br />
=== web.me.com and web.mac.com ===<br />
<br />
The domains [http://www.google.com/search?q=site:web.me.com web.me.com] and [http://www.google.com/search?q=site:web.mac.com web.mac.com] point to the same web pages.<br />
<br />
== Interesting (large) examples ==<br />
<br />
<ul><br />
<li>web.me.com: rightangles</li><br />
<li>homepage.mac.com: russconte</li><br />
<li>gallery.me.com: aaaashy</li><br />
<li>public.me.com: morkjturner</li><br />
</ul><br />
<br />
== Tools/Archiving ==<br />
<br />
There's a repository on the ArchiveTeam Github: https://github.com/ArchiveTeam/mobileme-grab<br />
<br />
The combined tool for downloading all content for one user is <code>dld-user.sh</code>. It needs a [[Wget_with_WARC_output|WARC-enabled wget]] to run.<br />
<br />
=== homepage.mac.com ===<br />
<br />
This is a separate site from web.me.com (older, probably). Almost all of the sites on this domain can be downloaded with <code>wget --mirror</code>.<br />
<br />
A [https://raw.github.com/ArchiveTeam/mobileme-grab/master/dld-homepage-mac-com.sh script] is available in the [https://github.com/ArchiveTeam/mobileme-grab git repository]. You need a [[Wget_with_WARC_output|WARC-enabled wget]] to run this.<br />
<br />
=== web.me.com ===<br />
<br />
web.me.com will give you a list of the files in a user's directory. We can use this list of urls to download the complete site, no <code>wget --mirror</code> necessary.<br />
<br />
A [https://raw.github.com/ArchiveTeam/mobileme-grab/master/dld-web-me-com.sh script] is available in the [https://github.com/ArchiveTeam/mobileme-grab git repository]. You need a [[Wget_with_WARC_output|WARC-enabled wget]] to run this.<br />
<br />
Download procedure:<br />
<br />
<ul><br />
<li>Download <code><nowiki>http://web.me.com/</nowiki>'''<username>'''/?webdav-method=truthget&depth=infinity</code></li><br />
<li>Parse the WebDAV response to find the url of each file and download them.</li><br />
</ul><br />
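<br />
As a rough illustration, a Python 3 sketch of the same procedure (the <tt>?webdav-method=truthget&depth=infinity</tt> URL comes from the steps above; the assumption that the response is a WebDAV-style listing with one href element per file is mine, and "exampleuser" is a placeholder; the script actually used for the grab is the bash one linked above):<br />
<br />
<pre><br />
import urllib.request<br />
import xml.etree.ElementTree as ET<br />
<br />
def list_web_me_files(username):<br />
    # Fetch the recursive file listing for one user's site.<br />
    url = "http://web.me.com/%s/?webdav-method=truthget&depth=infinity" % username<br />
    body = urllib.request.urlopen(url).read()<br />
    # Keep the text of every href element, whatever XML namespace it sits in.<br />
    return [el.text for el in ET.fromstring(body).iter()<br />
            if el.tag.endswith("href")]<br />
<br />
for href in list_web_me_files("exampleuser"):<br />
    print(href)  # feed these URLs to wget to mirror the site<br />
</pre><br />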
<br />
=== public.me.com ===<br />
<br />
The files on public.me.com are accessible via WebDAV: ''<nowiki>https://public.me.com/ix/</nowiki>'''<username>'''''<br />
<br />
A [https://raw.github.com/ArchiveTeam/mobileme-grab/master/dld-public-me-com.sh script] is available in the [https://github.com/ArchiveTeam/mobileme-grab git repository].<br />
<br />
Download procedure:<br />
<br />
<ul><br />
<li>Send a <code>PROPFIND</code> request with <code>Depth: infinity</code> to <code><nowiki>https://public.me.com/ix/</nowiki>'''<username>'''</code>. This will return the complete, recursive file list.</li><br />
<li>Parse the WebDAV response to find the href of each file and download them.</li><br />
</ul><br />
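<br />
The same idea as a rough Python 3 sketch, this time showing how to issue the non-standard PROPFIND method (the URL and the Depth header come from the steps above; parsing the multistatus response for DAV: href elements is an assumption, "exampleuser" is a placeholder, and the script actually used is the bash one linked above):<br />
<br />
<pre><br />
import urllib.request<br />
import xml.etree.ElementTree as ET<br />
<br />
def list_public_me_files(username):<br />
    # PROPFIND with Depth: infinity returns the complete recursive file list.<br />
    req = urllib.request.Request(<br />
        "https://public.me.com/ix/%s" % username,<br />
        method="PROPFIND",<br />
        headers={"Depth": "infinity"},<br />
    )<br />
    body = urllib.request.urlopen(req).read()<br />
    # Each file shows up as a {DAV:}href element in the multistatus reply.<br />
    return [el.text for el in ET.fromstring(body).iter("{DAV:}href")]<br />
<br />
print(list_public_me_files("exampleuser"))<br />
</pre><br />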
<br />
=== gallery.me.com ===<br />
<br />
If you ask nicely, gallery.me.com will give you a zip file of the entire gallery contents.<br />
<br />
A [https://raw.github.com/ArchiveTeam/mobileme-grab/master/dld-gallery-me-com.py script] is available in the [https://github.com/ArchiveTeam/mobileme-grab git repository].<br />
<br />
Download procedure:<br />
<br />
<ul><br />
<li>Send a <code>GET</code> to <code><nowiki>http://gallery.me.com/</nowiki>'''<username>'''?webdav-method=truthget&feedfmt=json&depth=Infinity</code>. This will give you a JSON file that contains details about all albums and all photos/videos in the gallery. ([http://jsonviewer.stack.hu/#http://gallery.me.com/airbrushron?webdav-method=truthget&feedfmt=json&depth=Infinity Example])</li><br />
<li>Search through this file for the properties <code>largeImageUrl</code> and <code>videoUrl</code>, which contain the urls for the largest versions of images and videos that are available.</li><br />
<li><br />
Use these files to construct a ziplist description:<br />
<br />
<pre><br />
<?xml version="1.0" encoding="utf-8" ?><br />
<ziplist xmlns="http://user.mac.com/properties/"><br />
<entry><br />
<name><!-- the target path of the file in the zip file --></name><br />
<href><!-- the url of the image (the largeImageUrl or videoUrl) --></href><br />
</entry><br />
...<br />
</ziplist><br />
</pre><br />
</li><br />
<li>Send this document with a <code>POST</code> request with a <code>Content-Type: text/xml; charset="utf-8"</code> header to <code><nowiki>http://gallery.me.com/</nowiki>'''<username>'''?webdav-method=ZIPLIST</code></li><br />
<li>The server will now generate a zip file for you, containing the files specified in the ziplist document. This may take a short while, but eventually the request will give you a response with a <code>X-Zip-Token</code> header.</li><br />
<li>Use the zip token to download the zip file: <code><nowiki>http://gallery.me.com/</nowiki>'''<username>'''?webdav-method=ZIPGET&token='''<ziptoken>'''</code>.</li><br />
</ul><br />
<br />
Note: I found that with very large galleries the ZIP request fails. Therefore, it's better to make one zip file per album. The Python script does that.<br />
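<br />
Condensed into a rough Python 3 sketch (the webdav-method parameters, the ziplist namespace and the X-Zip-Token header are taken from the steps above; the placeholder username, the entries argument and the single synchronous POST are illustrative assumptions; the real per-album handling lives in the Python script linked above):<br />
<br />
<pre><br />
import urllib.request<br />
<br />
def zip_album(username, entries):<br />
    # entries: list of (name_in_zip, largeImageUrl_or_videoUrl) pairs for one album.<br />
    body = ['<?xml version="1.0" encoding="utf-8" ?>',<br />
            '<ziplist xmlns="http://user.mac.com/properties/">']<br />
    for name, href in entries:<br />
        body.append("  <entry><name>%s</name><href>%s</href></entry>" % (name, href))<br />
    body.append("</ziplist>")<br />
    req = urllib.request.Request(<br />
        "http://gallery.me.com/%s?webdav-method=ZIPLIST" % username,<br />
        data="\n".join(body).encode("utf-8"),<br />
        headers={"Content-Type": 'text/xml; charset="utf-8"'},<br />
    )<br />
    token = urllib.request.urlopen(req).headers["X-Zip-Token"]  # may take a while<br />
    zip_url = "http://gallery.me.com/%s?webdav-method=ZIPGET&token=%s" % (username, token)<br />
    return urllib.request.urlopen(zip_url).read()  # the zip file as bytes<br />
</pre><br />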
<br />
=== ical.me.com ===<br />
<br />
To download a calendar you need the username and the name of the calendar. (There seems to be no way to list all calendars of a specific user.) Once you have these two names, you can download the ics file using one of these urls:<br />
<br />
* ''<nowiki>http://ical.mac.com/WebObjects/iCal.woa/wa/Download/</nowiki>'''<calendarname>'''.ics?u='''<username>'''&n='''<calendarname>'''.ics''<br />
* ''<nowiki>http://homepage.mac.com/</nowiki>'''<username>'''/.calendars/'''<calendarname>'''.ics''<br />
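<br />
A tiny Python sketch that builds both candidate URLs for a known username and calendar name and fetches whichever one answers (the two URL patterns are the ones above; the example names are placeholders):<br />
<br />
<pre><br />
import urllib.request<br />
<br />
def fetch_calendar(username, calendar):<br />
    urls = [<br />
        ("http://ical.mac.com/WebObjects/iCal.woa/wa/Download/"<br />
         "%s.ics?u=%s&n=%s.ics") % (calendar, username, calendar),<br />
        "http://homepage.mac.com/%s/.calendars/%s.ics" % (username, calendar),<br />
    ]<br />
    for url in urls:<br />
        try:<br />
            return urllib.request.urlopen(url).read()  # the .ics file as bytes<br />
        except Exception:<br />
            continue  # try the other URL form<br />
    return None<br />
<br />
ics = fetch_calendar("exampleuser", "examplecalendar")<br />
</pre><br />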
<br />
=== iDisk ===<br />
<br />
Some of the sites on homepage.mac.com have a section called 'iDisk Public Folder'. You can see the list of files, but can't actually download them. Our current hypothesis is that the files listed in the iDisk Public Folder are also available through public.me.com, so downloading those would be sufficient to get all of the public iDisk content (compare https://public.me.com/ardeshir and http://homepage.mac.com/ardeshir/FileSharing8.html).<br />
<br />
{{navigation box}}</div>Hydrizhttps://wiki.archiveteam.org/index.php?title=Fileplanet&diff=7735Fileplanet2012-05-14T09:34:37Z<p>Hydriz: Adding navigation box</p>
<hr />
<div>{{Infobox project<br />
| title = FilePlanet<br />
| logo = Fileplanet_logo.jpg<br />
| description = Website host of game content, 1996-2012<br />
| URL = http://www.fileplanet.com<br />
| image = Fileplanet_snap.png<br />
| project_status = {{closing}}<br />
| archiving_status = {{inprogress}}<br />
| irc = fireplanet<br />
}}<br />
<br />
[http://www.fileplanet.com FilePlanet] is no longer hosting new content, and "is in the process of being archived [by IGN]."<br />
<br />
FilePlanet hosts 87,190 download pages of game-related material (demos, patches, mods, promo stuff, etc.), which need to be archived. These tend to be larger files, ranging from 10 MB patches to 3 GB clients. We'll want all the hands we can get for this one, since it gets harder the farther the archiving goes (files are numbered chronologically, and Skyrim mods are bigger than Doom ones).<br />
<br />
===What We Need===<br />
<br />
* Files! (approx. 25% done 5/10/12)<br />
* /fileinfo/ pages - get URLs from sitemaps (Schbirid is downloading these)<br />
* [http://blog.fileplanet.com http://blog.fileplanet.com]<br />
* A list of all "site:www.fileplanet.com inurl:hosteddl" URLs since these files seem not to be in the simple ID range<br />
* Where do links like http://dl.fileplanet.com/dl/dl.asp?classicgaming/o2home/rtl.zip come from and can we rescue those too?<br />
<br />
===How to help===<br />
<br />
* Have bash, wget, grep, rev, cut<br />
* >100 gigabytes of space, just to be safe<br />
* Put https://raw.github.com/SpiritQuaddicted/fileplanet-file-download/master/download_pages_and_files_from_fileplanet.sh somewhere (I'd suggest ~/somepath/fileplanetdownload/ ) and "chmod +x" it<br />
* Pick a free increment (e.g. 110000-114999) and tell people about it (#fireplanet on EFnet, or post it here). Be careful: in lower ranges a 5k range might work, but they get HUGE later. In the 220k range, and probably lower too, we had better use 100 IDs per chunk (see the chunk-splitting sketch after this list).<br />
** Keep the chunk sizes small; under 30 GB would be nice. The smaller, the better.<br />
* Run the script with your start and end IDs as arguments. Eg "<code>./download_pages_and_files_from_fileplanet.sh 110000 114999</code>"<br />
* Take a walk for half a day.<br />
* You can <code>tail</code> the .log files if you are curious. See right below.<br />
* Once you are done with your chunk, you will have a directory named after your range, eg 110000-114999/ . Inside that pages_xx000-xx999.log and files_xx000-xx999.log plus the www.fileplanet.com/ directory.<br />
* Done! GOTO 10<br />
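<br />
Before claiming a big range it helps to split it into chunks of the size appropriate for that part of the ID space (see the Notes and Status sections below). A small Python sketch that only prints the commands to run, one per chunk; the chunk size is whatever was agreed on, and the script name is the one above:<br />
<br />
<pre><br />
def chunks(start_id, end_id, size):<br />
    # Yield (first, last) ID pairs covering start_id..end_id inclusive.<br />
    first = start_id<br />
    while first <= end_id:<br />
        last = min(first + size - 1, end_id)<br />
        yield first, last<br />
        first = last + 1<br />
<br />
# Example: claim 130000-130999 in 100-ID chunks.<br />
for first, last in chunks(130000, 130999, 100):<br />
    print("./download_pages_and_files_from_fileplanet.sh %d %d" % (first, last))<br />
</pre><br />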
<br />
In the end we'll upload all the parts to archive.org. If you have an account, you can use eg s3cmd. <br />
<br />
<code>s3cmd --add-header x-archive-auto-make-bucket:1 --add-header "x-archive-meta-description:Files from Fileplanet (www.fileplanet.com), all files from the ID range 110000 to 114999." put 110000-114999.tar s3://FileplanetFiles_110000-114999</code><br />
<br />
<code>s3cmd put 110000-114999/*.log s3://FileplanetFiles_110000-114999/</code><br />
<br />
Mind the trailing slash.<br />
<br />
===Notes===<br />
* For planning a good range to download, check http://www.quaddicted.com/stuff/temp/file_IDs_from_sitemaps.txt but be aware that apparently that does not cover all IDs we can get by simply incrementing by 1. Schbirid downloaded eg the file 75059 which is not listed in the sitemaps. So you can not trust that ID list.<br />
* The range 175000-177761 (weird end number since that's when the server ran out of space...) had ~1100 files and 69G. We will need to use 1k ID increments for those ranges.<br />
* Schbirid mailed to FPOps@IGN.com on the 3rd of May, no reply.<br />
<br />
===Status===<br />
{| class="wikitable"<br />
|-<br />
! Range<br />
! Status<br />
! Number of files<br />
| Size in gigabytes<br />
| Downloader<br />
|-<br />
| 00000-09999<br />
| Done, [http://archive.org/details/FileplanetFiles_00000-09999 archived]<br />
| 1991<br />
| 1G<br />
| Schbirid<br />
|-<br />
| 10000-19999<br />
| Done, [http://archive.org/details/FileplanetFiles_10000-19999 archived]<br />
| 3159<br />
| 9G<br />
| Schbirid<br />
|-<br />
| 20000-29999<br />
| Done, [http://archive.org/details/FileplanetFiles_20000-29999 archived]<br />
| 6453<br />
| 7G<br />
| Schbirid<br />
|-<br />
| 30000-39999<br />
| Done, [http://archive.org/details/FileplanetFiles_30000-39999 archived]<br />
| 4085<br />
| 9G<br />
| Schbirid<br />
|-<br />
| 40000-49999<br />
| Done, [http://archive.org/details/FileplanetFiles_40000-49999 archived]<br />
| 5704<br />
| 18G<br />
| Schbirid<br />
|-<br />
| 50000-54999<br />
| Done, locally<br />
| 2706<br />
| 24G<br />
| Schbirid<br />
|-<br />
| 55000-59999<br />
| Done, [http://archive.org/details/FileplanetFiles_50000-559999 archived] (bad URL)<br />
| 2390<br />
| 24G<br />
| Schbirid<br />
|-<br />
| 60000-64999<br />
| Done, [http://archive.org/details/FileplanetFiles_60000-64999 archived]<br />
| 2349<br />
| 24G<br />
| Schbirid<br />
|-<br />
| 65000-69999<br />
| Done, [http://archive.org/details/FileplanetFiles_65000-69999 archived]<br />
| 305<br />
| 4G<br />
| Schbirid<br />
|-<br />
| 70000-79999<br />
| Done, [http://archive.org/details/FileplanetFiles_70000-79999 archived]<br />
| 59<br />
| 0.2G<br />
| Schbirid<br />
|-<br />
| 80000-84999<br />
| Done, locally<br />
| <br />
| 31G<br />
| Debianer<br />
|-<br />
| 85000-89999<br />
| In progress<br />
| <br />
| <br />
| SmileyG<br />
|-<br />
| 90000-109999<br />
| Done, empty<br />
| 0<br />
| 0<br />
| Schbirid<br />
|-<br />
| 110000-114999<br />
| Done, [http://archive.org/details/FileplanetFiles_110000-114999 archived]<br />
| 2139<br />
| 35G<br />
| Schbirid<br />
|-<br />
| 115000-119999<br />
| In progress<br />
| <br />
| <br />
| codebear<br />
|-<br />
| 120000-124999<br />
| In progress<br />
| <br />
| <br />
| codebear<br />
|-<br />
| 125000-129999<br />
| In progress<br />
| <br />
| <br />
| S[h]O[r]T<br />
|-<br />
| 130000-219999<br />
| open<br />
| better use ranges of 1000 here. later 100<br />
| <br />
| <br />
|-<br />
| 220000-220499<br />
| Done, locally<br />
| 250<br />
| 35G<br />
| Schbirid<br />
|-<br />
| 220500+<br />
| <br />
| <br />
| <br />
| <br />
|-<br />
|}<br />
<br />
===Graphs===<br />
[[File:Fileplanet number of IDs from the sitemaps per 1k range.png]]<br />
<br />
{{navigation box}}</div>Hydrizhttps://wiki.archiveteam.org/index.php?title=File:TwitLonger.png&diff=7734File:TwitLonger.png2012-05-14T09:33:52Z<p>Hydriz: fix link</p>
<hr />
<div>Screenshot of http://twitlonger.com. See project page: [[TwitLonger]].</div>Hydrizhttps://wiki.archiveteam.org/index.php?title=TwitLonger&diff=7733TwitLonger2012-05-14T09:33:32Z<p>Hydriz: Adding image</p>
<hr />
<div>{{Infobox project<br />
| title = TwitLonger<br />
| image = TwitLonger.png<br />
| description = <br />
| URL = http://www.twitlonger.com<br />
| project_status = {{online}}<br />
| archiving_status = {{nosavedyet}}<br />
}}<br />
<br />
'''TwitLonger''' is a service to write messages longer than 140 characters.<br />
<br />
== Vital signs ==<br />
<br />
Stable<br />
<br />
== External links ==<br />
<br />
* http://www.twitlonger.com<br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Microblogging services]]</div>Hydrizhttps://wiki.archiveteam.org/index.php?title=File:TwitLonger.png&diff=7732File:TwitLonger.png2012-05-14T09:32:15Z<p>Hydriz: Screenshot of http://twitlonger.com. See project page: Twitlonger</p>
<hr />
<div>Screenshot of http://twitlonger.com. See project page: [[Twitlonger]]</div>Hydrizhttps://wiki.archiveteam.org/index.php?title=AnyHub&diff=7731AnyHub2012-05-14T09:18:15Z<p>Hydriz: Project is offline</p>
<hr />
<div>{{Infobox project<br />
| title = AnyHub<br />
| image = Anyhub.net_2011-11-15_7-30-3.png<br />
| description = File hosting website<br />
| URL = http://www.anyhub.net/ http://www.archive.org/details/archiveteam-anyhub<br />
| project_status = {{offline}}<br />
| archiving_status = {{saved}}<br />
| irc = AnyHubTeam <br />
}}<br />
'''AnyHub''' is a file hosting service that was about to close and that we have archived. Its content is being uploaded to IA: http://www.archive.org/details/archiveteam-anyhub<br />
<br />
== What is AnyHub.net? == <br />
AnyHub is a fast, free and simple file host that anyone can use. Signup not required, and upload files of up to 10 GiB at a time.<br><br />
Files uploaded will generally be kept forever, unless they are in violation of our Terms of Service.<br><br />
AnyHub is developed and run by [http://charliesomerville.com/ Charlie Somerville], a student from Melbourne, Australia. The awesome design was created by [http://z-dev.org/ Matt Anderson], a talented graphics designer from Ohio.<br><br />
[[File:Anyhub.netFAQ_2011-11-15_7-30-44.png|thumb|The original FAQ]]<br />
<br />
== AnyHub's death ==<br />
The official banner said ''AnyHub will be shutting down as of '''Friday, 18th of November'''. Please download any important data immediately, as it will be unavailable past that date.''<br><br />
<br />
On an unknown date, AnyHub seems to be back online again (as checked on December 14, 2011). All its data seems to remain intact, though there was some time it was unavailable around its closing date.<br />
<br />
== Tools we used ==<br />
The filenames it assigns seem to be ascending, so we can define a range and start downloading!<br />
Github page: https://github.com/ArchiveTeam/anyhub-grab<br><br />
To download all tools: "'''git clone git://github.com/ArchiveTeam/anyhub-grab.git ; cd anyhub-grab ; ./get-wget-warc.sh'''"<br><br />
<br />
=== How does the tool work? ===<br />
The dld-client is one of the easier download-tools.<br><br />
Just start a terminal/screen with "'''./dld-client.sh ''{your_nickname}'''''" (nickname needs to be A-Z, a-z, 0-9, - and _)<br><br />
The download stats/dashboard is here: http://anyhub.heroku.com/<br />
<br />
=== Uploading your data ===<br />
<br />
To upload the data you've downloaded, first contact SketchCow on IRC for an rsync slot. Once you have that you can run the <code>./upload-finished.sh</code> script to upload your data. For example, run this in your script directory: <code>./upload-finished.sh batcave.textfiles.com::YOURNICK/anyhub/</code><br />
<br />
== Info/stats about AnyHub ==<br />
They had great stats at http://www.anyhub.net/stats<br><br />
The json data: http://www.anyhub.net/stats/recent<br><br />
As of 18 November, 2011: '''1122585''' files ('''2.81''' TiB)<br />
<br />
== IRC/Chat ==<br />
See here: [[IRC]]</div>Hydrizhttps://wiki.archiveteam.org/index.php?title=Wikimedia_Commons&diff=7730Wikimedia Commons2012-05-14T02:27:33Z<p>Hydriz: /* Volunteers */ Adding new links</p>
<hr />
<div>{{Infobox project<br />
| title = Wikimedia Commons<br />
| image = Commons screenshot.png<br />
| description = Wikimedia Commons mainpage on 2010-12-13<br />
| URL = http://commons.wikimedia.org<br />
| project_status = {{online}}<br />
| archiving_status = {{inprogress}}<br />
}}<br />
'''Wikimedia Commons''' is a database of freely usable media files, now holding more than 10 million of them (when it held 6.8M files, the total size was 6.6TB).<br />
<br />
Current size (based on January 18, 2012 estimate): '''13.3TB''', old versions '''881GB'''<br />
<br />
== Archiving process ==<br />
<br />
=== Tools ===<br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/commonsdownloader.py Download script] (Python)<br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/commonschecker.py Checker script] (Python)<br />
* [http://toolserver.org/~emijrp/commonsarchive/ Feed lists] (from 2004-09-07 to 2006-12-31; more coming soon)<br />
<br />
=== How-to ===<br />
Download the script and the feed lists (unpack them; each is a .csv file) into the same directory. Then run:<br />
* python commonsdownloader.py 2005-01-01 2005-01-10 [to download that 10-day range; it generates zip files by day and a .csv for every day]<br />
<br />
Don't forget that some months have a 30th and a 31st day, and that some years have a February 29th.<br />
<br />
To verify the downloaded data, use the checker script:<br />
* python commonschecker.py 2005-01-01 2005-01-10 [to check that 10-day range; it works on the .zip and .csv files, not the original folders]<br />
<br />
=== Tools required ===<br />
If you are downloading on a freshly provisioned server (e.g. a default virtual machine), you will need to install <tt>zip</tt> (Ubuntu: <tt>apt-get install zip</tt>).<br />
<br />
Python should already be installed on your server; if not, install it.<br />
<br />
The scripts also depend on <tt>curl</tt> and <tt>wget</tt>, which should be installed on your server by default.<br />
<br />
=== Volunteers ===<br />
<br />
:'''''Please wait until we do some tests; there is probably a bug with long filenames.'''''<br />
<br />
{| class="wikitable"<br />
! Nick !! Start date !! End date !! Images !! Size !! Revision !! Status !! Notes<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2004-09-07 || 2005-06-30 || ? || || r643 || ''Downloading'' ||<br />
|-<br />
| [[User:db48x|db48x]] || 2005-07-01 || 2005-12-31 || ? || ? || || ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-01-01 || 2006-01-10 || 13198 || 4.8GB || r349 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-01-11 || 2006-06-30 || ? || ? || r349 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Skipped 2006-02-05, 2006-02-11, 2006-02-25, 2006-03-10, 2006-03-23, 2006-04-21, 2006-04-25, 2006-05-01 (see [http://code.google.com/p/wikiteam/issues/detail?id=45 issue 45])<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-07-01 || 2006-12-31 || ? || ? || r643 || ''Downloading'' || Check:<br />July 2006: http://p.defau.lt/?IcMnwkx_j4H09FE_9iVgkQ<br />August 2006: http://p.defau.lt/?EmsKDtM0RXaysFNEABXJCQ<br />September 2006: http://p.defau.lt/?KBZVE9rJ9hdz4DiKnegnUw<br />October 2006: http://p.defau.lt/?f3F85TyqHtdY0LhpQk_m1w<br />November 2006: http://p.defau.lt/?VZwhzt_2doA_Z3c65_JkXg<br />
|-<br />
| ? || ? || ? || ? || ? || ? || ? ||<br />
|-<br />
| ? || ? || ? || ? || ? || ? || ? ||<br />
|-<br />
| ? || ? || ? || ? || ? || ? || ? ||<br />
|}<br />
<br />
=== Errors ===<br />
* oi_archive_name empty fields: http://commons.wikimedia.org/wiki/File:Nl-scheikundig.ogg<br />
* broken file links: http://commons.wikimedia.org/wiki/File:SMS_Bluecher.jpg#filehistory<br />
<br />
I'm going to file a bug in bugzilla.<br />
<br />
=== Uploading ===<br />
'''UPLOAD''' using the format: wikimediacommons-<year><month><br />
<br />
E.g. wikimediacommons-200601 for January 2006 grab.<br />
<br />
If you can, add it into the WikiTeam collection, or else just tag it with the wikiteam keyword, and it will be added in later on.<br />
<br />
== Other dumps ==<br />
There is no public dump of all images. [[WikiTeam]] is working on a scraper (see section above).<br />
<br />
Pictures of the Year (best ones):<br />
* [http://download.wikimedia.org/other/poty/poty2006.zip 2006] ([http://burnbit.com/torrent/177023/poty2006_zip torrent]) ([http://www.archive.org/details/poty2006 IA])<br />
* [http://download.wikimedia.org/other/poty/poty2007.zip 2007] ([http://burnbit.com/torrent/177024/poty2007_zip torrent]) ([http://www.archive.org/details/poty2007 IA])<br />
* [http://download.wikimedia.org/other/poty/2009 2009] ([http://www.archive.org/details/poty2009 IA])<br />
* [http://download.wikimedia.org/other/poty/2010 2010] ([http://www.archive.org/details/poty2010 IA])<br />
<br />
== Featured images ==<br />
<br />
Wikimedia Commons contains a lot of [http://commons.wikimedia.org/wiki/Category:Featured_pictures_on_Wikimedia_Commons high-quality images].<br />
<br />
[[File:Featured pictures on Wikimedia Commons - Wikimedia Commons 1294011879617.png|500px]]<br />
<br />
== Size stats ==<br />
Combined size of the images hosted on Wikimedia Commons, grouped by month.<br />
<pre><br />
date sum(img_size) in bytes<br />
2003-1 1360188<br />
2004-10 637349207<br />
2004-11 726517177<br />
2004-12 1503501023<br />
2004-9 188850959<br />
2005-1 1952816194<br />
2005-10 17185495206<br />
2005-11 9950998969<br />
2005-12 11430418722<br />
2005-2 3118680401<br />
2005-3 3820401370<br />
2005-4 5476827971<br />
2005-5 10998180401<br />
2005-6 7160629133<br />
2005-7 9206024659<br />
2005-8 12591218859<br />
2005-9 14060418086<br />
2006-1 15433548270<br />
2006-10 33574470896<br />
2006-11 34231957288<br />
2006-12 30607951770<br />
2006-2 14952310277<br />
2006-3 19415486302<br />
2006-4 23041609453<br />
2006-5 29487911752<br />
2006-6 29856352192<br />
2006-7 32257412994<br />
2006-8 50940607926<br />
2006-9 37624697336<br />
2007-1 40654722866<br />
2007-10 89872715966<br />
2007-11 81975793043<br />
2007-12 75515001911<br />
2007-2 39452895714<br />
2007-3 53706627561<br />
2007-4 72917771224<br />
2007-5 72944518827<br />
2007-6 63504951958<br />
2007-7 76230887667<br />
2007-8 91290158697<br />
2007-9 100120203171<br />
2008-1 84582810181<br />
2008-10 122360827827<br />
2008-11 116290099578<br />
2008-12 126446332364<br />
2008-2 77416420840<br />
2008-3 89120317630<br />
2008-4 98180062150<br />
2008-5 117840970706<br />
2008-6 100352888576<br />
2008-7 128266650486<br />
2008-8 130452484462<br />
2008-9 120247362867<br />
2009-1 127226957021<br />
2009-10 345591510325<br />
2009-11 197991117397<br />
2009-12 228003186895<br />
2009-2 125819024255<br />
2009-3 273597778760<br />
2009-4 212175602700<br />
2009-5 191651496603<br />
2009-6 195998789357<br />
2009-7 241366758346<br />
2009-8 262927838267<br />
2009-9 184963508476<br />
2010-1 226919138307<br />
2010-2 191615007774<br />
2010-3 216425793739<br />
2010-4 312177184245<br />
2010-5 312240110181<br />
2010-6 283374261868<br />
2010-7 362175217639<br />
2010-8 172072631498<br />
</pre><br />
<br />
== See also ==<br />
* [[Wikipedia]]: some Wikipedias have enabled the local upload form; the English Wikipedia contains about 800,000 images, a lot of them under fair use<br />
<br />
== External links ==<br />
* http://commons.wikimedia.org<br />
* [http://dumps.wikimedia.org/other/poty/ Picture of the Year archives]<br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Image hostings]]<br />
[[Category:Wikis]]</div>Hydrizhttps://wiki.archiveteam.org/index.php?title=Wikimedia_Commons&diff=7690Wikimedia Commons2012-05-08T07:16:47Z<p>Hydriz: /* Volunteers */ processing queue</p>
<hr />
<div>{{Infobox project<br />
| title = Wikimedia Commons<br />
| image = Commons screenshot.png<br />
| description = Wikimedia Commons mainpage on 2010-12-13<br />
| URL = http://commons.wikimedia.org<br />
| project_status = {{online}}<br />
| archiving_status = {{inprogress}}<br />
}}<br />
'''Wikimedia Commons''' is a database of freely usable media files, now holding more than 10 million of them (when it held 6.8M files, the total size was 6.6TB).<br />
<br />
Current size (based on January 18, 2012 estimate): '''13.3TB''', old versions '''881GB'''<br />
<br />
== Archiving process ==<br />
<br />
=== Tools ===<br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/commonsdownloader.py Download script] (Python)<br />
* [http://code.google.com/p/wikiteam/source/browse/trunk/commonschecker.py Checker script] (Python)<br />
* [http://toolserver.org/~emijrp/commonsarchive/ Feed lists] (from 2004-09-07 to 2006-12-31; more coming soon)<br />
<br />
=== How-to ===<br />
Download the script and the feed lists (unpack them; each is a .csv file) into the same directory. Then run:<br />
* python commonsdownloader.py 2005-01-01 2005-01-10 [to download that 10-day range; it generates zip files by day and a .csv for every day]<br />
<br />
Don't forget that some months have a 30th and a 31st day, and that some years have a February 29th.<br />
<br />
To verify the downloaded data, use the checker script:<br />
* python commonschecker.py 2005-01-01 2005-01-10 [to check that 10-day range; it works on the .zip and .csv files, not the original folders]<br />
<br />
=== Tools required ===<br />
If you are downloading on a freshly provisioned server (e.g. a default virtual machine), you will need to install <tt>zip</tt> (Ubuntu: <tt>apt-get install zip</tt>).<br />
<br />
Python should already be installed on your server; if not, install it.<br />
<br />
The scripts also depend on <tt>curl</tt> and <tt>wget</tt>, which should be installed on your server by default.<br />
<br />
=== Volunteers ===<br />
<br />
:'''''Please wait until we do some tests; there is probably a bug with long filenames.'''''<br />
<br />
{| class="wikitable"<br />
! Nick !! Start date !! End date !! Images !! Size !! Revision !! Status !! Notes<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2004-09-07 || 2005-06-30 || ? || || r643 || ''Downloading'' ||<br />
|-<br />
| [[User:db48x|db48x]] || 2005-07-01 || 2005-12-31 || ? || ? || || ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-01-01 || 2006-01-10 || 13198 || 4.8GB || r349 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive ||<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-01-11 || 2006-06-30 || ? || ? || r349 || '''Downloaded'''<br />'''Uploaded''' to the Internet Archive || Skipped 2006-02-05, 2006-02-11, 2006-02-25, 2006-03-10, 2006-03-23, 2006-04-21, 2006-04-25, 2006-05-01 (see [http://code.google.com/p/wikiteam/issues/detail?id=45 issue 45])<br />
|-<br />
| [[User:Hydriz|Hydriz]] || 2006-07-01 || 2006-12-31 || ? || ? || r643 || ''Downloading'' || Check:<br />July 2006: http://p.defau.lt/?IcMnwkx_j4H09FE_9iVgkQ<br />August 2006: http://p.defau.lt/?EmsKDtM0RXaysFNEABXJCQ<br />
|-<br />
| ? || ? || ? || ? || ? || ? || ? ||<br />
|-<br />
| ? || ? || ? || ? || ? || ? || ? ||<br />
|-<br />
| ? || ? || ? || ? || ? || ? || ? ||<br />
|}<br />
<br />
=== Errors ===<br />
* oi_archive_name empty fields: http://commons.wikimedia.org/wiki/File:Nl-scheikundig.ogg<br />
* broken file links: http://commons.wikimedia.org/wiki/File:SMS_Bluecher.jpg#filehistory<br />
<br />
I'm going to file a bug in bugzilla.<br />
<br />
=== Uploading ===<br />
'''UPLOAD''' using the format: wikimediacommons-<year><month><br />
<br />
E.g. wikimediacommons-200601 for January 2006 grab.<br />
<br />
If you can, add it into the WikiTeam collection, or else just tag it with the wikiteam keyword, and it will be added in later on.<br />
<br />
== Other dumps ==<br />
There is no public dump of all images. [[WikiTeam]] is working on a scraper (see section above).<br />
<br />
Pictures of the Year (best ones):<br />
* [http://download.wikimedia.org/other/poty/poty2006.zip 2006] ([http://burnbit.com/torrent/177023/poty2006_zip torrent]) ([http://www.archive.org/details/poty2006 IA])<br />
* [http://download.wikimedia.org/other/poty/poty2007.zip 2007] ([http://burnbit.com/torrent/177024/poty2007_zip torrent]) ([http://www.archive.org/details/poty2007 IA])<br />
* [http://download.wikimedia.org/other/poty/2009 2009] ([http://www.archive.org/details/poty2009 IA])<br />
* [http://download.wikimedia.org/other/poty/2010 2010] ([http://www.archive.org/details/poty2010 IA])<br />
<br />
== Featured images ==<br />
<br />
Wikimedia Commons contains a lot of [http://commons.wikimedia.org/wiki/Category:Featured_pictures_on_Wikimedia_Commons high-quality images].<br />
<br />
[[File:Featured pictures on Wikimedia Commons - Wikimedia Commons 1294011879617.png|500px]]<br />
<br />
== Size stats ==<br />
Combined size of the images hosted on Wikimedia Commons, grouped by month.<br />
<pre><br />
date sum(img_size) in bytes<br />
2003-1 1360188<br />
2004-10 637349207<br />
2004-11 726517177<br />
2004-12 1503501023<br />
2004-9 188850959<br />
2005-1 1952816194<br />
2005-10 17185495206<br />
2005-11 9950998969<br />
2005-12 11430418722<br />
2005-2 3118680401<br />
2005-3 3820401370<br />
2005-4 5476827971<br />
2005-5 10998180401<br />
2005-6 7160629133<br />
2005-7 9206024659<br />
2005-8 12591218859<br />
2005-9 14060418086<br />
2006-1 15433548270<br />
2006-10 33574470896<br />
2006-11 34231957288<br />
2006-12 30607951770<br />
2006-2 14952310277<br />
2006-3 19415486302<br />
2006-4 23041609453<br />
2006-5 29487911752<br />
2006-6 29856352192<br />
2006-7 32257412994<br />
2006-8 50940607926<br />
2006-9 37624697336<br />
2007-1 40654722866<br />
2007-10 89872715966<br />
2007-11 81975793043<br />
2007-12 75515001911<br />
2007-2 39452895714<br />
2007-3 53706627561<br />
2007-4 72917771224<br />
2007-5 72944518827<br />
2007-6 63504951958<br />
2007-7 76230887667<br />
2007-8 91290158697<br />
2007-9 100120203171<br />
2008-1 84582810181<br />
2008-10 122360827827<br />
2008-11 116290099578<br />
2008-12 126446332364<br />
2008-2 77416420840<br />
2008-3 89120317630<br />
2008-4 98180062150<br />
2008-5 117840970706<br />
2008-6 100352888576<br />
2008-7 128266650486<br />
2008-8 130452484462<br />
2008-9 120247362867<br />
2009-1 127226957021<br />
2009-10 345591510325<br />
2009-11 197991117397<br />
2009-12 228003186895<br />
2009-2 125819024255<br />
2009-3 273597778760<br />
2009-4 212175602700<br />
2009-5 191651496603<br />
2009-6 195998789357<br />
2009-7 241366758346<br />
2009-8 262927838267<br />
2009-9 184963508476<br />
2010-1 226919138307<br />
2010-2 191615007774<br />
2010-3 216425793739<br />
2010-4 312177184245<br />
2010-5 312240110181<br />
2010-6 283374261868<br />
2010-7 362175217639<br />
2010-8 172072631498<br />
</pre><br />
<br />
== See also ==<br />
* [[Wikipedia]]: some Wikipedias have enabled the local upload form; the English Wikipedia contains about 800,000 images, a lot of them under fair use<br />
<br />
== External links ==<br />
* http://commons.wikimedia.org<br />
* [http://dumps.wikimedia.org/other/poty/ Picture of the Year archives]<br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Image hostings]]<br />
[[Category:Wikis]]</div>Hydriz