Difference between revisions of "Posterous"

From Archiveteam


Tools: [https://github.com/ArchiveTeam/smeg git]  
=== Range Claim ===
{| class="wikitable"
! Range
! Chunk(s)
! User
! Status
! Uploaded Hostnames
|-
| 1 - 999,999
| 1-99
| closure
| Done (742846)
| archived
|-
| 1,000,000 - 1,999,999
| 100-199
| db48x / closure
| Done (994303)
| archived
|-
| 2,000,000 - 2,009,999
| 200
| aggroskater
| Done (8907)
| [https://dl.dropbox.com/u/67912136/2000000.hostnames.gz 2000000.hostnames.gz] archived
|-
| 2,010,000 - 2,019,999
| 201
| aggroskater
| Done (8094)
| [https://dl.dropbox.com/u/67912136/2010000.hostnames.gz 2010000.hostnames.gz] archived
|-
| 2,020,000 - 2,999,999
| 202-299
| dcmorton
| Done
| [http://spartacus.networkwhisperer.com/2020000.hostnames.tar.gz 2020000.hostnames.tar.gz] archived
|-
| 3,000,000 - 3,999,999
| 300-399
| closure
| Done (928023)
| archived
|-
| 4,000,000 - 4,999,999
| 400-499
| chazchaz101
| Done
| [https://cpub.us.to/4000000.hostnames.tar.gz 4000000.hostnames.tar.gz]
|-
| 5,000,000 - 5,999,999
| 500-599
| Smiley / Soult
| Done (984360)
| [http://helo.nodes.soultcer.com/posterous/5000000-5999999.hostnames.gz 5000000.hostnames.gz], [http://helo.nodes.soultcer.com/posterous/5000000-5999999.sqlite.gz 5000000.sqlite.gz] archived
|-
| 6,000,000 - 6,999,999
| 600-699
| dcmorton
| Done
| [http://spartacus.networkwhisperer.com/6000000.hostnames.tar.gz 6000000.hostnames.tar.gz] archived
|-
| 7,000,000 - 7,999,999
| 700-799
| balrog / S[h]O[r]T
| Done
| [http://www.doinkdoink.us/hostnames.tgz hostnames.tgz] archived
|-
| 8,000,000 - 8,999,999
| 800-899
| beardicus/Soult
| Done (984258)
| [http://helo.nodes.soultcer.com/posterous/8000000-8999999.hostnames.gz 8000000.hostnames.gz], [http://helo.nodes.soultcer.com/posterous/8000000-8999999.sqlite.gz 8000000.sqlite.gz] archived
|-
| 9,000,000 - 9,999,999
| 900-999
| GLaDOS
| Done
| [http://posterous.archivingyoursh.it/9000000.hostnames.tar.gz 9000000.hostnames.tar.gz] archived
|-
| 10,000,000 - 10,019,999
| 1000-1001
| gui77
| Done
| [http://dl.dropbox.com/u/3592423/archive/10000000-10019999.sqlite.gz 10000000-10019999.sqlite.gz] archived
|-
| 10,020,000 - 10,609,999
| 1002-1060
| S[h]O[r]T
| Done
| [http://www.doinkdoink.us/hostnames.tgz hostnames.tgz] archived
|-
| 10,610,000 - 10,709,999
| 1061-1070
| siliconvalleypark
| Done
| [https://dl.dropbox.com/u/286436/archiveteam/posterous/10610000-10709999-posterous.sqlite.gz 10610000-10709999-posterous.sqlite.gz] archived
|-
| 10,710,000 - 11,009,999
| 1071-1100
| S[h]O[r]T
| Done
| [http://www.doinkdoink.us/hostnames.tgz hostnames.tgz] archived
|}
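
The ranges and chunk numbers in the table appear to follow a simple rule: each chunk covers 10,000 consecutive IDs, so chunk 200 covers 2,000,000 to 2,009,999. The sketch below shows that arithmetic. The function names <code>chunk_range</code> and <code>chunk_of</code> are hypothetical and not part of the smeg tooling, and the 10,000-per-chunk size is inferred from the table rather than stated anywhere. The first row lists chunks 1-99 for IDs below 1,000,000, so chunk 0 may be handled differently.

```shell
#!/bin/sh
# Hypothetical helper: print the first and last ID covered by a chunk.
# Assumption: every chunk spans 10,000 consecutive IDs (inferred from the table).
chunk_range() {
    chunk=$1
    start=$((chunk * 10000))
    end=$((start + 9999))
    echo "$start $end"
}

# Hypothetical helper: print the chunk number that contains a given ID.
chunk_of() {
    echo $(( $1 / 10000 ))
}

chunk_range 200      # prints "2000000 2009999", matching the chunk 200 row
chunk_of 10615000    # prints "1061", inside the 1061-1070 row
```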


== Archiving a single blog ==

Revision as of 03:40, 22 February 2013

Posterous
URL http://posterous.com
Status Closing
Archiving status In progress...
Archiving type Unknown
IRC channel #preposterus (on hackint)

Posterous is a blogging platform started in May 2008. It was acquired by Twitter on March 12, 2012 and will shut down April 30, 2013. Announcement

Site List Grab

We have assembled a list of Posterous sites that need grabbing. Total found: 9898986

http://archive.org/details/2013-02-22-posterous-hostname-list

Tools: git

Archiving a single blog

We are developing a command to archive a single blog, including all of its images and assets. The current draft:

 # Set $hostname to the blog's hostname before running.
 USER_AGENT="Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27"
 wget "https://$hostname" --warc-file="$hostname.warc" \
   --mirror --no-check-certificate --span-hosts \
   --domains="$hostname,s3.amazonaws.com,files.posterous.com,getfile.posterous.com,getfile0.posterous.com,getfile1.posterous.com,getfile2.posterous.com,getfile3.posterous.com,getfile4.posterous.com,getfile5.posterous.com,getfile6.posterous.com,getfile7.posterous.com,getfile8.posterous.com,getfile9.posterous.com,getfile10.posterous.com" \
   -U "$USER_AGENT" -nv -e robots=off --page-requisites \
   --timeout 60 --tries 20 --waitretry 5 \
   --warc-header "operator: Archive Team" \
   --warc-header "posterous-hostname: $hostname"

The command uses https because it allows for HTTP pipelining, which may help avoid being banned.
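
To run the command above over many blogs, it could be wrapped in a few shell functions. This is a minimal sketch, not the project's actual tooling: the function names <code>posterous_domains</code>, <code>archive_blog</code> and <code>archive_list</code> are hypothetical, and the input is assumed to be a plain text file with one hostname per line. The wget options and the list of asset hosts are copied from the command above.

```shell
#!/bin/sh
USER_AGENT="Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27"

# Hypothetical helper: build the --domains value for one blog.
# The asset hosts are the ones listed in the command above
# (getfile plus getfile0 through getfile10).
posterous_domains() {
    domains="$1,s3.amazonaws.com,files.posterous.com,getfile.posterous.com"
    i=0
    while [ "$i" -le 10 ]; do
        domains="$domains,getfile$i.posterous.com"
        i=$((i + 1))
    done
    echo "$domains"
}

# Hypothetical wrapper: archive one blog with the same options as above.
archive_blog() {
    hostname=$1
    wget "https://$hostname" --warc-file="$hostname.warc" \
        --mirror --no-check-certificate --span-hosts \
        --domains="$(posterous_domains "$hostname")" \
        -U "$USER_AGENT" -nv -e robots=off --page-requisites \
        --timeout 60 --tries 20 --waitretry 5 \
        --warc-header "operator: Archive Team" \
        --warc-header "posterous-hostname: $hostname"
}

# Hypothetical driver: archive every hostname in a file, one per line.
# Blank lines are skipped.
archive_list() {
    while IFS= read -r host; do
        [ -n "$host" ] && archive_blog "$host"
    done < "$1"
}
```

After sourcing the file, <code>archive_list hostnames.txt</code> would process each listed blog in turn, where <code>hostnames.txt</code> is a placeholder file name. Generating the asset host list in a loop keeps the long <code>--domains</code> value in one place if more hosts need to be added.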