Difference between revisions of "Posterous"

From Archiveteam
m (Reverted edits by Megalanya1 (talk) to last revision by Jscott)
(37 intermediate revisions by 13 users not shown)
Line 4: Line 4:
| description =  
| description =  
| URL = http://posterous.com
| URL = http://posterous.com
| project_status = {{closing}}
| project_status = {{closed}}
| archiving_status = {{inprogress}}
| source = [https://github.com/ArchiveTeam/posterous-grab posterous-grab]
| archiving_status = {{saved}}
| irc = preposterus
| irc = preposterus
| tracker = [http://tracker.archiveteam.org/posterous/ here]
| tracker = [http://tracker.archiveteam.org/posterous/ here]
}}
}}


Posterous is a blogging platform started in May 2008. It was acquired by Twitter on March 12, 2012 and will shut down April 30, 2013. [http://blog.posterous.com/thanks-from-posterous Announcement] See [[Posterous#Warrior]] below for how to help.
'''Posterous''' was a blogging platform started in May 2008. It was acquired by [[Twitter]] on March 12, 2012 and shut down April 30, 2013.


== Frequently Asked Questions ==
==Shutdown announcement==
[http://web.archive.org/web/20130501011949/http://blog.posterous.com/thanks-from-posterous Posterous will turn off on April 30]:


=== It's going down! How can I help? ===
:Posterous launched in 2008. Our mission was to make it easier to share photos and connect with your social networks. Since joining Twitter almost one year ago, we’ve been able to continue that journey, building features to help you discover and share what’s happening in the world – on an even larger scale.
Glad you're interested! First and foremost, consider running our prepared Virtual Machine. Please see [[Posterous#Warrior]] down below.
:On April 30th, we will turn off posterous.com and our mobile apps in order to focus 100% of our efforts on Twitter. This means that as of April 30, Posterous Spaces will no longer be available either to view or to edit.
:Right now and over the next couple months until May 31, you can download all of your Posterous Spaces including your photos, videos, and documents.
:Here are the steps:
:#Go to <nowiki>http://posterous.com/#backup</nowiki>.
:#Click to request a backup of your Space by clicking “Request Backup” next to your Space name.
:#When your backup is ready, you'll receive an email.
:#Return to <nowiki>http://posterous.com/#backup</nowiki> to download a .zip file.
:If you want to move your site to another service, WordPress and Squarespace offer importers that can move all of your content over to either service. Justmigrate offers a service to move your site to Tumblr.
:More information on these services can be found here:
:*http://en.support.wordpress.com/import/import-from-posterous/
:*http://help.squarespace.com/customer/portal/articles/881311-importing-content...
:*http://justmigrate.com
:We’d like to thank the millions of Posterous users who have supported us on our incredible journey. We hope to provide you with as easy a transition as possible, and look forward to seeing you on Twitter. Thank you.
:Sachin Agarwal
:Founder and CEO


=== What do you guys need? A huge fat pipe, a.k.a Bandwidth? ===
==Archives==
Needed/Wanted: Interested volunteers in general and IP addresses. A lot of bandwidth isn't needed, per se. You don't need a fat monster pipe/Internet tube to help out.
We saved it! Discussion around and details of our efforts have been archived to the [[Posterous/War room|Posterous war room]]. The final moments have been [[Posterous/Story|retold as a story]].
* [http://archive.org/details/archiveteam_posterous Preposterous! The Posterous Grab] on archive.org
* [http://archive.org/details/2013-02-22-posterous-hostname-list List of hostnames]


=== Can I donate some cash instead? ===
===I had a Posterous blog. How can I get my files back?===
Not really, not to the ArchiveTeam specifically. If you feel like you could part with a few bucks, consider donating to the [http://archive.org/donate.php Internet Archive]. They're awesome and do awesome things, just like us! (Yes, you're included in "us" - you're here, reading already!)


=== Why aren't we fetching more/bashing the shit out of Posterous to get done already?! ===
There are two ways available:
Easy, tiger! We all love bringing a web service to its knees, but we want to get as much as possible out of Posterous.


We've currently rate limited the project, and we continue to adjust the limits and try out new tactics. Unfortunately, we've already brought Posterous to its knees a good few times.
* Check if your blog has been ingested in the [http://archive.org/web Wayback Machine].
* Extract the files from the WARC files with some [[The WARC Ecosystem|WARC tools]].
** This method requires power user skills. In essence, scan each CDX index file and then extract it from the appropriate WARC files. Ask us in [[IRC]] for help.


The problem is that Posterous is not designed for the load it's currently getting, especially from us. They've designed Posterous so that the front-ends hit a cache with content before hitting the back-end. OK, but why isn't that helping? We're going through *all* of their accounts and posts, and that ruins the cache: almost every request is for something that isn't cached. That means Posterous's back-end has to take the full request rate, which it can't. If we go too fast, requests will return bad data or no data. Please keep this in mind.
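
To illustrate why crawling every account defeats a front-end cache, here is a small simulation. It is only a sketch: it assumes an LRU eviction policy and uses made-up sizes and blog names, since nothing is known here about how Posterous's caching layer actually works.

```python
from collections import OrderedDict


def lru_hit_rate(requests, capacity):
    """Return the fraction of requests served from an LRU cache of the given capacity."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for key in requests:
        total += 1
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / total if total else 0.0


# Normal traffic (assumed): readers keep returning to a small set of popular blogs.
popular = [f"blog{i % 50}" for i in range(10_000)]

# Archive crawl: every account is requested exactly once.
crawl = [f"blog{i}" for i in range(10_000)]

print("popular traffic hit rate:", lru_hit_rate(popular, capacity=100))
print("full crawl hit rate:", lru_hit_rate(crawl, capacity=100))
```

In this toy model the popular traffic is almost entirely served from the cache, while the crawl never hits it, so every crawl request lands on the back-end.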
==Press==
* [http://www.dailydot.com/news/archive-team-preserving-posterous/ Archive Team races to preserve Posterous before it goes dark], ''The Daily Dot'', 2013-03-13


=== The warrior tells me to ask to run the Posterous project? ===
{{navigation box}}
Yes, yes - it does indeed. If you've read this, feel free to click on it and go on. This message/warning/notice was introduced earlier in the archiving project when we got banned a lot. Other Warrior projects haven't been this bitchy about banning - therefore the notice is still there.
 
=== Will I get banned? ===
If you help out by running the [[ArchiveTeam Warrior]], it's very unlikely that your IP will get banned. Our objective is to get as much of Posterous as possible. We have therefore taken measures, and continue to take measures, to rate limit, check for errors, retry, and back off when appropriate. There's also some magic over at Posterous's end; we won't go into details here though.
 
If, however, you are starting your own "Rape the Posterous Silly"-project with your own code, or are running too many concurrent jobs - for example with the stand-alone code mentioned below (seesaw script for advanced users) - then yes, it's very likely you'll get banned.
 
=== How do I know if I got banned? ===
If in doubt, shoot off a request to http://www.posterous.com from the same IP as your Warrior VM. Or log on to your Warrior and do "curl -v http://www.posterous.com". If you don't even get a connection, you're surely banned.
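
The same connection check can be sketched in Python. This is a minimal example, not part of the Warrior; the function name <code>can_connect</code> and the default port and timeout are assumptions made for illustration.

```python
import socket


def can_connect(host, port=80, timeout=10.0):
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False
```

Calling <code>can_connect("www.posterous.com")</code> from the same IP as your Warrior gives the same signal as the curl command: <code>False</code> means no connection could be opened at all, which points to a ban (or to a local network problem, so rule that out first).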
 
=== How long am I banned, if banned? ===
Good question! No good answer! Next!
 
In the beginning of the crawl, individual IPs were banned for days - if we're not mistaken, a week or so. After experimenting with... overloading... Posterous from different IPs, the ban times have shortened.
 
The answer is: Hours to weeks. It's unclear.
 
=== OK, I'm running the Warrior - I'm getting 502/5XX errors!! ===
That's not a question.
 
Posterous will gag out 50Xs occasionally - we've taken measures to back off for a period of time and retry a certain number of times. It's alright.
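
For the curious, backing off and retrying can look roughly like the sketch below. It is a generic illustration, not the code the Warrior runs: the function name, the doubling delay, and the default values are all assumptions.

```python
import time


def fetch_with_backoff(fetch, max_tries=5, base_delay=10.0, sleep=time.sleep):
    """Call fetch() until it returns a non-5xx status or max_tries is reached.

    fetch is any callable returning an HTTP status code (hypothetical interface).
    The wait doubles after every 5xx response: base_delay, 2 * base_delay, ...
    Returns the last status code seen.
    """
    status = None
    for attempt in range(max_tries):
        status = fetch()
        if not 500 <= status <= 599:
            return status
        if attempt < max_tries - 1:
            sleep(base_delay * 2 ** attempt)  # wait longer after each failure
    return status
```

Passing the <code>sleep</code> function as a parameter keeps the sketch easy to try out without actually waiting.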
 
This does not mean your IP has been banned.
 
=== Uh, so.. looks like there's plenty of spam on Posterous? ===
Yep, but we don't care. Grab it all. It's not our thing to decide what gets saved and not - especially if we have the chance to save it all.
 
Maybe it'll be useful for a spam researcher in the future. Maybe not.
 
 
=== Where can I see the project status? ===
You can see the status at http://tracker.archiveteam.org/posterous/ - which is the dashboard for this project.
 
=== Cool! So you're almost done with this? ===
Sadly, no! Not all hostnames are tracked on the dashboard, because of certain limitations in the current tracker/dashboard. We've unloaded a lot of the users/items. In total, we believe there to be about 10 million hostnames/sites/users.
 
=== The tracker/status dashboard is barfing! Or giving 502 Bad Gateways. What's up? ===
The tracker/dashboard is a bit fragile, so please don't link to it too widely. It's not optimized for maximum page loads. It is functional, however, and the source code is freely available on [https://github.com/ArchiveTeam/universal-tracker GitHub] - feel free to look into that, and if you see anything that can be improved, submit a pull request.
 
Our tracker admins will of course kick it back to life if it's acting up. Please join our IRC channel for status updates regarding the tracker and such.
 
=== My userstats seems to be reset on the dashboard, what gives? ===
The user details are cached for a set period of time, and we've had the caching act up a few times. Please rest assured that every submitted work item DOES get counted and does get in. If you see your username getting submissions and then the total resetting, feel free to poke us in the IRC channel (anyone with a @). We'll kick the cache in the butt, and your stats will show as they should. This shouldn't happen all that often though.
 
=== How do I know if my posterous favorite blogobongobloggo will be fetched? ===
There's no super nice way, but if you go to [[Posterous#Site List Grab]] below, you can grab the hostname list that we've spidered forth and check by opening it and searching for your username/hostname. Or you could use 'grep' on it, you know - like a man.
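
If you'd rather not use grep, a few lines of Python do the same job. This sketch assumes the downloaded list is a plain text file with one hostname per line; the function name and the file name in the usage note are hypothetical.

```python
def hostname_in_list(list_path, hostname):
    """Return True if hostname appears as a whole line in the list file."""
    wanted = hostname.strip().lower()
    with open(list_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:  # stream the file; the list has millions of entries
            if line.strip().lower() == wanted:
                return True
    return False
```

For example, <code>hostname_in_list("hostnames.txt", "example.posterous.com")</code> returns <code>True</code> only on an exact, case-insensitive match of a whole line.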
 
=== Can I opt out? I don't want to be saved! ===
Tough luck, it's already public - that's why we're grabbing it. Besides, don't be embarrassed! We all learn through history - let the history be.
 
=== This is cool and all, but where the fuck is the data going? ===
We'll make sure this data stays public after it's been downloaded. We'll make sure that the awesome duders and duduetters at the [http://archive.org Internet Archive] get a copy for sure. We're grabbing all the Posterous sites in an Internet Archive friendly file format called WARC (Web ARChive), so they should be able to put this into the Wayback Machine if they'd like to.
 
=== So, my Warrior doesn't get networking with VirtualBox on Ubuntu, what gives? ===
You should do the following:
VBoxManage modifyvm "archiveteam-warrior-2" --natdnshostresolver1 on
VBoxManage modifyvm "archiveteam-warrior-2" --natdnsproxy1 on
 
Thanks and shout-outs go to '''hdevalenc'''
 
 
== How to help ==
 
 
=== Warrior ===
You can help by installing and running the [[ArchiveTeam Warrior]] and selecting the "posterous" project. The Warrior is a virtual machine you can run in Virtualbox seamlessly to help out.
 
=== Seesaw script (for advanced users)===
 
'''Download:'''
 
git clone https://github.com/ArchiveTeam/posterous-grab.git
 
Follow instructions to install seesaw and edit script for IP address.
 
For wget: run ./get-wget-lua.sh
 
'''Commands:'''
 
If you are on a box with more than one public IP address, you can place an IP address after --bind-address= on line 175. Example: "--bind-address=192.168.1.1",
# install prerequisites
sudo apt-get install -y lua5.1 liblua5.1-0-dev python python-setuptools python-dev git-core openssl libssl-dev python-pip rsync gcc make git
# grab the posterous scripts
git clone http://github.com/ArchiveTeam/posterous-grab.git
cd posterous-grab
# grab and install the seesaw kit (for communicating with the tracker)
git clone http://github.com/ArchiveTeam/seesaw-kit
cd seesaw-kit
sudo pip install -r requirements.txt
sudo pip install seesaw
cd ../
# download and compile wget-lua
chmod +x get-wget-lua.sh && ./get-wget-lua.sh
# run the pipeline and start downloading users. Use --help to see additional parameters
# once started progress can be viewed from a browser on port 8001
run-pipeline --concurrent 1 --address <your_ip_address> pipeline.py <your_username>
 
== Site List Grab ==
 
We have assembled a list of Posterous sites that need grabbing. Total found: 9898986
 
http://archive.org/details/2013-02-22-posterous-hostname-list
 
Tools: [https://github.com/ArchiveTeam/smeg git]
 
== Goal ==
 
We found 9.8 million possible posterous accounts. After filtering out the banned/spam accounts we have 6,677,720 left.
 
They close April 30th, 2013. We have 48 days left and 1,200,000 accounts downloaded.
 
60 sec * 60 min * 24 hours = 86,400 seconds a day
 
(6,677,720 - 1,400,000)/86,400 = 61.1 days at 1 account a second.
 
61.1 days (1 fetch a second)/48 days left = 1.27 and round that up to 2 accounts per second actually needed.
 
Now, taking into account that not all accounts are the same size, and the previous outages we have had, the safe number would be 3x the above answer. So we need to download 6 full accounts per second to be sure of getting all of Posterous before it shuts down. This also assumes that we will not have to re-download any accounts at the end.
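
The arithmetic above can be reproduced with a short script. It uses the 1,400,000 downloaded accounts from the formula; the text above also mentions 1,200,000, and the rounded result is the same with either figure. Variable names are made up for this sketch.

```python
import math

SECONDS_PER_DAY = 60 * 60 * 24      # 86,400

accounts_total = 6_677_720          # after filtering banned/spam accounts
accounts_done = 1_400_000           # value used in the formula above
days_left = 48
safety_factor = 3                   # allowance for uneven account sizes and outages

remaining = accounts_total - accounts_done
days_at_one_per_second = remaining / SECONDS_PER_DAY

# Round up to a whole number of accounts per second, then apply the safety factor.
minimum_rate = math.ceil(days_at_one_per_second / days_left)
safe_rate = minimum_rate * safety_factor

print(f"{days_at_one_per_second:.1f} days at 1 account per second")
print(f"minimum rate: {minimum_rate} accounts/s, safe rate: {safe_rate} accounts/s")
```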

Revision as of 16:06, 17 January 2017

Posterous
URL http://posterous.com
Status Offline
Archiving status Saved!
Archiving type Unknown
Project source posterous-grab
Project tracker here
IRC channel #preposterus (on hackint)

Posterous was a blogging platform started in May 2008. It was acquired by Twitter on March 12, 2012 and shut down April 30, 2013.

Shutdown announcement

Posterous will turn off on April 30:

Posterous launched in 2008. Our mission was to make it easier to share photos and connect with your social networks. Since joining Twitter almost one year ago, we’ve been able to continue that journey, building features to help you discover and share what’s happening in the world – on an even larger scale.
On April 30th, we will turn off posterous.com and our mobile apps in order to focus 100% of our efforts on Twitter. This means that as of April 30, Posterous Spaces will no longer be available either to view or to edit.
Right now and over the next couple months until May 31, you can download all of your Posterous Spaces including your photos, videos, and documents.
Here are the steps:
  1. Go to http://posterous.com/#backup.
  2. Click to request a backup of your Space by clicking “Request Backup” next to your Space name.
  3. When your backup is ready, you'll receive an email.
  4. Return to http://posterous.com/#backup to download a .zip file.
If you want to move your site to another service, WordPress and Squarespace offer importers that can move all of your content over to either service. Justmigrate offers a service to move your site to Tumblr.
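More information on these services can be found here:
  • http://en.support.wordpress.com/import/import-from-posterous/
  • http://help.squarespace.com/customer/portal/articles/881311-importing-content...
  • http://justmigrate.com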
We’d like to thank the millions of Posterous users who have supported us on our incredible journey. We hope to provide you with as easy a transition as possible, and look forward to seeing you on Twitter. Thank you.
Sachin Agarwal
Founder and CEO

Archives

We saved it! Discussion around and details of our efforts have been archived to the Posterous war room. The final moments have been retold as a story.

I had a Posterous blog. How can I get my files back?

There are two ways available:

  • Check if your blog has been ingested in the Wayback Machine.
  • Extract the files from the WARC files with some WARC tools.
    • This method requires power user skills. In essence, scan each CDX index file and then extract it from the appropriate WARC files. Ask us in IRC for help.
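
As a rough sketch of the second method, the Python below looks up a blog in a CDX index and pulls the matching records out of a compressed WARC file. It rests on several assumptions that you should check against the actual files: that the CDX file starts with a header line such as <code> CDX N b a m s k r M S V g</code>, that fields are separated by spaces, that <code>a</code> is the original URL, <code>S</code> the compressed record size, <code>V</code> the offset and <code>g</code> the WARC file name, and that each record in the <code>.warc.gz</code> is its own gzip member. The function names are made up for this example; dedicated WARC tools are more robust.

```python
import gzip


def read_cdx(cdx_path):
    """Yield one dict per CDX line, keyed by the field letters from the header line."""
    with open(cdx_path, encoding="utf-8", errors="replace") as handle:
        header = handle.readline().split()
        if not header or header[0] != "CDX":
            raise ValueError("missing CDX header line")
        fields = header[1:]
        for line in handle:
            parts = line.split()
            if len(parts) == len(fields):  # skip malformed lines
                yield dict(zip(fields, parts))


def find_records(cdx_path, url_prefix):
    """Return (file name, offset, length) for captures whose original URL starts with url_prefix."""
    return [
        (row["g"], int(row["V"]), int(row["S"]))
        for row in read_cdx(cdx_path)
        if row["a"].startswith(url_prefix)
    ]


def extract_record(warc_path, offset, length):
    """Return the decompressed bytes of one record from a .warc.gz file."""
    with open(warc_path, "rb") as handle:
        handle.seek(offset)
        return gzip.decompress(handle.read(length))
```

Each tuple returned by <code>find_records</code> can be passed to <code>extract_record</code> once the named WARC file has been downloaded; the result is the raw WARC record, headers included.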

Press