Friendster
URL | http://www.friendster.com/ |
Status | Closing |
Archiving status | In progress... |
Archiving type | Unknown |
IRC channel | #archiveteam-bs (on hackint) |
Friendster is an early social networking site. On April 25, 2011, it announced that most of the user-generated content on the site would be deleted on May 31, 2011. Friendster is estimated to have over 115 million registered users.
How to help
Scrape profiles
We're going to break up the user ids into ranges and let individuals claim a range to download. Use this table to mark your territory:
Start | End | Status | Size (Uncompressed) | Claimant |
---|---|---|---|---|
1 | 999 | Uploaded | 55 MB | closure |
1000 | 1999 | Uploaded | 283 MB | alard |
2000 | 2999 | Uploaded | 473 MB | DoubleJ |
3000 | 3999 | Done | 234 MB | Teaspoon |
4000 | 4999 | Uploaded | 183 MB | Paradoks |
5000 | 5999 | Uploaded | 202 MB | robbiet48/Robbie Trencheny (Amsterdam) |
6000 | 9999 | Uploaded | 1.1 GB | Sketchcow/Jason Scott |
10000 | 29999 | Claimed | | Sketchcow/Jason Scott |
30000 | 31999 | Uploaded | 485 MB | Sketchcow/Jason Scott |
32000 | 32999 | Uploaded | 201 MB | Paradoks |
33000 | 33999 | Uploaded | 241 MB | closure |
34000 | 100000 | Uploaded | unknown (20+ GB?) | closure |
100000 | 101000 | Done | 205.6 MB | xlene |
101001 | 102000 | Uploaded | 232 MB | robbiet48/Robbie Trencheny (Florida) |
102001 | 103000 | Uploaded | 241 MB | robbiet48/Robbie Trencheny (Amsterdam) |
103001 | 104000 | Claimed | | yipdw |
104001 | 105000 | Done | 231 MB | Coderjoe |
105001 | 114999 | Uploaded | 2.1 GB | Paradoks |
115000 | 116999 | Claimed | | yipdw |
117000 | 119999 | Done | 720 MB | Coderjoe |
120000 | 130000 | Claimed | | robbiet48/Robbie Trencheny (Florida) |
130000 | 140000 | Claimed | | robbiet48/Robbie Trencheny (Amsterdam) |
140001 | 160000 | Claimed | | yipdw |
160001 | 180000 | Claimed | | jch |
180001 | 200000 | Claimed | | yipdw |
200001 | 220000 | Claimed | | Coderjoe |
220001 | 230000 | Claimed | | xlene |
230001 | 240000 | Done | 4.4 GB | alard |
240001 | 250000 | Claimed | | Teaspoon |
250001 | 260000 | Claimed | | robbiet48/Robbie Trencheny (Newark) |
260001 | 270000 | Uploaded | 4.0 GB | robbiet48/Robbie Trencheny (Fremont 1) |
270001 | 280000 | Uploaded | 3.2 GB | robbiet48/Robbie Trencheny (Fremont 2) |
280001 | 290000 | Claimed | | DoubleJ (updated script started at 281783) |
290001 | 300000 | Claimed | | dnova |
310001 | 320000 | Claimed | | Coderjoe |
320001 | 330000 | Claimed | | robbiet48/Robbie Trencheny (Oakland) |
330000 | 340000 | Done | | closure |
340000 | 400000 | Claimed | | Sketchcow/Jason Scott |
400001 | 500000 | Claimed | | DoubleJ |
500000 | 600000 | Claimed | | closure |
600001 | 700000 | Claimed | | no2pencil |
700001 | 800000 | Claimed | | proub/Paul Roub |
800001 | 900000 | Claimed | | proub/Paul Roub |
900001 | 1000000 | Claimed | | Soult |
1000001 | 1100000 | Claimed | | Avram |
1100001 | 1200000 | Claimed | | Paradoks |
1200001 | 1300000 | Claimed | | db48x |
1300000 | 1400000 | Claimed | | closure (penguin) |
1400001 | 1500000 | Claimed | | alard |
1500001 | 1600000 | Claimed | | ksh/omglolbah |
1600001 | 1700000 | Claimed | | ksh/omglolbah |
1700001 | 1800000 | Claimed | | ksh/omglolbah |
1800001 | 1900000 | Claimed | | ksh/omglolbah |
1900001 | 2000000 | Claimed | | ksh/omglolbah |
2000001 | 2100000 | Claimed | | ksh/omglolbah |
2100001 | 2200000 | Claimed | | Teaspoon |
2200001 | 2300000 | Claimed | | Darkstar |
2300001 | 2400000 | Claimed | | Darkstar |
2400001 | 2500000 | Claimed | | underscor (snookie) |
2500001 | 2600000 | Claimed | | Bardicer |
124328261 | | Pool | | Check for an update to the script! |
Please try to claim blocks of 100,000 IDs at this time, or more if your system has adequate space.
(Side note: User IDs below 340000 are suspect for blogs. Jason will run a final blog-check at the end.)
Tools
friendster-scrape-profile
Script to download a Friendster profile (available as a direct download or from the GitHub repository).
You need a Friendster account to use this script. (Note: if you are creating an account, Mailinator email addresses are blocked.) Put your login email address in a file username.txt and your password in a file password.txt, and save both in the directory of the download script.
Run with a numeric profile id of a Friendster user: ./friendster-scrape-profile PROFILE_ID
Currently downloads:
- the main profile page (profiles.friendster.com/$PROFILE_ID)
- the user's profile image from that page
- the list of public albums (www.friendster.com/viewalbums.php?uid=$PROFILE_ID)
- each of the album pages (www.friendster.com/viewphotos.php?a=$id&uid=$PROFILE_ID)
- the original photos from each album
- the list of friends (www.friendster.com/fans.php?uid=$PROFILE_ID)
- the shoutoutstream (www.friendster.com/shoutoutstream.php?uid=$PROFILE_ID) and the associated comments
- the Friendster blog, if any
It does not download any of the widgets.
Downloading one profile takes between 6 and 10 seconds and generates 200-400 kB of data (for normal profiles).
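Those per-profile figures imply that a 100,000-ID block needs roughly 20-40 GB of disk and about a week or more of serial downloading, which is what motivates the parallelization described below. A quick back-of-the-envelope check:

```shell
#!/bin/bash
# Rough disk and time estimate for one 100,000-profile block, using the
# 200-400 kB and 6-10 s per-profile figures quoted above.
profiles=100000
low_gb=$(( profiles * 200 / 1024 / 1024 ))    # kB -> GB, lower bound
high_gb=$(( profiles * 400 / 1024 / 1024 ))   # kB -> GB, upper bound
days_low=$(( profiles * 6 / 86400 ))          # seconds -> days
days_high=$(( profiles * 10 / 86400 ))
echo "Disk: ${low_gb}-${high_gb} GB, serial time: ${days_low}-${days_high} days"
```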
Automating the process
(This is all unix-only; it won't work in Windows.)
1. Create a Friendster account
2. Download the script; name it 'bff.sh'.
3. In the directory containing bff.sh, create a username.txt file containing your Friendster e-mail address.
4. In the directory containing bff.sh, create a password.txt file containing your Friendster password.
5. Choose your profile range.
6. Edit the claims table above to mark the range you'll do.
7. On the command line, type (with your range replacing the '#'s.):
for i in {#..#}; do bash bff.sh $i; done
Note: If you get an error like bff.sh: line 26: $'\r': command not found, you will need to convert the script to use UNIX-style line endings:
$ dos2unix bff.sh
or if you somehow find yourself without the dos2unix command, do this:
$ sed "s/\r//" bff.sh > bff-fixed.sh
$ mv bff-fixed.sh bff.sh
If you have vi (the editor) you can also open the file in vi, type <ESC>:set ff=unix
followed by <ESC>:wq
Then run the command in step 7 again.
Parallelizing the download
If you, like me, just realized that at a rate of ~10-20 seconds per profile it will take you over 10 days to grab 100000 profiles, you might want to give this quick and dirty script a try:
#!/bin/bash
START=2300001 ## CHANGE THIS FOR YOUR RANGE!
END=2400000   ## CHANGE THIS ALSO!!!
id=$START
while test $id -le $END; do
    ./bff.sh $id >/dev/null 2>/dev/null &
    sleep 2
    # check number of running processes
    numprocs=$(ps ax | grep bff.sh | grep -v grep | wc -l)
    test $numprocs -gt 15 && (echo "Many processes ($numprocs), sleeping a bit..."; sleep 10)
    id=$(($id + 1))
done
If you run this script, it will launch bff.sh in the background for an ID, wait 2 seconds, and launch the next one. These scripts pile up in the background, download one profile each, and then terminate. You will get about (average download time per profile)/(delay) parallel processes. Don't set the delay too low or you will starve your downlink. 2 s works well for me: at around 10-20 seconds per profile I get around 10 parallel processes (more than the expected 3-5, because some profiles are bigger than expected).
The good thing is that, as soon as you notice you have too many processes, you can kill this script and no new ones will be launched. Just wait for those still running to complete, then re-run the script.
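If your xargs supports the -P flag (GNU and BSD versions do), you can get the same throttling without hand-rolled process counting. In this sketch, echo stands in for ./bff.sh so that it runs standalone; substitute the real script and your claimed range:

```shell
#!/bin/bash
# xargs runs one command per ID and keeps at most 15 of them running at
# once. 'echo' stands in for ./bff.sh here; swap in the real script when
# running against your claimed range.
START=2300001
END=2300010
seq "$START" "$END" | xargs -n 1 -P 15 echo
```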
Use ps -ax | grep bff.sh | grep -v grep to check your processes (or, as I do, use watch "ps -ax | grep bff.sh | grep -v grep" to see it continuously updated).
Site Organization
Content on Friendster seems to be primarily organized by the ID number of the users, which were sequentially assigned starting at 1. This will make it fairly easy for wget to scrape the site and for us to break it up into convenient work units. The main components we need to scrape are the profile pages, photo albums, and blogs, but there may be others; more research is needed.
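Because the IDs are sequential, generating the URL list for one work unit is trivial; a sketch (the range values are examples only, and the output could be saved to a file and handed to wget -i):

```shell
#!/bin/bash
# Emit the profile URLs for one work unit of sequential user IDs.
# Range values are examples only.
START=1000
END=1009
for id in $(seq "$START" "$END"); do
  echo "http://profiles.friendster.com/$id"
done
```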
Profiles
URLs of the form 'http://profiles.friendster.com/<userid>'. Many pictures on these pages are hosted at URLs that look like 'http://photos-p.friendster.com/photos/<lk>/<ji>/nnnnnijkl/<imageid>.jpg', but these folders aren't browsable directly. Profiles will not be easy to scrape with wget.
Photo Albums
A user's photo albums are at urls that look like 'http://www.friendster.com/viewalbums.php?uid=<userid>' with individual albums at 'http://www.friendster.com/viewphotos.php?a=<album id>&uid=<userid>'. It appears that the individual photo pages use javascript to load the images, so they will be very hard to scrape.
On the individual album pages, the photo thumbnails are stored under similar paths as the main images. i.e. if the album thumb is at http://photos-p.friendster.com/photos/<lk>/<ji>/nnnnnijkl/<imageid>m.jpg, just drop the final 'm' to get the main photo (or replace it with a 't' to get an even tinier version).
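That substitution is easy to script; a minimal sketch (the example URL below is fabricated to match the pattern described above):

```shell
#!/bin/bash
# Convert a Friendster album thumbnail URL to the full-size photo URL by
# dropping the trailing 'm' before '.jpg', per the pattern above.
thumb_to_full() {
    echo "$1" | sed 's/m\.jpg$/.jpg/'
}

# Or to the tiny version, by replacing the 'm' with a 't'.
thumb_to_tiny() {
    echo "$1" | sed 's/m\.jpg$/t.jpg/'
}

thumb_to_full "http://photos-p.friendster.com/photos/12/34/43211234/987654321m.jpg"
thumb_to_tiny "http://photos-p.friendster.com/photos/12/34/43211234/987654321m.jpg"
```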
Blogs
Unknown.