User:Djsmiley2k

From Archiveteam

Stuff

  • Need to figure out the full wiki/site layout - currently everything is a giant mishmash
  • Will set fire to anyone who breaks the nice design changes
  • While HTML in pages can make them look "nice", it's ****ing annoying to try to edit if you're not an HTML expert - look into converting to proper MediaWiki markup instead
    • Can we get some templates for projects (what is a project!?) / archive tasks / other crap

Generic Wget command

 export USER_AGENT="Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27"
 export SAVE_HOST=""
 export WARC_NAME=""
 wget \
 -e robots=off --mirror --page-requisites \
 --waitretry 5 --timeout 60 --tries 5 --wait 1 \
 --warc-header "operator: Archive Team" --warc-cdx --warc-file="$WARC_NAME" \
 -U "$USER_AGENT" "$SAVE_HOST"
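Since SAVE_HOST and WARC_NAME are left empty in the template above, a small guard can stop the grab from starting before they are filled in. This is only a sketch: check_grab_vars is a hypothetical helper name, not something wget provides.

```shell
# Sketch: refuse to start a grab while SAVE_HOST or WARC_NAME is still empty.
# check_grab_vars is a hypothetical helper, not part of wget.
check_grab_vars() {
    # $1 = URL to mirror (SAVE_HOST), $2 = WARC file prefix (WARC_NAME)
    if [ -z "$1" ]; then
        echo "SAVE_HOST is empty: set it to the URL to mirror" >&2
        return 1
    fi
    if [ -z "$2" ]; then
        echo "WARC_NAME is empty: set it to the output file prefix" >&2
        return 1
    fi
    return 0
}

# Usage (commented out so nothing is downloaded when this is sourced):
# check_grab_vars "$SAVE_HOST" "$WARC_NAME" && echo "ok to run the wget command"
```

Run the check after the export lines and before wget, so an empty variable fails loudly instead of producing an oddly named WARC.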

Forum Grab

src/wget --save-cookies team17-cookies.txt --post-data 'vb_login_username=USERNAMEGOESHERE&vb_login_password=PASSWORDGOESHERE&securitytoken=guest&cookieuser=1&do=login' "http://forum.team17.com/login.php?do=login"
src/wget --load-cookies team17-cookies.txt -e robots=off --wait 0.25 "http://forum.team17.com/" --mirror --warc-file="at-team17-forum"

Limit Warrior b/w

VBoxManage bandwidthctl archiveteam-warrior-2 --name Limit --add network --limit 3

Must be done while VM is powered off - can't be done with saved state. :(
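To avoid tripping over the powered-off requirement, the VM state can be checked first. A minimal sketch, assuming the machine-readable output of VBoxManage showvminfo contains a line of the form VMState="..."; vm_state and can_set_limit are hypothetical helper names.

```shell
# vm_state reads `VBoxManage showvminfo <vm> --machinereadable` output on stdin
# and prints the value of the VMState key (for example poweroff, saved, running).
# Assumption: the output contains a line like VMState="poweroff".
vm_state() {
    sed -n 's/^VMState="\(.*\)"$/\1/p'
}

# can_set_limit succeeds only for a fully powered-off VM (not a saved state).
can_set_limit() {
    [ "$1" = "poweroff" ]
}

# Usage (commented out; needs VirtualBox and the VM named in the text):
# state=$(VBoxManage showvminfo archiveteam-warrior-2 --machinereadable | vm_state)
# can_set_limit "$state" || echo "VM is '$state'; power it off first" >&2
```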

Remote warrior control

Either forward the warrior's port to your local system over ssh:

ssh -L 8001:localhost:8001 tim.bowers@xxx.xxx.xxx.xxx -f -N 

OR select a project through the HTTP API:

curl -d "project_name=punchfork" http://localhost:8001/api/select-project
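The two commands above share one moving part: the port. The sketch below spells out how the ssh -L argument and the API URL are put together, so a different local port (8002, say) can be used consistently. tunnel_spec and select_project_url are hypothetical helper names, and reaching the API through the tunnel is an assumption based on both commands using port 8001.

```shell
# tunnel_spec prints the argument for ssh -L: local_port:host:remote_port.
tunnel_spec() {
    # $1 = port on your own machine, $2 = port the warrior listens on remotely
    printf '%s:localhost:%s\n' "$1" "$2"
}

# select_project_url prints the endpoint used by the curl call in the text,
# for whichever local port reaches the warrior.
select_project_url() {
    # $1 = local port
    printf 'http://localhost:%s/api/select-project\n' "$1"
}

# Usage (commented out; user@host is a placeholder):
# ssh -f -N -L "$(tunnel_spec 8002 8001)" user@host
# curl -d "project_name=punchfork" "$(select_project_url 8002)"
```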

New Versions

main page


Build your own EC2 ami/instance

Select whichever instance type you want - this was built on Ubuntu 13.04 on the lowest tier (free!)

Log in via ssh (on Ubuntu you log in as ubuntu)

First we need to set up the basic system

sudo apt-get install build-essential lua5.1 liblua5.1-0-dev python python-setuptools python-dev git-core openssl libssl-dev python-pip rsync gcc make git screen

Then we need the seesaw kit, which is used for the grabbing parts

sudo git clone https://github.com/ArchiveTeam/seesaw-kit.git
cd ./seesaw-kit
sudo pip install -r requirements.txt

Now we move onto the project specific stuff, for xanga we'd do:

cd ..
sudo git clone https://github.com/ArchiveTeam/xanga-grab.git
cd ./xanga-grab
./get-wget-lua.sh ### building wget-lua

And finally, we start the pipeline in a screen session

screen ../seesaw-kit/run-pipeline --concurrent 3 pipeline.py YOURNICKNAME
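Before starting the pipeline it can help to confirm that the tools installed in the earlier steps are actually on the PATH. A minimal sketch; missing_tools is a hypothetical helper name, and the tool list in the usage line is only an example drawn from the apt-get line above.

```shell
# missing_tools prints each named command that cannot be found on PATH.
# It prints nothing when everything is present.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Usage: list anything from the setup steps that is still missing.
# missing_tools git screen python pip make gcc rsync
```

An empty result means every listed command was found; any name printed points back at the install step to re-run.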

Important URLs

Is the rsync host up?


EC2 Instance setups

debian-squeeze-i386-warrior (ami-9c69f1f5)

User Text: {"downloader": "Smiley", "selected_project": "posterous", "concurrent_items": "6", "shared:rsync_threads": "4"}

Add second disk - 10 GB

Open port 22 0.0.0.0/0

Setup SSH forwarding: ssh -i ./.ssh/amazonkey.pem -N -f -L 8002:localhost:8001 ubuntu@***********.compute-1.amazonaws.com

Set automatic shutdown : echo "0 20 * * * root /sbin/shutdown -h now" | sudo tee /etc/cron.d/shutdown
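The cron line above uses the /etc/cron.d layout: minute, hour, day of month, month, day of week, user, command. A small sketch for generating the same entry with a different time; shutdown_cron_line is a hypothetical helper name. The hour is interpreted in the instance's own time zone, which is probably UTC on many cloud images, but that is an assumption worth checking with date.

```shell
# shutdown_cron_line prints a system crontab (/etc/cron.d) entry with the fields
# minute hour day-of-month month day-of-week user command.
shutdown_cron_line() {
    # $1 = hour (0-23), $2 = minute (0-59, defaults to 0)
    printf '%s %s * * * root /sbin/shutdown -h now\n' "${2:-0}" "$1"
}

# Usage (commented out; writes a system file):
# shutdown_cron_line 20 | sudo tee /etc/cron.d/shutdown
```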

Digital Ocean

sign up for DO -> use SSDTWEET code -> make a $10 payment -> unleash 500 instances upon the world

apt-get update && \
apt-get -y install git make python-pip libgnutls-dev liblua5.1-dev && \
pip install seesaw && \
git clone https://github.com/ArchiveTeam/yahoomessages-grab.git && \
cd yahoomessages-grab/ && \
./get-wget-lua.sh && \
run-pipeline pipeline.py --disable-web-server Smiley