MS Paint Fan Adventures

From Archiveteam

Stub: working notes for a project in progress.

Archiving the contents of https://mspfa.com/. The site uses a JS app to display text-based stories with embedded images and Flash content. Stories often link to YouTube alternates for the Flash files.

Custom archiver code is at https://github.com/riking/mspfa-archiver . Giant tangled mess of Go scripts.

Contact @riking to get upload permissions for the Archive collection

Operating the custom archiver

Prep

  1. Clone into $GOPATH - there will probably be updates to the script
  2. Get wpull: pip3 install wpull==1.2.3; pip3 install -r https://raw.githubusercontent.com/chfoo/wpull/v1.2.3/requirements.txt
  3. Symlink it into the current directory: ln -s $(which wpull)
  4. Get youtube-dl; run youtube-dl -U to update it as needed
  5. If needed, symlink ./target to a bigger drive or specify -o=folder every time you run the archiver
  6. Build the code: go build -v . (TODO: patches to datatogether/warc)
  7. Put IAS3 credentials into ./ias3.json - {"AccessKeyID": "...", "SecretAccessKey": "..."}
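
As a convenience for step 7, here is a minimal sketch of a helper that writes the credentials file in the shape shown above. The function name write_ias3_json is hypothetical and not part of mspfa-archiver. It assumes the keys contain no quotes or backslashes, since it does no JSON escaping.

```shell
# Hypothetical helper (not part of mspfa-archiver):
#   write_ias3_json ACCESS_KEY SECRET_KEY [PATH]
# Writes the JSON object the archiver expects in ./ias3.json.
# Assumption: keys contain no quotes or backslashes (no escaping is done).
write_ias3_json() {
    access_key=$1
    secret_key=$2
    out=${3:-./ias3.json}
    # umask 077 in a subshell: a newly created file is readable by the owner only.
    (
        umask 077
        printf '{"AccessKeyID": "%s", "SecretAccessKey": "%s"}\n' \
            "$access_key" "$secret_key" > "$out"
    )
}

# Example with placeholder values (not real credentials):
# write_ias3_json "EXAMPLEKEY" "EXAMPLESECRET"
```

If ./ias3.json already exists, its permissions are left as they were; only a freshly created file gets the restricted mode.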

Running

While testing, make sure to include both -test and -ident MSPFA_Test_12345 so that the uploaded Archive items go into test_collection.

When testing changes to the script, use -devScript to load the script from the local folder.

Basic usage: ./mspfa-archiver -dl -ident auto -s 1234

TODO - Script to automatically run on each story ID and save a list of failures
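
One possible shape for the TODO above, as a sketch only: a loop that runs the archiver once per story ID with the basic-usage flags and writes the failing IDs to a file. The function name archive_range, the ARCHIVER variable, and the failures.txt default are hypothetical. It also assumes the archiver exits non-zero when a story fails, which has not been verified here.

```shell
# Hypothetical batch runner (not part of mspfa-archiver):
#   archive_range FIRST LAST [FAIL_LOG]
# ARCHIVER can be overridden, e.g. to point at a stub for a dry run.
ARCHIVER=${ARCHIVER:-./mspfa-archiver}

archive_range() {
    first=$1
    last=$2
    fail_log=${3:-failures.txt}
    : > "$fail_log"              # start with an empty failure list
    id=$first
    while [ "$id" -le "$last" ]; do
        # Flags are the "basic usage" ones from this page.
        # Assumption: a non-zero exit status means the story failed.
        if ! "$ARCHIVER" -dl -ident auto -s "$id"; then
            echo "$id" >> "$fail_log"
        fi
        id=$((id + 1))
    done
}

# Example: archive_range 1 999 failures-1-999.txt
```

The failure list holds one story ID per line, so it can be fed back into a later retry pass.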

Recovering from Errors

If a download step fails (e.g. a broken URL passed to the Photobucket step), run the archiver again with the -fu ("F"orce "U"pload) flag.

If you encounter a dead domain and don't want to wait for wpull to time out on it, include -wpullArgs '--exclude-domains majhost.com,g0m.yore.ma', listing whichever domains are dead.
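
When several dead domains have piled up, the comma-separated value can be assembled rather than typed by hand. The sketch below uses a hypothetical helper, exclude_domains_arg, which is not part of mspfa-archiver; it only builds the string passed to -wpullArgs.

```shell
# Hypothetical helper (not part of mspfa-archiver):
#   exclude_domains_arg DOMAIN...
# Prints one --exclude-domains option with the domains joined by commas.
exclude_domains_arg() {
    [ "$#" -gt 0 ] || return 1   # nothing to exclude
    joined=""
    for d in "$@"; do
        # Add a comma only between entries, not before the first one.
        joined="${joined:+$joined,}$d"
    done
    printf '%s\n' "--exclude-domains $joined"
}

# Example:
# ./mspfa-archiver -dl -ident auto -s 1234 \
#     -wpullArgs "$(exclude_domains_arg majhost.com g0m.yore.ma)"
```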

If the download works fine but the IA upload fails, run the script again with -ident but without the -dl flag. That avoids modifying the WARC and uploads only what has already been downloaded.
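
The two retry cases above can be summarised as a small lookup, sketched below. The function name retry_args and the labels "download" and "upload" are hypothetical. The exact flag combinations are assumptions: the page does not say whether -fu is added to the basic-usage flags, nor whether -s and -ident auto are kept for an upload-only run.

```shell
# Hypothetical helper (not part of mspfa-archiver):
#   retry_args FAILURE STORY_ID
# FAILURE is "download" or "upload" (labels invented for this sketch).
# Prints the flags to use for the retry.
retry_args() {
    case $1 in
        download)
            # Assumption: -fu is added to the usual basic-usage flags.
            echo "-dl -fu -ident auto -s $2"
            ;;
        upload)
            # No -dl, so the existing WARC is uploaded without being modified.
            # Assumption: -s is still needed to select the story.
            echo "-ident auto -s $2"
            ;;
        *)
            echo "unknown failure type: $1" >&2
            return 1
            ;;
    esac
}

# Example (left unquoted on purpose so the flags are split into words):
# ./mspfa-archiver $(retry_args upload 17)
```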

Work Division

1-999

Archiver: riking

Stage: Initial archiving

Problematic story IDs:

EDIT: using this Google Sheet instead for now: https://docs.google.com/spreadsheets/d/1DQ52iRQ3w9M6hARTFhgzxTplqPjMfIMiZNxfBySa6Vw/edit#gid=0

(note: this table should not be taken as an example, it's translated from my bad notes)

Story ID | Archived OK? | Problems
1 | Mostly | Single failed Photobucket URL.
4 | Yes | Photobucket
12 | Yes | Photobucket
14 | Yes | 404s
17 | No | Dead domain: forum-files2.fobby.net
19 | Mostly | broken URL: pasted twice in a row
21 | No, Contact Author | "410 Gone" from thefelt.webs.com; Dead domain: windowchronicles.com
22 | Mostly | broken URL: pasted twice in a row
24 | Yes | Photobucket
25 | Yes | 404s: imageshack, 22/22 rescued from Wayback
26 | Mostly | 404s: imageshack, -3(41)/44 rescued from Wayback
33 | No | Dead domain: yore.ma
35 | Mostly | broken URL: single photobucket image ends in "89.pngp"
45 | No | Dead webhost customer: http://armada.lostsignalweb.com; Dead domain: imageplay.net; 404s: imageshack
46 | No | Dead domain: myfrogbag
48 | Mostly | 404s: imageshack, -10(105)/115 rescued from Wayback
53 | No | Dead domain: ardekantur.com
55 | No | 404s: imageshack
59 | No | Photobucket
61 | No | 404s
63 | No | 404s
66 | No | Dead domain: myfrogbag
67 | No | 404s: photobucket
76 | No | 404s: photobucket
77 | No | 404s: imageshack
78 | No | 404s: imageshack
81 | No | Dead domain: TBD
87 | No | 404s; Pulling HTML pages
89 | Yes | 404s: imageshack
93 | No | Dead domain: TBD
104 | No | Dead domain: suspended webhost
106 | No | Dead domain: TBD
108 | No | 404s: imagebin
109 | No | 404s; Dead domain
111 | No | 404s: imageshack
135 | No | 404s: photobucket
158 | No | 404s
160 | Yes | 404s: imageshack
227 | No | uses blob: urls??
241 | No | Dropbox public folder
263 | No | HTML: imageshack homepage
270 | No | Dropbox public folder
277 | No | Dead domain:
285 | No | HTML
307 | No | 404s
308 | No | 404s
314 | No | Dead domain: blacktourney.com
319 | No | Dropbox public folder
323 | No | i.minus.com
325 | No | Dead domain: majhost.com
331 | No | Dropbox public folder
339 | Mostly | A Single Photobucket BWE
341 | No | Dropbox public folder
350 | No | Dead domain: majhost.com
351 | Some | Imageshack 404s
352 | Mostly | SWFs hosted at files.myfrogbag.com
353 | Mostly | SWFs hosted at files.myfrogbag.com
357 | No | Deleted imgur files
367 | Yes | Single photobucket BWE, retrieved from Wayback
728 | No | Dropbox public folder

1000-1999

2000-2999

3000-3999