User:Vitzli
Saved stuff
- JBG Travels YouTube channel, partial download, 847 videos total: part 1, part 2, part 3.
Several videos were either marked private or removed at the request of his employer, although they contained only road video.
- Encyclopedia Astronautica snapshot (2015-10-22); according to Alive... OR ARE THEY, it is on the watchlist.
- Pole shift survival library: not updated since 2013, was quite popular among survival/prepping folks. Not endangered, as the website is still online, but the torrent is decaying.
- Amazon reviews webdata 1995-2013 — still available, but links were hidden.
- CGP Grey YouTube channel, tar archive per year: 2010, 2011, 2012, 2013, 2014, 2015
- SmarterEveryDay YouTube channel, tar archive per year: 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015
Prospecting IA.BAK collections
Tools required: the Python 3 libraries/modules internetarchive and ia-mine; jq for JSON processing; parallel for running a command once per input, for-each style.
An archive.org account (S3 keys) is required for the ia-mine and internetarchive (ia) tools.
2016-02-03 census
- 10 shards
- 79 collections
- 142462 items total, 106054 unique items (my mistake: deduplicate the item list, e.g. with sort | uniq, before running a large batch)
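The gap between the total and unique counts suggests that some identifiers were listed more than once. A minimal sketch of the deduplication step, in plain Python for readers who prefer it to sort | uniq; the function name and the sample identifiers are made up for illustration and are not part of the original workflow:

```python
def unique_identifiers(identifiers):
    """Return identifiers with duplicates removed, keeping first-seen order."""
    seen = set()
    unique = []
    for identifier in identifiers:
        identifier = identifier.strip()  # tolerate trailing newlines from a file
        if identifier and identifier not in seen:
            seen.add(identifier)
            unique.append(identifier)
    return unique


# Made-up identifiers standing in for item lists gathered from several collections.
sample = ["item-a", "item-b", "item-a", "item-c", "item-b"]
deduplicated = unique_identifiers(sample)
print(len(sample), "total,", len(deduplicated), "unique")
```

Unlike sort | uniq, this keeps the original order, which may or may not matter for a batch run.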
jq code
Remove 'collection' items:
parallel --jobs 4 'jq '"'"'. | select(.mediatype != "collection") | .identifier'"'"' '"$F_PREFIX"'/{}.col.json | tr -d '"'"'"'"'"' > '"$F_PREFIX"'/{}.items.json'
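To make the intent of the jq filter above easier to read, here is a rough Python equivalent. It is a sketch only, and it assumes the .col.json files hold one JSON object per line with top-level mediatype and identifier fields, as the filter implies; the function name and sample records are hypothetical:

```python
import json


def item_identifiers(lines):
    """Yield the identifier of every record whose mediatype is not 'collection'."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("mediatype") != "collection":
            yield record.get("identifier")


# Made-up records for illustration only.
sample_lines = [
    '{"identifier": "some-collection", "mediatype": "collection"}',
    '{"identifier": "some-video", "mediatype": "movies"}',
    '{"identifier": "some-text", "mediatype": "texts"}',
]
for identifier in item_identifiers(sample_lines):
    print(identifier)
```

Because the identifiers come out as plain Python strings, there are no surrounding quotes to strip, which is what the tr -d step does in the shell version.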
Remove 'uploader' field:
parallel --jobs 4 'jq -c '"'"'del(.metadata.uploader)'"'"' '"$F_PREFIX"'/{}.mined.json > '"SHARDS-20160203-cleaned/$F_PREFIX"'/{}.cleaned.json'
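The same cleanup can be sketched in Python, again assuming one JSON object per line in the .mined.json files with the uploader stored under metadata, as the jq path suggests. The function name and sample record are hypothetical:

```python
import json


def strip_uploader(line):
    """Return the record as compact JSON with metadata.uploader removed."""
    record = json.loads(line)
    metadata = record.get("metadata")
    if isinstance(metadata, dict):
        metadata.pop("uploader", None)  # no error if the field is absent
    # Compact separators give one-record-per-line output, similar to jq -c.
    return json.dumps(record, separators=(",", ":"), ensure_ascii=False)


# Made-up record for illustration only.
sample_record = '{"metadata": {"identifier": "some-item", "uploader": "someone@example.com"}}'
print(strip_uploader(sample_record))
```

Records without a metadata object pass through unchanged apart from whitespace.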