Talk:Internet Archive Census

From Archiveteam
Revision as of 11:42, 20 November 2016 by Emijrp (talk | contribs) (2012 census: new section)

Tools

The jq command line for parsing the census JSON was not obvious to me, so here are two examples to get you started. To print the id and total_size of each item on one line, separated by a space:

jq -r '[.id, " ", .total_size | tostring] | add'

To get the MD5 hash and name of each file, you have to iterate over the "files" array and pull the fields from each element:

jq -r '.files | .[] | [.md5, " ", .name | tostring] | add'

--Sep332 10:01, 12 March 2015 (EDT)
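
For anyone without jq installed, here is a rough Python sketch of the same two extractions. It assumes the census dump has one JSON object per line and uses the field names from the jq examples above (id, total_size, files, md5, name); the sample record and the function names are made up for illustration.

```python
import json


def item_rows(lines):
    """Yield "id total_size" for each item record (one JSON object per line)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        item = json.loads(line)
        yield f"{item.get('id')} {item.get('total_size')}"


def file_rows(lines):
    """Yield "md5 name" for every entry in each item's "files" array."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        item = json.loads(line)
        # Items without a "files" array contribute no rows.
        for entry in item.get("files", []):
            # A missing field prints as "None" here, where jq would print "null".
            yield f"{entry.get('md5')} {entry.get('name')}"


# Hypothetical sample record, not real census data.
sample = [
    '{"id": "example_item", "total_size": 1234, '
    '"files": [{"md5": "d41d8cd98f00b204e9800998ecf8427e", "name": "a.txt"}]}'
]

for row in item_rows(sample):
    print(row)
for row in file_rows(sample):
    print(row)
```

To run it over a real dump, pass an open file object instead of the sample list, since iterating over a file yields one line at a time.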

2012 census

In August 2012 I did a "census" using the search engine's export capabilities. The Internet Archive had 4.9 million items at that time. Emijrp (talk) 06:42, 20 November 2016 (EST)