Talk:Internet Archive Census

From Archiveteam

Latest revision as of 11:42, 20 November 2016

Tools

The jq command line for parsing the census json was not obvious to me, so here are two examples to get you started. To get the id and total_size for each item on the same row, separated by spaces:

jq -r '[.id, " ", .total_size | tostring] | add'
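For anyone without jq at hand, roughly the same rows can be produced with a short Python sketch. It assumes the census dump holds one JSON object per line and that each object carries the id and total_size keys used in the filter above; the function name item_rows is made up for illustration.

```python
import json

def item_rows(lines):
    """Yield "id total_size" for each item, mirroring the jq filter above."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        item = json.loads(line)
        # Unlike jq, which prints "null" for a missing key, this raises KeyError.
        yield f"{item['id']} {item['total_size']}"
```

To use it, pass any iterable of lines, for example an open file handle on the census dump, and print each yielded row.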

To get the hash and name for each file, you have to split up the "files" array and get the info from each element:

jq -r '.files | .[] | [.md5, " ", .name | tostring] | add'
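The per-file listing can be sketched the same way in Python. This again assumes one JSON object per line, with a files array whose elements have md5 and name keys as in the filter above; file_rows is a hypothetical name, not part of any census tooling.

```python
import json

def file_rows(lines):
    """Yield "md5 name" for every file of every item, like the jq filter above."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        item = json.loads(line)
        # Looping over item["files"] plays the role of `.files | .[]` in jq.
        for entry in item["files"]:
            yield f"{entry['md5']} {entry['name']}"
```

The inner loop is the step that "splits up" the array: each item can contribute many output rows, one per file.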

--Sep332 10:01, 12 March 2015 (EDT)

2012 census

In August 2012 I did a "census" using the search engine's export capabilities. Internet Archive had 4.9 million items (https://archive.org/details/InternetArchive4.9MillionItemsMetadata) on that date. Emijrp (talk) 06:42, 20 November 2016 (EST)