Nominations for IABAK Collections to Save
As the project takes off, we've started to "wing it" with regard to what to save. To get ahead of that, this page is a wikified list of potential collections to use for future shards. Please link to the collection and describe why it might be of use.
To get ahead of one likely question, a collection might NOT be added yet for any of these reasons:
- Too many items in it, causing it to spread across dozens of shards
- Too massive (for now), producing a shard so large that it could stay incomplete indefinitely
- The collection is actually a mirror of another collection elsewhere
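The first two criteria above come down to simple arithmetic on a collection's file count and total size. A minimal sketch of that check follows; the per-shard limits and the function names are hypothetical, picked only for illustration, since this page does not state the actual limits used by the project.

```python
import math

# Hypothetical per-shard limits (assumptions, not project policy).
MAX_FILES_PER_SHARD = 100_000
MAX_BYTES_PER_SHARD = 8 * 10**12  # 8 TB


def shards_needed(file_count, total_bytes,
                  max_files=MAX_FILES_PER_SHARD,
                  max_bytes=MAX_BYTES_PER_SHARD):
    """Estimate how many shards a collection would occupy.

    The estimate is the larger of the count-based and size-based
    requirements, and never less than one shard.
    """
    by_files = math.ceil(file_count / max_files)
    by_bytes = math.ceil(total_bytes / max_bytes)
    return max(1, by_files, by_bytes)


def fits_in_one_shard(file_count, total_bytes):
    """True when the collection stays within both hypothetical limits."""
    return shards_needed(file_count, total_bytes) == 1
```

Under these assumed limits, a collection of 650 files totalling 3 TB would fit in a single shard, while one with 327,000 files would need to be split regardless of its size.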
Some Potential Shard Additions
- https://archive.org/details/bibliothequesaintegenevieve - rare incunabula -- 327k files, too many for 1 shard
- https://archive.org/details/archiveteam_ancestry - family history -- 650 files (3 TB)
- https://archive.org/details/archiveteam-fortunecity - not backed up via torrents like the GeoCities grab, not as huge as some of the other AT projects (2.7 TB)
Accepted nominations, in progress
- https://archive.org/details/archiveteam-fire - many great archives of websites -- 313k files, too many for 1 shard. Created SHARD12, which contains all the items from 2011 through 2015, approximately 30% of the total files.
- https://archive.org/details/archivebot - WARCs saved by the ArchiveBot. This is a small number of very large files; looks like it's going to be split into ~30 shards.
Accepted nominations, in shards now
- https://archive.org/details/Bali - entire literature of Bali
- https://archive.org/details/jcbmexicoincunables - rare incunabula
- https://archive.org/details/cdbbsarchive - historical software
- https://archive.org/details/prelingerhomemovies - more Prelinger
- https://archive.org/details/prelinger_library - more Prelinger
- https://archive.org/details/starr - rare old Asian books
- https://archive.org/details/archiveteam-googlegroups - about 1 TB of webpages and files from Google Groups mailing lists
- https://archive.org/details/googlegroups-part2 - related to archiveteam-googlegroups, ~200-300 GB