Difference between revisions of "INTERNETARCHIVE.BAK/torrents implementation"

From Archiveteam

Revision as of 04:35, 5 March 2015

Split the IA into 42000 chunks of 500 GB, each a zip file.

Make 42000 torrents.

Make an interface to suggest a torrent, at random (or the one most needing seeds), to a user.
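A minimal sketch of the suggestion step, assuming seed counts are already available from somewhere (for example a tracker scrape). The function name `suggest_torrent` and the `seed_counts` mapping are hypothetical and only illustrate the "most needing seeds, else random" idea; this is not an existing tool.

```python
import random


def suggest_torrent(seed_counts, rng=random):
    """Return the name of a torrent that currently has the fewest seeds.

    seed_counts: dict mapping torrent name -> number of known seeds
                 (hypothetical data, e.g. gathered from a tracker).
    Ties are broken at random, so users are spread across the
    equally needy torrents instead of all getting the same one.
    """
    if not seed_counts:
        raise ValueError("no torrents to suggest")
    fewest = min(seed_counts.values())
    candidates = [name for name, seeds in seed_counts.items() if seeds == fewest]
    return rng.choice(candidates)
```

If every torrent has the same number of seeds, this reduces to a purely random pick, which matches the simpler variant of the proposal.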

Let users add one or more torrents, and seed.

For every 500 GB added or changed in the Internet Archive, make a new zip file and torrent, and wait for some users to add it. (This may need a mechanism to ensure that users who have free space remember to check for new torrents.)

This seems like the simplest possible solution.

comments

Note that some BitTorrent trackers have torrents that sum to a larger total size than this, seeded healthily. Their torrents tend to be smaller than 500 GB though.

The Geocities torrent, at 900 GB, was an exceedingly large torrent, and there was some trouble keeping it seeded.

At 500 GB, this leaves out users who have some smaller fraction of a disk available to donate. This might reduce contributors significantly. A smaller chunk size might be better.
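To make the trade-off concrete, here is a small arithmetic sketch. It assumes the total implied by the proposal (42000 chunks of 500 GB) and a hypothetical helper `torrents_needed`; the alternative chunk sizes are illustrative, not suggested values.

```python
import math

# Total size implied by the proposal: 42000 chunks x 500 GB each.
TOTAL_GB = 42_000 * 500


def torrents_needed(chunk_gb, total_gb=TOTAL_GB):
    """Number of torrents required if every chunk holds at most chunk_gb."""
    return math.ceil(total_gb / chunk_gb)


for chunk_gb in (500, 100, 50):
    print(chunk_gb, "GB chunks ->", torrents_needed(chunk_gb), "torrents")
```

Smaller chunks let users with less free space take part, but the number of torrents to create, track, and keep seeded grows in proportion.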

The user needs to keep their torrent client running, or they won't be counted as a seed. Offline or rarely online storage can be used, but won't be counted. So counting seeds will undercount the number of copies.

Someone needs to seed all these torrents in the first place for users to download. Who? The IA can't double their storage to store all those zip files.

.zip files don't recover well if some piece in the middle is missing. It would be better to use a file format that allows extracting the available files when part of the archive is missing.

a simplification

Every IA item already has a torrent associated with it. The torrent includes the derived files, but that can be amended (each one could have the current torrent plus one that includes only original files.) The simplest possible solution then is to get a few seeders into each of these swarms (IA is used as a web seed). One way to accomplish that is to write a custom BitTorrent client which automates the process of deciding which swarms each user joins, allows the user to decide how much space to use, etc. A custom BitTorrent client wouldn't be a very simple thing on its own, but it could be quite simple for users who just want to donate some space without having to think about BitTorrent.
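The swarm-selection step of such a client could be sketched roughly as follows. This is a simple greedy heuristic under stated assumptions, not a design anyone has committed to: the function `choose_swarms` and the `(identifier, size_bytes, seeders)` tuples are hypothetical, and how the sizes and seeder counts would be obtained is left open.

```python
def choose_swarms(items, budget_bytes):
    """Pick items for one user to seed, within that user's space budget.

    items: iterable of (identifier, size_bytes, seeders) tuples
           (hypothetical input; the data source is not specified here).
    budget_bytes: how much disk space the user is willing to donate.

    Greedy heuristic: visit items with the fewest seeders first
    (smaller items first on ties) and take each one that still fits.
    """
    chosen = []
    remaining = budget_bytes
    for identifier, size, _seeders in sorted(items, key=lambda it: (it[2], it[1])):
        if size <= remaining:
            chosen.append(identifier)
            remaining -= size
    return chosen
```

The user only supplies a space budget; the choice of swarms is made for them, which is the part of the idea that keeps things simple on the user's side.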

This has the additional advantages of storing the backed-up files on disk in a format which is readily usable by the user, and of requiring little to no additional work on IA's part.

> Except there are millions of IA items. Even with a custom controller, that presents problems such as: scalability when loading half a million torrents in a torrent client; tracker scalability; analyzing so many torrents to find ones that need more seeders assigned, etc. --closure