BitTorrent DHT

From Archiveteam

The BitTorrent DHT (Kademlia) is a decentralized alternative to trackers for BitTorrent. It can also be used to discover torrents and build an index. While downloading the torrents' contents would be prohibitively expensive (and raise legal issues), the metadata is valuable and only 200-300 GB in size.

The following Bash one-liner downloads the metadata (.torrent files) of all torrents listed in coppersurfer.tk's full scrape, skipping entries that report zero downloads and zero incomplete peers:

mkdir torrents
wget http://coppersurfer.tk/full_scrape_not_a_tracker.tar.gz -O - | tar --to-stdout -xz | xxd -ps -c1 | tr -d "\n" | LC_ALL=C grep --only-matching -P "32303a[0-9a-f]{40}64383a636f6d706c65746569(3[0-9])+6531303a646f776e6c6f6164656469(3[0-9])+6531303a696e636f6d706c65746569(3[0-9])+6565" |grep -v "646f776e6c6f6164656469306531303a696e636f6d706c65746569306565" | cut -c 7-46 | sed 's/^/magnet:?xt=urn:btih:/g' | sed 's/$/\&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969/g' | aria2c -d ./torrents -i - --bt-metadata-only=true --bt-save-metadata=true -j 100
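
The hex pattern in the one-liner is easier to follow once decoded: it is the hex form of a bencoded scrape entry, 20:<info hash>d8:completei<N>e10:downloadedi<N>e10:incompletei<N>ee. The sketch below rebuilds the pattern from those literal strings and runs it on a made-up record. The helper names (to_hex, extract_hashes) and the sample record are assumptions for illustration only; od and grep -E stand in for xxd and grep -P purely for portability.

```shell
# Hex-encode a string, producing the same digits as `xxd -ps` (hypothetical helper).
to_hex() { printf '%s' "$1" | od -An -v -tx1 | tr -d ' \n'; echo; }

# Made-up scrape record: a printable 20-byte stand-in for an info hash,
# followed by invented complete/downloaded/incomplete counters.
record='20:0123456789abcdefghijd8:completei5e10:downloadedi10e10:incompletei2ee'

# Rebuild the grep pattern from the bencode literals instead of raw hex.
# (3[0-9])+ matches one or more hex-encoded ASCII digits.
pattern="$(to_hex '20:')[0-9a-f]{40}$(to_hex 'd8:completei')(3[0-9])+$(to_hex 'e10:downloadedi')(3[0-9])+$(to_hex 'e10:incompletei')(3[0-9])+$(to_hex 'ee')"

# Keep only the 40 hex digits of the info hash (characters 7-46 of each match,
# i.e. everything after the 6 hex digits of "20:").
extract_hashes() { grep -oE "$pattern" | cut -c 7-46; }

to_hex 'd8:completei'                 # prints 64383a636f6d706c65746569
to_hex "$record" | extract_hashes     # prints 303132333435363738396162636465666768696a
```

The second grep in the one-liner, with -v, then drops entries whose hex ends in the encoding of downloadedi0e10:incompletei0ee, that is, torrents with no recorded downloads and no incomplete peers.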

The same approach works for other trackers; for an incomplete list of trackers sorted by number of indexed torrents, see [1]. Note that some trackers do not publish their scrape files, and some publish them in a non-standard format. See also [2] for another list and links to further lists.
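
When adapting the pipeline to another tracker, the tr= parameter appended to each magnet link has to carry that tracker's announce URL in percent-encoded form. A minimal sketch of that step follows. The function names encode_tracker and magnet_for and the sample info hash are hypothetical, and the encoder is deliberately simplified: it only handles the characters %, : and /, which is enough for plain announce URLs such as udp://host:port.

```shell
# Percent-encode a tracker announce URL (simplified: only '%', ':' and '/').
encode_tracker() {
  printf '%s' "$1" | sed -e 's/%/%25/g' -e 's/:/%3A/g' -e 's,/,%2F,g'
}

# Build a magnet link.
# $1: 40-character hex info hash, $2: tracker announce URL
magnet_for() {
  printf 'magnet:?xt=urn:btih:%s&tr=%s\n' "$1" "$(encode_tracker "$2")"
}

# Made-up info hash, combined with the tracker used in the one-liner above.
magnet_for 0123456789abcdef0123456789abcdef01234567 udp://tracker.coppersurfer.tk:6969
```

For the coppersurfer.tk announce URL this yields the same &tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969 suffix that the sed step in the one-liner appends.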

DHT crawling

http://labs.boramalper.org/magnetico/
https://github.com/kevinlynx/dhtcrawler2
https://github.com/FlyersWeb/dhtbay

DHT indexers

These sites maintain large databases that should be archived, as some of the torrent metadata they hold is probably no longer obtainable from the network.
List: https://opentrackers.org/links/publicly-tracked-torrents/#searchengines
Note that most of the Chinese indexes are run by the same person/group/organization.

Vuze DHT

There are two competing BitTorrent DHTs: the one used in Vuze/Azureus (Vuze DHT) and the one used in all other clients (Mainline DHT/Kademlia).

There are no Vuze DHT indexing or archival projects. Indexing it should be easier, as the Vuze DHT shares information more readily and has a pseudo-search engine built into the client. On the other hand, the only implementation is in Java.