Archiving status: Not saved yet
Several free FLV downloaders and video-to-URL converters exist on the web.
AT rescue projects usually use youtube-dl.
YouTube annotations (speech bubbles and notes) are available as XML.
To transform this XML to SRT, use ann2srt.
(Automatic) tubeup.py - Youtube Video IA Archiver
tubeup.py is an automated archival script that uses youtube-dl to download a YouTube video (or a video from any other provider youtube-dl supports) and then uploads it, with all its metadata, to the Internet Archive.
This way, all of the video's metadata, such as title, tags, categories, and description, is preserved in the corresponding Internet Archive item without having to enter it manually.
It also uses a standardized Internet Archive item naming format (youtube-<video ID>), which makes the video easy to find by its YouTube ID and reduces duplication: https://archive.org/details/youtube-DWfYzulsHbU
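Because the identifier format is predictable, an item's URL can be derived from the video ID alone. A minimal sketch (the `ia_item_url` helper is illustrative, not part of tubeup):

```shell
# Illustrative helper (not part of tubeup): build the archive.org item URL
# implied by tubeup's "youtube-<ID>" naming scheme for a given video ID.
ia_item_url() {
  printf 'https://archive.org/details/youtube-%s\n' "$1"
}

ia_item_url "DWfYzulsHbU"
# → https://archive.org/details/youtube-DWfYzulsHbU
```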
Youtube-dl also works with many other video sites.
(Manual) Recommended way to archive Youtube videos
First, download the video/playlist/channel/user using youtube-dl:
youtube-dl --title --continue --retries 4 --write-info-json --write-description --write-thumbnail --write-annotations --all-subs --ignore-errors -f bestvideo+bestaudio URL
This can be simplified by running the script by emijrp and others, which also handles upload.
You need a recent (2014) ffmpeg or avconv for the bestvideo+bestaudio muxing to work. On Windows, you also need to run youtube-dl with Python 3.3/3.4 instead of Python 2.7; otherwise, muxing fails for videos with non-ASCII filenames.
Also, make sure you're using the most recent version of youtube-dl. Older versions didn't work if the highest-quality video+audio combination was webm+m4a; new versions should automagically merge incompatible formats into a .mkv file.
Then, upload it to https://archive.org/upload/. Make sure to upload not only the video itself (.mp4 and/or .mkv files) but also the metadata files created along with it (.info.json, .jpg, .annotations.xml, and .description).
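To avoid forgetting a sidecar file, it helps to list everything youtube-dl wrote for a given video before uploading. A sketch, assuming youtube-dl's default "Title-<id>" naming (the base name below is an example):

```shell
# List the video plus every metadata sidecar youtube-dl wrote next to it,
# so they all get uploaded together (base name is an example).
base="Example Video-DWfYzulsHbU"
for ext in mkv mp4 info.json description annotations.xml jpg; do
  if [ -e "${base}.${ext}" ]; then
    printf '%s\n' "${base}.${ext}"
  fi
done
```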
kyan likes this method:
Youtube sucker (watch out: it leaves some incomplete downloads in the directory afterward; you can clean up with rm -v ./*.mp4 ./*.webm, then ls | grep \.part$, get the video IDs out of that, redownload them, and repeat as needed). You can upload the WARCs only, e.g. using ia (the Python Internet Archive client) or warcdealer (an automated uploader I hacked together). Or, if you want, you can upload the other stuff too, but that's kind of wasteful of storage space. In my opinion, getting stuff without a WARC is a great crime, given the ready availability of tools to create WARCs. Note that this method also works for other websites supported by youtube-dl, although it may need different cleanup commands afterward. Depends on youtube-dl and warcprox running on localhost:8000.
youtube-dl --title --continue --retries 100 --write-info-json --write-description --write-thumbnail --proxy="localhost:8000" --write-annotations --all-subs --no-check-certificate --ignore-errors -k -f bestvideo+bestaudio/best URL
(replace URL with the video/channel/playlist/whatever URL)
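The "get the video IDs out of that" cleanup step can be sketched with sed, assuming youtube-dl's default "Title-<id>.<ext>" file naming (the format-code suffix like .f137 varies):

```shell
# Recover the 11-character YouTube IDs from leftover partial downloads so
# the affected videos can be re-downloaded. Assumes youtube-dl's default
# "Title-<id>" naming; non-matching names pass through unchanged.
ls | grep '\.part$' \
  | sed -E 's/.*-([0-9A-Za-z_-]{11})\..*\.part$/\1/'
```

Feed the resulting IDs back into youtube-dl to retry just those videos.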
vxbinaca likes this method for entire channels:
Put this in ~/.config/youtube-dl.conf
-q --download-archive ~/.ytdlarchive --retries 100 --no-overwrites --call-home --continue --write-info-json --write-description --write-thumbnail --write-annotations --all-subs --sub-format srt --convert-subs srt --write-sub --add-metadata -f bestvideo+bestaudio/best --merge-output-format 'mkv'
- -q severely cuts down on all that output, so you see only things that matter, like errors.
- --download-archive records downloaded IDs so we don't download the same video over and over on subsequent channel rips.
- --no-overwrites keeps existing video/metadata files from being touched, which reduces traffic and lookups.
- Standards-based file formats for both subs and video, with the subs embedded in the video file (instead of some files being webm, some mp4, some mkv; I pick one free format that's more expansive than webm, MKV, and go with that).
Optional flags that can be safely turned off:
- Some of the subtitle stuff
- Call home to aid in development of the script
Little is known about YouTube's database, but according to data from 2006, it was 45 TB and doubling every 4 months. At this rate it would be 660 petabytes by now (October 2014).
According to Leo Leung's calculations, based on available information collected in an often-updated Google spreadsheet, YouTube's content reached an estimated 500 petabytes in early 2015.
FYI: all of Google Video was about 45 TB, and Archive Team's biggest project, MobileMe, was 200 TB. The Internet Archive's total capacity is 50 PB as of August 2014. So let's hope YouTube stays healthy, because Archive Team may have finally met its match.
YouTube will be living off Google for a long time if nothing changes.