Difference between revisions of "YouTube"

From Archiveteam
(specify ext in youtube-dl command, re: https://github.com/rg3/youtube-dl/issues/5298)
(use merge-output-format instead of asking for mp4s)
Line 25: Line 25:
First, download the video/playlist/channel/user using youtube-dl:


<tt>youtube-dl --title --continue --retries 4 --write-info-json --write-description --write-thumbnail --write-annotations --all-subs --ignore-errors -f bestvideo[ext=mp4]+bestaudio[ext=m4a] URL</tt>
<tt>youtube-dl --title --continue --retries 4 --write-info-json --write-description --write-thumbnail --write-annotations --all-subs --ignore-errors --merge-output-format mkv -f bestvideo+bestaudio URL</tt>


This can be simplified by running the [https://github.com/matthazinski/youtube2internetarchive script by emijrp and others], which also handles upload.


You need a recent (2014) ffmpeg or avconv for the <tt>bestvideo+bestaudio</tt> muxing to work.  On Windows, you also need to run youtube-dl with Python 3.3/3.4 instead of Python 2.7, otherwise non-ASCII filenames will fail to mux.
You need a recent (2014) ffmpeg or avconv for the <tt>bestvideo+bestaudio</tt> muxing to work.  On Windows, you also need to run youtube-dl with Python 3.3/3.4 instead of Python 2.7, otherwise non-ASCII filenames will fail to mux. The <tt>--merge-output-format mkv</tt> flag is used for cases where the highest quality video/audio is webm/m4a.<ref>https://github.com/rg3/youtube-dl/issues/5298</ref>


Then, upload it to https://archive.org/upload/ Make sure to upload not only the video itself (.mp4 files), but also the metadata files created along with it (.info.json, .jpg, .mp4.annotations.xml and .mp4.description).
Line 44: Line 44:


Will be living off Google for a long time if nothing changes.
== References ==
<references />


== See also ==

Revision as of 01:47, 11 April 2015

YouTube
URL http://youtube.com
Status Online!
Archiving status Not saved yet
Archiving type Unknown
IRC channel #archiveteam-bs (on hackint)

YouTube is a video-sharing website owned by Google. It is currently the most popular video hosting website on the planet.

Archiving tools

Several free FLV downloaders and video-to-URL converters exist on the web. Archive Team rescue projects usually use youtube-dl.
YouTube annotations (speech bubbles and notes) are available as XML at:

http://www.youtube.com/api/reviews/y/read2?feat=TCS&video_id=

To transform this XML to SRT, use ann2srt.

Recommended way to archive YouTube videos

First, download the video/playlist/channel/user using youtube-dl:

youtube-dl --title --continue --retries 4 --write-info-json --write-description --write-thumbnail --write-annotations --all-subs --ignore-errors --merge-output-format mkv -f bestvideo+bestaudio URL

This can be simplified by running the youtube2internetarchive script by emijrp and others (https://github.com/matthazinski/youtube2internetarchive), which also handles upload.

You need a recent (2014) ffmpeg or avconv for the bestvideo+bestaudio muxing to work. On Windows, you also need to run youtube-dl with Python 3.3/3.4 instead of Python 2.7, otherwise non-ASCII filenames will fail to mux. The --merge-output-format mkv flag covers cases where the highest-quality video and audio come in different formats, such as webm video with m4a audio.[1]
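Since the muxing step depends on ffmpeg or avconv being installed, it can help to check for one of them before starting a long download. The following is a minimal sketch, not part of youtube-dl itself: find_muxer and archive_url are hypothetical helper names, and the youtube-dl flags are simply the ones from the command above.

```shell
#!/bin/sh
# Sketch: make sure a muxer is available before running the download command.

# Print the name of the first muxer found on PATH (ffmpeg first, then avconv).
# Returns non-zero if neither is installed.
find_muxer() {
    for tool in ffmpeg avconv; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            return 0
        fi
    done
    return 1
}

# Download one video/playlist/channel/user URL with the flags shown above.
# archive_url is a hypothetical wrapper name chosen for this example.
archive_url() {
    url="$1"
    if ! muxer=$(find_muxer); then
        echo "error: neither ffmpeg nor avconv found on PATH" >&2
        return 1
    fi
    echo "muxer found: $muxer" >&2
    youtube-dl --title --continue --retries 4 \
        --write-info-json --write-description --write-thumbnail \
        --write-annotations --all-subs --ignore-errors \
        --merge-output-format mkv -f bestvideo+bestaudio "$url"
}
```

After sourcing the file, archive_url URL runs the download only when a muxer is present. The check says nothing about whether the installed ffmpeg or avconv is recent enough, so the version still has to be verified by hand.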

Then, upload it to https://archive.org/upload/. Make sure to upload not only the video itself (.mp4 files, or .mkv files when --merge-output-format mkv is used), but also the metadata files created along with it (.info.json, .jpg, .mp4.annotations.xml and .mp4.description).
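Before uploading, a quick check that every companion file exists can catch a download that silently skipped its metadata. The sketch below assumes the files share a common base name and differ only by suffix. missing_sidecars is a hypothetical helper, and the suffixes are passed as arguments because the exact names depend on the youtube-dl version and the output format.

```shell
#!/bin/sh
# Sketch: report which expected files are missing for one downloaded video.
# Usage: missing_sidecars BASENAME SUFFIX...
# Prints one "missing: ..." line per absent file; returns 1 if any are absent.
missing_sidecars() {
    base="$1"
    shift
    status=0
    for suffix in "$@"; do
        if [ ! -e "${base}${suffix}" ]; then
            echo "missing: ${base}${suffix}"
            status=1
        fi
    done
    return "$status"
}
```

For example, missing_sidecars "Some title-VIDEOID" .mp4 .info.json .jpg .mp4.annotations.xml .mp4.description checks the suffixes listed above, where "Some title-VIDEOID" stands in for the actual base name on disk.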

Site reconnaissance

Little is known about its database, but according to data from 2006, it held 45TB and was doubling every 4 months. At that rate, it would have reached 660 petabytes by October 2014.

Leo Leung's calculations, based on available information and kept in a frequently updated Google spreadsheet, estimate that YouTube's content reached 500 petabytes in early 2015.

FYI, all of Google Video was about 45TB, and the Archive Team's biggest project, MobileMe, was 200TB. The Internet Archive's total capacity is 50PB as of August 2014. So let's hope YouTube stays healthy, because the Archive Team may have finally met its match.

Vital signs

Will be living off Google for a long time if nothing changes.

References

1. https://github.com/rg3/youtube-dl/issues/5298

See also

External links