Difference between revisions of "YouTube"

From Archiveteam

Revision as of 11:16, 27 April 2019

YouTube
URL: https://www.youtube.com
Status: Online! but possibly Endangered, see Vital signs
Archiving status: Not saved yet
Archiving type: Unknown
IRC channel: #youtubearchive (on hackint)

YouTube Annotations
URL: https://www.youtube.com
Status: Offline
Archiving status: Partially saved
Archiving type: Unknown
IRC channel: #archiveteam-bs (on hackint)
Project lead: /u/omarroth

YouTube is a video sharing website owned by Google. It is currently the most popular video hosting website on Earth.

Archiving tools

Several free FLV downloaders and video-to-URL converters exist on the web. AT rescue projects usually use youtube-dl.
YouTube annotations (speech bubbles and notes) were available as XML until their removal in January 2019 (see Annotations removal), by appending the video ID to:

http://www.youtube.com/api/reviews/y/read2?feat=TCS&video_id=

To transform this XML to SRT, use ann2srt.

(Automatic) tubeup.py - Youtube Video IA Archiver

Note: When uploading to the Internet Archive, please avoid exposing the site to legal risk: adhere to its terms of service and do not upload blatantly copyrighted content. Unfortunately, the Archive is subject to the same kind of DMCA takedown threats as YouTube, so do use discretion.
Note: Be very careful when dumping channels with more than 100 videos using this script. Let an admin know what you're doing, dump 50 videos, and have a collection created. Work is being started on adding a flag to specify a collection name instead of "Community Video", which is the default. Always try to create an item. For the time being, the script has to be edited by hand to specify a different collection.

tubeup.py is an automated archival script that uses youtube-dl to download a YouTube video (or a video from any other provider supported by youtube-dl) and then uploads it with all metadata to the Internet Archive.

This way, all metadata from the video, such as title, tags, categories, and description, is preserved in the corresponding Internet Archive item without having to be entered manually.

It also uses a standardized Internet Archive item name format (youtube- followed by the video ID) that makes it easy to find the video using the YouTube ID and reduces duplication: https://archive.org/details/youtube-v9sGhNoSG3o

youtube-dl also works with many other video sites.
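
The item naming scheme described above can be sketched in a few lines of Python. This is an illustration only, not tubeup.py's actual code: the function names are hypothetical, and the 11-character ID check is an assumption about what YouTube video IDs usually look like.

```python
import re

# Assumption: a YouTube video ID is 11 characters from A-Z, a-z, 0-9, "_" and "-".
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def ia_identifier(video_id: str) -> str:
    """Return the Internet Archive item name for a YouTube video ID."""
    if not VIDEO_ID_RE.fullmatch(video_id):
        raise ValueError(f"does not look like a YouTube video ID: {video_id!r}")
    return f"youtube-{video_id}"


def ia_details_url(video_id: str) -> str:
    """Return the archive.org details URL where the item would be found."""
    return f"https://archive.org/details/{ia_identifier(video_id)}"
```

Checking whether a video has probably been archived already then comes down to requesting the URL returned by ia_details_url for its ID.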

(Manual) Recommended way to archive YouTube videos

First, download the video/playlist/channel/user using youtube-dl:

youtube-dl --continue --retries 4 --write-info-json --write-description --write-thumbnail --write-annotations --all-subs --ignore-errors -f bestvideo+bestaudio URL

This can be simplified by running the script by emijrp and others, which also handles upload.

You need a recent (2014) ffmpeg or avconv for the bestvideo+bestaudio muxing to work. On Windows, you also need to run youtube-dl with Python 3.3/3.4 instead of Python 2.7, otherwise non-ASCII filenames will fail to mux.

Also, make sure you're using the most recent version of youtube-dl. Previous versions didn't work if the highest quality video+audio was webm+m4a. New versions should automagically merge incompatible formats into a .mkv file.[1]

Then, upload it to https://archive.org/upload/. Make sure to upload not only the video itself (.mp4 and/or .mkv files), but also the metadata files created along with it (.info.json, .jpg, .annotations.xml and .description).

kyan likes this method:

Youtube sucker. Look out: it leaves some incomplete files in the directory afterward. You can clean up with rm -v ./*.mp4 ./*.webm, then run ls | grep \.part$, get the video IDs out of that, redownload them, and repeat as needed. You can upload only the WARCs, e.g. using ia (the Python Internet Archive client) or warcdealer (an automated uploader I hacked together). If you want, you can upload the other stuff too, but that's kind of wasteful of storage space. In my opinion, getting stuff without a WARC is a great crime, given the ready availability of tools to create WARCs. Note that this method also works for other websites supported by youtube-dl, although it may need different cleanup commands afterward. It depends on youtube-dl and on warcprox running on localhost:8000.

youtube-dl --continue --retries 100 --write-info-json --write-description --write-thumbnail --proxy="localhost:8000" --write-annotations --all-subs --no-check-certificate --ignore-errors -k -f bestvideo+bestaudio/best (stick the video/channel/playlist/whatever URL in here)
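
The cleanup step described above (finding leftover .part files and recovering their video IDs for another attempt) could be scripted roughly as follows. This is a sketch under an assumption: the files were saved with youtube-dl's default output template, so the 11-character video ID sits right before the extension, as in Title-ID.f137.mp4.part. The function name is hypothetical.

```python
import re
from pathlib import Path

# Assumption: default youtube-dl naming, "<title>-<id>.<ext>.part",
# optionally with a format tag such as ".f137" between the ID and the extension.
PART_RE = re.compile(r"-([A-Za-z0-9_-]{11})\.(?:f\d+\.)?[A-Za-z0-9]+\.part$")


def incomplete_video_ids(filenames):
    """Return the video IDs of leftover .part files, without duplicates."""
    ids = []
    for name in filenames:
        match = PART_RE.search(name)
        if match and match.group(1) not in ids:
            ids.append(match.group(1))
    return ids


if __name__ == "__main__":
    # Print one watch URL per incomplete download in the current directory.
    names = [p.name for p in Path(".").iterdir() if p.is_file()]
    for video_id in incomplete_video_ids(names):
        print(f"https://www.youtube.com/watch?v={video_id}")
```

The printed URLs can be fed back into the youtube-dl command above. With a custom output template the pattern would need to be adjusted.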

Annotations removal

Annotations[2] were notes that could be added to videos after upload. They provided plenty of customization options, such as color, size, and internal and external links. People used them to correct mistakes in videos and to create mini-games within YouTube, and abused them too: "Please like and subscribe", "Watch in HD!!!"

On May 2nd, 2017, YouTube disabled editing and creating annotations for videos. On January 16th, 2019, they were removed completely from the site and from API responses (at ~15:00 UTC, the same time of day at which editing had been disabled).

An archiving project, "YouTube Annotation Archive", organized by u/omarroth, managed to discover ~1.4 billion videos and download their annotations, if they had any. The project did not completely finish due to the deadline. The annotation archive will be released at a later point (Update).

16GB of just video IDs that were encompassed by the project can be downloaded here.

Site reconnaissance

Little is known about its database, but according to data from 2006, it was 45TB in size and doubling every 4 months. At this rate it would have reached 660 petabytes by October 2014.
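
For readers who want to redo this kind of extrapolation, the underlying formula is plain exponential growth: size = initial size × 2^(elapsed months / doubling period). The sketch below is a generic helper with hypothetical names. The text does not say which month of 2006 the 45TB figure refers to, so the elapsed time is an input you have to assume, and the result is extremely sensitive to it and to the doubling period. The helper is not meant to reproduce the 660 petabyte figure.

```python
def projected_size_tb(initial_tb: float, doubling_months: float,
                      elapsed_months: float) -> float:
    """Project a size that doubles every `doubling_months` months."""
    return initial_tb * 2 ** (elapsed_months / doubling_months)


def tb_to_pb(size_tb: float) -> float:
    """Convert terabytes to petabytes (decimal units, 1 PB = 1000 TB)."""
    return size_tb / 1000
```

For instance, projected_size_tb(45, 4, 12) describes one year of growth from 45TB: three doublings, so eight times the initial size.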

An often-updated Google spreadsheet with Leo Leung's calculations, based on available information, estimates that YouTube's content reached 500 petabytes in size in early 2015.

FYI, all of Google Video was about 45TB, and the Archive Team's previously biggest project, MobileMe, was 200TB. The Internet Archive's total capacity is 50PB as of August 2014. So let's hope YouTube stays healthy, because the Archive Team may have finally met its match.

Vital signs

Will be living off Google for a long time if nothing changes.

Advertising policies

In early 2017, numerous content creators expressed concerns about changes to YouTube's advertising policies, and many noticed sharp drops in ad revenue as a result, with some creators like Casey Neistat and h3h3Productions expressing existential fears. While not necessarily a cause for imminent alarm, the situation should be watched closely in case a creator exodus sets off a positive feedback loop.

Annotations

On November 27th, 2018, YouTube updated its help page to state that all annotations would be removed from videos hosted on the platform on 15 January 2019. Annotations had been disabled for new videos and replaced with "cards" in early May 2017, but old annotations had remained visible until then.

Channel Comments

In addition, YouTube decided that its new “Community Post” feature cannot co-exist with the previous “Discussion” feature, earlier known as “Channel Comments”. Therefore, once a channel reaches a certain subscriber threshold (formerly 10000, last known threshold: 1500), all of its channel comments are permanently erased instead of being merged or allowed to co-exist.

Video Statistics

Without clear prior warning, YouTube has also entirely removed the public “video statistics” feature. The only warning sign was that the new “Polymer” website layout did not include the feature. It was available only in the “One” layout, which was introduced in 2013, revised in late 2014, then barely changed for years, and had co-existed with “Polymer” since late 2017.
When YouTube released the “One” website layout, the number of parameters displayed by the video statistics was also reduced.[3]

Parameters (legacy)[4]

  • Total view count + graph.
  • Total comments count + graph.
  • Total favourited count + graph (removed YouTube feature).
  • Total rating counts (no graph)
  • Like and dislike count (no graph)
  • Referral sources with date of initial referral (exact dates available as of 20071130).
    • View counts from each referral source.
      • First featured video view
      • First referral from related videos (separately mentioned)
      • First view from embedded video
      • First view from embedded video (specified website)
      • First referral from YouTube search (for separate search terms)
  • Most popular audiences (age and gender)
  • World map that highlights countries in which the video is more popular.
    • Countries with higher popularity are highlighted in darker green.
  • Honour badges
    • Total honour counts
    • List of honours
      • Rank: Most viewed of all time
      • This list is still incomplete.

Parameters (recent)

  • Views
  • Total Watch time
    • Average watch time per user.
  • Subscribers gained from said video
  • Share counter: How often the video was shared.

Each of these parameters could be viewed either as a total cumulative count or as counts per day, the latter being the default setting.


Custom channel layouts

Due to the lack of customizability of the “One Channel Layout”, as of March 2013, all custom creative channel designs have been deleted.

Comment loading

Since YouTube's “One Channel Layout” redesign, comments “extraload” using AJAX: they are not loaded as part of the page itself, but only start loading once the user scrolls down towards the comments. This made the comment section inaccessible to the Wayback Machine. There was, however, a dedicated page at youtube.com/all_comments?v=<video ID>, but it was discontinued and has redirected to the main /watch?v= page since January 2016. archive.today was nevertheless still able to archive YouTube comments until late 2017. As of April 2019, archiving YouTube comments using http://archive.today/ is still possible by linking directly to a comment using YouTube's lc URL parameter.
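
A direct comment link of the kind mentioned above can be put together as follows. This is a minimal sketch: the function name is hypothetical, the comment ID in the usage note is a placeholder rather than a real ID, and combining lc with the /watch?v= URL in this way is an assumption based on the parameters named in the text.

```python
from urllib.parse import urlencode


def comment_permalink(video_id: str, comment_id: str) -> str:
    """Build a watch URL that points at one specific comment via `lc`."""
    query = urlencode({"v": video_id, "lc": comment_id})
    return f"https://www.youtube.com/watch?{query}"
```

Calling comment_permalink("v9sGhNoSG3o", "EXAMPLE_COMMENT_ID") yields a URL that can then be submitted to archive.today for capture.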


Chromebot, operated via #ArchiveBot (on hackint) IRC, can still be used to archive YouTube comments thanks to its bottomless page scrolling capabilities.

Comments on YouTube can be sorted by Top Comments or by Newest Comments. The uploader of a video can specify which way of sorting is used for the video by default, but it can be adjusted manually by the user. The default preset is Top Comments. However, no known URL parameter is able to select the sorting methods for the comments so far. Therefore, crawlers that can access the comments can only crawl them in the preselected way of sorting.

Trivia:

  • The Top Comments are not necessarily the comments with the highest number of upvotes, probably because YouTube does not want the same comments to always stay on top for too long. Older comments get pushed down despite having a high rating.
  • At some point, YouTube started hiding negative comment ratings and now only shows how often a comment has been rated positively. Rating a comment negatively is still possible, however. Its effect is to push the comment further down in the Top Comments.

YouTu.be

YouTu.be used to be an image hoster back in 2006. In late 2015, they created a robots.txt file that disallows all crawlers (“user-agent: * disallow:/”).
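
What such a rule means for a well-behaved crawler can be checked with Python's standard urllib.robotparser module. The sketch below parses the robots.txt text directly instead of fetching it; the helper name and the "ExampleCrawler" user agent are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The blanket rule quoted above, written on two lines.
BLOCK_ALL = "user-agent: *\ndisallow: /\n"


def is_allowed(robots_txt: str, url: str, user_agent: str = "ExampleCrawler") -> bool:
    """Return True if `robots_txt` permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With BLOCK_ALL, is_allowed returns False for every URL on the host, which is why crawlers that honor robots.txt skip the site entirely.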

Removed or blocked channels


Trivia

  • All ytimg servers (where YouTube stored images, stylesheets and the .swf file of the Flash-based YouTube video player) used to be blocked by robots.txt (“user-agent:* disallow:/ noindex:/”). When browsing YouTube layouts from circa 2008 onwards in the Wayback Machine, the website could only be viewed as black text on a white background, varying only in size (e.g. the <h1> video title), because the images and stylesheets were missing. Only information present in the HTML source code of the page itself could be rendered. On February 29th, 2012, the robots.txt file vanished from the ytimg servers, lifting the restrictions and making YouTube more properly browsable through the Wayback Machine.

References

See also

External links
