UC Berkeley Course Captures
UC Berkeley Course Captures | |
Status | Closing |
Archiving status | In progress... |
Archiving type | Unknown |
IRC channel | #berklost (on hackint) |
The University of California, Berkeley is planning to remove their public lecture recordings ("course captures", audio and video) and put them behind authentication. The planned date for the change is 2017-03-15.
The removal will affect at least these public channels:
- https://www.youtube.com/user/UCBerkeley
- https://itunes.apple.com/institution/uc-berkeley/id354813951
- http://webcast.berkeley.edu/series (index of links to YouTube and iTunes)
The #Shutdown notice makes it sound as if YouTube videos will remain online at youtube.com, but will no longer be publicly listed. The new hosting behind authentication will lose playlist information (which links individual lecture videos together for one course). Therefore the pressing thing to do before 2017-03-15 (as regards the YouTube content) is to download indexes of videos and playlists—see #Indexes of files.
On the other hand, "iTunesU Course Capture content will be removed." It's not clear if iTunes content will continue to exist, even behind authentication.
Ideas
Proposed YouTube archiving format:
- Sample: https://archive.org/details/TEST_UCB_ART_8_Fall2013
- One item per YouTube playlist
- Identifier includes the course number and semester (there's a list of course subject abbreviations at http://guide.berkeley.edu/courses/)
- Should perhaps also have "YouTube" somewhere in the identifier, because the iTunes content can be different for the same course.
- Upload YouTube API playlist information as playlist.json
- Videos in the preview are YouTube's highest-quality muxed format (format 22?)
- Video file naming convention is %(playlist_index)s-%(title)s.%(ext)s (in youtube-dl's output template format)
- All other formats stored in tar files, one file per format (maybe overkill, as these are derived anyway?)
- Include stderr output of youtube-dl, in order to have a record of videos that weren't accessible (e.g., "ERROR: Zrzh3Fz8DhQ: YouTube said: This video contains content from BBC Worldwide, who has blocked it on copyright grounds.")
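The naming convention in the list above is written in youtube-dl's output template format, which uses Python's %-style formatting with named fields. The following is a minimal sketch of how such a template expands; the field values are made up for illustration, and youtube-dl itself may zero-pad playlist_index and sanitize characters in the title, which this sketch does not attempt.

```python
# Expand a youtube-dl style output template using Python %-formatting.
TEMPLATE = "%(playlist_index)s-%(title)s.%(ext)s"


def expand(template, info):
    """Return the file name the template produces for one video's fields."""
    return template % info


# Hypothetical field values, for illustration only.
example = {"playlist_index": 3, "title": "Lecture 3", "ext": "mp4"}
print(expand(TEMPLATE, example))
```

Putting the playlist index first means that files sort in lecture order within an item.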
There's an existing https://archive.org/details/ucberkeleylectures collection to which the newly archived files could perhaps be added.
Proposed iTunes archiving format: none yet.
Archiving efforts
- October 17 2016 (YouTube): https://np.reddit.com/r/DataHoarder/comments/5804np/youtube_archiver_and_uc_berkeley/
"And lastly I finished downloading all of the UC Berkeley. Videos, any transcriptions/captions and all other video info. I made a torrent as they are the most efficient at sharing. All 3.1TB of it, it's not hosted on the fastest server, but with a few seeds it should go quick enough. If you want to keep this great learning resource alive, feel free to seed or partial seed, I will seed it for as long as I can. [4] For video listings please look at this list [5]."
- March 2 2017 (YouTube): https://www.reddit.com/r/YouTubeBackups/comments/5x4kv8/ucberkeley_to_remove_10k_hours_of_lectures_posted/
"Currently pulling down to a few locations in parallel at 720p."
- March 5 2017 (YouTube): https://www.reddit.com/r/DataHoarder/comments/5x3o51/ucberkeley_to_remove_10k_hours_of_lectures_posted/dejmb1c/
Already started uploading to the Internet Archive, about 75 courses so far. Uploads are in zip format (the videos don't play in the in-browser player).
"I'm mirroring it to archive.org, 1.2TB in on Sun Mar 5 18:04:31 GMT 2017"
- March 2017 (YouTube/webcast.berkeley.edu):
According to #berklost IRC, "Waybackmachine is already grabbing these." Additionally, webcast.berkeley.edu has been crawled by archivebot: http://archive.fart.website/archivebot/viewer/domain/webcast.berkeley.edu
- March 9 2017 (iTunes): https://www.reddit.com/r/DataHoarder/comments/5yflnr/half_the_berkeley_webcasts_being_removed_on_march/
"VERY IMPORTANT PSA, many of the lectures are only available on iTunes, not Youtube, and have to be downloaded manually! We should get started on that ASAP."
- March 13 2017 (YouTube): https://www.reddit.com/r/DataHoarder/comments/5z2499/many_of_the_uc_berkeley_youtube_videos_still_need/
"Everything has been accounted for and is in the process of downloading, thank you to everyone who helped, especially lawpetex who backed up the full amount of all the Youtube and iTunes content!"
Archiving scripts
Scripts for downloading YouTube playlists and extracting metadata from them in an Internet Archive–compatible CSV format. The repo also includes #Indexes of files.
git clone https://repo.eecs.berkeley.edu/git-anon/users/fifield/archive-ucberkeley-webcast.git
Prerequisites:
- Python 2.7
- jq
- youtube-dl
- Wget
How to download a YouTube playlist
Get a list of playlist IDs, titles, and video counts:
jq --compact-output '[.id,.snippet.title,(.items|length)]' indexes/playlists-20170307.json | less
Choose a playlist to download. Then run:
./download-playlist.sh PLAYLIST_ID
It may fail partway through; you can keep running it again and again until it finishes. Check if there were any youtube-dl errors (such as incomplete downloads) and keep running it until there are no errors.
If you only want to download the highest-quality file format, edit the download-playlist.sh script to use --format=best in place of --all-formats in the youtube-dl command. By default (without any --format option), youtube-dl will use --format=bestvideo+bestaudio, which could locally mux together two separate video and audio streams, resulting in a file that never actually existed on YouTube.
How to create a metadata CSV file from a YouTube playlist
The metadata.py script converts the metadata in the JSON file into CSV format. It's currently hardcoded to always set collection=test_collection, so any uploads will not yet be permanent. You have to edit the script if you want to change that.
Think of an identifier for the item. The identifier should contain the course subject and number, and the semester. A list of course subject abbreviations is at http://guide.berkeley.edu/courses/. Then run the metadata.py script:
./metadata.py "$IDENTIFIER" "PLAYLIST_ID/playlist.json" > "PLAYLIST_ID.metadata.csv"
How to upload YouTube playlists and set metadata
We're not uploading anything yet, in order to coordinate naming conventions, etc.
How to download iTunes RSS
Most of the iTunes items have an RSS feed that allows for direct download of media files. indexes/id354813951_2-rss.txt under #Indexes of files is an index of them. To download all the RSS files and put them in a directory tree (this only takes a couple of minutes), run:
while read id url; do mkdir -p "itunes/$id" && wget -c -P "itunes/$id" "$url"; done < indexes/id354813951_2-rss.txt
Then, to download one of the items, do
./download-itunes-rss.sh itunes/$ID/*.rss 2>&1 | tee -a download-itunes-rss.log
where $ID is a numeric item identifier like 354820721.
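The shell loop above reads one "ID URL" pair per line from the RSS index and gives each item its own directory under itunes/. As a rough Python sketch of the same parsing step (it does not download anything; the function names and the sample line are hypothetical, and the two-column layout is inferred from the `read id url` in the loop):

```python
import os


def parse_rss_index(lines):
    """Yield (item_id, url) pairs from lines of the form 'ID URL'."""
    for line in lines:
        # Split on the first run of whitespace, like `read id url` in the shell.
        parts = line.split(None, 1)
        if len(parts) == 2:
            yield parts[0], parts[1].strip()


def target_dir(item_id, root="itunes"):
    """Directory where the RSS file for one item would be stored."""
    return os.path.join(root, item_id)


# Made-up sample line; real lines come from indexes/id354813951_2-rss.txt.
sample = ["354820721 http://example.invalid/feed.rss\n"]
for item_id, url in parse_rss_index(sample):
    print(target_dir(item_id), url)
```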
Indexes of files
- playlists-20170307.json
- JSON list of UCBerkeley channel playlists from the YouTube API. Each line is a playlists resource, with the addition of an array of playlistItems resources, which are the individual videos in the playlist.
- uploads-20170307.json
- JSON list of all uploads of the UCBerkeley channel from the YouTube API. The format is the same as playlists-20170307.json (there is only one line because "uploads" is treated as its own playlist). playlists-20170307.json and uploads-20170307.json almost completely overlap in the videos they contain, but 128 videos are only in one or the other (see #YouTube videos without playlists).
- https://gist.github.com/Wundark/5a56ee2c9e49d441646ad2a6e7a2c0c0
- List of YouTube videos, from a Reddit thread.
- id354813951.tar.xz (missing a few videos)
- id354813951_2.tar.xz
- Index of iTunes files. To download the video/audio files for a lecture, first fetch the URLs containing downloadTrack from course.json. This returns some XML containing a second URL (and some metadata) which points to the actual download location. All these requests need to use the iTunes user agent string ("iTunes/12.5" works).
- itunes-minus-youtube-20170304.txt
- List of 729 iTunes downloads that don't seem to be among the YouTube playlists (by comparison of course titles). It was produced like this:
jq -j '.id,"\t",.snippet.title,"\n"' indexes/playlists-20170307.json | sort | uniq > youtube.txt
tar -O -xf indexes/id354813951_2.tar.xz --wildcards -- '*/course.json' | jq -j '.storePlatformData."product-dv-product".results[]|(.id,"\t",.name,"\n")' | sort | uniq > itunes.txt
./dedup-youtube-itunes.py youtube.txt itunes.txt > indexes/itunes-minus-youtube.txt
./dedup-youtube-itunes.py youtube.txt itunes.txt
- id354813951_2-rss.txt
- RSS feed URLs for each iTunes item. These RSS feeds download from wbe-itunes.berkeley.edu rather than from iTunes directly, which is reportedly faster.
- webcast.berkeley.edu-series-20170301.html.gz
- HTML of http://webcast.berkeley.edu/series on 2017-03-01. The page is dynamically generated using JavaScript, so the HTML is taken from the inspector in a browser after the page has loaded. The page contains links to YouTube and iTunes.
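The iTunes index entry above notes that every request in the download procedure must carry the iTunes user agent string. Here is a minimal sketch of how to set that header, written for Python 3 (the repository's own scripts list Python 2.7 as a prerequisite, so this is an illustration, not a drop-in addition). The function names are hypothetical, and the sketch does not parse the returned XML because its layout is not described here.

```python
import urllib.request

# The user agent string reported to work in the text above.
ITUNES_USER_AGENT = "iTunes/12.5"


def itunes_request(url):
    """Build a request that identifies itself with the iTunes user agent."""
    return urllib.request.Request(url, headers={"User-Agent": ITUNES_USER_AGENT})


def fetch(url):
    """Fetch a URL with the iTunes user agent and return the body as bytes."""
    with urllib.request.urlopen(itunes_request(url)) as response:
        return response.read()
```

Calling fetch() on a downloadTrack URL taken from course.json would return the XML mentioned above. The second URL inside that XML would then be fetched the same way.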
Sample commands for working with JSON indexes (using jq):
jq -j '.id,"\t",.snippet.title,"\n"' indexes/playlists-20170307.json
- Extract all playlist IDs and titles. Convert an ID into a URL as: https://www.youtube.com/playlist?list=id.
jq -r '.items[].snippet.resourceId.videoId' indexes/playlists-20170307.json
- Get a list of all video IDs in playlists-20170307.json. Convert an ID into a URL as: https://www.youtube.com/watch?v=id.
jq -r '.items[].snippet.resourceId.videoId' indexes/uploads-20170307.json
- Get a list of all video IDs in uploads-20170307.json.
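For anyone who prefers a script to jq one-liners, the same extraction can be sketched in Python. This assumes only what the jq filters above imply: one JSON object per line, with .id, .snippet.title, and an items array whose entries carry .snippet.resourceId.videoId. The function names and the sample record are hypothetical.

```python
import json


def iter_playlists(lines):
    """Yield one parsed playlist resource per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def video_ids(playlist):
    """Return the video IDs listed in a playlist's 'items' array."""
    return [item["snippet"]["resourceId"]["videoId"]
            for item in playlist.get("items", [])]


def playlist_url(playlist_id):
    return "https://www.youtube.com/playlist?list=" + playlist_id


def video_url(video_id):
    return "https://www.youtube.com/watch?v=" + video_id


# Made-up record in the shape described above, for illustration only.
sample_line = json.dumps({
    "id": "PL_EXAMPLE",
    "snippet": {"title": "Example Course - Spring 2015"},
    "items": [{"snippet": {"resourceId": {"videoId": "VIDEO_ID_1"}}}],
})
for playlist in iter_playlists([sample_line]):
    print(playlist_url(playlist["id"]), playlist["snippet"]["title"])
    for vid in video_ids(playlist):
        print("  " + video_url(vid))
```

To run it over a real index, pass an open file such as indexes/playlists-20170307.json to iter_playlists in place of the sample list.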
Unlisted/private captures
After Spring 2015, course captures have by default been unlisted or private behind authentication. We don't know of any central listing of all such captures. In some cases, video URLs may only be available to students who actually took the course. Here is a list of some that we know of.
- CS 61B Spring 2016
- CS 189 Spring 2016 with Jonathan Shewchuk; course home page. (From a Hacker News comment.)
- CS 160 Fall 2016
- CS 61B Spring 2017
- CS 161 Spring 17 lecture 1.
Status
YouTube playlists
These three courses are erroneously split across multiple playlists. It may be good to keep them together in one item.
playlist | downloaded | uploaded |
---|---|---|
Public Health 150E, 001 - Spring 2015 (14 videos) | | |
Public Health 150E, 001 - Spring 2015 (1 video) | | |
playlist | downloaded | uploaded |
---|---|---|
Computer Science 198, 032 - Spring 2015 (6 videos) | | |
Computer Science 198, 032 - Spring 2015 (1 video) | | |
Computer Science 198, 032 - Spring 2015 (1 video) | | |
Computer Science 198, 032 - Spring 2015 (1 video) | | |
playlist | downloaded | uploaded |
---|---|---|
Cognitive Science C103, 001 - Spring 2015 (13 videos) | | |
Cognitive Science C103, 001 - Spring 2015 (1 video) | | |
Cognitive Science C103, 001 - Spring 2015 (1 video) | | |
Cognitive Science C103, 001 - Spring 2015 (1 video) | | |
Cognitive Science C103, 001 - Spring 2015 (2 videos) | | |
Cognitive Science C103, 001 - Spring 2015 (1 video) | | |
Cognitive Science C103, 001 - Spring 2015 (1 video) | | |
Cognitive Science C103, 001 - Spring 2015 (3 videos) | | |
Cognitive Science C103, 001 - Spring 2015 (2 videos) | | |
There are 28 semester-long playlists whose videos are almost entirely a subset of the individual course videos found above. Only 61 videos in the semester-long playlists, most of them private, are not accounted for above. Apart from those 61 videos (listed below), we don't have to download the videos of these playlists (we already have their metadata).
- Spring 2013 Courses (116 videos)
- Spring 2013 Courses, Part 3 (199 videos)
- Spring 2013 Courses, Part 2 (198 videos)
- Spring 2013 Courses, Part 1 (193 videos)
- Fall 2012 Courses (14 videos)
- Fall 2012 Courses, Part 4 (200 videos)
- Fall 2012 Courses, Part 3 (194 videos)
- Fall 2012 Courses, Part 2 (185 videos)
- Fall 2012 Courses, Part 1 (88 videos)
- Spring 2012 Courses (105 videos)
- Spring 2012 Courses, Part 3 (199 videos)
- Spring 2012 Courses, Part 2 (190 videos)
- Spring 2012 Courses Part 1 (192 videos)
- Fall 2011 Courses (88 videos)
- Fall 2011 Courses Part 4 (198 videos)
- Fall 2011 Courses Part 3 (196 videos)
- Fall 2011 Courses Part 2 (198 videos)
- Fall 2011 Courses Part 1 (191 videos)
- Spring 2011 Courses (68 videos)
- Spring 2011 Courses Part 3 (199 videos)
- Spring 2011 Courses Part 2 (199 videos)
- Spring 2011 Courses Part 1 (194 videos)
- Fall 2010 Courses (169 videos)
- Fall 2010 Courses Part 2 (199 videos)
- Fall 2010 Courses Part 1 (199 videos)
- Spring 2010 Courses (85 videos)
- Spring 2010 Courses (1) (197 videos)
- Fall 2009 Courses (25 videos)
These are the 61 videos that are found in the semester-long playlists that don't have their own individual course playlist:
YouTube videos without playlists
playlists-20170307.json and uploads-20170307.json cover almost the same set of videos, but not quite. playlists has 9,881 videos and uploads has 9,897. Their union has 9,953 videos and their intersection has 9,825. There are 56 videos that are in playlists but not uploads, and 72 videos that are in uploads but not playlists.
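The counts above follow from ordinary set arithmetic on the two lists of video IDs. The following is a small sketch of that comparison; the function name is hypothetical and the example uses made-up IDs, not the real index files. The figures quoted in the text are checked for internal consistency in the assertions.

```python
def compare_indexes(playlist_ids, upload_ids):
    """Return the sizes of the union, intersection, and both differences."""
    playlists = set(playlist_ids)
    uploads = set(upload_ids)
    return {
        "union": len(playlists | uploads),
        "intersection": len(playlists & uploads),
        "playlists_only": len(playlists - uploads),
        "uploads_only": len(uploads - playlists),
    }


# Consistency check on the figures quoted in the text.
PLAYLISTS, UPLOADS, INTERSECTION = 9881, 9897, 9825
assert PLAYLISTS + UPLOADS - INTERSECTION == 9953  # size of the union
assert PLAYLISTS - INTERSECTION == 56              # in playlists only
assert UPLOADS - INTERSECTION == 72                # in uploads only

# Tiny made-up example.
print(compare_indexes(["a", "b", "c"], ["b", "c", "d", "e"]))
```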
Here are the 72 videos that are in uploads but not playlists:
iTunes U
Because of its length (1,174 items), the table of iTunes status is in a subpage: UC Berkeley Course Captures/iTunes Status.
Two people have downloaded the entirety of the iTunes files, with the exception of the files listed at https://wrya.net/services/paste/p/WHPxDB7ZNuyMkHu9k2ACLH.html, which return 404.
Some files that do not have licensing information in the YouTube/iTunes metadata actually have licensing metadata in the media files themselves. ffprobe can show this information.
Shutdown notice
2017-03-01
http://news.berkeley.edu/2017/03/01/course-capture/
Cathy Koshland, UC Berkeley vice chancellor for undergraduate education, sent this message to the campus community today:
Dear Campus Community,
I wanted to share with you the decision to restrict access to our legacy Course Capture (classroom lecture) videos and podcasts, currently searchable at webcast.berkeley.edu and found on YouTube and UC Berkeley iTunesU, to members of the campus community.
As part of the campus’s ongoing effort to improve the accessibility of online content, we have determined that instead of focusing on legacy content that is 3-10 years old, much of which sees very limited use, we will work to create new public content that includes accessible features. Our public legacy libraries on YouTube and iTunesU include over 20,000 publications. This move will also partially address recent findings by the Department of Justice which suggests that the YouTube and iTunesU content meet higher accessibility standards as a condition of remaining publicly available. Finally, moving our content behind authentication allows us to better protect instructor intellectual property from “pirates” who have reused content for personal profit without consent.
Since fall 2015 we have piloted publishing all of our Course Capture content behind CAS/CalNet authentication. This strategy has enhanced our ability to accommodate students and UC Berkeley community members who have demonstrated an accessibility need, and we have concluded that authentication is an intervention that is appropriately responsive to the Berkeley community.
We will continue to evaluate the role of online Course Capture and distribution in tandem with advances in technology befitting the No. 1 public institution in the country. Berkeley will maintain its commitment to sharing content to the public through our partnership with EdX (edx.org). This free and accessible content includes a wide range of educational opportunities and topics from across higher ed.
Beginning March 15, 2017, access to iTunesU course content will be suspended. On the same day we will begin the process of moving the publicly offered YouTube content made from the current legacy channel [youtube.com/ucberkeley] to a new authentication login required channel. The entire process is expected to take three to five months. During this time the ETS team will migrate the videos into the new channel behind CalNet/CAS authentication. Berkeley users seeking to view this older content will be able to access it by logging into YouTube with their bConnected/Google-supported identity.
To help manage the instructional impact, instructors with legacy content have been contacted. Instructors utilizing the ETS Course Capture service since fall 2015 will experience no changes in viewing or accessing content.
Enrolled Berkeley students requiring accommodations will continue to receive support through the Disabled Students Program.
Finally, as we continue to strive for inclusion and effective teaching and learning for all members of the campus community, we encourage you to reference a new campus website designed to help instructors identify best practices and techniques in creating accessible course content for all users: accesscontent.berkeley.edu.
For additional information, please review this FAQ document.
2017-02-24 http://news.berkeley.edu/2017/02/24/faq-on-legacy-public-course-capture-content/
Here is additional information to assist the campus community and the public with upcoming changes to UC Berkeley’s library of legacy public Course Capture (classroom lecture) content from webcast.berkeley.edu, located on YouTube and UC Berkeley iTunesU.
- Who uses this content? How much of the content is used/watched?
- Course recordings are a study-tool for current students. Results from a recent review of our legacy (2006-2015) public course recordings on YouTube show that the average video is watched for less than eight minutes.
- Who are the “pirates” mentioned in the CalMessage?
- Pirates is a term used to describe websites that embed YouTube content without the permission of the original copyright holder for profit. UC Berkeley legacy Course Capture content has been discovered on for-profit websites, which use either a subscription fee or on-page advertising.
- Why now? Is this related to the DOJ letter?
- UC Berkeley stopped posting course lecture videos publicly through webcast.berkeley.edu in 2015 as a way to reduce costs and increase adoption. However, we left legacy content from 2006-2015 in place. The Department of Justice letter indicates that they believe our legacy Course Capture content from webcast.berkeley.edu and located on YouTube and iTunesU is in violation of the Americans with Disabilities Act. We are removing the legacy webcast.berkeley.edu content from public access to focus on making future public content more accessible. Instructors are encouraged to reference accesscontent.berkeley.edu for best practices and resources for making course content accessible.
- If we don’t add captions and descriptions, what happens?
- Failure to meet the expectations of the Department of Justice could mean potential legal and financial ramifications.
- What about current students who need captioning?
- ETS and the Disabled Students Program (DSP) have been partnering over the last several years to identify courses requiring captioning based on student need. The partnership and support of students working with DSP will continue.
- What will happen to the recordings?
- Beginning March 15, 2017, iTunesU Course Capture content will be removed. You may continue to use/download course capture content until that date. Other content in this location such as events, KALX and Public Affairs content will remain available after March 15. On the same day ETS will begin moving the publicly offered YouTube course capture content from the current legacy channel [youtube.com/ucberkeley] to a new authentication login-required channel. The entire process is expected to take three to five months. Berkeley users seeking to view this older content will be able to access it by logging into YouTube with their bConnected/Google supported identity. Instructors with course recordings on YouTube recorded fall 2015 or later will experience no change. Individual video URLs (links) will remain unchanged. Instructors currently using impacted recordings are encouraged to contact the Course Capture team to identify ways to mitigate any effect on their courses: coursecapture@berkeley.edu
- How long will videos be interrupted?
- The entire process to migrate the public YouTube videos from their current location to a new YouTube channel that will be accessible with campus member’s bConnected/Google supported identity will take 8-10 weeks and begin on March 15, 2017. Each video will be unavailable on bCourses for 2-3 business days. If you are a current instructor using impacted legacy recordings please contact the Course Capture team to review your needs: coursecapture@berkeley.edu
- If I have other videos that I want to get captioned or audio described, how would I do that?
- While speech-to-text tools continue to improve, effective captioning remains a very manual process. The UC System has recently introduced contracts with several vendors to provide captioning services. The vendor transcribes a recording and adds the text to the appropriate YouTube video, or a transcriber may be hired to caption an event live. At UC Berkeley, content created/captured by Berkeley Video and Berkeley AV is now being captioned. Information on audio description best practices are available at: https://webaccess.berkeley.edu/resources/tips/audio-description and https://webaccess.berkeley.edu/ask-pecan/descriptive-audio
- I’m using the impacted recordings (iTunesU or spring 2015 or earlier YouTube content) in my course now. What should I do?
- ETS is working hard to mitigate impacts to current instruction. If you already have a list of your video links, you have no additional steps to take. Video URLs will remain unchanged. If you need assistance or have additional concerns, please contact the Course Capture team to review your needs: coursecapture@berkeley.edu
- I am an instructor who is using impacted recordings (iTunesU or spring 2015 or earlier YouTube content) for something outside of UC Berkeley. What should I do?
- If you are an instructor using legacy recordings currently available to the public as an extension of your research or teaching, please contact the Course Capture team: coursecapture@berkeley.edu
- Why was the public not notified before webcast.berkeley.edu content disappeared so that we had a chance to download iTunes legacy content?
- We added notifications to our sites and provided a warning before content began to be removed. The legacy content on webcast.berkeley.edu located on YouTube and UC Berkeley’s iTunes U is three to ten years old.
- I am a Berkeley instructor who wants to use old content in my class, where can I find the URL to share with my students?
- Before videos are migrated: Instructors can copy/paste their YouTube links for future reference. Link URLs will remain unchanged. Educational Technology Services (ETS) is working to modify webcast.berkeley.edu so that videos are accessible to UC Berkeley CalNet users starting in April. Instructors with immediate questions can contact the Course Capture team: coursecapture@berkeley.edu
- Can I get a copy of my old lectures from YouTube to use personally?
- Currently, ETS doesn’t have a service that provides copies of recordings to individuals.
- I am a Berkeley CalNet user, so why can’t I search for videos and playlists that I used to be able to see on webcast.berkeley.edu?
- The process that allows us to place the videos behind authentication removes playlists and content search options. ETS is working to provide campus users a new website that will function as a directory of recordings that should launch sometime in April on the existing webcast.berkeley.edu site.
- Can I still find previous events and other non-Course Capture recordings on YouTube?
- The public UC Berkeley Events Channel (youtube.com/ucberkeleyevents) will continue to be available. Many recordings at this location are already captioned and plans are in place to caption future content.