Difference between revisions of "Quizlet"

From Archiveteam
(add logo)
(Fixed logo link, added warrior source and info, as well as IRC channel)
Line 1: Line 1:
{{Infobox project
{{Infobox project
| title = Quizlet
| title = Quizlet
| logo = Quizlet logo.png
| logo = Quizlet-logo.png
| image = Quizlet Home Page.png
| image = Quizlet Home Page.png
| URL = https://quizlet.com/
| URL = https://quizlet.com/
| project_status = {{online}}
| project_status = {{online}}
| archiving_status = {{nosavedyet}}
| archiving_status = {{upcoming}}
| source = [https://github.com/ArchiveTeam/quizlet-grab quizlet-grab]
| irc = quizletusin
}}
}}


Line 16: Line 18:


== Grabbing the Data ==
== Grabbing the Data ==
As of now, I have been unsuccessful in finding a reliable way to get everything downloaded. The [https://gist.github.com/adinbied/1c3673280fa0970297af01b03ce40227 '''initial python script'''] I wrote to incrementally grab all of the sets via the API and save them as txt files works, but is painfully slow (after a week of running it on three machines, I only got about 3 million downloaded). I have tried multithreading and multiprocessing, but have been unable to get the same amount downloaded using those methods. Maybe someone else might have some more luck.
Warrior scripts are being worked on to archive the API responses for all sets - stay tuned!

Revision as of 17:31, 8 August 2018

Quizlet
Quizlet logo
Quizlet Home Page.png
URL https://quizlet.com/
Status Online!
Archiving status Upcoming...
Archiving type Unknown
Project source quizlet-grab
IRC channel #quizletusin (on hackint)

Quizlet is a mobile and web-based study application that lets students learn material through flashcards, games, and tests. It is currently used by one in two high school students and one in three college students in the United States. As of April 30, 2018, Quizlet had over 200 million user-generated flashcard sets and more than 30 million active users. It now ranks among the top 50 websites in the U.S.

While there is no sign that it will disappear soon, it contains a wealth of knowledge and information that, as far as I know, has no backup. It's always better to be prepared!

Archival

Quizlet set IDs are sequential: the earliest public set has the ID ‘173’, and one of the more recent sets has an ID above ‘300000000’. Quizlet has an open API (see https://quizlet.com/api/2.0/docs) that returns a JSON copy of each set. An example API result can be seen here. Back-of-the-napkin math shows that 300,000,000 public sets would take about 400 GB to store uncompressed.
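The napkin math above can be checked with a few lines of Python. This is only a sketch: the page gives the 400 GB total and the 300,000,000 set count, but not the per-set size, so the average below is derived from those two figures rather than measured. The function names are hypothetical, and GB is assumed to mean decimal gigabytes.

```python
# Assumption: "GB" on this page means decimal gigabytes (10**9 bytes).
GB = 10**9

def implied_bytes_per_set(total_bytes, set_count):
    """Average JSON size per set implied by a total storage estimate."""
    return total_bytes / set_count

def estimated_total_bytes(set_count, avg_bytes_per_set):
    """Projected uncompressed storage for a given number of sets."""
    return set_count * avg_bytes_per_set

# Figures taken from the paragraph above.
SET_COUNT = 300_000_000
TOTAL_BYTES = 400 * GB

avg = implied_bytes_per_set(TOTAL_BYTES, SET_COUNT)
print(f"Implied average size: {avg:.0f} bytes per set")
```

In other words, the 400 GB figure corresponds to an average of roughly 1.3 KB of JSON per set. If sampling real API responses gives a different average, `estimated_total_bytes` can be used to redo the projection.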

Grabbing the Data

Warrior scripts are being worked on to archive the API responses for all sets - stay tuned!