Difference between revisions of "Reddit"

From Archiveteam
(Vital signs)
m (Add project tracker and source code)
 
(44 intermediate revisions by 12 users not shown)
Line 1: Line 1:
 
{{Infobox project
 
{{Infobox project
 
| title = reddit
 
| title = reddit
| image = Reddit logo.png
+
| logo = Reddit logo.png
| image = Reddit home page 2013-03-26.png
+
| image = Reddit home page - 2019-12-14.png
| description = reddit home page as seen on March 26, 2013
+
| description = reddit home page as seen on December 14, 2019
| URL = http://www.reddit.com/
+
| URL = https://www.reddit.com/<br />https://old.reddit.com/<br />https://i.reddit.com/
| project_status = {{endangered}}
+
| project_status = {{online}}
| archiving_status = {{upcoming}}
+
| archiving_status = {{Partiallysaved}}
| irc = deaddit
+
| source = [https://github.com/ArchiveTeam/reddit-grab reddit-grab]
 +
| tracker = [https://tracker.archiveteam.org/reddit/ reddit]
 +
| irc = shreddit
 +
| irc_network = hackint
 +
| source = [https://github.com/ArchiveTeam/reddit-grab reddit-grab]
 
}}
 
}}
  
'''reddit''' is a content aggregator and social bookmarking service similar to the likes of Digg. Users can submit links, submit text posts, vote and comment on submissions in communities called "subreddits". It received considerable attention from its twelve hour SOPA blackout early in January of 2012.
+
'''Reddit''' is a content aggregator and social bookmarking service similar to the likes of Digg. Users can submit links, text posts, images and videos, vote and comment on submissions in communities called "subreddits". It received considerable attention from its twelve-hour SOPA blackout early in January 2012.
 +
 
 +
Reddit "quarantines" some controversial subreddits. Many quarantined subreddits have since been deleted, and to date no quarantined subreddit has ever emerged unscathed, so it is important to make backups of them. [https://www.reddit.com/r/thequarantinelist/ Here is a list of quarantined subreddits.]
 +
 
 +
Reddit hosts some subreddits devoted to goals similar to those of [https://www.reddit.com/r/ArchiveTeam ArchiveTeam], including [https://www.reddit.com/r/AbandonedWebsites /r/AbandonedWebsites], [https://www.reddit.com/r/ForgottenWebsites /r/ForgottenWebsites], and [https://www.reddit.com/r/DataHoarder /r/DataHoarder], which are worth checking for material to add to [[ArchiveBot]] or that would otherwise benefit from the attention of the team.
  
 
== Vital signs ==
 
== Vital signs ==
  
* <s>Appears stable, though a small to medium size team is a concern.
+
* <s>Appears stable, though a small to medium size team is a concern.</s>
* '''Update (6/10/15)''': the admins carried out bannings of several subreddits claiming they were harassing people, the most notable of which was /r/fatpeoplehate. This has instilled some fear, uncertainty, and doubt in some part of the userbase, with a few claiming that reddit will soon become what Digg is now: nearly dead.</s>
+
* 2015-06-10: The admins banned several subreddits, claiming they were harassing people, the most notable of which was /r/fatpeoplehate. This has instilled some fear, uncertainty, and doubt in part of the userbase, with a few claiming that reddit will soon become what Digg is now: nearly dead.
* '''<s>Extremely endangered</s> - many subreddits were picketing after the firing of a reddit employee named Victoria by turning themselves private or restricting submissions.'''
+
* ''<s>Extremely endangered</s> - many subreddits were picketing after the firing of a reddit employee named Victoria by turning themselves private or restricting submissions.''
 
* ''''Caution'''' - Reddit seems to have calmed down and returned to normal functionality after Ellen Pao's resignation, and the Reddit team is making serious reforms (reducing shadowbanning, more mod tools). However, the revolt left unresolved issues and sour grapes within the community, and it seems Reddit was only saved by the lack of a practical alternative (Voat.co was crushed and went offline due to floods of refugees). '''It would be wise to preemptively archive the site''' before another crisis occurs.
 
* ''''Caution'''' - Reddit seems to have calmed down and returned to normal functionality after Ellen Pao's resignation, and the Reddit team is making serious reforms (reducing shadowbanning, more mod tools). However, the revolt left unresolved issues and sour grapes within the community, and it seems Reddit was only saved by the lack of a practical alternative (Voat.co was crushed and went offline due to floods of refugees). '''It would be wise to preemptively archive the site''' before another crisis occurs.
 +
* On July 3rd, 2015, Jason Baumgartner '''completed his 14-month effort to archive Reddit's entire publicly available textual content''', just in time before the onset of the Reddit revolt. The archive is still updated monthly. '''[http://files.pushshift.io/reddit/ The files are available here.]''' However, images and videos hosted by Reddit are not archived.
 +
* In 2017-2018, Reddit banned several subreddits including r/incels and r/maleforeveralone, which had tens of thousands of subscribers each. Other subreddits including r/Braincels, r/foreveralone, r/TheRedPill and r/MGTOW are endangered. Discussions and [https://www.thepetitionsite.com/takeaction/308/200/042/?TAP=1007&cid=causes_petition_postinfo petitions] about banning those subreddits are currently taking place.[https://babe.net/2018/03/07/incel-40474][https://www.reddit.com/r/IncelTears/comments/83irsc/why_isnt_rbraincels_banned_yet/]
 +
* In 2018, a new, redesigned website became the default version of Reddit. This redesigned version has numerous usability issues. It heavily relies on JS and is essentially uncrawlable without dedicated code. The pre-redesign version of Reddit continues to be available at [https://old.reddit.com/ old.reddit.com].
 +
* In March 2019, /r/watchpeopledie, /r/Gore, and some other subs were banned after the Christchurch shooting – this was clearly not due to the video recording of that shooting getting shared (that was forbidden on WPD at least) but due to the negative press coverage, just like for previous bans.
 +
* Also in March 2019, Reddit's legal team threatened /r/Piracy with a ban, alleging that the mods were doing too little against copyright infringement.<ref>{{URL|https://old.reddit.com/r/Piracy/comments/b28d9q/rpiracy_has_received_a_notice_of_multiple/}}</ref>
 +
* Reddit has quarantined manosphere subreddits including /r/Braincels and /r/TheRedPill. /r/Braincels was banned on October 30, 2019.
 +
* Users began to spot in December 2019 that comment threads, at least on the "new" version of the site, were being locked behind a registration wall in an apparent A/B test.<ref>{{URL|https://news.ycombinator.com/item?id=21780092}}</ref><ref>{{URL|https://old.reddit.com/r/mobileweb/comments/e7yivg/join_reddit_to_keep_reading_an_account_is_now/?sort=top}}</ref>
 +
 +
== Textual Archive (Without Images or Videos) ==
 +
 +
On July 3rd, 2015, Jason Baumgartner completed his 14-month effort to archive Reddit's entire publicly available textual content, just in time before the onset of the Reddit revolt. The archive is still being updated monthly. '''[http://files.pushshift.io/reddit/ The files are available here.]'''
 +
 +
* Does not include images and videos hosted by Reddit
 +
* Reddit JSON API output. Posts are archived incrementally in real-time.
 +
* Some comments not accessible due to private subreddits or comment deletion or other API issues
 +
* [https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_reddit_comment/ Reddit /r/datasets - I have every publicly available Reddit comment for research. ~ 1.7 billion comments @ 250 GB compressed. Any interest in this?]
 +
* [https://www.reddit.com/r/bigquery/comments/3cej2b/17_billion_reddit_comments_loaded_on_bigquery/ Google BigQuery Analysis of Reddit]
 +
 +
The scripts used to generate this API dump were not made public, but they likely used PRAW, and it would probably be better to rewrite them from scratch.
 +
 +
Also, this only preserves textual submissions and comments. No images or videos hosted on Reddit are archived. Sidebar, wiki, and live thread data are not retrieved either, so these should be scraped in an expansion pack.
 +
 +
===API===
 +
Jason Baumgartner also provides an API for accessing Reddit's textual archive available [https://github.com/pushshift/api here]. The archive is updated in real-time. This API does not have the limitations of Reddit's API. For example, it does not impose limits on the number of submissions or comments that are retrieved.
 +
 +
To search for submissions of a subreddit (500 limit):
 +
 +
https://api.pushshift.io/reddit/search/submission/?subreddit=Archiveteam&size=500
  
== Dealing with Private Subreddits ==
+
To retrieve ''all'' comments for a submission (with tens of thousands of comments):
  
In response to the firing of Victoria, the subreddit /r/IAMA set their community to private, making all it's posts from the entire history of Reddit totally inaccessible. Almost every large subreddit has followed suit.
+
https://api.pushshift.io/reddit/submission/comment_ids/6uey5x
  
It is important to understand that archiving these communities is not a lost cause as of yet. We can still access the data in three ways:
+
Note that posts are archived in real-time after they are created. Newer versions of edited posts are not archived. One may have to re-fetch the content on Reddit's site to get the latest revision of an edited post.
  
# Ask the mods of each subreddit to give an Archive Team account access to their Private subreddit. That way, the subreddit can still stay offline, but we can still grab the threads. The disadvantage is that we have to contact every mod and hope that they cooperate, but I'm sure that most mod teams are willing to at least preserve the legacy of their community, if not the current incarnation.
+
One may also have to fetch images and videos separately, as the API does not archive them.
# The Reddit Admins may have full backups of Reddit. Because of their stated ideals, they are more likely than Digg was to release the data in event of their closure. This is not something we should count on, however.
 
# Google Cache and Internet Archive. The last resort is to grab whatever is left of the threads from archival services. The deep disadvantage of this that each reddit thread has more replies than at first glance, which are lost from this method.
 
  
 
== Data liberation ==
 
== Data liberation ==
  
Currently (as of March 26, 2013), users can only see up to 1,000 posts and comments on a profile page. However, it was stated by admin "spladug" [http://www.reddit.com/r/ideasfortheadmins/comments/10tai6/ever_wondered_the_data_liberation_policy_of_reddit/c6gicdf that older comments and posts are still in the database]. spladug also states that the team is in favor for retrieving dumps of a user's data, but that the task would be taxing on the servers. Since this comment was posted, there appears to have been no progress on a dump system. Archiving would be nearly impossible using the old-fashioned way (without wget) if things do wind up FUBAR in the future because of this limitation.
+
As of March 26, 2013, users can only see up to 1,000 posts and comments on a profile page. However, admin "spladug" stated [http://www.reddit.com/r/ideasfortheadmins/comments/10tai6/ever_wondered_the_data_liberation_policy_of_reddit/c6gicdf that older comments and posts are still in the database]. spladug also said that the team is in favor of offering dumps of a user's data, but that the task would be taxing on the servers. <s>Since this comment was posted, there appears to have been no progress on a dump system.</s> Archiving would be nearly impossible using the old-fashioned way (without wget) if things do wind up FUBAR in the future because of this limitation.
 +
 
 +
Instead, any archival method should pull from the Reddit API (a full run would take several months). The API exposes all nested comments, including those that the HTML pages do not show. In addition, it significantly reduces server load.
  
No further progress appears to have been made since then as of June 2015.
+
To comply with the EU GDPR, Reddit was forced to make progress, and the site [https://www.reddit.com/settings/data-request now has a request form]. Users can specify that they want a copy of all of their data, or data from specific date ranges. The site says requests may take up to 30 days to be processed.
  
== External Links ==
+
== Gallery ==
 +
 
 +
<gallery>
 +
File:Reddit home page 2013-03-26.png|reddit home page as seen on March 26, 2013, using the "old" version of the site still available today when logged in or through old.reddit.com
 +
</gallery>
 +
 
 +
== Lists ==
 +
 
 +
* [[List of Reddit subs by country and territory]]
 +
* [[List of Reddit subs by language]]
 +
 
 +
== Potentially endangered subreddits ==
 +
 
 +
* https://old.reddit.com/r/WatchRedditDie/ | Anti-Reddit
 +
* https://old.reddit.com/r/opendirectories/ | Piracy
 +
* https://old.reddit.com/r/DeadorVegetable/ | Gore and death
 +
* https://old.reddit.com/r/FiftyFifty/ | Gore and death
 +
* https://old.reddit.com/r/Piracy/ | Piracy
 +
 
 +
== References ==
 +
 
 +
<references/>
 +
 
 +
<!--
 +
 
 +
== Dealing with Private Subreddits ==
 +
 
 +
In response to the firing of Victoria, the subreddit /r/IAMA set their community to private, making all its posts from the entire history of Reddit totally inaccessible. Almost every large subreddit followed suit. While they are back to normal today, it is possible that this will become a common protest measure in the future.
 +
 
 +
In case a future crisis occurs where we need access to a private subreddit, we will have three ways to get at the data:
 +
 
 +
# Ask the mods of each subreddit to give an Archive Team account access to their Private subreddit. That way, the subreddit can still stay offline, but we can still grab the threads. The disadvantage is that we have to contact every mod and hope that they cooperate, but I'm sure that most mod teams are willing to at least preserve the legacy of their community, if not the current incarnation.
 +
# The Reddit Admins may have full backups of Reddit. Because of their stated ideals, they are more likely than Digg was to release the data in event of their closure. This is not something we should count on, however.
 +
# Google Cache and Internet Archive. The last resort is to grab whatever is left of the threads from archival services. The major disadvantage of this is that each reddit thread has more replies than are visible at first glance, and these are lost with this method.
  
* {{url|1=http://www.reddit.com|2=reddit}}
+
-->
  
 
{{Navigation box}}
 
{{Navigation box}}

Latest revision as of 08:28, 26 July 2020

reddit
Reddit logo
reddit home page as seen on December 14, 2019
URL https://www.reddit.com/
https://old.reddit.com/
https://i.reddit.com/
Project status Online!
Archiving status Partially saved
Project source reddit-grab
Project tracker reddit
IRC channel #shreddit (on hackint)
Project lead Unknown

Reddit is a content aggregator and social bookmarking service similar to the likes of Digg. Users can submit links, text posts, images and videos, vote and comment on submissions in communities called "subreddits". It received considerable attention from its twelve-hour SOPA blackout early in January 2012.

Reddit "quarantines" some controversial subreddits. Many quarantined subreddits have since been deleted, and to date no quarantined subreddit has ever emerged unscathed, so it is important to make backups of them. A list of quarantined subreddits is kept at https://www.reddit.com/r/thequarantinelist/.

Reddit hosts some subreddits devoted to goals similar to those of ArchiveTeam, including /r/AbandonedWebsites, /r/ForgottenWebsites, and /r/DataHoarder, which are worth checking for material to add to ArchiveBot or that would otherwise benefit from the attention of the team.

Vital signs

  • Appears stable, though a small to medium size team is a concern.
  • 2015-06-10: The admins banned several subreddits, claiming they were harassing people, the most notable of which was /r/fatpeoplehate. This has instilled some fear, uncertainty, and doubt in part of the userbase, with a few claiming that reddit will soon become what Digg is now: nearly dead.
  • Extremely endangered - many subreddits were picketing after the firing of a reddit employee named Victoria by turning themselves private or restricting submissions.
  • 'Caution' - Reddit seems to have calmed down and returned to normal functionality after Ellen Pao's resignation, and the Reddit team is making serious reforms (reducing shadowbanning, more mod tools). However, the revolt left unresolved issues and sour grapes within the community, and it seems Reddit was only saved by the lack of a practical alternative (Voat.co was crushed and went offline due to floods of refugees). It would be wise to preemptively archive the site before another crisis occurs.
  • On July 3rd, 2015, Jason Baumgartner completed his 14-month effort to archive Reddit's entire publicly available textual content, just in time before the onset of the Reddit revolt. The archive is still updated monthly. The files are available at http://files.pushshift.io/reddit/. However, images and videos hosted by Reddit are not archived.
  • In 2017-2018, Reddit banned several subreddits including r/incels and r/maleforeveralone, which had tens of thousands of subscribers each. Other subreddits including r/Braincels, r/foreveralone, r/TheRedPill and r/MGTOW are endangered. Discussions and petitions about banning those subreddits are currently taking place (see https://babe.net/2018/03/07/incel-40474 and https://www.reddit.com/r/IncelTears/comments/83irsc/why_isnt_rbraincels_banned_yet/).
  • In 2018, a new, redesigned website became the default version of Reddit. This redesigned version has numerous usability issues. It heavily relies on JS and is essentially uncrawlable without dedicated code. The pre-redesign version of Reddit continues to be available at old.reddit.com.
  • In March 2019, /r/watchpeopledie, /r/Gore, and some other subs were banned after the Christchurch shooting – this was clearly not due to the video recording of that shooting getting shared (that was forbidden on WPD at least) but due to the negative press coverage, just like for previous bans.
  • Also in March 2019, Reddit's legal team threatened /r/Piracy with a ban, alleging that the mods were doing too little against copyright infringement.[1]
  • Reddit has quarantined manosphere subreddits including /r/Braincels and /r/TheRedPill. /r/Braincels was banned on October 30, 2019.
  • Users began to spot in December 2019 that comment threads, at least on the "new" version of the site, were being locked behind a registration wall in an apparent A/B test.[2][3]
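
The old.reddit.com fallback noted in the list above lends itself to a small URL rewrite before a page is handed to a crawler. The following Python sketch is illustrative only: the function name `to_old_reddit` and the set of hostnames treated as the redesigned site are assumptions, and it is not taken from reddit-grab.

```python
from urllib.parse import urlsplit, urlunsplit

# Hostnames assumed to serve the redesigned, JS-heavy site (an assumption).
REDESIGN_HOSTS = {"reddit.com", "www.reddit.com"}


def to_old_reddit(url: str) -> str:
    """Return the same page on old.reddit.com; leave other hosts untouched."""
    parts = urlsplit(url)
    if parts.netloc.lower() not in REDESIGN_HOSTS:
        return url
    # Keep path, query string, and fragment; force HTTPS and swap the host.
    return urlunsplit(
        ("https", "old.reddit.com", parts.path, parts.query, parts.fragment)
    )
```

For example, `to_old_reddit("https://www.reddit.com/r/ArchiveTeam/")` points at the same subreddit on old.reddit.com, while URLs on other hosts such as i.reddit.com are returned unchanged.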

Textual Archive (Without Images or Videos)

On July 3rd, 2015, Jason Baumgartner completed his 14-month effort to archive Reddit's entire publicly available textual content, just in time before the onset of the Reddit revolt. The archive is still being updated monthly. The files are available at http://files.pushshift.io/reddit/.

The scripts used to generate this API dump were not made public, but they likely used PRAW, and it would probably be better to rewrite them from scratch.

Also, this only preserves textual submissions and comments. No images or videos hosted on Reddit are archived. Sidebar, wiki, and live thread data are not retrieved either, so these should be scraped in an expansion pack.

API

Jason Baumgartner also provides an API for accessing Reddit's textual archive, documented at https://github.com/pushshift/api. The archive is updated in real-time. This API does not have the limitations of Reddit's API. For example, it does not impose limits on the number of submissions or comments that are retrieved.

To search for submissions of a subreddit (500 limit):

https://api.pushshift.io/reddit/search/submission/?subreddit=Archiveteam&size=500

To retrieve all comments for a submission (with tens of thousands of comments):

https://api.pushshift.io/reddit/submission/comment_ids/6uey5x

Note that posts are archived in real-time after they are created. Newer versions of edited posts are not archived. One may have to re-fetch the content on Reddit's site to get the latest revision of an edited post.

One may also have to fetch images and videos separately, as the API does not archive them.
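
As a minimal sketch, the two request URLs shown in this section can be built programmatically. The helper names below are hypothetical, and only the endpoints and parameters that appear above (`subreddit`, `size`, `comment_ids`) are used. Fetching and response parsing are left out because the response format is not described here.

```python
from urllib.parse import urlencode

PUSHSHIFT_BASE = "https://api.pushshift.io/reddit"


def submission_search_url(subreddit: str, size: int = 500) -> str:
    """Build the submission-search URL for one subreddit (hypothetical helper)."""
    query = urlencode({"subreddit": subreddit, "size": size})
    return f"{PUSHSHIFT_BASE}/search/submission/?{query}"


def comment_ids_url(submission_id: str) -> str:
    """Build the URL that lists the comment IDs of one submission."""
    return f"{PUSHSHIFT_BASE}/submission/comment_ids/{submission_id}"
```

Calling `submission_search_url("Archiveteam")` and `comment_ids_url("6uey5x")` reproduces the two example URLs given above.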

Data liberation

As of March 26, 2013, users can only see up to 1,000 posts and comments on a profile page. However, admin "spladug" stated that older comments and posts are still in the database. spladug also said that the team is in favor of offering dumps of a user's data, but that the task would be taxing on the servers. Since this comment was posted, there appears to have been no progress on a dump system. Archiving would be nearly impossible using the old-fashioned way (without wget) if things do wind up FUBAR in the future because of this limitation.

Instead, any archival method should pull from the Reddit API (a full run would take several months). The API exposes all nested comments, including those that the HTML pages do not show. In addition, it significantly reduces server load.
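
To illustrate why the API suits nested comments, here is a minimal Python sketch that walks the comment tree of a thread fetched as JSON. It assumes the listing layout that Reddit's JSON output commonly uses: a `Listing` whose `children` are `t1` comments with a `replies` field, plus `more` placeholders for comments that have not been expanded. The function names and the sample data are hypothetical.

```python
from typing import Any, Dict, Iterator, List, Tuple

Listing = Dict[str, Any]


def walk_comments(listing: Listing, depth: int = 0) -> Iterator[Tuple[int, Dict[str, Any]]]:
    """Yield (depth, comment_data) for every expanded comment, depth-first."""
    for child in listing.get("data", {}).get("children", []):
        if child.get("kind") != "t1":
            continue  # "more" placeholders are collected by more_ids() instead
        data = child["data"]
        yield depth, data
        replies = data.get("replies")
        # A comment without replies carries an empty string here, not a Listing.
        if isinstance(replies, dict):
            yield from walk_comments(replies, depth + 1)


def more_ids(listing: Listing) -> List[str]:
    """Collect IDs of comments that still need a follow-up request."""
    ids: List[str] = []
    for child in listing.get("data", {}).get("children", []):
        if child.get("kind") == "more":
            ids.extend(child["data"].get("children", []))
        elif child.get("kind") == "t1":
            replies = child["data"].get("replies")
            if isinstance(replies, dict):
                ids.extend(more_ids(replies))
    return ids


# Hypothetical sample: one top-level comment, one reply, two unexpanded replies.
SAMPLE: Listing = {
    "kind": "Listing",
    "data": {"children": [
        {"kind": "t1", "data": {"id": "a", "body": "top", "replies": {
            "kind": "Listing",
            "data": {"children": [
                {"kind": "t1", "data": {"id": "b", "body": "reply", "replies": ""}},
                {"kind": "more", "data": {"children": ["c", "d"]}},
            ]},
        }}},
    ]},
}
```

An archiver would store every comment yielded by `walk_comments` and queue the IDs from `more_ids` for later retrieval, so that replies hidden behind "load more" links are not lost.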

To comply with the EU GDPR, Reddit was forced to make progress, and the site now has a data request form (https://www.reddit.com/settings/data-request). Users can specify that they want a copy of all of their data, or data from specific date ranges. The site says requests may take up to 30 days to be processed.

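Gallery

  • Reddit home page 2013-03-26.png: reddit home page as seen on March 26, 2013, using the "old" version of the site still available today when logged in or through old.reddit.com

Lists

  • List of Reddit subs by country and territory
  • List of Reddit subs by language

Potentially endangered subreddits

  • https://old.reddit.com/r/WatchRedditDie/ | Anti-Reddit
  • https://old.reddit.com/r/opendirectories/ | Piracy
  • https://old.reddit.com/r/DeadorVegetable/ | Gore and death
  • https://old.reddit.com/r/FiftyFifty/ | Gore and death
  • https://old.reddit.com/r/Piracy/ | Piracy

References

  1. https://old.reddit.com/r/Piracy/comments/b28d9q/rpiracy_has_received_a_notice_of_multiple/
  2. https://news.ycombinator.com/item?id=21780092
  3. https://old.reddit.com/r/mobileweb/comments/e7yivg/join_reddit_to_keep_reading_an_account_is_now/?sort=top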


