Difference between revisions of "Yahoo! Groups"

From Archiveteam
(Adding channel)
 
(35 intermediate revisions by 6 users not shown)
Line 1: Line 1:
 
{{Infobox project
 
{{Infobox project
 
| title = Yahoo! Groups
 
| title = Yahoo! Groups
| url = http://groups.yahoo.com/
+
| url = https://groups.yahoo.com/
 
| image = groups-yahoo-com.png
 
| image = groups-yahoo-com.png
 
| logo = yahoo-groups-logo.png
 
| logo = yahoo-groups-logo.png
| project_status = {{online}}
+
| project_status = {{closing}}
 
| archiving_status = {{inprogress}}
 
| archiving_status = {{inprogress}}
 
| irc = yahoosucks
 
| irc = yahoosucks
 
}}
 
}}
  
'''Yahoo! Groups''' is Yahoo's email service; it's the result of the acquisition of eGroups and some other Yahoo! stuff.
+
'''Yahoo! Groups''' is Yahoo's combination mailing list service/web forum; it's the result of the acquisition of eGroups and some other Yahoo! stuff. In addition to archives of and a web interface for mailing lists, it offers file uploads, photo uploads, links, polls, and an events calendar.
 +
 
 +
Uploading of new content will be disabled 28 October 2019, and all content, including message history, will be deleted 14 December 2019.<ref>https://help.yahoo.com/kb/groups/SLN31010.html</ref> (The mailing lists themselves will continue to function.)
  
 
It's been stable for a long time (since the late 90s), long enough for some specialised software to be developed to do backups of it. (Not many other websites can say ''that''.)
 
It's been stable for a long time (since the late 90s), long enough for some specialised software to be developed to do backups of it. (Not many other websites can say ''that''.)
  
== Python Yahoo! Group Archiver ==  
+
== Nominating Notable Non-Private Groups for Archival ==
 +
 
 +
Groups can be nominated for archival using [https://tinyurl.com/savegroups this form]. Please note that this form should not be used for groups that require administrator approval to join.
 +
 
 +
== Adding Private Groups to the Public Archive ==
 +
 
 +
Administrators / Moderators can request that their private group (we consider a private group to be one that requires approval for new members) be included in the public archive. Before you do this, please ensure that the members of the group are happy about being part of the public archive.
 +
 
 +
To add the group to the list of private groups to be archived, all you need to do is [https://help.yahoo.com/kb/SLN2567.html send a membership invite] to the email ''archiveteamprivateyahoogroup@gmail.com''. (Note that only group admins can do this). We'll be monitoring that email regularly to accept any membership requests we receive. Once that account is a member, the group should be scheduled to be part of the public archive.
 +
 
 +
Please make sure that when you invite the Archive Team account, you do '''not''' select the ''Add only to mailing list'' option, as this will prevent Archive Team from archiving the group.
 +
== Statistics ==
 +
 
 +
As of 2019-10-16 the [https://groups.yahoo.com/neo/dir directory] lists 5619351 groups. 2752112 of them have been discovered. 1483853 (54%) have public message archives with an estimated number of 2.1 billion messages (1389 messages per group on average so far). 1.8 billion messages (86%) have been archived as of 2018-10-28.
 +
 
 +
The following graphs are slightly outdated:
 +
 
 +
[[File:Yahoo_groups_date_created.png‎]]
 +
[[File:Yahoo_groups_messages_per_group.png‎]]
 +
[[File:Yahoo_groups_post_date.png‎]]
 +
 
 +
== Private groups of interest ==
 +
 
 +
{| class="wikitable"
 +
! Group
 +
! Notes
 +
! Admin consent?
 +
|-
 +
| [https://groups.yahoo.com/neo/groups/numberactivation/info numberactivation]
 +
| see all [https://reclaimthenet.org/ofcom-oftel-uk-phone-numbers-yahoo-groups/ the] [https://www.axios.com/yahoo-groups-ofcom-cell-phone-number-porting-51949f81-446e-4b4b-82eb-26790146e9a0.html press] [https://techupdatess.com/some-of-the-uks-phone-number-infrastructure-relies-on-yahoo-groups-the-verge/ coverage]
 +
| Not yet contacted; [https://www.whatdotheyknow.com/request/all_data_held_in_yahoo_groups_us FOI request] made
 +
|-
 +
| [https://groups.yahoo.com/neo/groups/hpslash/info hpslash]
 +
| see [https://fanlore.org/wiki/Hpslash_%28mailing_list%29 Fanlore page]
 +
| Not yet contacted
 +
|}
 +
 
 +
Potentially relevant: [https://fanlore.org/wiki/Category:Yahoo!_Groups List of groups with Fanlore pages] (contains both private and public groups), [https://archivetransyahoo.noblogs.org/list-of-known-trans-groups/ Archive Trans Yahoo's list] (all private at last check)
 +
 
 +
== Site structure ==
 +
 
 +
There’s a convenient JSON API. Some endpoints require logged-in group membership or other permissions (depending on group settings).
 +
 
 +
===Groups===
 +
 
 +
* https://groups.yahoo.com/api/v1/search/groups (search)
 +
:- Known params: maxHits, offset, query, sortBy (values: OLDEST, RELEVANCE, MEMBERS, LATEST_ACTIVITY, NEWEST)
 +
 
 +
* https://groups.yahoo.com/api/v1/dir/categories/0/ (list of subcategories and discoverable groups under the root)
 +
:- Known params: start, intlCode (ar, au, br, ca, cf, de, e1, es, fr, hk, in, it, mx, ph, sg, uk, us...; must be supplied in lower-case, the given list is likely not exhaustive, we'd love to know about more!)
 +
:- Pagination: Page size is 10. Does ''not'' have a count param. start is the result index, not the group id. start values 500 and up all return the same set of results.
 +
: Groups in subcategories can be listed by swapping '0' for the subcategory id (the full "idList" value is not required). There is a /1/ with a small number of groups. Defaults to the English directory tree; other languages' directories can be accessed using the intlCode parameter (including at the /0/ node).
  
The [https://github.com/csaftoiu/yahoo-groups-backup yahoo-groups-backup] is a Python script which allows a scraping of the group. So far only messages are scraped. It puts all the info and metadata (both rendered message body and raw email) into a Mongo database, and provides a script to dump a static version of the site that can be read off of the filesystem. It works with Neo and with private groups by clunkily using Selenium to do the scraping.
+
* https://groups.yahoo.com/api/v1/groups/concatenative/ (specific group information)
  
Another Python-based Archiver is [https://github.com/andrewferguson/YahooGroups-Archiver YahooGroups-Archiver], which is a simple Python script to dump the messages into individual JSON files. No further processing of the messages is done to preserve them in the format Yahoo uses for displaying them. Private groups can be archived by providing the contents of two cookies that Yahoo uses to verify a logged-in user.
+
===Messages===
  
== Perl Yahoo! Group Archiver ==
+
* https://groups.yahoo.com/api/v1/groups/concatenative/messages (list)
 +
:- Known params: count, start, sortOrder (ASC, DESC), direction (1, -1)
 +
:- Pagination: Page size defaults to 10, with no known limit. start is the message id, not the result index. sortOrder adjusts the order of results in the json response's array, whereas direction determines which way to iterate through ids from start (default: DESC, -1).
  
Update: Apparently since Yahoo! Groups changed to the neo interface the script no longer functions and is no longer actively maintained.
+
* https://groups.yahoo.com/api/v1/groups/concatenative/messages/1/ (specific message)
 +
* https://groups.yahoo.com/api/v1/groups/concatenative/messages/1/raw (specific message, raw content including headers)
 +
: Some messages may have encoding issues.<ref>https://yahoo.uservoice.com/forums/209451-us-groups/suggestions/9644478-displaying-raw-messages-is-not-8-bit-clean</ref> Sometimes (as in the linked case) the non-raw endpoint has the correct characters, sometimes it does not; this is likely related to the originating email client.
  
<s>The [http://sourceforge.net/projects/grabyahoogroup/ Yahoo Group Archiver] is a Perl script which allows an export of "the messages (without the attachments), everything from the files section and all the images from the photo section along with their hierarchy on Yahoo".  
+
* https://groups.yahoo.com/api/v1/groups/concatenative/history (calendar summary)
 +
:- Known params: ts, tz, chrome
 +
:- Redundancy: Generatable from /messages data.
  
It appears that, if you get the "Couldn't get message count" error when trying to use it, the solution is to edit the yahoo2maildir.pl file and replace the bottom line <code>my $url = $HTTP::URI_CLASS->new($redirect, $base)->abs($base);</code> (under the heading <code>sub GetJSRedirect</code>) with <code><nowiki>my $url = "http://groups.yahoo.com/group/$group/messages/$begin_msgid"; </nowiki></code>
+
===Topics===
  
More frustratingly, it appears that Yahoo blocks your IP temporarily after hitting some invisible limit of data downloaded (the Archiver will continue to "download" messages for a bit, ending up with a bunch of 0-byte files, then stop completely). It's unknown if there is a solution.  
+
* https://groups.yahoo.com/api/v1/groups/concatenative/topics (list)
 +
:- Known params: count, startTopicId, sortOrder (ASC, DESC), direction (1, -1)
 +
:- Pagination: Page size defaults to 25, with a limit of 100. sortOrder and direction as for messages.
  
Also: sometimes, some of the downloaded messages, in the middle of an otherwise normal batch, are 0 in size - almost as if Yahoo blocked your IP for a few seconds, then stopped. Watch out for these so that you can re-download them later.</s>
+
* https://groups.yahoo.com/api/v1/groups/concatenative/topics/1 (specific topic)
 +
:- Known params: maxResults.
 +
:- Pagination: Page size defaults to 30 (messages in topic), with no known limit (maximum tested: 57). No known start param.
 +
:- Redundancy: Generatable from /messages data.
 +
: The "messages" field is an array, each element of which seems to have the same contents as the corresponding /messages/<id>/ (non-raw) endpoint; the metadata ("totalMsgInTopic", "prevTopicId", "nextTopicId") could be reconstructed. It is not known whether a message can fail to be associated with any topic.
  
== Site Structure ==
+
===Attachments===
  
There’s a convenient JSON API. May require logging in and joining a group to use all endpoints:
+
* https://groups.yahoo.com/api/v1/groups/a_furrys_world/attachments (list)
 +
:- Known params: count, start, sort (TITLE, TIME), order (ASC, DESC)
 +
:- Pagination: Page size defaults to 20, with no known limit (maximum tested: 93).
  
* Group Information: https://groups.yahoo.com/api/v1/groups/concatenative/
+
* https://groups.yahoo.com/api/v1/groups/<groupname>/attachments/<attachmentId> (specific attachment)
* List of Messages: https://groups.yahoo.com/api/v1/groups/concatenative/messages?count=100
+
Attachments may be of several types: photo, file, ...?
* Specific Message: https://groups.yahoo.com/api/v1/groups/concatenative/messages/1/
 
* Raw Message Content: https://groups.yahoo.com/api/v1/groups/concatenative/messages/1/raw – note that there seems to be a [https://yahoo.uservoice.com/forums/209451-us-groups/suggestions/9644478-displaying-raw-messages-is-not-8-bit-clean message encoding problem]
 
* List of Topics: https://groups.yahoo.com/api/v1/groups/concatenative/topics?count=100
 
* Specific Topic: https://groups.yahoo.com/api/v1/groups/concatenative/topics/1
 
* List of Tables: https://groups.yahoo.com/api/v1/groups/a_furrys_world/database
 
* Specific Table: https://groups.yahoo.com/api/v1/groups/a_furrys_world/database/1/
 
* Table Content: https://groups.yahoo.com/api/v1/groups/a_furrys_world/database/1/records
 
* List of Files: https://groups.yahoo.com/api/v1/groups/a_furrys_world/files
 
* List of Attachments: https://groups.yahoo.com/api/v1/groups/a_furrys_world/attachments
 
* List of Polls: https://groups.yahoo.com/api/v1/groups/a_furrys_world/polls?count=100
 
* Specific Poll: https://groups.yahoo.com/api/v1/groups/a_furrys_world/polls/3549106
 
* List of Photos: https://groups.yahoo.com/api/v1/groups/a_furrys_world/photos
 
* List of Albums: https://groups.yahoo.com/api/v1/groups/a_furrys_world/albums
 
* Specific Album: https://groups.yahoo.com/api/v1/groups/a_furrys_world/albums/1841906391
 
* List Moderators: https://groups.yahoo.com/api/v1/groups/a_furrys_world/members/moderators
 
* Members With Incorrect Emails: https://groups.yahoo.com/api/v1/groups/a_furrys_world/members/bouncing
 
* List of Links: https://groups.yahoo.com/api/v1/groups/a_furrys_world/links
 
* Search: https://groups.yahoo.com/api/v1/search/groups?offset=0&maxHits=20&sortBy=&query=abcdef – sort can be one of OLDEST, RELEVANCE, MEMBERS, LATEST_ACTIVITY, NEWEST
 
* Categories: https://groups.yahoo.com/api/v1/dir/categories/0/?start=0
 
  
Note that all paginated responses are limited to the first 500 results and do not return anything new beyond that.
+
===Files===
  
== Statistics ==
+
* https://groups.yahoo.com/api/v2/groups/a_furrys_world/files (list)
 +
:- Known params: sfpath (pass in a pathURI to retrieve the file listings of this subdirectory)
 +
:- Pagination: None.
 +
: Entries with "type" 0 are files; 1, directories.
 +
 
 +
===Photos===
 +
 
 +
* https://groups.yahoo.com/api/v3/groups/a_furrys_world/photos (list of photos)
 +
:- Known params: count, start, orderBy (MTIME), sortOrder (ASC, DESC), ownedByMe (TRUE, FALSE), lastFetchTime, photoFilter (ALL, PHOTOS_WITH_EXIF "Originals", PHOTOS_WITHOUT_EXIF "Shared")
 +
:- Pagination: Page size defaults to 20, with no known limit.
 +
: "totalPhotos" field in response gives total in group.
 +
 
 +
* https://groups.yahoo.com/api/v3/groups/a_furrys_world/albums (list of albums)
 +
:- Known params: count, start, albumType (PHOTOMATIC, NORMAL), orderBy (MTIME, TITLE), sortOrder (ASC, DESC)
 +
:- Pagination: Page size defaults to 12, with no known limit.
 +
: albumType defaults to NORMAL. PHOTOMATIC albumType requires the "READ" permission for "ATTACHMENTS". "total" field in response gives total number of albums of the selected type in group; however, this seems to have an off-by-one error for the NORMAL type of albums.
 +
 
 +
* https://groups.yahoo.com/api/v3/groups/a_furrys_world/albums/1841906391 (specific album)
 +
:- Observed parameters similar to photos and albums endpoints, with additional ordinal sortOrder option
 +
: Photomatic albums ''must'' be loaded with the albumType parameter set to PHOTOMATIC.
 +
 
 +
===Links===
 +
 
 +
* https://groups.yahoo.com/api/v1/groups/a_furrys_world/links (list)
 +
:- Known params: linkdir
 +
:- Pagination: None.
 +
: linkdir takes the folder parameter from a dir. Nested folders should be joined with '/'. You need to keep track of the path to a given folder yourself (eg, linkdir + '/' + folder).
 +
 
 +
===Polls===
 +
 
 +
* https://groups.yahoo.com/api/v1/groups/relationship-poll/polls (list)
 +
:- Known params: count, start
 +
:- Pagination: Page size defaults to 10, with no known limit. There is no "total" field in the response.
 +
 
 +
* https://groups.yahoo.com/api/v1/groups/a_furrys_world/polls/3549106 (specific poll)
 +
: Polls return all votes cast, non-anonymised, including identifying metadata for all viewers.
 +
 
 +
===Databases===
 +
 
 +
* https://groups.yahoo.com/api/v1/groups/a_furrys_world/database (list of tables)
 +
* https://groups.yahoo.com/api/v1/groups/a_furrys_world/database/1/ (specific table)
 +
* https://groups.yahoo.com/api/v1/groups/a_furrys_world/database/1/records (table contents)
 +
:- Pagination: None.
 +
 
 +
* https://groups.yahoo.com/neo/groups/groupname/database/1/records/export (export target)
 +
:- Known params: format (CSV, TSV)
 +
 
 +
===Members===
 +
 
 +
* https://groups.yahoo.com/api/v1/groups/iswipe/members/confirmed (list of confirmed members)
 +
:- Known params: count, start, sortBy, sortOrder, ts, tz, chrome.
 +
:- Pagination: Page size defaults to 10, with a limit of 100. No known limit on total results.
 +
: May be blocked for normal members (as may all the other members endpoints). Includes moderators and bouncing members, with identifying metadata.
 +
* https://groups.yahoo.com/api/v1/groups/iswipe/members/moderators (list of moderators)
 +
* https://groups.yahoo.com/api/v1/groups/iswipe/members/bouncing (list of bouncing members)
 +
* https://groups.yahoo.com/api/v1/groups/iswipe/members/suspended (list of suspended members)
 +
: Very often (always?) blocked for normal members.
 +
* https://groups.yahoo.com/api/v1/groups/iswipe/members/banned (list of banned members)
 +
: Very often (always?) blocked for normal members.
 +
 
 +
===Events===
 +
 
 +
Overlaps with Yahoo Calendar API, check yahoo-group-archiver code.
 +
 
 +
== Python Yahoo! Group archivers ==
 +
 
 +
* [https://github.com/IgnoredAmbience/yahoo-group-archiver/network/members yahoo-group-archiver] scrapes a group using the JSON API and (for private endpoints) the two cookies Yahoo uses to verify a logged-in user. <s>Relevant forks include [https://github.com/Frankkkkk/yahoo-group-archiver Frankkkkk] and [https://github.com/nsapa/yahoo-group-archiver nsapa]. Needs merging. Various branches have support (largely untested) for file attachments, photos, links, folders, and events.</s> Most stuff has been merged back into IgnoredAmbience's master. (Exceptions: full WARC support?, mtime work from Frankkkkk.) Needs consistent/WARC-appropriate handling for random 500 errors (require retries), attachment 404s (appear permanent), and 502 permissions errors (definitely permanent, currently halt script).
  
As of 2017-07-16 the [https://groups.yahoo.com/neo/dir directory] lists 5599562 groups. 2752112 of them have been discovered. 1483853 (54%) have public message archives with an estimated number of 2.1 billion messages (1389 messages per group on average so far). 1.8 billion messages (86%) have been archived as of 2018-10-28.
+
* [https://github.com/andrewferguson/YahooGroups-Archiver YahooGroups-Archiver] is similar, but scrapes only messages (not files or any other data). It is not currently under active development.
  
The following graphs are slightly outdated:
+
* [https://github.com/csaftoiu/yahoo-groups-backup yahoo-groups-backup] scrapes a group using Selenium, storing message info and metadata (both rendered message body and raw email) into a Mongo database. It also provides a script to dump its data to static HTML pages that can be viewed in the browser.
  
[[File:Yahoo_groups_date_created.png‎]]
+
== Other archivers ==
[[File:Yahoo_groups_messages_per_group.png‎]]
 
[[File:Yahoo_groups_post_date.png‎]]
 
  
== Software for backups ==
+
* [https://sourceforge.net/projects/grabyahoogroup/ Yahoo Group Archiver]: Perl, defunct.
* [http://sourceforge.net/projects/grabyahoogroup/ Yahoo Group Archiver], Sourceforge
+
* [http://www.personalgroupware.com/ PGOffline]: Windows, proprietary. 14-day free trial, after which download and export is disabled (but view still works). Includes attachments. Stores data in a SQLite database internally.
 +
* [http://yahoogroupedia.pbworks.com/w/page/93006447/Chrome%20Application%20To%20Download%20Messages Yahoo Messages Export]: Chrome extension. Messages only. Saves as mbox.
  
 
== External Links ==
 
== External Links ==
  
* https://archive.org/details/yahoo_groups
+
* https://archive.org/details/archiveteam_yahoogroups
 +
 
 +
== Coverage ==
 +
 
 +
* https://www.usatoday.com/story/tech/talkingtech/2019/10/17/yahoo-groups-online-forum-shutdown/4007150002/
  
 
== References ==
 
== References ==

Latest revision as of 02:03, 11 November 2019

Yahoo! Groups
URL: https://groups.yahoo.com/
Project status: Closing
Archiving status: In progress...
Project source: Unknown
Project tracker: Unknown
IRC channel: #yahoosucks
Project lead: Unknown

Yahoo! Groups is Yahoo's combination mailing list service/web forum; it's the result of the acquisition of eGroups and some other Yahoo! stuff. In addition to archives of and a web interface for mailing lists, it offers file uploads, photo uploads, links, polls, and an events calendar.

Uploading of new content will be disabled 28 October 2019, and all content, including message history, will be deleted 14 December 2019.[1] (The mailing lists themselves will continue to function.)

It's been stable for a long time (since the late 90s), long enough for some specialised software to be developed to do backups of it. (Not many other websites can say that.)

Nominating Notable Non-Private Groups for Archival

Groups can be nominated for archival using this form. Please note that this form should not be used for groups that require administrator approval to join.

Adding Private Groups to the Public Archive

Administrators / Moderators can request that their private group (we consider a private group to be one that requires approval for new members) be included in the public archive. Before you do this, please ensure that the members of the group are happy about being part of the public archive.

To add the group to the list of private groups to be archived, all you need to do is send a membership invite to the email address archiveteamprivateyahoogroup@gmail.com. (Note that only group admins can do this.) We'll be monitoring that address regularly to accept any membership invites we receive. Once that account is a member, the group should be scheduled to be part of the public archive.

Please make sure that when you invite the Archive Team account, you do not select the Add only to mailing list option, as this will prevent Archive Team from archiving the group.

Statistics

As of 2019-10-16 the directory lists 5619351 groups. 2752112 of them have been discovered. 1483853 (54%) have public message archives with an estimated number of 2.1 billion messages (1389 messages per group on average so far). 1.8 billion messages (86%) have been archived as of 2018-10-28.
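The percentages quoted above can be cross-checked with a little arithmetic. The sketch below assumes that the 54% figure is relative to the discovered groups (not the directory total) and that the 2.1 billion estimate is the per-group average multiplied by the number of groups with public archives; neither assumption is stated outright in the text.

```python
# Figures quoted in the statistics paragraph.
DISCOVERED_GROUPS = 2_752_112
GROUPS_WITH_PUBLIC_ARCHIVES = 1_483_853
AVG_MESSAGES_PER_GROUP = 1389      # "on average so far"
ESTIMATED_TOTAL_ROUNDED = 2.1e9    # rounded estimate from the text
ARCHIVED_MESSAGES = 1.8e9          # rounded figure from the text


def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Share of discovered groups that have public message archives.
public_share = percent(GROUPS_WITH_PUBLIC_ARCHIVES, DISCOVERED_GROUPS)

# Estimated message count, assuming it is average * number of public groups.
estimated_messages = GROUPS_WITH_PUBLIC_ARCHIVES * AVG_MESSAGES_PER_GROUP

# Share of the estimated messages that have been archived.
archived_share = percent(ARCHIVED_MESSAGES, ESTIMATED_TOTAL_ROUNDED)

print(public_share, round(estimated_messages / 1e9, 1), archived_share)
```

Under those assumptions the three quoted figures (54%, 2.1 billion, 86%) are consistent with one another.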

The following graphs are slightly outdated:

[Figures: groups by date created; messages per group; messages by post date]

Private groups of interest

Group | Notes | Admin consent?
numberactivation | see all the press coverage | Not yet contacted; FOI request made
hpslash | see Fanlore page | Not yet contacted

Potentially relevant: List of groups with Fanlore pages (contains both private and public groups), Archive Trans Yahoo's list (all private at last check)

Site structure

There’s a convenient JSON API. Some endpoints require logged-in group membership or other permissions (depending on group settings).

Groups

  • https://groups.yahoo.com/api/v1/search/groups (search)
- Known params: maxHits, offset, query, sortBy (values: OLDEST, RELEVANCE, MEMBERS, LATEST_ACTIVITY, NEWEST)
  • https://groups.yahoo.com/api/v1/dir/categories/0/ (list of subcategories and discoverable groups under the root)
- Known params: start, intlCode (ar, au, br, ca, cf, de, e1, es, fr, hk, in, it, mx, ph, sg, uk, us...; must be supplied in lower-case, the given list is likely not exhaustive, we'd love to know about more!)
- Pagination: Page size is 10. Does not have a count param. start is the result index, not the group id. start values 500 and up all return the same set of results.
Groups in subcategories can be listed by swapping '0' for the subcategory id (the full "idList" value is not required). There is a /1/ with a small number of groups. Defaults to the English directory tree; other languages' directories can be accessed using the intlCode parameter (including at the /0/ node).
  • https://groups.yahoo.com/api/v1/groups/concatenative/ (specific group information)
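The directory pagination rules above (fixed page size of 10, start as a result index, nothing new from start 500 onward) can be sketched as a URL generator. This is only an illustration: the function name and constants are made up for the example, and it assumes that start=490 is the last page worth requesting.

```python
from urllib.parse import urlencode

DIR_URL = "https://groups.yahoo.com/api/v1/dir/categories/{category_id}/"
PAGE_SIZE = 10        # fixed; the endpoint has no count param
START_CEILING = 500   # start values of 500 and up repeat the same results


def directory_page_urls(category_id=0, intl_code=None):
    """Yield one URL per distinct result page of a directory category."""
    for start in range(0, START_CEILING, PAGE_SIZE):
        params = {"start": start}
        if intl_code is not None:
            # intlCode must be supplied in lower-case.
            params["intlCode"] = intl_code.lower()
        yield DIR_URL.format(category_id=category_id) + "?" + urlencode(params)


urls = list(directory_page_urls(0, "UK"))
print(len(urls), urls[0])
```

Because each node exposes at most 500 results, a crawler would need to descend into subcategories (and repeat per intlCode) to discover more groups.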

Messages

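  • https://groups.yahoo.com/api/v1/groups/concatenative/messages (list)
- Known params: count, start, sortOrder (ASC, DESC), direction (1, -1)
- Pagination: Page size defaults to 10, with no known limit. start is the message id, not the result index. sortOrder adjusts the order of results in the JSON response's array, whereas direction determines which way to iterate through ids from start (default: DESC, -1).
  • https://groups.yahoo.com/api/v1/groups/concatenative/messages/1/ (specific message)
  • https://groups.yahoo.com/api/v1/groups/concatenative/messages/1/raw (specific message, raw content including headers)
Some messages may have encoding issues.[2] Sometimes (as in the linked case) the non-raw endpoint has the correct characters, sometimes it does not; this is likely related to the originating email client.
  • https://groups.yahoo.com/api/v1/groups/concatenative/history (calendar summary)
- Known params: ts, tz, chrome
- Redundancy: Generatable from /messages data.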
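Since start on the message list is a message id and not a result index, a client cannot simply add the page size to get the next page. The following is a minimal sketch of one way to walk the list at https://groups.yahoo.com/api/v1/groups/<group>/messages. The helper names are hypothetical, and it assumes start is inclusive, which the notes above do not confirm.

```python
from urllib.parse import urlencode

API_ROOT = "https://groups.yahoo.com/api/v1/groups"


def messages_url(group, start, count=100, direction=1, sort_order="ASC"):
    """Build a message-list URL; start is a message id, not a result index."""
    query = urlencode({
        "count": count,
        "start": start,
        "sortOrder": sort_order,
        "direction": direction,
    })
    return f"{API_ROOT}/{group}/messages?{query}"


def next_start(page_ids, direction=1):
    """Pick the next start id from the message ids seen on the previous page.

    Assumes start is inclusive, so we step one id past the furthest id seen.
    Returns None when the page was empty (nothing left to fetch).
    """
    if not page_ids:
        return None
    return max(page_ids) + 1 if direction == 1 else min(page_ids) - 1


print(messages_url("concatenative", 1))
print(next_start([1, 2, 5]))  # ids may have gaps, so use the ids actually seen
```

Deriving the next start from the ids actually returned, not from arithmetic on count, keeps the walk correct when message ids have gaps.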

Topics

  • https://groups.yahoo.com/api/v1/groups/concatenative/topics (list)
- Known params: count, startTopicId, sortOrder (ASC, DESC), direction (1, -1)
- Pagination: Page size defaults to 25, with a limit of 100. sortOrder and direction as for messages.
  • https://groups.yahoo.com/api/v1/groups/concatenative/topics/1 (specific topic)
- Known params: maxResults.
- Pagination: Page size defaults to 30 (messages in topic), with no known limit (maximum tested: 57). No known start param.
- Redundancy: Generatable from /messages data.
The "messages" field is an array, each element of which seems to have the same contents as the corresponding /messages/<id>/ (non-raw) endpoint; the metadata ("totalMsgInTopic", "prevTopicId", "nextTopicId") could be reconstructed. It is not known whether a message can fail to be associated with any topic.

Attachments

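  • https://groups.yahoo.com/api/v1/groups/a_furrys_world/attachments (list)
- Known params: count, start, sort (TITLE, TIME), order (ASC, DESC)
- Pagination: Page size defaults to 20, with no known limit (maximum tested: 93).
  • https://groups.yahoo.com/api/v1/groups/<groupname>/attachments/<attachmentId> (specific attachment)
Attachments may be of several types: photo, file, ...?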

Files

  • https://groups.yahoo.com/api/v2/groups/a_furrys_world/files (list)
- Known params: sfpath (pass in a pathURI to retrieve the file listings of this subdirectory)
- Pagination: None.
Entries with "type" 0 are files; 1, directories.
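Because the files listing is not paginated and directories are only distinguished by their "type" value, archiving a whole file tree means recursing with sfpath. Here is a rough sketch, assuming each directory entry carries its path under a "pathURI" key (inferred from the sfpath description, not confirmed) and that the caller supplies list_dir, a hypothetical function that fetches a URL and returns the list of entries.

```python
from urllib.parse import urlencode

FILES_URL = "https://groups.yahoo.com/api/v2/groups/{group}/files"
TYPE_FILE, TYPE_DIRECTORY = 0, 1   # values of the "type" field


def files_url(group, sfpath=None):
    """Build the listing URL for the root or for one subdirectory."""
    url = FILES_URL.format(group=group)
    if sfpath is not None:
        url += "?" + urlencode({"sfpath": sfpath})
    return url


def walk_files(group, list_dir, sfpath=None):
    """Recursively yield every file entry below sfpath.

    list_dir(url) is supplied by the caller and must return a list of
    entry dicts; the "pathURI" key on directories is an assumption.
    """
    for entry in list_dir(files_url(group, sfpath)):
        if entry["type"] == TYPE_DIRECTORY:
            yield from walk_files(group, list_dir, entry["pathURI"])
        elif entry["type"] == TYPE_FILE:
            yield entry
```

Passing list_dir in as a parameter keeps authentication, cookies, and error handling out of the traversal logic.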

Photos

  • https://groups.yahoo.com/api/v3/groups/a_furrys_world/photos (list of photos)
- Known params: count, start, orderBy (MTIME), sortOrder (ASC, DESC), ownedByMe (TRUE, FALSE), lastFetchTime, photoFilter (ALL, PHOTOS_WITH_EXIF "Originals", PHOTOS_WITHOUT_EXIF "Shared")
- Pagination: Page size defaults to 20, with no known limit.
"totalPhotos" field in response gives total in group.
  • https://groups.yahoo.com/api/v3/groups/a_furrys_world/albums (list of albums)
- Known params: count, start, albumType (PHOTOMATIC, NORMAL), orderBy (MTIME, TITLE), sortOrder (ASC, DESC)
- Pagination: Page size defaults to 12, with no known limit.
albumType defaults to NORMAL. PHOTOMATIC albumType requires the "READ" permission for "ATTACHMENTS". "total" field in response gives total number of albums of the selected type in group; however, this seems to have an off-by-one error for the NORMAL type of albums.
  • https://groups.yahoo.com/api/v3/groups/a_furrys_world/albums/1841906391 (specific album)
- Observed parameters similar to photos and albums endpoints, with additional ordinal sortOrder option
Photomatic albums must be loaded with the albumType parameter set to PHOTOMATIC.

Links

  • https://groups.yahoo.com/api/v1/groups/a_furrys_world/links (list)
- Known params: linkdir
- Pagination: None.
linkdir takes the folder parameter from a dir. Nested folders should be joined with '/'. You need to keep track of the path to a given folder yourself (e.g., linkdir + '/' + folder).
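The path bookkeeping described above can be sketched as two small helpers. The function names are made up for this example, and whether the server requires the '/' separator to be sent literally or accepts it percent-encoded (as urlencode produces) has not been verified.

```python
from urllib.parse import urlencode

LINKS_URL = "https://groups.yahoo.com/api/v1/groups/{group}/links"


def child_linkdir(parent_linkdir, folder):
    """Join a folder name onto the path that led to it."""
    if not parent_linkdir:
        return folder            # top-level folder: no parent path yet
    return parent_linkdir + "/" + folder


def links_url(group, linkdir=None):
    """Build the links listing URL for the root or for a nested folder."""
    url = LINKS_URL.format(group=group)
    if linkdir:
        url += "?" + urlencode({"linkdir": linkdir})
    return url


path = child_linkdir(child_linkdir(None, "art"), "sketches")
print(path)
print(links_url("a_furrys_world", path))
```

A crawler would carry the accumulated path alongside each folder it queues, since the response for a folder does not report the folder's own full path.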

Polls

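  • https://groups.yahoo.com/api/v1/groups/relationship-poll/polls (list)
- Known params: count, start
- Pagination: Page size defaults to 10, with no known limit. There is no "total" field in the response.
  • https://groups.yahoo.com/api/v1/groups/a_furrys_world/polls/3549106 (specific poll)
Polls return all votes cast, non-anonymised, including identifying metadata for all viewers.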

Databases

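  • https://groups.yahoo.com/api/v1/groups/a_furrys_world/database (list of tables)
  • https://groups.yahoo.com/api/v1/groups/a_furrys_world/database/1/ (specific table)
  • https://groups.yahoo.com/api/v1/groups/a_furrys_world/database/1/records (table contents)
- Pagination: None.
  • https://groups.yahoo.com/neo/groups/groupname/database/1/records/export (export target)
- Known params: format (CSV, TSV)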

Members

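  • https://groups.yahoo.com/api/v1/groups/iswipe/members/confirmed (list of confirmed members)
- Known params: count, start, sortBy, sortOrder, ts, tz, chrome.
- Pagination: Page size defaults to 10, with a limit of 100. No known limit on total results.
May be blocked for normal members (as may all the other members endpoints). Includes moderators and bouncing members, with identifying metadata.
  • https://groups.yahoo.com/api/v1/groups/iswipe/members/moderators (list of moderators)
  • https://groups.yahoo.com/api/v1/groups/iswipe/members/bouncing (list of bouncing members)
  • https://groups.yahoo.com/api/v1/groups/iswipe/members/suspended (list of suspended members)
Very often (always?) blocked for normal members.
  • https://groups.yahoo.com/api/v1/groups/iswipe/members/banned (list of banned members)
Very often (always?) blocked for normal members.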

Events

Overlaps with the Yahoo Calendar API; check the yahoo-group-archiver code.

Python Yahoo! Group archivers

  • yahoo-group-archiver scrapes a group using the JSON API and (for private endpoints) the two cookies Yahoo uses to verify a logged-in user. Work from the Frankkkkk and nsapa forks, including largely untested support for file attachments, photos, links, folders, and events, has mostly been merged back into IgnoredAmbience's master. (Exceptions: full WARC support?, mtime work from Frankkkkk.) It still needs consistent, WARC-appropriate handling for random 500 errors (which require retries), attachment 404s (which appear permanent), and 502 permissions errors (definitely permanent; these currently halt the script).
  • YahooGroups-Archiver is similar, but scrapes only messages (not files or any other data). It is not currently under active development.
  • yahoo-groups-backup scrapes a group using Selenium, storing message info and metadata (both rendered message body and raw email) into a Mongo database. It also provides a script to dump its data to static HTML pages that can be viewed in the browser.
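The error-handling needs listed for yahoo-group-archiver (retry random 500s, treat attachment 404s and 502 permissions errors as permanent and carry on) could look roughly like the sketch below. This is not code from any of the archivers: the names are hypothetical, fetch is a caller-supplied function returning (status, body), and treating every status other than 200 and 500 as permanent is a simplifying assumption.

```python
import time

OK = "ok"
RETRY = "retry"          # transient: try again
PERMANENT = "permanent"  # record the failure and move on, do not halt


def classify(status):
    """Map an HTTP status code to an action, following the notes above."""
    if status == 200:
        return OK
    if status == 500:
        return RETRY
    # Includes attachment 404s and 502 permissions errors (assumption:
    # every other status is treated as permanent as well).
    return PERMANENT


def fetch_with_retries(fetch, url, max_attempts=3, delay_seconds=0.0):
    """Call fetch(url) until it succeeds, fails permanently, or runs out of tries.

    Returns the last (status, body) pair so the caller can log it.
    """
    for attempt in range(1, max_attempts + 1):
        status, body = fetch(url)
        if classify(status) != RETRY or attempt == max_attempts:
            return status, body
        time.sleep(delay_seconds)
```

Returning the final status instead of raising lets the caller record permanent failures and continue with the rest of the group.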

Other archivers

  • Yahoo Group Archiver: Perl, defunct.
  • PGOffline: Windows, proprietary. 14-day free trial, after which download and export are disabled (but viewing still works). Includes attachments. Stores data in a SQLite database internally.
  • Yahoo Messages Export: Chrome extension. Messages only. Saves as mbox.

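External Links

  • https://archive.org/details/archiveteam_yahoogroups

Coverage

  • https://www.usatoday.com/story/tech/talkingtech/2019/10/17/yahoo-groups-online-forum-shutdown/4007150002/

References

  1. https://help.yahoo.com/kb/groups/SLN31010.html
  2. https://yahoo.uservoice.com/forums/209451-us-groups/suggestions/9644478-displaying-raw-messages-is-not-8-bit-clean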


v · t · e         Archive Team
Current events

Alive... OR ARE THEY · Deathwatch · Projects

Archiveteam.jpg
Archiving projects

APKMirror · Archive.is · BetaArchive · Government Backup (#datarefuge · ftp-gov· Gmane · Internet Archive · It Died · Megalodon.jp · OldApps.com · OldVersion.com · OSBetaArchive · TEXTFILES.COM · The Dead, the Dying & The Damned · The Mail Archive · UK Web Archive · WebCite · Vaporwave.me

Blogging

Blog.pl · Blogger · Blogster · Blogter.hu · Freeblog.hu · Fuelmyblog · Jux · LiveJournal · My Opera · Nolblog.hu · Open Diary · ownlog.com · Posterous · Powerblogs · Proust · Roon · Splinder · Tumblr · Vox · Weblog.nl · Windows Live Spaces · Wordpress.com · Xanga · Yahoo! Blog · Zapd

Cloud hosting/file sharing

aDrive · AnyHub · Box · Dropbox · Docstoc · Google Drive · Google Groups Files · iCloud · Fileplanet · LayerVault · MediaCrush · MediaFire · Mega · MegaUpload · MobileMe · OneDrive · Pomf.se · RapidShare · Ubuntu One · Yahoo! Briefcase

Corporations

Apple · IBM · Google · Loblaw · Lycos Europe · Microsoft · Yahoo!

Events

Arab Spring · Great Ape-Snake War · Spanish Revolution

Font Repos

DaFont · Google Web Fonts · GNU FreeFont · Fontspace

Forums/Message boards

4chan · Captain Luffy Forums · College Confidential · DSLReports · ESPN Forums · forums.starwars.com · HeavenGames · Invisionfree · NeoGAF · The Classic Horror Film Board · Yahoo! Messages · Yahoo! Neighbors · Yuku.com

Gaming

Atomicgamer · Bazaar.tf · City of Heroes · Club Nintendo · Counter-Strike: Global Offensive · CS:GO Lounge · Desura · Dota 2 · Dota 2 Lounge · Emulation Zone · ESEA · GameBanana · GameMaker Sandbox · GameTrailers · Halo · HLTV.org · Infinite Crisis · joinDOTA · League of Legends · Liquipedia · Minecraft.net · Player.me · Playfire · Raptr · Steam · SteamDB · Team Fortress 2 · TF2 Outpost · Warhammer · Xfire

Image hosting

500px · AOL Pictures · Blipfoto · Blingee · Canv.as · Camera+ · Cameroid · DailyBooth · Degree Confluence Project · deviantART · Demotivalo.net · Flickr · Fotoalbum.hu · Fotolog.com · Fotopedia · Frontback · Geograph Britain and Ireland · GTF Képhost · ImageShack · Imgh.us · Imgur · Inkblazers · Instagram · Kepfeltoltes.hu · Kephost.com · Kephost.hu · Kepkezelo.com · Keptarad.hu · Madden GIFERATOR · MLKSHK · Microsoft Clip Art · Microsoft Photosynth · Nokia Memories · noob.hu · Odysee · Panoramio · Photobucket · Picasa · Picplz · Pixiv · Portalgraphics.net · PSharing · Ptch · puu.sh · Rawporter · Relay.im · ScreenshotsDatabase.com · Snapjoy · Streetfiles · Tabblo · Tinypic · Trovebox · TwitPic · Wallbase · Wallhaven · Webshots · Wikimedia Commons

Knowledge/Wikis

arXiv · Citizendium · Clipboard.com · Deletionpedia · EditThis · Encyclopedia Dramatica · Etherpad · Everything2 · infoAnarchy · GeoNames · GNUPedia · Google Books (Google Books Ngram· Horror Movie Database · Insurgency Wiki · Knol · Lost Media Wiki · Neoseeker.com · Notepad.cc · Nupedia · OpenCourseWare · OpenStreetMap · Orain · Pastebin · Patch.com · Project Gutenberg · Puella Magi · Referata · Resedagboken · SongMeanings · ShoutWiki · The Internet Movie Database · TropicalWikis · Uncyclopedia · Urban Dictionary · Urban Exploration Resource · Webmonkey · Wikia · Wikidot · WikiHow · Wikkii · WikiLeaks · Wikipedia (Simple English Wikipedia· Wikispaces · Wikispot · Wik.is · Wiki-Site · WikiTravel · Word Count Journal

Magazines/Blogs/News

Cyberpunkreview.com · Game Developer Magazine · Gigaom · Hardware Canucks · Helium · JPG Magazine · Make Magazine · Polygamia.pl · San Fransisco Bay Guardian · Scoop · Regretsy · Yahoo! Voices

Microblogging

Heello · Identi.ca · Jaiku · Mommo.hu · Plurk · Sina Weibo · Twitter · TwitLonger

Music/Audio

AOL Music · Audimated.com · Cinch · digCCmixter · Dogmazic.net · Earbits · exfm · Free Music Archive · Gogoyoko · Indaba Music · Instacast · Jamendo · Last.fm · Music Unlimited · MOG · PureVolume · Reverbnation · ShareTheMusic · SoundCloud · Soundpedia · This Is My Jam · TuneWiki · Twaud.io · WinAmp

People

Aaron Swartz · Michael S. Hart · Steve Jobs · Mark Pilgrim · Dennis Ritchie · Len Sassaman Project

Protocols/Infrastructure

FTP · Gopher · IRC · Usenet · World Wide Web · BitTorrent DHT

Q&A

Askville · Answerbag · Answers.com · Ask.com · Askalo · Baidu Knows · Blurtit · ChaCha · Experts Exchange · Formspring · GirlsAskGuys · Google Answers · Google Baraza · JustAnswer · MetaFilter · Quora · Retrospring · StackExchange · The AnswerBank · The Internet Oracle · Uclue · WikiAnswers · Yahoo! Answers

Recipes/Food

Allrecipes · Epicurious · Food.com · Foodily · Food Network · Punchfork · ZipList

Social bookmarking

Addinto · Backflip · Balatarin · BibSonomy · Bkmrx · Blinklist · BlogMarks · BookmarkSync · CiteULike · Connotea · Delicious · Designer News · Digg · Diigo · Dir.eccion.es · Evernote · Excite Bookmark · Faves · Favilous · folkd · Freelish · Getboo · GiveALink.org · Gnolia · Google Bookmarks · Hacker News · HeyStaks · IndianPad · Kippt · Knowledge Plaza · Licorize · Linkwad · Menéame · Microsoft Developer Network · myVIP · Mister Wong · My Web · Mylink Vault · Newsvine · Oneview · Pearltrees · Pinboard · Pocket · Propeller.com · Reddit · sabros.us · Scloog · Scuttle · Simpy · SiteBar · Slashdot · Squidoo · StumbleUpon · Twine · Vizited · Yummymarks · Xmarks · Yahoo! Buzz · Zootool · Zotero

Social networks

Bebo · BlackPlanet · Classmates.com · Cyworld · Dogster · Dopplr · douban · Ello · Facebook · Flixster · FriendFeed · Friendster · Friends Reunited · Gaia Online · Google+ · Habbo · hi5 · Hyves · iWiW · LinkedIn · Miiverse · mixi · MyHeritage · MyLife · Myspace · myVIP · Netlog · Odnoklassniki · Orkut · Plaxo · Qzone · Renren · Skyrock · Sonico.com · Storylane · Tagged · tvtag · Upcoming · Viadeo · Vine · Vkontakte · WeeWorld · Weibo · Wretch · Yahoo! Groups · Yahoo! Stars India · Yahoo! Upcoming · more sites...

Shopping/Retail

Alibaba · AliExpress · Amazon · Apple Store · Barnes & Noble · DirectCanada · eBay · Kmart · NCIX · Printfection · RadioShack · Sears · Sears Canada · Target · The Book Depository · ThinkGeek · Toys "R" Us · Walmart

Software/code hosting

Android Development · Alioth · Assembla · BerliOS · Betavine · Bitbucket · BountySource · Codecademy · CodePlex · Freepository · Free Software Foundation · GNU Savannah · GitHost · GitHub · GitHub Downloads · Gitorious · Gna! · Google Code · ibiblio · java.net · JavaForge · KnowledgeForge · Launchpad · LuaForge · Maemo · mozdev · OSOR.eu · OW2 Consortium · Openmoko · OpenSolaris · Ourproject.org · Ovi Store · Project Kenai · RubyForge · SEUL.org · SourceForge · Stypi · TestFlight · tigris.org · Transifex · TuxFamily · Yahoo! Downloads

Television/Radio

ABC · Austin City Limits · BBC · CBC · CBS · Computer Chronicles · CTV · Fox · G4 · Global TV · Jeopardy! · NBC · NHK · PBS · Penn & Teller: Bullshit! · The Howard Stern Show · TV News Archive (Understanding 9/11)

Torrenting/Piracy

ExtraTorrent · EZTV · isoHunt · KickassTorrents · The Pirate Bay · Torrentz · Library Genesis

Video hosting

Academic Earth · Bambuser · Blip.tv · Epic · Google Video · Justin.tv · Niconico · Nokia Trailers · Oddshot.tv · Plays.tv · Qwiki · Skillfeed · Stickam · TED Talks · Ticker.tv · Twitch.tv · Ustream · Videoplayer.hu · Viddler · Viddy · Vidme · Vimeo · Vine · Vstreamers · Yahoo! Video · YouTube · Famous Internet videos (Me at the zoo)

Web hosting

Angelfire · Brace.io · BT Internet · CableAmerica Personal Web Space · Claranet Netherlands Personal Web Pages · Comcast Personal Web Pages · Extra.hu · FortuneCity · Free ProHosting · GeoCities (patch) · Google Business Sitebuilder · Google Sites · Internet Centrum · MBinternet · MSN TV · Nifty · Nwnyet · Parodius Networking · Prodigy.net · Saunalahti Iso G · Swipnet · Telenor · Tripod · University of Michigan personal webpages · Verizon Mysite · Verizon Personal Web Space · Webzdarma · Virgin Media

Web applications

Mailman · MediaWiki · phpBB · Simple Machines Forum · vBulletin

Information

A Million Ways to Die on the Web · Backup Tips · Cheap storage · Collecting items randomly · Data compression algorithms and tools · Dev · Discovery Data · DOS Floppies · Fortress of Solitude · Keywords · Naughty List · Nightmare Projects · Rescuing floppy disks · Rescuing optical media · Site exploration · The WARC Ecosystem · Working with ARCHIVE.ORG

Projects

ArchiveCorps · Audit2014 · Emularity · Faceoff · FlickrFckr · Froogle · INTERNETARCHIVE.BAK (Internet Archive Census) · IRC Quotes · JSMESS · JSVLC · Just Solve the Problem · NewsGrabber · Project Newsletter · Valhalla · Web Roasting (ISP Hosting · University Web Hosting) · Woohoo

Tools

ArchiveBot · ArchiveTeam Warrior (Tracker) · Google Takeout · HTTrack · Video downloaders · Wget (Lua · WARC)

Teams

Bibliotheca Anonoma · LibreTeam · URLTeam · Yahoo Video Warroom · WikiTeam

Other

800notes · AOL · Akoha · Ancestry.com · April Fools' Day · Amplicate · AutoAdmit · Bre.ad · Circavie · Cobook · Co.mments · Countdown · Distill · Dmoz · Easel · Eircode · Electronic Frontier Foundation · FanFiction.Net · Feedly · Ficlets · Forrst · FunnyExam.com · FurAffinity · Google Helpouts · Google Moderator · Google Reader · ICQmail · IFTTT · Jajah · JuniorNet · Lulu Poetry · Mobile Phone Applications · Mochi Media · Mozilla Firefox · MyBlogLog · NBII · Neopets · Quantcast · Quizilla · Salon Table Talk · Shutdownify · Slidecast · SOPA blackout pages · starwars.yahoo.com · TechNet · Toshiba Support · USA-Gov · Volán · Widgetbox · Windows Technical Preview · Wunderlist · YTMND · Zoocasa

About Archive Team

Introduction · Philosophy · Who We Are · Our stance on robots.txt · Why Back Up? · Software · Formats · Storage Media · Recommended Reading · Films and documentaries about archiving · Talks · In The Media · FAQ