Difference between revisions of "Yahoo! Groups"

From Archiveteam
(added link to https://github.com/davidferguson/yahoogroups-joiner)
(46 intermediate revisions by 8 users not shown)
Line 1: Line 1:
{{Infobox project
{{Infobox project
| title = Yahoo! Groups
| title = Yahoo! Groups
| url = http://groups.yahoo.com/
| url = https://groups.yahoo.com/
| image = groups-yahoo-com.png
| image = groups-yahoo-com.png
| logo = yahoo-groups-logo.png
| logo = yahoo-groups-logo.png
| project_status = {{closing}}
| project_status = {{closing}}
| archiving_status = {{inprogress}}
| archiving_status = {{inprogress}}
| tracker = [https://tracker.archiveteam.org/yahoogroups/ yahoogroups], [http://tracker-test.ddns.net/yahoo-groups-api/ yahoo-groups-api]
| source = [https://github.com/ArchiveTeam/yahoogroups-grab yahoogroups-grab], [https://github.com/ArchiveTeam/yahoo-group-archiver yahoo-group-archiver]
| irc = yahoosucks
| irc = yahoosucks
}}
}}
Line 11: Line 13:
'''Yahoo! Groups''' is Yahoo's combination mailing list service/web forum; it's the result of the acquisition of eGroups and some other Yahoo! stuff. In addition to archives of and a web interface for mailing lists, it offers file uploads, photo uploads, links, polls, and an events calendar.
'''Yahoo! Groups''' is Yahoo's combination mailing list service/web forum; it's the result of the acquisition of eGroups and some other Yahoo! stuff. In addition to archives of and a web interface for mailing lists, it offers file uploads, photo uploads, links, polls, and an events calendar.


Uploading of new content will be disabled 28 October 2019, and all content, including message history, will be deleted 14 December 2019.<ref>https://help.yahoo.com/kb/groups/SLN31010.html</ref> (The mailing lists themselves will continue to function.)
Uploading of new content was disabled 28 October 2019, and all content, including message history, will be made unavailable on 14 December 2019.<ref>https://help.yahoo.com/kb/groups/SLN31010.html</ref> (The mailing lists themselves will continue to function.) After negative media attention, Yahoo said that they were extending the deadline for users to use their official incomplete data export tool to 31 January 2020<ref>{{URL|https://www.theverge.com/2019/12/10/21004883/yahoo-groups-extend-deadline-download-data-date-time}}</ref><ref>{{URL|https://twitter.com/YahooCare/status/1204312076379926528}}</ref>.


Public groups can be nominated for archival using [https://tinyurl.com/savegroups this form].
Yahoo's data export tool misses a plethora of attachments, databases, polls, photos, and metadata.


It's been stable for a long time (since the late 90s), long enough for some specialised software to be developed to do backups of it. (Not many other websites can say ''that''.)
It had been stable for a long time (since the late 90s), long enough for some specialised software to be developed to do backups of it. (Not many other websites can say ''that''.)
 
'''2019-12-21 Update:'''
* Yahoo has hidden or deleted files/photos/messages from the web interface.
 
* Please continue joining groups with your accounts. Try to limit the number of groups you join per account to hopefully minimize GetMyData (GMD) processing time and limit potential damage due to account bans. Yahoo Groups Joiner extension for Chrome is available at https://github.com/davidferguson/yahoogroups-joiner . Additional info is at https://df58.host.cs.st-andrews.ac.uk/yahoogroups/leaderboard
 
* Please submit your Yahoo GetMyData (GMD) requests as soon as possible. Go to [https://groups.yahoo.com/neo/getmydata this page] and follow the directions to make a request.
It may take 10 days to process a request. When the request comes through, save all the files that come back. Recent GMDs have been split into 2 GB ZIP files.
 
* Be aware that there may be old malware buried inside the ZIP files. Modern email software and operating systems are expected to be mostly resistant to this old malware. '''Some antivirus software may see the malware and may modify or delete the ZIP file. We don't want that to happen, so please make sure that they don't get deleted or modified in any way, except to remove private groups.'''
 
* After you've saved 2+ copies of all your GMD files, consider making a new GMD request. This will give you a second chance to collect group content in case a glitch happened on the first GMD attempt, in case Yahoo changes what content they are including/not-including in the GMDs, or in case you have joined additional groups since the first request.
 
* We are looking to collect public and publicly-shareable group data for submission to Internet Archive. We would appreciate it if you made a GMD request, saved the data, uploaded it to an online file-hosting service, and then sent a link to archiveteamprivateyahoogroup@gmail.com
 
* Instructions on Viewing mbox Files recovered from GMDs in Sylpheed (thanks Doranwen): https://docs.google.com/document/d/1dXeXfY5Huri_8NTUn4hl-iUZq9MMRL1Qbo7bp5YZpmE/edit Sylpheed can be found here:  https://sylpheed.sraoss.jp/en/download.html Mozilla Thunderbird can also be used to view mbox files.
Source: https://yahoo-geddon.tumblr.com/post/189779155159/sylpheed-tutorial
 
* Content Upload Instructions for both #yahoosucks and the #pythons-attack-y! efforts: See https://codeshare.io/5QJbBm
 
== Communication Centers ==
 
* ArchiveTeam - EFNet #yahoosucks and #pythons-attack-y!. Also try #archiveteam-bs if there is no answer in those channels
* Yahoo Groups Fandom Rescue Project - Yahoo-Gedden Discord Channel: https://discord.gg/DyCNddf
* Yahoo Groups Fandom Rescue Project - Media inquiries: archiver1.fandom@gmail.com
* Mods and Members - https://modsandmembersblog.wordpress.com/, https://mmsanctuary.groups.io/g/main, and https://twitter.com/featheredleader
* https://twitter.com/textfiles/status/1184461099237814273
* https://twitter.com/textfiles/status/1203857144346546176
* https://twitter.com/hashtag/yahoosucks?f=live&vertical=default
 
== Nominating Notable Non-Private Groups for Archival ==
 
Groups can be nominated for archival using [https://tinyurl.com/savegroups this form]. Please note that this form should not be used for groups that require administrator approval to join.
 
== Adding Private Groups to the Public Archive ==
 
Administrators / Moderators can request that their private group (we consider a private group to be one that requires approval for new members) be included in the public archive. Before you do this, please ensure that the members of the group are happy about being part of the public archive.
 
To add the group to the list of private groups to be archived, all you need to do is [https://help.yahoo.com/kb/SLN2567.html send a membership invite] to the email ''archiveteamprivateyahoogroup@gmail.com''. (Note that only group admins can do this). We'll be monitoring that email regularly to accept any membership requests we receive. Once that account is a member, the group should be scheduled to be part of the public archive.
 
Please make sure that when you invite the Archive Team account, you do '''not''' select the ''Add only to mailing list'' option, as this will prevent Archive Team from archiving the group.


== Statistics ==
== Statistics ==
Line 35: Line 78:
|-
|-
| [https://groups.yahoo.com/neo/groups/numberactivation/info numberactivation]
| [https://groups.yahoo.com/neo/groups/numberactivation/info numberactivation]
| see [https://trendingpress.com/some-of-the-uks-phone-number-infrastructure-relies-on-yahoo-groups-which-is-shutting-down/ all] [https://reclaimthenet.org/ofcom-oftel-uk-phone-numbers-yahoo-groups/ the] [https://www.axios.com/yahoo-groups-ofcom-cell-phone-number-porting-51949f81-446e-4b4b-82eb-26790146e9a0.html press] [https://techupdatess.com/some-of-the-uks-phone-number-infrastructure-relies-on-yahoo-groups-the-verge/ coverage]
| see all [https://reclaimthenet.org/ofcom-oftel-uk-phone-numbers-yahoo-groups/ the] [https://www.axios.com/yahoo-groups-ofcom-cell-phone-number-porting-51949f81-446e-4b4b-82eb-26790146e9a0.html press] [https://techupdatess.com/some-of-the-uks-phone-number-infrastructure-relies-on-yahoo-groups-the-verge/ coverage]
| Not yet contacted; [https://www.whatdotheyknow.com/request/all_data_held_in_yahoo_groups_us FOI request] made
| Not yet contacted; [https://www.whatdotheyknow.com/request/all_data_held_in_yahoo_groups_us FOI request] made
|-
|-
Line 43: Line 86:
|}
|}


Potentially relevant: [https://fanlore.org/wiki/Category:Yahoo!_Groups List of groups with Fanlore pages] (contains both private and public groups), [https://archivetransyahoo.noblogs.org/list-of-known-trans-groups/ Archive Trans Yahoo's list] (all private at last check)
Potentially relevant: [https://fanlore.org/wiki/Category:Yahoo!_Groups List of groups with Fanlore pages] (contains both private and public groups), [https://archivetransyahoo.noblogs.org/list-of-known-trans-groups/ Archive Trans Yahoo's list] (all private at last check), [https://yahoogroups.southasianamerican.org/ Archive South Asian American Yahoo Groups] (all public), and [https://queerdigital.com/ygpresproject Queer Digital History Project] (no groups listed, presumably all private).


== Site structure ==
== Site structure ==
Line 50: Line 93:


===Groups===
===Groups===
* https://groups.yahoo.com/api/v1/search/groups (search)
* https://groups.yahoo.com/api/v1/search/groups (search)
:- Known params: maxHits, offset, query, sortBy (one of OLDEST, RELEVANCE, MEMBERS, LATEST_ACTIVITY, NEWEST).
:- Known params: maxHits, offset, query, sortBy (values: OLDEST, RELEVANCE, MEMBERS, LATEST_ACTIVITY, NEWEST)


* https://groups.yahoo.com/api/v1/dir/categories/0/ (list of subcategories and discoverable groups under the root)
* https://groups.yahoo.com/api/v1/dir/categories/0/ (list of subcategories and discoverable groups under the root)
:- Known params: start (result index, not group id).
:- Known params: start, intlCode (au, in, sg, uk, us; ar, e1, es, mx; br; cf, fr; de; hk; it...)
:- Pagination: Limited to 10. Does ''not'' have a count param. May be limited to 500 total results regardless of start param.
:- Pagination: Page size is 10. Does ''not'' have a count param. start is the result index, not the group id. start values 500 and up all return the same set of results.
: Groups in subcategories can be listed by swapping '0' for the subcategory id (the full idList is not required). There is a /1/ with a small number of groups.
: Groups are listed in fixed but arbitrary order. /0/ is a special value that shows the root node; subcategories can be accessed by using the subcategory id instead (the full "idList" value is not required).
: Defaults to the US view of the English directory tree. Different languages have different directory trees. Supplying a different intlCode parameter (list not exhaustive, must be lower case) accesses the corresponding view of the appropriate language's tree. Subcategory ids are language-specific and must be used with an appropriate intlCode. The intlCode -> language mapping may be checked at the /0/ endpoint; the root "name" is always "ROOT", but "id" is language-specific.<ref>This id can also be accessed with an appropriate intlCode, but contains the same twelve groups for all languages: the groups in the categories for musical artists "Roots, The" and "Rusted Root", three groups which appear to be Yahoo tests, and one group which appears to be a spam test.</ref> Different intlCode views of the same language list groups in a different order, may have slightly different category names, and appear to have slightly different numbers of categories in the full tree; their group overlap is about 99%.
: The "count" field appears totally inaccurate.
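The pagination notes above can be sketched as a small URL generator. This is only an illustration under stated assumptions: the helper name <code>directory_page_urls</code> is made up here, and requesting start=500 exactly once (because every start value from 500 up returns the same set) is a guess at the most economical crawl, not documented behaviour.

```python
from urllib.parse import urlencode

BASE = "https://groups.yahoo.com/api/v1/dir/categories"
PAGE_SIZE = 10        # fixed page size; the endpoint has no count param
REPEAT_FROM = 500     # start values 500 and up all return the same results


def directory_page_urls(category_id=0, intl_code="us"):
    """Yield one URL per distinct result page of a directory category.

    Hypothetical helper: start is a result index, so it advances in steps
    of PAGE_SIZE. start=500 is requested once and nothing above it.
    """
    for start in range(0, REPEAT_FROM + 1, PAGE_SIZE):
        query = urlencode({"start": start, "intlCode": intl_code})
        yield f"{BASE}/{category_id}/?{query}"


if __name__ == "__main__":
    urls = list(directory_page_urls())
    print(len(urls), "pages, first:", urls[0])
```

Subcategory ids are language-specific, so a caller would pass the subcategory id together with the intlCode it was discovered under.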


* https://groups.yahoo.com/api/v1/groups/concatenative/ (specific group information)
* https://groups.yahoo.com/api/v1/groups/concatenative/ (specific group information)
Line 63: Line 109:


* https://groups.yahoo.com/api/v1/groups/concatenative/messages (list)
* https://groups.yahoo.com/api/v1/groups/concatenative/messages (list)
:- Known params: count, start (message id, not result index).
:- Known params: count, start, sortOrder (ASC, DESC), direction (1, -1)
:- Pagination: Limited to 10 by default. No known limit on count or total results.
:- Pagination: Page size defaults to 10, with no known limit. start is the message id, not the result index. sortOrder adjusts the order of results in the json response's array, whereas direction determines which way to iterate through ids from start (default: DESC, -1).
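As a rough sketch of how the message-list parameters combine, the following builds a request URL only. The function name <code>messages_url</code> is hypothetical, and the shape of the JSON response is not assumed here.

```python
from urllib.parse import quote, urlencode

API = "https://groups.yahoo.com/api/v1/groups"


def messages_url(group, start=None, count=10, sort_order="DESC", direction=-1):
    """Build a message-list URL (hypothetical helper).

    start is a message id, not a result index. sort_order controls the
    order of the returned array; direction controls which way ids are
    walked from start. Defaults mirror the documented DESC / -1.
    """
    if sort_order not in ("ASC", "DESC"):
        raise ValueError("sort_order must be ASC or DESC")
    if direction not in (1, -1):
        raise ValueError("direction must be 1 or -1")
    params = {"count": count, "sortOrder": sort_order, "direction": direction}
    if start is not None:
        params["start"] = start
    return f"{API}/{quote(group)}/messages?{urlencode(params)}"


if __name__ == "__main__":
    # Walk upwards from message id 1, oldest first, 100 per page.
    print(messages_url("concatenative", start=1, count=100,
                       sort_order="ASC", direction=1))
```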


* https://groups.yahoo.com/api/v1/groups/concatenative/messages/1/ (specific message)
* https://groups.yahoo.com/api/v1/groups/concatenative/messages/1/ (specific message)
* https://groups.yahoo.com/api/v1/groups/concatenative/messages/1/raw (specific message, raw content including headers)
* https://groups.yahoo.com/api/v1/groups/concatenative/messages/1/raw (specific message, raw content including headers)
: Some messages may have encoding issues.<ref>https://yahoo.uservoice.com/forums/209451-us-groups/suggestions/9644478-displaying-raw-messages-is-not-8-bit-clean</ref> Sometimes (as in the linked case) the non-raw endpoint has the correct characters, sometimes it does not; this is likely related to the originating email client.
:- Original email is largely recoverable from the ''rawEmail'' field.
:- Some messages may have encoding issues.<ref>https://yahoo.uservoice.com/forums/209451-us-groups/suggestions/9644478-displaying-raw-messages-is-not-8-bit-clean</ref> Sometimes (as in the linked case) the non-raw endpoint has the correct characters, sometimes it does not; this is likely related to the originating email client. Removing non-ASCII characters and ^M characters from the 7-bit text should result in valid RFC822 emails.
:- All message headers and textual body parts have email addresses redacted, with the hosts replaced with "...". For example, "From: ceo@ford.com" and "From: ceo@toyota.com" both get turned into "From: ceo@..."
:- Some emails longer than 64 KB (minus attachments) may be truncated. This truncation affects not just plain text, but also HTML and encoded Base64 content. To address this, delete the string "\n(Message over 64 KB, truncated)" from the end of the message part, so HTML/Base64/etc. parsers are somewhat less likely to break.
:- All attachments are separated, with attachment bodies replaced with the string "[ Attachment content not displayed ]". Recovering the emails involves finding those MIME parts, looking at the filenames, comparing with the list of filenames listed in the "attachmentInfo" section, matching on similarity, and replacing the contents with the downloaded attachments. In very rare cases where a matching MIME section isn't found, it may be necessary to append those attachments as new MIME attachments to the email while reconstructing.
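The clean-up steps described above could look roughly like this. It is a minimal sketch, not the method used by any of the archivers: all function names are invented for illustration, the "\n" in the truncation marker is assumed to be a real newline once the JSON is decoded, and "matching on similarity" is approximated with the standard library's <code>difflib</code> at an arbitrary cutoff.

```python
import difflib

TRUNCATION_MARKER = "\n(Message over 64 KB, truncated)"
ATTACHMENT_PLACEHOLDER = "[ Attachment content not displayed ]"


def strip_truncation_marker(part):
    """Drop the truncation notice from the end of a message part, if present."""
    if part.endswith(TRUNCATION_MARKER):
        return part[:-len(TRUNCATION_MARKER)]
    return part


def to_seven_bit(text):
    """Remove ^M (carriage return) and any non-ASCII characters."""
    return text.replace("\r", "").encode("ascii", "ignore").decode("ascii")


def match_attachment(mime_filename, listed_filenames, cutoff=0.6):
    """Pick the most similar filename from the attachmentInfo list, or None.

    The cutoff is an assumption; None means the attachment would have to be
    appended as a new MIME part instead of replacing a placeholder.
    """
    hits = difflib.get_close_matches(mime_filename, listed_filenames,
                                     n=1, cutoff=cutoff)
    return hits[0] if hits else None


if __name__ == "__main__":
    body = "PGh0bWw+" + TRUNCATION_MARKER
    print(repr(strip_truncation_marker(body)))
    print(repr(to_seven_bit("caf\u00e9\r\nok")))
    print(match_attachment("photo1.jpg", ["photo1.jpg", "notes.txt"]))
```

Redacted addresses ("ceo@...") are not recoverable from this endpoint, so the sketch leaves them untouched.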
 
* https://groups.yahoo.com/api/v1/groups/concatenative/history (calendar summary)
:- Known params: ts, tz, chrome
:- Redundancy: Generatable from /messages data.


===Topics===
===Topics===
* https://groups.yahoo.com/api/v1/groups/concatenative/topics (list)
* https://groups.yahoo.com/api/v1/groups/concatenative/topics (list)
:- Known params: count.
:- Known params: count, startTopicId, sortOrder (ASC, DESC), direction (1, -1)
:- Pagination: Page size defaults to 25, with a limit of 100. sortOrder and direction as for messages.


* https://groups.yahoo.com/api/v1/groups/concatenative/topics/1 (specific topic)
* https://groups.yahoo.com/api/v1/groups/concatenative/topics/1 (specific topic)
:- Known params: maxResults.
:- Pagination: Page size defaults to 30 (messages in topic), with no known limit (maximum tested: 57). No known start param.
:- Redundancy: Generatable from /messages data.
: "messages" field is an array, each element of which seems to have the same contents as the corresponding /messages/<id>/ (non-raw) endpoint; metadata ("totalMsgInTopic", "prevTopicId", "nextTopicId") could be reconstructed. Not known whether a message can fail to be associated with any topic.


===Attachments===
===Attachments===
* https://groups.yahoo.com/api/v1/groups/a_furrys_world/attachments (list)
* https://groups.yahoo.com/api/v1/groups/a_furrys_world/attachments (list)
:- Known params: count, start.
:- Known params: count, start, sort (TITLE, TIME), order (ASC, DESC)
:- Pagination: Limited to 30 by default.
:- Pagination: Page size defaults to 20, with no known limit (maximum tested: 93).
 
* https://groups.yahoo.com/api/v1/groups/<groupname>/attachments/<attachmentId> (specific attachment)
Attachments may be of several types: photo, file, ...?


===Files===
===Files===
* https://groups.yahoo.com/api/v1/groups/a_furrys_world/files (list)
 
: What do we know about folders and folder contents (for files, photos, links, and possibly attachments)?
* https://groups.yahoo.com/api/v2/groups/a_furrys_world/files (list)
:- Known params: sfpath (pass in a pathURI to retrieve the file listings of this subdirectory)
:- Pagination: None.
: Entries with "type" 0 are files; 1, directories.
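Since the listing is not paginated and directories are marked with "type" 1, a full crawl is a simple recursive walk. The sketch below assumes each entry is a dict carrying its "type" and a "pathURI" key that can be passed back as sfpath; the key name and the injected <code>list_dir</code> callable are illustrative assumptions, and no network access is performed.

```python
FILE, DIRECTORY = 0, 1  # "type" values noted above


def walk_files(list_dir, path=None):
    """Yield every file entry below path, depth-first.

    list_dir(path) is a caller-supplied function returning the entries of
    one listing; path=None stands for the root (no sfpath parameter).
    """
    for entry in list_dir(path):
        if entry["type"] == DIRECTORY:
            yield from walk_files(list_dir, entry["pathURI"])
        else:
            yield entry


if __name__ == "__main__":
    # Invented in-memory tree standing in for real API responses.
    fake_tree = {
        None: [{"type": FILE, "pathURI": "a.txt"},
               {"type": DIRECTORY, "pathURI": "docs"}],
        "docs": [{"type": FILE, "pathURI": "docs/b.txt"}],
    }
    for entry in walk_files(fake_tree.__getitem__):
        print(entry["pathURI"])
```

Injecting the lister keeps retry and login handling out of the traversal logic.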


===Photos===
===Photos===
* https://groups.yahoo.com/api/v1/groups/a_furrys_world/photos (list of photos)
 
* https://groups.yahoo.com/api/v1/groups/a_furrys_world/albums (list of albums)
* https://groups.yahoo.com/api/v3/groups/a_furrys_world/photos (list of photos)
* https://groups.yahoo.com/api/v1/groups/a_furrys_world/albums/1841906391 (specific album)
:- Known params: count, start, orderBy (MTIME), sortOrder (ASC, DESC), ownedByMe (TRUE, FALSE), lastFetchTime, photoFilter (ALL, PHOTOS_WITH_EXIF "Originals", PHOTOS_WITHOUT_EXIF "Shared")
:- Pagination: Page size defaults to 20, with no known limit.
: "totalPhotos" field in response gives total in group.
 
* https://groups.yahoo.com/api/v3/groups/a_furrys_world/albums (list of albums)
:- Known params: count, start, albumType (PHOTOMATIC, NORMAL), orderBy (MTIME, TITLE), sortOrder (ASC, DESC)
:- Pagination: Page size defaults to 12, with no known limit.
: albumType defaults to NORMAL. PHOTOMATIC albumType requires the "READ" permission for "ATTACHMENTS". "total" field in response gives total number of albums of the selected type in group; however, this seems to have an off-by-one error for the NORMAL type of albums.
 
* https://groups.yahoo.com/api/v3/groups/a_furrys_world/albums/1841906391 (specific album)
:- Observed parameters similar to photos and albums endpoints, with additional ordinal sortOrder option
: Photomatic albums ''must'' be loaded with the albumType parameter set to PHOTOMATIC.


===Links===
===Links===
* https://groups.yahoo.com/api/v1/groups/a_furrys_world/links (list)
* https://groups.yahoo.com/api/v1/groups/a_furrys_world/links (list)
:- Known params: linkdir
:- Pagination: None.
: linkdir takes the folder parameter from a dir. Nested folders should be joined with '/'. You need to keep track of the path to a given folder yourself (eg, linkdir + '/' + folder).
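The path bookkeeping for nested link folders might be sketched as follows. Both helper names are hypothetical, and two details are assumptions rather than observed behaviour: that the root listing is requested with no linkdir parameter, and that a percent-encoded '/' in the query string is accepted.

```python
from urllib.parse import quote, urlencode

API = "https://groups.yahoo.com/api/v1/groups"


def child_linkdir(linkdir, folder):
    """Return the linkdir for folder nested inside linkdir (linkdir + '/' + folder)."""
    return f"{linkdir}/{folder}" if linkdir else folder


def links_url(group, linkdir=None):
    """Build the links listing URL for the root or for a nested folder."""
    url = f"{API}/{quote(group)}/links"
    if linkdir:
        url += "?" + urlencode({"linkdir": linkdir})
    return url


if __name__ == "__main__":
    # "art" and "2004" are made-up folder names.
    nested = child_linkdir(child_linkdir(None, "art"), "2004")
    print(nested)
    print(links_url("a_furrys_world", nested))
```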


===Polls===
===Polls===
* https://groups.yahoo.com/api/v1/groups/relationship-poll/polls (list)
* https://groups.yahoo.com/api/v1/groups/relationship-poll/polls (list)
:- Known params: count.
:- Known params: count, start
:- Pagination: Page size defaults to 10, with no known limit. There is no "total" field in the response.


* https://groups.yahoo.com/api/v1/groups/a_furrys_world/polls/3549106 (specific poll)
* https://groups.yahoo.com/api/v1/groups/a_furrys_world/polls/3549106 (specific poll)
: Polls return all votes cast, non-anonymised, including identifying metadata for all viewers.


===Databases===
===Databases===
* https://groups.yahoo.com/api/v1/groups/a_furrys_world/database (list of tables)
* https://groups.yahoo.com/api/v1/groups/a_furrys_world/database (list of tables)
* https://groups.yahoo.com/api/v1/groups/a_furrys_world/database/1/ (specific table)
* https://groups.yahoo.com/api/v1/groups/a_furrys_world/database/1/ (specific table)
* https://groups.yahoo.com/api/v1/groups/a_furrys_world/database/1/records (table contents)
* https://groups.yahoo.com/api/v1/groups/a_furrys_world/database/1/records (table contents)
:- Pagination: None.
* https://groups.yahoo.com/neo/groups/groupname/database/1/records/export (export target)
:- Known params: format (CSV, TSV)


===Members===
===Members===
* https://groups.yahoo.com/api/v1/groups/iswipe/members/confirmed (list of confirmed members)
* https://groups.yahoo.com/api/v1/groups/iswipe/members/confirmed (list of confirmed members)
:- Known params: count, start, sortBy, sortOrder, ts, tz, chrome.
:- Known params: count, start, sortBy, sortOrder, ts, tz, chrome.
:- Pagination: Limited to 10 by default or a count of 100. No known limit on total results.
:- Pagination: Page size defaults to 10, with a limit of 100. No known limit on total results.
: May be blocked for normal members (as may all the other members endpoints). Includes moderators and bouncing members, with identifying metadata.
: May be blocked for normal members (as may all the other members endpoints). Includes moderators and bouncing members, with identifying metadata.
* https://groups.yahoo.com/api/v1/groups/iswipe/members/moderators (list of moderators)
* https://groups.yahoo.com/api/v1/groups/iswipe/members/moderators (list of moderators)
Line 117: Line 208:


===Events===
===Events===
Overlaps with Yahoo Calendar API, check nsapa's branch for the code.
 
Overlaps with Yahoo Calendar API, check yahoo-group-archiver code.


== Python Yahoo! Group archivers ==  
== Python Yahoo! Group archivers ==  


* [https://github.com/IgnoredAmbience/yahoo-group-archiver/network/members yahoo-group-archiver] scrapes a group using the JSON API and (for private endpoints) the two cookies Yahoo uses to verify a logged-in user. Relevant forks include [https://github.com/Frankkkkk/yahoo-group-archiver Frankkkkk] and [https://github.com/nsapa/yahoo-group-archiver nsapa]. Needs merging. Various branches have support (largely untested) for file attachments, photos, links, folders, and events.
* '''[https://github.com/IgnoredAmbience/yahoo-group-archiver/network/members yahoo-group-archiver]''' scrapes a group using the JSON API and (for private endpoints) the two cookies Yahoo uses to verify a logged-in user. <s>Relevant forks include [https://github.com/Frankkkkk/yahoo-group-archiver Frankkkkk] and [https://github.com/nsapa/yahoo-group-archiver nsapa]. Needs merging. Various branches have support (largely untested) for file attachments, photos, links, folders, and events.</s> Most stuff has been merged back into IgnoredAmbience's master. (Exceptions: full WARC support?, mtime work from Frankkkkk.) Needs consistent/WARC-appropriate handling for random 500 errors (require retries), attachment 404s (appear permanent), and 502 permissions errors (definitely permanent, currently halt script).
** [https://github.com/anirvan/yahoo-group-archive-tools Yahoo Group Archive Tools] Perl script converts yahoo-group-archiver output into clean rfc822 and mbox files, with separated attachments correctly reattached, and many Yahoo truncation/redaction bugs corrected. It also turns list archives into PDF, using [https://github.com/andrewferrier/email2pdf email2pdf], which many non-technical list owners prefer.
 
* [https://github.com/csaftoiu/yahoo-groups-backup yahoo-groups-backup] scrapes a group using Selenium, storing message info and metadata (both rendered message body and raw email) into a Mongo database. It also provides a script to dump its data to static HTML pages that can be viewed in the browser.


* [https://github.com/andrewferguson/YahooGroups-Archiver YahooGroups-Archiver] is similar, but scrapes only messages (not files or any other data). It is not currently under active development.
* [https://github.com/andrewferguson/YahooGroups-Archiver YahooGroups-Archiver] is similar, but scrapes only messages (not files or any other data). It is not currently under active development.
* [https://github.com/csaftoiu/yahoo-groups-backup yahoo-groups-backup] scrapes a group using Selenium, storing message info and metadata (both rendered message body and raw email) into a Mongo database. It also provides a script to dump its data to static HTML pages that can be viewed in the browser.


== Other archivers ==
== Other archivers ==


* [http://sourceforge.net/projects/grabyahoogroup/ Yahoo Group Archiver]: Perl, defunct.
* [http://www.personalgroupware.com/ PGOffline]: Windows, proprietary. 14-day free trial, after which download and export is disabled (but view still works). Includes attachments. Stores data in a SQLite database internally.
* [http://www.personalgroupware.com/ PGOffline]: Windows, proprietary. 14-day free trial, after which download and export is disabled (but view still works). Includes attachments. Stores data in a SQLite database internally.
* [http://yahoogroupedia.pbworks.com/w/page/93006447/Chrome%20Application%20To%20Download%20Messages Yahoo Messages Export]: Chrome extension. Messages only. Saves as mbox.
* [http://yahoogroupedia.pbworks.com/w/page/93006447/Chrome%20Application%20To%20Download%20Messages Yahoo Messages Export]: Chrome extension. Messages only. Saves as mbox.
* [https://sourceforge.net/projects/grabyahoogroup/ Yahoo Group Archiver]: Perl, defunct.


== External Links ==
== External Links ==


* https://archive.org/details/yahoo_groups
* https://archive.org/details/archiveteam_yahoogroups


== Coverage ==
== Coverage ==

Revision as of 01:46, 2 January 2020

Yahoo! Groups
Yahoo! Groups logo
Groups-yahoo-com.png
URL https://groups.yahoo.com/
Status Closing
Archiving status In progress...
Archiving type Unknown
Project source yahoogroups-grab, yahoo-group-archiver
Project tracker yahoogroups, yahoo-groups-api
IRC channel #yahoosucks (on hackint)

Yahoo! Groups is Yahoo's combination mailing list service/web forum; it's the result of the acquisition of eGroups and some other Yahoo! stuff. In addition to archives of and a web interface for mailing lists, it offers file uploads, photo uploads, links, polls, and an events calendar.

Uploading of new content was disabled 28 October 2019, and all content, including message history, will be made unavailable on 14 December 2019.[1] (The mailing lists themselves will continue to function.) After negative media attention, Yahoo said that they were extending the deadline for users to use their official incomplete data export tool to 31 January 2020[2][3].

Yahoo's data export tool misses a plethora of attachments, databases, polls, photos, and metadata.

It had been stable for a long time (since the late 90s), long enough for some specialised software to be developed to do backups of it. (Not many other websites can say that.)

2019-12-21 Update:

  • Yahoo has hidden or deleted files/photos/messages from the web interface.
  • Please submit your Yahoo GetMyData (GMD) requests as soon as possible. Go to this page and follow the directions to make a request.

It may take 10 days to process a request. When the request comes through, save all the files that come back. Recent GMDs have been split into 2 GB ZIP files.

  • Be aware that there may be old malware buried inside the ZIP files. Modern email software and operating systems are expected to be mostly resistant to this old malware. Some antivirus software may see the malware and may modify or delete the ZIP file. We don't want that to happen, so please make sure that they don't get deleted or modified in any way, except to remove private groups.
  • After you've saved 2+ copies of all your GMD files, consider making a new GMD request. This will give you a second chance to collect group content in case a glitch happened on the first GMD attempt, in case Yahoo changes what content they are including/not-including in the GMDs, or in case you have joined additional groups since the first request.
  • We are looking to collect public and publicly-shareable group data for submission to Internet Archive. We would appreciate it if you made a GMD request, saved the data, uploaded it to an online file-hosting service, and then sent a link to archiveteamprivateyahoogroup@gmail.com

Source: https://yahoo-geddon.tumblr.com/post/189779155159/sylpheed-tutorial

Communication Centers

Nominating Notable Non-Private Groups for Archival

Groups can be nominated for archival using this form. Please note that this form should not be used for groups that require administrator approval to join.

Adding Private Groups to the Public Archive

Administrators / Moderators can request that their private group (we consider a private group to be one that requires approval for new members) be included in the public archive. Before you do this, please ensure that the members of the group are happy about being part of the public archive.

To add the group to the list of private groups to be archived, all you need to do is send a membership invite to the email archiveteamprivateyahoogroup@gmail.com. (Note that only group admins can do this). We'll be monitoring that email regularly to accept any membership requests we receive. Once that account is a member, the group should be scheduled to be part of the public archive.

Please make sure that when you invite the Archive Team account, you do not select the Add only to mailing list option, as this will prevent Archive Team from archiving the group.

Statistics

As of 2019-10-16 the directory lists 5619351 groups. 2752112 of them have been discovered. 1483853 (54%) have public message archives with an estimated number of 2.1 billion messages (1389 messages per group on average so far). 1.8 billion messages (86%) have been archived as of 2018-10-28.

The following graphs are slightly outdated:

Yahoo groups date created.png Yahoo groups messages per group.png Yahoo groups post date.png

Private groups of interest

Group Notes Admin consent?
numberactivation see all the press coverage Not yet contacted; FOI request made
hpslash see Fanlore page Not yet contacted

Potentially relevant: List of groups with Fanlore pages (contains both private and public groups), Archive Trans Yahoo's list (all private at last check), Archive South Asian American Yahoo Groups (all public), and Queer Digital History Project (no groups listed, presumably all private).

Site structure

There’s a convenient JSON API. Some endpoints require logged-in group membership or other permissions (depending on group settings).

Groups

- Known params: maxHits, offset, query, sortBy (values: OLDEST, RELEVANCE, MEMBERS, LATEST_ACTIVITY, NEWEST)
- Known params: start, intlCode (au, in, sg, uk, us; ar, e1, es, mx; br; cf, fr; de; hk; it...)
- Pagination: Page size is 10. Does not have a count param. start is the result index, not the group id. start values 500 and up all return the same set of results.
Groups are listed in fixed but arbitrary order. /0/ is a special value that shows the root node; subcategories can be accessed by using the subcategory id instead (the full "idList" value is not required).
Defaults to the US view of the English directory tree. Different languages have different directory trees. Supplying a different intlCode parameter (list not exhaustive, must be lower case) accesses the corresponding view of the appropriate language's tree. Subcategory ids are language-specific and must be used with an appropriate intlCode. The intlCode -> language mapping may be checked at the /0/ endpoint; the root "name" is always "ROOT", but "id" is language-specific.[4] Different intlCode views of the same language list groups in a different order, may have slightly different category names, and appear to have slightly different numbers of categories in the full tree; their group overlap is about 99%.
The "count" field appears totally inaccurate.

Messages

- Known params: count, start, sortOrder (ASC, DESC), direction (1, -1)
- Pagination: Page size defaults to 10, with no known limit. start is the message id, not the result index. sortOrder adjusts the order of results in the JSON response's array, whereas direction determines which way to iterate through ids from start (default: DESC, -1).
- Original email is largely recoverable from the "rawEmail" field.
- Some messages may have encoding issues.[5] Sometimes (as in the linked case) the non-raw endpoint has the correct characters, sometimes it does not; this is likely related to the originating email client. Removing non-ASCII characters and ^M characters from the 7-bit text should result in valid RFC822 emails.
- All message headers and textual body parts have email addresses redacted, with the hosts replaced with "...". For example, "From: ceo@ford.com" and "From: ceo@toyota.com" both get turned into "From: ceo@..."
- Some emails longer than 64 KB (minus attachments) may be truncated. This truncation affects not just plain text, but also HTML and encoded Base64 content. To address this, delete the string "\n(Message over 64 KB, truncated)" from the end of the message part, so HTML/Base64/etc. parsers are somewhat less likely to break.
- All attachments are separated, with attachment bodies replaced with the string "[ Attachment content not displayed ]". Recovering the emails involves finding those MIME parts, looking at the filenames, comparing with the list of filenames listed in the "attachmentInfo" section, matching on similarity, and replacing the contents with the downloaded attachments. In very rare cases where a matching MIME section isn't found, it may be necessary to append those attachments as new MIME attachments to the email while reconstructing.
- Known params: ts, tz, chrome
- Redundancy: Generatable from /messages data.
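
The cleanup steps described in the message notes above can be sketched roughly as follows. This is an illustration under assumptions, not the code of any of the archivers listed on this page: the function names are made up, the "\n" in the truncation marker is assumed to be a literal newline, and the similarity cutoff of 0.6 is simply the default of Python's difflib.

```python
import difflib

# Assumed to be a literal newline followed by Yahoo's notice.
TRUNCATION_MARKER = "\n(Message over 64 KB, truncated)"


def strip_truncation_marker(part):
    """Drop the truncation notice if it ends this message part."""
    if part.endswith(TRUNCATION_MARKER):
        return part[: -len(TRUNCATION_MARKER)]
    return part


def to_7bit(raw):
    """Remove carriage returns (^M) and all non-ASCII characters."""
    return "".join(ch for ch in raw if ch != "\r" and ord(ch) < 128)


def match_attachments(mime_names, downloaded_names, cutoff=0.6):
    """Pair MIME part filenames with the most similar downloaded filename.

    Returns (pairs, leftovers). Leftovers are downloads with no matching
    MIME part, which would have to be appended as new attachments.
    """
    remaining = list(downloaded_names)
    pairs = {}
    for name in mime_names:
        best = difflib.get_close_matches(name, remaining, n=1, cutoff=cutoff)
        if best:
            pairs[name] = best[0]
            remaining.remove(best[0])  # each download is used at most once
    return pairs, remaining


pairs, leftovers = match_attachments(
    ["report.pdf", "photo1.jpg"],
    ["photo1.jpg", "report_.pdf", "extra.txt"],
)
```

Matching greedily in MIME-part order is the simplest choice; a group with many near-identical filenames may need a stricter cutoff or manual review.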

Topics

- Known params: count, startTopicId, sortOrder (ASC, DESC), direction (1, -1)
- Pagination: Page size defaults to 25, with a limit of 100. sortOrder and direction as for messages.
- Known params: maxResults.
- Pagination: Page size defaults to 30 (messages in topic), with no known limit (maximum tested: 57). No known start param.
- Redundancy: Generatable from /messages data.
"messages" field is an array, each element of which seems to have the same contents as the corresponding /message/<id>/ (non-raw) endpoint; metadata ("totalMsgInTopic", "prevTopicId", "nextTopicId") could be reconstructed. Not known whether a message can fail to be associated with any topic.

Attachments

- Known params: count, start, sort (TITLE, TIME), order (ASC, DESC)
- Pagination: Page size defaults to 20, with no known limit (maximum tested: 93).

Attachments may be of several types: photo, file, ...?

Files

- Known params: sfpath (pass in a pathURI to retrieve the file listings of this subdirectory)
- Pagination: None.
Entries with "type" 0 are files; 1, directories.
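
Since the listing is unpaginated and directories are marked by "type", a full file tree can be collected by recursing through sfpath. The sketch below assumes each directory entry carries a "pathURI" field usable as the next sfpath; list_dir is a hypothetical caller-supplied function that performs the actual request, and the "name" field in the fake listing is made up for the demonstration.

```python
def walk_files(list_dir, sfpath=None):
    """Yield every file entry at or below sfpath.

    list_dir(sfpath) is a hypothetical callable returning the list of
    entry dicts for one directory (None meaning the root listing).
    """
    for entry in list_dir(sfpath):
        if entry["type"] == 1:  # directory: descend using its pathURI
            yield from walk_files(list_dir, entry["pathURI"])
        elif entry["type"] == 0:  # plain file
            yield entry


# Fake in-memory listing standing in for real API responses.
fake_tree = {
    None: [{"type": 0, "name": "a.txt"}, {"type": 1, "pathURI": "docs"}],
    "docs": [{"type": 0, "name": "b.txt"}],
}
found = [entry["name"] for entry in walk_files(fake_tree.__getitem__)]
```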

Photos

- Known params: count, start, orderBy (MTIME), sortOrder (ASC, DESC), ownedByMe (TRUE, FALSE), lastFetchTime, photoFilter (ALL, PHOTOS_WITH_EXIF "Originals", PHOTOS_WITHOUT_EXIF "Shared")
- Pagination: Page size defaults to 20, with no known limit.
"totalPhotos" field in response gives total in group.
- Known params: count, start, albumType (PHOTOMATIC, NORMAL), orderBy (MTIME, TITLE), sortOrder (ASC, DESC)
- Pagination: Page size defaults to 12, with no known limit.
albumType defaults to NORMAL. PHOTOMATIC albumType requires the "READ" permission for "ATTACHMENTS". "total" field in response gives total number of albums of the selected type in group; however, this seems to have an off-by-one error for the NORMAL type of albums.
- Observed parameters similar to photos and albums endpoints, with additional ordinal sortOrder option
Photomatic albums must be loaded with the albumType parameter set to PHOTOMATIC.

Links

- Known params: linkdir
- Pagination: None.
linkdir takes the folder parameter from a dir. Nested folders should be joined with '/'. You need to keep track of the path to a given folder yourself (e.g. linkdir + '/' + folder).
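
Keeping track of the folder path can be as small as the helper below. It is a sketch only: the function name is made up, and it assumes a top-level folder's linkdir is just its folder value with no leading '/'.

```python
def child_linkdir(parent_linkdir, folder):
    """Return the linkdir value for `folder` nested inside `parent_linkdir`.

    An empty parent_linkdir stands for the root listing (an assumption).
    """
    if not parent_linkdir:
        return folder
    return parent_linkdir + "/" + folder


top = child_linkdir("", "Bands")
nested = child_linkdir(top, "Live")
```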

Polls

- Known params: count, start
- Pagination: Page size defaults to 10, with no known limit. There is no "total" field in the response.
Polls return all votes cast, non-anonymised, including identifying metadata for all viewers.
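
With no "total" field, the only way to know a listing is exhausted is to keep requesting pages until one comes back empty. A minimal sketch follows, assuming start is a result index for this endpoint (not confirmed above); fetch_page is a hypothetical caller-supplied function that performs the request and returns the list of polls.

```python
def iter_all(fetch_page, count=10):
    """Yield every item from an endpoint that reports no total.

    fetch_page(start, count) is a hypothetical callable returning a list.
    Stops on the first empty page rather than the first short one, in case
    the server returns fewer items than requested.
    """
    start = 0
    while True:
        page = fetch_page(start, count)
        if not page:
            return
        yield from page
        start += len(page)


# Fake data standing in for real API responses.
fake_polls = list(range(23))
collected = list(iter_all(lambda start, count: fake_polls[start:start + count]))
```

Stopping only on an empty page costs one extra request per group but avoids ending early if a page is short for some other reason.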

Databases

- Pagination: None.
- Known params: format (CSV, TSV)

Members

- Known params: count, start, sortBy, sortOrder, ts, tz, chrome.
- Pagination: Page size defaults to 10, with a limit of 100. No known limit on total results.
May be blocked for normal members (as may all the other members endpoints). Includes moderators and bouncing members, with identifying metadata.
Very often (always?) blocked for normal members.
Very often (always?) blocked for normal members.

Events

Overlaps with the Yahoo Calendar API; check the yahoo-group-archiver code.

Python Yahoo! Group archivers

  • yahoo-group-archiver scrapes a group using the JSON API and (for private endpoints) the two cookies Yahoo uses to verify a logged-in user. Relevant forks include Frankkkkk and nsapa. Various branches have (largely untested) support for file attachments, photos, links, folders, and events; most of this has been merged back into IgnoredAmbience's master (exceptions: full WARC support?, mtime work from Frankkkkk). Still needs consistent, WARC-appropriate handling for random 500 errors (which require retries), attachment 404s (which appear permanent), and 502 permissions errors (definitely permanent; these currently halt the script).
    • Yahoo Group Archive Tools Perl script converts yahoo-group-archiver output into clean rfc822 and mbox files, with separated attachments correctly reattached, and many Yahoo truncation/redaction bugs corrected. It also turns list archives into PDF, using email2pdf, which many non-technical list owners prefer.
  • yahoo-groups-backup scrapes a group using Selenium, storing message info and metadata (both rendered message body and raw email) into a Mongo database. It also provides a script to dump its data to static HTML pages that can be viewed in the browser.
  • YahooGroups-Archiver is similar, but scrapes only messages (not files or any other data). It is not currently under active development.

Other archivers

  • PGOffline: Windows, proprietary. 14-day free trial, after which download and export is disabled (but view still works). Includes attachments. Stores data in a SQLite database internally.
  • Yahoo Messages Export: Chrome extension. Messages only. Saves as mbox.
  • Yahoo Group Archiver: Perl, defunct.

External Links

Coverage

References

  1. https://help.yahoo.com/kb/groups/SLN31010.html
  2. https://www.theverge.com/2019/12/10/21004883/yahoo-groups-extend-deadline-download-data-date-time
  3. https://twitter.com/YahooCare/status/1204312076379926528
  4. This id can also be accessed with an appropriate intlCode, but contains the same twelve groups for all languages: the groups in the categories for musical artists "Roots, The" and "Rusted Root", three groups which appear to be Yahoo tests, and one group which appears to be a spam test.
  5. https://yahoo.uservoice.com/forums/209451-us-groups/suggestions/9644478-displaying-raw-messages-is-not-8-bit-clean