Difference between revisions of "Yahoo! Groups"
Switchnode (talk | contribs) |
(→Private groups of interest: Added that stupid OfCom group that everyone's so excited about) |
Revision as of 22:45, 18 October 2019
Yahoo! Groups | |
URL | http://groups.yahoo.com/ |
Status | Online! |
Archiving status | In progress... |
Archiving type | Unknown |
IRC channel | #yahoosucks (on hackint) |
Yahoo! Groups is Yahoo's mailing list and discussion group service; it's the result of merging the acquired eGroups with some other Yahoo! stuff.
It's been stable for a long time (since the late 90s), long enough for some specialised software to be developed to do backups of it. (Not many other websites can say that.)
Python Yahoo! Group Archiver
yahoo-groups-backup is a Python script that scrapes a group. So far only messages are scraped. It puts all the info and metadata (both the rendered message body and the raw email) into a MongoDB database, and provides a script to dump a static version of the site that can be read from the filesystem. It works with the Neo interface and with private groups by (clunkily) using Selenium to do the scraping.
Another Python-based archiver is YahooGroups-Archiver, a simple Python script that dumps the messages into individual JSON files. The messages are not processed any further, so they are preserved in the format Yahoo uses for displaying them. Private groups can be archived by providing the contents of two cookies that Yahoo uses to verify a logged-in user.
Yet another Python-based Archiver is https://github.com/philpem/yahoo-group-archiver.
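The one-JSON-file-per-message approach described above can be sketched roughly as follows. This is an illustrative sketch, not code from any of the archivers listed: the function names are made up, the message endpoint is the one listed under Site Structure, and the cookie names must be supplied by the caller because this page does not say which two cookies Yahoo checks.

```python
from pathlib import Path
from urllib.request import Request, urlopen

# Endpoint root taken from the Site Structure list on this page.
API_ROOT = "https://groups.yahoo.com/api/v1/groups"


def cookie_header(cookies):
    """Join a name -> value mapping into a single Cookie header value."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


def fetch_message(group, msg_id, cookies=None):
    """Fetch one message as raw bytes.

    `cookies` is a hypothetical mapping of cookie names to values copied
    from a logged-in browser session; it is only needed for private groups.
    """
    request = Request(f"{API_ROOT}/{group}/messages/{msg_id}/")
    if cookies:
        request.add_header("Cookie", cookie_header(cookies))
    with urlopen(request, timeout=30) as response:
        return response.read()


def save_message(out_dir, msg_id, body):
    """Write the response body to <out_dir>/<msg_id>.json without altering it."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{msg_id}.json"
    path.write_bytes(body)
    return path
```

Writing the bytes exactly as received, instead of parsing and re-serialising them, is what keeps each message in Yahoo's own format.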
Perl Yahoo! Group Archiver
Update: Apparently, since Yahoo! Groups changed to the Neo interface, the script no longer functions, and it is no longer actively maintained.
The Yahoo Group Archiver is a Perl script which allows an export of "the messages (without the attachments), everything from the files section and all the images from the photo section along with their hierarchy on Yahoo".
It appears that, if you get the "Couldn't get message count" error when trying to use it, the solution is to edit the yahoo2maildir.pl file and replace the bottom line under the heading sub GetJSRedirect,
    my $url = $HTTP::URI_CLASS->new($redirect, $base)->abs($base);
with
    my $url = "http://groups.yahoo.com/group/$group/messages/$begin_msgid";
More frustratingly, it appears that Yahoo blocks your IP temporarily after hitting some invisible limit of data downloaded (the Archiver will continue to "download" messages for a bit, ending up with a bunch of 0-byte files, then stop completely). It's unknown if there is a solution.
Also, some of the downloaded messages in the middle of an otherwise normal batch are sometimes 0 bytes in size, almost as if Yahoo had blocked your IP for a few seconds and then stopped. Watch out for these so that you can re-download them later.
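One way to watch out for the 0-byte files mentioned above is to scan the download directory after each run. The following is a minimal sketch; the function name and the idea of scanning a whole directory tree are assumptions, not part of the Perl script.

```python
from pathlib import Path


def find_empty_downloads(download_dir):
    """Return the sorted paths of all 0-byte regular files under download_dir."""
    root = Path(download_dir)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.stat().st_size == 0
    )
```

The returned paths can then be deleted or fed back to whatever tool did the download once the temporary block has passed.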
Site Structure
There’s a convenient JSON API. Some endpoints may require logging in and joining the group:
- Group Information: https://groups.yahoo.com/api/v1/groups/concatenative/
- List of Messages: https://groups.yahoo.com/api/v1/groups/concatenative/messages?count=100
- Specific Message: https://groups.yahoo.com/api/v1/groups/concatenative/messages/1/
- Raw Message Content: https://groups.yahoo.com/api/v1/groups/concatenative/messages/1/raw – note that there seems to be a message encoding problem
- List of Topics: https://groups.yahoo.com/api/v1/groups/concatenative/topics?count=100
- Specific Topic: https://groups.yahoo.com/api/v1/groups/concatenative/topics/1
- List of Tables: https://groups.yahoo.com/api/v1/groups/a_furrys_world/database
- Specific Table: https://groups.yahoo.com/api/v1/groups/a_furrys_world/database/1/
- Table Content: https://groups.yahoo.com/api/v1/groups/a_furrys_world/database/1/records
- List of Files: https://groups.yahoo.com/api/v1/groups/a_furrys_world/files
- List of Attachments: https://groups.yahoo.com/api/v1/groups/a_furrys_world/attachments
- List of Polls: https://groups.yahoo.com/api/v1/groups/a_furrys_world/polls?count=100
- Specific Poll: https://groups.yahoo.com/api/v1/groups/a_furrys_world/polls/3549106
- List of Photos: https://groups.yahoo.com/api/v1/groups/a_furrys_world/photos
- List of Albums: https://groups.yahoo.com/api/v1/groups/a_furrys_world/albums
- Specific Album: https://groups.yahoo.com/api/v1/groups/a_furrys_world/albums/1841906391
- List Moderators: https://groups.yahoo.com/api/v1/groups/a_furrys_world/members/moderators
- Members With Incorrect Emails: https://groups.yahoo.com/api/v1/groups/a_furrys_world/members/bouncing
- List of Links: https://groups.yahoo.com/api/v1/groups/a_furrys_world/links
- Search: https://groups.yahoo.com/api/v1/search/groups?offset=0&maxHits=20&sortBy=&query=abcdef – sortBy can be one of OLDEST, RELEVANCE, MEMBERS, LATEST_ACTIVITY, NEWEST
- Categories: https://groups.yahoo.com/api/v1/dir/categories/0/?start=0
Note that all paginated responses are limited to the first 500 results and do not return anything new beyond that.
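The endpoint patterns listed above can be generated with a small helper, sketched here under some assumptions: the helper names are invented for illustration, and stepping the search `offset` in units of `maxHits` up to the 500-result limit is a guess about how that pagination behaves, not something confirmed on this page.

```python
from urllib.parse import quote, urlencode

API_ROOT = "https://groups.yahoo.com/api/v1"
RESULT_CAP = 500  # paginated responses stop returning new results here
SORT_ORDERS = {"", "OLDEST", "RELEVANCE", "MEMBERS", "LATEST_ACTIVITY", "NEWEST"}


def group_url(group, *path, **params):
    """Build a per-group URL, e.g. group_url("concatenative", "messages", count=100)."""
    parts = [quote(str(part), safe="") for part in (group, *path)]
    url = f"{API_ROOT}/groups/" + "/".join(parts)
    return f"{url}?{urlencode(params)}" if params else url


def search_url(query, offset=0, max_hits=20, sort_by=""):
    """Build a group-search URL; sort_by may be empty, as in the example above."""
    if sort_by not in SORT_ORDERS:
        raise ValueError(f"unknown sort order: {sort_by!r}")
    params = {"offset": offset, "maxHits": max_hits, "sortBy": sort_by, "query": query}
    return f"{API_ROOT}/search/groups?{urlencode(params)}"


def search_offsets(max_hits=20, cap=RESULT_CAP):
    """Offsets worth requesting before the result cap is reached (assumed behaviour)."""
    return range(0, cap, max_hits)
```

For example, `group_url("a_furrys_world", "database", 1, "records")` yields the Table Content URL shown in the list, without the optional trailing slash.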
Statistics
As of 2019-10-16 the directory lists 5,619,351 groups. 2,752,112 of them have been discovered. 1,483,853 (54%) of the discovered groups have public message archives, containing an estimated 2.1 billion messages (1389 messages per group on average so far). 1.8 billion messages (86%) have been archived as of 2018-10-28.
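The figures above fit together as simple arithmetic, which the following sketch reproduces. It assumes the 2.1 billion estimate is the number of groups with public archives multiplied by the average messages per group, and that the 86% is computed from the rounded billions; the page does not state either explicitly.

```python
discovered = 2_752_112       # groups discovered so far
public_archives = 1_483_853  # discovered groups with public message archives
avg_messages = 1389          # average messages per group so far

# Share of discovered groups that have public archives (about 54%).
public_share = public_archives / discovered

# Estimated total messages: groups with public archives times the average.
estimated_messages = public_archives * avg_messages
estimated_billions = round(estimated_messages / 1e9, 1)

# Share archived, using the rounded figures quoted in the text.
archived_share = 1.8 / estimated_billions

print(f"{public_share:.0%} public, ~{estimated_billions} billion messages, "
      f"{archived_share:.0%} archived")
```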
Private groups of interest
- hpslash: see Fanlore page
- numberactivation: see all the press coverage. An FOI request has been made to try to get the data.
Software for backups
- Yahoo Group Archiver, Sourceforge