Difference between revisions of "Yahoo! Voices"

From Archiveteam
* A full list of URLs for the 1,858,710 articles currently live on the site - [https://www.dropbox.com/s/pnr3njg9rdcntyg/article-urls.zip]
** URL type http://voices.yahoo.com/whatever-whatever-whatever-SOMENUMBER.html
** URL type http://voices.yahoo.com/video/whatever-whatever-whatever-SOMENUMBER.html for video articles (62,746 - about 3.4% of total)
* HTML (only) grabs of the raw content_date index pages from Feb 2005-Jul 2014 - [https://www.dropbox.com/s/j11dwdps9feuz5i/content_crawl1.zip] and [https://www.dropbox.com/s/7tx2vizvvtbgzsq/content_crawl2.zip]



Revision as of 16:28, 28 July 2014

Yahoo Voices
Yahoo! Voices logo
Assorted articles and videos, 2005-2014
URL http://voices.yahoo.com
Status Closing
Archiving status In progress...
Archiving type Unknown
Project source yahoo-voices-grab
Project tracker yahoovoices
IRC channel #shutup (on hackint)

Yahoo Voices is closing down.

What we have so far

Basic map

All articles are listed by date of publication under the URL scheme http://voices.yahoo.com/content_date_MM_DD_YYYY.html, with valid dates running from February 2005 to the present day. Invalid dates are accepted and rolled forward into the following month (e.g. 31st February is treated as March 3rd).
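A minimal sketch of enumerating the date index pages from the URL scheme above. The function names are hypothetical, and zero-padded two-digit months and days are an assumption read from the MM_DD_YYYY pattern, not something confirmed against the site. The start and end dates in the usage note are illustrative only.

```python
from datetime import date, timedelta

BASE = "http://voices.yahoo.com"


def content_date_url(day: date) -> str:
    """Build the index URL for one publication date.

    Assumes MM and DD are zero-padded, as the MM_DD_YYYY pattern suggests.
    """
    return f"{BASE}/content_date_{day:%m_%d_%Y}.html"


def content_date_urls(start: date, end: date):
    """Yield one index URL per calendar day from start to end, inclusive."""
    day = start
    while day <= end:
        yield content_date_url(day)
        day += timedelta(days=1)
```

For example, list(content_date_urls(date(2005, 2, 1), date(2014, 7, 28))) would give one URL per real calendar day in that range. Iterating over real dates avoids requesting the invalid dates that the site silently maps onto other days.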

Most, if not all, articles are assigned to exactly one of the following categories:

2 : Entertainment
3 : Business & Finance
5 : Health & Wellness
6 : Home Improvement
7 : Lifestyle
14 : Sports
15 : Technology
16 : Travel
27 : Automotive
38 : Books
47 : Creative Writing
62 : News

-- NB: list not complete --

Categories have main pages at URLs like http://voices.yahoo.com/books/?cat=38

Category IDs can be suffixed to content_date URLs (e.g. http://voices.yahoo.com/content_date_MM_DD_YYYY_6.html) to list only articles published on that date in that category.
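The category suffix can be sketched as follows. The mapping holds only the category IDs listed above, which the page notes is an incomplete list; the function name is hypothetical and the zero-padded date fields are an assumption.

```python
from datetime import date

# Category IDs known so far (the list on this page is not complete).
CATEGORIES = {
    2: "Entertainment",
    3: "Business & Finance",
    5: "Health & Wellness",
    6: "Home Improvement",
    7: "Lifestyle",
    14: "Sports",
    15: "Technology",
    16: "Travel",
    27: "Automotive",
    38: "Books",
    47: "Creative Writing",
    62: "News",
}


def category_date_url(day: date, cat_id: int) -> str:
    """Build the index URL for one date restricted to one category ID."""
    return f"http://voices.yahoo.com/content_date_{day:%m_%d_%Y}_{cat_id}.html"


def category_date_urls(day: date):
    """Yield (category name, URL) pairs for every known category on one date."""
    for cat_id, name in sorted(CATEGORIES.items()):
        yield name, category_date_url(day, cat_id)
```

Unknown IDs are deliberately not rejected by category_date_url, since further categories may exist beyond those listed.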

Most articles are single-page, basic text-and-graphics pages (e.g. http://voices.yahoo.com/yahoo-fires-all-contributors-12721520.html?cat=3), while others have video (e.g. http://voices.yahoo.com/video/how-better-housekeeper-12537238.html?cat=6). The ?cat= category suffix does not appear to be necessary to access a page, but the site adds it if it is not present. Video articles can be identified by the leading /video/ path in the URL.
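A rough sketch of classifying article URLs based on the two examples above. The regular expression and the parse_article_url helper are hypothetical and inferred only from those examples, so real URLs may exist that do not fit the pattern.

```python
import re
from urllib.parse import urlsplit

# Path shape inferred from the examples: optional /video/ prefix,
# a hyphenated slug, then a numeric ID before ".html".
ARTICLE_PATH = re.compile(r"^/(?P<video>video/)?(?P<slug>[^/]+)-(?P<id>\d+)\.html$")


def parse_article_url(url: str):
    """Return article details for a matching URL, or None if it does not match."""
    parts = urlsplit(url)
    match = ARTICLE_PATH.match(parts.path)
    if match is None:
        return None
    return {
        "id": int(match["id"]),
        "slug": match["slug"],
        "is_video": match["video"] is not None,
        # Drop the ?cat= query, which does not appear to be required.
        "canonical": f"{parts.scheme}://{parts.netloc}{parts.path}",
    }
```

Stripping the query string gives one canonical form per article, which helps when de-duplicating URLs that were collected both with and without the ?cat= suffix.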