Difference between revisions of "The Mail Archive"
Revision as of 14:33, 14 October 2015
The Mail Archive
URL | mail-archive.com
Status | Online!
Archiving status | Not saved yet
Archiving type | Unknown
IRC channel | #archiveteam-bs (on hackint)
The Mail Archive is what it sounds like: an ad-supported mailing list archive to which users can add arbitrary mailing lists. Started in 1998, it held 121,034,946 archived postings across 4,314 mailing lists as of October 2015.
Possible leads
List of mailing lists (in OPML format)
We could use this as a starting point by parsing the OPML to get a sitemap for future web-scraping.
Single-line Bash command to scrape the OPML file for mailing list URLs:
wget -qO- https://www.mail-archive.com/feeds/feeds.opml | \
  grep -Eo "^\s*htmlUrl=\"([^\"]*)\"$" | sed 's/^[^"]*"//' | \
  sed 's/"$//'
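The one-liner only matches an htmlUrl attribute that sits alone on its own line (that is what the ^ and $ anchors require), so a change in the file's layout would make it silently print nothing. Below is a minimal sketch of a layout-independent variant, assuming attribute values are always double-quoted. The function name extract_html_urls is hypothetical, and this is still regex matching rather than a real XML parser: the only XML entity it decodes is &amp;.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every htmlUrl attribute value found in an
# OPML document read from stdin, one URL per line.
# Assumption: attribute values are double-quoted.
extract_html_urls() {
  # -o prints only the matched text, so this works whether htmlUrl sits on
  # its own line or shares a line with other attributes.
  grep -Eo 'htmlUrl="[^"]*"' |
    # Strip the attribute name and surrounding quotes, then decode &amp;
    # (the only XML entity handled here).
    sed -e 's/^htmlUrl="//' -e 's/"$//' -e 's/&amp;/\&/g' |
    # Byte-wise sort so the order does not depend on the locale;
    # -u drops duplicate lists.
    LC_ALL=C sort -u
}
```

Usage, saving the list for later scraping (lists.txt is an arbitrary file name):
wget -qO- https://www.mail-archive.com/feeds/feeds.opml | extract_html_urls > lists.txt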