SadDM is a table-top role player and digital pack rat. Too many times he's tried to follow a link to a defunct blog or closed forum, and that's what makes him sad.
My main areas of current concern are a few sites that are basically hubs for the RPG community. While none of them seem particularly at risk, at least one of them has had a major security incident which took it offline for a time, and at least one other runs custom forum software that doesn't have a complete post index that is exposed to the web.
Stuff I Care About
Why it's important: Paizo publishes the Pathfinder RPG. Depending on how you measure such things, it is the most popular RPG on the market. There are currently over half a million forum threads (some having hundreds or even thousands of posts). Additionally, they run a web store selling a variety of products (35k+ at the moment). Each store item may have its own discussion thread and reviews. All in all, this is a mountain of information stretching back 10 years.
Vital signs: Paizo is at the top of its industry, and I've seen no indication that it is in danger of going away. That said, the forum and web store are a single piece of bespoke software that has evolved over the years, and will likely continue to do so. Of particular concern is that no complete index of threads is exposed to the public (merely a list of the last n threads posted to in each forum section). This means it's unlikely that archive.org has been able to crawl the site properly.
Initial thread discovery: complete
Full thread scrape: complete
Product review discovery: forthcoming
**All of the product pages should be discoverable by the wayback crawler so we don't strictly need to crawl them**
Product review scrape: forthcoming
Product discussion discovery: forthcoming
Product discussion scrape: forthcoming
User profile scrape: forthcoming
Why it's important: This is probably the single largest RPG site in the world. The main feature is a forum with 250k+ threads and 5.8M+ individual posts.
Vital signs: Everything seems to be ship-shape here. However, they did suffer a fairly major security incident in Dec 2012 that was bad enough that the site's owner ran a Kickstarter to help get it back online.
Why it's important: Home of the two most important brands in table-top gaming: Dungeons & Dragons, and Magic: The Gathering.
Vital signs: There is no indication that WotC is going anywhere. That said, as a division of Hasbro, they are subject to corporate whims. Also, major site overhauls tend to occur when new editions of their games are released. In the past, old content has been shuffled off to archive sections, but there is concern throughout the community that old material may one day simply disappear down the memory hole.
Why it's important:
Why it's important:
Why it's important:
RPG Geek (and Board Game Geek)
Probably the most popular forum site for collectors of RPG products. Their forum is hidden behind a robots.txt file, and heaven forbid you want to grab a page with wget:
aeakett@sandpoint > ~/archive_team/acaeum.com > ~/bin/wget --spider --keep-session-cookies --save-cookies=COOKIEFILE http://www.acaeum.com/forum
Spider mode enabled. Check if remote file exists.
--2014-01-30 16:04:46-- http://www.acaeum.com/forum
Resolving www.acaeum.com... 18.104.22.168
Connecting to www.acaeum.com|22.214.171.124|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 301 Moved Permanently
  Date: Thu, 30 Jan 2014 16:04:47 GMT
  Server: Apache
  Location: http://www.acaeum.com/forum/
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Type: text/html; charset=iso-8859-1
Location: http://www.acaeum.com/forum/ [following]
Spider mode enabled. Check if remote file exists.
--2014-01-30 16:04:47-- http://www.acaeum.com/forum/
Connecting to www.acaeum.com|126.96.36.199|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 403 FORBIDDEN
  Date: Thu, 30 Jan 2014 16:04:47 GMT
  Server: Apache
  X-Powered-By: PHP/5.5.7
  Warning: 199 www.acaeum.com:80 You_are_abusive/hacking/spamming_www.acaeum.com
  X-Abuse: Your connection is not welcome due to: Possibly hostile scraper/harvester, signature may interfere with some Wordpress installs (SPD-0087). Domain error. INSTA-BAN (IB-21). You have been instantly banned due to extremely hazardous behavior!
  Vary: User-Agent
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Type: text/html
Remote file does not exist -- broken link!!!
That's right, they actually ban your IP if you make a request for one of their pages with wget! :-P
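Before pointing any tool at a site like this, it can help to check what its robots.txt actually permits. Here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt body, the `example.com` URLs, and the `is_allowed` helper are all made up for illustration; this is not a copy of Acaeum's real file, and passing this check would not necessarily avoid the ban above, which appears to be triggered by the User-Agent rather than by robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body, for illustration only. This is NOT a copy of
# any real site's file.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/forum/",
        "http://www.example.com/index.html",
    ):
        print(url, is_allowed(EXAMPLE_ROBOTS_TXT, "Wget", url))
```

To check a live site instead, `RobotFileParser` also has `set_url()` and `read()`, which fetch the file over the network.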
Other companies with forums (Green Ronin, Kobold Publishing... doubtless there are others)
Funny stuff seen on IRC
23:39 < godane > it looks like IA is stuck for some reason
23:47 < joepie91 > "some reason"

13:06 < w0rp > I'm rsyncing two external disks, while mirroring an FTP, while moving files off machine to another, while seeding torrents, while running the Warrior.
13:06 < w0rp > What have I become!?
13:25 < BlueMaxim > godane.