Difference between revisions of "User:SadDM"

From Archiveteam
(blah blah blah... cleaning up and capturing thoughts)
Line 50: Line 50:


===Canonfire===
===The Piazza===
[https://archive.org/details/www.thepiazza.org.uk_grab_2014-01-29 complete]


=== Acaeum ===
Line 89: Line 86:
That's right, they actually ban your IP if you make a request for one of their pages with wget! :-P


===More Stuff===
* [http://www.norwescon.org/about/history/ Videos from Norwescon Going back to the 80s]
* [http://contessaonline.com/category/events/ Contessa recordings]
* [http://www.youtube.com/user/AetherconOfficial/videos?view=0&flow=list Aethercon videos]
----



Revision as of 15:46, 12 March 2014

Sad d20.jpg

SadDM is a table-top role player and digital pack rat. Too many times he's tried to follow a link to a defunct blog or closed forum, and that's what makes him sad.

My main areas of current concern are a few sites that are basically hubs for the RPG community. While none of them seem particularly at risk, at least one of them has had a major security incident that took it offline for a time, and at least one other runs custom forum software that doesn't expose a complete post index to the web.

Stuff I Care About

Paizo Publishing

Why it's important: Paizo publishes the Pathfinder RPG. Depending on how you measure such things, it is the most popular RPG on the market. There are currently over half a million forum threads (some having hundreds or even thousands of posts). Additionally, they run a web store selling a variety of products (35k+ at the moment). Each store item may have its own discussion thread and reviews. All in all, this is a mountain of information stretching back 10 years.

Vital signs: Paizo is at the top of its industry, and I've seen no indication that it is in danger of going away. That said, the forum and web store are a single piece of bespoke software that has evolved over the years, and will likely continue to do so. Of particular concern is the fact that there is no complete index of threads exposed to the public (merely a list of the last n threads posted to in each forum section). This makes it unlikely that archive.org has been able to crawl the site properly.
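To illustrate why a "last n threads" listing makes discovery awkward, here is a minimal sketch of one possible workaround: poll the listing repeatedly and accumulate whatever thread URLs show up. This is not necessarily how the discovery below was done, and the thread paths and the `merge_listing` helper are made up for the example.

```python
def merge_listing(seen, listing):
    """Add thread URLs from one 'recent threads' snapshot to the running set.

    Returns, in listing order, only the URLs that were not already known.
    """
    new = []
    for url in listing:
        if url not in seen:
            seen.add(url)
            new.append(url)
    return new


# Hypothetical snapshots of a "last n threads" page taken at different times.
# The paths are placeholders, not real Paizo URLs.
snapshot_1 = ["/threads/a1", "/threads/a2", "/threads/a3"]
snapshot_2 = ["/threads/a3", "/threads/a4", "/threads/a5"]

seen = set()
first = merge_listing(seen, snapshot_1)
second = merge_listing(seen, snapshot_2)

print(first)      # threads found on the first pass
print(second)     # only the threads that are new on the second pass
print(len(seen))  # total distinct threads discovered so far
```

The obvious weakness is that a thread which never surfaces in a listing while you are polling stays invisible, which is exactly the problem an ordinary crawler runs into.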

Progress
Initial thread discovery: complete
Full thread scrape: complete
Product review discovery: forthcoming
Product review scrape: forthcoming
Product discussion discovery: forthcoming
Product discussion scrape: forthcoming
**All of the product pages should be discoverable by the wayback crawler so we don't strictly need to crawl them**
User profile scrape: forthcoming
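The claim that product pages are already discoverable could be spot-checked against the Wayback Machine's availability endpoint. The sketch below only builds the query URL and reads a response; it makes no network call. The product URL and the sample response are placeholders I made up, and the response shape is my understanding of that endpoint rather than something taken from this page.

```python
import json
from urllib.parse import urlencode

# Public availability endpoint of the Wayback Machine.
WAYBACK_API = "https://archive.org/wayback/available"


def availability_url(page_url):
    """Build the query URL that asks whether page_url has a snapshot."""
    return WAYBACK_API + "?" + urlencode({"url": page_url})


def closest_snapshot(response_text):
    """Return the closest snapshot URL from a response body, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


# Hypothetical response bodies (assumed shape, invented values).
archived = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20140101000000/http://example.com/",
            "timestamp": "20140101000000",
            "status": "200",
        }
    }
})
missing = json.dumps({"archived_snapshots": {}})

print(availability_url("example.com/store/item"))
print(closest_snapshot(archived))
print(closest_snapshot(missing))
```

Any product page that comes back with no snapshot would be a candidate for crawling after all.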

EN World

Why it's important: This is probably the single largest RPG site in the world. The main feature is a forum with 250k+ threads and 5.8M+ individual posts.

Vital signs: Everything seems to be ship-shape here. However, they did suffer a fairly major security incident in Dec 2012 that was bad enough that the site's owner ran a Kickstarter to help get it back online.

Wizards of the Coast (WotC)

Why it's important: Home of the two most important brands in table-top gaming: Dungeons & Dragons, and Magic: The Gathering.

Vital signs: There is no indication that WotC is going anywhere. That said, as a division of Hasbro, they are subject to corporate whims. Also, major site overhauls tend to occur when new editions of their games are released. In the past, old content has been shuffled off to archive sections, but there is concern throughout the community that old material may one day simply disappear down the memory hole.

RPG.net

Why it's important:

Vital signs:

Obsidian Portal

Why it's important:

Vital signs:

1000 Monkeys, 1000 Typewriters (1km1kt)

Why it's important:

Vital signs:

Others

RPG Geek (and Board Game Geek)

Canonfire

Acaeum

Probably the most popular forum site for collectors of RPG products. Their forum is hidden behind a robots.txt file, and heaven forbid you want to grab a page with wget:

aeakett@sandpoint > ~/archive_team/acaeum.com > ~/bin/wget --spider --keep-session-cookies --save-cookies=COOKIEFILE http://www.acaeum.com/forum
Spider mode enabled. Check if remote file exists.
--2014-01-30 16:04:46--  http://www.acaeum.com/forum
Resolving www.acaeum.com... 216.246.52.75
Connecting to www.acaeum.com|216.246.52.75|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 301 Moved Permanently
  Date: Thu, 30 Jan 2014 16:04:47 GMT
  Server: Apache
  Location: http://www.acaeum.com/forum/
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Type: text/html; charset=iso-8859-1
Location: http://www.acaeum.com/forum/ [following]
Spider mode enabled. Check if remote file exists.
--2014-01-30 16:04:47--  http://www.acaeum.com/forum/
Connecting to www.acaeum.com|216.246.52.75|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 403 FORBIDDEN
  Date: Thu, 30 Jan 2014 16:04:47 GMT
  Server: Apache
  X-Powered-By: PHP/5.5.7
  Warning: 199 www.acaeum.com:80 You_are_abusive/hacking/spamming_www.acaeum.com
  X-Abuse: Your connection is not welcome due to: Possibly hostile scraper/harvester, signature may interfere with some Wordpress installs (SPD-0087). Domain error. INSTA-BAN (IB-21). You have been instantly banned due to extremely hazardous behavior!
  Vary: User-Agent
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Type: text/html
Remote file does not exist -- broken link!!!

That's right, they actually ban your IP if you make a request for one of their pages with wget! :-P
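The response headers above hint at how the block is decided: a 403 together with "Vary: User-Agent" suggests the server is looking at the client's User-Agent string. Here is a rough sketch that pulls those signals out of a saved header dump. The header lines are copied from the log above; the function names are my own, and the heuristic is only a hint, not proof of what the server does.

```python
# A few lines from the second response in the log above.
RAW_RESPONSE = """\
HTTP/1.1 403 FORBIDDEN
Server: Apache
Warning: 199 www.acaeum.com:80 You_are_abusive/hacking/spamming_www.acaeum.com
X-Abuse: Your connection is not welcome due to: Possibly hostile scraper/harvester, signature may interfere with some Wordpress installs (SPD-0087). Domain error. INSTA-BAN (IB-21). You have been instantly banned due to extremely hazardous behavior!
Vary: User-Agent
Content-Type: text/html
"""


def parse_response(raw):
    """Split a raw header dump into (status_code, {lowercased name: value})."""
    lines = [line.strip() for line in raw.strip().splitlines()]
    status_code = int(lines[0].split()[1])
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; header values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers


def looks_like_agent_block(status_code, headers):
    """Heuristic (an assumption): 403 plus 'Vary: User-Agent' hints at a UA filter."""
    vary = [v.strip().lower() for v in headers.get("vary", "").split(",")]
    return status_code == 403 and "user-agent" in vary


status, headers = parse_response(RAW_RESPONSE)
print(status)
print(looks_like_agent_block(status, headers))
print(headers["x-abuse"])
```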

More Stuff
Videos from Norwescon going back to the 80s: http://www.norwescon.org/about/history/
Contessa recordings: http://contessaonline.com/category/events/
Aethercon videos: http://www.youtube.com/user/AetherconOfficial/videos?view=0&flow=list



Other companies with forums (Green Ronin, Kobold Publishing... doubtless there are others)