Difference between revisions of "Ispygames"

From Archiveteam

Revision as of 21:57, 25 April 2013

Gamespy, IGN, 1up, ugo
URL: http://www.gamespy.com & many others
Project status: Closing
Archiving status: In progress...
Project source: Unknown
Project tracker: Unknown
IRC channel: #ispygames

The News

IGN hit with layoffs, 1UP, UGO and GameSpy shutting down
1UP, UGO and GameSpy to be shut down

The Problems

  • Once you start digging around these sites, you find a mess of inconsistent URL schemes, with content scattered everywhere.
  • Some files are being hosted on MediaFire.
  • Based on tests, the larger and older a site is, the more a wget crawl misses because of its URL scheme.
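
One way to put a number on how much a crawl missed is to compare a list of URLs you already know exist against the URLs the crawl actually fetched. The sketch below assumes a hand-collected file of known URLs (one per line, a hypothetical input) and the .cdx index that wget writes when run with --warc-cdx. It also assumes the first column of that index is the original URL, so check this against your own .cdx file before relying on it.

```shell
#!/bin/sh
# Print the URL column of a wget-generated CDX index, skipping the header line.
# Assumption: the first column is the original URL.
cdx_urls() {
    awk '$1 != "CDX" { print $1 }' "$1"
}

# Print every URL in the known list ($1) that does not appear in the CDX ($2).
missed_urls() {
    known=$1
    cdx=$2
    tmp=$(mktemp -d) || return 1
    LC_ALL=C sort -u "$known" > "$tmp/known"
    cdx_urls "$cdx" | LC_ALL=C sort -u > "$tmp/crawled"
    # comm -23 keeps the lines that are only in the first (known) file
    LC_ALL=C comm -23 "$tmp/known" "$tmp/crawled"
    rm -rf "$tmp"
}
```

Called as missed_urls known_urls.txt warc_name.cdx (both file names are placeholders), it prints the known URLs the crawl never fetched. The comparison is an exact string match, so a trailing slash or a different query string counts as a different URL.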

What we know

  • We already have a list of almost all the domains involved
  • A clean list, with duplicates and bad domains removed, is already being processed and will be posted here when complete.
  • Most of the sites are not that big, but a few are huge.
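
A rough sketch of how a raw domain list could be cleaned up is shown here. It assumes the raw list has one domain or URL per line, and it treats "bad" as "not a syntactically valid host name", which is only a guess at what the clean-up involves. It does not check whether a domain still resolves.

```shell
#!/bin/sh
# Read domains or URLs on stdin, print a sorted, de-duplicated list of bare host names.
clean_domains() {
    tr 'A-Z' 'a-z' \
        | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
              -e 's#^[a-z][a-z0-9+.-]*://##' -e 's#/.*$##' \
        | grep -E '^[a-z0-9]([a-z0-9-]*[a-z0-9])?(\.[a-z0-9]([a-z0-9-]*[a-z0-9])?)+$' \
        | LC_ALL=C sort -u
}
```

For example, clean_domains < raw_domains.txt > domains.txt, where both file names are placeholders. Lower-casing and stripping the scheme and path before sorting is what lets http://PlanetDoom.gamespy.com/ and planetdoom.gamespy.com collapse into one entry.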

The plan

  • Save the sites and related content
  • Back up the Twitter feeds for any associated accounts. All My Tweets just takes a username and returns the maximum number of tweets possible.


wget test command

This is for the GameSpy sites.

USER_AGENT="Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)"
SAVE_HOST="http://planetdoom.gamespy.com"
# --domains matches bare host names, so strip the scheme from SAVE_HOST
SAVE_DOMAIN="${SAVE_HOST#http://}"
WARC_NAME="warc_name"

# Mirror the site into a WARC; --span-hosts with --domains also follows links to the media hosts
wget -e robots=off --mirror --page-requisites \
--waitretry 5 --timeout 60 --tries 5 --wait 2 \
--warc-header "operator: Archive Team" --warc-cdx --warc-file="$WARC_NAME" \
-U "$USER_AGENT" "$SAVE_HOST" \
--span-hosts --domains="$SAVE_DOMAIN",pcmedia.gamespy.com,pnmedia.gamespy.com,pspmedia.gamespy.com,oystatic.ignimgs.com
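
After a run like the one above finishes, it can be worth a quick sanity check on the output before moving on. The sketch below assumes wget's default compressed output, so the file is named "$WARC_NAME".warc.gz. It verifies the gzip stream and counts the response records. The function name is made up for illustration.

```shell
#!/bin/sh
# Print the number of "response" records in a gzip-compressed WARC file.
# Returns non-zero if the file is not a valid gzip stream.
count_warc_responses() {
    gzip -t "$1" || return 1
    # -a treats the decompressed stream as text even though it contains binary payloads
    gzip -dc "$1" | grep -a -c '^WARC-Type: response'
}
```

For example, count_warc_responses "$WARC_NAME.warc.gz". A count of zero or one on a site that is supposed to be large suggests the crawl stopped at the front page.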

Try this for the IGN and UGO sites.

USER_AGENT="Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)"
SAVE_HOST="http://ve3d.ign.com"
WARC_NAME="warc_name"

# Mirror a single host into a WARC (no --span-hosts, so the crawl stays on SAVE_HOST)
wget -e robots=off --mirror --page-requisites \
--waitretry 5 --timeout 60 --tries 5 --wait 2 \
--warc-header "operator: Archive Team" --warc-cdx --warc-file="$WARC_NAME" \
-U "$USER_AGENT" "$SAVE_HOST"
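
With many small sites to grab, the single-host command above could be wrapped in a loop. This is only a sketch. The input file of start URLs (one per line) and the function names are hypothetical. Deriving the WARC name from the host name is a convenience chosen here, not an agreed naming scheme.

```shell
#!/bin/sh
USER_AGENT="Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)"

# Turn a start URL into a file-name-safe WARC name: drop the scheme and path,
# then replace anything other than letters, digits, dots and dashes with "_".
warc_name_for() {
    printf '%s\n' "$1" \
        | sed -e 's#^[A-Za-z][A-Za-z0-9+.-]*://##' -e 's#/.*$##' -e 's/[^A-Za-z0-9.-]/_/g'
}

# Run the single-host wget command for one start URL.
grab_host() {
    host_url=$1
    wget -e robots=off --mirror --page-requisites \
        --waitretry 5 --timeout 60 --tries 5 --wait 2 \
        --warc-header "operator: Archive Team" --warc-cdx \
        --warc-file="$(warc_name_for "$host_url")" \
        -U "$USER_AGENT" "$host_url"
}

# Grab every URL listed in the file given as $1, logging failures and carrying on.
grab_all() {
    while IFS= read -r host_url; do
        [ -n "$host_url" ] || continue
        grab_host "$host_url" || echo "failed: $host_url" >&2
    done < "$1"
}
```

Run it as grab_all hosts.txt, where hosts.txt is a placeholder name. Each host gets its own WARC and CDX file, so a failed or interrupted grab can be redone without touching the others.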

IGN domains


Ready to grab


Untested

These might be asset-only hosting sites

Redirects

Gamespy Domains

Ready to grab


In Progress

Redirects