Difference between revisions of "Ispygames"

From Archiveteam
Line 126: Line 126:
* http://gta.ign.com - Smiley, done
* http://gta.ign.com - Smiley, done
* http://www.blockbuster.ign.com - already dead
* http://www.blockbuster.ign.com - already dead
=== Ready to grab ===
* http://au.bestof.ign.com
* http://au.retro.ign.com
* http://au.sports.ign.com
* http://wiki.stg.www.ign.com
=== untested ===
* http://au.microsites.ign.com
* http://au.top100.ign.com
* http://au.video.ign.com
* http://crose.dev.m.ca.ign.com
* http://crose.dev.m.ign.com
* http://guitarhero3.ign.com - Smiley, done
* http://guitarhero3.ign.com - Smiley, done
* http://gwvault.ign.com - Smiley, finished with some timeouts
* http://gwvault.ign.com - Smiley, finished with some timeouts
* http://halo.ign.com - Smiley, done
* http://halo.ign.com - Smiley, done
* http://hjvault.ign.com - redirects to vault.ign.com
* http://hls.gbartone.dev.m.ign.com -> hls.gbartone.dev.m.uk.ign.com, done, smiley
* http://hls.gbartone.dev.m.ign.com -> hls.gbartone.dev.m.uk.ign.com, done, smiley
* http://horizonsvault.ign.com - Smiley, done
* http://horizonsvault.ign.com - Smiley, done
* http://ie.bestof.ign.com -> uk.bestof.ign.com (localized version of sites?)
* http://ie.bestof.ign.com -> uk.bestof.ign.com (localized version of sites?)
* http://ie.top100.ign.com - Smiley, (localized version redirected to .uk.)
* http://ie.top100.ign.com - Smiley, (localized version redirected to .uk.)
* http://ipl.ign.com -> ign.com/ipl
* http://uk.bestof.ign.com - Smiley grabbing
* http://uk.corp.ign.com - Smiley grabbing
* http://uk.retro.ign.com - Smiley grabbing -> broken by + in url
* http://uk.sports.ign.com - Smiley grabbing -> uk.ign.com
* http://uk.top100.ign.com - Done
* http://iplstore.ign.com - dead, uploaded, Smiley
* http://iplstore.ign.com - dead, uploaded, Smiley
* http://jloijens.dev.m.ign.com - Smiley, redirected to .uk. version, done
* http://jloijens.dev.m.ign.com - Smiley, redirected to .uk. version, done
Line 162: Line 150:
* http://live.ign.com - Smiley, done
* http://live.ign.com - Smiley, done
* http://lotrovault.ign.com - Smiley
* http://lotrovault.ign.com - Smiley
* http://mac.ign.com -> http://uk.ign.com/games/reviews?platformSlug=mac
* http://mag.ign.com - Smiley, done
* http://mag.ign.com - Smiley, done
* http://mail2.ign.com - Smiley, done
* http://mail2.ign.com - Smiley, done
* http://m.au.ign.com
* http://m.ca.ign.com
* http://mcraft.ign.com - Smiley, done
* http://mcraft.ign.com - Smiley, done
* http://memoviedia.ign.com - Smiley, dead
* http://memoviedia.ign.com - Smiley, dead
* http://mevault.ign.com - Smiley, done
* http://mevault.ign.com - Smiley, done
* http://m.ie.ign.com
* http://m.ign.com -> m.uk.ign.com, Smiley
* http://m.ign.com -> m.uk.ign.com, Smiley
* http://minecraft.ign.com - Smiley
* http://minecraft.ign.com - Smiley
* http://mobile.ign.com
* http://uk.video.ign.com - Smiley grabbing -> broken by redirect
* http://mobile.ign.com  
* http://niboppub.ign.com - Smiley, done
* http://nwvault.ign.com - Smiley
* http://m.uk.ign.com - Smiley
* http://m.uk.ign.com - Smiley
* http://musichub.ign.com - Smiley
* http://musichub.ign.com - Smiley
* http://mxovault.ign.com - Smiley
* http://mxovault.ign.com - Smiley
* http://nchandra.dev.m.uk.ign.com - Smiley, done
* http://o.rpgvaultarchive.ign.com - Smiley, done
* http://potbsvault.ign.com - Smiley, done with some timeouts
* http://rotavault.ign.com - Smiley
* http://rpgvaultarchive.ign.com - Smiley
* http://ryzomvault.ign.com - Smiley, done, some timeouts
* http://sbvault.ign.com - Smiley, dead, done
* http://starcraft2.ign.com - Smiley done
=== Ready to grab ===
* http://au.bestof.ign.com
* http://au.retro.ign.com
* http://au.sports.ign.com
* http://wiki.stg.www.ign.com
=== untested ===
* http://au.microsites.ign.com
* http://au.top100.ign.com
* http://au.video.ign.com
* http://crose.dev.m.ca.ign.com
* http://crose.dev.m.ign.com
* http://m.au.ign.com
* http://m.ca.ign.com
* http://m.ie.ign.com
* http://mobile.ign.com
* http://mobile.ign.com
* http://nchandra.dev.m.au.ign.com
* http://nchandra.dev.m.au.ign.com
* http://nchandra.dev.m.ie.ign.com
* http://nchandra.dev.m.ie.ign.com
* http://nchandra.dev.m.ign.com
* http://nchandra.dev.m.ign.com
* http://nchandra.dev.m.uk.ign.com - Smiley, done
* http://nchandra.dev.www.ign.com
* http://nchandra.dev.www.ign.com
* http://niboppub.ign.com - Smiley, done
* http://nwvault.ign.com - Smiley
* http://o.guidesarchive.ign.com -> uk.ign.com/wikis
* http://o.mobile.ign.com
* http://o.mobile.ign.com
* http://open.em.ign.com
* http://open.em.ign.com
* http://opt-out.emailpreferences.ign.com
* http://opt-out.emailpreferences.ign.com
* http://o.rpgvaultarchive.ign.com - Smiley, done
* http://overlord.ign.com
* http://overlord.ign.com
* http://pawong.dev.www.ign.com
* http://pawong.dev.www.ign.com
Line 196: Line 204:
* http://podcast.ign.com
* http://podcast.ign.com
* http://podcasts.ign.com
* http://podcasts.ign.com
* http://potbsvault.ign.com - Smiley, done with some timeouts
* http://primeblog.ign.com
* http://primeblog.ign.com
* http://promotions.ign.com
* http://promotions.ign.com
Line 205: Line 212:
* http://rmcadams.dev.m.ign.com
* http://rmcadams.dev.m.ign.com
* http://rmcadams.dev.www.ign.com
* http://rmcadams.dev.www.ign.com
* http://rotavault.ign.com - Smiley
* http://rpgvaultarchive.ign.com - Smiley
* http://rsullivan.dev.m.ca.ign.com
* http://rsullivan.dev.m.ca.ign.com
* http://rsullivan.dev.m.ign.com
* http://rsullivan.dev.m.ign.com
* http://rsullivan.dev.www.ign.com
* http://rsullivan.dev.www.ign.com
* http://ryzomvault.ign.com - Smiley, done, some timeouts
* http://sbvault.ign.com - Smiley, dead, done
* http://share.affiliation.com  
* http://share.affiliation.com  
* http://share.ign.com
* http://share.ign.com
Line 223: Line 226:
* http://sslvpn.ign.com
* http://sslvpn.ign.com
* http://staging-api.ign.com
* http://staging-api.ign.com
* http://starcraft2.ign.com - Smiley done
* http://starcraft.ign.com - Smiley done - redirect to starcraft2.ign.com
* http://store.ign.com
* http://store.ign.com
* http://strangleholdcentral.ign.com
* http://strangleholdcentral.ign.com
Line 251: Line 252:
* http://trvault.ign.com
* http://trvault.ign.com
* http://twoworldsvault.ign.com
* http://twoworldsvault.ign.com
* http://uk.bestof.ign.com - Smiley grabbing
* http://uk.corp.ign.com - Smiley grabbing
* http://uk.retro.ign.com - Smiley grabbing -> broken by + in url
* http://uk.sports.ign.com - Smiley grabbing -> uk.ign.com
* http://uk.top100.ign.com - Done
* http://uk.video.ign.com - Smiley grabbing -> broken by redirect
* http://uovault.ign.com
* http://uovault.ign.com
* http://v3-api.stg.ie.ign.com
* http://v3-api.stg.ie.ign.com
Line 356: Line 351:
* http://emailpreferences.ign.com - redirect to mail.ign.com
* http://emailpreferences.ign.com - redirect to mail.ign.com
* http://guidesarchive.ign.com -> http://uk.ign.com/wikis
* http://guidesarchive.ign.com -> http://uk.ign.com/wikis
* http://hjvault.ign.com - redirects to vault.ign.com
* http://ipl.ign.com -> ign.com/ipl
* http://mac.ign.com -> http://uk.ign.com/games/reviews?platformSlug=mac
* http://o.guidesarchive.ign.com -> uk.ign.com/wikis
* http://starcraft.ign.com - Smiley done - redirect to starcraft2.ign.com


== Gamespy Domains ==
== Gamespy Domains ==

Revision as of 15:54, 23 April 2013

The News

IGN hit with layoffs, 1UP, UGO and GameSpy shutting down
1UP, UGO and GameSpy to be shut down

The Problems

  • Once you start digging around these sites, you find a mess of inconsistent URL schemes, with content scattered everywhere.
  • Some files are hosted on MediaFire.
  • Based on tests, the larger and older a site is, the more a wget crawl misses because of its URL scheme.

What we know

  • We already have a list of almost all the domains involved.
  • A clean list, with duplicates and bad domains removed, is already being processed and will be posted here when complete.
  • Most of the sites are not that big, but a few are huge.
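The de-duplication step for the domain list could look roughly like the sketch below. It assumes the raw list uses the same annotated form as the lists on this page ("* http://host - notes"). The function name clean_domain_list is hypothetical. It only removes duplicates and annotations; deciding which domains are bad still needs a manual or network check.

```shell
# Hypothetical helper: reduce an annotated domain list to unique host names.
# Input lines are assumed to look like "* http://halo.ign.com - Smiley, done".
# Lines that do not start with "* http://" or "* https://" are skipped.
clean_domain_list() {
  sed -n 's|^\* *https\{0,1\}://\([^/ ]*\).*|\1|p' \
    | tr 'A-Z' 'a-z' \
    | sort -u
}
```

For example, `clean_domain_list < raw_list.txt > clean_list.txt`, where both file names are placeholders.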

The plan

  • Save the sites and related content
  • Back up the Twitter feeds for any associated accounts. All My Tweets takes just a username and returns as many tweets as possible.


wget test command

This is for the GameSpy sites.

USER_AGENT="Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)"
SAVE_HOST="http://planetdoom.gamespy.com"
# --domains expects bare host names, so strip the scheme from SAVE_HOST
SAVE_DOMAIN="${SAVE_HOST#http://}"
WARC_NAME="warc_name"

# Mirror the site into a WARC, also following links to the listed media hosts
wget -e robots=off --mirror --page-requisites \
--waitretry 5 --timeout 60 --tries 5 --wait 2 \
--warc-header "operator: Archive Team" --warc-cdx --warc-file="$WARC_NAME" \
-U "$USER_AGENT" "$SAVE_HOST" \
--span-hosts --domains="$SAVE_DOMAIN,pcmedia.gamespy.com,pnmedia.gamespy.com,pspmedia.gamespy.com,oystatic.ignimgs.com"

Try this for the IGN and UGO sites.

USER_AGENT="Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)"
SAVE_HOST="http://ve3d.ign.com"
WARC_NAME="warc_name"

wget -e robots=off --mirror --page-requisites \
--waitretry 5 --timeout 60 --tries 5 --wait 2 \
--warc-header "operator: Archive Team" --warc-cdx --warc-file="$WARC_NAME" \
-U "$USER_AGENT" "$SAVE_HOST"
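To run the command above over a whole list of sites, a wrapper along these lines might help. This is a sketch only: the function names host_of, grab_site and grab_sites and the RUN switch are hypothetical, and naming each WARC after its host name is an assumption rather than an agreed convention. By default it only prints the wget command for each site, and it runs the crawl when RUN=1 is set.

```shell
# Hypothetical wrapper around the IGN/UGO test command; names are illustrative.
USER_AGENT="Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)"

# Strip the scheme and any path: http://ve3d.ign.com/foo -> ve3d.ign.com
host_of() {
  local url="${1#*://}"
  printf '%s\n' "${url%%/*}"
}

# Build the wget command for one site. Print it (dry run) unless RUN=1.
grab_site() {
  local save_host="$1"
  local warc_name
  warc_name="$(host_of "$save_host")"   # assumption: one WARC per host
  set -- wget -e robots=off --mirror --page-requisites \
    --waitretry 5 --timeout 60 --tries 5 --wait 2 \
    --warc-header "operator: Archive Team" --warc-cdx \
    --warc-file="$warc_name" -U "$USER_AGENT" "$save_host"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Read one site URL per line from stdin and process each in turn.
grab_sites() {
  local site
  while IFS= read -r site; do
    if [ -n "$site" ]; then
      grab_site "$site"
    fi
  done
}
```

Usage: `grab_sites < hosts.txt` previews the commands, and `RUN=1 grab_sites < hosts.txt` runs them. Here hosts.txt is a placeholder for a file with one URL per line.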

IGN domains


Ready to grab


untested

These might be asset only hosting sites

Redirects

Gamespy Domains

Ready to grab


In Progress

Redirects