Difference between revisions of "Ispygames"

From Archiveteam
(IGN domains)
(IGN domains)
Line 50: Line 50:
  
 
== IGN domains ==
 
== IGN domains ==
* http://beacon.snowball.com - already closed
 
* http://codesmedia.ign.com
 
* http://dcmedia.ign.com
 
* http://entertainmentmedia.ign.com
 
* http://faqsmedia.ign.com
 
* http://faqsmovies.ign.com
 
* http://ffmedia.ign.com
 
 
* http://911.ign.com
 
* http://911.ign.com
 
* http://aavault.ign.com
 
* http://aavault.ign.com
Line 86: Line 79:
 
* http://au.video.ign.com
 
* http://au.video.ign.com
 
* http://beacon.ign.com
 
* http://beacon.ign.com
 +
* http://beacon.snowball.com - already closed
 
* http://beaterator.ign.com
 
* http://beaterator.ign.com
 
* http://bestofe3.ign.com
 
* http://bestofe3.ign.com
Line 103: Line 97:
 
* http://code.ign.com
 
* http://code.ign.com
 
* http://codesmedia.ign.com
 
* http://codesmedia.ign.com
 +
* http://codesmedia.ign.com
 
* http://cohvault.ign.com
 
* http://cohvault.ign.com
 
* http://comiccon.ign.com
 
* http://comiccon.ign.com
Line 112: Line 107:
 
* http://cubemedia.ign.com
 
* http://cubemedia.ign.com
 
* http://dcmedia.ign.com
 
* http://dcmedia.ign.com
 +
* http://dcmedia.ign.com
 
* http://ddovault.ign.com
 
* http://ddovault.ign.com
 
* http://design.ign.com
 
* http://design.ign.com
Line 123: Line 119:
 
* http://emailpreferences.ign.com
 
* http://emailpreferences.ign.com
 
* http://entertainmentmedia.ign.com
 
* http://entertainmentmedia.ign.com
 +
* http://entertainmentmedia.ign.com
 
* http://eq2vault.ign.com
 
* http://eq2vault.ign.com
 
* http://eqvault.ign.com
 
* http://eqvault.ign.com
Line 128: Line 125:
 
* http://evevault.ign.com
 
* http://evevault.ign.com
 
* http://faqsmedia.ign.com
 
* http://faqsmedia.ign.com
 +
* http://faqsmedia.ign.com
 
* http://faqsmovies.ign.com
 
* http://faqsmovies.ign.com
 +
* http://faqsmovies.ign.com
 
* http://feeds.ign.com
 
* http://feeds.ign.com
 +
* http://ffmedia.ign.com
 
* http://ffvault.ign.com
 
* http://ffvault.ign.com
 
* http://findit.ign.com
 
* http://findit.ign.com
Line 158: Line 158:
 
* http://ie.media.ign.com
 
* http://ie.media.ign.com
 
* http://ie.top100.ign.com
 
* http://ie.top100.ign.com
 +
* http://insdermedia.ign.com
 
* http://insiderdownloads.ign.com
 
* http://insiderdownloads.ign.com
 
* http://insidermedia.ign.com
 
* http://insidermedia.ign.com
Line 173: Line 174:
 
* http://live.ign.com
 
* http://live.ign.com
 
* http://lotrovault.ign.com
 
* http://lotrovault.ign.com
 +
* http://mac.ign.com
 
* http://macmedia.ign.com
 
* http://macmedia.ign.com
 
* http://macmovies.ign.com
 
* http://macmovies.ign.com
Line 181: Line 183:
 
* http://mcraft.ign.com
 
* http://mcraft.ign.com
 
* http://media.ign.com
 
* http://media.ign.com
 +
* http://media.ign.com
 +
* http://memoviedia.ign.com
 
* http://mevault.ign.com
 
* http://mevault.ign.com
 
* http://m.ie.ign.com
 
* http://m.ie.ign.com
Line 186: Line 190:
 
* http://minecraft.ign.com
 
* http://minecraft.ign.com
 
* http://mobile.ign.com
 
* http://mobile.ign.com
 +
* http://mobile.ign.com
 
* http://moviemedia.ign.com
 
* http://moviemedia.ign.com
 
* http://moviesmovies.ign.com
 
* http://moviesmovies.ign.com
Line 218: Line 223:
 
* http://pawong.dev.www.ign.com
 
* http://pawong.dev.www.ign.com
 
* http://pcmedia.ign.com
 
* http://pcmedia.ign.com
 +
* http://pcmedia.ign.com
 
* http://planetelderscrolls.ign.com
 
* http://planetelderscrolls.ign.com
 
* http://play.ign.com
 
* http://play.ign.com
 
* http://pocketmedia.ign.com
 
* http://pocketmedia.ign.com
 +
* http://pocketmedia.ign.com
 
* http://podcast.ign.com
 
* http://podcast.ign.com
 
* http://podcasts.ign.com
 
* http://podcasts.ign.com
Line 228: Line 235:
 
* http://promotools.ign.com
 
* http://promotools.ign.com
 
* http://ps2media.ign.com
 
* http://ps2media.ign.com
 +
* http://ps2media.ign.com
 
* http://psxmedia.ign.com
 
* http://psxmedia.ign.com
 +
* http://psxmedia.ign.com
 
* http://psxmovies.ign.com
 
* http://psxmovies.ign.com
 +
* http://psxmovies.ign.com
 
* http://publish.ign.com
 
* http://publish.ign.com
 
* http://rift.ign.com
 
* http://rift.ign.com
Line 243: Line 253:
 
* http://sbvault.ign.com
 
* http://sbvault.ign.com
 
* http://scifimedia.ign.com
 
* http://scifimedia.ign.com
 +
* http://scifimedia.ign.com
 
* http://s.faqsmovies.ign.com
 
* http://s.faqsmovies.ign.com
 
* http://s.ffmovies.ign.com
 
* http://s.ffmovies.ign.com
 +
* http://share.affiliation.com
 
* http://share.ign.com
 
* http://share.ign.com
 
* http://shootmania.ign.com
 
* http://shootmania.ign.com
Line 301: Line 313:
 
* http://vault.ign.com
 
* http://vault.ign.com
 
* http://vaultmedia.ign.com
 
* http://vaultmedia.ign.com
 +
* http://vaultmedia.ign.com
 
* http://ve3d.ign.com - grabbing
 
* http://ve3d.ign.com - grabbing
 
* http://vgu.stg.www.ign.com
 
* http://vgu.stg.www.ign.com
Line 313: Line 326:
 
* http://wiki.stg.www.ign.com
 
* http://wiki.stg.www.ign.com
 
* http://wiremedia.ign.com
 
* http://wiremedia.ign.com
 +
* http://wiremedia.ign.com
 
* http://wiremovies.ign.com
 
* http://wiremovies.ign.com
 +
* http://wiremovies.ign.com
 
* http://wishvault.ign.com
 
* http://wishvault.ign.com
 
* http://witchervault.ign.com
 
* http://witchervault.ign.com
Line 319: Line 334:
 
* http://www.antis.ign.com
 
* http://www.antis.ign.com
 
* http://www.blockbuster.ign.com - already dead
 
* http://www.blockbuster.ign.com - already dead
 +
* http://www.championshipgamingseries.com
 
* http://www.ipl.ign.com
 
* http://www.ipl.ign.com
 
* http://www.kaneandlynch.ign.com
 
* http://www.kaneandlynch.ign.com
 
* http://www.mevault.ign.com
 
* http://www.mevault.ign.com
 
* http://www.supersmashbros.ign.com
 
* http://www.supersmashbros.ign.com
 +
* http://xboxlive.ign.com
 
* http://xboxlivemedia.ign.com
 
* http://xboxlivemedia.ign.com
 
* http://xboxlivemovies.ign.com
 
* http://xboxlivemovies.ign.com
 
* http://xboxmedia.ign.com
 
* http://xboxmedia.ign.com
 
* http://xboxmovies.ign.com
 
* http://xboxmovies.ign.com
* http://insdermedia.ign.com
 
* http://mac.ign.com
 
* http://media.ign.com
 
* http://memoviedia.ign.com
 
* http://mobile.ign.com
 
* http://pcmedia.ign.com
 
* http://pocketmedia.ign.com
 
* http://ps2media.ign.com
 
* http://psxmedia.ign.com
 
* http://psxmovies.ign.com
 
* http://scifimedia.ign.com
 
* http://share.affiliation.com
 
* http://vaultmedia.ign.com
 
* http://wiremedia.ign.com
 
* http://wiremovies.ign.com
 
* http://www.championshipgamingseries.com
 
* http://xboxlive.ign.com
 
  
 
=== Redirects ===
 
=== Redirects ===

Revision as of 19:29, 24 March 2013

The News

IGN hit with layoffs, 1UP, UGO and GameSpy shutting down
1UP, UGO and GameSpy to be shut down

The Problems

  • Once you start digging around these sites, you find a mess of inconsistent URL schemes, with content scattered everywhere.
  • Some files are being hosted on MediaFire.
  • Based on tests, the larger and older a site is, the more a wget crawl misses because of its URL scheme.

What we know

  • We already have a list of almost all the domains involved
  • A clean list, with duplicates and bad domains removed, is already being processed and will be posted here when complete.
  • Most of the sites are not that big, but a few are huge.
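
As a rough illustration of the clean-up step, the sketch below normalises a raw list of wiki bullets into unique URLs. The function name `clean_domains` is hypothetical and is not the tool actually used for the list. It only removes duplicates and malformed lines. It does not check whether a domain is still alive, and it drops status notes such as "already closed", so those would need to be tracked separately.

```shell
# Hypothetical helper: read raw list lines on stdin, print unique URLs.
clean_domains() {
  # 1. Strip leading wiki bullets and whitespace.
  # 2. Drop everything after the URL (e.g. " - already closed").
  # 3. Lower-case, keep only lines that look like a bare http(s) host URL,
  #    then sort and remove duplicates.
  sed -e 's/^[*[:space:]]*//' -e 's/[[:space:]].*$//' \
    | tr '[:upper:]' '[:lower:]' \
    | grep -E '^https?://[a-z0-9.-]+$' \
    | sort -u
}
```

Usage would look like `clean_domains < raw_list.txt > clean_list.txt`, where both file names are placeholders.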

The plan

  • Save the sites and related content
  • Back up the Twitter feeds for any associated accounts. All My Tweets just takes a username and returns the maximum number of tweets possible.


wget test command

This is for the GameSpy sites.

USER_AGENT="Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)"
SAVE_HOST="http://planetdoom.gamespy.com"
WARC_NAME="warc_name"

# --warc-file takes a file name prefix, so use WARC_NAME rather than the URL.
# --domains expects bare host names, so strip the scheme from SAVE_HOST.
wget -e robots=off --mirror --page-requisites \
--waitretry 5 --timeout 60 --tries 5 --wait 2 \
--warc-header "operator: Archive Team" --warc-cdx --warc-file="$WARC_NAME" \
-U "$USER_AGENT" "$SAVE_HOST" \
--span-hosts --domains="${SAVE_HOST#http://},pcmedia.gamespy.com,pnmedia.gamespy.com,pspmedia.gamespy.com,oystatic.ignimgs.com"

Try this for the IGN and UGO sites.

USER_AGENT="Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)"
SAVE_HOST="http://ve3d.ign.com"
WARC_NAME="warc_name"

# --warc-file takes a file name prefix, so use WARC_NAME rather than the URL.
wget -e robots=off --mirror --page-requisites \
--waitretry 5 --timeout 60 --tries 5 --wait 2 \
--warc-header "operator: Archive Team" --warc-cdx --warc-file="$WARC_NAME" \
-U "$USER_AGENT" "$SAVE_HOST"
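
With many domains to grab, the command above could be wrapped in a loop. The following is only a sketch under assumptions: `warc_name_for` and `grab_all` are hypothetical helper names, and the input file is assumed to hold one URL per line. The WARC name is derived from the host part of each URL so that it contains no slashes.

```shell
USER_AGENT="Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)"

# Hypothetical helper: turn a URL into a file-name-safe WARC prefix
# by dropping the scheme and any path, then replacing odd characters.
warc_name_for() {
  printf '%s\n' "$1" \
    | sed -E -e 's|^https?://||' -e 's|/.*$||' -e 's|[^A-Za-z0-9.-]|_|g'
}

# Hypothetical helper: run the test command once per URL listed in file $1.
grab_all() {
  while IFS= read -r url; do
    [ -n "$url" ] || continue   # skip blank lines
    wget -e robots=off --mirror --page-requisites \
      --waitretry 5 --timeout 60 --tries 5 --wait 2 \
      --warc-header "operator: Archive Team" --warc-cdx \
      --warc-file="$(warc_name_for "$url")" \
      -U "$USER_AGENT" "$url"
  done < "$1"
}
```

Calling `grab_all domains.txt` (the file name is a placeholder) would then crawl each listed site in turn and write one WARC per host.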

IGN domains

Redirects

Gamespy Domains