Difference between revisions of "Ispygames"

Revision as of 02:09, 3 April 2013

The News

IGN hit with layoffs, 1UP, UGO and GameSpy shutting down
1UP, UGO and GameSpy to be shut down

The Problems

Once you start digging into these sites, you find a mess of inconsistent URL schemes, with content scattered everywhere.
Some files are being hosted on MediaFire.
Based on tests, the larger and older a site is, the more of it a wget crawl misses because of the URL scheme.
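Because a crawl can silently miss pages, one way to check a grab for completeness is to compare the URLs wget recorded against a list of URLs known to exist on the site. The wget commands on this page pass `--warc-cdx`, which writes a CDX index next to the WARC. The sketch below makes two assumptions. First, the index starts with a ` CDX ...` header line and has the captured URL in the first field. Second, `expected_urls.txt` is a hypothetical file with one known URL per line. The function name `missing_urls` is illustrative only.

```bash
#!/usr/bin/env bash
# Print URLs from an "expected" list that never appear in a wget CDX index.
# Assumption: the CDX header line's first field is "CDX" and the URL is field 1.
missing_urls() {
  local cdx=$1 expected=$2
  # comm needs both inputs sorted the same way, so force the C locale.
  # -13 hides lines found only in the CDX list and lines found in both,
  # leaving only expected URLs that were never captured.
  LC_ALL=C comm -13 \
    <(awk 'NF && $1 != "CDX" { print $1 }' "$cdx" | LC_ALL=C sort -u) \
    <(LC_ALL=C sort -u "$expected")
}
```

Usage: `missing_urls warc_name.cdx expected_urls.txt`. The comparison is an exact string match, so `http://site/` and `http://site` count as different URLs. Normalize both lists first if the site mixes URL forms.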

What we know

We already have a list of almost all the domains involved.
A cleaned list, with duplicates and bad domains removed, is being processed and will be posted here when complete.
Most of the sites are not that big, but a few are huge.
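Most of the work of cleaning the domain list is normalizing each entry to a bare host name and dropping duplicates. This is a minimal sketch that assumes the raw list holds one URL or host name per line. The function name `clean_domains` is hypothetical. The sketch approximates a "bad" domain with a simple host-name pattern, which is also an assumption: entries with ports, numeric TLDs or non-ASCII names are dropped too.

```bash
#!/usr/bin/env bash
# Turn a raw list of URLs or host names (files given as arguments, or stdin)
# into a sorted, de-duplicated list of bare, lowercase host names.
clean_domains() {
  # sed: trim surrounding whitespace (including stray \r), drop the scheme,
  # drop any path. Then lowercase, keep only host-shaped lines, and dedupe.
  sed -E \
    -e 's#^[[:space:]]+##; s#[[:space:]]+$##' \
    -e 's#^[A-Za-z]+://##' \
    -e 's#/.*$##' \
    "$@" |
    tr '[:upper:]' '[:lower:]' |
    grep -E '^[a-z0-9.-]+\.[a-z]+$' |
    LC_ALL=C sort -u
}
```

Usage: `clean_domains raw_domains.txt > domains.txt` (both file names hypothetical).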

The plan

Save the sites and related content.
Back up the Twitter feeds for any associated accounts. All My Tweets needs only a username and returns as many tweets as it can retrieve.

wget test command

This is for the GameSpy sites.

USER_AGENT="Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)"
SAVE_DOMAIN="planetdoom.gamespy.com"
SAVE_HOST="http://$SAVE_DOMAIN"
WARC_NAME="warc_name"

# --domains takes bare host names, so pass SAVE_DOMAIN, not the http:// URL.
# No whitespace may follow a trailing backslash, or the line continuation breaks.
wget -e robots=off --mirror --page-requisites \
--waitretry 5 --timeout 60 --tries 5 --wait 2 \
--warc-header "operator: Archive Team" --warc-cdx --warc-file="$WARC_NAME" \
--span-hosts --domains="$SAVE_DOMAIN,pcmedia.gamespy.com,pnmedia.gamespy.com,pspmedia.gamespy.com,oystatic.ignimgs.com" \
-U "$USER_AGENT" "$SAVE_HOST"
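You can wrap the same command in a loop to work through several GameSpy sites without editing the variables each time. This is a minimal sketch that reads a hypothetical `gamespy_hosts.txt` with one site URL per line. `host_of` and `grab_gamespy_list` are illustrative names. Each WARC is named after its host rather than `warc_name`. The loop passes the bare host name to `--domains`, which expects domain names rather than URLs. It also assumes wget's default compressed output name, `NAME.warc.gz`, when deciding whether a host was already grabbed.

```bash
#!/usr/bin/env bash
# Hypothetical batch runner: grab each GameSpy site listed in the file given
# as $1 (or on stdin), one URL per line, into a WARC named after its host.
DEFAULT_UA="Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)"
USER_AGENT=${USER_AGENT:-$DEFAULT_UA}
ASSET_HOSTS="pcmedia.gamespy.com,pnmedia.gamespy.com,pspmedia.gamespy.com,oystatic.ignimgs.com"

# Strip the scheme and any path, leaving the bare host name.
host_of() {
  local url=${1#*://}
  printf '%s\n' "${url%%/*}"
}

grab_gamespy_list() {
  local url host
  while IFS= read -r url; do
    [ -z "$url" ] && continue
    host=$(host_of "$url")
    # Assumes wget wrote HOST.warc.gz on an earlier run; skip such hosts.
    if [ -e "$host.warc.gz" ]; then
      echo "skipping $url: $host.warc.gz exists" >&2
      continue
    fi
    wget -e robots=off --mirror --page-requisites \
      --waitretry 5 --timeout 60 --tries 5 --wait 2 \
      --warc-header "operator: Archive Team" --warc-cdx --warc-file="$host" \
      --span-hosts --domains="$host,$ASSET_HOSTS" \
      -U "$USER_AGENT" "$url" < /dev/null
  done < "${1:-/dev/stdin}"
}
```

Usage: `grab_gamespy_list gamespy_hosts.txt`. The loop grabs sites one at a time. Running several copies in parallel would finish sooner but would put more load on the servers.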

Try this for the IGN and UGO sites.

USER_AGENT="Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)"
SAVE_HOST="http://ve3d.ign.com"
WARC_NAME="warc_name"

# No whitespace may follow a trailing backslash, or the line continuation breaks.
wget -e robots=off --mirror --page-requisites \
--waitretry 5 --timeout 60 --tries 5 --wait 2 \
--warc-header "operator: Archive Team" --warc-cdx --warc-file="$WARC_NAME" \
-U "$USER_AGENT" "$SAVE_HOST"

IGN domains

http://aavault.ign.com - grabbed, checking for completeness
http://autoassaultvault.ign.com/ - grabbed, checking for completeness
http://beacon.snowball.com - already closed
http://bombergirl.ign.com - closed contest page in Flash
http://o.cubemedia.ign.com - dead end
http://actionunleashed.ign.com - grabbed, checking for completeness
http://dndvault.ign.com - grabbed, checking for completeness
http://vault.ign.com - grabbing
http://ac2vault.ign.com - Smiley - done
http://acvault.ign.com - Smiley - done
http://aion.ign.com - Smiley - grabbing
http://wowvault.ign.com - Smiley - grabbing
http://atvault.ign.com - Smiley - grabbing
http://alli.ign.com - Smiley - done
http://antis.ign.com - Smiley - grabbing
http://doa.ign.com - Smiley - grabbing
http://aovault.ign.com - siliconvalleypark - grabbed
http://aevans.dev.m.au.ign.com - dead
http://aevans.dev.m.ca.ign.com - dead
http://aevans.dev.m.ie.ign.com - dead
http://aevans.dev.m.ign.com - dead
http://aevans.dev.m.uk.ign.com - dead
http://cdyi.dev.m.ign.com - dead
http://ve3d.ign.com - grabbed, checking for completeness
http://flashpoint.ign.com - siliconvalleypark - grabbed
http://beaterator.ign.com - siliconvalleypark - grabbed
http://aivlev.dev.m.au.ign.com - password protected dev site
http://aivlev.dev.m.ie.ign.com - password protected dev site
http://aivlev.dev.m.ign.com - password protected dev site
http://aivlev.dev.m.uk.ign.com - password protected dev site
http://apassey.dev.m.ign.com - password protected dev site
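The notes in the list above (dead, password protected dev sites, redirects) can be roughly pre-sorted with a quick status check before anyone starts a full grab. This is a minimal sketch that uses curl rather than wget, because curl's `-w`/`--write-out` option can print the status code and redirect target directly. The names `classify_status` and `probe_host` and the labels they print are hypothetical. Treating 401/403 as a login wall is also an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical first-pass triage for a host list. Uses curl, not wget.

# Map an HTTP status code to a rough label. The mapping is an assumption;
# curl reports 000 when it gets no HTTP response at all.
classify_status() {
  case $1 in
    000) echo "dead" ;;
    2??) echo "up" ;;
    3??) echo "redirect" ;;
    401|403) echo "auth required" ;;
    *) echo "error $1" ;;
  esac
}

# Request one URL without following redirects and print
# "URL label", plus "-> target" when the server redirects.
probe_host() {
  local url=$1 code target
  read -r code target < <(curl -s -o /dev/null --max-time 30 \
    -w '%{http_code} %{redirect_url}\n' "$url")
  echo "$url $(classify_status "$code")${target:+ -> $target}"
}
```

Usage: `while IFS= read -r u; do probe_host "$u"; done < ign_hosts.txt`, where `ign_hosts.txt` is a hypothetical file with one URL per line. An "up" result only means the front page answered. It says nothing about how much of the site a crawl will reach.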

Ready to grab

http://au.bestof.ign.com
http://au.retro.ign.com
http://au.sports.ign.com
http://bestof.ign.com
http://code.ign.com

untested

http://adtools.ign.com - blank
http://au.microsites.ign.com
http://au.top100.ign.com
http://au.video.ign.com
http://beacon.ign.com
http://blockbuster.ign.com
http://broadband.ign.com
...
http://championshipgamingseries.ign.com
http://championsonline.ign.com
http://cohvault.ign.com
http://comiccon.ign.com
...
http://downloads.ign.com
http://dragonica.ign.com
http://dsvault.ign.com
http://emailpreferences.ign.com
...

These might be asset-only hosting sites

Redirects

...
http://ddovault.ign.com -> http://dndvault.ign.com/
http://bigworldvault.ign.com -> http://vault.ign.com
http://911.ign.com -> http://tickets.ign-inc.com/
http://bestofe3.ign.com -> http://games.ign.com/bestofe3.html
http://dsi.ign.com -> http://ds.ign.com/dsi/

GameSpy domains

Ready to grab

In Progress

http://planetcnc.gamespy.com - grabbed, checking for completeness
http://planetthesims.gamespy.com - grabbed, checking for completeness
http://planetfrontlines.gamespy.com - grabbed, checking for completeness
http://planetcivilization.gamespy.com - grabbed, checking for completeness
http://planethalflife.gamespy.com - grabbed, checking for completeness
http://planettransformers.gamespy.com - grabbed, checking for completeness
http://planetcoh.gamespy.com - grabbed, checking for completeness
http://planetbattlefield.gamespy.com - grabbed, checking for completeness
http://planetresidentevil.gamespy.com - grabbed, checking for completeness
http://planetxmen.gamespy.com - grabbed, checking for completeness
http://planetquake.gamespy.com - grabbed, checking for completeness
http://planetgrandtheftauto.gamespy.com - grabbed, checking for completeness
http://planettonyhawk.gamespy.com - grabbed, checking for completeness
http://planetunreal.gamespy.com - grabbed, checking for completeness
http://planetfallout.gamespy.com - grabbed, checking for completeness
http://planetageofempires.gamespy.com - grabbed, checking for completeness
http://planetgearsofwar.gamespy.com - grabbed, checking for completeness
http://planetcallofduty.gamespy.com - grabbed, checking for completeness
http://classicgaming.gamespy.com - grabbed, checking for completeness
http://planetdoom.gamespy.com - grabbed, checking for completeness
http://planetwwe.gamespy.com - grabbed, checking for completeness
http://pc.gamespy.com - grabbed, checking for completeness
http://psp.gamespy.com - grabbed, checking for completeness
http://planetfarcry.gamespy.com - grabbing
