Difference between revisions of "Ispygames"

From Archiveteam
m (Reverted edits by Megalanya2 (talk) to last revision by Jscott)
Revision as of 16:43, 17 January 2017

GameSpy, IGN, 1UP, UGO
[Image: Ispygames logo (Gamespy.jpg)]
URL: http://www.gamespy.com & many others
Project status: Closing
Archiving status: Partially saved
Project source: Unknown
Project tracker: Unknown
IRC channel: #ispygames

The News

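IGN hit with layoffs, 1UP, UGO and GameSpy shutting down (http://www.polygon.com/2013/2/21/4014196/ign-layoffs-1up-ugo-and-gamespy-shutting-down)
1UP, UGO and GameSpy to be shut down (http://www.examiner.com/article/1up-ugo-and-gamespy-to-be-shut-down)
Goodbye, And Thank You From The GameSpy Team (http://pc.gamespy.com/articles/122/1227460p1.html)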

The Problems

  • Once you start digging around these sites, you find a mess of inconsistent URL schemes, with content scattered everywhere.
  • Some files are hosted on MediaFire.
  • Based on tests, the larger and older a site is, the more a wget crawl misses because of its URL scheme.

What we know

  • We already have a list of almost all the domains involved.
  • A clean list, with dups and bad domains removed, is already being processed and will be posted here when complete.
  • Most of the sites are not that big, but a few are huge.
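
The cleanup of the domain list mentioned above could look roughly like the following. This is only a sketch, not the process actually used: `clean_domains` is a hypothetical helper name, and it assumes the raw list has one domain or URL per line. It removes duplicates but does not check whether a domain is dead or otherwise "bad".

```shell
# Hypothetical helper: normalize a raw list of domains/URLs read from stdin.
# Assumption: one entry per line, with or without a scheme and path.
clean_domains() {
  # Strip the scheme, then everything from the first "/" or ":" (path or port),
  # lowercase the host, drop blank lines, and sort with duplicates removed.
  sed -e 's|^[a-zA-Z]*://||' -e 's|[/:].*$||' \
    | tr 'A-Z' 'a-z' \
    | grep -v '^[[:space:]]*$' \
    | sort -u
}
```

It would be run as `clean_domains < raw_domains.txt > domains.txt`, where both file names are placeholders.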

The plan

  • Save the sites and related content.
  • Back up the Twitter feeds for any associated accounts. All My Tweets just takes a username and returns the maximum number of tweets possible.


wget test command

This is for the GameSpy sites.

USER_AGENT="Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)"
SAVE_DOMAIN="planetdoom.gamespy.com"
SAVE_HOST="http://$SAVE_DOMAIN"
WARC_NAME="warc_name"

# --domains expects bare host names, so pass the domain without the http:// prefix.
wget -e robots=off --mirror --page-requisites \
--waitretry 5 --timeout 60 --tries 5 --wait 2 \
--warc-header "operator: Archive Team" --warc-cdx --warc-file="$WARC_NAME" \
-U "$USER_AGENT" "$SAVE_HOST" \
--span-hosts --domains="$SAVE_DOMAIN,pcmedia.gamespy.com,pnmedia.gamespy.com,pspmedia.gamespy.com,oystatic.ignimgs.com"

Try this for the IGN and UGO sites.

USER_AGENT="Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)"
SAVE_HOST="http://ve3d.ign.com"
WARC_NAME="warc_name"

wget -e robots=off --mirror --page-requisites \
--waitretry 5 --timeout 60 --tries 5 --wait 2 \
--warc-header "operator: Archive Team" --warc-cdx --warc-file="$WARC_NAME" \
-U "$USER_AGENT" "$SAVE_HOST"
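
Since there are many domains to grab, the command above could be wrapped so that each start URL gets its own WARC file. The following is a minimal sketch under that assumption: `warc_name_for` and `grab_site` are hypothetical names, not part of any existing Archive Team tooling, and the wget options are simply copied from the test command above.

```shell
USER_AGENT="Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)"

# Hypothetical helper: derive a WARC base name from a start URL,
# e.g. http://ve3d.ign.com/ -> ve3d.ign.com
warc_name_for() {
  host="${1#*://}"    # drop the scheme, if any
  host="${host%%/*}"  # drop any path
  printf '%s\n' "$host"
}

# Hypothetical wrapper: run the test command for a single start URL.
grab_site() {
  wget -e robots=off --mirror --page-requisites \
    --waitretry 5 --timeout 60 --tries 5 --wait 2 \
    --warc-header "operator: Archive Team" --warc-cdx \
    --warc-file="$(warc_name_for "$1")" \
    -U "$USER_AGENT" "$1"
}
```

With a file of start URLs (here a placeholder `hosts.txt`, one URL per line), the grabs could then be run one after another with `while read -r url; do grab_site "$url"; done < hosts.txt`.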

IGN domains

In progress

Ready to grab


Untested


These might be asset-only hosting sites

Redirects

GameSpy domains

Ready to grab

In progress

Redirects

1up.com

On 2016-05-24, http://www.1up.com was thrown into ArchiveBot with job ident 35fcc4zofjl5kg52fkbcskgus.