Difference between revisions of "Ispygames"

From Archiveteam
(31 intermediate revisions by 13 users not shown)
Line 11: Line 11:
| width=125px | '''Project status''' || {{{project_status|{{Closing}}}}}
| width=125px | '''Project status''' || {{{project_status|{{Closing}}}}}
|-
|-
| width=125px | '''Archiving status''' || {{{archiving_status|{{inprogress}}}}}
| width=125px | '''Archiving status''' || {{partiallysaved}}
|-
|-
| width=125px | '''Project source''' || {{{source|{{Unknown}}}}}
| width=125px | '''Project source''' || {{{source|{{Unknown}}}}}
Line 23: Line 23:


[http://www.polygon.com/2013/2/21/4014196/ign-layoffs-1up-ugo-and-gamespy-shutting-down IGN hit with layoffs, 1UP, UGO and GameSpy shutting down]<br />
[http://www.polygon.com/2013/2/21/4014196/ign-layoffs-1up-ugo-and-gamespy-shutting-down IGN hit with layoffs, 1UP, UGO and GameSpy shutting down]<br />
[http://www.examiner.com/article/1up-ugo-and-gamespy-to-be-shut-down 1UP, UGO and GameSpy to be shut down]
[http://www.examiner.com/article/1up-ugo-and-gamespy-to-be-shut-down 1UP, UGO and GameSpy to be shut down]<br />
 
[http://pc.gamespy.com/articles/122/1227460p1.html Goodbye, And Thank You From The GameSpy Team]
== The Problems ==
== The Problems ==


Line 40: Line 40:


* Save the sites and related content
* Save the sites and related content
* Backup the twitter feeds for any associated accounts. [http://www.allmytweets.net/ All my tweets] just takes a username and returns the max tweets possible.
* Backup the Twitter feeds for any associated accounts. [http://www.allmytweets.net/ All my tweets] just takes a username and returns the max tweets possible.




Line 432: Line 432:
* http://planetdeusex.gamespy.com -> gamespy.com (actual site's at planetdeusex.com)
* http://planetdeusex.gamespy.com -> gamespy.com (actual site's at planetdeusex.com)
* http://planetelderscrolls.gamespy.com -> planetelderscrolls.ign.com
* http://planetelderscrolls.gamespy.com -> planetelderscrolls.ign.com
== 1up.com ==
On 2016-05-24, http://www.1up.com has been thrown into [[ArchiveBot]] with {{Job|35fcc4zofjl5kg52fkbcskgus}}.
{{Navigation box}}

Revision as of 16:04, 13 April 2018

Gamespy, IGN, 1up, ugo
URL http://www.gamespy.com & many others
Project status Closing
Archiving status Partially saved
Project source Unknown
Project tracker Unknown
IRC channel #ispygames

The News

IGN hit with layoffs, 1UP, UGO and GameSpy shutting down
1UP, UGO and GameSpy to be shut down
Goodbye, And Thank You From The GameSpy Team

The Problems

  • Once you start digging around these sites, you find a mess of inconsistent URL schemes, with content scattered everywhere.
  • Some files are hosted on MediaFire.
  • Based on tests, the larger and older a site is, the more a wget crawl misses because of the URL scheme.
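One rough way to measure how much a crawl missed is to compare a list of URLs you expected to capture against the CDX index that wget writes when --warc-cdx is passed. The sketch below is only an illustration: missed_urls and both file arguments are hypothetical names, and it assumes the CDX layout wget produces, where the first line is a header and the first space-separated field of each record is the original URL.

```shell
# Hypothetical helper, not part of the original project tooling.
# Usage: missed_urls crawl.cdx expected_urls.txt
# Prints every line of the expected list that has no record in the CDX index.
missed_urls() {
    cdx_file="$1"       # CDX index written by wget --warc-cdx
    expected_file="$2"  # one URL per line (assumed to be built by hand or from a sitemap)
    awk '
        # First file (the CDX index): remember each captured URL, skipping the header line.
        NR == FNR { if (FNR > 1) seen[$1] = 1; next }
        # Second file (the expected list): print URLs that were never captured.
        !($0 in seen)
    ' "$cdx_file" "$expected_file"
}
```

The comparison is an exact string match, so URLs that differ only by a trailing slash or query-string order will be reported as missed.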

What we know

  • We already have a list of almost all the domains involved.
  • A clean list, with duplicates and bad domains removed, is already being processed and will be posted here when complete.
  • Most of the sites are not that big, but a few are huge.
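A minimal sketch of how such a cleanup could be done in the shell follows. It is not the process actually used for this project. clean_domains is a hypothetical helper, and "bad" is taken here to mean only syntactically invalid entries; checking whether a domain still resolves is left out.

```shell
# Hypothetical helper: read raw hosts or URLs on stdin, one per line,
# and print a sorted, de-duplicated list of bare host names.
clean_domains() {
    tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[[:space:]]//g' \
          -e 's#^[a-z]*://##' \
          -e 's#[/:].*$##' \
    | grep -E '^[a-z0-9-]+(\.[a-z0-9-]+)+$' \
    | sort -u
}
```

For example, `clean_domains < raw_domains.txt > domains.txt` (both file names are placeholders) lowercases each entry, strips the scheme, port and path, drops anything that does not look like a host name, and removes duplicates.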

The plan

  • Save the sites and related content
  • Backup the Twitter feeds for any associated accounts. All my tweets just takes a username and returns the max tweets possible.


wget test command

This is for the GameSpy sites.

USER_AGENT="Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)"
SAVE_HOST="http://planetdoom.gamespy.com"
# --domains expects bare host names, so strip the scheme from SAVE_HOST
SAVE_DOMAIN="${SAVE_HOST#http://}"
WARC_NAME="warc_name"

wget -e robots=off --mirror --page-requisites \
--waitretry 5 --timeout 60 --tries 5 --wait 2 \
--warc-header "operator: Archive Team" --warc-cdx --warc-file="$WARC_NAME" \
-U "$USER_AGENT" "$SAVE_HOST" \
--span-hosts --domains="$SAVE_DOMAIN,pcmedia.gamespy.com,pnmedia.gamespy.com,pspmedia.gamespy.com,oystatic.ignimgs.com"

Try this for the IGN and UGO sites.

USER_AGENT="Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)"
SAVE_HOST="http://ve3d.ign.com"
WARC_NAME="warc_name"

wget -e robots=off --mirror --page-requisites \
--waitretry 5 --timeout 60 --tries 5 --wait 2 \
--warc-header "operator: Archive Team" --warc-cdx --warc-file="$WARC_NAME" \
-U "$USER_AGENT" "$SAVE_HOST"
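With many domains to grab, the command above could be wrapped in a loop that derives a separate WARC name for each host. The following is only a sketch under that assumption: host_of, warc_name_for and grab_all are hypothetical helper names, and the list file is assumed to hold one start URL per line.

```shell
USER_AGENT="Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)"

# Strip the scheme and any path from a URL, leaving the bare host name.
host_of() {
    url="$1"
    url="${url#*://}"
    printf '%s\n' "${url%%/*}"
}

# Build a WARC base name from the host: dots become underscores.
warc_name_for() {
    host_of "$1" | tr '.' '_'
}

# Run the IGN/UGO-style command once per URL listed in the given file.
grab_all() {
    list_file="$1"
    while IFS= read -r start_url; do
        # Skip blank lines in the list.
        [ -n "$start_url" ] || continue
        wget -e robots=off --mirror --page-requisites \
            --waitretry 5 --timeout 60 --tries 5 --wait 2 \
            --warc-header "operator: Archive Team" --warc-cdx \
            --warc-file="$(warc_name_for "$start_url")" \
            -U "$USER_AGENT" "$start_url"
    done < "$list_file"
}
```

Calling `grab_all ign_domains.txt` (a placeholder file name) would crawl each listed site in turn and write files such as ve3d_ign_com.warc.gz, so output from different hosts does not overwrite each other.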

IGN domains

In progress

Ready to grab


Untested


These might be asset only hosting sites

Redirects

Gamespy Domains

Ready to grab

In Progress

Redirects

1up.com

On 2016-05-24, http://www.1up.com was thrown into ArchiveBot with job:35fcc4zofjl5kg52fkbcskgus.