Cyberpunkreview.com

From Archiveteam
A screenshot of the home page taken on 10 April 2012.
URL: http://cyberpunkreview.com
Project status: Online!
Archiving status: Not saved yet
Project source: Unknown
Project tracker: Unknown
IRC channel: #archiveteam
Project lead: Unknown

Cyberpunkreview.com is a Web site that reviews cyberpunk films, music, art, games and more.

Overview

The site appears to be active: apart from a hiatus of two months or so between October and December, it's in good shape. It used to host a very popular forum for cyberpunks at http://cyberpunkreview.com/forums/index.php, but the forum recently moved to a different domain, http://cyberpunkforums.com/, which is much more active. I have a penchant for anything cyberpunk and will keep an eye on the site.

It looks like a typical WordPress instance. I'm not sure whether we have a standard procedure for scraping them. --Aggroskater 08:10, 19 March 2012 (EDT)

Mirroring

Currently in the process of mirroring with the following command set:

$ wget https://raw.github.com/ArchiveTeam/fortunecity/master/get-wget-warc.sh
$ chmod 0755 get-wget-warc.sh
$ ./get-wget-warc.sh
$ ./wget-warc --no-parent --no-clobber --html-extension --recursive --convert-links --page-requisites -e robots=off -w 5 --random-wait --warc-file=cpr http://cyberpunkreview.com/

Please email me or contact me in the #archiveteam channel if there's a better way of doing this. I've basically just taken the guidelines from the Software page and applied them to alard's WARC-enabled build of wget. The larger projects seem to have a more systematic way of grabbing terabytes' worth of data, but since this is a relatively small site, I don't know whether that's necessary. --Aggroskater 06:58, 13 April 2012 (EDT)

Friday April 20 2012 Update

Mirror complete. It appears fully operational on my machine. The gzipped WARC is 385 megabytes. What's next? --Aggroskater 02:54, 20 April 2012 (EDT)
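One possible next step, not something the procedure above prescribes, is a quick sanity check of the finished WARC before doing anything else with it. The sketch below assumes a bash shell with gzip, grep, sort and uniq available; warc_summary is a hypothetical helper name, not part of wget or the ArchiveTeam scripts. It first verifies that the gzip stream is complete, then tallies records by their WARC-Type header as a rough overview.

```shell
#!/usr/bin/env bash
# warc_summary: hypothetical helper, a rough sanity check for a .warc.gz.
warc_summary() {
  local warc="$1"
  # gzip -t decompresses every concatenated member and fails on
  # truncation or corruption (e.g. an interrupted crawl).
  if ! gzip -t "$warc"; then
    echo "corrupt or truncated: $warc" >&2
    return 1
  fi
  # WARC header lines end in CRLF, so strip the CR before matching,
  # then count how many records of each type the file holds.
  gzip -dc "$warc" \
    | tr -d '\r' \
    | grep -a '^WARC-Type:' \
    | sort | uniq -c | sort -rn
}
```

For example, warc_summary cpr.warc.gz prints one line per record type (request, response and so on) with its count, or exits with an error if the file is damaged. The grep is deliberately crude: it only looks at lines that start with the header name, which is good enough for a quick look but is not a full WARC parser.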


Yeah... so it seems the WordPress site is at www.cyberpunkreview.com, not just cyberpunkreview.com. I'm running the following to grab the blog itself; it shouldn't take nearly as long. The /wiki and /forums sections from the first run work great, though.


$ ./wget-warc --no-parent --no-clobber --html-extension --recursive --convert-links --page-requisites --exclude-directories=wiki,forums -e robots=off -w 2 --random-wait --warc-file=cpr-wp-fix http://www.cyberpunkreview.com/
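By default wget's recursive mode stays on the host it started from (following other hosts needs --span-hosts), and www.cyberpunkreview.com is a different host name from cyberpunkreview.com, which is one likely reason the first crawl missed the blog. A quick way to see which hosts a WARC actually contains is to tally its WARC-Target-URI headers. This is a minimal sketch assuming bash with gzip, sed, sort and uniq; warc_hosts is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# warc_hosts: hypothetical helper that counts captured URIs per host.
warc_hosts() {
  # gzip -dc reads the concatenated per-record gzip members in sequence;
  # tr strips the CR from the CRLF-terminated WARC header lines.
  # sed keeps only the host part of each WARC-Target-URI (an optional
  # leading "<" is tolerated), and lines that don't match are dropped.
  gzip -dc "$1" \
    | tr -d '\r' \
    | grep -a '^WARC-Target-URI:' \
    | sed -nE 's#^WARC-Target-URI: *<?[a-z]+://([^/:>]+).*#\1#p' \
    | sort | uniq -c | sort -rn
}
```

Running warc_hosts cpr-wp-fix.warc.gz should then list www.cyberpunkreview.com near the top. The counts are approximate: request and response records both carry WARC-Target-URI, so each fetched URL is counted about twice, and any metadata records wget writes show up under their own host name.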

Sunday April 22 2012 Update

I'm running into problems downloading the WordPress portion: wget keeps segfaulting when converting links. I managed to narrow it down to a reproducible case:

$ wget https://raw.github.com/ArchiveTeam/fortunecity/master/get-wget-warc.sh
$ chmod 0755 get-wget-warc.sh
$ sed -i 's/rm -rf \$TARFILE \$TARDIR\///g' get-wget-warc.sh   # keep the tarball and unpacked source tree
$ sed -i 's/.\/configure/CFLAGS="-g" .\/configure/' get-wget-warc.sh   # build with debug symbols for gdb
$ ./get-wget-warc.sh
$ cd wget-1.13.4-2582/src
$ gdb ./wget

...

Reading symbols from /home/preston/cprwp/wget-1.13.4-2582/src/wget...done.
(gdb) set args --html-extension --page-requisites -k -e robots=off --exclude-directories=wiki,forums --reject "*action=print" -w 1 --random-wait --warc-file=cpr-wp-debug http://www.cyberpunkreview.com/movie/upcoming-movies/initial-impressions-review-of-solid-state-society/
(gdb) run

...

Program received signal SIGSEGV, Segmentation fault.
0x0000000000405d56 in convert_links_in_hashtable (downloaded_set=0x679e10, 
    is_css=0, file_count=0x7fffffffdf8c) at convert.c:127
127	          local_name = hash_table_get (dl_url_file_map, u->url);
(gdb) backtrace
#0  0x0000000000405d56 in convert_links_in_hashtable (downloaded_set=0x679e10, 
    is_css=0, file_count=0x7fffffffdf8c) at convert.c:127
#1  0x0000000000405ead in convert_all_links () at convert.c:189
#2  0x0000000000427a62 in main (argc=14, argv=0x7fffffffe2a8) at main.c:1572
(gdb) print 0x679e10
$1 = 6790672
(gdb) print 0x7fffffffdf8c
$2 = 140737488347020
(gdb) info args
downloaded_set = 0x679e10
is_css = 0
file_count = 0x7fffffffdf8c
(gdb) info locals
local_name = 0x67c6d0 "www.cyberpunkreview.com/movie/upcoming-movies/initial-impressions-review-of-solid-state-society/index.html"
u = 0x0
pi = 0x677b80
urls = 0x6bd050
cur_url = 0x69edd0
url = 0x679b70 "http://www.cyberpunkreview.com/movie/upcoming-movies/initial-impressions-review-of-solid-state-society/"
file = 0x67c8a0 "www.cyberpunkreview.com/movie/upcoming-movies/initial-impressions-review-of-solid-state-society/index.html"
i = 0
cnt = 1
file_array = 0x7fffffffdee0

Oh goody. Null pointers, or at least I think that's what I'm looking at: info locals shows u = 0x0, and line 127 of convert.c dereferences u->url. Not sure what to do from here. --Aggroskater 19:47, 22 April 2012 (EDT)

Wednesday June 6 2012 Update

I ended up submitting a bug report and got a fix back. Open source is awesome :D

I'll most likely mirror the WordPress blog sometime this weekend. The wiki and the forums are both static, with no changes in the past few months.

Tuesday July 2 2013 Update

No input file specified.

Well, that's not good. It looks like the site has been down for at least a week or two. There is some discussion of this on cyberpunkforums.com. So far, no one has been able to get hold of the domain registrant or website owner.

Good thing I still have my scrapes. They're not in the best of condition and could do with some postprocessing, but I have the corresponding WARCs, and browsing locally from the main index.php or index.html file works pretty damn well for all three components of the site: the WordPress blog, the MediaWiki wiki, and the phpBB (I think?) forum.

Friday July 5 2013 Update

Aaaaaaand we're back :) Got in touch with the hosting company. Looks like there was a technical issue on their end that got patched up. I'll proceed to do a more thorough scrape with reasonable delays between requests. --Aggroskater 19:34, 5 July 2013 (EDT)

