Angelfire

From Archiveteam
URL http://www.angelfire.lycos.com/
Project status Online!
Archiving status Not saved yet
Project source Unknown
Project tracker Unknown
IRC channel #archiveteam

Angelfire is a web hosting service that has operated since 1996 and contains big chunks of early WWW history (which people love to mock).

It is not expected that the Angelfire archive can ever be truly complete, as Angelfire, like other free hosts such as Homestead, has or had a policy of deleting "inactive" accounts. As there is no known mirror of many of these former accounts and associated web pages, there may be no way to recover such deleted websites.

Angelfire underwent some changes in 2010, apparently not disruptive but requiring users to pay for some options such as the old Web Shell tool; we do not know whether this made some older websites inaccessible to their owners, or whether that could lead to inactivity and hence deletion. The Alexa rank of the property appears to be falling steadily, from better than 2,000th position in early 2012 to worse than 3,400th in early 2014.

It's not clear how bad Lycos is. A quick search for Lycos shutdowns only turns up the liquidation of the (independently operated) Lycos Europe, which gave users less than a month to save their emails before deletion. Lycos Tripod, on the other hand, which in 2003 was Europe's largest homepage-building community (with a special Google alliance), found a last-minute buyer for its European wing but then suddenly went down in July 2013 (it was around the 60,000th Alexa position in 2012 and fell well below the 100,000th in early 2013).

Status

ArchiveBot gave it a try: http://archive.fart.website/archivebot/viewer/job/9yhap

Schbirid has some ugly Bash scripts: https://github.com/SpiritQuaddicted/angelfire (ask before you use, they are probably out of date)

Discovery & Downloading

First collect the URLs of all the sitemap indexes from robots.txt:

curl http://www.angelfire.com/robots.txt | grep -Eo 'http.*gz' > sitemap-index-urls

http://www.angelfire.com/sitemap-index-00.xml.gz
http://www.angelfire.com/sitemap-index-01.xml.gz
http://www.angelfire.com/sitemap-index-02.xml.gz
...
http://www.angelfire.com/sitemap-index-ff.xml.gz
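The extraction above can be wrapped in a small function with a sanity check on the result. This is a minimal sketch, not a tested part of the project: the function names are hypothetical, and the expected count of 256 is only an assumption drawn from the 00 to ff naming of the index files.

```shell
# Hypothetical helper: read robots.txt on stdin and print the unique
# sitemap index URLs (anything starting with "http" and ending in ".gz").
extract_sitemap_index_urls() {
    grep -Eo 'http[^[:space:]]*\.gz' | sort -u
}

# Hypothetical helper: succeed only if the file "$1" has "$2" lines.
# The default of 256 is an assumption based on the 00..ff naming.
check_index_count() {
    [ "$(grep -c '' "$1")" -eq "${2:-256}" ]
}

# Usage (needs network access):
#   curl -s http://www.angelfire.com/robots.txt | extract_sitemap_index_urls > sitemap-index-urls
#   check_index_count sitemap-index-urls || echo "unexpected number of sitemap indexes" >&2
```

The count check is only a cheap way to notice a truncated or changed robots.txt before starting a large download.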


Use that list to download all the sitemap index files:

wget -i sitemap-index-urls

Inside you will see the users' sitemap URLs:

<sitemap><loc>http://www.angelfire.com/punk4/jori_loves_jackass/sitemap.xml</loc><lastmod>2012-04-10</lastmod></sitemap>
<sitemap><loc>http://www.angelfire.com/vevayaqo/sitemap.xml</loc><lastmod>2012-04-10</lastmod></sitemap>
<sitemap><loc>http://www.angelfire.com/planet/dumbass123/sitemap.xml</loc><lastmod>2012-04-10</lastmod></sitemap>
...


Extract the user sitemap URLs:

zgrep -hEo 'http://[^<]*\.xml' sitemap-index-*.xml.gz > sitemap-urls

http://www.angelfire.com/punk4/jori_loves_jackass/sitemap.xml
http://www.angelfire.com/vevayaqo/sitemap.xml
http://www.angelfire.com/planet/dumbass123/sitemap.xml
...
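The next step refers to a "${user}" value without saying where it comes from. One plausible reading, sketched here as an assumption, is that it is the path between the hostname and /sitemap.xml in each sitemap URL. The function name is hypothetical.

```shell
# Hypothetical helper: read user sitemap URLs on stdin and print the
# user path, i.e. the part between the hostname and "/sitemap.xml".
# Assumes every URL is on www.angelfire.com, as in the examples above.
sitemap_url_to_user() {
    sed -E 's#^https?://www\.angelfire\.com/##; s#/sitemap\.xml$##'
}

# Usage:
#   sitemap_url_to_user < sitemap-urls > users
```

Note that some user paths contain a slash (for example punk4/jori_loves_jackass) while others do not (vevayaqo), so anything that turns them into filenames has to cope with both.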

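Download the user sitemaps listed in sitemap-urls, then extract the webpage URLs. The command below assumes each sitemap was saved as www.angelfire.com/${user}/sitemap.xml (the layout wget -x produces) and that ${user} is the path portion of the sitemap URL, e.g. ab7/pledgecry: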

grep -Eo '<loc>.*</loc>' www.angelfire.com/"${user}"/sitemap.xml | sed 's#<loc>##' | sed 's#</loc>##' > "${user}.urls"

http://www.angelfire.com/ab7/pledgecry/band.html
http://www.angelfire.com/ab7/pledgecry/biography.html
http://www.angelfire.com/ab7/pledgecry/ernst.html
http://www.angelfire.com/ab7/pledgecry/header.html
...
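The extraction above can be run for every user in a loop. The following is a rough sketch under stated assumptions: the function names, the flat output naming scheme and the "users" input file are all hypothetical, and the pattern uses [^<]* instead of .* so that several <loc> entries on one line are split correctly.

```shell
# Print every <loc> URL found in the sitemap file "$1", one per line.
extract_page_urls() {
    grep -Eo '<loc>[^<]*</loc>' "$1" | sed -e 's#<loc>##' -e 's#</loc>##'
}

# Hypothetical naming scheme: replace "/" in the user path with "_" so
# that all URL lists end up as flat files in the current directory.
# Two different user paths could in principle map to the same name.
user_to_filename() {
    printf '%s.urls\n' "$(printf '%s' "$1" | tr '/' '_')"
}

# Usage, assuming a file "users" with one user path per line:
#   while read -r user; do
#       extract_page_urls "www.angelfire.com/${user}/sitemap.xml" > "$(user_to_filename "$user")"
#   done < users
```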

Grab them with options like: -m --no-parent --no-cookies -e robots=off --page-requisites --domains=angelfire.com,lycos.com


As of 2015-05-08 there were 3,895,290 users.

You will want --no-cookies because Angelfire tries to set cookies everywhere.

Reject the tracking image http://www.angelfire.lycos.com/doc/images/track/ot_noscript.gif.* and the ads under http://www.angelfire.com/adm/ad/ with --reject-regex='(www\.angelfire\.com/adm/ad/|angelfire\.(lycos\.)?com/doc/images/track/ot_noscript\.gif)' (the pattern matches the tracking image on either hostname).

Some images are hosted on http://www.angelfire.lycos.com, hence --domains=angelfire.com,lycos.com
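Putting the options from this section together, a single invocation might look like the sketch below. The function and variable names are hypothetical, and the reject pattern is written so that it matches the tracking image on both www.angelfire.com and www.angelfire.lycos.com, which is an assumption rather than something verified against the live site. The WGET variable exists only so the command can be dry-run with a stand-in.

```shell
# URLs to skip: the ad path and the tracking GIF, on either hostname.
REJECT_REGEX='(www\.angelfire\.com/adm/ad/|angelfire\.(lycos\.)?com/doc/images/track/ot_noscript\.gif)'

# Hypothetical wrapper: mirror every URL listed in the file "$1" using
# the options discussed in this section.
grab_urls() {
    "${WGET:-wget}" -m --no-parent --no-cookies -e robots=off \
        --page-requisites \
        --domains=angelfire.com,lycos.com \
        --reject-regex="$REJECT_REGEX" \
        -i "$1"
}

# Usage (needs network access):
#   grab_urls ab7_pledgecry.urls
```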


Guestbooks were killed in 2012, e.g. http://htmlgear.lycos.com/guest/control.guest?u=gosanson&i=2&a=view

Some users have blogs with infinite calendars, which show up in the sitemap like this: http://filesha.angelfire.com/blog/index.blog . Wget will run forever on those, so it is better to skip them for now.
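One way to skip the blogs is to filter them out of each URL list before handing it to Wget. The sketch below assumes that blog entry points can be recognised by a path ending in .blog, which is a guess based on the single example above; the function name is hypothetical.

```shell
# Hypothetical filter: read URLs on stdin and drop those whose path
# ends in ".blog" (optionally followed by a query string or fragment).
skip_blog_urls() {
    grep -Ev '\.blog([?#].*)?$'
}

# Usage:
#   skip_blog_urls < filesha.urls > filesha.noblog.urls
#   grep -E '\.blog([?#].*)?$' filesha.urls >> skipped-blogs   # keep a list for later
```

Keeping the skipped URLs in a separate file makes it possible to come back to the blogs once there is a way to crawl them without looping.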

Many users have no URLs in their sitemaps. Not sure what to do with those.
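Until there is a plan for them, the empty sitemaps can at least be listed so they are not silently lost. A minimal sketch, assuming the sitemaps were saved as www.angelfire.com/${user}/sitemap.xml and that a file "users" holds one user path per line; both names and the function are hypothetical.

```shell
# Hypothetical check: succeed if the sitemap file "$1" has no <loc> entry.
sitemap_is_empty() {
    ! grep -q '<loc>' "$1"
}

# Usage:
#   while read -r user; do
#       sitemap_is_empty "www.angelfire.com/${user}/sitemap.xml" && printf '%s\n' "$user"
#   done < users > users-without-urls
```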
