URLTeam

url shortening was a fucking awful idea
URL: http://urlte.am
Project status: Online!
Archiving status: In progress...
Project source: urlteam-stuff, tinyback, tinyarchive
Project tracker: http://urlteam.terrywri.st/
IRC channel: #urlteam

TinyURL, bit.ly and similar services convert long URLs into short ones hosted on their own domain; when someone visits the short URL, their web browser is redirected to the long URL.
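The redirect described above can be observed directly. The following is a minimal sketch, not TinyBack's actual code: it sends a single request to a short URL and reads the `Location` header instead of following the redirect. The function names `redirect_target` and `resolve` are hypothetical, and the use of a HEAD request is an assumption; some shorteners may answer HEAD differently from GET.

```python
import http.client
from urllib.parse import urlsplit

# Status codes that normally carry a Location header.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_target(status, headers):
    """Return the Location value if the response is a redirect, else None."""
    if status in REDIRECT_CODES:
        return headers.get("Location")
    return None


def resolve(short_url, timeout=10):
    """Send one HEAD request and report where the shortener points.

    Assumption: the service answers HEAD the same way it answers GET.
    """
    parts = urlsplit(short_url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return redirect_target(response.status, response.headers)
    finally:
        conn.close()
```

Calling `resolve()` with a short URL returns the long URL, or `None` when the service does not answer with an HTTP redirect (for example when it uses an HTML redirect page instead).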

Such services are a ticking time bomb. If they go away, get hacked or sell out, millions of links will be lost (see Wikipedia: Link Rot). Archive.org/301Works is acting as an escrow for URL shortener databases, but it relies on URL shorteners to actually hand over their databases. Even 301Works founding member bit.ly does not actually share its database, and most other big shorteners don't share theirs either.


301Works cooperation


The fine folks at archive.org have provided us with upload permissions to the 301Works archive: http://www.archive.org/details/301utm. Unfortunately, they do not want to make the uploads downloadable, but the same data is in our torrents too, just in a different format (we use pipe-delimited, xz-compressed files, while 301Works uses comma-delimited, uncompressed files).
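As a rough illustration of the torrent format mentioned above, here is a sketch for reading a pipe-delimited, xz-compressed file with Python's standard `lzma` module. The exact column layout is an assumption (one `shortcode|long URL` pair per line), and `parse_pairs` and `read_pairs` are hypothetical helper names, so check a real file before relying on this.

```python
import lzma


def parse_pairs(lines):
    """Yield (shortcode, long_url) tuples from an iterable of text lines.

    Assumption: each line looks like "shortcode|long URL".
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split on the first pipe only: a long URL may itself contain "|".
        shortcode, _, long_url = line.partition("|")
        yield shortcode, long_url


def read_pairs(path):
    """Open an xz-compressed text file and yield its (shortcode, long_url) pairs."""
    with lzma.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for pair in parse_pairs(handle):
            yield pair
```

Splitting on the first pipe only is a deliberate choice: shortcodes are short and simple, while the long URL is arbitrary text and may contain the delimiter.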

Tools

TinyBack

The easiest way to help with scraping is to run the Warrior and select the URLTeam project. You can also run TinyBack outside the warrior, though Python 2.6 or newer is required:

 git clone https://github.com/ArchiveTeam/tinyback
 cd tinyback
 # Use ./run.py --help for more information on command-line options
 ./run.py --tracker=http://urlteam.terrywri.st/ --num-threads=3 --sleep=180

URL shorteners

New table

The new table includes shorteners we have already started to scrape.

Name | Est. number of shorturls | Scraping done by | Status | Comments
Tinyurl.com | 10,000,000,000 | Warrior | scraping: sequential, done up to azzzzz | new shorturls: non-sequential, 7 characters
Bit.ly | 50,000,000,000 | Warrior | scraping: non-sequential, 6 characters | new shorturls: non-sequential, 6 characters
Goo.gl | ? | User:Scumola | started (2011-03-04) | goo.gl throttles pulls
is.gd | 934,134,706 (2013-05-20) | Warrior | scraping: sequential, done up to kZZZZ | new shorturls: non-sequential, 6 characters
ff.im | ? | User:Chronomex | | only used by FriendFeed, no interface to shorten new URLs
4url.cc | 1279 (2009-08-14)[1] | User:Chronomex | | dead (2011-02-15)
litturl.com | 17096 (2010-04-15)[2] | User:Chronomex | | dead (2010-11-18)
xs.md | 3084 (2009-08-15)[3] | User:Chronomex | done | dead (2010-11-18)
url.0daymeme.com | 14867 (2009-08-14)[4] | User:Chronomex | done | dead (2010-11-18)
Old tr.im | 1990425 | - | got what we could | dead (2011-12-31)
New tr.im | ? | Warrior | scraping: sequential, done up to 42pzz | new shorturls: sequential
visibli (hex) | 16777216 | User:Chfoo | Done. 15104865 301MB | Using links.sharedby.co/links/ as URL prefix.
ur1.ca | ? | Warrior | scraping: sequential, done up to dzzzz | new shorturls: sequential
ow.ly | ? | Warrior | scraping: sequential, done up to lyZZZ | new shorturls: sequential
snipurl.com | ? | Warrior | scraping: sequential, done up to 271~~~~ | new shorturls: sequential, starting from 20wa5rt
post.ly (Posterous) | ? | Warrior/EC2 | done | dead
vbly.us (formerly vb.ly) | ? | Warrior | scraping: sequential, done up to 2hba | new shorturls: sequential
arseh.at | ? | Warrior | scraping: sequential, done up to 4fv3 | new shorturls: sequential
zapd.co | 326592 | User:Chfoo | Done. 144093 1.7M | xxxx.zapd.co
Bre.ad | 120932351 | User:Chfoo | Incomplete (59771889 examined). 54506 1.2MB | de.ad (2013-11-18). Got what I can without overloading their EC2 instance.
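The "sequential, done up to ..." entries in the table can be read as positions in a counter written in some alphabet. The sketch below shows that idea only: the alphabet and its ordering (`0-9` then `a-z`), the leading-zero rollover, and the helper names are all assumptions for illustration. Each real shortener uses its own alphabet (several rows above clearly include uppercase letters or `~`), and this is not how TinyBack itself is implemented.

```python
import string

# Assumed alphabet and ordering; every real service may differ.
ALPHABET = string.digits + string.ascii_lowercase


def code_to_int(code, alphabet=ALPHABET):
    """Interpret a shortcode as a number written in base len(alphabet)."""
    base = len(alphabet)
    value = 0
    for ch in code:
        value = value * base + alphabet.index(ch)
    return value


def int_to_code(value, width, alphabet=ALPHABET):
    """Write a number as a fixed-width shortcode, padding with the first symbol."""
    base = len(alphabet)
    chars = []
    for _ in range(width):
        value, digit = divmod(value, base)
        chars.append(alphabet[digit])
    if value:
        raise ValueError("value does not fit in the requested width")
    return "".join(reversed(chars))


def codes_after(last_done, count, alphabet=ALPHABET):
    """Yield the next `count` shortcodes following `last_done`."""
    width = len(last_done)
    value = code_to_int(last_done, alphabet)
    base = len(alphabet)
    for _ in range(count):
        value += 1
        if value >= base ** width:
            # Assumed rollover: move to the next length, starting from all zeros.
            width += 1
            value = 0
        yield int_to_code(value, width, alphabet)


def keyspace(length, alphabet_size):
    """Number of distinct shortcodes of exactly `length` characters."""
    return alphabet_size ** length
```

As a sanity check against the table, `keyspace(6, 16)` is 16777216, which matches the estimate given for visibli (hex). Under the assumed alphabet, `codes_after("azzzzz", 2)` continues with `b00000` and `b00001`.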


Alive

Last verified 2013-04-17. Original list last updated 2009-08-14 [5].

  • adf.ly - Ex: http://adf.ly/bnpYL
  • adjix.com - Still resolves URLs, but site does not work: "The requested application was not found on this server."
  • ar.gy - Argyle Social
  • ask.fm - Ex: ask.fm/a/40k05kgp
  • awe.sm
  • biglnk.com
  • budurl.com - Appears non-incremental
  • buff.ly - Buffer App
  • burl.se - Incremental. Ex: http://burl.se/428
  • catchyurl.co
  • cli.gs - Appears non-incremental
  • cl.ly - CloudApp
  • cmt.com - Country Music Television
  • decenturl.com - Not at all easy to scrape.
  • del.ly - sprinklr
  • df4.us - daringfireball.net
  • dld.bz - "private URL shortening service"
  • dlvr.it
  • doiop.com - Appears non-incremental
  • easyurl.net - Appears non-incremental. Ex: http://easyurl.net/afd2f
  • fav.me - Used by DeviantArt. Ex: http://fav.me/d31sfml
  • flip.it - Flipboard
  • flpbd.it - Flipboard
  • fnd.us (See official shorteners)
  • fwdurl.net
  • go2.me - Appears incremental. Ex: http://u.go2.me/6YK http://u.go2.me/6YL
  • htl.li - Alias of ow.ly
  • ht.ly - Alias of ow.ly
  • dft.ba
  • ilix.in - HTML redirect
  • jdem.cz - Incremental with random (?) last digit - Ex: http://jdem.cz/bw388
  • kcy.me
  • korta.nu
  • kuijt.nu
  • ln.is - linkis.com
  • lnq.me
  • metamark.net / xrl.us - ? http://xrl.us/bfabog
  • mgnet.me - for torrent magnet URIs.
  • migre.me
  • mindless.co
  • msft.it - Sprinklr
  • msnbc.com
  • msplinks.com - Used by Myspace[1]
  • mtw.tl
  • my.dot.tk/tweak - Appears non-incremental
  • myloc.me
  • mytinyurl.com
  • myurl.in - HTML redirect - Ex: http://myurl.in/xtP5H / http://urlgator.com/xtP5H / http://ug4.me/xtP5H / http://link-ed.in/xtP5H
  • nblo.gs
  • news.me
  • nig.gr
  • notlong.com - Appears to be alpha-only - Ex: http://yeitoo.notlong.com/
  • nutshellurl.com - Appears incremental. 301s to a redirector script, which then 301s you to the destination.
  • owl.li
  • pear.ly - Used by pearltrees.com. Ex: http://pear.ly/6J1H
  • ph.ly - Related to the pond called Philadelphia, where links are born and raised
  • pnut.co - Ex: http://pnut.co/3a
  • po.st
  • prsm.tc - getprismatic.com
  • r.ebay.com
  • rod.gs
  • say.ly
  • sharedby.co - See vsb.li. Double redirects via USERNAME.sharedby.co/share/XXXXXX
  • shar.es (See official shorteners). Still resolves URLs, but the site is 404
  • shorl.com - Doesn't appear guessable - Ex: http://shorl.com/tisikestibahu
  • shorturl.com - Probably sequential/loweralpha - Ex: http://alturl.com/wqok
  • shrd.by - see sharedby.co
  • shrinkurl.us - Still resolves, but does not allow creating new URLs ("The URL you entered was not valid or did not exist.")
  • shrt.st - Appears incremental - Ex: http://shrt.st/vpz
  • smarturl.eu / joturl.com - Doesn't appear guessable, HTML redirect.
  • smarturl.it - smartURL
  • snipr.com / snipurl.com / snurl.com - Appears incremental - Ex: http://snipr.com/27nvst http://snipr.com/27nvtt. snipr.com and snipurl.com work but appear infected with malware.
  • sns.mx - SNS Analytics
  • soa.li - Gigya inc.
  • soc.li - Gigya inc.
  • spne.ws - Silicon Prairie News
  • spnsr.tw - sponsoredtweets.com
  • srtn.us - still resolves URLs, but site just shows blank page
  • surl.co.uk - Many shortening options.
  • techme.me - Techmeme
  • tighturl.com - Appears incremental: http://tighturl.com/30xu http://tighturl.com/30xv
  • tinyarrows.com / ta.gd / ri.ms / ➡.ws / ➨.ws / ➯.ws / ➔.ws / ➞.ws / ➽.ws / ➹.ws / ✩.ws / ✿.ws / ❥.ws / ›.ws / ⌘.ws / ‽.ws / ☁.ws - Appears non-incremental: uses user-defined words for URLs (e.g. http://➡.ws/URLTEAM)
  • tiny.cc - Appears non-incremental
  • tm.to - Twtmore
  • to.gg - Global Giving
  • trap.it
  • trib.al
  • tr.im - Appears incremental - Ex: http://tr.im/44tn2 http://tr.im/44tn4
  • tweetburner.com / twurl.nl - Appears incremental
  • twitthis.com
  • urlcut.com
  • uxp.in - still resolves URLs, but site just shows blank page
  • vimeo.com
  • vitrue.com - Now part of Oracle
  • vk.cc
  • waa.ai
  • x.co - Appears incremental - Ex: http://x.co/1IxUV http://x.co/1IxUW
  • xrl.us - see metamark.net
  • y2u.be - meant for YouTube videos
  • yatuc.com - Not accepting new urls.
  • yep.it
  • yoolink.to - Yoolink
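Several entries in the list above are marked "HTML redirect", meaning the service returns a page rather than an HTTP redirect. Assuming such a page uses a `<meta http-equiv="refresh">` tag (the list does not say which mechanism each site uses, and JavaScript redirects would need different handling), the target could be extracted roughly like this. `MetaRefreshParser` and `meta_refresh_target` are hypothetical names.

```python
from html.parser import HTMLParser


class MetaRefreshParser(HTMLParser):
    """Remember the URL of the first <meta http-equiv="refresh"> tag seen."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attributes = {name.lower(): (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=http://example.com/".
        _, _, rest = attributes.get("content", "").partition(";")
        key, _, value = rest.strip().partition("=")
        if key.strip().lower() == "url":
            self.target = value.strip().strip("'\"")


def meta_refresh_target(html_text):
    """Return the meta refresh target found in html_text, or None."""
    parser = MetaRefreshParser()
    parser.feed(html_text)
    return parser.target
```

A scraper would fetch the short URL with a normal GET request, check for an HTTP redirect first, and fall back to a parser like this only when the response is an HTML page.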

"Official" shorteners

  • bull.hn - Bullhorn Reach (format: bull.hn/l/19JQE/)
  • CokeURL.com - Coca-Cola
  • db.tt - Dropbox
  • di.sn - Disney
  • fb.me - Facebook
  • flic.kr - Flickr
  • fnd.us - Fundrazr.com
  • g.co - Google (used for Google products and services)
  • goo.gl - Google
  • go.usa.gov - USA Government (and since they control the Internets, it doesn't get much more official than this)
  • git.io - GitHub only URLs
  • gu.com - The Guardian (weird format - https://gu.com/p/3f7ca )
  • hub.me - HubPages
  • ift.tt - IFTTT
  • igg.me - Indiegogo
  • lnkd.in - LinkedIn
  • pocket.co - Pocket
  • post.ly - Posterous
  • shar.es - ShareThis - 404 on homepage, otherwise ok
  • skfb.ly - Sketchfab
  • spoti.fi - Spotify
  • stanford.io - Stanford University
  • su.pr - StumbleUpon
  • sx3.se - swedishstartupspace.se
  • t.co - Twitter
  • tmblr.co - Tumblr
  • uoft.me - University of Toronto
  • upl.nu - Ung Pirat (Youth Pirate Party, Sweden)
  • wapo.st - Washington Post
  • wp.me - Wordpress.com
  • y.ahoo.it - Yahoo (shutting down)
  • youtu.be - YouTube

bit.ly aliases

  • 1.usa.gov - USA Government
  • 4sq.com - Foursquare
  • aje.me - Aljazeera
  • amzn.to - Amazon
  • atfp.co - Foreign Policy
  • bbc.in - BBC
  • bbnew.be
  • bbybgrl.com
  • binged.it - Bing (bonus points for being longer than bing.com)
  • bnkrpt.am - Bankrupting America
  • bzfd.it - Buzzfeed
  • cb.com - Career Builder
  • chzb.gr - Cheezburger
  • cmplx.it - Complex Magazine
  • cnet.co - CNET
  • cnnmon.ie - CNN Money
  • conta.cc - Constant Contact Inc.
  • cot.ag
  • cpurl.net - Current Photographer.com
  • curbed.cc - Curbed.com
  • dag.gy
  • dennysd.in - Denny's Restaurants
  • dtoid.it - Destructoid
  • econ.st - The Economist
  • emarketee.rs
  • engri.sh - Engrish.com
  • eonli.ne
  • es.pn - ESPN
  • fakes.pn
  • fanpa.ge - Fanpage.it
  • gaw.kr - Gawker
  • geekiss.im
  • grd.to - The Grid TO
  • grn.bz
  • gtg.lu - GetGlue
  • hoblu.es
  • hub.am
  • huff.to - Huffington Post
  • ift.tt
  • j.mp - bit.ly[6]
  • jrnl.to - thejournal.ie
  • kck.st - Kickstarter
  • marsdd.it - MaRS Discovery District
  • mbist.ro
  • mwne.ws
  • ncl.uz
  • nyti.ms - New York Times
  • onforb.es - Forbes
  • onion.com - The Onion
  • pops.ci - Popular Science
  • popu.pe
  • psxs.us
  • read.bi - Business Insider
  • rseo.co - realseo
  • s831.us - Studio831 - whatever that is
  • sbn.to - sbnation
  • shr.li
  • skygrid.me - SkyGrid
  • slackers.co - slackers.com
  • squid.us - Laughing Squid
  • s.shr.lc - shareaholic - Naive, redirects any shortcode to bit.ly
  • stay.am
  • stjo.es - St. Joseph Media
  • tag.my
  • tcrn.ch - Techcrunch
  • theatln.tc - The Atlantic
  • tnw.co - The Next Web
  • tom.hn
  • toms.sh - TOMS Shoes
  • txpr.de - TexasStore
  • unr.ly - Unruly media
  • usat.ly - USA Today Newspaper
  • vrge.co - The Verge
  • zite.to - Zite

Dead or Broken

  • 1link.in - Website dead
  • 6url.com - HTML redirect, Error 500
  • ad.vu - mirror of adjix.com, application not found
  • bacn.me
  • bwtm.co - DNS fails to resolve.
  • calyp.co - Server error. 403 - Forbidden: Access is denied.
  • canurl.com - Website dead
  • chod.sk - Appears non-incremental, not resolving
  • da.co - Parked.
  • digg.com - discontinued - [2]
  • dwarfurl.com - Website dead/Numeric, appears incremental: http://dwarfurl.com/08041
  • easy.tc - DNS not resolving.
  • easyuri.com - Website dead/Appears hex incremental with last digit random/checksum: http://easyuri.com/1339f , http://easyuri.com/133a3
  • eqent.me - Improper redirect to bitly.
  • feedzil.la - Domain parked.
  • go2cut.com - Website dead
  • gob.li - Golbin Ridge Limited. Timed out
  • gonext.org - not resolving
  • go.to - sold its domains on Sedo apparently.
  • hashonomy.com - Timed out
  • htcdev.net - DNS not resolving.
  • iawtp.me - DNS not resolving
  • icymi.me - DNS not resolving
  • imfy.us - requires a recaptcha to get to the linked site, and avast goes nuts. DNS fails to resolve.
  • inspr.in - Inspired Beta. Can't find server
  • ix.it - Not resolving
  • jijr.com - Doesn't appear to be a shortener, now parked
  • jump.to - dead as of February 1, 2013
  • kissa.be - "Kissa.be url shortener service is shutdown"
  • kl.am - "kl.am Closes its Shell"
  • kurl.us - Parked.
  • lk.to
  • lnkurl.com - Website dead
  • marv.ly - DNS fails to resolve.
  • mash.to - Cannot connect.
  • memurl.com - Pronounceable. Broken.
  • me.lt - Connection refused.
  • mens.hm - Not responding (timeout)
  • miklos.dk - Doesn't appear guessable: http://miklos.dk/!z7bA6a - "Vi arbejder på sagen..."
  • minilien.com - Doesn't appear guessable: http://minilien.com/?9nyvwnA0gh - Website dead
  • minim.in - Times out
  • minurl.org - Presently in ERROR 404
  • ms.me - Parked.
  • muhlink.com - Not resolving
  • myurl.us - cpanel frontend
  • myv.bz - Not resolving
  • nyturl.com - NY Times (bonus points for being longer than nyt.com, which they own). Taken by squatters
  • onvzi.com - DNS fails to resolve.
  • otf.me - Empty WordPress site
  • ping.fm - Fails to resolve.
  • pln.so - Not working.
  • plzretwt.me - Fails to resolve.
  • pnt.me - Doesn't appear guessable, too big a space to bruteforce: http://pnt.me/FzAblc
  • pulsene.ws - Expired. Parked by GoDaddy.
  • qurlyq.com - Javascript redirect. Appears sequential: http://qurlyq.com/5nf. Domain parked.
  • re.ad - Fails to resolve.
  • redirx.com - Lowercase alpha only, appears sequential or guessable - Ex: http://redirx.com/?wyok. Website still online but does not resolve existing URLs nor does it allow creating new ones (responds with the message: blame the spammers)
  • see.sc - Fails to resolve.
  • s.me - Domain parked.
  • s3nt.com - Probably sequential. http://s3nt.com/aa goes somewhere different from /ab . Domain parked.
  • shortlinks.co.uk - Working again. Maybe not.
  • short.to - Domain is parked - Probably sequential/loweralpha: http://short.to/msmp
  • shrinklink.co.uk - Doesn't appear sequential: http://www.shrinklink.co.uk/45bmx , www.shrinklink.co.uk/npk6xp . Domain parked.
  • shrtn.us - myshorturls.appspot.com. 404, does not resolve
  • simurl.com - Doesn't appear guessable - Ex: http://simurl.com/panpes. Website is blank; does not resolve URLs ("This SimURL is now inactive")
  • smf.is - DNS not resolving.
  • sq.com - Now redirects to Singapore Airlines.
  • tiny.ly - DNS not resolving.
  • traceurl.com - DNS fails to resolve.
  • tr.im (1st generation) - "Be back soon!"
  • twixar.com - "Estamos fora do ar por algum tempo, mas estamos trabalhando para voltar a oferecer o serviço para encurtar URLs longa em breve!"
  • twthpr.co - DNS not resolving.
  • twitpwr.com - Domain parked.
  • u.mavrev.com - Stopped accepting new urls. Now times out
  • u.nu - "The shortest URLs. period." Website dead since at least 1 October 2010 (http://web.archive.org/web/20100104023208/http://u.nu/)
  • url9.com - Sequential, alphanumeric. Leading 0s are significant. "The site is working correctly."
  • urlborg.com - 404 Not Found.
  • urlcover.com - Domain parked.
  • urlhawk.com - Domain parked.
  • url-press.com - Suspended by web host.
  • urlsinn.com - DNS not resolving.
  • urlsmash.com - DNS not resolving.
  • urltea.com - Dreamhost's coming soon page.
  • urlvi.be - Domain parked.
  • urlx.org - Owner has agreed to share his database
  • vibemag.co - Vibe Magazine. Times out
  • vsb.li / links.visibli.com/links/ - The latter uses truncated md5 hex string. See sharedby.co.
  • w3t.org - 403 Forbidden.
  • wlink.us - Domain parked.
  • wl.tl - DNS not resolving.
  • xaddr.com - Domain parked.
  • xil.in - Under construction.
  • x.se - Cannot resolve, but www.x.se works.
  • xym.kr - Gibberish (?) Korean text blog.
  • yweb.com - Suspicious iframe with long url and fake loading gif image.
  • zi.ma - DNS not resolving.
  • zip.sm - was a redirect to joturl.com. Now times out

Discontinued

  • urlbrief.com - co-operates with 301Works.org

Hueg list

[3]

References

  1. http://github.com/chronomex/urlteam
  2. http://github.com/chronomex/urlteam
  3. http://github.com/chronomex/urlteam
  4. http://github.com/chronomex/urlteam
  5. http://blog.go2.me/2009/01/exhausting-review-of-link-shorteners.html
  6. http://blog.bitly.com/post/179664996/go-ahead-and-j-mp
