URLTeam

From Archiveteam
Revision as of 11:53, 4 March 2013 by DukeNukem (talk | contribs) (bit.ly aliases)
url shortening was a fucking awful idea
URL: http://urlte.am
Project status: Online!
Archiving status: In progress...
Project source: https://github.com/ArchiveTeam/urlteam-stuff
Project tracker: http://tracker.tinyarchive.org/
IRC channel: #urlteam (on EFnet)
Project lead: Unknown

TinyURL, bit.ly and similar services convert long URLs into short ones hosted on their own domain; when someone visits the short URL, their web browser is redirected to the long URL.
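
To make the redirect mechanism concrete, here is a minimal sketch of a shortener as an in-memory lookup table. The class name `TinyShortener`, the base-36 alphabet and the sequential numbering are all assumptions made for illustration; real services differ in how they assign codes and which redirect status they send.

```python
import itertools
import string

# Assumed code alphabet (digits, then lowercase); real services vary.
ALPHABET = string.digits + string.ascii_lowercase


class TinyShortener:
    """Toy in-memory shortener: sequential codes, dictionary lookups."""

    def __init__(self):
        self._long_by_code = {}
        self._counter = itertools.count()

    def shorten(self, long_url):
        """Store long_url under the next sequential code and return the code."""
        code = self._encode(next(self._counter))
        self._long_by_code[code] = long_url
        return code

    def redirect(self, code):
        """Return (status, headers) roughly as an HTTP redirect would."""
        long_url = self._long_by_code.get(code)
        if long_url is None:
            return 404, {}
        return 301, {"Location": long_url}

    @staticmethod
    def _encode(n):
        """Write the counter value n in base len(ALPHABET)."""
        if n == 0:
            return ALPHABET[0]
        digits = []
        while n:
            n, remainder = divmod(n, len(ALPHABET))
            digits.append(ALPHABET[remainder])
        return "".join(reversed(digits))
```

If the dictionary disappears, every code handed out so far resolves to nothing, which is the failure mode this project is trying to get ahead of.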

Such services are a ticking time bomb. If they go away, get hacked or sell out, millions of links will be lost (see Wikipedia: Link Rot). Archive.org/301Works acts as an escrow for URL shortener databases, but it relies on the shorteners actually handing over their databases. Even 301Works founding member bit.ly does not share its database, and most other big shorteners don't share theirs either.

Who did this?

You can join us in our IRC channel: #urlteam on EFnet

301Works cooperation


The fine folks at archive.org have provided us with upload permissions to the 301Works archive: http://www.archive.org/details/301utm. Unfortunately, they do not want to make the uploads downloadable, but the same data is in our torrents too, just in a different format (we use tab-delimited, xz-compressed files, while 301Works uses comma-delimited, uncompressed files).
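
The difference between the two formats can be sketched as a small conversion. This is only an illustration, not the project's actual export code: the function name `tsv_xz_to_csv` is made up, and it assumes UTF-8 text with two columns per line (short code, then long URL), which the page does not spell out.

```python
import csv
import io
import lzma


def tsv_xz_to_csv(xz_bytes):
    """Decompress xz data holding tab-delimited rows; return them as CSV text."""
    # Assumption: the payload is UTF-8 text, one "code<TAB>long URL" per line.
    text = lzma.decompress(xz_bytes).decode("utf-8")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        if not line:
            continue
        # Split on the first tab only, so the rest of the line stays one field.
        writer.writerow(line.split("\t", 1))
    return out.getvalue()
```

The `csv` module quotes any field that contains a comma, which matters here because long URLs often do.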

Tools

TinyBack

The easiest way to help with scraping is to run the Warrior and select the URLTeam project. You can also run TinyBack outside the Warrior, though Python 2.6 or newer is required:

 git clone https://github.com/soult/tinyback
 cd tinyback
 # Use ./run.py --help for more information on command-line options
 ./run.py --tracker=http://tracker.tinyarchive.org/v1/ --num-threads=3 --sleep=180

URL shorteners

New table

The new table includes shorteners we have already started to scrape.

Name | Est. number of shorturls | Scraping done by | Status / Comments
Tinyurl.com | 1,000,000,000 | Warrior | scraping: sequential, <= 6 characters; new shorturls: non-sequential, 7 characters
Bit.ly | 4,000,000,000 | Warrior | scraping: non-sequential, 6 characters; new shorturls: non-sequential, 6 characters
Goo.gl | ? | User:Scumola | started (2011-03-04); goo.gl throttles pulls
is.gd | 810,264,745 (2013-01-30) | Warrior | scraping: sequential, <= 5 characters; new shorturls: non-sequential, 6 characters
ff.im | ? | User:Chronomex | only used by FriendFeed, no interface to shorten new URLs
4url.cc | 1279 (2009-08-14)[1] | User:Chronomex | dead (2011-02-15)
litturl.com | 17096 (2010-04-15)[2] | User:Chronomex | dead (2010-11-18)
xs.md | 3084 (2009-08-15)[3] | User:Chronomex | done; dead (2010-11-18)
url.0daymeme.com | 14867 (2009-08-14)[4] | User:Chronomex | done; dead (2010-11-18)
tr.im | 1990425 | User:Soult | got what we could; dead (2011-12-31)
adjix.com | ? | User:Jeroenz0r | Already done: 00-zz, 000-zzz, 0000-izzz; case-insensitive, incremental
rod.gs | ? | User:Jeroenz0r | Done: 00-ZZ, 000-2Qc; case-sensitive, incremental, server can't keep up with all the requests
biglnk.com | ? | User:Jeroenz0r | Done: 0-Z, 00-ZZ, 000-ZZZ; case-sensitive, incremental
go.to | 60000 | User:Asiekierka | Done: ~45000 (go.to network links only: goto_dump.zip); no codes, only names, google-fu only gives the first 1000 results for each, thankfully most domains have fewer
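
For the shorteners marked sequential or incremental, progress ranges such as 00-zz or 000-ZZZ amount to walking every code of a given length in order. A minimal sketch of that enumeration follows; the character order (digits, then lowercase, then uppercase) is an assumption inferred from the ranges in the table, and the function name `codes` is purely illustrative.

```python
import itertools
import string

# Assumed orderings; each service may sort its characters differently.
CASE_INSENSITIVE = string.digits + string.ascii_lowercase
CASE_SENSITIVE = CASE_INSENSITIVE + string.ascii_uppercase


def codes(alphabet, min_len, max_len):
    """Yield every code of min_len..max_len characters, shortest first."""
    for length in range(min_len, max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            yield "".join(chars)
```

With these assumptions, `codes(CASE_INSENSITIVE, 2, 3)` starts at `00`, reaches `zz` after 1,296 codes and continues with `000`. Because it is a generator, a scraper can stop and resume without holding the whole range in memory.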

Alive

Last verified 2013-02-13. Original list last updated 2009-08-14 [5].

"Official" shorteners

  • goo.gl - Google
  • fb.me - Facebook
  • y.ahoo.it - Yahoo
  • youtu.be - YouTube
  • t.co - Twitter
  • post.ly - Posterous
  • wp.me - WordPress.com
  • flic.kr - Flickr
  • lnkd.in - LinkedIn
  • su.pr - StumbleUpon
  • go.usa.gov - USA Government (and since they control the Internets, it doesn't get much more official than this)
  • db.tt - Dropbox
  • fnd.us - Fundrazr.com
  • shar.es - ShareThis

bit.ly aliases

  • amzn.to - Amazon
  • binged.it - Bing (bonus points for being longer than bing.com)
  • 1.usa.gov - USA Government
  • tcrn.ch - TechCrunch
  • nyti.ms - New York Times

Dead or Broken

  • x.se - Cannot resolve, but www.x.se works.
  • pnt.me - Doesn't appear guessable, too big a space to bruteforce: http://pnt.me/FzAblc
  • 1link.in - Website dead
  • 6url.com - HTML redirect, Error 500
  • ad.vu - mirror of adjix.com, application not found
  • canurl.com - Website dead
  • chod.sk - Appears non-incremental, not resolving
  • digg.com - discontinued - [1]
  • dwarfurl.com - Website dead/Numeric, appears incremental: http://dwarfurl.com/08041
  • easyuri.com - Website dead/Appears hex incremental with last digit random/checksum: http://easyuri.com/1339f , http://easyuri.com/133a3
  • go2cut.com - Website dead
  • gonext.org - not resolving
  • imfy.us - requires a recaptcha to get to the linked site, and avast goes nuts. DNS fails to resolve.
  • ix.it - Not resolving
  • jijr.com - Doesn't appear to be a shortener, now parked
  • jump.to - dead as of February 1, 2013
  • kissa.be - "Kissa.be url shortener service is shutdown"
  • kl.am - "kl.am Closes its Shell"
  • kurl.us - Parked.
  • lnkurl.com - Website dead
  • memurl.com - Pronounceable. Broken.
  • miklos.dk - Doesn't appear guessable: http://miklos.dk/!z7bA6a - "Vi arbejder på sagen..."
  • minilien.com - Doesn't appear guessable: http://minilien.com/?9nyvwnA0gh - Website dead
  • minurl.org - Presently in ERROR 404
  • muhlink.com - Not resolving
  • myurl.us - cpanel frontend
  • nyturl.com - NY Times (bonus points for being longer than nyt.com, which they own). Taken by squatters
  • qurlyq.com - Javascript redirect. Appears sequential: http://qurlyq.com/5nf. Domain parked.
  • s3nt.com - Probably sequential. http://s3nt.com/aa goes somewhere different from /ab . Domain parked.
  • shortlinks.co.uk - Working again. Maybe not.
  • short.to - Domain is parked - Probably sequential/loweralpha: http://short.to/msmp
  • shrinklink.co.uk - Doesn't appear sequential: http://www.shrinklink.co.uk/45bmx , www.shrinklink.co.uk/npk6xp . Domain parked.
  • traceurl.com - DNS fails to resolve.
  • tr.im (1st generation) - "Be back soon!"
  • twitpwr.com - Domain parked.
  • u.nu - "The shortest URLs. period." Website dead since at least 1 October 2010 (http://web.archive.org/web/20100104023208/http://u.nu/)
  • url9.com - Sequential, alphanumeric. Leading 0s are significant. "The site is working correctly."
  • urlborg.com - 404 Not Found.
  • urlcover.com - Domain parked.
  • urlhawk.com - Domain parked.
  • url-press.com - Suspended by web host.
  • urlsmash.com - DNS not resolving.
  • urltea.com - Dreamhost's coming soon page.
  • urlvi.be - Domain parked.
  • urlx.org - Owner has agreed to share his database
  • w3t.org - 403 Forbidden.
  • wlink.us - Domain parked.
  • xaddr.com - Domain parked.
  • xil.in - Under construction.
  • xym.kr - Gibberish (?) Korean text blog.
  • yweb.com - Suspicious iframe with long url and fake loading gif image.
  • zi.ma - DNS not resolving.
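
Several entries above are described as not guessable or too big to brute-force. A rough way to see why is to count the possible codes and divide by a request rate. The sketch below is back-of-the-envelope arithmetic only: the function names are made up, and both the 62-character alphabet and the request rate are assumptions, not measured values for any listed service.

```python
def keyspace_size(alphabet_size, min_len, max_len):
    """Number of distinct codes with min_len..max_len characters."""
    return sum(alphabet_size ** n for n in range(min_len, max_len + 1))


def days_to_enumerate(total_codes, requests_per_second):
    """Days needed to request every code once at a constant rate."""
    return total_codes / requests_per_second / 86400
```

For six-character codes over mixed-case letters and digits, `keyspace_size(62, 6, 6)` is 56,800,235,584. At a hypothetical 10 requests per second that works out to roughly 180 years, whereas the two- and three-character case-insensitive ranges together hold only 47,952 codes.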

Discontinued

  • urlbrief.com - co-operates with 301Works.org

Hueg list

[2]

References

Weblinks