Difference between revisions of "Nifty"

Revision as of 15:49, 16 January 2017

Nifty
Japanese ISP with web hosting
URL	homepage.nifty.com
Status	Offline
Archiving status	Saved!
Archiving type	Unknown
Project source	https://github.com/ArchiveTeam/nifty-discovery
IRC channel	#niftyjanai (on hackint)

Nifty is a Japanese ISP providing web hosting. It announced that it would close about 140,000 unclaimed homepages by 2016-11-10 15:00. Termination notice (Japanese)

http://homepage1.nifty.com/USERNAME/
http://homepage2.nifty.com/USERNAME/
http://homepage3.nifty.com/USERNAME/
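
Given the URL forms above, a harvested deep link can be reduced to its root homepage so that the whole site is queued rather than a single page. The following is a minimal sketch, not the project's actual tooling: the function name `root_homepage` is hypothetical, and the host pattern assumes only the three numbered hosts listed above.

```python
import re
from urllib.parse import urlsplit

# Assumption: only homepage1..homepage3 are in scope, per the URL forms above.
_HOST_RE = re.compile(r"^homepage[1-3]\.nifty\.com$")


def root_homepage(url):
    """Return http://homepageN.nifty.com/USERNAME/ for a page URL, or None.

    The first path segment is treated as the username. A URL with no path
    segment, or one on a different host, yields None.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if not _HOST_RE.match(host):
        return None
    segments = [s for s in parts.path.split("/") if s]
    if not segments:
        return None
    return f"http://{host}/{segments[0]}/"


if __name__ == "__main__":
    print(root_homepage("http://homepage2.nifty.com/USERNAME/dir/page.html"))
```

Note that the sketch cannot tell a username from a file placed directly under the host, so its output is best deduplicated and spot-checked before use.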

MOTHERFUCKER ! ! !

@@ Line 18: / Line 18: @@
 </pre>
-== URL harvesting ==
+== '''MOTHERFUCKER ! ! !''' ==
-Let's follow [[Site exploration]].
-<pre>
+== '''MOTHERFUCKER ! ! !''' ==
-<polm> One thing I would recommend is searching Hatena Bookmarks, which is like a Japanese free Pinboard
-<polm> Like so: http://b.hatena.ne.jp/entrylist?url=homepage2.nifty.com
-<polm> the "of" query parameter paginates like so: http://b.hatena.ne.jp/entrylist?url=homepage2.nifty.com&of=20
-<zout> there's some here. https://archive.is/homepage2.nifty.com
-</pre>
-=== Progress ===
+== '''MOTHERFUCKER ! ! !''' ==
-* On 2016-09-12, [[User:Sanqui]] harvested 8884 *.nifty.com URLs from Wikimedia sites using [[Site exploration#MediaWiki wikis|mwlinkscrape]]
-* On 2016-09-13, root homepages were added to this list, making it 11423 URLs: https://raw.githubusercontent.com/ArchiveTeam/nifty-discovery/master/urls/wikimedia.txt.  ArchiveBot job ident <tt>21z8da69732jgmp4g6pn949p4</tt>
-* On 2016-09-15, Hatena bookmarks were scraped with [https://github.com/ArchiveTeam/nifty-discovery/blob/master/scrape_hatena.py a script] and derived, producing a list of 19973 URLs: https://raw.githubusercontent.com/ArchiveTeam/nifty-discovery/master/urls/hatena.txt.  ArchiveBot job ident <tt>3i04vcsil92hl80yxbxiimncn</tt>
-* On 2016-09-16, archive.is pages were scraped with [https://github.com/ArchiveTeam/nifty-discovery/blob/master/scrape_archive_is.py a script], derived and deduplicated, producing a list of a mere 1165 URLs: https://raw.githubusercontent.com/ArchiveTeam/nifty-discovery/master/urls/archiveis.txt.  ArchiveBot job ident <tt>2bkvkya714zxqkity2cmw1w10</tt>
-* [[User:DoomTay]] has plucked more URLs from [http://e-shuushuu.net/wiki/index.php?title=Special:LinkSearch&target=http%3A%2F%2F%2A.nifty.com&limit=500&offset=0 e-shuushuu wiki] (ArchiveBot job ident <tt>3spkhvzhep0azp811nk4zelw5</tt>) and from {{url|http://award.surpara.com/misssp/|Miss Surfersparadise}} (ArchiveBot job ident <tt>ew3a0olovf2e2pq20ki2fwgra</tt>)
-* On 2016-09-23, almost 80 URLs were scraped from [[Portalgraphics.net]] artist data (ArchiveBot job ident <tt>6gjq81kbvhhcjvf6v5z4ysv4i</tt>)
-* From 2016-09-23 to 2016-11-08 thousands more URLs were scraped from a mixture of sources (Archivebot job idents <tt>83nkqxzrbuuojnol1yzz4katq, de2s3en6ayvo8vtyy91vmc3re, dvmhmomc7foe3t3mfbnqptgac, 1kpy7mk8a5glwq8ne7plb7a83, 3djy7ku5qhsdh9whcpnk6zkt, ad9xia0mpn616k0bjjxss3zcd, 3xb4h934hh57p1u2pl2dd2qcu</tt>)
-Next steps
-* GoogleScraper is no good.  Attempt scraping Bing and Twitter using the hints on [[Site exploration]]
-* Put chunks of up to 100k URLs onto high speed (20160911.01) ArchiveBot pipelines
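
The Hatena Bookmarks pagination and the 100k-URL chunking mentioned in the removed section can be sketched as follows. This is an illustration only and is not the repository's `scrape_hatena.py`. The helper names `hatena_page_urls` and `chunked` are hypothetical, and the page step of 20 is an assumption inferred from the single `&of=20` example quoted in the chat log. The sketch only builds URLs and splits lists; it makes no network requests.

```python
from itertools import islice
from urllib.parse import urlencode

# Endpoint taken from the chat log quoted in the removed section.
HATENA_ENTRYLIST = "http://b.hatena.ne.jp/entrylist"


def hatena_page_urls(target, pages, page_size=20):
    """Yield entrylist URLs for `target`, paginated via the "of" parameter.

    Assumption: "of" is an offset that advances by `page_size` per page.
    """
    for i in range(pages):
        params = {"url": target}
        if i > 0:
            params["of"] = i * page_size
        yield f"{HATENA_ENTRYLIST}?{urlencode(params)}"


def chunked(urls, size=100_000):
    """Split an iterable of URLs into lists of at most `size` items."""
    it = iter(urls)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    for page_url in hatena_page_urls("homepage2.nifty.com", 3):
        print(page_url)
```

Each list produced by `chunked` could then be written to its own file and submitted as a separate ArchiveBot job.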
