Nifty

From Archiveteam
Revision as of 14:18, 15 September 2016 by Sanqui (talk | contribs) (homepage1 not homepage)
Nifty
Japanese ISP with web hosting
URL homepage.nifty.com
Status Closing
Archiving status Not saved yet
Archiving type Unknown
IRC channel #archiveteam-bs (on hackint)

Nifty is a Japanese ISP that provides web hosting. It will close about 140,000 unclaimed homepages by 2016-09-29. Termination notice (Japanese)

http://homepage1.nifty.com/USERNAME/
http://homepage2.nifty.com/USERNAME/
http://homepage3.nifty.com/USERNAME/

URL harvesting

Let's follow the techniques described on the Site exploration page.

<polm> One thing I would recommend is searching Hatena Bookmarks, which is like a Japanese free Pinboard
<polm> Like so: http://b.hatena.ne.jp/entrylist?url=homepage2.nifty.com
<polm> the "of" query parameter paginates like so: http://b.hatena.ne.jp/entrylist?url=homepage2.nifty.com&of=20
<zout> there's some here. https://archive.is/homepage2.nifty.com
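
The pagination polm describes can be sketched as follows. This is only a sketch under assumptions: the step of 20 is inferred from the single `of=20` example above, and the function name `hatena_entrylist_urls` and the `pages` argument are hypothetical. It only builds the URLs; fetching and parsing the result pages is left out.

```python
from urllib.parse import urlencode

BASE = "http://b.hatena.ne.jp/entrylist"

def hatena_entrylist_urls(host, pages, step=20):
    """Build one entrylist URL per result page for `host`.

    `step` is an assumption: the offset of 20 comes from the one
    example in the chat log, not from any Hatena documentation.
    """
    urls = []
    for page in range(pages):
        params = {"url": host}
        if page > 0:
            # The "of" parameter is the offset of the first entry on the page.
            params["of"] = page * step
        urls.append(BASE + "?" + urlencode(params))
    return urls
```

Calling `hatena_entrylist_urls("homepage2.nifty.com", 2)` reproduces the two URLs quoted above. The same call can be repeated for homepage1 and homepage3.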

Progress

  • On 2016-09-12, User:Sanqui harvested 8884 *.nifty.com URLs from Wikimedia sites using mwlinkscrape
  • On 2016-09-13, root homepages were added to this list, making it 11423 URLs: https://sanqui.rustedlogic.net/etc/archiveteam/nifty_wikimedia_sites_fix.txt. ArchiveBot job ident 21z8da69732jgmp4g6pn949p4
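
One way to derive root homepages from a list of harvested URLs is sketched below. It is not necessarily how the list above was built. The regular expression covers only the three hostnames listed on this page, and the names `ROOT_RE` and `root_homepages` are hypothetical.

```python
import re

# Covers only homepage1/2/3.nifty.com, as listed on this page.
ROOT_RE = re.compile(
    r"^https?://(homepage[123]\.nifty\.com)/([^/?#]+)", re.IGNORECASE
)

def root_homepages(urls):
    """Return the sorted, de-duplicated root homepage URLs.

    The first path segment is treated as USERNAME, so a URL with
    no user directory would be misread. That is an accepted
    limitation of this sketch.
    """
    roots = set()
    for url in urls:
        match = ROOT_RE.match(url.strip())
        if match:
            host = match.group(1).lower()
            roots.add("http://%s/%s/" % (host, match.group(2)))
    return sorted(roots)
```

The roots can then be merged with the original URL list before the list is submitted.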

Next steps

  • GoogleScraper is no good. Try scraping Bing and Twitter using the hints on Site exploration
  • Scrape Hatena
  • Scrape archive.is
  • Put chunks of up to 100k URLs onto high speed (20160911.01) ArchiveBot pipelines
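
Splitting a harvested list into chunks of up to 100k URLs could look like the sketch below. The function name `chunk_urls` is hypothetical, and how each chunk is then handed to an ArchiveBot pipeline is not covered here.

```python
def chunk_urls(urls, size=100_000):
    """Yield successive lists of at most `size` URLs from `urls`."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]
```

Each yielded chunk can be written to its own text file, one URL per line.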