Difference between revisions of "Nifty"
Latest revision as of 19:00, 31 October 2021
Nifty
Japanese ISP with web hosting
URL | homepage.nifty.com
Status | Offline
Archiving status | Saved!
Archiving type | Unknown
Project source | https://github.com/ArchiveTeam/nifty-discovery
IRC channel | #archiveteam-bs (on hackint) (formerly #niftyjanai (on EFnet))
Project lead | User:Sanqui, User:DoomTay
Nifty is a Japanese ISP providing web hosting. It will be closing about 140,000 unclaimed homepages by 2016-11-10 15:00. Termination notice (Japanese): http://homepage.nifty.com/information/2016/01/
http://homepage1.nifty.com/USERNAME/
http://homepage2.nifty.com/USERNAME/
http://homepage3.nifty.com/USERNAME/
URL harvesting
Let's follow Site exploration.
<polm> One thing I would recommend is searching Hatena Bookmarks, which is like a Japanese free Pinboard
<polm> Like so: http://b.hatena.ne.jp/entrylist?url=homepage2.nifty.com
<polm> the "of" query parameter paginates like so: http://b.hatena.ne.jp/entrylist?url=homepage2.nifty.com&of=20
<zout> there's some here. https://archive.is/homepage2.nifty.com
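The pagination polm describes can be sketched in a few lines of Python. This is only an illustration, not the project's scrape_hatena.py: the function name `hatena_page_urls` is made up here, and the page size of 20 is an assumption taken from the `of=20` example in the log.

```python
from urllib.parse import urlencode

HATENA_ENTRYLIST = "http://b.hatena.ne.jp/entrylist"


def hatena_page_urls(host, pages, page_size=20):
    """Yield entrylist URLs for `host`, one per result page.

    page_size=20 is an assumption based on the `of=20` example;
    the first page carries no `of` parameter at all.
    """
    for page in range(pages):
        params = {"url": host}
        offset = page * page_size
        if offset:
            params["of"] = offset
        yield f"{HATENA_ENTRYLIST}?{urlencode(params)}"


if __name__ == "__main__":
    for url in hatena_page_urls("homepage2.nifty.com", 3):
        print(url)
```

The generator only builds the URLs; fetching and parsing each page is left out, since the markup of the result pages is not described here.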
Progress
- On 2016-09-12, User:Sanqui harvested 8884 *.nifty.com URLs from Wikimedia sites using mwlinkscrape
- On 2016-09-13, root homepages were added to this list, making it 11423 URLs: https://raw.githubusercontent.com/ArchiveTeam/nifty-discovery/master/urls/wikimedia.txt. job:21z8da69732jgmp4g6pn949p4
- On 2016-09-15, Hatena bookmarks were scraped with a script (https://github.com/ArchiveTeam/nifty-discovery/blob/master/scrape_hatena.py) and derived, producing a list of 19973 URLs: https://raw.githubusercontent.com/ArchiveTeam/nifty-discovery/master/urls/hatena.txt (job:3i04vcsil92hl80yxbxiimncn)
- On 2016-09-16, archive.is pages were scraped with a script (https://github.com/ArchiveTeam/nifty-discovery/blob/master/scrape_archive_is.py), derived and deduplicated, producing a list of a mere 1165 URLs: https://raw.githubusercontent.com/ArchiveTeam/nifty-discovery/master/urls/archiveis.txt (job:2bkvkya714zxqkity2cmw1w10)
- User:DoomTay has plucked more URLs from the e-shuushuu wiki (job:3spkhvzhep0azp811nk4zelw5) and from Miss Surfersparadise (http://award.surpara.com/misssp/, job:ew3a0olovf2e2pq20ki2fwgra)
- On 2016-09-23, almost 80 URLs were scraped from Portalgraphics.net artist data (job:6gjq81kbvhhcjvf6v5z4ysv4i)
- From 2016-09-23 to 2016-11-08 thousands more URLs were scraped from a mixture of sources (job:83nkqxzrbuuojnol1yzz4katq, job:de2s3en6ayvo8vtyy91vmc3re, job:dvmhmomc7foe3t3mfbnqptgac, job:1kpy7mk8a5glwq8ne7plb7a83, job:3djy7ku5qhsdh9whcpnk6zkt, job:ad9xia0mpn616k0bjjxss3zcd, job:3xb4h934hh57p1u2pl2dd2qcu)
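Two steps in the list above, adding root homepages and deduplicating, could look roughly like the following. It is a minimal sketch rather than the code actually used: `root_homepage` and `with_roots` are hypothetical names, and the host pattern assumes every homepage follows the `http://homepageN.nifty.com/USERNAME/` shape shown earlier.

```python
import re
from urllib.parse import urlsplit

# Assumption: user homepages live on homepage.nifty.com or homepageN.nifty.com.
HOST_RE = re.compile(r"^homepage\d*\.nifty\.com$")


def root_homepage(url):
    """Return http://HOST/USERNAME/ for a Nifty homepage URL, else None.

    The first path segment is treated as the username, so a file sitting
    directly under the host would be misread as a user directory.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if not HOST_RE.match(host):
        return None
    segments = [s for s in parts.path.split("/") if s]
    if not segments:
        return None
    return f"http://{host}/{segments[0]}/"


def with_roots(urls):
    """Return the sorted, deduplicated URLs plus their root homepages."""
    seen = set()
    for url in urls:
        url = url.strip()
        if not url:
            continue
        seen.add(url)
        root = root_homepage(url)
        if root:
            seen.add(root)
    return sorted(seen)
```

Using a set makes deduplication a side effect of collecting the URLs, and sorting the result keeps the output list stable between runs.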
Next steps
- GoogleScraper is no good. Attempt scraping Bing and Twitter using the hints on Site exploration
- Put chunks of up to 100k URLs onto high-speed (20160911.01) ArchiveBot pipelines
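Splitting a URL list into chunks of at most 100k entries is straightforward. Here is a small sketch, where `chunked` is a hypothetical helper name and how each chunk is handed to ArchiveBot is deliberately left out:

```python
from itertools import islice


def chunked(urls, size=100_000):
    """Yield successive lists of at most `size` URLs from any iterable."""
    it = iter(urls)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # Placeholder URLs for illustration only.
    sample = [f"http://homepage1.nifty.com/user{i}/" for i in range(250_000)]
    for n, chunk in enumerate(chunked(sample)):
        print(n, len(chunk))
```

Because it works on any iterable, the same helper can read straight from an open file without loading the whole list into memory first.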