Difference between revisions of "Patch.com"

From Archiveteam
Line 15: Line 15:
=== Current status ===


antomatic has prepared (what appears to be) a [[List_of_Patch.com_sites|complete list of sites]].  A prototype seesaw project (no Warrior integration yet) also exists.
A combo archive/spider script now exists, and is being tested. If it all goes well, we'll put this out on the Warrior.  Pop in the IRC channel if you'd like to be notified when we're ready to go.
 
There's what looks like a master site map (with links to sub-sitemaps) at http://www.patch.com/sitemaps.xml also.
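
If the master site map follows the standard sitemaps.org index format (an assumption, nobody has confirmed that here), pulling out the sub-sitemap links could look roughly like the sketch below. The function name and the sample XML are made up for illustration; the sample URLs are placeholders, not real Patch paths.

```python
import xml.etree.ElementTree as ET

# Namespace used by the standard sitemaps.org schema (assumed to apply here).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sub_sitemap_urls(index_xml):
    """Return every <loc> URL found in a sitemap index document."""
    root = ET.fromstring(index_xml)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


# Hypothetical sample, standing in for the real file at /sitemaps.xml.
SAMPLE_INDEX = (
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<sitemap><loc> http://site-a.example.org/sitemap.xml </loc></sitemap>"
    "<sitemap><loc>http://site-b.example.org/sitemap.xml</loc></sitemap>"
    "</sitemapindex>"
)

if __name__ == "__main__":
    for url in sub_sitemap_urls(SAMPLE_INDEX):
        print(url)
```

Each sub-sitemap would then need to be fetched and parsed the same way to get at the page URLs.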
 
Most/all(?) Patches seem to share similar directories and content structures - e.g. /news, /blogs, /boards, /events, /directory, /jobs, etc.
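
If the shared layout holds, seed URLs for a spider could be generated per site along these lines. This is only a sketch: the section list covers just the directories named above (the "etc." is not filled in), and the hostname in the example is hypothetical.

```python
# Sections named above; the real list is probably longer.
SECTIONS = ["news", "blogs", "boards", "events", "directory", "jobs"]


def seed_urls(hostname, sections=SECTIONS):
    """Build one starting URL per known section for a single Patch site."""
    return ["http://{}/{}".format(hostname, section) for section in sections]


if __name__ == "__main__":
    # "example.patch.com" is a placeholder, not a site taken from the list.
    for url in seed_urls("example.patch.com"):
        print(url)
```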
 
=== Next steps ===
 
Patch subdomains (1) are big and (2) appear to implement some sort of request cap per IP per unit time. (You'll start getting HTTP 420s after a while.) We need to investigate whether we need a complicated mechanism to split up individual sites and then megawarc them together, or whether we can just take each site slowly (e.g. n requests every hour).
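
For the "take each site slowly" option, a minimal sketch of the idea is below. It is not what patch-grab does: `polite_fetch`, its parameters and the default numbers are all invented for illustration, and the actual request cap is unknown. The caller supplies `fetch(url)`, which must return a `(status_code, body)` tuple.

```python
import time


def polite_fetch(urls, fetch, requests_per_hour=60, backoff_seconds=600,
                 max_retries=3, sleep=time.sleep):
    """Fetch URLs one at a time, evenly spaced, pausing longer on HTTP 420.

    All default values are guesses; tune them against the real rate limit.
    """
    interval = 3600.0 / requests_per_hour  # seconds between requests
    results = {}
    for url in urls:
        for attempt in range(max_retries + 1):
            status, body = fetch(url)
            if status != 420:
                results[url] = (status, body)
                break
            # Rate limited: wait a bit longer after each failed attempt.
            sleep(backoff_seconds * (attempt + 1))
        else:
            # Still getting 420 after all retries; record it and move on.
            results[url] = (420, None)
        sleep(interval)
    return results
```

The `sleep` argument is there only so the pacing can be checked without actually waiting; in normal use the default `time.sleep` is fine.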
 
Pop in the IRC channel if you want to help.

Revision as of 04:21, 19 August 2013

Patch.com
Your neighborhood. Your news.
URL http://www.patch.com/
Status Closing
Archiving status In progress...
Archiving type Unknown
Project source https://github.com/ArchiveTeam/patch-grab
Project tracker here
IRC channel #cabbagepatch (on hackint)

Patch.com is a "hyperlocal" news community which is being downsized from its current ~900 sites to ~500.

Current status

A combo archive/spider script now exists, and is being tested. If it all goes well, we'll put this out on the Warrior. Pop in the IRC channel if you'd like to be notified when we're ready to go.