Difference between revisions of "Patch.com"

From Archiveteam
(Add link to IA collection)
 
(9 intermediate revisions by 6 users not shown)
Line 1: Line 1:
{{Infobox project
{{Infobox project
| title = Patch.com
| image = Patch screenshot.png
| image = Patch screenshot.png
| description = Your neighborhood. Your news.
| description = Your neighborhood. Your news.
| URL = <nowiki>http://www.patch.com/</nowiki>
| URL = <nowiki>http://www.patch.com/</nowiki>
| project_status = {{closing}}
| project_status = {{specialcase}}
| source = https://github.com/ArchiveTeam/patch-grab
| source = https://github.com/ArchiveTeam/patch-grab
| archiving_status = {{inprogress}}
| archiving_status = {{saved}} - [https://archive.org/details/archiveteam_patch archives]
| irc = cabbagepatch
| irc = cabbagepatch
| irc_network = EFnet
| irc_abandoned = true
| tracker = [http://quilt.at.ninjawedding.org/patchy here]
| tracker = [http://quilt.at.ninjawedding.org/patchy here]
| data = {{IA collection|archiveteam_patch}}
}}
}}


'''Patch.com''' is a "hyperlocal" news community which is [http://www.webcitation.org/6IrUArBiV being downsized] from its current ~900 sites to ~500.
'''Patch.com''' is a "hyperlocal" news community which is [http://www.webcitation.org/6IrUArBiV being downsized] from its current ~900 sites to ~500.


=== Current status ===
== Current status ==


antomatic has prepared (what appears to be) a [[List_of_Patch.com_sites|complete list of sites]]. A prototype seesaw project (no Warrior integration yet) also exists.
In progress.  Warrior integration coming soon.


There's what looks like a master site map (with links to sub-sitemaps) at http://www.patch.com/sitemaps.xml also.
== Patch.com will rate-limit you across all sites ==


=== Next steps ===
Patch.com institutes a rate-limit (some unknown hundreds of requests/hour) across all sites.  If you exceed this, all of your requests will be met with HTTP 420s.


Patch subdomains (1) are big and (2) appear to implement some sort of request cap per IP per unit time. (You'll start getting HTTP 420s after a while.) We need to investigate whether we need to implement a complicated mechanism to split up individual sites and then megawarc them together, or just take each site slowly (e.g. n requests every hour).
If the patch-grab script detects these, it hard-aborts.  A kinder solution would be to sleep for some period of time (an hour?) and try again; suggestions appreciated.


Pop in the IRC channel if you want to help.
{{Navigation box}}

Latest revision as of 04:56, 25 January 2022

Patch.com
Your neighborhood. Your news.
URL http://www.patch.com/
Status Special case
Archiving status Saved! - archives (https://archive.org/details/archiveteam_patch)
Archiving type Unknown
Project source https://github.com/ArchiveTeam/patch-grab
Project tracker http://quilt.at.ninjawedding.org/patchy
IRC channel #archiveteam-bs (on hackint)
(formerly #cabbagepatch (on EFnet))
Data archiveteam_patch

Patch.com is a "hyperlocal" news community which is being downsized from its current ~900 sites to ~500.

Current status

In progress. Warrior integration coming soon.

Patch.com will rate-limit you across all sites

Patch.com enforces a rate limit across all of its sites. The exact threshold is unknown, but it appears to be in the hundreds of requests per hour. If you exceed it, all of your requests will be met with HTTP 420s.

If the patch-grab script detects these, it hard-aborts. A kinder solution would be to sleep for some period of time (an hour?) and try again; suggestions appreciated.
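
As a starting point for the sleep-and-retry idea above, here is a minimal Python sketch. It is not taken from patch-grab: the function name `fetch_with_retry`, the `fetch` callable (anything that returns a status code and a body), the one-hour default wait, and the three-attempt cap are all assumptions made for illustration, and the right wait time for Patch.com has not been measured.

```python
import time
from typing import Callable, Tuple

# Status the page reports seeing once the request cap is exceeded.
RATE_LIMIT_STATUS = 420


def fetch_with_retry(
    fetch: Callable[[str], Tuple[int, bytes]],  # hypothetical: returns (status, body)
    url: str,
    wait_seconds: float = 3600.0,  # "an hour?" per the text; an untested guess
    max_attempts: int = 3,         # hypothetical cap so a run cannot loop forever
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call fetch(url); on HTTP 420, sleep and try again instead of aborting."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = fetch(url)
        if status != RATE_LIMIT_STATUS:
            # Any other status is handed back to the caller unchanged.
            return status, body
        if attempt < max_attempts:
            # Rate-limited: wait before the next attempt.
            sleep(wait_seconds)
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts: {url}")
```

Passing `sleep` in as a parameter keeps the sketch testable without waiting an hour. Because the limit applies across all sites, the wait would have to pause every request from the same IP, not just requests to the site that returned the 420.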