Difference between revisions of "Template:CTA URL lists"
(Clarify regex type and add comment on case-insensitive matching)
Switchnode (move category link to hat position; tighten up prose)
<includeonly>== How to help if you have lists of URLs ==
: ''For other ArchiveTeam projects that can use this kind of help, see [[:Category:Projects requiring URL lists|Projects requiring URL lists]].''
This project requires lists of URLs for content on the target website. If you have a source of URLs, please:
{{ #if: {{{regex|}}} |
# Use the PCRE regular expression <code>{{{regex}}}</code> for filtering.{{ #if: {{{broad|}}} |
#* Note that this regex is intentionally broad to cover many different URL formats. Please do not try to use a more narrow pattern, as it may miss valid URLs. We can always filter or transform the results as needed later.}}
#* Enable case-insensitive matching (e.g. grep's <code>-i</code>) to catch URLs with capitalization.
#* If using grep or similar, enable text matching (<code>-a</code> or <code>--text</code>) to catch URLs in files with apparent binary data.
#* Example command (GNU grep): <code>grep -Pahoi '{{{regex}}}' FILENAME FILENAME...</code>}}
# If the {{ #if: {{{regex|}}} | output | list }} exceeds a few megabytes, please compress it, preferably using <code>zstd -10</code>.
# Upload the file to https://transfer.archivete.am/.
# Share the resulting URL in the project IRC channel.
#* If you wish your list to remain private, please get in touch with a channel op (e.g. [[User:Arkiver|arkiver]] or [[User:JustAnotherArchivist|JustAnotherArchivist]]). Items generated from your list will still be processed publicly, but they will be mixed in with all other items and channel logs will not associate them with you.{{ #if: {{{suppresscategory|}}} ||[[Category:Projects requiring URL lists]]}}</includeonly><noinclude>
Revision as of 00:06, 4 January 2024
Options:
- regex, required, the PCRE regular expression to use for filtering; it is wrapped in single quotes in the example grep command.
  - Technically, this parameter is not required: if it is omitted, the filtering steps are left out and the instructions refer to the contributor's list rather than to the filtered output.
- broad, optional; if non-empty, adds a note that the regex is intentionally broad.
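Because the regex value is pasted between single quotes in the example grep command, a pattern that itself contains a single quote would produce a broken command: inside single quotes a POSIX shell keeps every character literal except another single quote, which ends the quoted string. The Python sketch below is not part of the template, and the helper names example_grep_command and safe_for_template are hypothetical. It mimics the template's plain quoting and, for comparison, uses the standard library's shlex.quote.

```python
import shlex


def example_grep_command(regex: str, quote_safely: bool = False) -> str:
    """Build the example GNU grep command shown by the template.

    Hypothetical helper for illustration. With quote_safely=False it
    mimics the template's plain single-quote wrapping; with
    quote_safely=True it uses shlex.quote instead.
    """
    if quote_safely:
        quoted = shlex.quote(regex)
    else:
        # The template simply wraps the pattern in single quotes.
        quoted = "'" + regex + "'"
    return f"grep -Pahoi {quoted} FILENAME FILENAME..."


def safe_for_template(regex: str) -> bool:
    """Hypothetical check: True if plain single-quote wrapping is safe."""
    # Inside single quotes a POSIX shell keeps every character literal,
    # but a single quote would terminate the quoted string early.
    return "'" not in regex


if __name__ == "__main__":
    print(example_grep_command(r"\S*(foo|bar)\S*"))
    print(safe_for_template(r"\S*(foo|bar)\S*"))  # True
    print(safe_for_template(r"it's\S*"))  # False
    print(example_grep_command(r"it's\S*", quote_safely=True))
```

With the example pattern, the first line printed is the same grep command that appears in the rendered output of the template.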
Example:
{{CTA URL lists|regex = <nowiki>\S*(foo|bar)\S*</nowiki>|broad = yes}}
renders as:
How to help if you have lists of URLs
- For other ArchiveTeam projects that can use this kind of help, see Projects requiring URL lists.
This project requires lists of URLs for content on the target website. If you have a source of URLs, please:
- Use the PCRE regular expression \S*(foo|bar)\S* for filtering.
  - Note that this regex is intentionally broad to cover many different URL formats. Please do not try to use a more narrow pattern, as it may miss valid URLs. We can always filter or transform the results as needed later.
  - Enable case-insensitive matching (e.g. grep's -i) to catch URLs with capitalization.
  - If using grep or similar, enable text matching (-a or --text) to catch URLs in files with apparent binary data.
  - Example command (GNU grep): grep -Pahoi '\S*(foo|bar)\S*' FILENAME FILENAME...
- If the output exceeds a few megabytes, please compress it, preferably using zstd -10.
- Upload the file to https://transfer.archivete.am/.
- Share the resulting URL in the project IRC channel.
  - If you wish your list to remain private, please get in touch with a channel op (e.g. arkiver or JustAnotherArchivist). Items generated from your list will still be processed publicly, but they will be mixed in with all other items and channel logs will not associate them with you.
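For contributors without GNU grep, the filtering step in the rendered example can be approximated in Python. This is a rough sketch, not an ArchiveTeam tool: the script name extract_urls.py is hypothetical, and Python's re module is close to but not identical with PCRE, so unusual patterns may behave differently. Like grep -a it searches files byte by byte even if they look binary, like -i it ignores case (ASCII letters only here), like -o it outputs the whole match rather than a capture group, and like -h it prints no file names.

```python
import re
import sys


def extract_matches(pattern: str, paths: list[str]) -> list[bytes]:
    """Rough analogue of `grep -Pahoi PATTERN FILE...` (sketch only)."""
    # A bytes pattern searches raw file contents, similar to grep -a
    # treating binary-looking files as text. IGNORECASE mirrors -i,
    # but for bytes patterns it only folds ASCII letters.
    regex = re.compile(pattern.encode("utf-8"), re.IGNORECASE)
    found: list[bytes] = []
    for path in paths:
        with open(path, "rb") as f:
            for line in f:  # grep matches one line at a time
                if line.endswith(b"\n"):
                    line = line[:-1]
                for m in regex.finditer(line):
                    # group(0) is the whole match, which is what -o prints,
                    # not just the (foo|bar) capture group.
                    if m.group(0):  # -o does not print empty matches
                        found.append(m.group(0))
    return found


# usage: python3 extract_urls.py PATTERN FILE [FILE...]
if __name__ == "__main__" and len(sys.argv) >= 3:
    for match in extract_matches(sys.argv[1], sys.argv[2:]):
        # One match per line and no file names, like -h.
        sys.stdout.buffer.write(match + b"\n")
```

For example, python3 extract_urls.py '\S*(foo|bar)\S*' FILENAME > urls.txt writes one match per line, after which the file can be compressed and uploaded as described above.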