From Archiveteam
Revision as of 18:53, 30 December 2019 by PurpleSymphony (talk | contribs) (People: add devops scripts)

chromebot (a.k.a. crocoite) is an IRC bot that runs parallel to ArchiveBot. It uses Google Chrome and is therefore able to archive JavaScript-heavy and bottomless websites. WARCs are uploaded twice a day to the chromebot collection.

By default, the bot only grabs a single URL. It also supports recursion, but this is rather slow because every single page needs to be loaded and rendered by a browser. A dashboard is available for watching the progress of such jobs.


crocoite usage documentation on GitHub

You can call chromebot on the #archivebot IRC channel (on EFnet), which chromebot shares with ArchiveBot. The bot responds to its name both with and without a trailing colon (“chromebot” or “chromebot:”).

Command                                          Description
chromebot: a <url> -r <policy> -j <concurrency>  Archive <url> with <concurrency> processes according to recursion <policy>.
chromebot: s <uuid>                              Get job status for <uuid>.
chromebot: r <uuid>                              Revoke or abort running job with <uuid>.

Please note that the commands are case-sensitive.
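
The command syntax can be illustrated with a small formatting sketch. The helper names below are hypothetical and not part of crocoite. Treating -r and -j as optional is an assumption inferred from the bot grabbing a single URL by default.

```python
def build_archive_command(url, policy=None, concurrency=None, nick="chromebot"):
    """Format an "a" (archive) command line to paste into the IRC channel.

    Hypothetical helper, not part of crocoite. Leaving out -r/-j is an
    assumption based on the bot grabbing a single URL by default.
    """
    parts = [f"{nick}:", "a", url]
    if policy is not None:
        parts += ["-r", str(policy)]
    if concurrency is not None:
        parts += ["-j", str(concurrency)]
    return " ".join(parts)


def build_status_command(uuid, nick="chromebot"):
    """Format an "s" (status) command for the job with the given UUID."""
    return f"{nick}: s {uuid}"


def build_revoke_command(uuid, nick="chromebot"):
    """Format an "r" (revoke/abort) command for the job with the given UUID."""
    return f"{nick}: r {uuid}"
```

The command letters are kept lowercase exactly as listed, since the commands are case-sensitive.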

URL lists can be archived using recursion by passing the URL of the list as <url>, for example:

chromebot: a <url> -r 1 -j 4

chromebot will assume all lines starting with http(s):// are valid links. Note that the list itself must be returned by the server as an *inline* document, not as a download (attachment).
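
Before submitting a list, it may help to check locally which lines would count as links and whether the server serves the list inline. The following is a minimal sketch of those two checks under stated assumptions. It is not crocoite's actual implementation, and `extract_links` and `is_inline` are hypothetical names. It assumes the list is plain text and that the response headers are available as a simple mapping.

```python
import re

# A line counts as a link only if it starts with http:// or https://.
LINK_RE = re.compile(r"^https?://\S+")


def extract_links(text):
    """Return the URLs from all lines that start with http(s)://."""
    links = []
    for line in text.splitlines():
        match = LINK_RE.match(line)
        if match:
            links.append(match.group(0))
    return links


def is_inline(headers):
    """Return True unless Content-Disposition marks the response as an attachment.

    `headers` is assumed to be a plain dict of response headers; keys are
    compared case-insensitively.
    """
    lowered = {key.lower(): value for key, value in headers.items()}
    disposition = lowered.get("content-disposition", "")
    return not disposition.strip().lower().startswith("attachment")
```

Lines with leading whitespace or other text before the URL are skipped here, which is a strict reading of "starting with".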



chromebot has been blacklisted by Instagram. When trying to archive any website, chromebot responds with the following error:

<URL> cannot be queued: Banned by Instagram

Cloudflare DDoS protection

chromebot should be able to circumvent Cloudflare's DDoS protection, but scrolling and other behaviour may be disabled after the reload (issue #13 on GitHub).


PurpleSym maintains the software and scripts, pays the server bills, and has administrative access. katocala is a server administrator.