Difference between revisions of "Chromebot"

Revision as of 22:01, 26 April 2019

chromebot is an IRC bot parallel to ArchiveBot that uses Google Chrome and is therefore able to archive JavaScript-heavy websites. Both the software and the bot are maintained by User:PurpleSymphony. WARCs are uploaded daily to the chromebot collection on archive.org.

By default the bot grabs only a single URL. It also supports recursion, but recursive jobs are rather slow, since every page has to be loaded and rendered by a browser. A dashboard is available for watching the progress of such jobs.

Usage[1]

You can call chromebot in the #archivebot IRC channel (on hackint), which chromebot shares with its parent, ArchiveBot. The bot can be addressed as either “chromebot” or “chromebot:”, that is, with or without the colon. The username can be autocompleted with the “↹Tab” key in a web chat interface or IRC client.

Command	Description
`chromebot: a <url> -j <concurrency> -r <policy>` `chromebot a <url> -j <concurrency> -r <policy>`	Archive <url> with <concurrency> processes according to recursion <policy>.
`chromebot: s <uuid>` `chromebot s <uuid>`	Get job status for <uuid>.
`chromebot: r <uuid>` `chromebot r <uuid>`	Revoke or abort running job with <uuid>.

Please note that the commands are case-sensitive.
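To illustrate the addressing rules described above (an optional colon after the nick, case-sensitive command letters), here is a minimal Python sketch of how a script might recognise a message addressed to chromebot. It is an assumption-based illustration, not chromebot's own code; the function name `parse_command`, its return shape, and the job ID `1234` are hypothetical.

```python
# Hypothetical sketch, NOT chromebot's own code: it only mirrors the
# addressing rules described above (optional colon, case-sensitive commands).
from typing import List, Optional, Tuple

NICK = "chromebot"
COMMANDS = {"a", "s", "r"}  # archive, status, revoke


def parse_command(message: str) -> Optional[Tuple[str, List[str]]]:
    """Return (command, arguments) if `message` is addressed to chromebot.

    Returns None for messages aimed at someone else, for messages without
    a command, and for unknown or wrongly-cased command letters.
    """
    words = message.split()
    # The bot may be addressed as "chromebot" or "chromebot:".
    if len(words) < 2 or words[0] not in (NICK, NICK + ":"):
        return None
    command, args = words[1], words[2:]
    # Commands are case-sensitive: "S" is not the same as "s".
    if command not in COMMANDS:
        return None
    return command, args


if __name__ == "__main__":
    print(parse_command("chromebot: s 1234"))  # ('s', ['1234'])
    print(parse_command("chromebot S 1234"))   # None (wrong case)
```

Returning `None` rather than raising keeps the sketch close to how a bot would typically behave in a busy channel: messages that are not valid commands are simply ignored.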

Restrictions

Instagram.com

chromebot has been blacklisted by Instagram, a website notorious for being hard to archive.

When asked to archive any Instagram.com URL, chromebot responds with the following error:

<Instagram.com URL> cannot be queued: Banned by Instagram

Instagram's restrictions can be partially bypassed with Insta-Stalker.com, a third-party web viewer for Instagram. It offers an AJAX-free user search and can display profiles without going through Instagram's new web-app-style website (similar to Twitter Lite), which made Instagram inaccessible to the crawlers of the Wayback Machine and Archive.Today. On Instagram's new site, the Wayback Machine gets stuck in an infinite refresh loop.

URL format:

Search URL: https://insta-stalker.com/search/?q=Search+Term+here
User URL (example): https://insta-stalker.com/profile/SamsungMobile/
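The URL patterns above can be generated mechanically. The following Python sketch assumes that search terms are form-encoded (spaces become `+`, as in the example) and that usernames appear verbatim in the profile path; `search_url` and `profile_url` are hypothetical helper names, and the site's URL layout may have changed since this revision.

```python
# Hypothetical helpers: the base URL and path layout come from the
# "URL format" examples above; the function names are made up.
from urllib.parse import quote, quote_plus

BASE = "https://insta-stalker.com"


def search_url(term: str) -> str:
    """Build a search URL; quote_plus turns spaces into '+'."""
    return f"{BASE}/search/?q={quote_plus(term)}"


def profile_url(username: str) -> str:
    """Build a profile URL, keeping the trailing slash seen in the example."""
    return f"{BASE}/profile/{quote(username, safe='')}/"


if __name__ == "__main__":
    print(search_url("Search Term here"))
    # https://insta-stalker.com/search/?q=Search+Term+here
    print(profile_url("SamsungMobile"))
    # https://insta-stalker.com/profile/SamsungMobile/
```

A URL built this way could then be given as the <url> of an archive request.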

References

[1] ChromeBot usage documentation on GitHub: https://github.com/PromyLOPh/crocoite/blob/184189f0a535996edca01a68182ed07d32e26e9c/README.rst#IRC-bot

