Difference between revisions of "Mozilla Addons"

From Archiveteam
(Add ArchiveBot job IDs)
(Add details on what my archival covered exactly due to requests in #outofammo)
Line 35: Line 35:
* There were two (proper) attempts to archive AMO through [[ArchiveBot]]. {{Job|4aa66jgox1pg1gp6gxzkgthiq}} ran from 2017-08-29 until early December 2017, and {{Job|xew9sjj59osltx5oyjr6n9rg}} was started on 2018-07-29 and vanished sometime in August 2018.
* There were two (proper) attempts to archive AMO through [[ArchiveBot]]. {{Job|4aa66jgox1pg1gp6gxzkgthiq}} ran from 2017-08-29 until early December 2017, and {{Job|xew9sjj59osltx5oyjr6n9rg}} was started on 2018-07-29 and vanished sometime in August 2018.
* All addon files (both from AMO for Firefox/Firefox Android and from addons.thunderbird.net for Thunderbird/Seamonkey) were downloaded by [[User:JustAnotherArchivist]] between 2018-09-14 and 2018-09-16.
* All addon files (both from AMO for Firefox/Firefox Android and from addons.thunderbird.net for Thunderbird/Seamonkey) were downloaded by [[User:JustAnotherArchivist]] between 2018-09-14 and 2018-09-16.
* The amo-links-getter list linked above is being downloaded through [[ArchiveBot]] as {{Job|akifc65k7kfhpdhfbveh79v1c}} (started on 2018-09-30).
* The amo-links-getter list linked above was downloaded through [[ArchiveBot]] as {{Job|akifc65k7kfhpdhfbveh79v1c}} (started on 2018-09-30, finished on 2018-10-07).
* The old, "classic desktop" AMO website – minus downloads and <code>src</code> parameter variations, but including version history, reviews, and API data – is being grabbed by [[User:JustAnotherArchivist]] since 2018-09-30.
* The old, "classic desktop" AMO website – minus downloads and <code>src</code> parameter variations, but including version history, reviews, and API data – is being grabbed by [[User:JustAnotherArchivist]] since 2018-09-30 (see [[#JustAnotherArchivist.27s_website_grab|below]] for details).
* A warrior project for the website is in preparation.
* A warrior project for the website is in preparation.
=== JustAnotherArchivist's website grab ===
General notes:
* Any URL starting with <code>https://addons.mozilla.org/en-US/firefox/addon/ADDONID/</code> redirects to a URL using the slug instead. Only the <code>ADDONID</code> URLs are listed below for brevity, but of course the redirect target with the slug was also grabbed in all cases.
* For all API resources, both the v3 and the v4 versions were retrieved, but only the v3 URL is given below for brevity. Unless otherwise noted, you can simply replace <code>v3</code> with <code>v4</code> in those URLs to get the v4 URL.
For all addon IDs between 0 and 1009999 (largest existing ID as of 2018-10-13 is 1003947), these URLs are covered:
* addon detail API endpoint (<code>https://services.addons.mozilla.org/api/v3/addons/addon/ADDONID/</code>)
* addon page (<code>https://addons.mozilla.org/en-US/firefox/addon/ADDONID/</code>)
** This URL may redirect to addons.thunderbird.net for Thunderbird addons. In that case, all redirects on addons.mozilla.org are kept, but the addons.thunderbird.net page itself is not grabbed, and the addon is ignored.
** If this URL returns a 404 or another error (e.g. disabled addon), the addon is ignored.
* the "more" subpage which is loaded through JavaScript (<code>https://addons.mozilla.org/en-US/firefox/addon/ADDONID/more</code>, must be requested with the header <code>X-Requested-With: XMLHttpRequest</code>)
* the addon-specific images, i.e. icons (in both resolutions, 32x32 px and 64x64 px) and preview images (thumbnail and full resolution), extracted from both the page and the API response (just to be sure)
* addon detail API endpoint with the slug and/or the GUID instead of the addon ID if possible (i.e. if the slug and/or GUID could be determined)
* version history
** initial page (<code>https://addons.mozilla.org/en-US/firefox/addon/ADDONID/versions/</code>)
** pagination (<code>https://addons.mozilla.org/en-US/firefox/addon/ADDONID/versions/?page=N</code>; page=1 always retrieved even if there is no pagination)
** API endpoint (<code>https://services.addons.mozilla.org/api/v3/addons/addon/ADDONID/versions/</code> and <code>https://services.addons.mozilla.org/api/v3/addons/addon/ADDONID/versions/?page=1</code> + all following pages until the <code>next</code> field is empty/null)
* versions
** API endpoint for each version (<code>https://services.addons.mozilla.org/api/v3/addons/addon/ADDONID/versions/VERSIONID/</code>, where the version IDs were collected from the API history pagination)
** page redirect for each version (<code>https://addons.mozilla.org/en-US/firefox/addon/SLUG/versions/VERSIONSTRING</code>, collected during the pagination traversal on the website)
* reviews/ratings
** initial page + pagination as described above for the version history (<code>https://addons.mozilla.org/en-US/firefox/addon/ADDONID/reviews/[?page=N]</code>)
** API endpoint including further pages according to the <code>next</code> field (<code>https://services.addons.mozilla.org/api/v3/reviews/review/?addon=ADDONID</code> and <code>https://services.addons.mozilla.org/api/v4/ratings/rating/?addon=ADDONID</code>)
** API endpoint for each version of the addon + further pages according to <code>next</code> (<code>https://services.addons.mozilla.org/api/v3/reviews/review/?addon=ADDONID&version=VERSIONID</code>)
** individual review page (<code>https://addons.mozilla.org/en-US/firefox/addon/ADDONID/reviews/REVIEWID/</code>)
** individual review API endpoint (<code>https://services.addons.mozilla.org/api/v3/reviews/review/REVIEWID/</code> and <code>https://services.addons.mozilla.org/api/v4/ratings/rating/REVIEWID/</code>)
** page(s) for users who wrote multiple reviews for an addon (<code>https://addons.mozilla.org/en-US/firefox/addon/ADDONID/reviews/user:USERID</code>; also pagination with <code>?page=N</code> if available, though that doesn't seem to be the case anywhere)
* statistics
** page (<code>https://addons.mozilla.org/en-US/firefox/addon/ADDONID/statistics/</code>)
** data (<code>https://addons.mozilla.org/en-US/firefox/addon/SLUG/statistics/DATASET-day-YEAR0101-YEAR1231.json</code>)
*** Here, <code>DATASET</code> was each of <code>('overview', 'apps', 'locales', 'os', 'versions', 'statuses', 'sources', 'downloads')</code>, and <code>YEAR</code> started from 2018 and went back until the returned data was empty.
* any other subpage of the addon which is linked on the addon page and starts with <code>https://addons.mozilla.org/en-US/firefox/addon/ADDONID|SLUG/</code>, e.g. privacy policy
* feature compatibility API endpoint (<code>https://services.addons.mozilla.org/api/v3/addons/addon/ADDONID/feature_compatibility/</code>)
* EULA and privacy policy API endpoint (<code>https://services.addons.mozilla.org/api/v3/addons/addon/ADDONID/eula_policy/</code>)
Furthermore, during the relevant stages above (addon page, "more", addon detail API endpoint, and reviews pages and API endpoints), usernames were extracted, and the user profiles were afterwards retrieved as well:
* user profile page using the username (<code>https://addons.mozilla.org/en-US/firefox/user/USERNAME/</code>)
* if it can be found on that page, the same thing with the user ID (<code>https://addons.mozilla.org/en-US/firefox/user/USERID/</code>; the abuse report button is used for extracting the user ID)
* avatar if provided (somewhere under <code>https://addons.cdn.mozilla.net/user-media/userpics/</code>)
* pagination for reviews, if necessary (<code>https://addons.mozilla.org/en-US/firefox/user/USERNAME/?page=N</code> and <code>https://addons.mozilla.org/en-US/firefox/user/USERID/?page=N</code>)


== References ==
== References ==
<references/>
<references/>

Revision as of 22:39, 13 October 2018

Mozilla Addons
URL: https://addons.mozilla.org/
Status: Special case
Archiving status: Saved! (addon files); Upcoming... (website, warrior); In progress... (website, JAA)
Archiving type: Unknown
IRC channel: #outofammo (on hackint)
Project lead: User:Arkiver, User:JustAnotherArchivist

Mozilla Addons, also known as AMO (from its domain, addons.mozilla.org), is a website run by the Mozilla Foundation which hosts extensions and themes for Firefox, Thunderbird, and other Mozilla software.

Extensions used to be based on XPI until the introduction of WebExtensions around 2016. Since Firefox 57 and Thunderbird 58, only WebExtensions are supported. XPI-based addons (called "legacy") are deprecated but still supported until the end-of-life of Firefox 52 ESR in September 2018. The legacy addons will be removed from AMO in early October 2018[1][2].

Website structure

As of September 2018, there are two different versions of AMO: the old version, called "classic desktop" on the website, and a redesigned new site. The two mostly serve the same content; the most important difference is that the new site does not serve user profile pages for non-developers while the old site does. Switching between the two sites is controlled by a cookie called mamo (modern AMO?): when it is set to off, the old site is served; when it is set to on or is unset, the new site is served.
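The cookie-based switch can be sketched as follows. This is only an illustration, assuming the cookie is sent as a plain `Cookie` header; the helper name `amo_request` is hypothetical, and the request is built but not sent.

```python
from urllib.request import Request


def amo_request(url: str, classic: bool = True) -> Request:
    """Build (but do not send) a request that selects the old or the new AMO site."""
    # Cookie name "mamo" and the values "off"/"on" are taken from the text above.
    value = "off" if classic else "on"
    return Request(url, headers={"Cookie": f"mamo={value}"})


req = amo_request("https://addons.mozilla.org/en-US/firefox/")
print(req.get_header("Cookie"))  # mamo=off
```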

AMO uses numeric IDs and slugs for addon identification. (GUIDs are also used, but only in the API and internally in Firefox.) These IDs are shared with Thunderbird and Seamonkey addons, which used to be hosted on AMO but have since been moved to addons.thunderbird.net (which only exists in the "old" form; there is a "view the new site" link in the footer, but it doesn't have any effect as of 2018-09-30).

To track addon installations, AMO uses a src parameter everywhere on the site. There are at least 59 possible values for this parameter[3].

Addon download links have the general format https://addons.mozilla.org/firefox/downloads/file/$FILEID/$FILENAME?src=$SRC. Note that file IDs are separate from addon and version IDs. The filename typically contains the slug and a version identifier. When AMO detects that you're using a version of Firefox that is incompatible with an addon, it displays a "download anyway" link, which additionally contains a type:attachment path segment between the file ID and the filename (i.e. .../file/$FILEID/type:attachment/$FILENAME...). All download URLs redirect to a CDN at addons.cdn.mozilla.net; the type:attachment segment is also reflected in that CDN URL as _attachments (which then inserts a Content-Disposition header); the src parameter is not included in the redirect target.
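As a rough sketch, the download URL format described above could be assembled like this. The function name, the file ID, the filename, and the `src` value used in the example call are all made up for illustration; only the URL layout comes from the text.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE = "https://addons.mozilla.org/firefox/downloads/file"


def download_url(file_id: int, filename: str,
                 src: Optional[str] = None, attachment: bool = False) -> str:
    """Assemble an AMO download URL in the format given in the text."""
    parts = [BASE, str(file_id)]
    if attachment:
        # The "download anyway" variant inserts this extra path segment.
        parts.append("type:attachment")
    parts.append(quote(filename))
    url = "/".join(parts)
    if src is not None:
        # Tracking parameter; it is dropped again by the redirect to the CDN.
        url += "?" + urlencode({"src": src})
    return url


# Hypothetical file ID, filename and src value.
print(download_url(12345, "example-1.0.xpi", src="search", attachment=True))
```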

Besides the actual addon files, AMO also hosts preview screenshots, reviews, version history (including changelogs), statistics, and in some cases additional pages (e.g. privacy policy) for each addon. The review page only displays the most recent review of any particular user, and one needs to follow an extra link to discover a user's earlier reviews for the same addon.

Note that AMO does not only host extensions but also themes. These consist simply of a JSON object which provides the URLs for the relevant images and some additional settings (e.g. text colour), i.e. there is no real download for them.

The AMO API versions 3 and 4 are documented here and here, respectively.

Utilities

  • amo-links-getter: Both Wget and the Warrior are ineffective at downloading the site completely; in addition, many redundant links are not recognised as redirects, so the same content gets downloaded several times. amo-links-getter is a set of scripts that stores all links in a SQLite database so that they can be downloaded later.
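The idea of collecting links in SQLite first and downloading later can be illustrated with a small sketch. The schema and function names below are assumptions made for this example and are not taken from amo-links-getter itself.

```python
import sqlite3


def open_link_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a link store; the table layout is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links ("
        "url TEXT PRIMARY KEY, fetched INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def add_links(conn: sqlite3.Connection, urls) -> int:
    """Insert URLs, silently skipping ones already stored; return how many were new."""
    before = conn.total_changes
    conn.executemany("INSERT OR IGNORE INTO links (url) VALUES (?)",
                     ((u,) for u in urls))
    conn.commit()
    return conn.total_changes - before


def pending_links(conn: sqlite3.Connection) -> list:
    """Return the URLs that have not been downloaded yet."""
    rows = conn.execute("SELECT url FROM links WHERE fetched = 0 ORDER BY url")
    return [row[0] for row in rows]
```

Using the URL as the primary key means the same link discovered on several pages is only queued once, which addresses the duplicate-download problem mentioned above.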

Archival

  • There were two (proper) attempts to archive AMO through ArchiveBot. job:4aa66jgox1pg1gp6gxzkgthiq ran from 2017-08-29 until early December 2017, and job:xew9sjj59osltx5oyjr6n9rg was started on 2018-07-29 and vanished sometime in August 2018.
  • All addon files (both from AMO for Firefox/Firefox Android and from addons.thunderbird.net for Thunderbird/Seamonkey) were downloaded by User:JustAnotherArchivist between 2018-09-14 and 2018-09-16.
  • The amo-links-getter list linked above was downloaded through ArchiveBot as job:akifc65k7kfhpdhfbveh79v1c (started on 2018-09-30, finished on 2018-10-07).
  • User:JustAnotherArchivist has been grabbing the old, "classic desktop" AMO website – minus downloads and src parameter variations, but including version history, reviews, and API data – since 2018-09-30 (see below for details).
  • A warrior project for the website is in preparation.

JustAnotherArchivist's website grab

General notes:

  • Any URL starting with https://addons.mozilla.org/en-US/firefox/addon/ADDONID/ redirects to a URL using the slug instead. Only the ADDONID URLs are listed below for brevity, but of course the redirect target with the slug was also grabbed in all cases.
  • For all API resources, both the v3 and the v4 versions were retrieved, but only the v3 URL is given below for brevity. Unless otherwise noted, you can simply replace v3 with v4 in those URLs to get the v4 URL.
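A minimal sketch of the v3-to-v4 rewrite follows. The helper name `to_v4` is hypothetical. The review endpoints are the exception recorded for this grab (v3 `reviews/review` corresponds to v4 `ratings/rating`); applying that same renaming to every review URL, including per-version queries, is an assumption of this example.

```python
def to_v4(v3_url: str) -> str:
    """Derive the v4 API URL from a v3 API URL."""
    # General rule from the notes above: swap the version segment.
    url = v3_url.replace("/api/v3/", "/api/v4/", 1)
    # Exception noted for reviews: v4 calls them ratings.
    return url.replace("/reviews/review/", "/ratings/rating/", 1)


print(to_v4("https://services.addons.mozilla.org/api/v3/addons/addon/ADDONID/"))
print(to_v4("https://services.addons.mozilla.org/api/v3/reviews/review/?addon=ADDONID"))
```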

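For all addon IDs between 0 and 1009999 (the largest existing ID as of 2018-10-13 is 1003947), these URLs are covered:

  • addon detail API endpoint (https://services.addons.mozilla.org/api/v3/addons/addon/ADDONID/)
  • addon page (https://addons.mozilla.org/en-US/firefox/addon/ADDONID/)
      • This URL may redirect to addons.thunderbird.net for Thunderbird addons. In that case, all redirects on addons.mozilla.org are kept, but the addons.thunderbird.net page itself is not grabbed, and the addon is ignored.
      • If this URL returns a 404 or another error (e.g. disabled addon), the addon is ignored.
  • the "more" subpage which is loaded through JavaScript (https://addons.mozilla.org/en-US/firefox/addon/ADDONID/more, must be requested with the header X-Requested-With: XMLHttpRequest)
  • the addon-specific images, i.e. icons (in both resolutions, 32x32 px and 64x64 px) and preview images (thumbnail and full resolution), extracted from both the page and the API response (just to be sure)
  • addon detail API endpoint with the slug and/or the GUID instead of the addon ID if possible (i.e. if the slug and/or GUID could be determined)
  • version history
      • initial page (https://addons.mozilla.org/en-US/firefox/addon/ADDONID/versions/)
      • pagination (https://addons.mozilla.org/en-US/firefox/addon/ADDONID/versions/?page=N; page=1 always retrieved even if there is no pagination)
      • API endpoint (https://services.addons.mozilla.org/api/v3/addons/addon/ADDONID/versions/ and https://services.addons.mozilla.org/api/v3/addons/addon/ADDONID/versions/?page=1 + all following pages until the next field is empty/null)
  • versions
      • API endpoint for each version (https://services.addons.mozilla.org/api/v3/addons/addon/ADDONID/versions/VERSIONID/, where the version IDs were collected from the API history pagination)
      • page redirect for each version (https://addons.mozilla.org/en-US/firefox/addon/SLUG/versions/VERSIONSTRING, collected during the pagination traversal on the website)
  • reviews/ratings
      • initial page + pagination as described above for the version history (https://addons.mozilla.org/en-US/firefox/addon/ADDONID/reviews/[?page=N])
      • API endpoint including further pages according to the next field (https://services.addons.mozilla.org/api/v3/reviews/review/?addon=ADDONID and https://services.addons.mozilla.org/api/v4/ratings/rating/?addon=ADDONID)
      • API endpoint for each version of the addon + further pages according to next (https://services.addons.mozilla.org/api/v3/reviews/review/?addon=ADDONID&version=VERSIONID)
      • individual review page (https://addons.mozilla.org/en-US/firefox/addon/ADDONID/reviews/REVIEWID/)
      • individual review API endpoint (https://services.addons.mozilla.org/api/v3/reviews/review/REVIEWID/ and https://services.addons.mozilla.org/api/v4/ratings/rating/REVIEWID/)
      • page(s) for users who wrote multiple reviews for an addon (https://addons.mozilla.org/en-US/firefox/addon/ADDONID/reviews/user:USERID; also pagination with ?page=N if available, though that doesn't seem to be the case anywhere)
  • statistics
      • page (https://addons.mozilla.org/en-US/firefox/addon/ADDONID/statistics/)
      • data (https://addons.mozilla.org/en-US/firefox/addon/SLUG/statistics/DATASET-day-YEAR0101-YEAR1231.json)
          • Here, DATASET was each of ('overview', 'apps', 'locales', 'os', 'versions', 'statuses', 'sources', 'downloads'), and YEAR started from 2018 and went back until the returned data was empty.
  • any other subpage of the addon which is linked on the addon page and starts with https://addons.mozilla.org/en-US/firefox/addon/ADDONID|SLUG/, e.g. privacy policy
  • feature compatibility API endpoint (https://services.addons.mozilla.org/api/v3/addons/addon/ADDONID/feature_compatibility/)
  • EULA and privacy policy API endpoint (https://services.addons.mozilla.org/api/v3/addons/addon/ADDONID/eula_policy/)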
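The per-addon coverage can be approximated in code. The sketch below is not the script used for the grab: the function names are hypothetical, only the URLs that can be derived from the addon ID alone are generated, and `fetch_json` is a caller-supplied stand-in for whatever HTTP client is used.

```python
from typing import Callable, Iterator, List, Optional

SITE = "https://addons.mozilla.org/en-US/firefox/addon"
API = "https://services.addons.mozilla.org/api/v3"
DATASETS = ("overview", "apps", "locales", "os",
            "versions", "statuses", "sources", "downloads")


def addon_seed_urls(addon_id: int) -> List[str]:
    """URLs that depend only on the numeric addon ID (v3 API variants only)."""
    return [
        f"{API}/addons/addon/{addon_id}/",
        f"{SITE}/{addon_id}/",
        f"{SITE}/{addon_id}/more",  # needs X-Requested-With: XMLHttpRequest
        f"{SITE}/{addon_id}/versions/",
        f"{SITE}/{addon_id}/versions/?page=1",
        f"{API}/addons/addon/{addon_id}/versions/",
        f"{API}/addons/addon/{addon_id}/versions/?page=1",
        f"{SITE}/{addon_id}/reviews/",
        f"{API}/reviews/review/?addon={addon_id}",
        f"{SITE}/{addon_id}/statistics/",
        f"{API}/addons/addon/{addon_id}/feature_compatibility/",
        f"{API}/addons/addon/{addon_id}/eula_policy/",
    ]


def statistics_urls(slug: str, year: int) -> List[str]:
    """One JSON URL per dataset, covering a full calendar year."""
    return [f"{SITE}/{slug}/statistics/{d}-day-{year}0101-{year}1231.json"
            for d in DATASETS]


def follow_next(first_url: str,
                fetch_json: Callable[[str], dict]) -> Iterator[dict]:
    """Yield API pages, following the "next" field until it is empty or null."""
    url: Optional[str] = first_url
    while url:
        page = fetch_json(url)
        yield page
        url = page.get("next")
```

Passing the fetcher in as a parameter keeps the pagination logic separate from networking, so it can be exercised with canned responses.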

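Furthermore, during the relevant stages above (addon page, "more", addon detail API endpoint, and reviews pages and API endpoints), usernames were extracted, and the user profiles were afterwards retrieved as well:

  • user profile page using the username (https://addons.mozilla.org/en-US/firefox/user/USERNAME/)
  • if it can be found on that page, the same thing with the user ID (https://addons.mozilla.org/en-US/firefox/user/USERID/; the abuse report button is used for extracting the user ID)
  • avatar if provided (somewhere under https://addons.cdn.mozilla.net/user-media/userpics/)
  • pagination for reviews, if necessary (https://addons.mozilla.org/en-US/firefox/user/USERNAME/?page=N and https://addons.mozilla.org/en-US/firefox/user/USERID/?page=N)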
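The user profile URLs can be sketched in the same way. The function name is hypothetical, and two details are assumptions of this example: usernames are percent-encoded, and review pagination starts at page 2 because the unpaginated profile URL serves as the first page.

```python
from typing import List, Optional
from urllib.parse import quote

USER_BASE = "https://addons.mozilla.org/en-US/firefox/user"


def user_profile_urls(username: str, user_id: Optional[int] = None,
                      review_pages: int = 1) -> List[str]:
    """Profile URLs by username and, if known, by user ID, plus review pages."""
    keys = [quote(username, safe="")]
    if user_id is not None:
        # The ID is only known if it could be extracted from the profile page.
        keys.append(str(user_id))
    urls: List[str] = []
    for key in keys:
        urls.append(f"{USER_BASE}/{key}/")
        for n in range(2, review_pages + 1):
            urls.append(f"{USER_BASE}/{key}/?page={n}")
    return urls


# Hypothetical username and user ID.
for url in user_profile_urls("example-user", user_id=42, review_pages=2):
    print(url)
```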

References