Talk:Mozilla Addons

From Archiveteam
Revision as of 03:26, 30 September 2018 by Swicher (talk | contribs) (Add parameters to WARC download in Wget command)

A few days ago I mentioned in the chat some scripts I was working on to download Mozilla extensions and their related pages. I have uploaded the code to https://github.com/aaferrari/amo-links-getter for anyone interested in using it or integrating it with the Warrior.

After several days of running, the scripts produced a list of more than a million links to download (the list is here). I do not think I can download everything before Mozilla deactivates or deletes the classic extensions, so I would like more people to help with this.

To download the links you can use the following command:

wget --header "Cookie: mamo=off" -k -x -e robots=off -H -o messages.txt -nc -i "mozilla addons url list.txt" --mirror --warc-file="addons.mozilla.org"

Explanation of some parameters:

  • --header: Sends the cookie mamo=off so that pages are served with the classic design. This is optional, but it makes parsing easier (if needed), and I am not sure the pages render correctly once downloaded with the new style.
  • -k: Converts links in the downloaded HTML or CSS so that they point to local files.
  • -x: Forces the creation of a directory hierarchy for the downloaded files.
  • -H: Allows the recursive download to span other hosts, so that external resources of a page (such as CSS files or images) can be fetched.
  • -o: Writes the program's log messages to a file instead of the terminal.
  • -nc: Avoids overwriting files that have already been downloaded.
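
Since the list is too long for one person, one possible way to share the work is to split it into smaller files that different people can each pass to -i. The following is only a sketch under assumptions: the function name split_url_list, the default chunk size of 100000 lines, and the "chunk-" prefix are all hypothetical and not part of the scripts in the repository.

```shell
# Hypothetical helper: split a URL list into fixed-size chunk files.
#   $1 = path to the URL list (one URL per line)
#   $2 = lines per chunk (assumed default: 100000)
#   $3 = output prefix (assumed default: "chunk-")
split_url_list() {
    list="$1"
    lines="${2:-100000}"
    prefix="${3:-chunk-}"
    # POSIX split names the pieces <prefix>aa, <prefix>ab, ...
    split -l "$lines" "$list" "$prefix"
}

# Example (not run here):
#   split_url_list "mozilla addons url list.txt" 100000 "chunk-"
# Each resulting file can then replace the full list after -i,
# ideally with a different --warc-file name per chunk.
```

Each person would then run the wget command above against a different chunk, so no two people download the same links.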

--Swicher (talk) 00:34, 30 September 2018 (UTC)