Talk:Mozilla Addons
Latest revision as of 03:26, 30 September 2018
A few days ago I mentioned in the chat some scripts I was working on to download Mozilla extensions and their related pages. I uploaded the code to https://github.com/aaferrari/amo-links-getter for anyone interested in using it or integrating it with the Warrior.
After several days of running, I got a list of more than a million links to download (the list is here), but I do not think I can download everything before Mozilla deactivates/deletes the classic extensions, so I would like more people to help with this.
To download the links you can use the following command:
wget --header "Cookie: mamo=off" -k -x -e robots=off -H -o messages.txt -nc -i "mozilla addons url list.txt" --mirror --warc-file="addons.mozilla.org"
Explanation of some parameters:
- --header: Sends the cookie mamo=off so that the pages are downloaded with the classic design. This is optional, but it makes parsing easier (if needed), and I am not sure that pages downloaded with the new design render correctly offline.
- -k: Makes links in the downloaded HTML or CSS point to local files.
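- -x: Forces the creation of a directory hierarchy (one directory per host and path), even where wget would not create one otherwise.
- -H: Enables spanning hosts, so that resources of a page hosted on other domains (such as CSS files or images) are also downloaded.
- -o: Saves wget's log output to a file (here, messages.txt).
- -nc: Avoids overwriting files that have already been downloaded.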
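For readability, the same command can be written as a Bash script with one option per line. This is only a sketch, not the script from the repository above. It keeps exactly the same flags and also comments the options not covered in the list (-e robots=off, -i, --mirror, --warc-file). As an added safety assumption, it only prints the command unless it is called with --run, a hypothetical argument of this script (not a wget option).

```shell
#!/usr/bin/env bash
# Sketch: the wget command above, one option per line with comments.
# "--run" is a hypothetical argument of this script, not a wget option.
set -euo pipefail

url_list="mozilla addons url list.txt"  # the list of links to download

args=(
  --header "Cookie: mamo=off"        # request the classic page design
  -k                                 # convert links in HTML/CSS to point to local files
  -x                                 # force the creation of directories
  -e robots=off                      # ignore robots.txt
  -H                                 # span hosts to fetch resources on other domains
  -o messages.txt                    # write wget's log to messages.txt
  -nc                                # do not overwrite files already downloaded
  -i "$url_list"                     # read the URLs to fetch from the list file
  --mirror                           # same as -r -N -l inf --no-remove-listing
  --warc-file="addons.mozilla.org"   # also write addons.mozilla.org.warc.gz
)

if [[ "${1:-}" == "--run" ]]; then
  wget "${args[@]}"
else
  # Default: only print the fully quoted command so it can be checked first.
  printf '%q ' wget "${args[@]}"
  printf '\n'
fi
```

Saved as, for example, download.sh (a hypothetical name), it would be started with `bash download.sh --run` from the directory that contains the URL list. Keeping the options in a Bash array preserves the spaces in the cookie header and in the list file name without extra quoting.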