Difference between revisions of "Talk:Mozilla Addons"

From Archiveteam
(Add new message)
 
(Add parameters to WARC download in Wget command)
 
Line 5:

  To download the links you can use the following command:

- <code>wget -c --header "Cookie: mamo=off" -k -x -e robots=off -H -o messages.txt -nc -i "mozilla addons url list.txt"</code>
+ <code>wget --header "Cookie: mamo=off" -k -x -e robots=off -H -o messages.txt -nc -i "mozilla addons url list.txt" --mirror --warc-file="addons.mozilla.org</code>

  Explanation of some parameters:
- * -c: To resume the download.
  * --header: It allows to download the pages with the classic design. This is optional but it facilitates the parsing (if necessary) and I am not sure that the pages render correctly once they are downloaded with the new style.
  * -k: Makes links in the downloaded HTML or CSS point to local files.

Latest revision as of 03:26, 30 September 2018

A few days ago I mentioned in the chat some scripts I was working on to download Mozilla extensions and their related pages. I uploaded the code to https://github.com/aaferrari/amo-links-getter for anyone interested in using it or integrating it with the Warrior.

After running the scripts for several days I ended up with a list of more than a million links to download (the list is here), but I do not think I can download everything before Mozilla deactivates or deletes the classic extensions, so I would like more people to help with this.

To download the links you can use the following command:

wget --header "Cookie: mamo=off" -k -x -e robots=off -H -o messages.txt -nc -i "mozilla addons url list.txt" --mirror --warc-file="addons.mozilla.org"

Explanation of some parameters:

  • --header: Sends a cookie so that the pages are downloaded with the classic design. This is optional, but it makes parsing easier (if needed), and I am not sure that pages downloaded with the new style render correctly.
  • -k: Makes links in the downloaded HTML or CSS point to local files.
  • -x: Forces the creation of a directory hierarchy for the downloaded files.
  • -H: Allows the download to span other hosts, so that resources of a page hosted on external domains (such as CSS files or images) can be fetched.
  • -o: Saves wget's log messages to a file (here messages.txt).
  • -nc: Avoids overwriting files that have already been downloaded.
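
For readers who find the short flags hard to scan, here is a minimal sketch of the same command written with wget's long option names. It only rebuilds and prints the argument list, it does not download anything. The `url_list` variable is a name introduced for this example; everything else is taken from the command above.

```shell
#!/bin/sh
# Sketch: rebuild the argument list with long option names (prints only, no download).
url_list="mozilla addons url list.txt"   # list file name taken from the command above

set --                                        # start with an empty argument list
set -- "$@" --header "Cookie: mamo=off"       # request the classic page design
set -- "$@" --convert-links                   # -k
set -- "$@" --force-directories               # -x
set -- "$@" --execute robots=off              # -e robots=off
set -- "$@" --span-hosts                      # -H
set -- "$@" --output-file messages.txt        # -o
set -- "$@" --no-clobber                      # -nc
set -- "$@" --input-file "$url_list"          # -i
set -- "$@" --mirror                          # shorthand for -r -N -l inf --no-remove-listing
set -- "$@" --warc-file addons.mozilla.org    # also write a WARC archive with this name prefix

# Print one argument per line so the quoting can be reviewed
printf '%s\n' wget "$@"
```

To actually run the download, the last line could be replaced with `wget "$@"`. As far as I can tell from the wget manual, -nc is not meant to be combined with -N, which --mirror turns on, so it may be worth checking messages.txt for warnings after the first run.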

--Swicher (talk) 00:34, 30 September 2018 (UTC)