PANDA
PANDA is a tool for dumping a website from a sitemap, a URL shortener, or any website where content is ordered (alpha)numerically.
PANDA itself is divided into three executables:
- PANDA-DL, which dumps every URL listed in a text file; the URLs must be separated by Unix newlines ("\n").
- PANDA-SH (incomplete), which dumps URL shortening websites and other websites where URLs are numerically (or alphabetically) ordered (for example, "00000" to "zzzzz").
- PANDA-SP, which parses a sitemap and outputs a list of URLs usable by PANDA-DL.
The first two are only compatible with Linux and (maybe) macOS; if you want to run them on Windows, you'll have to use Cygwin or WSL.
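The ordered enumeration that PANDA-SH targets can be illustrated with a short Python sketch. This is not PANDA-SH's own code: the alphabet (digits followed by lowercase letters), the function names, and the base URL are assumptions chosen to match the "00000" to "zzzzz" example.

```python
import itertools
import string

# Assumed alphabet: digits, then lowercase letters, so IDs run "00000" to "zzzzz".
ALPHABET = string.digits + string.ascii_lowercase


def ordered_ids(length, alphabet=ALPHABET):
    """Yield every fixed-length ID in order: '00000', '00001', ..., 'zzzzz'."""
    for chars in itertools.product(alphabet, repeat=length):
        yield "".join(chars)


def candidate_urls(base_url, length):
    """Yield one candidate URL per ID; base_url is a hypothetical shortener prefix."""
    for short_id in ordered_ids(length):
        yield base_url + short_id
```

Writing the generated URLs to a file, one per line, would give a list in the format PANDA-DL expects.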
Usage
PANDA-DL
To use PANDA-DL you must specify a file containing full URLs and (optionally) a number representing how many URLs to process at the same time.
Once started, the program dumps the URLs in parallel, 16 at a time by default.
When dumping a URL, PANDA-DL uses the "wget" command and downloads both a WARC file and the regular file.
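The behaviour described above can be approximated in Python as follows. This is a rough sketch, not PANDA-DL's implementation: the exact wget options PANDA-DL passes are not documented here, so the sketch assumes wget's `--warc-file` option, a hypothetical `dump-<index>` naming scheme for the WARC files, and a thread pool to keep at most 16 downloads running at once.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def read_urls(path):
    """Return the non-empty lines of a newline-separated URL list."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def wget_command(url, index):
    """Build a wget call that saves the regular file and a WARC archive.

    The "dump-<index>" WARC name is a hypothetical choice for this sketch.
    """
    return ["wget", "--warc-file=dump-%06d" % index, url]


def dump_all(urls, workers=16):
    """Run at most `workers` wget processes at the same time."""
    commands = [wget_command(url, i) for i, url in enumerate(urls)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # subprocess.call returns the exit status of each wget process.
        return list(pool.map(subprocess.call, commands))
```

Calling `dump_all(read_urls("urls.txt"), workers=8)` would process a hypothetical `urls.txt` eight URLs at a time.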
Bugs
As of Version A, PANDA-DL supports at most 1,000,000 lines per input file.
PANDA-SP
PANDA-SP is written in Python; to use it, specify one or more XML Sitemap files, and PANDA-SP will automatically print out all URLs in them.
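As an illustration of this kind of sitemap parsing, the sketch below extracts every `<loc>` entry with Python's standard `xml.etree.ElementTree` module. It is not PANDA-SP's actual source: the function names are assumptions, and it only handles sitemaps that use the standard sitemaps.org 0.9 namespace.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org 0.9 protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_data):
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(xml_data)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def print_urls(paths):
    """Print the URLs of each sitemap file, one per line."""
    for path in paths:
        # Read as bytes so the XML parser handles the declared encoding itself.
        with open(path, "rb") as handle:
            for url in sitemap_urls(handle.read()):
                print(url)
```

Redirecting the printed output to a file would produce a newline-separated list that PANDA-DL can read.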