Difference between revisions of "Software"
Jump to navigation
Jump to search
(→General Tools: Added WiLiSe) |
|||
Line 12: | Line 12: | ||
* [http://scrapy.org/ Scrapy] - Fast python library for web scraping | * [http://scrapy.org/ Scrapy] - Fast python library for web scraping | ||
* [http://splinter.cobrateam.info/ Splinter] - Web app acceptance testing library for Python -- could be used along with a scraping lib to extract data from hard-to-reach places | * [http://splinter.cobrateam.info/ Splinter] - Web app acceptance testing library for Python -- could be used along with a scraping lib to extract data from hard-to-reach places | ||
* [http://sourceforge.net/projects/wilise/ WiLiSe] '''Wi'''ki'''Li'''nk '''Se'''arch - Python script to get links to specific pages of a site through the search in a Wiki ([[wikipedia:MediaWiki|MediaWiki]]-type) has the [http://www.mediawiki.org/wiki/Api.php api.php] accessible or [http://www.mediawiki.org/wiki/Extension:LinkSearch extension LinkSearch] enabled (the project is still very immature and at the moment the code is only available in [http://sourceforge.net/p/wilise/code/1/tree/code/trunk/ this SVN repository]). | |||
== Hosted tools == | == Hosted tools == |
Revision as of 06:35, 27 July 2011
General Tools
- GNU WGET
- Backing up a Wordpress site: "wget --no-parent --no-clobber --html-extension --recursive --convert-links --page-requisites --user=<username> --password=<password> <path>"
- cURL
- HTTrack - HTTrack options
- Heritrix -- what archive.org use
- Pavuk -- a bit flaky, but very flexible
- http://warrick.cs.odu.edu/warrick.html
- Beautiful Soup - Python library for web scraping
- Scrapy - Fast python library for web scraping
- Splinter - Web app acceptance testing library for Python -- could be used along with a scraping lib to extract data from hard-to-reach places
- WiLiSe WikiLink Search - Python script to get links to specific pages of a site through the search in a Wiki (MediaWiki-type) has the api.php accessible or extension LinkSearch enabled (the project is still very immature and at the moment the code is only available in this SVN repository).
Hosted tools
Pinboard is a convenient social bookmarking service that will archive copies of all your bookmarks for online viewing. The catch is that it costs $9.25 just to join, plus $25/year for the archival feature and you can only download archives of your 25 most recent bookmarks in a particular category. This may pose problems if you ever need to get your data out in a hurry.