Software
Latest revision as of 21:59, 21 November 2021
WARC Tools
The WARC Ecosystem has information on tools to create, read and process WARC files.
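As a quick way to see what a WARC file contains, the following shell sketch lists the target URIs recorded in a gzip-compressed WARC. It assumes each record carries a `WARC-Target-URI` header line and that a plain text scan is good enough for a first look. The function name `warc_list_uris` is made up for illustration, and dedicated WARC tools will parse records more robustly.

```shell
#!/bin/sh
# List the distinct target URIs recorded in a gzip-compressed WARC file.
# warc_list_uris is a hypothetical helper name, not part of any WARC tool.
warc_list_uris() {
    # gzip -dc also handles WARCs stored as concatenated gzip members.
    # grep -a forces text mode, since record payloads may be binary.
    # tr strips the CR of the CRLF line ending and the angle brackets
    # that some writers put around the URI.
    gzip -dc "$1" \
        | grep -a '^WARC-Target-URI:' \
        | sed 's/^WARC-Target-URI:[[:space:]]*//' \
        | tr -d '\r<>' \
        | sort -u
}
```

Usage: `warc_list_uris example.warc.gz`. A payload line that happens to start with `WARC-Target-URI:` would be reported as well, so treat the output as a rough inventory rather than an exact index.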
General Tools
- GNU Wget
- Backing up a WordPress site:
wget --no-parent --no-clobber --html-extension --recursive --convert-links --page-requisites --user=<username> --password=<password> <path>
- cURL
- HTTrack - HTTrack options
- Pavuk -- a bit flaky, but very flexible
- Warrick - Tool to recover lost websites using various online archives and caches.
- Beautiful Soup - Python library for web scraping
- Scrapy - Fast Python library for web scraping
- snscrape - Tool to scrape social networking services.
- Splinter - Web app acceptance testing library for Python -- could be used along with a scraping lib to extract data from hard-to-reach places
- WiLiSe WikiLink Search - Python script that finds links to specific pages of a site by searching a MediaWiki-type wiki that has api.php accessible or the LinkSearch extension enabled (the project is still very immature, and at the moment the code is only available in this SVN repository).
- Mobile Phone Applications -- some notes on preserving old versions of mobile apps
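The Wget command for backing up a WordPress site, listed under GNU Wget above, can be wrapped in a small shell function so that the credentials and target path are passed as arguments. This is only a sketch: the function name `backup_site` and the `DRY_RUN` variable are illustrative assumptions, not part of Wget, and the flags are copied unchanged from that one-line command.

```shell
#!/bin/sh
# Sketch: wrap the Wget backup command in a function.
# backup_site and DRY_RUN are illustrative names, not part of Wget.
backup_site() {
    user=$1
    pass=$2
    url=$3
    # Same flags as the one-line command in the list above.
    # (--html-extension is the older name of --adjust-extension.)
    set -- wget --no-parent --no-clobber --html-extension \
        --recursive --convert-links --page-requisites \
        "--user=$user" "--password=$pass" "$url"
    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command instead of running it.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

Running `DRY_RUN=1 backup_site <username> <password> <path>` prints the command without contacting the site, which is a cheap way to check the arguments first. Note that a password given on the command line may be visible to other users on the same machine.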
Hosted tools
- Pinboard is a convenient social bookmarking service that will archive copies of all your bookmarks for online viewing. The catch is that it costs $22/year, or $39/year if you want the archival feature, and you can only download archives of your 25 most recent bookmarks in a particular category. This may pose problems if you ever need to get your data out in a hurry.
- freeyourstuff.cc -- Extensible open-source (source) Chrome plugin allowing users to export their own content (reviews, posts, etc.). Exports to JSON format and can optionally publish to freeyourstuff.cc and its mirrors under the Creative Commons CC0 license. Supports Yelp, IMDB, TripAdvisor, Amazon, GoodReads, and Quora as of July 2019.
Site-Specific
- Livejournal
- SomaFM
- https://www.allmytweets.net/ - Download the last 3,200 tweets from any user.
Format Specific
- OmniFlop
- ZIM it (ZIM format for Kiwix)
Proposed
- The Solid project attempts to make data portability a reality
- The Data Transfer Project is a (promise of a) quick implementation of GDPR data portability by the GAFA + Twitter
Web scraping
- See Site exploration