Difference between revisions of "Blogger"

From Archiveteam
Jump to navigation Jump to search
Line 14: Line 14:


Find as many http://foobar.blogspot.com domains as possible and download them. Blogs often link to other blogs, which will help, so each individual blog saved will help discover others. Also a small-scale crawl of Blogger profiles (e.g. http://www.blogger.com/profile/{random number up to 35217655}) will provide links to blogs authored by each user (e..g https://www.blogger.com/profile/5618947 links to http://hintergedanke.blogspot.com/)
Find as many http://foobar.blogspot.com domains as possible and download them. Blogs often link to other blogs, which will help, so each individual blog saved will help discover others. Also a small-scale crawl of Blogger profiles (e.g. http://www.blogger.com/profile/{random number up to 35217655}) will provide links to blogs authored by each user (e..g https://www.blogger.com/profile/5618947 links to http://hintergedanke.blogspot.com/)
== Country Redirect ==
Accessing http://whatever.blogspot.com will usually redirect to a country-specific subdomain depending on your IP address (e.g. whatever.blogspot.co.uk, whatever.blogspot.in, etc) - this can be bypassed by requesting http://whatever.blogspot.com/ncr as the root URL.


== Downloading a single blog with Wget ==
== Downloading a single blog with Wget ==

Revision as of 19:07, 24 February 2015

Blogger
Blogger- Crea tu blog gratuito 1303511108785.png
URL http://www.blogger.com/
Status Online!
Archiving status Upcoming...
Archiving type Unknown
IRC channel #frogger (on hackint)

Blogger is a blog hosting service. On February 23, 2015, they announced that "sexually explicit" blogs would be restricted from public access in a month. We're downloading everything.

Strategy

Find as many http://foobar.blogspot.com domains as possible and download them. Blogs often link to other blogs, which will help, so each individual blog saved will help discover others. Also a small-scale crawl of Blogger profiles (e.g. http://www.blogger.com/profile/{random number up to 35217655}) will provide links to blogs authored by each user (e..g https://www.blogger.com/profile/5618947 links to http://hintergedanke.blogspot.com/)

Country Redirect

Accessing http://whatever.blogspot.com will usually redirect to a country-specific subdomain depending on your IP address (e.g. whatever.blogspot.co.uk, whatever.blogspot.in, etc) - this can be bypassed by requesting http://whatever.blogspot.com/ncr as the root URL.

Downloading a single blog with Wget

These Wget parameters can download a BlogSpot blog, including comments and any on-site dependencies. It should also reject redundant pages such as the /search/ directory and any multiple occurrences of the same page but with different query strings. It has only be tested on blogs using a Blogger subdomain (e.g. http://foobar.blogspot.com), not custom domains (e.g. http://foobar.com). Both instances of [URL] should be replaced with the same URL. A simple Perl wrapper is available here.

wget --recursive --level=2 --no-clobber --no-parent --page-requisites --continue --convert-links --user-agent="" -e robots=off --reject "*\\?*,*@*" --exclude-directories="/search,/feeds" --referer="[URL]" --wait 1 [URL]

Export XML trick

Add this to a blog url and it will download the most recent 499 posts (that is the limit): /atom.xml?redirect=false&max-results=499

External links