Blogger
URL:               http://www.blogger.com/
Status:            Online!
Archiving status:  Not saved yet
Archiving type:    Unknown
Project source:    blogger-discovery
Project tracker:   bloggerdisco
IRC channel:       #frogger (on hackint)
Blogger is a blog hosting service. On February 23, 2015, they announced that "sexually explicit" blogs would be restricted from public access within a month. However, they soon withdrew the plan and said they would not change their existing policies.[1]
ArchiveTeam did a discovery between February and May 2015, but actual content has not been downloaded yet.
Country Redirect
Accessing http://whatever.blogspot.com will usually redirect to a country-specific subdomain depending on your IP address (e.g. whatever.blogspot.co.uk, whatever.blogspot.in, etc.), which in some cases may be censored or edited to meet local laws and standards. This can be bypassed by requesting http://whatever.blogspot.com/ncr ("no country redirect") as the root URL.[2] [3]
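For instance, a minimal wget invocation using the /ncr form (whatever.blogspot.com is just a placeholder blog name):

# Request the blog through its /ncr root URL to avoid the country-specific redirect.
# "whatever.blogspot.com" is a placeholder; substitute the real blog name.
wget --page-requisites --convert-links "http://whatever.blogspot.com/ncr"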
Downloading a single blog with Wget
These Wget parameters can download a BlogSpot blog, including comments and any on-site dependencies. They should also reject redundant pages such as the /search/ directory and multiple occurrences of the same page that differ only in their query strings. They have only been tested on blogs using a Blogger subdomain (e.g. http://foobar.blogspot.com), not custom domains (e.g. http://foobar.com). Both instances of [URL] should be replaced with the same URL. A simple Perl wrapper is available here.
wget --recursive --level=2 --no-clobber --no-parent --page-requisites \
     --continue --convert-links --user-agent="" -e robots=off \
     --reject "*\?*,*@*" --exclude-directories="/search,/feeds" \
     --referer="[URL]" --wait 1 [URL]
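For example, with a hypothetical blog at http://foobar.blogspot.com substituted for both [URL] placeholders, the same command becomes:

# http://foobar.blogspot.com is a placeholder blog URL.
wget --recursive --level=2 --no-clobber --no-parent --page-requisites \
     --continue --convert-links --user-agent="" -e robots=off \
     --reject "*\?*,*@*" --exclude-directories="/search,/feeds" \
     --referer="http://foobar.blogspot.com" --wait 1 http://foobar.blogspot.com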
UPDATE:
Use this improved bash script instead, in order to bypass the adult content confirmation. BLOGURL should be in the form http://someblog.blogspot.com.
#!/bin/bash
# Replace BLOGURL with the blog's root URL, e.g. http://someblog.blogspot.com
blogspoturl="BLOGURL"
# Fetch the guestAuth confirmation link and follow it, saving the session cookie
# that bypasses the adult content confirmation.
wget -O - "blogger.com/blogin.g?blogspotURL=$blogspoturl" | grep guestAuth | cut -d'"' -f 4 | wget -i - --save-cookies cookies.txt --keep-session-cookies
# Mirror the blog using the saved cookies.
wget --load-cookies cookies.txt --recursive --level=2 --no-clobber --no-parent \
     --page-requisites --continue --convert-links --user-agent="" -e robots=off \
     --reject "*\?*,*@*" --exclude-directories="/search,/feeds" \
     --referer="$blogspoturl" --wait 1 $blogspoturl
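To use it, save the script (for example as blogspot-grab.sh, a name chosen here purely for illustration), edit BLOGURL, and run it:

# Hypothetical file name; any name will do.
chmod +x blogspot-grab.sh
./blogspot-grab.sh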
Export XML trick
Append this to a blog URL (with the desired number of posts after max-results=) and it will return the most recent posts as an Atom feed; 499 posts is the limit: /atom.xml?redirect=false&max-results=
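A minimal sketch, assuming the 499-post limit mentioned above (someblog.blogspot.com is a placeholder blog name):

# Fetch the most recent posts of a blog as a single Atom feed.
# someblog.blogspot.com is a placeholder; 499 is the limit noted above.
wget -O someblog-atom.xml "http://someblog.blogspot.com/atom.xml?redirect=false&max-results=499"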
Your own blogs
Download them at https://takeout.google.com/settings/takeout
We have not tested whether the output is suitable for importing into other software such as WordPress.