VBulletin

From Archiveteam
Revision as of 20:35, 9 September 2015 by InquilineKea

Archiving vBulletin forums (tested only on http://boards.cityofheroes.com/; other forums may need some changes):

1. Get a recent version of Wget+Lua (it should include WARC support).

2. Get the vbulletin.lua script: https://raw.github.com/ArchiveTeam/cityofheroes-grab/master/vbulletin.lua

3. Collect the forum IDs (the f= parameter in the URLs) of all forums and subforums. Some pages have a "Forum Jump" dropdown list that shows these numbers.
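If the "Forum Jump" list is not available, the IDs can also be pulled out of a saved forum page. The bash sketch below is a hypothetical helper (extract_forum_ids is not part of vbulletin.lua); it assumes links of the form forumdisplay.php?f=<number> with f= as the first query parameter, so SEO-rewritten URLs will not match.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a saved vBulletin page on stdin and print every
# forum ID found in a forumdisplay.php?f=<number> link, one per line.
# Assumes f= is the first query parameter of the link.
extract_forum_ids() {
  grep -oE 'forumdisplay\.php\?f=[0-9]+' \
    | cut -d= -f2 \
    | sort -un    # numeric order, duplicates removed
}

# Usage (index.html is any page saved from the forum, e.g. the front page):
#   extract_forum_ids < index.html > forum_ids.txt
```

Check the resulting list against the "Forum Jump" menu where possible, since a single page may not link to every subforum.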

Run Wget+Lua with the Lua script and seed it with the forum URLs. Put the /external.php?type=RSS2 URL first to obtain a session cookie; the cookie is what keeps the session ID out of the page URLs.
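One rough way to confirm that the session cookie worked is to scan the log afterwards for URLs that still carry a session ID. This sketch assumes the session ID appears as an s= query parameter holding a 32-character hexadecimal hash (the usual vBulletin form, but verify it on the forum you are archiving); count_session_id_urls is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical check: count the lines on stdin (e.g. the wget log) that contain
# a URL still carrying a session ID. Assumes the ID is an s= parameter with a
# 32-character hex hash, optionally preceded by an HTML-escaped "&amp;".
count_session_id_urls() {
  # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps the
  # exit status at 0 so the function can be used under "set -e".
  grep -cE '[?&](amp;)?s=[0-9a-f]{32}' || true
}

# Usage:
#   count_session_id_urls < wget.log
```

A count of 0 is what you want; anything higher suggests the seed list did not start with the RSS2 URL or the cookie was not kept.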

The Lua script navigates the forum pages: it follows pagination links, goes from forumdisplay pages to threads, and from threads to posts and members. Use --page-requisites and --span-hosts to also get the images. When preparing the seed URLs, keep in mind that the script only crawls in one direction, from forum to thread to post/member. It does not, for example, jump from one forum to another or from a thread back to its forum, so every forum and subforum you want must be in the seed list.
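Because the script never moves sideways between forums, the seed list has to name each forum explicitly, with the RSS2 URL first. A small bash sketch for generating it is shown below; make_seed_list is a hypothetical helper, and the base URL and IDs in the usage line are the City of Heroes values from this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print wget-lua seed URLs, one per line.
# $1 is the forum base URL (no trailing slash); the remaining arguments are
# forum IDs. The RSS2 feed comes first so the session cookie is set before
# any forum page is fetched.
make_seed_list() {
  local base=$1
  shift
  printf '%s/external.php?type=RSS2\n' "$base"
  local id
  for id in "$@"; do
    printf '%s/forumdisplay.php?f=%s\n' "$base" "$id"
  done
}

# Usage:
#   make_seed_list http://boards.cityofheroes.com 547 569 660 > seeds.txt
```

The file can then be passed with --input-file seeds.txt instead of listing the URLs on the command line, assuming Wget+Lua keeps Wget's --input-file option and fetches the URLs in file order.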

For example, this works for the City of Heroes forums:

# $USER_AGENT must hold the user-agent string to send. Add one
# forumdisplay.php?f=... URL for every remaining forum and subforum.
./wget-lua \
      -U "$USER_AGENT" \
      -nv \
      -o wget.log \
      --directory-prefix files/ \
      --keep-session-cookies \
      --save-cookies cookies.txt \
      --force-directories \
      --adjust-extension \
      -e "robots=off" \
      --page-requisites --span-hosts \
      --lua-script vbulletin.lua \
      --timeout 10 \
      --tries 3 \
      --waitretry 5 \
      --warc-file forum \
      --warc-header "operator: Archive Team" \
      "http://boards.cityofheroes.com/external.php?type=RSS2" \
      "http://boards.cityofheroes.com/forumdisplay.php?f=547" \
      "http://boards.cityofheroes.com/forumdisplay.php?f=569" \
      "http://boards.cityofheroes.com/forumdisplay.php?f=660"
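After a run, it can help to see how many fetched URLs fall on each step of the forum, thread, post/member chain. The sketch below is hypothetical and is not the logic of vbulletin.lua: page_type is an illustrative name, and the script names showthread.php, showpost.php and member.php are assumed from common vBulletin 3.x URLs (only forumdisplay.php appears on this page).

```shell
#!/usr/bin/env bash
# Hypothetical sketch: classify a vBulletin URL by the script it points at.
# Script names other than forumdisplay.php are assumptions (vBulletin 3.x).
page_type() {
  case $1 in
    */forumdisplay.php\?*) echo forum ;;
    */showthread.php\?*)   echo thread ;;
    */showpost.php\?*)     echo post ;;
    */member.php\?*)       echo member ;;
    *)                     echo other ;;   # images, CSS, the RSS feed, etc.
  esac
}

# Usage, assuming the -nv log records each fetch as "URL:<address> ...":
#   grep -oE 'URL:[^ ]+' wget.log | cut -c5- \
#     | while read -r u; do page_type "$u"; done | sort | uniq -c
```

A run with many forum pages but no thread or post pages usually means the crawl did not get past the forum listings.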

A list of old forums, many of which run vBulletin: http://web.archive.org/web/20061229181451/http://rankings.big-boards.com/?p=all

A much simpler way to archive forums running recent vBulletin software is to loop over every post ID and fetch each post's URL. For example, on Physics Forums, loop from https://www.physicsforums.com/posts/1 to https://www.physicsforums.com/posts/5223616.
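The post-ID loop can be sketched as a bash function that prints one URL per ID. It assumes the /posts/<id> pattern from the Physics Forums example; post_urls is a hypothetical helper, and other forums will need their own URL pattern.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one post URL per ID from $2 to $3 inclusive.
# $1 is the forum base URL; the /posts/<id> pattern is the Physics Forums one.
post_urls() {
  local base=$1 first=$2 last=$3
  local id
  for ((id = first; id <= last; id++)); do
    printf '%s/posts/%d\n' "$base" "$id"
  done
}

# Usage:
#   post_urls https://www.physicsforums.com 1 5223616 > post_urls.txt
```

The list can then be fed to Wget with --input-file post_urls.txt along with the WARC options above. At millions of URLs, a --wait delay between requests is worth considering to go easy on the server.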

See also