Difference between revisions of "Wget with Lua hooks"

From Archiveteam
Jump to navigation Jump to search
(Add link to documenation)
Line 2: Line 2:


Work in progress: https://github.com/alard/wget-lua
Work in progress: https://github.com/alard/wget-lua
Documentation: https://github.com/alard/wget-lua/wiki/Wget-with-Lua-hooks


<pre>
<pre>

Revision as of 19:16, 1 August 2013

New idea: add Lua scripting to wget.

Work in progress: https://github.com/alard/wget-lua

Documentation: https://github.com/alard/wget-lua/wiki/Wget-with-Lua-hooks

wget http://www.archiveteam.org/ -r --lua-script=lua-example/print_parameters.lua

Why would this be useful?

Custom error handling

What to do in case of an error? Sometimes you want wget to retry the url if it gets a server error.

wget.callbacks.httploop_result = function(url, err, http_stat)
  if http_stat.statcode == 500 then
    -- try again
    return wget.actions.CONTINUE
  elseif http_statcode == 404 then
    -- stop
    return wget.actions.EXIT
  else
    -- let wget decide
    return wget.actions.NOTHING
  end
end

Custom decide rules

Download this url or not?

wget.callbacks.download_child_p = function(urlpos, parent, depth, start_url_parsed, iri, verdict)
  if string.find(urlpos.url, "textfiles.com") then
    -- always download
    return true
  elseif string.find(urlpos.url, "archive.org") then
    -- never!
    return false
  else
    -- follow wget's advice
    return verdict
  end
end

Custom url extraction/generation

Sometimes it's useful if you can write your own url extraction code, for example to add urls that aren't actually on the page.

wget.callbacks.get_urls = function(file, url, is_css, iri)
  if string.find(url, ".com/profile/[^/]+/$") then
    -- make sure wget downloads the user's photo page
    -- and custom profile photo
    return {
      { url=url.."photo.html",
        link_expect_html=1,
        link_expect_css=0 },
      { url=url.."photo.jpg",
        link_expect_html=0,
        link_expect_css=0 }
    }
  else
    -- no new urls to add
    return {}
  end
end