Difference between revisions of "FTP"

From Archiveteam
(ftp etherpad thingy)
(→‎The Project: old pad is back and added links to mirrors)
* We're currently [https://github.com/ArchiveTeam/ftp-nab listing all FTP sites on the internet] to download them all.
* We're auditing a list of some select FTP sites manually:
** http://dat.serveert.me.uk/p/ftp (mirror: https://github.com/midasf/ftplist)
** https://www.piratepad.ca/p/old-ftp-list (mirror: http://dat.serveert.me.uk/p/old-ftp-list)


{| class="wikitable"

Revision as of 22:15, 16 November 2014

FTP
Status Online!
Archiving status Not saved yet
Archiving type Unknown
Project source https://github.com/ArchiveTeam/ftp-nab
IRC channel #effteepee (on hackint)

Archiving a whole public FTP host/mirror is easy:

SketchCow> I use wget -r -l 0 -np -nc ftp://ftp.underscorporn.com
tar cvf 2014.01.ftp.underscorporn.com.tar ftp.underscorporn.com
tar tvf 2014.01.ftp.underscorporn.com.tar > 2014.01.ftp.underscorporn.com.tar.txt

Or put this handy function in your .bashrc file. You can also remove the first and last lines to turn it into a standalone bash script. Made by SN4T14.

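ftp-grab(){
    # Mirror the site: recursive, unlimited depth, no parent dirs, skip files already fetched.
    target="$1"
    wget -r -l 0 -np -nc "$target"
    # wget saves into a directory named after the host, so reduce an ftp:// URL to its hostname.
    if [[ "$target" =~ ^ftp://.*$ ]]
        then
        target="$(echo "$target" | cut -d '/' -f 3)"
        echo "ftp"
        echo "$target"
    fi
    # Pack the mirror as YYYY.MM.host.tar and save a listing of its contents next to it.
    tar cvf "$(date +%Y).$(date +%m).$target.tar" "$target"
    tar tvf "$(date +%Y).$(date +%m).$target.tar" > "$(date +%Y).$(date +%m).$target.tar.txt"
}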
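
The naming scheme used by the function above (year, month, hostname, then .tar) can be tried out without touching the network. The following is only a sketch: the helper names ftp_host and archive_name are made up for illustration and are not part of SN4T14's function, and ftp.example.com is a placeholder host.

```shell
# Hypothetical helpers, for illustration only.

# Print the hostname of an ftp:// URL; print anything else unchanged.
ftp_host() {
    case "$1" in
        ftp://*)
            rest="${1#ftp://}"          # drop the scheme
            printf '%s\n' "${rest%%/*}" # drop everything from the first slash on
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}

# Print the tar name for a URL or hostname.
# $1 = URL or hostname, $2 = optional stamp (defaults to the current YYYY.MM).
archive_name() {
    stamp="${2:-$(date +%Y.%m)}"
    printf '%s.%s.tar\n' "$stamp" "$(ftp_host "$1")"
}
```

For example, archive_name ftp://ftp.example.com/pub/ 2014.01 prints 2014.01.ftp.example.com.tar, which matches the file names used in the wget and tar commands above.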

Check the size of the site before you start, to make sure you have enough space to hold both the site and the tar made afterwards. Even when using tar --remove-files, account for large files on the site, since each file is deleted only after it has been added to the tar.

lftp ftp://site.com -e 'du -h'

An alternative to try if the above does not work correctly (happens more often on old servers):

lftp -c 'set ftp:use-feat no; du -h ftp://site'
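
Once you have a size from du, a rough free-space check might look like the sketch below. It assumes you have converted the reported size to kilobytes by hand, and it doubles that figure to leave room for the tar as well as the mirror. The function name has_room_for_mirror is made up for illustration.

```shell
# Hypothetical helper, for illustration only.
# Succeeds if DIR has room for a mirror of SITE_KB kilobytes plus a tar of the same size.
# $1 = site size in KB (taken from the du output, converted by hand)
# $2 = directory to download into (defaults to the current directory)
has_room_for_mirror() {
    site_kb="$1"
    dir="${2:-.}"
    needed_kb=$((site_kb * 2))
    # df -P prints one line per filesystem; column 4 is the available space, in KB with -k.
    free_kb="$(df -Pk "$dir" | awk 'NR==2 {print $4}')"
    [ "$free_kb" -ge "$needed_kb" ]
}
```

Usage, for a site reported as roughly 50 GB: has_room_for_mirror 52428800 . && echo "enough space"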

Now zip/tar it up and send it to the spacious Internet Archive![1] (If you're short on space: tar --remove-files deletes each file shortly after adding it to the tar instead of waiting for the whole archive to be complete, unlike zip -rm.)
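
A minimal sketch of the space-saving variant, assuming GNU tar (other tar implementations may not support --remove-files). The function name pack_and_remove is made up for illustration.

```shell
# Hypothetical helper, for illustration only. Requires GNU tar.
# Packs DIR into TARBALL, deleting files as they are added, then writes a listing.
# $1 = directory holding the mirrored site
# $2 = name of the tar file to create
pack_and_remove() {
    dir="$1"
    tarball="$2"
    # The listing is only written if the tar step succeeded.
    tar --remove-files -cf "$tarball" "$dir" &&
        tar -tvf "$tarball" > "$tarball.txt"
}
```

Usage: pack_and_remove ftp.underscorporn.com 2014.01.ftp.underscorporn.com.tar. The original files are gone afterwards, so check the listing before uploading.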

The Project

Who is grabbing what?
Midas ftp.tu-chemnitz.de
Midas ftp.uni-muenster.de
Midas gatekeeper.dec.com
Midas ftp.uni-erlangen.de
Midas ftp.warwick.ac.uk

University FTP sites are massive, so currently only DEC and Sweex are being grabbed.

External Links