Difference between revisions of "Fileplanet/uploadingfilestoia"

From Archiveteam
(→‎script draft: now outputs s3cmd line)

Revision as of 20:40, 4 September 2012

Brainstorming on how we can upload the files to IA items.

Remember not to upload the ftp2 stuff publicly!

x-archive-meta-title: "Fileplanet_Path_with_underscores", without the ftp1/2/3 bit. For example, "ftp1/102011/Yes_Man_Dynamic_Theme.7z" would become the item "102011_Yes_Man_Dynamic_Theme.7z", and "ftp1/fpnew/patches/prorally2001_v11.exe" would become "fpnew_patches_prorally2001_v11.exe". Special characters that IA does not support in item names will need some care.
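The naming rule above can be sketched as a small shell function. This is only an illustration: the function name path_to_itemname is hypothetical, and the whitelist it applies (letters, digits, underscore, hyphen, dot) is taken from the "IA supports alphanum and _-." note in the script draft, not from confirmed IA documentation.

```shell
#!/bin/bash
# Hypothetical helper: turn a mirrored path into an item name.
path_to_itemname() {
    local path="$1"
    # Drop a leading "./" (as printed by find), if present.
    path="${path#./}"
    # Drop the leading ftp1/, ftp2/ or ftp3/ component, if present.
    case "$path" in
        ftp[123]/*) path="${path#*/}" ;;
    esac
    # Slashes and spaces become underscores; every character outside
    # the assumed-safe set [A-Za-z0-9_.-] is deleted.
    printf '%s' "$path" | tr '/ ' '__' | tr -cd 'A-Za-z0-9_.-'
    printf '\n'
}

path_to_itemname "ftp1/102011/Yes_Man_Dynamic_Theme.7z"
path_to_itemname "ftp1/fpnew/patches/prorally2001_v11.exe"
```

Deleting unsupported characters can make two different paths collapse into the same item name, so a real run would probably want to check the generated names for duplicates first.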

x-archive-meta-description: For now, just put the path there, i.e. the item name with special characters intact. TODO for later: add the real metadata we got. shaqfu has a sqlite db with it; otherwise use the fileinfo item: http://archive.org/details/FileplanetFiles_fileinfo_pages_images

x-archive-meta-date: Not sure what format it supports, but this should be the file's original timestamp.

x-archive-meta-year: See above.
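A minimal sketch of reading both values from a file's modification time. It assumes GNU date (whose -r FILE option prints a file's last modification time) and assumes an ISO-style "YYYY-MM-DD HH:MM:SS" string is acceptable for the date field, which this page has not confirmed. The helper names file_date and file_year are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers; both rely on GNU date's -r FILE option.
file_date() {
    # Modification time in UTC, formatted as "YYYY-MM-DD HH:MM:SS".
    date -u -r "$1" '+%Y-%m-%d %H:%M:%S'
}

file_year() {
    # Four-digit year of the modification time, in UTC.
    date -u -r "$1" '+%Y'
}

# Demo on a scratch file with a known timestamp (GNU touch -d).
demo=$(mktemp)
touch -d '2012-09-04 20:40:00 UTC' "$demo"
echo "x-archive-meta-date:$(file_date "$demo")"
echo "x-archive-meta-year:$(file_year "$demo")"
rm -f "$demo"
```

UTC is used here so the result does not depend on the uploader's time zone; that is a design choice of this sketch, not something the page specifies.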

x-archive-meta-subject: gaming;software;gaming software;fileplanet;gamespy;ign;planetnetwork


x-archive-meta-collection:archiveteam-fileplanet (no idea if this would work with s3 upload)

x-archive-meta-mediatype:software (not always correct, maybe check for filename extensions like avi, mov, mp4 etc to set it to movies for those)
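The extension check suggested above could look roughly like this. The function name guess_mediatype is hypothetical, and only the three extensions named on this page are listed; the script draft itself sticks to software for everything, so this only illustrates the idea.

```shell
#!/bin/bash
# Hypothetical helper: pick a mediatype from the filename extension.
guess_mediatype() {
    local name="${1##*/}"   # strip any directory part
    local ext=""
    # Only treat the part after the last dot as an extension
    # when the name actually contains a dot.
    case "$name" in
        *.*) ext="${name##*.}" ;;
    esac
    # Compare case-insensitively by lowercasing the extension.
    ext=$(printf '%s' "$ext" | tr 'A-Z' 'a-z')
    case "$ext" in
        avi|mov|mp4) echo "movies" ;;
        *)           echo "software" ;;
    esac
}

guess_mediatype "fpnew/patches/prorally2001_v11.exe"
guess_mediatype "trailers/intro.AVI"
```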

Remember not to upload the ftp2 stuff publicly!


script draft

#!/bin/bash

# Traverse subdirectories and generate metadata for each file.
# Needs GNU find and GNU date. Only prints one s3cmd line per file; nothing is uploaded yet.

commonheaders='--add-header x-archive-auto-make-bucket:1 --add-header x-archive-meta-noindex:true --add-header "x-archive-meta-subject: gaming;software;gaming software;fileplanet;gamespy;ign;planetnetwork" --add-header "x-archive-meta-collection:archiveteam-fileplanet" --add-header "x-archive-meta-mediatype:software"'
# mediatype:software is not always correct, maybe check for filename extensions like avi, mov, mp4 etc to set it to movies for those?
## nah, underscor said software :)

echo "${commonheaders}"

tempfile="/tmp/fileplanet_ListOfFiles"

# generate a list of files
find . -type f > "${tempfile}"

# IFS= and -r keep leading spaces and backslashes in filenames intact
while IFS= read -r file
do
	file="${file#./}" # remove the leading ./ that find prints
	echo "Now uploading ${file}"
	
	# modification time, e.g. "2012-09-04 20:40" (avoids parsing ls output)
	datetime=$(date -r "${file}" '+%Y-%m-%d %H:%M')
	#echo ${datetime}
	year="${datetime%%-*}" # everything before the first "-"
	
	date="--add-header x-archive-meta-date:\"${datetime}\""
	year="--add-header x-archive-meta-year:${year}"
	
	# quoted so that paths containing spaces survive
	filename=$(basename "${file}")
	ftppath="$(dirname "${file}")/"
	title="--add-header x-archive-meta-title:\"Fileplanet Archive: ${filename}\""
	desc="--add-header x-archive-meta-description:\"${filename}, mirrored from its original location in ${ftppath}\""
	
	# from famicoman
	# IA supports alphanum and _-.
	itemname=$(echo "Fileplanet_${file}" | tr ' ' '_' | tr -d '[{}(),\!:?~@#$%^&*+=;<>|]' | tr -d "\'" | sed 's/\//_/g')
	#echo "s3://${itemname}"
	
	echo "s3cmd ${commonheaders} ${date} ${year} ${title} ${desc} put \"${file}\" s3://${itemname}"
	echo "#################"
	
done < "${tempfile}"

rm "${tempfile}"
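
The draft assembles the command as a single string, which makes quoting fragile once titles or paths contain spaces. One possible alternative, sketched here under assumptions, is to collect the arguments in a bash array and print them with printf '%q' for a dry run. The function name build_cmd and the global array cmd are hypothetical; --add-header and put are the same s3cmd options the draft already uses, and only a subset of the headers is shown.

```shell
#!/bin/bash
# Hypothetical helper: fill the global array "cmd" with one s3cmd invocation.
build_cmd() {
    local file="$1" itemname="$2" datetime="$3"
    local filename="${file##*/}"   # same result as basename for plain paths
    cmd=(
        s3cmd
        --add-header "x-archive-auto-make-bucket:1"
        --add-header "x-archive-meta-date:${datetime}"
        --add-header "x-archive-meta-title:Fileplanet Archive: ${filename}"
        put "$file" "s3://${itemname}"
    )
}

build_cmd "fpnew/patches/prorally2001_v11.exe" \
          "Fileplanet_fpnew_patches_prorally2001_v11.exe" \
          "2012-09-04 20:40"

# Dry run: print the command with shell quoting instead of executing it.
printf '%q ' "${cmd[@]}"
printf '\n'
# To actually upload, one would run: "${cmd[@]}"
```

Because each header is its own array element, values with spaces reach s3cmd as single arguments without any embedded quote characters.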