Difference between revisions of "Fileplanet/uploadingfilestoia"

From Archiveteam
Revision as of 10:13, 6 September 2012

Brainstorming on how we can upload the files to IA items.

Remember not to upload the ftp2 stuff publicly!

x-archive-meta-title: "Fileplanet_Path_with_underscores", without the ftp1/2/3 bit. E.g. "ftp1/102011/Yes_Man_Dynamic_Theme.7z" would become the item "102011_Yes_Man_Dynamic_Theme.7z", and "ftp1/fpnew/patches/prorally2001_v11.exe" would be "fpnew_patches_prorally2001_v11.exe". This needs some care with special characters that IA does not support in item names.
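The naming rule above could be sketched roughly like this. `item_name_from_path` is a hypothetical helper, not part of the script draft, and the allowed character set (alphanumerics plus `_`, `-` and `.`) is taken from the comment in the script draft rather than from IA documentation.

```shell
#!/bin/bash
# Hypothetical helper: turn a path below the mirror root into an item name.
item_name_from_path() {
	local path="$1"
	path="${path#ftp[0-9]/}"   # drop a leading ftp1/, ftp2/, ... component
	# slashes and spaces become underscores; anything outside
	# alphanumerics, "_", "." and "-" is deleted (newline kept for output)
	printf '%s\n' "${path}" | tr '/ ' '__' | tr -cd 'A-Za-z0-9_.\n-'
}

item_name_from_path "ftp1/102011/Yes_Man_Dynamic_Theme.7z"
item_name_from_path "ftp1/fpnew/patches/prorally2001_v11.exe"
```

Deleting everything outside a whitelist is stricter than listing the characters to remove one by one, so it also catches characters nobody thought of in advance.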

x-archive-meta-description: For now just put the path there, i.e. basically the item name but with special characters intact. TODO for later: add the real metadata we got (shaqfu has a sqlite db with it); otherwise use the fileinfo item: http://archive.org/details/FileplanetFiles_fileinfo_pages_images

x-archive-meta-date: Not sure what format it supports, but this should be the file's original timestamp.

x-archive-meta-year: See above.
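One possible way to get both values from a file's timestamp is sketched below. `file_date` and `file_year` are hypothetical helper names, the `YYYY-MM-DD` format is only an assumption about what IA accepts, and `date -r FILE` and `touch -d` are the GNU coreutils forms (other platforms may differ).

```shell
#!/bin/bash
# Hypothetical helpers: read a file's modification time with GNU date.
file_date() { date -r "$1" '+%Y-%m-%d'; }
file_year() { date -r "$1" '+%Y'; }

# Demo on a throwaway file with a known timestamp.
demo=$(mktemp)
touch -d '2012-09-06 10:13' "${demo}"
echo "x-archive-meta-date:$(file_date "${demo}")"
echo "x-archive-meta-year:$(file_year "${demo}")"
rm -f "${demo}"
```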

x-archive-meta-subject: gaming;software;gaming software;fileplanet;gamespy;ign;planetnetwork


x-archive-meta-collection:archiveteam-fileplanet (no idea if this would work with s3 upload)

x-archive-meta-mediatype:software (not always correct; maybe check for filename extensions like avi, mov, mp4 etc. to set it to movies for those)
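If the extension check were wanted, it might look like the sketch below. `guess_mediatype` is a hypothetical function, the extension list is just the three examples named above, and the lowercase expansion needs bash 4 or newer.

```shell
#!/bin/bash
# Hypothetical helper: pick a mediatype from the filename extension.
guess_mediatype() {
	local name="${1,,}"          # lowercase the name (bash 4+)
	case "${name##*.}" in        # text after the last dot
		avi|mov|mp4) echo "movies" ;;
		*)           echo "software" ;;
	esac
}

guess_mediatype "trailer.MOV"
guess_mediatype "prorally2001_v11.exe"
```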

Remember not to upload the ftp2 stuff publicly!

== script draft ==


#!/bin/bash

#traverse through subdirectories, generate metadata

# since the script is meant to be run on subdirectories of the ftpX directories, specify which parent directory you are currently in.
# this is for the description text.
parentdirectory="ftp1" # NO trailing slash!
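echo "is ${parentdirectory} the correct parent directory?"
exit # deliberate stop so nothing runs before parentdirectory is checked; remove this line once it is correct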

commonheaders='--add-header x-archive-auto-make-bucket:1 --add-header x-archive-meta-noindex:true --add-header "x-archive-meta-subject: gaming;software;gaming software;fileplanet;gamespy;ign;planetnetwork" --add-header "x-archive-meta-collection:archiveteam-fileplanet" --add-header "x-archive-meta-mediatype:software"' 
# mediatype:software is not always correct; maybe check for filename extensions like avi, mov, mp4 etc. to set it to movies for those?
## nah, underscor said software :)

echo "${commonheaders}"

tempfile="/tmp/fileplanet_ListOfFiles"

# generate a list of files
find . -type f > "${tempfile}"

# IFS= and -r keep leading spaces and backslashes in filenames intact
while IFS= read -r file
do
	file="${file#./}" # remove leading ./
	echo "Now uploading ${file}"
	
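	# modification time as "YYYY-MM-DD HH:MM"; GNU date -r reads the file's mtime, which avoids parsing ls output
	datetime=$(date -r "${file}" '+%Y-%m-%d %H:%M')
	#echo "${datetime}"
	year="${datetime%%-*}" # everything before the first "-", i.e. the four-digit year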
	
	date="--add-header x-archive-meta-date:\"${datetime}\""
	year="--add-header x-archive-meta-year:${year}"
	
	filename=$(basename "${file}")
	#ftppath=$(dirname "${file}")
	#ftppath=$(echo "${ftppath}/")
	title="--add-header x-archive-meta-title:\"Fileplanet Archive: ${filename}\""
	desc="--add-header x-archive-meta-description:\"${filename}, mirrored from its original location in ${parentdirectory}/${file}\""
	
	# from famicoman
	# IA supports alphanum and _-.
	itemname=$(echo "Fileplanet_${parentdirectory}_${file}" | tr ' ' '_' | tr -d '[{}(),\!:?~@#$%^&*+=;<>|]' | tr -d "\'" | sed 's/\//_/g')
	#echo "s3://${itemname}"
	
	# dry run: this only prints the s3cmd command, it does not upload anything
	echo "s3cmd ${commonheaders} ${date} ${year} ${title} ${desc} put \"${file}\" s3://${itemname}"
	echo "#################"
	
done < "${tempfile}"

rm "${tempfile}"
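
The draft above only prints the command. When it comes to actually running s3cmd, one option is to collect the arguments in a bash array so that header values containing spaces stay in one piece without nested quoting. This is only a sketch: `build_upload_args` and `upload_args` are hypothetical names, the sample item name is made up for the demo, and it assumes s3cmd has already been configured to talk to IA.

```shell
#!/bin/bash
# Hypothetical helper: fill the global array upload_args for one file.
build_upload_args() {
	local file="$1" itemname="$2" parentdirectory="$3"
	local filename
	filename=$(basename "${file}")
	upload_args=(
		--add-header "x-archive-auto-make-bucket:1"
		--add-header "x-archive-meta-noindex:true"
		--add-header "x-archive-meta-subject: gaming;software;gaming software;fileplanet;gamespy;ign;planetnetwork"
		--add-header "x-archive-meta-collection:archiveteam-fileplanet"
		--add-header "x-archive-meta-mediatype:software"
		--add-header "x-archive-meta-title:Fileplanet Archive: ${filename}"
		--add-header "x-archive-meta-description:${filename}, mirrored from its original location in ${parentdirectory}/${file}"
		put "${file}" "s3://${itemname}"
	)
}

build_upload_args "fpnew/patches/prorally2001_v11.exe" \
	"Fileplanet_ftp1_fpnew_patches_prorally2001_v11.exe" "ftp1"
printf '%s\n' "${upload_args[@]}"   # dry run: one argument per line
# s3cmd "${upload_args[@]}"         # real upload, once s3cmd is set up for IA
```

Expanding the array as `"${upload_args[@]}"` passes each element as exactly one argument, which is what the escaped quotes in the string-based draft are trying to achieve.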