Difference between revisions of "FurAffinity"

From Archiveteam
Line 22: Line 22:
== Things to figure out. ==


Exactly how much space is this going to take? A shot in the dark estimate is around 10 TB.
Exactly how much space is this going to take? I sampled 20 random images and found an average file size of 188.5 KB. Across 15,000,000+ files, that comes to roughly 2.8 TB for media files alone, but someone should probably write a script that can take a larger sample to be sure (I did this by hand; I can't code).
 
What's the average file size? The file size limit is 10 MB, but the average is certain to be far lower. This will help us be sure of how much space is needed.


There are also users who have elected not to make their profile and/or gallery visible to non-registered users. Should we exclude them? Is there an easy way to? When logged in, there doesn't seem to be any way to distinguish users who have chosen to be hidden from unregistered visitors from those who have not.

Revision as of 08:05, 19 November 2014

FurAffinity is the world's largest furry community. Founded in 2005, it hosts 15,000,000+ images, stories, songs, videos, poems, and other files (mostly images).

When planning to crawl FA, please bear in mind that it recently suffered a major DDoS, and the admins are likely still a bit jumpy about seeing unusually large amounts of traffic.

Things to be saved

  • Media files (including jpg, gif, png, jpeg, txt, swf, doc, docx, rtf, pdf, odt, mid, wav, mp3, mpeg, and others?)
    • Descriptions
    • Thumbnails for files (Even for image files, as they are often different than a simple crop/resize)
    • Comments
    • Metadata (Author, submission date, category, theme, species, gender, tags, maturity rating, photographic metadata)

Keep in mind that many FurAffinity submissions are marked "Mature", which means they are only visible to registered users.

  • User profiles
    • Blogs
    • Page comments
    • Favorites
  • Forums

Things to figure out.

Exactly how much space is this going to take? I sampled 20 random images and found an average file size of 188.5 KB. Across 15,000,000+ files, that comes to roughly 2.8 TB for media files alone, but someone should probably write a script that can take a larger sample to be sure (I did this by hand; I can't code).
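
A rough sketch of what such a script might look like, in Python. Everything here is an assumption rather than a tested crawler: `head_size` and `estimate_total_bytes` are hypothetical helper names, the 2-second delay is an arbitrary guess at a polite rate, and the list of file URLs to sample would have to come from somewhere else. It also assumes the file server reports a `Content-Length` header on HEAD requests, which has not been checked. The estimate multiplies the sample mean by the 15,000,000 figure quoted above and attaches a rough 95% range from the standard error of the mean.

```python
import math
import statistics
import time
import urllib.request

TOTAL_SUBMISSIONS = 15_000_000  # file count quoted on this page


def head_size(url, delay=2.0):
    """Return the Content-Length of url in bytes, or None if it is missing.

    Hypothetical helper: sleeps before every request so sampling stays slow.
    """
    time.sleep(delay)  # be gentle; the site recently suffered a DDoS
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=30) as response:
        length = response.headers.get("Content-Length")
    return int(length) if length is not None else None


def estimate_total_bytes(sample_sizes, total_files=TOTAL_SUBMISSIONS):
    """Extrapolate total storage from a sample of file sizes (in bytes).

    Returns (estimate, low, high). low/high are a rough 95% range based on
    the standard error of the sample mean (normal approximation).
    """
    if len(sample_sizes) < 2:
        raise ValueError("need at least two sampled sizes")
    mean = statistics.mean(sample_sizes)
    std_err = statistics.stdev(sample_sizes) / math.sqrt(len(sample_sizes))
    margin = 1.96 * std_err
    return (
        mean * total_files,
        max(mean - margin, 0) * total_files,
        (mean + margin) * total_files,
    )


if __name__ == "__main__":
    # Sanity check against the hand calculation: a 188.5 KB average.
    estimate, low, high = estimate_total_bytes([188_500, 188_500])
    print(f"{estimate / 1e12:.2f} TB")
```

With only 20 samples the range will probably be wide, since file sizes can run up to the 10 MB limit; a few hundred samples, fetched slowly, would give a much tighter figure.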

There are also users who have elected not to make their profile and/or gallery visible to non-registered users. Should we exclude them? Is there an easy way to? When logged in, there doesn't seem to be any way to distinguish users who have chosen to be hidden from unregistered visitors from those who have not.