From Archiveteam

This is just a thing I'm writing to get thoughts on paper and is in progress/probable shit, it should not be taken literally unless it's condoned by someone reasonable.

3rd Party Archival for Laymen/Hostmasters

Sites die every day; it's a fact of life. Most of the time the sites that die are half-baked projects that someone started but didn't finish. Sometimes it's a large website that was bought or ran out of money. Archivists aim to save as much of the disappearing internet as they can. The work is a mix of hoarding data and stealing art from a burning museum. Sometimes the building on fire is a small mom-and-pop shop, sometimes it's the Louvre.

Calling the fire department

In this analogy, the fire department plays the role of the archivists trying to save the "property" of the website. Calls to the fire department vary in how informative they are. For example, someone reporting a building on fire might say one of the following:

  1. "Hello I am calling to report a fire at 123 Main St. A small grease fire has started on the 3rd floor in the southeast corner of the building." *click*
  2. "Hello there's a fire in the building at 123 Main St." *click*
  3. "Hello I heard that there may or may not be a fire at 123 Main St., I dunno I'm across town." *click*


Example three is a little extreme for real life, but it is only an analogy. It is analogous to the shutdown of a site that hardly anyone knows about. It might be a forum or blog with a small following and few visitors. Someone notices that the site is looking sickly, or it just disappears one day. There's little chance that the fire department can get to the building before it burns down, and little chance that the site will be saved unless it was previously archived.


Example one is how the fire department would love their tips to be. The department knows exactly which building is on fire, exactly where in the building the fire is, and what type of fire it is. They know how to approach the building efficiently (from the street on the southeast corner), that they'll need a ladder, and that a grease fire calls for a particular tactic (chemical suppressants, not water). This would be like the website in question publishing the structure of its database/storage/interface along with its closure announcement. This makes the archivists' job easier too: they know exactly where to look for the data they need to save.


Example two is how (I picture) the majority of fire department calls happen. Sure, there is a 911 employee on the other end who coaxes more information out of the caller, but this is all the information your average person would volunteer. That's how I see a lot of closures happening. Let's pretend that our 911 employee didn't ask more questions. The fire department arrives on scene (for the sake of argument) on the northwest side of the building...

Fire Department Arrives

They don't see smoke right away, so they park their trucks around the building and look for it. Firefighters enter the front of the building and look at the alarm panel, which gives them a rough idea of the building layout and of which smoke detectors are in an alarm state. They pull their hoses up the stairs and reach the fire. At this point it's irrelevant that they brought water to a grease fire, because the upper half of the building is engulfed in flames and all they can do is their best to save what is left.

You can imagine the rest, but how does this pertain to saving dying websites? Well, the owner of a website puts out a tweet or blog post, or someone else announces, that the site is going down in a couple of days or weeks. The archivists show up with their scripts and large amounts of bandwidth and start looking at the structure of the site. They interpret the JavaScript, work out the sequence of files, and sometimes guess URLs by brute force. Given a decent amount of time, intelligent programmers and engineers can find the majority of the data on a website, but as the web grows, the number of programmers required increases. In the age of cloud and scalable big data analytics, technology can help make the process more efficient.
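
As a rough illustration of what "looking at the structure of the site" can mean in practice, here is a minimal sketch in Python that pulls same-site links out of a single page of HTML. It is not any particular Archive Team tool; the class and function names, the `forum.example` host, and the sample page are all made up for the example, and a real crawl would also need fetching, politeness, and handling for JavaScript-generated links.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects raw href/src attribute values from a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def discover_links(base_url, html):
    """Return sorted absolute URLs on the same host as base_url."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(base_url).netloc
    found = set()
    for link in collector.links:
        absolute = urljoin(base_url, link)
        if urlparse(absolute).netloc == host:
            # Drop the fragment so "page#a" and "page#b" count once.
            found.add(absolute.split("#")[0])
    return sorted(found)


# Invented sample page standing in for a small forum index.
SAMPLE_PAGE = """
<html><body>
<a href="/threads/1">Thread 1</a>
<a href="threads/2#reply">Thread 2</a>
<img src="/img/logo.png">
<a href="http://elsewhere.example/">Other site</a>
</body></html>
"""

if __name__ == "__main__":
    for url in discover_links("http://forum.example/index.html", SAMPLE_PAGE):
        print(url)
```

Repeating this on every discovered page is the slow, guess-heavy part; it is the equivalent of walking around the building looking for smoke.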

Preemptive Steps to Save the Site

It would be good if there were an equivalent to the alarm panel for websites: a method for webmasters to proactively or retroactively (when the site is about to be closed down) list the arrangement and size of their website, along with any restrictions or possible problems the archivists would run into. Obviously a building diagram could be useful to someone trying to burglarize the building, so you wouldn't want it posted publicly; such diagrams are usually kept in a riser room or in the entryway. Getting webmasters to trust a group or individuals with the website equivalent may not be hard, since the data it describes should already be public - i.e. it would be an index or diagram for finding all of the customer or public website data. After all, archivists are really only interested in the public data. The trick wouldn't be convincing webmasters to trust archivists with the diagram but getting webmasters to generate it in the first place.
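
To make the "alarm panel" idea a bit more concrete, here is one possible shape for such a diagram, sketched in Python. Nothing like this is an agreed format; the field names (`url_pattern`, `item_count`, `approx_bytes`, `notes`), the example site, and the numbers are all assumptions made up for illustration.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Section:
    """One public area of the site (hypothetical schema)."""
    name: str
    url_pattern: str      # how item URLs are formed
    item_count: int       # how many items exist
    approx_bytes: int     # rough size of the section
    notes: str = ""       # restrictions or known problems


@dataclass
class SiteManifest:
    """The 'alarm panel': arrangement and size of the public data."""
    site: str
    sections: list = field(default_factory=list)
    closing_date: str = ""  # empty if the manifest is published proactively

    def total_bytes(self):
        return sum(s.approx_bytes for s in self.sections)

    def to_json(self):
        data = asdict(self)  # also converts the nested Section objects
        data["total_bytes"] = self.total_bytes()
        return json.dumps(data, indent=2)


# Invented example values for a small forum.
manifest = SiteManifest(
    site="forum.example",
    closing_date="2013-06-01",
    sections=[
        Section("threads", "/threads/{id}", 120000, 4_000_000_000,
                notes="ids are sequential starting at 1"),
        Section("avatars", "/img/avatars/{user}.png", 35000, 700_000_000,
                notes="rate limited to 5 requests per second"),
    ],
)

if __name__ == "__main__":
    print(manifest.to_json())
```

A file like this would tell archivists where to point their scripts and how much storage to expect, without exposing anything that is not already public.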

Obviously these are just rough thoughts now and I'll likely be expanding on them.