Archiveteam - User contributions [en] | https://wiki.archiveteam.org/api.php?action=feedcontributions&user=Ola+norsk&feedformat=atom
2024-03-29T13:19:36Z | User contributions | MediaWiki 1.37.1

File:Space it out? - Sketchcow.png | 2018-05-17T03:56:51Z | https://wiki.archiveteam.org/index.php?title=File:Space_it_out%3F_-_Sketchcow.png&diff=30549
<p>Ola norsk: wise quote of sketchcow</p>
<hr />
<div>wise quote of sketchcow</div>
Author: Ola norsk

File:Bryan Lunduke Internet Quote.png | 2018-05-09T17:34:27Z | https://wiki.archiveteam.org/index.php?title=File:Bryan_Lunduke_Internet_Quote.png&diff=30546
<p>Ola norsk: Bryan Lunduke quote, from approximately the end of 2017 to the beginning of 2018</p>
<hr />
<div>Bryan Lunduke quote, from approximately the end of 2017 to the beginning of 2018</div>
Author: Ola norsk

Talk:Alive... OR ARE THEY | 2018-03-28T23:52:30Z | https://wiki.archiveteam.org/index.php?title=Talk:Alive..._OR_ARE_THEY&diff=30434
<p>Ola norsk: just putting Facebook out there (probably fine though)</p>
<hr />
<div>Has anyone attempted to restore a running copy of Wikipedia from the dumps? If so, do the dumps provide enough data that, in the event of a catastrophic data failure, Wikipedia could be brought back up using just the backups? --[[User:Adewale|Adewale]] 08:42, 27 February 2009 (UTC)<br />
* That's a very good question. The problem is that the dumps are now insanely huge. I don't know how many people would even have the capacity to unpack them. --[[User:Jscott|Jscott]] 15:45, 27 February 2009 (UTC)<br />
* This: http://aws.amazon.com/publicdatasets/#1 suggests that DBpedia and Freebase are attempting to maintain their own structured versions of the Wikipedia dataset. Theoretically, if Amazon keeps their public dataset up to date, then it is possible to restore Wikipedia from that.<br />
* Even worse, there is no real backup of the image data. According to their backup procedures[http://wikitech.wikimedia.org/view/Backup_procedures], they only occasionally rsync the images by hand to another remote host. They have already lost some images to a software bug and could not restore them[http://thread.gmane.org/gmane.org.wikimedia.commons/4175/focus=4178]. Maybe it is possible to (slowly) download all the images (they provide a dump of the database table that contains all the image metadata) and save them. As the text-only dump of Wikipedia is already > 2 TB without compression, the total image size must be HUGE. --[[User:Soult|Soult]] 10:57, 4 April 2009 (UTC)<br />
** Just calculated the total size: All images (without meta-data) are about 2604 GB (2.54 TB) in size (as of 24 January, 2009) without counting deleted or replaced images. --[[User:Soult|Soult]] 23:40, 9 April 2009 (UTC)<br />
:::All the images in [[Wikimedia Commons]] (7 million) are about 6 TB. [[User:Emijrp|Emijrp]] 23:12, 1 November 2010 (UTC)<br />
* Uh, well, I know I use [http://www.yunqa.de/delphi/doku.php/products/wikitaxi/index WikiTaxi] regularly to read articles from the dumps. It works quite well. As for how easy it'd be to restore the site from the dumps rather than just read from them, I don't know. But I know the information is there, and there are even tools that already read it. --[[User:Qwerty0|Qwerty0]] 19:21, 20 April 2011 (UTC)<br />
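Soult's idea of slowly downloading every image, starting from the image metadata dump, can be sketched in Python. This is a minimal sketch that makes several assumptions. It assumes you have the `img_name` column from the image table dump, already in MediaWiki's canonical form: underscores instead of spaces, with the first letter capitalised. It also assumes Commons serves original files under MediaWiki's hashed upload layout, `/x/xy/Name`, where `x` and `xy` are the first one and two hex digits of the MD5 of the file name. The helper names, the User-Agent string and the one-second delay are hypothetical choices, not Wikimedia policy values.

```python
import hashlib
import time
import urllib.parse
import urllib.request

# Assumption: originals live under MediaWiki's hashed upload layout on Commons.
BASE = "https://upload.wikimedia.org/wikipedia/commons"
# Hypothetical contact string; bots are expected to identify themselves.
USER_AGENT = "example-archiver/0.1 (contact: you@example.org)"


def commons_url(img_name: str) -> str:
    """Build the original-file URL for an img_name taken from the image table dump."""
    name = img_name.replace(" ", "_")  # dump names normally already use underscores
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # Directory levels are the first hex digit and the first two hex digits.
    return f"{BASE}/{digest[0]}/{digest[:2]}/{urllib.parse.quote(name)}"


def fetch(url: str) -> bytes:
    """Download one file, sending an identifying User-Agent header."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read()


def slow_mirror(img_names, save, fetcher=fetch, delay_s=1.0):
    """Fetch files one at a time, pausing between requests.

    `save(name, data)` is supplied by the caller (e.g. write to disk).
    """
    for name in img_names:
        save(name, fetcher(commons_url(name)))
        time.sleep(delay_s)  # arbitrary politeness delay, not an official limit
```

At one request per second, the 7 million Commons files mentioned above would take roughly 81 days. That is about what "slowly" means here. Parallel fetching would be faster but less polite to the servers.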
<br />
<br />
=== Wikimedia Commons monthly uploads ===<br />
<pre><br />
date sum(img_size)<br />
2003-1 1360188<br />
2004-10 637349207<br />
2004-11 726517177<br />
2004-12 1503501023<br />
2004-9 188850959<br />
2005-1 1952816194<br />
2005-10 17185495206<br />
2005-11 9950998969<br />
2005-12 11430418722<br />
2005-2 3118680401<br />
2005-3 3820401370<br />
2005-4 5476827971<br />
2005-5 10998180401<br />
2005-6 7160629133<br />
2005-7 9206024659<br />
2005-8 12591218859<br />
2005-9 14060418086<br />
2006-1 15433548270<br />
2006-10 33574470896<br />
2006-11 34231957288<br />
2006-12 30607951770<br />
2006-2 14952310277<br />
2006-3 19415486302<br />
2006-4 23041609453<br />
2006-5 29487911752<br />
2006-6 29856352192<br />
2006-7 32257412994<br />
2006-8 50940607926<br />
2006-9 37624697336<br />
2007-1 40654722866<br />
2007-10 89872715966<br />
2007-11 81975793043<br />
2007-12 75515001911<br />
2007-2 39452895714<br />
2007-3 53706627561<br />
2007-4 72917771224<br />
2007-5 72944518827<br />
2007-6 63504951958<br />
2007-7 76230887667<br />
2007-8 91290158697<br />
2007-9 100120203171<br />
2008-1 84582810181<br />
2008-10 122360827827<br />
2008-11 116290099578<br />
2008-12 126446332364<br />
2008-2 77416420840<br />
2008-3 89120317630<br />
2008-4 98180062150<br />
2008-5 117840970706<br />
2008-6 100352888576<br />
2008-7 128266650486<br />
2008-8 130452484462<br />
2008-9 120247362867<br />
2009-1 127226957021<br />
2009-10 345591510325<br />
2009-11 197991117397<br />
2009-12 228003186895<br />
2009-2 125819024255<br />
2009-3 273597778760<br />
2009-4 212175602700<br />
2009-5 191651496603<br />
2009-6 195998789357<br />
2009-7 241366758346<br />
2009-8 262927838267<br />
2009-9 184963508476<br />
2010-1 226919138307<br />
2010-2 191615007774<br />
2010-3 216425793739<br />
2010-4 312177184245<br />
2010-5 312240110181<br />
2010-6 283374261868<br />
2010-7 362175217639<br />
2010-8 172072631498<br />
</pre><br />
<br />
In bytes. In July 2010, 362 GB were uploaded. [[User:Emijrp|Emijrp]] 23:16, 1 November 2010 (UTC)<br />
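The table above lists months in text order, so 2004-10 comes before 2004-9, and it gives raw byte counts. Below is a minimal Python sketch that parses rows in this two-column layout, restores calendar order, sums the bytes per year and converts units. To keep it short, it includes only the 2004 and 2010 rows; paste in the rest to cover the whole table. The function names are hypothetical. One note on units: the "362 GB" figure for July 2010 uses decimal gigabytes (10^9 bytes). The same 362175217639 bytes come to about 337 GiB. By contrast, the earlier "2604 GB (2.54 TB)" figure divides by 1024.

```python
from collections import defaultdict

# Sample rows copied from the table above: month, sum(img_size) in bytes.
ROWS = """\
2004-10 637349207
2004-11 726517177
2004-12 1503501023
2004-9 188850959
2010-1 226919138307
2010-2 191615007774
2010-3 216425793739
2010-4 312177184245
2010-5 312240110181
2010-6 283374261868
2010-7 362175217639
2010-8 172072631498
"""


def parse(rows):
    """Return [((year, month), bytes), ...] in calendar order.

    The query output is sorted as text, so "2004-10" lands before "2004-9";
    converting year and month to integers fixes the ordering.
    """
    parsed = []
    for line in rows.splitlines():
        if not line.strip():
            continue
        date, size = line.split()
        year, month = date.split("-")
        parsed.append(((int(year), int(month)), int(size)))
    return sorted(parsed)


def yearly_totals(parsed):
    """Sum the monthly upload bytes for each year."""
    totals = defaultdict(int)
    for (year, _month), size in parsed:
        totals[year] += size
    return dict(totals)


def gb(n_bytes):
    """Decimal gigabytes (10**9 bytes)."""
    return n_bytes / 10**9


def gib(n_bytes):
    """Binary gibibytes (2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    months = parse(ROWS)
    peak_month, peak_bytes = max(months, key=lambda row: row[1])
    print("first month:", months[0][0])
    print("peak month:", peak_month, f"{gb(peak_bytes):.0f} GB", f"{gib(peak_bytes):.0f} GiB")
    for year, total in sorted(yearly_totals(months).items()):
        print(year, f"{gb(total):.1f} GB")
```

Run on these sample rows, the script reports (2004, 9) as the first month and (2010, 7) as the peak, at 362 GB, or 337 GiB.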
<br />
== Add a section to the article that suggests workarounds ==<br />
<br />
The easiest way to save content is to submit it to multiple websites. My personal favorites for the typical files I upload are [[Ovi Share]] (by Nokia; unlimited disk space for quite a wide variety of files, but no easy mass download), [[Scribd]], [[docstoc]], [[Slideshare]] and [[Box.net]] (from which I can download files as a zip file). --[[User:Jaakkoh|Jaakkoh]] 04:39, 4 April 2009 (UTC)<br />
<br />
<br />
Wasn't sure where to post this, but what about FriendFeed, given that they're in the process of being acquired by Facebook, and people are talking about leaving and taking their content with them/deleting accounts? <br />
[[User:TysonKey|TysonKey]] 17:49, 13 August 2009 (UTC)<br />
<br />
== whitehouse.gov ==<br />
<br />
Perhaps some mention should be made that the entity which owns this owes hundreds of *trillions* of dollars with no clear plan or schedule to repay this money. As a house in a bad neighbourhood with a world-leading quantity of foreclosures, it's just a matter of time before Commie China Inc. calls in the loans and the entire house of cards collapses. --[[User:Carlb|Carlb]] 18:24, 13 February 2012 (UTC)<br />
<br />
== Uncyclopedia and various former Wikia ==<br />
Quite a few of the Uncyclopedia individual-language Wikipedia parodies are currently hosted on Wikia, and all but two of the rest have long been downloadable on download.uncyc.org. But two major ones are missing (because they're hosted independently): absurdopedia.org and uncyclopedia.kr (Russian and Korean, respectively). Odds are that there are backups of both from before they went independent, but these will be badly out of date. No idea whether the new hosts of these have any backups available for download.<br />
<br />
The same pattern likely also holds for a long list of wikis that left Wikia because of the ad-heavy forced reskins of that site in 2008 and again in 2010. http://awa.shoutwiki.com/wiki/Moved_wikis and http://complaintwiki.org were intended as consumer-complaint sites about Wikia, but de facto they should be read as lists of newly independent MediaWiki installations, which may or may not have downloadable backups. <br />
<br />
It might be best not to assume that, just because Wikia (or another wiki farm) has left the old wiki open and abandoned since 2008, there's anything in the Wikia backups other than outdated content and vandalism once the community has established itself independently elsewhere. Wikia is infamous for keeping old wikis open after the community leaves, and Wikia staff have been seen removing links to the new wiki on numerous occasions. <br />
<br />
Annoyingly, if some automated process on the old site is generating periodic backups of old data, the timestamp will appear current when the underlying data is three years outdated. --[[User:Carlb|Carlb]] 18:24, 13 February 2012 (UTC)<br />
<br />
== Facebook ==<br />
<big>Is it really doing <i>'ok'</i> ? Or, could it really all come crumbling down?</big><br />
<br />
# WHAT IF IT'S <b>NOT</b> ? <br />
##It would be a monumental <i>(and likely impossible)</i> task<br />
##Most likely there would be ample opportunity and time given for users to grab their own data<br />
###What about the <i>if not</i> case, and of those users who might not or could not?<br />
* [hxxp://youtu.be/aY3rokHZgd0 Facebook is Dying, New Class Action Lawsuit Takes Aim]</div>
Author: Ola norsk

User:Ola norsk | 2018-03-28T23:47:50Z | https://wiki.archiveteam.org/index.php?title=User:Ola_norsk&diff=30433
<p>Ola norsk: </p>
<hr />
<div>Merely, drunkard'ly and merrily, a '''Loudmouth'''!<br />
(Including some scarce knowledge of preservation techniques for timber-constructions: ''Soak it in chlorine, and let it dry out!'')</div>
Author: Ola norsk

User:Ola norsk | 2018-03-28T23:46:44Z | https://wiki.archiveteam.org/index.php?title=User:Ola_norsk&diff=30432
<p>Ola norsk: </p>
<hr />
<div>Merely, drunkard'ly and merrily, a '''Loudmouth'''!<br />
(Including some scarce knowledge of preservation techniques for timber-constructions: ''Soak it in chlorine, and let it dry out!'')<br />
test</div>
Author: Ola norsk

DEFCON 19 Talk Transcript | 2017-12-12T02:37:02Z | https://wiki.archiveteam.org/index.php?title=DEFCON_19_Talk_Transcript&diff=30182
<p>Ola norsk: the (https://dl.dropboxusercontent.com/u/5988419/Defcon19-YouTubeCaptions.srt) link appears broken / File not found</p>
<hr />
<div>Transcript of the DEFCON 19 talk from [http://www.youtube.com/watch?v=-2ZTmuX3cog YouTube]<br />
<br />
<br />
<br />
<br />
<br />
So again thank you all so much for coming to this, and for enjoying, I hope, DEFCON. <br />
<br />
"Excellent, yeah," says one guy! <br />
<br />
OK, so, the name of this talk is "Archive Team: A distributed preservation of service Attack." A hilarious title meant to bring you in, and it worked, apparently. <br />
<br />
My name is Jason Scott, I am the mascot of Archive Team, which is a rogue band of archivists, preservationists and jerks dedicated to saving online, and in some cases offline, history. <br />
<br />
And this project has been going on for a little while, and I thought, well, maybe it's time to kind of make people understand what we're up to here. <br />
<br />
So, before I get started though, I want to dedicate this talk to Tim Recher, a very old friend of mine who unfortunately passed on this year. It had been one of his dreams to bring his family back to DEFCON, so they are here in the house tonight to enjoy DEFCON. <br />
<br />
And I must say, you know, as time goes on (I'm currently forty years old), you have experiences of losing friends you didn't know you were going to keep around for such a short time. So it's always worthwhile, even though I'm in a talk, to skip a talk if there's a friend who you haven't seen in a while, and spend maybe an extra four or five minutes with them to remember some things with them. It's just a benefit for that aspect of it. Huh. <br />
<br />
Since we're all about saving websites let me in fact talk about soy sauce. <br />
<br />
So, soy sauce... Soy sauce, as some of you might know, is basically fermented soybeans and wheat; it's a process in which these things are brewed just like beers, or other kinds of fermented dishes. <br />
<br />
There's an experience to it, there's an idea behind it, it's a long process to learn, we've been doing it for hundreds of years, but different groups have different approaches, and it's extremely important that the essence of each is maintained. In fact, this is part of the marketing of a lot of beers and crafts and everything else: you know, it's important that all the components stay absolutely the same. <br />
<br />
This is the Yamasa Soy Company, a company that's been around for two hundred and eight years, since the Edo period. They've had eight presidents, and they've been basically producing soy sauce in their local town for all this time. <br />
<br />
And they are beloved enough that, for instance, here's a tourist who has just gone and taken a beautiful drawing of this soy sauce factory to kind of show, you know, they've been around forever and they do this kind of work. <br />
<br />
So they were hit by the tsunami. <br />
<br />
This is that same factory, and that's the current, ninth president of this soy sauce company. He was given it by his father after the flood, because his father felt he was too old to figure out what to do next - because what are you going to do about a company in which everything is absolutely obliterated? <br />
<br />
Obliterated to the point, for instance, that this is the company safe from which he was able to extract their incorporation papers from two centuries ago so he could prove that the company was still around, and still existent. And again, I'm crediting Robert Gilhooly, this is the man who walked around with it. <br />
<br />
Now bear in mind, when they got hit with the tsunami, one of the first things that they discover, or one of the first things that happened was that their head of sales ran to one of the dams, to save the dam, and was killed by the ensuing rush of water. <br />
<br />
This man as well, this is the ninth president, was absolutely convinced his family had died and that he'd lost his children, because their house was just a few blocks from this place. As it turned out, his children had actually been led up a hill by some teachers and elderly residents who were nearby, most of whom then proceeded to die. <br />
<br />
So this was a family, this is a family business that's just ensconced in tragedy right now. And so this is him kind of standing on the hill that he was able to run up to with all the employees who didn't die, overlooking their factory, losing this thing. <br />
<br />
So, what, what I want to explain, though, is that this is a -- you know, a person might say "Well, who cares, it's a company, what does the company matter against human lives?" <br />
<br />
Well, this is a company that was so entwined in the identity of this town that even with a 70% death rate, people were coming to this soy sauce company and giving them money, to say "When you make soy sauce again, I'll wait for my soy sauce delivery." Because I want to be able to, you know, keep this important thing around. <br />
<br />
Now in the process of making soy sauce there's this whole process here, and one of the most important ones is the adding of the moromi, the fermentation where they add specific yeast, the specific fermenting agent that will be able to make more of the soy sauce, and it's got a specific flavor, and it's cooked in a certain way. <br />
<br />
So one of the things that the president discovered was that -- and again, he's thirty-seven, right, and he's been saddled with this idea to rebuild the company. So he looks for the barrels - there's about thirty barrels that they keep this agent in. <br />
<br />
And he comes and he finds most of them completely crushed. Gone, missing. He finds one or two that are intact, but to his horror, he tells people "Okay, we're good, we've got the moromi" but it turns out that too much sea water has gotten in, and it has killed it. So there's none. <br />
<br />
So what ends up happening is, when everything looks bleak, they recall an experiment from four months earlier. There was a local marine biology laboratory that was doing experimentation and asked for some of the moromi. They had asked for a small sample, a barrel's worth or whatever, of moromi to test with, and the company had given it to them. <br />
<br />
That lab had also been hit by the tsunami, and had its first floor destroyed. But on that first floor was a one-kilogram bag of the moromi, still in its plastic wrapping, that they had never gotten around to using. And from that small bit, this company is rebuilding itself into this very beloved brand once again. <br />
<br />
These are the two new sales girls who were hired by this company, that's their new digs while they're working it out, and this is a piece of their old factory, and the girls are saying, you know, "Thank you, thank you for your patience. We will have your soy sauce soon." <br />
<br />
So why am I saying all this? <br />
<br />
Well, what I'm trying to say is that, first of all, backups are important! Multiple backups apparently even better! <br />
<br />
But even beyond that, this object, this yeast, was an emotionally meaningful human item that had relevance to a culture and a world. It was, you know, basically something that people considered part of their identity. Really, if you think about it. <br />
<br />
And even though it's just an object, it's got meaning that way. So what I'm saying here is that objects that maintain memory, objects that are part of us, have relevance to us even after their initial use is gone. <br />
<br />
In other words you look at these items and you say "I remember I was with this person" or "This proves that we were part of this" or, you know, "This is the proof that I invented this first" or, more accurately, "This is a friend who I have lost." "This is somebody who I can't speak to anymore, but I have their work." <br />
<br />
And I think that that's something that can sometimes be lost when I start to tell you about some websites, because one of the things people say is "Well, who gives a crap?" <br />
<br />
These are really old websites, and I say these websites are collections of memories that have been gathered up through people online. That is the driving heart and force of what I'm talking about here. <br />
<br />
And there's a wide variety of old media I might mention, and old websites and things that are currently sustaining our memories magnetically, in forms that are kind of strange, and each year it becomes harder and harder to extract them. But beyond that, they contain things that their exterior may not really reveal. <br />
<br />
So for instance, you might have writings that you might not remember doing, letters your family might have done, basically stored on very very old media. <br />
<br />
Additionally, you get weird shit! <br />
<br />
For instance, this is eBay: the home game. This speaks to a lot of things because it indicates that people thought that eBay... <br />
<br />
Well, first of all you have this belief that someone thinks eBay is something you'd want to do at home with your family, uh... with "No money down!" <br />
<br />
But it also indicates that eBay had some sort of cultural meaning to us in 2000 when this came out, that it was strong enough of a feeling to feel that this is an experience that you should share elsewhere. That idea of getting completely fucked on shipping costs. Very critical, right? All right. <br />
<br />
And again I'm keeping a lot of things myself that are part of that, this for instance there are some very old issues of 2600, I have three complete runs of 2600 magazine. Actually, I've got a lot of stuff. <br />
<br />
If you don't know me, that's fine, that's awesome, I don't have anything, I live a free life, how about that, but if you actually know me then you know about the shipping container, and you know about the various pieces of old media that I'm sent, I'm sent lots of floppy disks, old tapes, tape drives, you know, basically all sorts of collections of items that are things that I transfer out. <br />
<br />
So I've been able to get my hands on some very old things, and I constantly make myself available for this because I believe that all of these old memories have meaning, and it's pretty easy with something like 2600. Very prominent, lots of copies, very easy. I'm to tell you that there's a hacker calendar now from 2600, in case you want to support them, but 2600 magazine is just one of many. But we also have these electronic artifacts. Right? <br />
<br />
So, for some of you this has -- can I just get a clap if this has any emotional meaning to you whatsoever? <br />
<br />
And I DO do error correction for the person going (MIMES CLAPPING), that's cool. I'll just quickly go over what we're looking at here. <br />
<br />
These are two different items. The first one is a "Netscape Now" button. This is a period of time when browsers were just starting out, and you had a number of browsers, there were about twenty or thirty, but there were a few that were trying to build themselves up and one of the original investors in SGI went out and scooped up all the creators of the Mosaic browser, started a new company and named it Netscape. <br />
<br />
And, as part of that, wanted to let you know that if you want to see a really good site, and you want to watch it properly, you should get Netscape right now. So there was a button that they produced, that was animated, that said "Go to get Netscape right now so you can see my website the way it was meant to be seen". And so, that's the Netscape Now button. <br />
<br />
Underneath that is the "Under Construction" GIF. I'm not going to get into the "Gif" / "Jif" argument right now! So, the "Under Construction" GIF is basically a, uh, indicator that you're not finished with your website. <br />
<br />
Now obviously, for us in the future with our incredible abilities, websites are actually never finished, so it's redundant! We've factored that thing out on both sides of the equation. We're like, you know what, "always building", because if you're not always building it's now a period of shame, right. Lack of dynamism is a shameful trait of your website, an indication of your failure and lack of interest, so you would not want to say "Oh, I'm under construction"; of course you're under construction. <br />
<br />
Alright? So, yeah, there's a very emotional reaction to that, but I find there's even a bigger one to this. Which is... <br />
<br />
This is a collection that I have of all of the Netscape Now buttons! As you can see there's a whole variety of stories being told here, because some of them indicate what... Uh, some of them are obviously made by hand, some of them go for different versions, some of them re-jigger themselves, these are all actually MD5 different. <br />
<br />
What I am discovering, for instance, here's a little story you might not know, is for instance some of them you look at and you say "Well, why is this different from the one next to it?" It is because they've removed the extra frames to save some space. They took out a K or two, and that way they got a little more space for their website. So the thing looks a little crappier, but "Thank god I've got more space!" We don't think about things that way. Also somebody there seems to be really against Netscape Now, so... screw you! <br />
<br />
Similarly on the "under construction" thing, also a lot of emotional reaction. I got ten thousand of these things. And as you can see, there's all variety of things put under construction, and I think what I'm trying to say is... And again, if you go to textfiles.com/underconstruction it says "This page is under construction", and then puts all of them underneath. Which will crash some browsers! <br />
<br />
And then it says "if there's a problem, mail me". That goes to one with mail me GIFs that DOES crash all browsers. So you can be a historian, and also be an asshole! It works out, actually. <br />
<br />
So anyway, so what you have here is again like I said, a wide variety of interesting cultural artifacts, and I've found the people who go to this just automatically get a massive amount of reaction from it. <br />
<br />
Well what we're experiencing right now is a bunch of websites that were started earlier, and "earlier" now could be anything from mid-1990s up through to even maybe a year ago or further, where they reach a point where somebody decides they're not going to stay up any more. <br />
<br />
And it's usually done with, like a post-it note on the outside of a restaurant that's been shut down for health code violations. It is just simply something saying "By the way, we're gone." <br />
<br />
Now, normally I would not care, right? I mean, if you've created, you know, hatsforcats.com, and suddenly no-one wants to buy your cat hats, and you say "Sorry, thank you for, you know, thank you for four months of wonderful business." And away you go, that's fine. <br />
<br />
But what we have right now from the mid-1990s on to now is this whole period where we're taking user generated content and a large amount of marketing is being made to make it as easy and quick and frictionless to put as much of yourself online as quickly as possible into something... with a huge lack of any interest in telling you what that something is. It's just there. You get an IT department, and you don't even know how to reach it. <br />
<br />
What ends up happening is you get this, right: AOL Hometown, which was a whole bunch of really interesting websites from the early 1990s, and in 2008 they said "You know what, we're out of here." And that was it. Hometown was gone. Same thing up there with Kickstart. You don't know what Kickstart is, I didn't know what Kickstart is, but I like the button. The indication there is like "See this light bulb? Going out." <br />
<br />
And then you get these kind of surreal shutdowns, right, like Free Pro Hosting, which offers you more! And the next thing it says is "We're going to be discontinuing our free hosting service at the end of the year." And look at that smiling girl! "Guess what, we're closed!" "We're out of business!" So, hey, welcome to Free Pro Hosting where nothing is now free. That is a tough, tough sell. "What are you called?" "Free cars." "What do you sell?" "Cars." "Free?" "No." "No, Bob Free, pleased to meet you." Yeah. <br />
<br />
So we started Archive Team. OK? ArchiveTeam.org. We are gonna rescue your shit. We are the A-Team. We are the team that will come in, and we will rescue things that need to be rescued. Help the helpless, go after the site, sight the sightless. <br />
<br />
We're going to go after places that look like they're being shut down. And we download them, and then we figure out what to do next. We know, you know, so much in history, if you go ahead and look at a lot of things, how we have it with housing and things, that you know, basically, uh...when, when you evict somebody from a home, it is a huge-ass painful process that sucks. Right? <br />
<br />
Yes, right there, you're looking outside, you can see in the window, you're the landlord, you can see them fucking up your apartment. You're like "I'm gonna get rid of them. It's going to take six weeks. But I'm going to get rid of them." And you have to apply in front of a judge, you have to show things, and you have to do all these things.<br />
<br />
Well with web hosting, we don't have to do any of that. And some people think that's beautiful, and, yes, the wild west was fucking awesome until you died of dysentery. <br />
<br />
And I'm saying that it's 2011, and this is DEFCON. This is one of these places which goes, like, "By the way, this idea is stupid, we don't do this anymore", well the idea of completely, uh... uncontrolled, non-transparent hosting of user content really needs to come to an end. But until then, we're duping stuff because the conversation otherwise ends. <br />
<br />
Like if you go to AOL Hometown now, and go, like, "I need my old stuff" they go "That's a shame. That you still need it. Are you sure you need it? Would you like to buy a new account with more space?" Because right now it's OK. <br />
<br />
So Archive Team set out on its mission, and we've started to download things. We've been having a great old time. <br />
<br />
And then GeoCities went down. So, we were like, people came to us and they were like, "Hey, Archive Team. GeoCities? You gonna download it?" <br />
<br />
How many people here know GeoCities? Ah, right! See, they don't want to make noise and call attention to themselves. <br />
<br />
The thing about GeoCities, and I think GeoCities falls into this right now, right, GeoCities is the moromi. GeoCities is this place that started in 1994 as Beverly Hills Internet. Got turned into this very strange hosting company, gets bought by AOL -- not AOL -- by Yahoo for an enormous amount of cash. I mean an astounding amount of cash, billions of dollars, to become hosting. <br />
<br />
Now at the time of its purchase by Yahoo, it is the second or third, depending on the month, most-browsed site on the internet. This is the most popular of popular sites. It is huge. <br />
<br />
And one day, one day, they announced they were shutting it down -- oh, but I don't mean they really ANNOUNCED they were shutting it down. I meant that buried in one of the help files, which somebody brought to our attention, it said "I'm having trouble getting this done", and the answer was "Yes, because of the shutdown that functionality is currently not here." That was it! We're like "Wow, that is burying the fucking lead!" <br />
<br />
And what they did was, they were shutting down "sometime in the summer" And then all of this site was going to go down. Bear in mind that when GeoCities finally went down it was the 218th most browsed site on the net. It'd only gone down a little bit. Yahoo made no attempt to get rid of it, you know, they didn't try to sell it off or anything like that, they just simply said "OK, let's turn this off. Hooray for us." <br />
<br />
Now granted, you'll go look at one of the sites that was on there... And you'll be like "Well, of course. Of course." I mean look at this, then, this is the Rogue Cowboy. "Hey y'all, military couple, been here for a while." I'm reading this for you because you cannot possibly see it over the bucking bronco background. <br />
<br />
Also, I want to point out that there's a little gold item there and it says "HTML Writers Guild." Another thing that's kind of gone by the wayside is HTML guilds. Now it's just stock options. <br />
<br />
And the thing is, you know, you look at a site like that and you're like "Well this is... these guys are awful!" and I want to point out something I've really come to understand, which is how do we do this - how do we get rid of all of something?<br />
<br />
In fact, how do we destroy cultures, how do we destroy lives, how do we do this? And the answer is: disenfranchise, demean, delete. <br />
<br />
Disenfranchise: remove their ability to have any control over something. Like I said, with Facebook, good luck calling them up to get something fixed. Good luck calling them up because something's not working like you expect. Go ahead and tell me that doesn't cost any money, and I get what I pay for, fine, but I'm telling you that's what the case is. <br />
<br />
Second, demean: tell people that this thing is useless. Look at this thing, it's ugly, by our design standards this thing fails our test. We the board of Vogue, we've decided that this thing is not to our liking. And then, delete. Then say "Who gives a shit about these people? These are nothing. Whatever." <br />
<br />
But we have to realize that for these people, this presentation, this website may be the widest audience that this genetic line has ever reached. And you can't turn away from that kind of power, even if that was never your hope. Printing a color photocopy was $1.50 a page at this time. To be able to do full color, occasionally with musical background, websites, that would have all the things you wanted to say? And it's interesting what people pull, for instance. <br />
<br />
Welcome to space! Now it's interesting that the projector really gets rid of the beauty of the star field. I feel like it's not really there. But bear in mind there's a beautiful star field there. It's not animated, but it's something that's there, and this person obviously likes space. And there were areas in GeoCities for you to store your site, based on themes: Hollywood, space, gay and queer, western, and so on. And you were able to declare what your kind of interests were and go down there. <br />
<br />
So, this particular person was in Area 51, the space thing. And there's a part in there called personal experiences which I just love, because you read it and their personal experiences are like "Was watching television. Felt outside of myself for twelve minutes. Continued watching television." Yeah. OK, great, hilarious, but also this person wanted to kind of express this, and obviously this leads to interesting conspiracy theorists and the paranormal network, and all of that. <br />
<br />
Let's go with this one. "Welcome. Patrick Joel Mielke, born on April 16th, 1981, entered heaven April 17th, 1983. Page lovingly dedicated to Patrick Joel, child of God, uh... loaned to us for a very short time. It's a celebration of his life and the love and joy he so enriched his lives with." <br />
<br />
Now this is a woman, and this is I think what's sometimes not noticed here: the child died in 1983. This website was created in 1996. This is a woman who has enough pain at that time that when she sees GeoCities, where other people say "I'm gonna talk about watching TV" and "I'm gonna talk about my bucking bronco background and join the HTML Writers Guild", here's somebody who's saying "No, the world needs to know about my baby. I want to let everyone know how much I loved him." And she has page after page, in the ten megabyte space, about how much her baby meant to her. <br />
<br />
So here's a case -- and by the way, of course there was a web ring, you know what a web ring was, of, uh, what was it... It's an Angels web ring, so it's a bunch of parents who lost children under two, to talk about how, you know, they touched an angel for a short period of time. This is real stuff. This is as real to save as anything else, I think. So, dig on it. <br />
<br />
(AUDIENCE LAUGHTER AT NEW IMAGE) <br />
<br />
Gets better the longer you look at it. <br />
<br />
So, a question that some of you who've never heard of me before now will ask: "What the fuck is up with that, Jason?" It's a general question I get about everything. <br />
<br />
Alright, here's the deal. He's an Under Construction GIF. And he got wrapped up in the trawl, basically I went through a bunch of GeoCities stuff and found a bunch of Under Construction GIFs, and he was one of them. <br />
<br />
And I was like "What the hell is that? What's the fucking story about Bulgy McFish-Hat guy?" so I go look it up, and it's this guy in the Hollywood Hills section, and he is gay, and he has a page where he talks about his dream guy. Uh, it's from 1998. And he talks about what he wants in a man, what he will do with the man, where they'll go, the places he'll do, the dreams he'll live, it's from basically, like I said, 1998, and at the bottom it says "This is always under construction." And there is this guy at the bottom. <br />
<br />
In 2005, it's updated. And it says "No need to keep looking. I've found him." And it's just a story that turned out, even with the bulge, to be pretty heartwarming. All of this buried in the little tiny graphics interchange format. Which I believe just got out of copyright. Oh, I'm sorry, patent. <br />
<br />
OK. So when they closed it, right, Yahoo just decided to do this twerpy frigging goddamn thing. This to me is the embodiment of the problem. "Why did GeoCities close?" which by the way should really be said in like kind of a scream with a rending of documents, because that's usually how you say it: "Why did GeoCities close???!!!?!?" <br />
<br />
"We have decided to focus on helping our customers explore and build relationships online in other ways." That's like shooting somebody and saying "I have plans for your car." All right. It's this sort of corporate douchebaggery that ensures that I will never work within a corporate environment again. <br />
<br />
(IMAGE ON SCREEN - 'FUCK YOU') <br />
<br />
I don't know, that's my visceral reaction, what do you think? So that's what Archive Team said. We said, "You know what, let's download it." <br />
<br />
So... Downloading was very interesting. Downloading GeoCities was somewhat complicated. It took us about a hundred people to download over the course of about six months. We had no idea when the shutdown date was, right, so we just went at it. <br />
<br />
Now it turns out that GeoCities had a very interesting thing. You got a gigabyte of bandwidth a month. But! Only about twelve megabytes of it could come out every hour. Bait and switch. So we would try to do it, it would go "Sorry. Error. 999 error. Content limit has been reached." It didn't take long, by putting our heads together, by having all these assembled people on our IRC channel to have someone go, "Do you think they're locking out Google?" So we go look, and, nope, they're not locking out Google. So we changed all of our user-agents to "Not The GoogleBot". Free! <br />
<br />
At that point, we aimed a couple people at them, we had a hundred virtual machines that downloaded basically all of the -- GeoCities has an old neighborhood and the old neighborhood system which basically would be GeoCities slash WestHollywood slash... you know, Hills slash 2252. These are all pre-1999. <br />
<br />
When Yahoo got their nutsack on it, they just reapplied it across to the Yahoo section. So basically what they did was, you could be "GeoCities.com/~toolbag" and be whatever. So it was going to be harder to find them. But, man, we sent people after them, and we did, and we downloaded as much as we could, which turned out to be a little bit over a terabyte, of GeoCities. <br />
<br />
So, then what do you do? Well, first, bear in mind that this is GeoCities in 1999. This was a 9 terabyte array of theirs. Just to give you an idea of just how pathetic it is now, when people are like "Oh God, what are you going to do, where are you going to keep all that?" I can make a stack of nine terabytes right now that are barely functional. We have to keep in mind that this is a whole cage at Exodus dedicated to GeoCities. <br />
<br />
So we ended up with it, and we're sitting on it and then GeoCities went down and it was the usual like "Who cares?" and I put up those animated GIFs by basically going through this terabyte of data, and coming up with a collection of interesting GIFs. <br />
<br />
But then a year went by, and I thought, you know, we've got to get attention. We've got to remind people that GeoCities went down for no fucking reason. <br />
<br />
So we did what anyone would do, we torrented it. So we put ourselves up on The Pirate Bay, we have a 641GB - because it compressed well - torrent, with 7,854 files that were basically 7zs, and we put that sucker up. And we torrented it. We were until recently, I think it's changed, but we were until recently the second largest torrent that ever appeared. The number one was high-definition versions of all of the World Cup games. So, nice counterpoint, huh? World Cup games, GeoCities. <br />
<br />
Because we knew, right, that by warezing GeoCities, this would bring this massive amount of embarrassment back, and it did. We got all these great interviews and I'd put up this thing saying "Yahoo found a way--" I was quoted by Time for this -- "Yahoo found the way to destroy the most amount of history in the shortest amount of time." Alright, excellent. <br />
<br />
Then Yahoo Video announced it was going down. We got that! Well, we were helped, of course, because Yahoo Video sucks. But it was 10 terabytes. We just downloaded all of the video. Everything, all of it. Luckily they used numeric IDs, very easy to go through. We ended up downloading it, and we're in the process of getting it all back up again somewhere. <br />
<br />
And yes, a lot of it is spam, some of it is really terrible. It seems to be really popular with people in countries that are not America, who were using it as a way to host stuff that needed bandwidth they didn't have to pay for, in places where bandwidth was expensive. But we've got this thing, and we were able to, you know, basically do this through the gift of volunteers who all work together very hard. <br />
<br />
These are all the things that Yahoo! has shut down in the last four years. Just so you understand. Yahoo Briefcase, where you were able to store 10 megs whenever you wanted and get it from anywhere via FTP. They shut it down. Why? No spare USB drive? Content Mash, some of these you won't know.<br />
<br />
Yahoo Pets was funded by Purina for a five year contract and on the day that the contract ran out they shut it down and redirected it to Yahoo Women. I don't know why, but they did. <br />
<br />
But it was a case where there was this secret contract, and when I say they shut down, I mean with no warning. One day it was there, the next day it was gone. It had pet pictures, it had forums in it, everything, gone. Totally gone. So in other words I'm saying Yahoo blows, OK? It is a fucking clown car. I wouldn't trust them with like a backup of my nutsack, because these guys... This is a case where a company went speculatively into user-generated content and when they decided it wasn't worth it any more, they got out of it. <br />
<br />
Like getting into a library and deciding "Oh, library business isn't working for us" and burning it to the ground. OK? And I've got people, I've got people who come to me and say "Yahoo was great to work at" - yeah, everywhere is great to work at; if you're working for an arsonist company, it's awesome! "We were trying to change the world", well, you sorta did. Awesome. Now you're using Bing as your search engine and you suck. <br />
<br />
Friendster went down this year. Friendster we only got 12 million of the 112 million accounts because it turns out that digital cameras really came into prominence in 2005, but we basically got most of the larger earlier sites from it, a lot of people, you know it's funny 'cause if you talk to people about Friendster, they know Friendster, they're like "Yeah I remember that, it was like a social network, I think I was on it" and we feel that like collecting this material -- and believe me it was a javascript nightmare, we had to write customized scripts to go through the javascript, negotiate it, we all had to create accounts on Friendster which was still allowed, up to its death. <br />
<br />
And all of us were like "Hobbies? Downloading Friendster." With some really funny giving the finger profile. "I'm downloading Friendster, that's why I'm here, what are you up to? Oh, you like cats, that's great." <br />
<br />
Not everyone likes what we're doing. This is Lulu Poetry - poetry.com. This was exciting. They gave everyone two weeks to get off. Fourteen million poems. And as you can see their suggestion was "Well, be sure to copy and paste your poems before we go down." So you can always remember them. <br />
<br />
"We're unable to save any customer information or poetry." Actually you don't hear that line a whole lot, do you? "I'm sorry, your poetry is unavailable." So we were like, okay, well we can do this. So we did. Within a short time we start getting banned. Locked out. So we switch IPs. They block out more. We have someone switch IPs, and we watch as an entire range is blocked out. We realized that there is a person or persons there, stopping us. <br />
<br />
So we switch to S3. We switched to Amazon instances and we start doing it that way. And they run out of ways to block us out. They threaten one of my members, who is in Australia, and fifteen, with legal whatever, and I'm trying to explain to him that a cease and desist is not a lawsuit. He's fifteen years old, he's in Australia, he's probably not going to be flown to America for downloading poetry without a license. Interpol is not going to get in on this shit. <br />
<br />
But, you know, whatever, make the kid nervous, but basically they were like, "No, no, you don't understand, we're actually going to be bought out, so this will survive, so we know what you're doing, it's OK" and I was like, "It's great you said that - fuck you!" and just kept going at it. So it turned out, as far as we can determine, they were on one shared server in some space, that's what their problem was, we were essentially, like I said, doing a distributed preservation of service attack. We took these guys out. We were taking them out, making a duplicate of them. <br />
<br />
So we got lots of millions and millions of poems, which we're holding on to, because they did go down, by the way at 12.01 of the day they announced they were going down. I could tell that there was like one guy, we were watching cases on a couple of nights where we would actually watch the blocking slow down, because we figured out people got tired and went home. So by three in the morning our Australia guys are like "Oh wow, free and clear!" <br />
<br />
And so, let me tell you nothing is greater than when you give somebody a goal that has blanketed on it some sort of moral righteousness. It does lead to some awesome shit, and fire. Anyway, so basically they, you know, they might come back, they might not. So sometimes you've got to be a little rough. We try not to be. <br />
<br />
So Google Video announced it was going down. Now we were scared, because Google Video is huge. So we did it anyway. We started downloading; we were at somewhere like the ninth or tenth terabyte. <br />
<br />
Downloading GeoCities, we had a distributed system that would download from these things, and we were like "Yup, gonna save it, what an embarrassment. Google Video, what an embarrassment, what YouTube, why don't you just transfer stuff to YouTube. What the hell's wrong with you? Why are we doing this? What's wrong with you, Yahoo?" -- Uh, sorry, Yahoo 2: Google! -- And basically, a week or two in, they give us an update. And the update says "OK, we're not going down. We're going to add a 'Migrate to YouTube' button, we're going to do what we need to do." <br />
<br />
So basically, what they-- what I found out later is that internally in Google they went "Look what Archive Team is saying, this is really embarrassing. We have to stop this." And so they went "Yeah, got it", so they went, basically, "Okay, we give up." And so we won. One of the few times we won. It's nice to win. <br />
<br />
So what's this? Guy with nine-track tapes. Guy's got Usenet. That is Usenet from 1981 to 1991, all right. So here's what happened. So basically, this stuff is what became Deja, went into the Google purchase of Deja News, which then became Google Groups, and Google then proceeded to ruin it. Okay, if you know anything about Usenet, they ruined it. Unequivocally. <br />
<br />
And we made a very important discovery. A lot of people are starting to think of Google as some sort of archive or library. That they're storing all this data, they're running ads, they're really storing all this data. But Google is a library or an archive in the same way that a supermarket is a food museum. These guys are basically gonna do whatever they gotta do. <br />
<br />
So we took it, we took it back: we found the original archives that Google had taken, and we put them up on Archive.org. The UTZOO tapes. And people are doing things with them - an Archive Team member did this. Not really associated with us, but did it and is part of us. Olduse.net. He is doing a real-time posting of Usenet with a thirty-year lag. So you can go on, connect with a newsreader, and go experience what it was like. <br />
<br />
This particular one says "Perhaps you're not aware of it, but a new Star Trek movie is in the making this summer. While that is all well and good, there is a problem with it. It seems that Leonard Nimoy will no longer be available for the role of Spock and thus they're killing him off. Loyal Trekkies here have taken great offense to this, as well they should! There are better ways to remove the necessity of having the character present." And anyway, so that's good. And then we never saw Leonard Nimoy again. So you can connect to this thing and be able to use it right now. That's living history. <br />
<br />
Telehack.com, I don't have time to go into it. Telehack.com. OK, it looks like a command line, it is an entire world. Years of command-line history at that site. Spend a little time on it, get an account, it's unbelievable. All resuscitated from old archives. <br />
<br />
And I don't mind being made fun of, all of this whole thing. This is the stuff for my teammates who make fun of me, we believe in lots of great, crazy humor. <br />
<br />
So where else do we go from here? Well, we've got a group called WikiTeam. They've written a tool that, from the outside, downloads tons and tons of wikis, and we've been putting them up. So we've got WikiTeam: if your wiki is dying, we're gonna grab it, and we make the tools available so you can grab any wiki from anywhere else. <br />
<br />
URL Team, because URL shortening was a fucking awful idea. URL shortening is like DNS retarded. You're gonna let some third party generally decide what everything you do directs to. You are stupid. I understand use of URL shorteners on a per-site basis, like Flickr, that's flic.kr, whatever, that makes sense, but this is awful because if these things go down, now anyone looking at the history, it's like people are talking in cryptographic code. "Here's this awesome site... thing you can't figure out." <br />
<br />
So we have been taking them on, this group over here has been basically taking all these old URL shorteners and turning them into archives that we then torrent. So, I also want to point out, this is Chronomex, Jeroenz0r, Soultcer, Swebb, Underscor, not me. This is not just a Jason Scott project. These guys, I just, I'm planning a fire, but these guys are going somewhere with it and they don't always need me, that's very important. <br />
<br />
What else? What's left to save? I don't know if many of you know Len Sassaman. He was a wonderful cryptographer, wonderful human being who took his own life just a very short period of time ago. His wake was just last week, actually, and he was a big presenter at DEFCON. If you start going through the archives, you will find him there, he's a brilliant person who left a lot of friends and a lot of memories. He's a wonderful guy, and his widow said to me "Can you archive him?" <br />
<br />
So I started a project called "Away From Keyboard". This is on Archive.org. And what we're doing is we are collecting artifacts from various people who have passed on to turn them into collections of files that at least we can get some piece of these people who are gone, and we can remember them and be able to build from them. So it doesn't always have to be about websites, it doesn't always have to be discs that I'm trying to save here. It's everything, and I think that's just critical. <br />
<br />
What did we learn here? <br />
<br />
We learned I'm really loud. We learned a lot of profanity, but hopefully you'll look at a web site that's going down, into something that's dying, and you'll say to yourself "Okay, that's not just a piece of crap, that's something that's meaningful to people. That's something that matters to people." <br />
<br />
And I hope that that, you know, piece, sticks with you, if nothing else I did. So my final question for you is, "OK, is anyone here from Archive Team?" No, fuck you. You are all in Archive Team. I officially deputize you. You are allowed to be in Archive Team. Go where you need to, keep backups, store them somewhere, throw them over to someplace you don't remember, give me copies later, or give my successor copies later, about these things, because it turns out what you walk in is history, because the hardest part of history is to be there when it happens. That's the hardest part of any historian's job. And by being what you are right now doing, is you are in companies, you are with people, you are visiting things, and you are with history. So please, save it for the future because the future will wonder why the fuck we all thought Under Construction GIFs were so important. <br />
<br />
Sometimes I put that as my user profile when I'm downloading a site. It hits the message, doesn't it? I'm just saying, if you take your site down, I'll see you there. <br />
<br />
Jason Scott, Archive Team. Thank you for coming. And please, one more bit. Dedicated to Tim Recher who unfortunately died before really giving a big presentation here, so I'm just proud to say, my secret co-presenter Tim Recher. Thank you so much.<br />
<br />
{{Navigation box}}</div>Ola norskhttps://wiki.archiveteam.org/index.php?title=Talk:ArchiveTeam_Warrior&diff=30175Talk:ArchiveTeam Warrior2017-12-08T01:03:19Z<p>Ola norsk: /* Some notes on using KVM/QUEMU */ typo fix</p>
<hr />
<div>==Some raw notes==<br />
<br />
--[[User:BlueMax|BlueMax]] 06:41, 3 August 2012 (EDT)<br />
<br />
Here's my plaintext unformatted dump of what I've written so far: <br />
<br />
Setting up the Warrior:<br />
<br />
* Your machine needs to be relatively powerful with an internet connection to run the Warrior. You'll need at least:<br />
<br />
- a 2.0 GHz dual-core processor<br />
- 2 gigabytes of RAM<br />
- 100GB of hard drive space<br />
- a fast internet connection (at the very least 1 Mbps down/up)<br />
<br />
* Although we do recommend something more powerful (or if you plan to do more with your machine while you also use the Warrior):<br />
<br />
- a quad core processor<br />
- 4 gigabytes of RAM<br />
- as fast an internet connection as possible (we love universities!)<br />
- most or all of your background downloading/uploading programs turned off (such as torrent clients)<br />
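A rough pre-flight check for the two minimums that Python's standard library can read portably (CPU count and free disk space) might look like the sketch below. It is an illustration, not an official Warrior tool. The 100GB threshold comes from the list above and is treated as decimal gigabytes (an assumption). RAM, clock speed and connection speed are not checked.

```python
import os
import shutil

MIN_CPUS = 2                  # "dual core" from the minimum list
MIN_FREE_BYTES = 100 * 10**9  # "100GB of hard drive space", assumed decimal GB


def check_minimums(cpu_count: int, free_bytes: int) -> list[str]:
    """Return human-readable problems; an empty list means both minimums are met."""
    problems = []
    if cpu_count < MIN_CPUS:
        problems.append(f"only {cpu_count} CPU(s), need at least {MIN_CPUS}")
    if free_bytes < MIN_FREE_BYTES:
        problems.append(
            f"only {free_bytes / 10**9:.1f} GB free, need at least {MIN_FREE_BYTES // 10**9} GB"
        )
    return problems


def check_this_machine(path: str = ".") -> list[str]:
    """Check the machine this runs on, using the disk that holds `path`.

    os.cpu_count() reports logical CPUs, so hyper-threading can make a
    single physical core look like two.
    """
    return check_minimums(os.cpu_count() or 1, shutil.disk_usage(path).free)
```

Run `check_this_machine()` from the disk where VirtualBox will store the VM, and compare the result against the list above.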
<br />
* To use the Warrior, we recommend using VirtualBox. VirtualBox is a program available for all major operating systems that emulates a desktop computer. The Warrior is a preconfigured Linux system (or "virtual appliance") designed to automate the process of downloading and uploading data for an ArchiveTeam project.<br />
<br />
* Visit https://www.virtualbox.org/wiki/Downloads and download the latest version for your platform, and:<br />
<br />
For Windows:<br />
<br />
- Open the installer and follow the setup as prompted. Default settings will be fine for the most part. Do not deselect the VirtualBox Networking option where it appears, as it is required to run the Warrior.<br />
<br />
For Mac OS X:<br />
<br />
-TBA<br />
<br />
For Linux systems:<br />
<br />
-TBA<br />
<br />
* Once you have VirtualBox installed, open it via your preferred method (command line, shortcut or what have you).<br />
<br />
* Now download the Warrior using the link below. The current version of the Warrior is a 167MB .ova (virtual appliance) file. You'll download this file and import it into VirtualBox. Save it somewhere you will be able to access it.<br />
<br />
* When the main menu has opened, click File > Import Appliance. You'll be given a pop-up window. Click the Choose box and navigate to the .ova file you just downloaded, and select it, then click Next. <br />
<br />
* Do not uncheck either of the tick boxes in the next window, simply click Import. It may take a few minutes for the next part of the process to take place.<br />
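If you'd rather script the import than click through File > Import Appliance, VirtualBox's command-line tool `VBoxManage` can do the same job with `VBoxManage import`. The Python sketch below only builds the argument lists. The .ova filename and the alternative VM name in the examples are hypothetical, and the sketch assumes `VBoxManage` is on your PATH when you run the commands.

```python
import shlex
from typing import Optional


def import_command(ova_path: str, vm_name: Optional[str] = None) -> list[str]:
    """Argument list for importing the Warrior appliance with VBoxManage.

    If vm_name is given, virtual system 0 in the .ova (the Warrior) is
    renamed during import via --vsys 0 --vmname.
    """
    cmd = ["VBoxManage", "import", ova_path]
    if vm_name:
        cmd += ["--vsys", "0", "--vmname", vm_name]
    return cmd


def as_shell_line(cmd: list[str]) -> str:
    """Render an argument list as a copy-pasteable POSIX shell line."""
    return shlex.join(cmd)
```

For example, `subprocess.run(import_command("archiveteam-warrior-v2.ova"), check=True)` runs the import. Printing `as_shell_line(...)` instead lets you check the command before running it.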
<br />
Now to boot the ArchiveTeam Warrior and get it working on a project.<br />
<br />
* Double-click the new option that has just appeared in the main VirtualBox window. It should have the name "archiveteam-warrior-2" or similar. A new popup window will appear if you have done it right.<br />
<br />
* While the system boots in the background, a few VirtualBox pop-up messages may appear. Feel free to just click "OK" on them. There should be no need to touch any of the options or press any keys on your keyboard until the warrior has booted up.<br />
<br />
* You'll eventually be presented with a screen that says "Configure your warrior via the web interface." Minimize the VirtualBox window on your desktop (if you are unable to move your mouse, press the Right Control key on your keyboard). <br />
<br />
* Open your choice of web browser. We require a modern web browser (the latest versions of Firefox and Chrome will work; we do not support IE, on account of not wanting to be suicidal). Enter "http://localhost:8001" without quotes into the address bar, and press Enter.<br />
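To check from a script whether the web interface is answering (for example, before opening a browser), a plain HTTP request to port 8001 is enough. The sketch below uses only Python's standard library. It treats a successful response as "up" and everything else, including refused connections and timeouts, as "not up". You can pass the VM's own IP address instead of localhost on setups where, as described further down this page, localhost does not reach it.

```python
import http.client
import urllib.error
import urllib.request


def panel_url(host: str = "localhost", port: int = 8001) -> str:
    """URL of the Warrior web interface."""
    return f"http://{host}:{port}/"


def panel_is_up(host: str = "localhost", port: int = 8001, timeout: float = 5.0) -> bool:
    """True if the web interface answers with a successful HTTP response."""
    try:
        # urlopen follows redirects; HTTP error statuses raise HTTPError (a URLError).
        with urllib.request.urlopen(panel_url(host, port), timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, unknown host, timeout, error status or a garbled reply.
        return False
```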
<br />
* If you've done this correctly you should see the ArchiveTeam Warrior page. On the left side of the screen you'll see several options, including "All projects" and "Your settings". We want to set up your settings first, so click "Your settings".<br />
<br />
* Enter your nickname in the first box so that we can identify who you are on our tracker. Only use letters and numbers (Ph1shF00d would work, but Ph*shF**d wouldn't, for example).<br />
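The letters-and-numbers rule is easy to check before you type a name in. The sketch below is minimal and assumes the rule means the ASCII letters A-Z and a-z and the digits 0-9. The page doesn't say whether accented letters are allowed, so it errs on the strict side.

```python
import re

# ASCII letters and digits only, at least one character.
_NICKNAME = re.compile(r"[A-Za-z0-9]+")


def is_valid_nickname(nick: str) -> bool:
    """True if nick uses only A-Z, a-z and 0-9 and is not empty.

    str.isalnum() is not used because it also accepts non-ASCII letters such as "é".
    """
    return _NICKNAME.fullmatch(nick) is not None
```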
<br />
* The second box is how many items will download at a time. You may put this up as high as 6 if you have a very speedy internet connection, but slower connections may want to stick to the default selection.<br />
<br />
* You can leave the rest of the settings as they are. Click "Save settings" once you're done and then click "All projects" on the left pane.<br />
<br />
* You'll be presented with a list of projects on the right pane of the browser window. If you just want the Warrior to do what the ArchiveTeam wants it to, simply click the "Work on this project" button to the right of the "ArchiveTeam's Choice" project. Your browser will be redirected to the "Current Project" tab and the Warrior will start work on what the main project for ArchiveTeam currently is. If you see a bunch of black windows scrolling down your screen, your Warrior is working as intended and you're free to leave your computer, or do other things. (You may close the web browser window as well, it won't affect the Warrior).<br />
<br />
* If you want to select a different project, simply go to the "All projects" page again and select which one you'd like. Stopping your Warrior is just as simple, go to the All projects page and click "Stop this project" at the top of the page (under "Your current project").<br />
<br />
* To shut down the ArchiveTeam Warrior, click the "Shut down" button on the page, and close the webpage. Eventually the VirtualBox window will close (in the background if you minimised it) and you can close the main VirtualBox window.<br />
<br />
==Some additional notes==<br />
<br />
Here's some of the things I noticed when using the warrior:<br />
<br />
* use the archiveteam-warrior-v2 .ovf from archive.org, not the v1 linked from the article!<br />
* if you use VMware, ignore the warning about the .ovf file not passing validation<br />
** however, before starting the VM, you have to do these steps<br />
***remove the VM from the list of favorites on the left side (simply select it and hit DEL -- don't worry, you won't delete it and will add it again later)<br />
***edit the *.vmx file (virtual machine config file)<br />
**** If on OS X, this can be found by right-clicking on the .vmwarevm file that VMware generates and selecting "Show package contents" - [[User:Machawk1|Machawk1]] 15:54, 5 August 2014 (EDT)<br />
***change all lines that start with "ide1:1" to "ide1:0" (i.e. replace the second 1 with a 0). This is because the .ovf file specifies the second hard disk as secondary slave, which won't work in VMware if you don't configure a secondary master first.<br />
***re-add the vmx file to VMware. Either double-click it, or drag and drop it to the area where you deleted it earlier.<br />
* The start screen will tell you to open "http://localhost:8001". At least on VMware (and I guess also on VirtualBox, but I don't know) this will not work. Instead, do the following:<br />
**Press Alt-F3 to get to the 3rd console<br />
**log in as "root" with password "archiveteam"<br />
**type "ifconfig" and note the IP address given under "eth0"<br />
**enter "http://x.x.x.x:8001" in your browser (where x.x.x.x is the IP address you noted)<br />
**(I really hope the login screen will, at some point in time, display the external IP Address instead of "localhost")<br />
<br />
--[[User:Darkstar|Darkstar]] 18:55, 9 August 2012 (EDT)<br />
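The VMX edit described above is a plain prefix substitution, so it can be scripted. The Python sketch below assumes a UTF-8 .vmx file. It keeps a .bak copy of the original and changes only lines that begin with ide1:1. As in the manual steps, remove the VM from VMware's list before running it and re-add the .vmx afterwards.

```python
from pathlib import Path

OLD_PREFIX = "ide1:1"  # second disk as secondary slave (what the .ovf specifies)
NEW_PREFIX = "ide1:0"  # secondary master, which VMware accepts


def move_disk_to_secondary_master(vmx_text: str) -> str:
    """Return vmx_text with every line starting "ide1:1" rewritten to start "ide1:0".

    All other lines are copied unchanged, including their line endings.
    """
    out = []
    for line in vmx_text.splitlines(keepends=True):
        if line.startswith(OLD_PREFIX):
            line = NEW_PREFIX + line[len(OLD_PREFIX):]
        out.append(line)
    return "".join(out)


def patch_vmx_file(vmx_path: str) -> None:
    """Apply the rewrite in place, saving the original as <name>.vmx.bak first."""
    path = Path(vmx_path)
    original = path.read_text(encoding="utf-8")
    path.with_name(path.name + ".bak").write_text(original, encoding="utf-8")
    path.write_text(move_disk_to_secondary_master(original), encoding="utf-8")
```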
<br />
== Autorun for lazy people ==<br />
<br />
I've given up running the warrior on Fedora because I have to fix the VirtualBox packages manually every time I update the kernel, but I'm now trying on an Ubuntu machine I don't use personally. Something that the instructions should cover, at some point, is how to set it up so that it's started automatically at system boot. I found only slightly complicated instructions around the web, but also some indication that it may be much easier with VirtualBox 4.2 (Ubuntu repos currently have 4.1, AFAICS). --[[User:Nemo_bis|Nemo]] 07:55, 12 June 2013 (EDT)<br />
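Whatever boot hook you end up using (a cron @reboot entry, an init script, or the VirtualBox 4.2 route mentioned above), the command it has to run is `VBoxManage startvm <name> --type headless`, which starts the VM without opening a window. Run it as the user who owns the VM, because VirtualBox registers VMs per user. The sketch below is in Python. The VM name "archiveteam-warrior-2" is taken from the setup notes at the top of this page and may differ on your machine. The function simply reports failure if VBoxManage isn't installed.

```python
import shutil
import subprocess


def headless_start_command(vm_name: str) -> list[str]:
    """VBoxManage arguments that boot a VM in the background (no window)."""
    return ["VBoxManage", "startvm", vm_name, "--type", "headless"]


def start_warrior(vm_name: str = "archiveteam-warrior-2") -> bool:
    """Start the Warrior headless; return True if VBoxManage reported success."""
    if shutil.which("VBoxManage") is None:
        return False  # VirtualBox not installed, or not on PATH
    result = subprocess.run(headless_start_command(vm_name), capture_output=True, text=True)
    return result.returncode == 0
```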
<br />
== Lifetime stats page for the warrior? ==<br />
<br />
It'd be nice if the warrior had a lifetime bandwidth stats page.<br />
<br />
== AWS image ==<br />
<br />
To get some additional visibility and make life easier for occasional users of the Warrior on AWS, it would be nice to get an ArchiveTeam Warrior image in the [https://aws.amazon.com/marketplace/ AWS marketplace] (for free, of course). Also in other directories and for other hosting providers, of course, if someone prefers contributing to those. --[[User:Nemo_bis|Nemo]] 11:18, 23 December 2013 (EST)<br />
<br />
== Viewing the Control Panel on an external computer ==<br />
<br />
If you run the Warrior VM on an external computer, it might be useful to view the control panel from your main computer.<br />
<br />
To do this, on the remote computer, go to VirtualBox, select the Warrior VM, Settings, and then Network.<br />
<br />
If it is not already expanded, click Advanced and you should then see a button that says Port Forwarding. Click it.<br />
<br />
In the Web Interface row, find the Host IP column, change it to 0.0.0.0, and press OK.<br />
<br />
On your main computer, you can now type in the remote computer's local IP address into the web browser with the port number to view its stats.<br />
<br />
Example: http://192.168.1.57:8001<br />
<br />
--[[User:Crypto|Crypto]] 16:48, 10 August 2014 (EDT)<br />
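The same change can be made from the command line on the remote computer, which helps if it has no screen attached. The sketch below builds two `VBoxManage modifyvm` calls: one deletes the existing rule, the other re-adds it with host IP 0.0.0.0. It assumes the rule is named exactly "Web Interface", as in the dialog, and that the VM is powered off, since modifyvm is meant for VMs that aren't running.

```python
RULE_NAME = "Web Interface"  # assumed to match the row name in the Port Forwarding dialog
GUEST_PORT = 8001            # the web interface's port inside the Warrior VM


def natpf_rule(name: str, host_ip: str, host_port: int, guest_port: int, guest_ip: str = "") -> str:
    """A --natpf1 rule string: name,protocol,host-ip,host-port,guest-ip,guest-port."""
    return f"{name},tcp,{host_ip},{host_port},{guest_ip},{guest_port}"


def expose_panel_commands(vm_name: str, host_port: int = 8001) -> list[list[str]]:
    """Replace the web-interface rule with one listening on all host addresses.

    With 0.0.0.0, any machine that can reach this computer can open the panel.
    """
    return [
        ["VBoxManage", "modifyvm", vm_name, "--natpf1", "delete", RULE_NAME],
        ["VBoxManage", "modifyvm", vm_name, "--natpf1",
         natpf_rule(RULE_NAME, "0.0.0.0", host_port, GUEST_PORT)],
    ]
```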
<br />
== Running more than one warrior on a computer (VirtualBox) ==<br />
<br />
Sometimes the situation might arise that you want to run two or even more warriors on one machine. For this example, we are going to assume you are already running one warrior, and you want to add another one.<br />
<br />
Please note: make sure you have enough resources to run multiple warrior VMs. Each warrior will use 400 MB of RAM and at most 60 GB of disk space.<br />
<br />
First off, you are going to need to import a second warrior instance. If you still have the .ova file you used to import the original warrior, you can import it again.<br />
<br />
While importing, make sure to change the name.<br />
<br />
Once imported, go to the second warrior and go to settings.<br />
<br />
In the settings menu, go to Network, then click the Advanced Tab, and click on Port Forwarding.<br />
<br />
Once in the Port Forwarding menu, change the host port to 8002, but leave the guest one alone.<br />
<br />
Done! You can boot up the second warrior and administer it through port 8002. You may change the host port to whatever you want; just leave the guest port at 8001.<br />
<br />
--[[User:Crypto|Crypto]] 02:55, 11 August 2014 (EDT)<br />
<br />
That's very useful, but instead of importing a new warrior I just right-clicked on my original one and chose Clone. I created as many warriors as my laptop could afford (taking its specs into consideration) and then followed your steps about Port Forwarding. It's a very easy and straightforward process.<br />
<br />
--[[User:PanoIgano|PanoIgano]] 16:17, 08 May 2015 (EDT)<br />
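Whether you re-import or clone, each extra warrior needs its own host port while the guest port stays at 8001. The Python sketch below builds the `VBoxManage` commands for the cloning route. For each clone it runs clonevm, deletes the inherited forwarding rule, and re-adds the rule with the next free host port. Several details are assumptions, not anything the Warrior defines. The clone names ("<source>-2", "<source>-3", ...) are hypothetical. The rule name "Web Interface" is assumed from the dialog. The 127.0.0.1 host IP keeps each panel local to the host; use 0.0.0.0 as described in the previous section to expose it on the network.

```python
RULE_NAME = "Web Interface"  # assumed name of the Warrior's port-forwarding rule
GUEST_PORT = 8001            # the web interface's port inside every Warrior VM


def extra_warrior_commands(source_vm: str, count: int, first_host_port: int = 8002,
                           host_ip: str = "127.0.0.1") -> list[list[str]]:
    """VBoxManage commands that clone source_vm `count` times and give each
    clone its own host port (8002, 8003, ...) while the guest port stays 8001."""
    cmds = []
    for i in range(count):
        clone = f"{source_vm}-{i + 2}"  # hypothetical naming: <source>-2, <source>-3, ...
        host_port = first_host_port + i
        cmds.append(["VBoxManage", "clonevm", source_vm, "--name", clone, "--register"])
        # The clone inherits the original rule (host port 8001), so replace it.
        cmds.append(["VBoxManage", "modifyvm", clone, "--natpf1", "delete", RULE_NAME])
        cmds.append(["VBoxManage", "modifyvm", clone, "--natpf1",
                     f"{RULE_NAME},tcp,{host_ip},{host_port},,{GUEST_PORT}"])
    return cmds
```

Run the lists in order, e.g. with `subprocess.run(cmd, check=True)`. To be safe, shut the source VM down first.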
<br />
== Fix broken anchor links ==<br />
<br />
Previous proposal "Split the FAQ into "Problems" and "I'm curious" sections" was done. It has left broken HTML anchor URLs that previously linked to specific sections of the guide e.g. http://archiveteam.org/index.php?title=Warrior#Help.21_The_warrior_is_eating_all_my_bandwidth.21 from here: reddit /help_us_archive_the_steam_users_forum_aka_spuf/<br />
Is there a way to specifically add just anchors to a page?<br />
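One possible workaround (a suggestion, not something agreed on here): place an empty span element carrying the old id where the old heading used to be, so old links still land in the right spot. The old ids follow MediaWiki's legacy encoding, in which spaces become underscores and other special characters become ".XX" hex codes, as in "Help.21_..." above. The Python sketch below approximates that encoding to regenerate an old id from a former heading title; it is a simplified approximation rather than MediaWiki's own code, and the function names are hypothetical.<br />
<br />
```python
from urllib.parse import quote


def legacy_anchor(heading: str) -> str:
    """Approximate MediaWiki's legacy section-id encoding.

    Spaces become underscores, every character outside letters, digits
    and "_.-~" is percent-encoded, then "%" is replaced by "."
    (colons are kept as-is).
    """
    encoded = quote(heading.strip().replace(" ", "_"), safe="")
    return encoded.replace("%3A", ":").replace("%", ".")


def anchor_span(old_heading: str) -> str:
    """Wikitext for an empty anchor that keeps an old section link working."""
    return f'<span id="{legacy_anchor(old_heading)}"></span>'


if __name__ == "__main__":
    # prints <span id="Help.21_The_warrior_is_eating_all_my_bandwidth.21"></span>
    print(anchor_span("Help! The warrior is eating all my bandwidth!"))
```
<br />
The generated span would be pasted just above the section that now holds the old content.<br />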
<br />
[[User:VADemon|VADemon]] ([[User talk:VADemon|talk]]) 14:48, 5 October 2017 (EDT)<br />
<br />
== Some notes on using KVM/QEMU ==<br />
<br />
Putting some links here to look into further.<br />
<br />
* [https://wiki.hackzine.org/sysadmin/kvm-import-ova.html KVM: Importing an OVA appliance]<br />
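In general terms, an .ova file is a plain tar archive holding an .ovf descriptor plus one or more .vmdk disk images, and qemu-img can convert a .vmdk into a qcow2 image that KVM/QEMU can boot. The Python sketch below lists the disk images inside an .ova and prints (without running) the matching qemu-img commands; the output file names are assumptions, and the VM definition itself (memory, network, forwarding to port 8001) still has to be set up separately.<br />
<br />
```python
import shlex
import sys
import tarfile
from pathlib import PurePosixPath


def qemu_convert_commands(ova_path):
    """List qemu-img commands that turn each .vmdk inside an .ova into a .qcow2 image."""
    # An .ova is a tar archive (.ovf descriptor + disk images), so tarfile can read it.
    with tarfile.open(ova_path) as ova:
        disks = [m.name for m in ova.getmembers()
                 if m.isfile() and m.name.lower().endswith(".vmdk")]
    return [["qemu-img", "convert", "-f", "vmdk", "-O", "qcow2",
             disk, str(PurePosixPath(disk).with_suffix(".qcow2"))]
            for disk in disks]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Extract the .ova first (e.g. with `tar xf`), then run the printed commands.
    for cmd in qemu_convert_commands(sys.argv[1]):
        print(shlex.join(cmd))
```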
<br />
----</div>Ola norskhttps://wiki.archiveteam.org/index.php?title=Talk:ArchiveTeam_Warrior&diff=30173Talk:ArchiveTeam Warrior2017-12-08T00:22:50Z<p>Ola norsk: /* Some notes on using KVM/QUEMU */ new section</p>
<hr />
<div>==Some raw notes==<br />
<br />
--[[User:BlueMax|BlueMax]] 06:41, 3 August 2012 (EDT)<br />
<br />
Here's my plaintext unformatted dump of what I've written so far: <br />
<br />
Setting up the Warrior:<br />
<br />
* Your machine needs to be reasonably powerful and have a decent internet connection to run the Warrior. You'll need at least:<br />
<br />
- a 2.0 GHz dual-core processor<br />
- 2 gigabytes of RAM<br />
- 100 GB of hard drive space<br />
- a fast internet connection (at the very least 1 Mbps down/up)<br />
<br />
* We do recommend something more powerful, especially if you plan to do more with your machine while the Warrior is running:<br />
<br />
- a quad-core processor<br />
- 4 gigabytes of RAM<br />
- as fast an internet connection as possible (we love universities!)<br />
- most or all of your background downloading/uploading programs turned off (such as torrent clients)<br />
<br />
* To use the Warrior, we recommend VirtualBox. VirtualBox is a program, available for all major operating systems, that emulates a desktop computer. The Warrior is a preconfigured Linux system (or "virtual appliance") designed to automate the process of downloading and uploading data for an ArchiveTeam project.<br />
<br />
* Visit https://www.virtualbox.org/wiki/Downloads and download the latest version for your platform, and:<br />
<br />
For Windows:<br />
<br />
-Open the installer and follow the setup as prompted. The default settings will be fine for the most part. Do not deselect the VirtualBox Networking option where it appears, as it is required to run the Warrior.<br />
<br />
For Mac OS X:<br />
<br />
-TBA<br />
<br />
For Linux systems:<br />
<br />
-TBA<br />
<br />
* Once you have VirtualBox installed, open it via your preferred method (command line, shortcut or what have you).<br />
<br />
* Now download the Warrior using the link below. The current version of the Warrior is a 167MB .ova (virtual appliance) file. You'll download this file and import it into VirtualBox. Save it somewhere you will be able to access it.<br />
<br />
* When the main menu has opened, click File > Import Appliance. You'll be given a pop-up window. Click the Choose box and navigate to the .ova file you just downloaded, and select it, then click Next. <br />
<br />
* Do not uncheck either of the tick boxes in the next window, simply click Import. It may take a few minutes for the next part of the process to take place.<br />
<br />
Now to boot the ArchiveTeam Warrior and get it working on a project.<br />
<br />
* Double-click the new option that has just appeared in the main VirtualBox window. It should have the name "archiveteam-warrior-2" or similar. A new popup window will appear if you have done it right.<br />
<br />
* While the system boots in the background, a few VirtualBox pop-up messages may appear. Feel free to just click "OK" on them. There should be no need to touch any of the options or press any keys on your keyboard until the warrior has booted up.<br />
<br />
* You'll eventually be presented with a screen that says "Configure your warrior via the web interface." Minimize the VirtualBox window on your desktop (if you are unable to move your mouse, press the Right Control key on your keyboard).<br />
<br />
* Open your choice of web browser. We require a modern web browser (the latest versions of Firefox and Chrome will work; we do not support IE, on account of not being willing to be suicidal). Enter "http://localhost:8001" without quotes into the address bar, and press Enter.<br />
<br />
* If you've done this correctly you should see the ArchiveTeam Warrior page. On the left side of the screen you'll see several options, including "All projects" and "Your settings". We want to set up your settings first, so click "Your settings".<br />
<br />
* Enter your nickname in the first box so that we can identify who you are on our tracker. Only use letters and numbers (Ph1shF00d would work, but Ph*shF**d wouldn't, for example).<br />
<br />
* The second box is how many items will download at a time. You may put this up as high as 6 if you have a very speedy internet connection, but slower connections may want to stick to the default selection.<br />
<br />
* You can leave the rest of the settings as they are. Click "Save settings" once you're done and then click "All projects" on the left pane.<br />
<br />
* You'll be presented with a list of projects on the right pane of the browser window. If you just want the Warrior to do what the ArchiveTeam wants it to, simply click the "Work on this project" button to the right of the "ArchiveTeam's Choice" project. Your browser will be redirected to the "Current Project" tab and the Warrior will start work on what the main project for ArchiveTeam currently is. If you see a bunch of black windows scrolling down your screen, your Warrior is working as intended and you're free to leave your computer, or do other things. (You may close the web browser window as well, it won't affect the Warrior).<br />
<br />
* If you want to select a different project, simply go to the "All projects" page again and select the one you'd like. Stopping your Warrior is just as simple: go to the All projects page and click "Stop this project" at the top of the page (under "Your current project").<br />
<br />
* To shut down the ArchiveTeam Warrior, click the "Shut down" button on the page, and close the webpage. Eventually the VirtualBox window will close (in the background if you minimised it) and you can close the main VirtualBox window.<br />
<br />
==Some additional notes==<br />
<br />
Here's some of the things I noticed when using the warrior:<br />
<br />
* use the archiveteam-warrior-v2 .ovf from archive.org, not the v1 linked from the article!<br />
* if you use vmware, ignore the warning about the .ovf file not passing validation<br />
** however, before starting the VM, you have to do these steps<br />
***remove the VM from the list of favorites on the left side (simply select it and hit DEL -- don't worry, you won't delete it and will add it again later)<br />
***edit the *.vmx file (virtual machine config file)<br />
**** If on OS X, this can be found by right-clicking on the .vmwarevm file that VMWare generates and selecting "Show package contents" - [[User:Machawk1|Machawk1]] 15:54, 5 August 2014 (EDT)<br />
***change all lines that start with "ide1:1" to "ide1:0" (i.e. replace the second 1 with a 0). This is because the .ovf file specifies the second harddisk as secondary slave, which won't work in VMware if you don't configure a secondary master first.<br />
***re-add the vmx file to VMware. Either double-click it, or drag and drop it onto the area where you deleted it earlier.<br />
* The start screen will tell you to open "http://localhost:8001". At least on VMware (and I guess also on VirtualBox, but I don't know) this will not work. Instead, do the following:<br />
**Press Alt-F3 to get to the 3rd console<br />
**login as "root" with password "archiveteam"<br />
**type "ifconfig" and note the IP address given under "eth0"<br />
**enter "http://x.x.x.x:8001" in your browser (where x.x.x.x is the IP address you noted)<br />
**(I really hope the login screen will, at some point in time, display the external IP Address instead of "localhost")<br />
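The .vmx edit described above (changing every line that starts with "ide1:1" to "ide1:0") can also be scripted. A minimal Python sketch, assuming the VM has already been removed from VMware's list as described and that the .vmx file is UTF-8 text; the file name in the usage note is a placeholder, and a .bak copy is saved before rewriting.<br />
<br />
```python
import re
import shutil
from pathlib import Path


def fix_vmx_text(text: str) -> str:
    """Rename every 'ide1:1...' key to 'ide1:0...' (secondary slave -> secondary master)."""
    # Anchored at the start of each line, so e.g. 'ide0:1' entries are left alone.
    return re.sub(r"^ide1:1", "ide1:0", text, flags=re.MULTILINE)


def fix_vmx_file(path: str) -> Path:
    """Rewrite a .vmx file in place after saving a .bak copy; returns the backup path."""
    vmx = Path(path)
    backup = vmx.with_name(vmx.name + ".bak")
    shutil.copy2(vmx, backup)
    vmx.write_text(fix_vmx_text(vmx.read_text(encoding="utf-8")), encoding="utf-8")
    return backup
```
<br />
Usage would be something like fix_vmx_file("archiveteam-warrior-v2.vmx"), then re-adding the .vmx file to VMware as above.<br />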
<br />
--[[User:Darkstar|Darkstar]] 18:55, 9 August 2012 (EDT)<br />
<br />
== Autorun for lazy people ==<br />
<br />
I've given up running the warrior on Fedora because I have to fix the VirtualBox packages manually every time I update the kernel, but I'm now trying on an Ubuntu machine I don't use personally. Something the instructions should cover at some point is how to set it up so that it starts automatically at system boot. I found only slightly complicated instructions around the web, but also some indication that it may be much easier with VirtualBox 4.2 (Ubuntu repos currently have 4.1, AFAICS). --[[User:Nemo_bis|Nemo]] 07:55, 12 June 2013 (EDT)<br />
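One common approach on systemd-based distributions (an assumption here; not tested with the Warrior) is a small service unit that starts the VM headless at boot with "VBoxManage startvm ... --type headless" and saves its state at shutdown with "VBoxManage controlvm ... savestate", so the next start resumes it. The Python sketch below only renders such a unit file; the VM name, the user that owns the VM, and the /usr/bin/VBoxManage path are placeholders to adjust.<br />
<br />
```python
# Hypothetical systemd unit for starting a warrior VM at boot (sketch, untested).
UNIT_TEMPLATE = """\
[Unit]
Description=ArchiveTeam Warrior VM ({vm})
After=network-online.target vboxdrv.service
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
User={user}
ExecStart={vboxmanage} startvm "{vm}" --type headless
ExecStop={vboxmanage} controlvm "{vm}" savestate

[Install]
WantedBy=multi-user.target
"""


def warrior_unit(vm_name: str, user: str,
                 vboxmanage: str = "/usr/bin/VBoxManage") -> str:
    """Render the unit file text for one VM owned by `user`."""
    return UNIT_TEMPLATE.format(vm=vm_name, user=user, vboxmanage=vboxmanage)


if __name__ == "__main__":
    print(warrior_unit("archiveteam-warrior-2", "warrior"))
```
<br />
The output would be saved as, for example, /etc/systemd/system/warrior.service (a hypothetical name) and enabled with "systemctl enable warrior.service". The User= line matters because VirtualBox VMs are registered per user.<br />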
<br />
== Lifetime stats page for the warrior? ==<br />
<br />
It'd be nice if the warrior had a lifetime bandwidth stats page.<br />
<br />
== AWS image ==<br />
<br />
To get some additional visibility and make life easier for the occasional users of the Warrior on AWS, it would be nice to get an ArchiveTeam Warrior image into the [https://aws.amazon.com/marketplace/ AWS marketplace] (for free, of course). The same goes for other directories and hosting providers, if someone prefers contributing to those. --[[User:Nemo_bis|Nemo]] 11:18, 23 December 2013 (EST)<br />
<br />
== Viewing the Control Panel on an external computer ==<br />
<br />
If you run the Warrior VM on an external computer, it might be useful to view the control panel from your main computer.<br />
<br />
To do this, on the remote computer, go to VirtualBox, select the Warrior VM, Settings, and then Network.<br />
<br />
If it is not already expanded, click Advanced and you should then see a button that says Port Forwarding. Click it.<br />
<br />
In the Web Interface row, find the Host IP column, change it to 0.0.0.0, and press OK.<br />
<br />
On your main computer, you can now type the remote computer's local IP address, followed by the port number, into the web browser to view its stats.<br />
<br />
Example: http://192.168.1.57:8001<br />
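To check from the main computer that the forwarded port is actually reachable (for example, before suspecting the browser), a plain TCP connection attempt is enough. A minimal Python sketch; the function name is hypothetical and the address in the usage note is the example one from above.<br />
<br />
```python
import socket


def warrior_reachable(host: str, port: int = 8001, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, host unreachable, ...
        return False
```
<br />
For example, warrior_reachable("192.168.1.57") returning False usually means the Host IP is still restricted, the VM is not running, or a firewall on the remote computer is blocking port 8001.<br />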
<br />
--[[User:Crypto|Crypto]] 16:48, 10 August 2014 (EDT)<br />
<br />
== Running more than one warrior on a computer (VirtualBox) ==<br />
<br />
Sometimes you may want to run two or even more warriors on one machine. This example assumes you are already running one warrior and want to add another one.<br />
<br />
Please note: make sure you have enough resources to run multiple warrior VMs. Each warrior uses about 400 MB of RAM and at most 60 GB of disk space.<br />
<br />
First, you need to import another warrior instance. If you still have the .ova file you used to import the original warrior, you can simply import it again.<br />
<br />
While importing, make sure to give it a different name.<br />
<br />
Once it is imported, select the second warrior and open its Settings.<br />
<br />
In the Settings menu, go to Network, click the Advanced tab, and click Port Forwarding.<br />
<br />
In the Port Forwarding menu, change the host port to 8002, but leave the guest port alone.<br />
<br />
Done! You can boot up the second warrior and administer it through port 8002 (http://localhost:8002). You may change the host port to whatever you want; just leave the guest port at 8001.<br />
<br />
--[[User:Crypto|Crypto]] 02:55, 11 August 2014 (EDT)<br />
<br />
That's very useful, but instead of importing a new warrior I just right-clicked on my original one and chose Clone. I created as many warriors as my laptop could handle (taking its specs into consideration) and then followed your steps about port forwarding. It's a very easy and straightforward process.<br />
<br />
--[[User:PanoIgano|PanoIgano]] 16:17, 08 May 2015 (EDT)<br />
<br />
== Fix broken anchor links ==<br />
<br />
The previous proposal, "Split the FAQ into "Problems" and "I'm curious" sections", has been carried out. It left broken HTML anchor URLs that previously linked to specific sections of the guide, e.g. http://archiveteam.org/index.php?title=Warrior#Help.21_The_warrior_is_eating_all_my_bandwidth.21 from here: reddit /help_us_archive_the_steam_users_forum_aka_spuf/<br />
Is there a way to specifically add just anchors to a page?<br />
<br />
[[User:VADemon|VADemon]] ([[User talk:VADemon|talk]]) 14:48, 5 October 2017 (EDT)<br />
<br />
== Some notes on using KVM/QEMU ==<br />
<br />
Putting some links here to look into further.<br />
<br />
* [https://wiki.hackzine.org/sysadmin/kvm-import-ova.html KVM: Importing an OVA appliance]<br />
<br />
----</div>Ola norskhttps://wiki.archiveteam.org/index.php?title=Talk:Software&diff=30162Talk:Software2017-12-05T19:50:34Z<p>Ola norsk: create discussion page + added personal notes about webrecorders online service</p>
<hr />
<div><br />
<br />
<br />
== Some pros and cons notes about my own experience using Webrecorder's online service ==<br />
<br />
Note that I've only found and used this service today.<br />
<br />
* It's absolutely free (from what I could tell).<br />
* '''Capture sessions seem to be limited to ~3 hours of run time.'''<br />
** This is what made it not ideal for my intended use. Though it seems perfect for many other situations.<br />
* (But) it seems to offer the ability to 'patch' previous WARC captures.<br />
* It does auto-scrolling.<br />
* It seems to offer various browser versions to choose from.<br />
--[[User:Ola norsk|Ola norsk]] ([[User talk:Ola norsk|talk]]) 14:50, 5 December 2017 (EST)</div>Ola norskhttps://wiki.archiveteam.org/index.php?title=Software&diff=30161Software2017-12-05T19:24:58Z<p>Ola norsk: /* Hosted tools */ added link to webrecorder github repo</p>
<hr />
<div>__NOTOC__<br />
== WARC Tools ==<br />
[[The WARC Ecosystem]] includes information on wget, Heritrix and many small but handy tools to create, read and process WARC files.<br />
<br />
== General Tools ==<br />
<br />
* [[Wget|GNU WGET]]<br />
** Backing up a WordPress site: "wget --no-parent --no-clobber --html-extension --recursive --convert-links --page-requisites --user=<username> --password=<password> <path>"<br />
* [http://curl.haxx.se/ cURL]<br />
* [http://www.httrack.com/ HTTrack] - [[HTTrack options]]<br />
* [http://pavuk.sourceforge.net/ Pavuk] -- a bit flaky, but very flexible<br />
* http://warrick.cs.odu.edu/warrick.html<br />
* [http://www.crummy.com/software/BeautifulSoup/ Beautiful Soup] - Python library for web scraping<br />
* [http://scrapy.org/ Scrapy] - Fast Python library for web scraping<br />
* [http://splinter.cobrateam.info/ Splinter] - Web app acceptance testing library for Python -- could be used along with a scraping lib to extract data from hard-to-reach places<br />
* [http://sourceforge.net/projects/wilise/ WiLiSe] '''Wi'''ki'''Li'''nk '''Se'''arch - Python script to get links to specific pages of a site through the search function of a wiki ([[wikipedia:MediaWiki|MediaWiki]]-type) that has [http://www.mediawiki.org/wiki/Api.php api.php] accessible or the [http://www.mediawiki.org/wiki/Extension:LinkSearch LinkSearch extension] enabled (the project is still very immature, and at the moment the code is only available in [http://sourceforge.net/p/wilise/code/1/tree/code/trunk/ this SVN repository]).<br />
* [[Mobile Phone Applications]] -- some notes on preserving old versions of mobile apps<br />
* [https://freeyourstuff.cc/ freeyourstuff.cc] -- Extensible open-source ([https://github.com/eloquence/freeyourstuff.cc source]) Chrome plugin allowing users to export their own content (reviews, posts, etc.). Exports to JSON and can optionally publish to freeyourstuff.cc and its mirrors under a Creative Commons CC0 license. Supports Yelp, [[IMDB]], TripAdvisor, [[Amazon]], GoodReads, and [[Quora]] as of 22:52, 11 June 2016 (EDT)<br />
<br />
== Hosted tools ==<br />
* [http://www.pinboard.in Pinboard] is a convenient social bookmarking service that will [http://pinboard.in/blog/153/ archive copies of all your bookmarks] for online viewing. The catch is that it costs $9.25 just to join, plus $25/year for the archival feature and you can only download archives of your 25 most recent bookmarks in a particular category. This may pose problems if you ever need to get your data out in a hurry.<br />
<br />
* [https://webrecorder.io Webrecorder] is both a tool to create high-fidelity, interactive web archives of any web site you browse in WARC format and a platform to make those recordings accessible. See their [https://webrecorder.io/_faq FAQ page] or their [https://github.com/webrecorder/webrecorder GitHub repo] for more information.<br />
<br />
== Site-Specific ==<br />
<br />
* [[Google]]<br />
* [[Livejournal]]<br />
* [[Twitter]]<br />
* [http://code.google.com/p/somaseek/ SomaFM]<br />
* http://www.allmytweets.net/ - Download the last 3,200 tweets from any user.<br />
<br />
== Format Specific ==<br />
<br />
* [http://www.shlock.co.uk/Utils/OmniFlop/OmniFlop.htm OmniFlop]<br />
<br />
== Web scraping ==<br />
<br />
* See [[Site exploration]]<br />
<br />
{{Navigation pager<br />
| previous = Why Back Up?<br />
| next = Formats<br />
}}<br />
{{Navigation box}}<br />
<br />
[[Category:Tools| ]]</div>Ola norskhttps://wiki.archiveteam.org/index.php?title=Software&diff=30159Software2017-12-05T19:11:59Z<p>Ola norsk: /* Hosted tools */ added link to webrecorder FAQ page</p>
<hr />
<div>__NOTOC__<br />
== WARC Tools ==<br />
[[The WARC Ecosystem]] includes information on wget, Heritrix and many small but handy tools to create, read and process WARC files.<br />
<br />
== General Tools ==<br />
<br />
* [[Wget|GNU WGET]]<br />
** Backing up a WordPress site: "wget --no-parent --no-clobber --html-extension --recursive --convert-links --page-requisites --user=<username> --password=<password> <path>"<br />
* [http://curl.haxx.se/ cURL]<br />
* [http://www.httrack.com/ HTTrack] - [[HTTrack options]]<br />
* [http://pavuk.sourceforge.net/ Pavuk] -- a bit flaky, but very flexible<br />
* http://warrick.cs.odu.edu/warrick.html<br />
* [http://www.crummy.com/software/BeautifulSoup/ Beautiful Soup] - Python library for web scraping<br />
* [http://scrapy.org/ Scrapy] - Fast Python library for web scraping<br />
* [http://splinter.cobrateam.info/ Splinter] - Web app acceptance testing library for Python -- could be used along with a scraping lib to extract data from hard-to-reach places<br />
* [http://sourceforge.net/projects/wilise/ WiLiSe] '''Wi'''ki'''Li'''nk '''Se'''arch - Python script to get links to specific pages of a site through the search function of a wiki ([[wikipedia:MediaWiki|MediaWiki]]-type) that has [http://www.mediawiki.org/wiki/Api.php api.php] accessible or the [http://www.mediawiki.org/wiki/Extension:LinkSearch LinkSearch extension] enabled (the project is still very immature, and at the moment the code is only available in [http://sourceforge.net/p/wilise/code/1/tree/code/trunk/ this SVN repository]).<br />
* [[Mobile Phone Applications]] -- some notes on preserving old versions of mobile apps<br />
* [https://freeyourstuff.cc/ freeyourstuff.cc] -- Extensible open-source ([https://github.com/eloquence/freeyourstuff.cc source]) Chrome plugin allowing users to export their own content (reviews, posts, etc.). Exports to JSON and can optionally publish to freeyourstuff.cc and its mirrors under a Creative Commons CC0 license. Supports Yelp, [[IMDB]], TripAdvisor, [[Amazon]], GoodReads, and [[Quora]] as of 22:52, 11 June 2016 (EDT)<br />
<br />
== Hosted tools ==<br />
* [http://www.pinboard.in Pinboard] is a convenient social bookmarking service that will [http://pinboard.in/blog/153/ archive copies of all your bookmarks] for online viewing. The catch is that it costs $9.25 just to join, plus $25/year for the archival feature and you can only download archives of your 25 most recent bookmarks in a particular category. This may pose problems if you ever need to get your data out in a hurry.<br />
<br />
* [https://webrecorder.io Webrecorder] is both a tool to create high-fidelity, interactive web archives of any web site you browse and a platform to make those recordings accessible. See their [https://webrecorder.io/_faq FAQ page] for more information.<br />
<br />
== Site-Specific ==<br />
<br />
* [[Google]]<br />
* [[Livejournal]]<br />
* [[Twitter]]<br />
* [http://code.google.com/p/somaseek/ SomaFM]<br />
* http://www.allmytweets.net/ - Download the last 3,200 tweets from any user.<br />
<br />
== Format Specific ==<br />
<br />
* [http://www.shlock.co.uk/Utils/OmniFlop/OmniFlop.htm OmniFlop]<br />
<br />
== Web scraping ==<br />
<br />
* See [[Site exploration]]<br />
<br />
{{Navigation pager<br />
| previous = Why Back Up?<br />
| next = Formats<br />
}}<br />
{{Navigation box}}<br />
<br />
[[Category:Tools| ]]</div>Ola norskhttps://wiki.archiveteam.org/index.php?title=Software&diff=30158Software2017-12-05T19:08:55Z<p>Ola norsk: /* Hosted tools */ distinguished the two sites by bullet points</p>
<hr />
<div>__NOTOC__<br />
== WARC Tools ==<br />
[[The WARC Ecosystem]] includes information on wget, Heritrix and many small but handy tools to create, read and process WARC files.<br />
<br />
== General Tools ==<br />
<br />
* [[Wget|GNU WGET]]<br />
** Backing up a WordPress site: "wget --no-parent --no-clobber --html-extension --recursive --convert-links --page-requisites --user=<username> --password=<password> <path>"<br />
* [http://curl.haxx.se/ cURL]<br />
* [http://www.httrack.com/ HTTrack] - [[HTTrack options]]<br />
* [http://pavuk.sourceforge.net/ Pavuk] -- a bit flaky, but very flexible<br />
* http://warrick.cs.odu.edu/warrick.html<br />
* [http://www.crummy.com/software/BeautifulSoup/ Beautiful Soup] - Python library for web scraping<br />
* [http://scrapy.org/ Scrapy] - Fast Python library for web scraping<br />
* [http://splinter.cobrateam.info/ Splinter] - Web app acceptance testing library for Python -- could be used along with a scraping lib to extract data from hard-to-reach places<br />
* [http://sourceforge.net/projects/wilise/ WiLiSe] '''Wi'''ki'''Li'''nk '''Se'''arch - Python script to get links to specific pages of a site through the search function of a wiki ([[wikipedia:MediaWiki|MediaWiki]]-type) that has [http://www.mediawiki.org/wiki/Api.php api.php] accessible or the [http://www.mediawiki.org/wiki/Extension:LinkSearch LinkSearch extension] enabled (the project is still very immature, and at the moment the code is only available in [http://sourceforge.net/p/wilise/code/1/tree/code/trunk/ this SVN repository]).<br />
* [[Mobile Phone Applications]] -- some notes on preserving old versions of mobile apps<br />
* [https://freeyourstuff.cc/ freeyourstuff.cc] -- Extensible open-source ([https://github.com/eloquence/freeyourstuff.cc source]) Chrome plugin allowing users to export their own content (reviews, posts, etc.). Exports to JSON and can optionally publish to freeyourstuff.cc and its mirrors under a Creative Commons CC0 license. Supports Yelp, [[IMDB]], TripAdvisor, [[Amazon]], GoodReads, and [[Quora]] as of 22:52, 11 June 2016 (EDT)<br />
<br />
== Hosted tools ==<br />
* [http://www.pinboard.in Pinboard] is a convenient social bookmarking service that will [http://pinboard.in/blog/153/ archive copies of all your bookmarks] for online viewing. The catch is that it costs $9.25 just to join, plus $25/year for the archival feature and you can only download archives of your 25 most recent bookmarks in a particular category. This may pose problems if you ever need to get your data out in a hurry.<br />
<br />
* [https://webrecorder.io Webrecorder] is both a tool to create high-fidelity, interactive web archives of any web site you browse and a platform to make those recordings accessible.<br />
<br />
== Site-Specific ==<br />
<br />
* [[Google]]<br />
* [[Livejournal]]<br />
* [[Twitter]]<br />
* [http://code.google.com/p/somaseek/ SomaFM]<br />
* http://www.allmytweets.net/ - Download the last 3,200 tweets from any user.<br />
<br />
== Format Specific ==<br />
<br />
* [http://www.shlock.co.uk/Utils/OmniFlop/OmniFlop.htm OmniFlop]<br />
<br />
== Web scraping ==<br />
<br />
* See [[Site exploration]]<br />
<br />
{{Navigation pager<br />
| previous = Why Back Up?<br />
| next = Formats<br />
}}<br />
{{Navigation box}}<br />
<br />
[[Category:Tools| ]]</div>Ola norskhttps://wiki.archiveteam.org/index.php?title=Software&diff=30157Software2017-12-05T19:07:08Z<p>Ola norsk: /* Hosted tools */ Added Webrecorder to hosted tools</p>
<hr />
<div>__NOTOC__<br />
== WARC Tools ==<br />
[[The WARC Ecosystem]] includes information on wget, Heritrix and many small but handy tools to create, read and process WARC files.<br />
<br />
== General Tools ==<br />
<br />
* [[Wget|GNU WGET]]<br />
** Backing up a WordPress site: "wget --no-parent --no-clobber --html-extension --recursive --convert-links --page-requisites --user=<username> --password=<password> <path>"<br />
* [http://curl.haxx.se/ cURL]<br />
* [http://www.httrack.com/ HTTrack] - [[HTTrack options]]<br />
* [http://pavuk.sourceforge.net/ Pavuk] -- a bit flaky, but very flexible<br />
* http://warrick.cs.odu.edu/warrick.html<br />
* [http://www.crummy.com/software/BeautifulSoup/ Beautiful Soup] - Python library for web scraping<br />
* [http://scrapy.org/ Scrapy] - Fast Python library for web scraping<br />
* [http://splinter.cobrateam.info/ Splinter] - Web app acceptance testing library for Python -- could be used along with a scraping lib to extract data from hard-to-reach places<br />
* [http://sourceforge.net/projects/wilise/ WiLiSe] '''Wi'''ki'''Li'''nk '''Se'''arch - Python script to get links to specific pages of a site through the search function of a wiki ([[wikipedia:MediaWiki|MediaWiki]]-type) that has [http://www.mediawiki.org/wiki/Api.php api.php] accessible or the [http://www.mediawiki.org/wiki/Extension:LinkSearch LinkSearch extension] enabled (the project is still very immature, and at the moment the code is only available in [http://sourceforge.net/p/wilise/code/1/tree/code/trunk/ this SVN repository]).<br />
* [[Mobile Phone Applications]] -- some notes on preserving old versions of mobile apps<br />
* [https://freeyourstuff.cc/ freeyourstuff.cc] -- Extensible open-source ([https://github.com/eloquence/freeyourstuff.cc source]) Chrome plugin allowing users to export their own content (reviews, posts, etc.). Exports to JSON and can optionally publish to freeyourstuff.cc and its mirrors under a Creative Commons CC0 license. Supports Yelp, [[IMDB]], TripAdvisor, [[Amazon]], GoodReads, and [[Quora]] as of 22:52, 11 June 2016 (EDT)<br />
<br />
== Hosted tools ==<br />
[http://www.pinboard.in Pinboard] is a convenient social bookmarking service that will [http://pinboard.in/blog/153/ archive copies of all your bookmarks] for online viewing. The catch is that it costs $9.25 just to join, plus $25/year for the archival feature and you can only download archives of your 25 most recent bookmarks in a particular category. This may pose problems if you ever need to get your data out in a hurry.<br />
<br />
[https://webrecorder.io Webrecorder] is both a tool to create high-fidelity, interactive web archives of any web site you browse and a platform to make those recordings accessible.<br />
<br />
== Site-Specific ==<br />
<br />
* [[Google]]<br />
* [[Livejournal]]<br />
* [[Twitter]]<br />
* [http://code.google.com/p/somaseek/ SomaFM]<br />
* http://www.allmytweets.net/ - Download the last 3,200 tweets from any user.<br />
<br />
== Format Specific ==<br />
<br />
* [http://www.shlock.co.uk/Utils/OmniFlop/OmniFlop.htm OmniFlop]<br />
<br />
== Web scraping ==<br />
<br />
* See [[Site exploration]]<br />
<br />
{{Navigation pager<br />
| previous = Why Back Up?<br />
| next = Formats<br />
}}<br />
{{Navigation box}}<br />
<br />
[[Category:Tools| ]]</div>Ola norskhttps://wiki.archiveteam.org/index.php?title=File:Jscott_geocities.png&diff=29923File:Jscott geocities.png2017-11-26T03:45:00Z<p>Ola norsk: Ola norsk uploaded a new version of File:Jscott geocities.png</p>
<hr />
<div>a quote or "fan art" i guess..definitely a quote though</div>Ola norskhttps://wiki.archiveteam.org/index.php?title=File:Jscott_geocities.png&diff=29922File:Jscott geocities.png2017-11-26T03:41:38Z<p>Ola norsk: a quote or "fan art" i guess..definitely a quote though</p>
<hr />
<div>a quote or "fan art" i guess..definitely a quote though</div>Ola norskhttps://wiki.archiveteam.org/index.php?title=User:Ola_norsk&diff=29917User:Ola norsk2017-11-22T23:17:35Z<p>Ola norsk: </p>
<hr />
<div>Merely, drunkard'ly and merrily, a '''Loudmouth'''!<br />
(Including some scarce knowledge of preservation techniques for timber-constructions: ''Soak it in chlorine, and let it dry out!'')</div>Ola norskhttps://wiki.archiveteam.org/index.php?title=User:Ola_norsk&diff=29916User:Ola norsk2017-11-22T23:15:41Z<p>Ola norsk: </p>
<hr />
<div>Merely, drunkard'ly and merrily, a '''Loudmouth'''!<br />
(Including some scarce knowledge of preservation techniques of timber-constructions techniques; Dunk it in chlorine, and let it dry out!)</div>Ola norskhttps://wiki.archiveteam.org/index.php?title=User:Ola_norsk&diff=29915User:Ola norsk2017-11-22T23:06:25Z<p>Ola norsk: </p>
<hr />
<div>Merely, drunkard'ly and merrily, a Loudmouth!<br />
(Including some scarce knowledge of preservation of timber-constructions techniques; Dunk it in chlorine, and let it dry out!)</div>Ola norskhttps://wiki.archiveteam.org/index.php?title=User:Ola_norsk&diff=29914User:Ola norsk2017-11-22T22:57:05Z<p>Ola norsk: Created page with "Merely a loudmouth"</p>
<hr />
<div>Merely a loudmouth</div>Ola norsk