Difference between revisions of "User talk:Archive Maniac"




Revision as of 15:53, 20 October 2014

Hi Archive Maniac, if you're having trouble, it's best to chat on IRC on the #archiveteam channel on EFnet where more people can help. I don't know how to upload wikis so you will need to join the #wikiteam channel for help. Please be patient and leave your chat client connected to give someone time to answer. Thanks. Chfoo 01:37, 17 February 2014 (EST)

Hi, sorry for not responding to your earlier messages. I don't check the wiki for messages that often because Archive Team does all its discussion on IRC. There are no forums, unfortunately. If you have trouble with IRC, you can email me and I can get back to you sooner.

The best way to store your backups is to keep copies on multiple hard drives. Like VHS tapes and audio cassettes, CDs and DVDs wear out after a while. It's called disc rot. Although hard drives don't last long either, they hold much more data and are cheaper in the long run.

People who run the Warrior scripts manually usually have experience and money to spend on cloud computing for virtual hosts so they can run dozens of the scripts at once. This is why the people at the top of the Warrior leaderboards have gigabytes and gigabytes downloaded.

Archive Team already has a way for people to submit websites to be archived. It's called ArchiveBot and anyone can use it. All Archive Team files are placed into the archiveteam collection. Adding files to the collection is restricted since files in this collection show up in the Wayback Machine.

Regarding uploading things to Internet Archive, uploading archives with good conventions is excellent and I wish more people would take initiative and be proactive.

However, when uploading websites, you need to upload WARC files instead of a 7z file of the website. With wget, you'll need to use the --warc-file option. For example, --warc-file example will produce a WARC file called example.warc.gz. You want to use WARC files so the Wayback Machine can load them and show the archives properly.
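
A minimal sketch of such a wget call, wrapped in a small helper. The helper name grab_site, the WGET override variable and the choice of --mirror and --page-requisites are assumptions for illustration, not part of the advice above. Only --warc-file comes from it.

```shell
# Hypothetical helper: mirror a site with wget and record a WARC as it goes.
#   $1 = start URL
#   $2 = WARC base name (wget appends .warc.gz itself)
# Set WGET=echo to print the arguments instead of downloading (dry run).
grab_site() {
    "${WGET:-wget}" --mirror --page-requisites --warc-file="$2" "$1"
}
```

Calling grab_site http://example.com/ example should leave example.warc.gz next to the mirrored files, and that is the file to upload.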

I hope I answered your questions and sorry for missing your earlier messages. --- Chfoo 16:18, 12 April 2014 (EDT)

Some friendly words

The text in small letters is obsolete; see the update below it.

I don't like starting private conversations except about technical things. However, I've seen your strange activities on the ArchiveTeam IRC channels recently, and I can't help saying some words.

First, I won't ever be sarcastic or cynical with you. Some AT members may have been, but that's understandable. We have different amounts of patience. They have also made assumptions about your age; however, we have no information about that.

Seeing your reactions and activities, I think I can understand your behaviour. I used to make similar actions and reactions myself, so we have common traits in some way, if you don't mind me saying this.

I was lucky to be present when the initial affair happened. I read through the lines several times, but the only thing I could conclude is that you accidentally wrote those lines to that window (they were totally out of context and you said this yourself too), but you were immediately banned. I don't remember you asking too much, as you state on your user page. It is possible I didn't get something, and SketchCow and the others had their reasons to qualify you as "persona non grata", but I don't see them.

Either way, you shouldn't feel offended. If you had logged in with another nickname, no one would have ever remembered your earlier activity. Even if you had logged in as... you know how, even then, I'm sure, no one would have said a word against or about you, provided you acted normally.

Even now, you could see people tried to be friendly towards you. However, what you feel is that they have some kind of hatred against you, and you must take revenge. No. It's not true. People don't hate you, even now, and you don't need to take revenge. I hate to say this, but if you go on acting like this, then they may actually become fed up. But it's not too late to turn back from this crazy path.

What you have been doing is called "demonstrating" on Wikipedia. It is sad to see an otherwise valuable member do that. You seem to be a valuable member, doing useful things for/with/like ArchiveTeam. Please be collaborative and not disruptive. You don't have to say much, or even do much. I myself don't say or do too much (however, more and more as I've been an AT member for more and more time). I'm sure all (or at least 98% of) ArchiveTeam members count on your work and welcome you if you don't act in a kind of crazy way, if you don't mind me saying this.

And, one thing about SketchCow: he is neither a de jure nor a de facto leader of AT. He writes about himself: "While I am a (generally) beloved figure who is appreciated for his public speaking skills and snappy dressing, Archive Team has collectively disagreed with me and some projects have been approached completely different ways than I would have approached them." What's more, you don't have to talk to him. I myself haven't talked to him yet either, I just listen to him and agree or disagree with him privately.

You write on your user page that there are friendly people here. Definitely, more than you think. As I see it, almost every one of them. There are some who don't seem to be so well-mannered – but where aren't there people like them? They are good too, just not that patient, or they have their own problems or such. (SketchCow has really unique manners, some adorable, some maybe not, but the same could be said about any one of us.)

I want to assure you that you can ask me if you have questions or want to discuss something, and I won't try to get rid of you, and I will try not to hurt you with my words. And I want to encourage you to take part in ArchiveTeam's nice work. I've been in the group only for some months so far, but every day I know more and more about archiving, web, programming – and archiving is kind of fun, isn't it? Be sure your work is appreciated by everyone, just avoid demonstrating like today. Apart from today's demonstrative activities, your work (making website crawls, informing AT about closures, running warriors) is appreciated. I think everyone is ready to forget everything about this immediately if you return to that kind of work, with a calm tone. The one on your user page is a good starting point.

I know what it is like to be touchy. I am (or used to be) touchy myself. People forget and forgive, and we outgrow traits like that. So cheer up: ArchiveTeam awaits you in its journey and mission!

Yours truly, bzc6p (talk), 17 October 2014, 14:55 (UTC)

I studied your "history", the events preceding your ban. So basically the only problem was that you talked a lot off-topic on ArchiveTeam channels and asked many not-so-important questions.

About the first thing. It's no problem that you are chatty. You could have thought that these IRC channels are also meant for the kind of talk you initiated. You weren't very wrong about that, just a bit. You just need to accept that these channels are not completely like you imagined. It's not a problem with you, nor with the channel, but with the two together. You can't do much about it, but don't be angry with channel members. Nor are they angry with you, they just find what you were doing inappropriate.

About the second thing. For some of your questions, the previous paragraph applies. For the others: some answers you may find out yourself, some of them you don't necessarily need to know. No problem with curiosity, but members may find too many questions exhausting. I hope you understand this. (I say this while I myself tend to ask too many questions sometimes, to make sure, but I am also patient answering questions. Not all of us must be like me regarding this thing, it's understandable that some people don't like tons of questions.)

And about both things in general: too much text in IRC channels and logs makes the essence get lost. At least I think so. This is another reason why we should talk only about archiving-related stuff on AT IRC channels.

Still, I uphold much of what I wrote earlier. You shouldn't be cross with AT members, and especially shouldn't swear at them. If you consider what I wrote in the preceding paragraphs, you will be welcome on IRC even after these things. (Or, if you want to make sure, you can choose another nickname. That doesn't matter too much, I think.) Don't let revenge lead your actions. That's disruptive and counterproductive. None of us can do quality work if we don't listen to each other, study, and sometimes ask. We know more and more every day, and after a point we answer more than we ask. But only if we are collaborative. That's the way it goes.

I'm ready to answer your questions if I can, I think I won't run out of patience too early. (No problem if someone does, but then that person shouldn't be bothered too much.) You can use my talk page if you have questions you think I can answer.

I'm glad to see you didn't give up archiving, even if you communicated this on IRC in a quite provocative way. I want to repeat that you can't possibly do quality work if you ignore other, more experienced members. Don't get hurt if they say your product is not okay. What can anyone do with incompatible or corrupted or incomplete files? You should accept the pieces of advice. All of us do so. If anything, archiving is something you can't do with completely closed eyes and ears.

And please don't curse SketchCow or anyone else... We must conform to others' manners when we talk to them. They also do so when they talk to us. This is the way it goes, again. I'm sure you know how it feels to be hurt. Why would you hurt others then?

I myself feel that I must be careful when talking to some people, especially if they are much older than me or have strange manners. So do others when talking to us (e.g. not hurting us, being patient etc.). And, about mistakes, we all forget and forgive – and learn.

I know the things I just wrote may be seen as spam, or at least needless and off-topic and too personal for this wiki. However, I just wanted to tell you that your archiving efforts are appreciated, and with some experience you may soon become a valued member of ArchiveTeam, doing a lot of good stuff. You only need to be patient yourself, listen to others, read instructions and IRC, try things you are unsure of, and if it's important or you can't find out, ask. More or less this is what I've been doing, and I haven't had quarrels with others in AT so far, but I'm already on the level of being able to answer some questions and do good work (I think so).

I think I can tell you on behalf of ArchiveTeam that if you consider what I've written above, you'll be fine and your work will be welcome.

I hope we can count on you in the future. That's why I wrote this 10kb-ish post. (Sorry everyone for writing so much, this is one of my weaknesses.)

Yours truly, bzc6p (talk), 18 October 2014, 20:23 (UTC)

You are welcome. However, I think it would be too early and strange if I entered the channel saying "Hey guys, Dec-31-99 is sorry and wants you to forgive him"... It will resolve itself if you wait a couple of days. Then, if you want to tell them something important (in short, to make sure), they won't kick you out, I'm sure – provided you follow the guidelines others and I told you.
I'm sure that I'm not the only one who "understood your situation". Rather, I may be the only time-millionaire who can type 10kBs to "explain ArchiveTeam".
Well, the message "if you know any other Hungarian sites..." is addressed to Hungarian people in the first place, since they can find sunsetting sites more easily, you can guess why... but of course no one is excluded. I myself regularly check Google with keywords like "web site closes" (in Hungarian). (In fact, this is how I found Panoramio and alerted ArchiveTeam!) As for GPortál, it's a very big WYSIWYG website hosting service and has other services as well. I don't expect it to close without any notification, and if it is ever going to shut down, that will be a big thing and will make noise.
For the specific website you mentioned: if you want to archive that site (I don't have the time now, I'm busy with Demotiváló right now – and you could learn by grabbing this donkeykong site), you can do two things. One is that you pass it to ArchiveBot. I haven't used that, so you need to check out how it works. (My projects so far needed special care, I think ArchiveBot couldn't have done them by itself. But if it's a simple website without too much awful Javascript, hidden comments etc., it may be able to handle it.) The other thing is that you grab the website yourself. For that I recommend wpull, which is a wget-like program designed with creating WARC files in mind. I didn't check the website too deeply, but as far as I can see, website components reside under "donkeykong.gportal.hu" and "gportal.hu/portal/donkeykong". The wpull command I would try first:
wpull --accept-regex "donkeykong.gportal.hu|gportal.hu/portal/donkeykong" -o log.txt --no-warc-keep-log --recursive --level inf -p -H -Dgportal.hu --tries inf --no-robots --retry-connrefused --retry-dns-error --delete-after --warc-cdx --database DATABASEFILENAME --warc-file WARCFILENAME
where you choose DATABASEFILENAME and WARCFILENAME as you wish. The database file lets you continue the download; the only problem is that wpull then ignores the already existing WARC file (and overwrites it). If I archive a larger site, I prepare for this: I give the WARC filename the _01 postfix first, and if wpull gets stopped for some reason, I change the postfix to _02 etc., leaving the other options intact. Having several files is not too elegant, but later they may be merged together with some megawarc tool. But if you have a good internet connection (here the problem is that for some reason wpull pretends there is no connection when there is, which may be a bug) and the site is not that big, it may come down in one run – in that case you can omit the database file and the postfixes. The latter is the preferable way.
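The _01, _02 postfix routine can be automated with a few lines of shell. This is only a sketch: the function name next_warc_name is made up, and it assumes wpull writes NAME.warc.gz for --warc-file NAME (the default, compressed output).

```shell
# Hypothetical helper: print the first BASE_NN (01..99) for which no
# BASE_NN.warc.gz exists yet in the current directory.
next_warc_name() {
    base=$1
    n=1
    while [ "$n" -le 99 ]; do
        candidate=$(printf '%s_%02d' "$base" "$n")
        if [ ! -e "$candidate.warc.gz" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
        n=$((n + 1))
    done
    return 1  # all 99 postfixes are already taken
}

# Usage idea: keep the same database file on every restart and let the
# helper pick a fresh WARC name, e.g.
#   wpull ... --database site.db --warc-file "$(next_warc_name site)"
```

This way a restarted run never reuses the name of an existing WARC file, so nothing gets overwritten.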
Wpull documentation, including a manpage-style option overview: http://wpull.readthedocs.org
See The WARC Ecosystem for warc-tools.
If you want to test your WARC, try warc-proxy. Even ArchiveTeam uses that sometimes. I've read somewhere that one of your (?) WARCs couldn't be injected into the Wayback Machine for some reason. Well, if warc-proxy can read your WARC, that doesn't necessarily imply that Wayback will too, but we can hope.
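Before trying warc-proxy, a much cruder sanity check is possible with plain shell tools. This sketch (the function name looks_like_warc is made up) only confirms that the file is valid gzip at the start and that its first line begins with "WARC/", as a WARC record header does. It says nothing about whether Wayback will accept the file.

```shell
# Hypothetical helper: succeed if the gzip-compressed file $1 starts with
# a WARC record header line such as "WARC/1.0".
looks_like_warc() {
    # Decompress and look at the first line only.
    first_line=$(gzip -dc "$1" 2>/dev/null | head -n 1)
    case $first_line in
        WARC/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage idea:
#   looks_like_warc example.warc.gz && echo "starts like a WARC"
```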
These are all Linux tools. I don't know any tools for Windows. Software like HTTrack may be good at mirroring, but it doesn't speak WARC, and WARC is essential for the Wayback Machine.
bzc6p (talk) 19 October 2014, 22:25 (UTC+2)
wpull has just dropped Python 2 support (see http://wpull.readthedocs.org/en/master/changelog.html#id3).
You can run Python programs on Windows if you have Python and the other dependencies installed, can't you? (I haven't tried.)
bzc6p (talk) 20 October 2014, 17:52 (UTC+2)