https://wiki.archiveteam.org/api.php?action=feedcontributions&user=Kaz&feedformat=atomArchiveteam - User contributions [en]2024-03-29T08:08:21ZUser contributionsMediaWiki 1.37.1https://wiki.archiveteam.org/index.php?title=Current_Projects&diff=46270Current Projects2021-02-02T16:56:23Z<p>Kaz: Amendment to MediaFire tagline</p>
<hr />
<div>__NOTOC__<br />
== Archive Team recruiting ==<br />
* [[Dev|Want to code for Archive Team? Here's a starting point.]]<br />
* Help us: '''[[warrior|☞ Download and run your warrior ☜]]'''.<br><br />
* What's on: [https://tracker.archiveteam.org/ online tracker].<br><br />
<!--Combined project activity graphs [http://zeppelin.xrtc.net/corp.xrtc.net/shilling.corp.xrtc.net/project_items.html here].--><br />
<br />
== Warrior-based projects ==<br />
{{:CurrentWarriorProject}}<br />
<br />
<!-- Urgent projects --><br />
<!-- Long-term projects --><br />
* [[URLTeam]]: URL shorteners were a fucking awful idea. '''IRC Channel {{IRC|urlteam|network=hackint}}'''.<br />
<br />
''There will be fewer Warrior projects than usual due to the virtual appliance being unable to run many newer projects that utilize wget-at. It will take a little bit of time before an updated version is available that can run it.''<br />
<br />
=== Scripts only ===<br />
* [[MediaFire]]: [https://twitter.com/textfiles/status/1349516443654758401 Not 'at-risk' but grabbing speculatively to save historic files] '''IRC Channel {{IRC|mediaonfire|network=hackint}}'''.<br />
* [[.eu domains]]: The Brexit deal is done, and with that comes a purge of UK-based sites no longer eligible to use the .EU domain as of 2021. '''IRC Channel {{IRC|noteurdomain|network=hackint}}'''.<br />
* [[Endomondo]]: GPS workout tracker with optional social networking features, shutting down 2020-12-31. '''IRC Channel {{IRC|findelmundo|network=hackint}}'''.<br />
* [[Flash]] domains: An effort to preserve what remains of a storied legacy of websites hosting Adobe Flash Player content before the web industry takes it behind the shed. '''IRC Channel {{IRC|flashbang|network=hackint}}'''.<br />
* Classic [[Google Sites]]: Making more sites inaccessible to the public starting September 1, 2021. '''IRC Channel {{IRC|nearlylostmygoogles|network=hackint}}'''.<br />
* [[Reddit]]: Banning communities that generate bad PR for Reddit Inc. Currently grabbing ''new'' material. '''IRC Channel {{IRC|shreddit|network=hackint}}'''.<br />
* [[GitHub]]: Embraced-uh, I mean, bought by Microsoft. '''IRC Channel {{IRC|gitgud|network=hackint}}'''.<br />
* [[URLs]]: A random collection of stuff. '''IRC Channel {{IRC|//|network=hackint}}'''.<br />
<br />
== Manual projects ==<br />
* [[Coronavirus|2019-2020 coronavirus outbreak]]: Documenting and preserving data, events, and impacts of the virus on society. '''IRC Channel {{IRC|coronarchive|network=hackint}}'''<br />
* [[ArchiveBot]]: For those with lots of disk space, bandwidth and long-term commitment. '''IRC Channel {{IRC|archivebot|network=hackint}}'''.<br />
* [[WikiTeam]]: Saving wiki dumps (XML) and their external links for the Wayback Machine (WARC), as well as exporting MediaWiki databases. Permanent effort, [https://github.com/WikiTeam/wikiteam/wiki/Tutorial#I_have_no_shell_access_to_server everyone can help] (you choose the size of your downloads). '''IRC Channel {{IRC|wikiteam|network=hackint}}'''.<br />
* [[MP3.com]]: Digging through the Wayback Machine's archives to build a database of all the DAM CDs made available through the site.<br />
<br />
== Upcoming & proposed projects ==<br />
<!-- Websites you would like to have archived. Please create a wikipage about the project with information about the website (shutting down? (when), why should it be archived, etc.). --><br />
<!-- Top priority: could disappear anytime now --><br />
<!-- Shutting down, definite deadline given --><br />
* [[Halo]]: Back to finishing off unfinished business before Bungie kills the original website on February 9, 2021. '''IRC Channel {{IRC|yolohalo|network=hackint}}'''.<br />
* [[Webs]]: Vistaprint is killing off the Freewebs you knew from the 2000s on March 31, 2021, unless you pay up. '''IRC Channel {{IRC|webbed|network=hackint}}'''.<br />
* [[Periscope]]: Another Twitter acquisition, another shutdown. This time, its live-streamer gets to join Vine in the bin at the end of March. '''IRC Channel {{IRC|microscope|network=hackint}}'''.<br />
* [[Google Poly]]: A 3D art repository that Google will send to the trash compactors on June 30, 2021. New uploads cease April 30. '''IRC Channel {{IRC|polygone|network=hackint}}'''.<br />
* [[Chrome Web Store]]: Google has announced a timeline of policy changes that will lead to content being removed between December 1, 2020 and June 2022. '''IRC Channel {{IRC|chromeweblore|network=hackint}}'''.<br />
<!-- Shutting down, vague deadline given --><br />
* [[Kinja]]: Deleting all user pages, maybe? '''IRC Channel {{IRC|gokinjagokinjago|network=hackint}}'''.<br />
* [[Twitter]]: Deleting inactive accounts <s>2019-12-11</s> sometime. '''IRC Channel {{IRC|twitterdead}}'''.<br />
<!-- Shutting down, no deadline given --><br />
<!-- Archiving the archives --><br />
<!-- Misc. projects (unmaintained sites, distrust in owners) --><br />
* [[Imgur]]: Image hoster decided that using it for hosting images is not permitted. '''IRC Channel {{IRC|imgone}}'''.<br />
* [[JamiiForums]]: the Tanzanian government would like this gone. '''IRC Channel {{IRC|jammedforums}}'''.<br />
* [[LiveJournal]]: Very old, widely regarded as in decline, and has a lot of important stuff buried in it. '''IRC Channel {{IRC|recordedjournal}}'''.<br />
* [[Ownlog]]: Ownlog is losing popularity and support from its owners. '''IRC Channel {{IRC|pwnlog}}'''.<br />
* [[The Pirate Bay]]: Recently came back up, grabbing an archive for sanity's sake. '''IRC Channel {{IRC|yarharfiddlededee}}'''.<br />
* [[Valhalla]]: Where to store what even the [[Internet Archive]] doesn't have space for? '''IRC Channel {{IRC|huntinggrounds}}'''.<br />
* [[Giphy]]: Bought by Facebook, to be "integrated" (assimilated) into Instagram https://news.knowyourmeme.com/news/facebook-to-buy-giphy<br />
<br />
== Recently finished projects ==<br />
<!-- put projects here that are still in the tracker but not yet deleted so it won't confuse people --><br />
* [[SmackJeeves]]: Webcomics host being tossed into the incinerator on 2020-12-31. '''IRC Channel {{IRC|archiveteam-bs|network=hackint}}'''.<br />
* [[Voat]]: A reddit competitor from the Ellen Pao days gives its users a Christmas present: it's fucking dead! '''IRC Channel {{IRC|scrapevoat|network=hackint}}'''.<br />
<br />
== Hiatus / Missed the Mark ==<br />
* [[Fast.io]]: A CDN for cloud storage services which will evaporate completely on 2021-01-15. '''IRC Channel {{IRC|slowio|network=hackint}}'''.<br />
* [[Angelfire]]: Angelfire is a web hosting service that contains big chunks of early WWW history and has no proper backup. '''IRC Channel {{IRC|angelonfire}}'''.<br />
* [[Audit2014|Audit 2014]]: It's time to verify our shit. '''IRC Channel {{IRC|auditteam}}'''. THIS PROJECT IS ON HIATUS AND WILL BE RETURNED TO AS AUDIT2018.<br />
* [[Flickr]]: <s>[[Yahoo!]]</s> SmugMug decided to kill it after finding Yahoo!'s plans to do so before they were bought by Verizon. '''IRC Channel {{IRC|flickrfckr|network=hackint}}'''.<br />
* [[FTP]]: Help us find and download all FTP sites! '''IRC Channel {{IRC|effteepee|network=hackint}}'''.<br />
* [[Google Groups]]: "Gone within a year" ([[User:Jscott|SketchCow]], 2016-06-07).<br />
* [[Google News Archive]]: Let's store all newspapers at Google, WCGW? '''IRC Channel {{IRC|papersplease}}'''.<br />
* [[DevPort]]: This [http://developerportfolio.com/ portfolio SaaS provider] has [http://www.lowendtalk.com/discussion/65135/need-some-help-saas-provider-is-dead-but-my-site-is-still-up-how-should-i-grab-it reportedly] been having infrastructure issues, and removed their social media accounts. Possible impending shutdown.<br />
* [[INTERNETARCHIVE.BAK]]: Grab a slice of the big cake of [[Internet Archive|The Archive]]! '''IRC Channel {{IRC|internetarchive.bak}}'''.<br />
* [[ISP Hosting]]: Finding ISP web hosting services before the Grim Reaper finds them. '''IRC Channel {{IRC|webroasting|network=hackint}}'''.<br />
* [[NewsGrabber]]: Saving all news articles. <!-- Help with server power or by finding more news sites.-->Currently paused. '''IRC Channel {{IRC|newsgrabber|network=hackint}}'''.<br />
* [[Project Newsletter]]: Archiving e-newsletters, currently in development. '''IRC Channel {{IRC|projectnewsletter}}'''.<br />
* [[Quizlet]]: Flashcards and other learning tools '''IRC Channel {{IRC|quizletusin}}'''.<br />
* [[Tumblr]]: [[Yahoo!]] considered killing it, now Yahoo has been acquired and Verizon declared war on NSFW blogs. '''IRC Channel {{IRC|tumbledown|network=hackint}}'''.<br />
* [[yuku]]: Lately yuku is very unstable and hosting thousands of forums. Project currently paused. '''IRC Channel {{IRC|archiveteam|network=hackint}}'''.<br />
<br />
<small>ArchiveTeam primarily uses the hackint IRC network – ircs://irc.hackint.org:6697 (TLS required) – webchat: https://webirc.hackint.org/ – [[Archiveteam:IRC|More info]]</small><br />
<small>ArchiveTeam also has some channels left on the EFnet IRC network – irc://irc.efnet.org – webchat: http://chat.efnet.org:9090 – [[Archiveteam:IRC|More info]]</small><br></div>Kazhttps://wiki.archiveteam.org/index.php?title=Running_Archive_Team_Projects_with_Docker&diff=45974Running Archive Team Projects with Docker2020-12-23T22:02:19Z<p>Kaz: snip some incorrect info, also removed note about pinging for requeues</p>
<hr />
<div>{{notice|1=This page is currently in draft form and is being worked on. Instructions may be incomplete.}}<br />
<br />
<!-- ==What is the Archive Team Warrior?==<br />
--><br />
[[Image:Archive_team.png|100px|left]]<br />
[[File:Archiveteam_warrior_infrastructure.png|thumb|right|256px|[[Dev/Infrastructure|Archive Team Infrastructure]]]]<br />
<br />
You can run Archive Team scripts in Docker containers to help with our archiving efforts. It will download sites and upload them to our archive — and it’s really easy to do!<br />
<br />
The scripts run in a Docker container, so there is no risk to your computer. The container will only use your bandwidth and some of your disk space. It will get tasks from and report progress to the [[Tracker]].<br />
<br />
== Basic usage ==<br />
<br />
Docker runs on Windows, macOS, and Linux, and is a [https://docs.docker.com/get-docker/ free download]. Docker runs code in '''containers''', and stores code in '''images'''.<br />
<br />
<!-- === Quick start instructions for Docker Desktop on Windows and macOS ===<br />
<br />
# Download and install Docker from the link above.<br />
# Launch Docker Desktop.<br />
# In VirtualBox, click File > Import Appliance and open the file.<br />
# Start the virtual machine.<br />
#* It will fetch the latest updates and will eventually tell you to start your web browser.<br />
# Using your regular web browser, visit http://localhost:8001/<br />
# On the left, click "Your settings".<br />
# Choose a username - we'll show your progress on the [[tracker|leaderboard]].<br />
# On the left, click "Available projects" tab and pick a project to work on.<br />
#* Even better: select "ArchiveTeam's Choice" to let your warrior work on the most urgent project.<br />
--><br />
=== Instructions for using Docker CLI on Windows, macOS, or Linux ===<br />
<br />
# Download and install Docker from the link above.<br />
# Open your terminal. On Windows, you can use either Command Prompt (CMD) or PowerShell; on macOS and Linux, you can use Terminal (Bash).<br />
# First, we will set up the [https://containrrr.dev/watchtower/ Watchtower] container. Watchtower automatically checks for updates to Docker containers every five minutes, and if an update is found, it will gracefully shut down your container, update it, and restart it.<br />Use the following command:<br /><code>docker run -d --name watchtower --restart=unless-stopped -v /var/run/docker.sock:/var/run/docker.sock containrrr/watchtower --label-enable</code>.<br />Explanation:<br />
#* <code>-d</code>: Detaches the container from the terminal and runs it in the background.<br />
#* <code>--name watchtower</code>: The name that is displayed for the container. A name other than "watchtower" can be specified here if needed.<br />
#* <code>--restart=unless-stopped</code>: This tells Docker to restart the container unless you stop it. This also means that it will restart the container automatically when you reboot your system.<br />
#* <code>-v /var/run/docker.sock:/var/run/docker.sock</code>: This provides the Watchtower container access to your system's Docker socket. Watchtower uses this to communicate with Docker on your system to gracefully shut down and update your containers.<br />
#* <code>--label-enable</code>: This tells Watchtower only to update containers that are specifically tagged for auto-updating. This is included to prevent Watchtower from updating any other containers you may have running on your system. If you are only using Docker to run Archive Team projects, or wish to automatically update all containers including those that are not for Archive Team projects, you can leave this off.<br />
# Now we will set up a project container. You'll need to know the image address for the script for the project you want to help out with. If you don't know it, you can ask us on [[IRC]].<br />Use the following command:<br /><code>docker run -d --name archiveteam --label=com.centurylinklabs.watchtower.enable=true --restart=unless-stopped [image address] --concurrent 1 [username]</code>.<br />Explanation:<br />
#* <code>-d</code>: Detaches the container from the terminal and runs it in the background.<br />
#* <code>--name archiveteam</code>: The name that is displayed for the container. A name other than "archiveteam" can be specified here if needed.<br />
#* <code>--label=com.centurylinklabs.watchtower.enable=true</code>: Labels the container to be automatically updated by Watchtower. You can leave this off if you did not include <code>--label-enable</code> when launching the Watchtower container.<br />
#* <code>--restart=unless-stopped</code>: This tells Docker to restart the container unless you stop it. This also means that it will restart the container automatically when you reboot your system.<br />
#* <code>[image address]</code>: Replace this with the image address for the project you would like to help with. The brackets should not be included in the final command. Additionally, the address should not include <code>https://</code> or <code>http://</code>, and all characters must be lowercase.<br />
#* <code>--concurrent 1</code>: Process 1 item at a time. Although this varies for each project, the maximum recommended value is 5, and the maximum allowed value is 20. Leave this at 1, or check with us on [[IRC]] if you are unsure.<br />
#* <code>[username]</code>: Choose a username - we'll show your progress on the [[tracker|leaderboard]]. The brackets should not be included in the final command.<br />
# If you wish to stop running your containers, run <code>docker stop -t 300 watchtower archiveteam</code>. If needed, replace "watchtower" and "archiveteam" with the actual container names you used. The <code>-t 300</code> part of the command tells Docker to ask the container to shut down as soon as possible, but wait 300 seconds (5 minutes) before forcibly stopping the container. If you'd like to allow an hour for the container to gracefully shut down, use <code>-t 3600</code>. If you want to stop the container immediately, use <code>-t 0</code>.<br />
# Similarly, to start your containers again in the future, run <code>docker start watchtower archiveteam</code>. If needed, replace "watchtower" and "archiveteam" with the actual container names you used.<br />
# To delete a container, run <code>docker rm archiveteam</code>. If needed, replace "archiveteam" with the name of the actual container you want to delete. To free up disk space, you can also purge your unused Docker images by running <code>docker image prune</code>. Note that this command will delete all Docker images on your system that are not associated with a container, not just Archive Team ones.<br />
{{notice|1=On Windows and macOS, once you have completed steps 1-4, you can also start, stop, and delete containers in the Docker Desktop UI. However, initial setup and switching projects can only be done from the command line for the time being.}}<br />
# <li value="8"> Remember to periodically check our [[IRC]] channels and homepage so that you can switch your scripts to a current project. Projects change frequently at Archive Team, and at the moment we don't have a way to automatically switch the projects run in Docker containers. To switch projects, stop your existing Archive Team container by running <code>docker stop archiveteam</code>, delete it by running <code>docker rm archiveteam</code>, and then run a new one by repeating step 4. Then, you can optionally prune your unused Docker images as in step 7. Note: you don't need to stop or replace your Watchtower container; just make sure it is still running by using <code>docker ps -f name=watchtower</code>. If Watchtower is not running or you are unsure, run <code>docker start watchtower</code>.<br />
<br />
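The project-switching procedure in step 8 can be sketched as a small shell function for macOS and Linux. This is only an illustrative sketch, not an official Archive Team script: the function name <code>switch_project</code> and its arguments are hypothetical, and it uses only the <code>docker</code> commands already shown in the steps above.<br />

```shell
# Sketch: replace the running project container with one for a new project.
# switch_project is a hypothetical helper name, not part of any Archive Team tool.
switch_project() {
    image="$1"                 # project image address: lowercase, no http(s):// prefix
    username="$2"              # name shown on the tracker leaderboard
    name="${3:-archiveteam}"   # container name; defaults to the name used in step 4

    if [ -z "$image" ] || [ -z "$username" ]; then
        echo "usage: switch_project IMAGE USERNAME [CONTAINER_NAME]" >&2
        return 2
    fi

    # Step 5: allow up to 300 seconds for a graceful shutdown.
    # Errors are ignored so the function also works when no container exists yet.
    docker stop -t 300 "$name" >/dev/null 2>&1 || true

    # Step 7: delete the old container.
    docker rm "$name" >/dev/null 2>&1 || true

    # Step 4: start a container for the new project.
    docker run -d --name "$name" \
        --label=com.centurylinklabs.watchtower.enable=true \
        --restart=unless-stopped \
        "$image" --concurrent 1 "$username"
}
```

After loading the function into your shell, you would call it as <code>switch_project [image address] [username]</code>. The Watchtower container is left untouched, as described in step 8.<br />
<br />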
<br />
__TOC__<br />
<br />
== FAQ ==<br />
<br />
=== Why a Docker container in the first place? ===<br />
<br />
A Docker container is a quick, safe, and easy way for newcomers to help us out. It offers many features:<br />
<br />
* Self-updating software infrastructure provided by Watchtower<br />
* Allows for unattended use<br />
* In case of software faults, your machine is not ruined<br />
* Restarts itself in case of runaway programs<br />
* Runs on Windows, macOS, and Linux painlessly<br />
* Ensures consistency in the archived data regardless of your machine's quirks<br />
* Restarts automatically after a system restart<br />
<br />
If you have suggestions for improving this system, please talk to us as described below.<br />
<br />
=== Can I use whatever internet access for running scripts? ===<br />
<br />
No. We need "clean" connections. Please ensure the following:<br />
<br />
* No OpenDNS. No ISP DNS that redirects to a search page. Use non-captive DNS servers.<br />
* No ISP connections that inject advertisements into web pages.<br />
* No proxies. Proxies can return bad data. The original HTTP headers and IP address are needed for the WARC file.<br />
* No content-filtering firewalls.<br />
* No censorship. If you believe your country implements censorship, do not run Archive Team scripts. <br />
* No Tor. The server may return an error page instead of content if they ban exit nodes.<br />
* No free cafe wifi. Archiving your cafe's wifi service agreement repeatedly is not helpful.<br />
* No VPNs. Data integrity is a very high priority for the Archive Team so use of VPNs with the official crawler is discouraged.<br />
* We prefer connections from many public IP addresses if possible. (For example, if your apartment building uses a single IP address, we don't want your apartment banned.)<br />
<br />
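One of the requirements above, non-captive DNS, can be checked roughly from a shell. The sketch below asks the resolver for a name under <code>.invalid</code>, a top-level domain reserved by RFC 6761 that must never resolve; if an address comes back anyway, the resolver is probably redirecting failed lookups. The function name <code>dns_looks_captive</code> and the probe hostname are hypothetical, and the check assumes a Linux system where <code>getent</code> is available. It does not detect the other problems in the list (proxies, ad injection, filtering).<br />

```shell
# Sketch: detect a DNS resolver that answers for names that cannot exist.
# Returns 0 if the resolver looks captive, 1 if it looks clean,
# and 2 if the check could not be performed.
dns_looks_captive() {
    if ! command -v getent >/dev/null 2>&1; then
        echo "getent not found; cannot check DNS" >&2
        return 2
    fi

    # Hypothetical probe name; the process ID only makes it less likely to be cached.
    probe="archiveteam-dns-probe-$$.invalid"

    # getent exits with status 0 only when the name resolved to an address.
    if getent hosts "$probe" >/dev/null 2>&1; then
        return 0
    fi
    return 1
}
```

For example, <code>if dns_looks_captive; then echo "Resolver redirects failed lookups; switch DNS servers first"; fi</code> prints a warning before you start a container.<br />
<br />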
=== I turned my Docker container off. Will those tasks be lost? ===<br />
<br />
If you've killed your Docker instance, then the work your container did has been lost. However, the tasks will be returned to the pool after a period of time, and others may claim them. <br />
<br />
<!-- === I closed my browser or tab with the warrior's web interface. Will those tasks be lost? ===<br />
<br />
No. The web browser interface just provides a user interface to the warrior. As long as the VM or docker container is not stopped, it will continue normally. --><br />
=== How much disk space will the Docker container use? ===<br />
<br />
Short answer: it depends on the project. <!-- (But never more than 60GB.) --><br />
<br />
Long answer: because each project defines items differently, sizes may vary. A single task may be a small file or a whole subsection of a website. <!-- The virtual machine is configured by default to use an absolute maximum of 60GB. Any unused virtual machine disk space is not used on the host computer. You may run the virtual machine on less than 60GB if you like to live dangerously. We're downloading the internet, after all! --><br />
<!-- === How can I run the Docker container headlessly (without leaving a window open)? ===<br />
<br />
(add startup and shutdown instructions)<br />
--><br />
<br />
=== How can I see the status of my archiving? ===<br />
You can check the [[tracker|leaderboard]] to see how much you've archived. If you want to see the current status of your Docker container, you can run <code>docker logs -n 0 -f archiveteam</code>. <code>-n 0</code> tells Docker to only show current logs, and <code>-f</code> tells Docker to keep displaying logs as they come in until you press Control-C to stop it. If needed, replace "archiveteam" with the actual name you used for your container.<br />
<br />
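If you only want to know whether the container is still running, without following the logs, a check along these lines may help. It is a minimal sketch for macOS and Linux shells: the function names <code>container_running</code> and <code>container_status</code> are hypothetical, and the container name defaults to "archiveteam" as in the setup instructions.<br />

```shell
# Sketch: report whether a container is currently running.
# Both function names are hypothetical helpers, not Docker commands.
container_running() {
    name="${1:-archiveteam}"
    # docker ps lists running containers only; --format prints one name per line.
    # The name filter matches substrings, so grep -Fqx enforces an exact match.
    docker ps --filter "name=$name" --format '{{.Names}}' | grep -Fqx "$name"
}

container_status() {
    name="${1:-archiveteam}"
    if container_running "$name"; then
        echo "$name: running"
    else
        echo "$name: not running (start it with: docker start $name)"
    fi
}
```

Running <code>container_status</code> checks the project container, and <code>container_status watchtower</code> checks Watchtower.<br />
<br />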
<!--<br />
=== How can I set up the Docker container as a system service (so that it starts up on boot and shuts down automatically)? ===<br />
<br />
If you are using VirtualBox and running a Linux distribution that uses the systemd init system (like most recent releases), you can follow the short instructions on [http://www.ericerfanian.com/automatically-starting-virtualbox-vms-on-archlinux-using-systemd/ this page]. (The page title specifies Arch Linux, but this will work for other distros as long as they run systemd.)<br />
--><br />
<!-- === How can I run the warrior without a virtual machine? (The VM has too much overhead for a VPS!) ===<br />
<br />
One option is running a Docker container (see [[#Alternative_virtual_machines|above]]). Docker is based on LXC, and the overhead is far less than running a full VM. If you plan on running the [https://github.com/ArchiveTeam/warrior-dockerfile warrior-dockerfile], make sure to publish the port to allow access to the web interface:<br />
<br />
<pre>docker run -d -p 8001:8001 archiveteam/warrior-dockerfile</pre><br />
<br />
This creates a direct port mapping. For host port 38001 to container port 8001, use <code>38001:8001</code>. Adjust as required. :P<br />
<br />
(Multiple projects can be also run in isolated environments (containers) for rapid deployment using [https://hub.docker.com/r/infrequent/at-as-dockerfile at-as-dockerfile].)<br />
<br />
Another alternative is '''running the project manually.''' If you are managing a VPS, it's likely you are comfortable with some Linux stuff. Consult the project wiki page or the source code repository readme file.<br />
--><br />
=== How can I run tons of containers easily? ===<br />
<br />
We assume you've checked with the current Archive Team project what concurrency and resources are needed or useful!<br />
<br />
Whether you have your own virtual cluster or you're renting someone else's (aka a "[https://fsfe.org/activities/nocloud/ cloud]"), you probably need some [[wikipedia:Category:Orchestration_software|orchestration software]].<br />
<br />
ArchiveTeam volunteers have successfully used a variety of hosting providers and tools (including free trials on AWS and GCE), often just by building their own flavour of virtual server and then repeating it with simple [https://cloudinit.readthedocs.io/ cloud-init] scripts (to install and launch docker as above) or whatever tool the hosting provides. If you desire full automation, the [https://gitlab.com/diggan/archiveteam-infra archiveteam-infra repository by diggan] helps with [[wikipedia:Terraform (software)|Terraform]] on [[wikipedia:DigitalOcean|DigitalOcean]].<br />
<br />
Some custom monitoring scripts also exist, for instance [https://github.com/general-programming/gp-archiveteam-bs/blob/master/tumblr/watcher.py watcher.py].<br />
<br />
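On a single machine, the simplest form of this is a loop that repeats the project command from the setup instructions under different container names. The following is a sketch under stated assumptions, not a recommended deployment: the function name <code>start_workers</code> and the <code>archiveteam-N</code> naming scheme are hypothetical, and whether several containers are appropriate at all depends on the project, so check on IRC first.<br />

```shell
# Sketch: start COUNT project containers named archiveteam-1 .. archiveteam-COUNT.
# start_workers and the naming scheme are hypothetical.
start_workers() {
    image="$1"        # project image address
    username="$2"     # tracker username
    count="${3:-1}"   # number of containers to start

    if [ -z "$image" ] || [ -z "$username" ]; then
        echo "usage: start_workers IMAGE USERNAME [COUNT]" >&2
        return 2
    fi

    # Reject anything that is not a positive integer.
    case "$count" in
        ''|*[!0-9]*) echo "COUNT must be a positive integer" >&2; return 2 ;;
    esac
    if [ "$count" -lt 1 ]; then
        echo "COUNT must be a positive integer" >&2
        return 2
    fi

    i=1
    while [ "$i" -le "$count" ]; do
        # Same command as in the setup instructions, with a numbered name.
        docker run -d --name "archiveteam-$i" \
            --label=com.centurylinklabs.watchtower.enable=true \
            --restart=unless-stopped \
            "$image" --concurrent 1 "$username" || return 1
        i=$((i + 1))
    done
}
```

A cloud-init script or other provisioning tool could call something like this after installing Docker. Stopping the containers again works as in the setup instructions, once per container name.<br />
<br />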
=== I'd like to help write code or I want to tweak the scripts to run to my liking. Where can I find more info? Where is the source code and repository? ===<br />
<br />
Check out the [[Dev]] documentation for details on the infrastructure and details of the source code layout.<br />
<br />
=== I still have a question! ===<br />
<br />
Check out the [[Frequently Asked Questions|general FAQ page]]. Talk to us on [[IRC]]. Use [ircs://irc.hackint.org:6697/archiveteam-bs #archiveteam-bs] for general questions or the project IRC channel for project-specific instructions.<br />
<br />
== Troubleshooting ==<br />
<br />
=== (Linux) Running Docker commands gives me a permission denied error. How can I fix this? ===<br />
<br />
There are a few ways to fix this issue. The fastest way is to put <code>sudo</code> before your Docker commands. This runs the process as the root user. You can also log into your system as root and run the Docker commands from there. Alternatively, you can create a <code>docker</code> user group and add your account to it by running <code>sudo groupadd docker</code>, then <code>sudo usermod -aG docker $USER</code>, and then activate the changes by running <code>newgrp docker</code> or simply logging out and logging back in to your system or rebooting your system<ref>{{URL|https://docs.docker.com/engine/install/linux-postinstall/#manage-docker-as-a-non-root-user}}</ref>.<br />
<br />
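To confirm that the group change has taken effect in your current session, you can inspect the group list of your shell. This is a small sketch; the function name <code>in_docker_group</code> is hypothetical, and it only tests group membership, not whether the Docker daemon itself is running.<br />

```shell
# Sketch: succeed if the current session is a member of the "docker" group.
in_docker_group() {
    # id -nG prints the session's group names separated by spaces;
    # tr puts one name per line so grep -Fqx can match "docker" exactly.
    id -nG | tr ' ' '\n' | grep -Fqx docker
}
```

If <code>in_docker_group</code> fails right after you ran <code>usermod</code>, the session has not picked up the new group yet; run <code>newgrp docker</code> or log out and back in, as described above.<br />
<br />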
<!--<br />
=== I can't connect to localhost. ===<br />
<br />
The application is configured to set up port forwarding to the guest machine, and you should be able to access the interface through your web browser at port 8001. If this does not happen, and isn't resolved by rebooting the container (using the ACPI power signals, not suspend/save state and resume), you may need to double-check your machine's network settings (as described [[#How_can_I_run_multiple_virtual_machines_at_the_same_time.3F|above]]). --><br />
=== I see a message that no item was received. ===<br />
<br />
This means that there is no work available. This can happen for several reasons:<br />
<br />
* The project has just finished and someone is inspecting the work done. If a problem is discovered, items may be re-queued and more work will become available.<br />
* You have checked out/claimed too many items. Reduce your concurrency and let others do some of the work too.<br />
* In a rare case, you have been banned by a tracker administrator because there was a problem with your work: you were requesting too much, you were tampering with the scripts, a malfunction has occurred, or your internet connection is "unclean" (see [[#Can_I_use_whatever_internet_access_for_running_scripts.3F|above]]).<br />
<br />
=== I see a message about rate limiting. ===<br />
<br />
Don't worry. Keep in mind that although downloading the internet for fun and digital preservation are the primary goals of all Archive Team activities, serious stress on the target's server may occur. The rate limit is imposed by a [[Tracker#People|tracker administrator]] and should not be subverted.<br />
<br />
(In other words, we don't want to DDoS the servers.)<br />
<br />
If you like, you can switch to another [[Warrior projects|project]] with less load.<br />
<br />
=== I see a message about code being out of date. ===<br />
<br />
Don't worry. There is a new update ready. You do not need to do anything about this if you are running the container with Watchtower; Watchtower checks for updates every five minutes and will update the container automatically. If you are impatient, stop and remove your container, run <code>docker pull [image address]</code> to fetch the latest image, then repeat step 4 in the setup instructions to resume work.<br />
<br />
=== I'm running the scripts manually and I see a message about code being out of date. ===<br />
<br />
This happens when a bug in the scripts is discovered. Bugs are unavoidable, especially when the server is out of our control.<br />
<!--<br />
Try the <code>--auto-update</code> option available in Seesaw version 0.8. However, please be aware that you are now executing code automatically. Be sure to run the scripts in a separate user account for safety. --><br />
<br />
=== I see messages about rsync errors. ===<br />
<br />
Uh-oh! Something is not right. Please notify us immediately in the appropriate [[IRC]] channel.<br />
<br />
<!-- === I told the container to shut down from the web interface, but nothing has changed. ===<br />
<br />
The warrior will attempt to finish the current running tasks before shutting down. If you need to shut down right away, go ahead. Your progress will be lost, but the jobs will eventually cycle out to another user.<br />
--><br />
<!--=== The container is eating all my bandwidth! ===<br />
<br />
(it seems bandwidth limiting is not a feature in Docker)<br />
--><br />
=== The item I'm working on is downloading thousands of URLs and it's taking hours. ===<br />
<br />
Please notify us in the appropriate [[IRC]] channel. You may need to restart the container.<br />
<br />
=== The instructions to run the software/scripts are awful and they are difficult to set up. ===<br />
<br />
Well, excuuuuse me, princess!<br />
<br />
We're not a professional support team, so help us help you help us all. See the sections on [[#Where_can_I_file_a_bug.2C_suggestion.2C_or_a_feature_request.3F|bug reports]], [[#Where_can_I_file_a_bug.2C_suggestion.2C_or_a_feature_request.3F|suggestions]], or [[#I.27d_like_to_help_write_code_or_I_want_to_tweak_the_scripts_to_run_to_my_liking._Where_can_I_find_more_info.3F_Where_is_the_source_code_and_repository.3F|code contributions]].<br />
<br />
=== Where can I file a bug, suggestion, or a feature request? ===<br />
<br />
If the issue is related to the web interface or the library that grab scripts are using, see [https://github.com/ArchiveTeam/seesaw-kit/issues seesaw-kit issues]. Other issues should be filed into their own [[Dev/Source_Code|repositories]].<br />
<br />
<!-- == Projects ==<br />
<br />
See [[Warrior projects]].<br />
--><br />
== Are you a coder? ==<br />
<br />
Like our scripts? Interested in how they work under the hood? Got software skills? '''[[Dev|Help us improve them!]]'''<br />
<br />
Note: some of the content of this page has been adapted from [[ArchiveTeam Warrior]].<br />
<br />
{{Navigation box}}</div>Kazhttps://wiki.archiveteam.org/index.php?title=Talk:ArchiveTeam_Warrior&diff=45670Talk:ArchiveTeam Warrior2020-10-15T19:05:08Z<p>Kaz: Undo revision 45669 by Nebogipfel2020 (talk)</p>
<hr />
<div>==Some raw notes==<br />
<br />
--[[User:BlueMax|BlueMax]] 06:41, 3 August 2012 (EDT)<br />
<br />
Here's my plaintext unformatted dump of what I've written so far: <br />
<br />
Setting up the Warrior:<br />
<br />
* Your machine needs to be relatively powerful with an internet connection to run the Warrior. You'll need at least:<br />
<br />
- a 2.0 GHz dual-core processor<br />
- 2 gigabytes of RAM<br />
- 100GB of hard drive space<br />
- a fast internet connection (at the very least 1 Mbps down/up)<br />
<br />
* Although we do recommend something more powerful (or if you plan to do more with your machine while you also use the Warrior):<br />
<br />
- a quad core processor<br />
- 4 gigabytes of RAM<br />
- as fast of an internet connection as possible (we love universities!)<br />
- most or all of your background downloading/uploading programs turned off (such as torrent clients)<br />
<br />
* To use the Warrior, we recommend using VirtualBox. VirtualBox is a program available for all major operating systems that emulates a desktop computer. The Warrior is a preconfigured Linux system (or "virtual appliance") designed to automate the process of downloading and uploading data for an ArchiveTeam project.<br />
<br />
* Visit https://www.virtualbox.org/wiki/Downloads and download the latest version for your platform, and:<br />
<br />
For Windows:<br />
<br />
-Open and follow the setup as prompted. Default settings will be fine for the most part. Do not unselect the VirtualBox Networking selection where it appears, as it is required to run the Warrior.<br />
<br />
For Mac OS X:<br />
<br />
-TBA<br />
<br />
For Linux systems:<br />
<br />
-TBA<br />
<br />
* Once you have VirtualBox installed, open it via your preferred method (command line, shortcut or what have you).<br />
<br />
* Now download the Warrior using the below link. The current version of the Warrior is a 167MB .ova (virtual appliance) file. You'll download this file and import it into VirtualBox. Save it to somewhere you will be able to access it.<br />
<br />
* When the main menu has opened, click File > Import Appliance. You'll be given a pop-up window. Click the Choose box and navigate to the .ova file you just downloaded, and select it, then click Next. <br />
<br />
* Do not uncheck either of the tick boxes in the next window, simply click Import. It may take a few minutes for the next part of the process to take place.<br />
<br />
Now to boot the ArchiveTeam Warrior and get it working on a project.<br />
<br />
* Double-click the new option that has just appeared in the main VirtualBox window. It should have the name "archiveteam-warrior-2" or similar. A new popup window will appear if you have done it right.<br />
<br />
* While the system boots in the background, a few VirtualBox pop-up messages may appear. Feel free to just click "OK" on them. There should be no need to touch any of the options or press any keys on your keyboard until the warrior has booted up.<br />
<br />
* You'll eventually be presented with a screen that says "Configure your warrior via the web interface." Minimize the VirtualBox window on your desktop (if you are unable to move your mouse, press the Right Control button on your keyboard). <br />
<br />
* Open your choice of web browser. We require a modern web browser (the latest versions of Firefox and Chrome will work; we do not support IE on account of not being willing to be suicidal). Enter "http://localhost:8001" without quotes into the address bar, and press Enter.<br />
<br />
* If you've done this correctly you should see the ArchiveTeam Warrior page. On the left side of the screen you'll see several options, including "All projects" and "Your settings". We want to set up your settings first, so click "Your settings".<br />
<br />
* Enter your nickname in the first box so that we can identify who you are on our tracker. Only use letters and numbers (Ph1shF00d would work, but Ph*shF**d wouldn't, for example).<br />
<br />
* The second box is how many items will download at a time. You may put this up as high as 6 if you have a very speedy internet connection, but slower connections may want to stick to the default selection.<br />
<br />
* You can leave the rest of the settings as they are. Click "Save settings" once you're done and then click "All projects" on the left pane.<br />
<br />
* You'll be presented with a list of projects on the right pane of the browser window. If you just want the Warrior to do what the ArchiveTeam wants it to, simply click the "Work on this project" button to the right of the "ArchiveTeam's Choice" project. Your browser will be redirected to the "Current Project" tab and the Warrior will start work on what the main project for ArchiveTeam currently is. If you see a bunch of black windows scrolling down your screen, your Warrior is working as intended and you're free to leave your computer, or do other things. (You may close the web browser window as well, it won't affect the Warrior).<br />
<br />
* If you want to select a different project, simply go to the "All projects" page again and select which one you'd like. Stopping your Warrior is just as simple, go to the All projects page and click "Stop this project" at the top of the page (under "Your current project").<br />
<br />
* To shut down the ArchiveTeam Warrior, click the "Shut down" button on the page, and close the webpage. Eventually the VirtualBox window will close (in the background if you minimised it) and you can close the main VirtualBox window.<br />
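<br />
The nickname rule in the notes above (letters and numbers only) can be sketched as a small check. The function name and the exact pattern are assumptions made for illustration; the tracker's real validation may be stricter or looser.<br />

```python
import re

# Assumption: "letters and numbers" means ASCII A-Z, a-z and 0-9 only.
NICKNAME_PATTERN = re.compile(r"[A-Za-z0-9]+")


def is_valid_nickname(nickname: str) -> bool:
    """Return True if the nickname consists only of ASCII letters and digits."""
    return NICKNAME_PATTERN.fullmatch(nickname) is not None


# Example usage with the two nicknames from the notes:
#   is_valid_nickname("Ph1shF00d")   accepted
#   is_valid_nickname("Ph*shF**d")   rejected because of the asterisks
```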
<br />
==Some additional notes==<br />
<br />
Here's some of the things I noticed when using the warrior:<br />
<br />
* use the archiveteam-warrior-v2 .ovf from archive.org, not the v1 linked from the article!<br />
* if you use vmware, ignore the warning about the .ovf file not passing validation<br />
** however, before starting the VM, you have to do these steps<br />
***remove the VM from the list of favorites on the left side (simply select it and hit DEL -- don't worry, you won't delete it and will add it again later)<br />
***edit the *.vmx file (virtual machine config file)<br />
**** If on OS X, this can be found by right-clicking on the .vmwarevm file that VMWare generates and selecting "Show package contents" - [[User:Machawk1|Machawk1]] 15:54, 5 August 2014 (EDT)<br />
***change all lines that start with "ide1:1" to "ide1:0" (i.e. replace the second 1 with a 0). This is because the .ovf file specifies the second harddisk as secondary slave, which won't work in VMware if you don't configure a secondary master first.<br />
***re-add the vmx file to VMware. Either double-click it, or drag&drop it to the area where you deleted it earlier.<br />
* The start screen will tell you to open "http://localhost:8001". At least on VMware (and I guess also on VirtualBox, but I don't know) this will not work. Instead, do the following:<br />
**Press Alt-F3 to get to the 3rd console<br />
**login as "root" with password "archiveteam"<br />
**type "ifconfig" and note the IP address given under "eth0"<br />
**enter "http://x.x.x.x:8001" in your browser (where x.x.x.x is the IP address you noted)<br />
**(I really hope the login screen will, at some point in time, display the external IP Address instead of "localhost")<br />
<br />
--[[User:Darkstar|Darkstar]] 18:55, 9 August 2012 (EDT)<br />
<br />
== Autorun for lazy people ==<br />
<br />
I've given up running the warrior on Fedora because I have to fix the VirtualBox packages manually every time I update the kernel, but I'm now trying on an Ubuntu machine I don't use personally. Something that the instructions should cover, at some point, is how to set it up so that it's started automatically at system boot. I found only slightly complicated instructions around the web, but also some indication that it may be much easier with VirtualBox 4.2 (Ubuntu repos currently have 4.1, AFAICS). --[[User:Nemo_bis|Nemo]] 07:55, 12 June 2013 (EDT)<br />
<br />
== Lifetime stats page for the warrior? ==<br />
<br />
It'd be nice if the warrior had a lifetime bandwidth stats page.<br />
<br />
== AWS image ==<br />
<br />
To get some additional visibility and make the life of the occasional users of the Warrior on AWS easier, it would be nice to get an ArchiveTeam Warrior image in the [https://aws.amazon.com/marketplace/ AWS marketplace] (for free, of course). Also in other directories and for other hosting providers, of course, if someone prefers contributing to those. --[[User:Nemo_bis|Nemo]] 11:18, 23 December 2013 (EST)<br />
<br />
== Viewing the Control Panel on a external computer ==<br />
<br />
If you run the Warrior VM on an external computer, it might be useful to view the control panel from your main computer.<br />
<br />
To do this, on the remote computer, go to VirtualBox, select the Warrior VM, Settings, and then Network.<br />
<br />
If it is not already expanded, click the Advanced tab and you should then see a button that says Port Forwarding. Click it.<br />
<br />
In the Web Interface Row, find the Host IP Column, and change it to 0.0.0.0 and press OK.<br />
<br />
On your main computer, you can now type in the remote computer's local IP address into the web browser with the port number to view its stats.<br />
<br />
Example: http://192.168.1.57:8001<br />
<br />
--[[User:Crypto|Crypto]] 16:48, 10 August 2014 (EDT)<br />
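<br />
Once the forwarding rule is bound to 0.0.0.0, a quick way to confirm from the main computer that the control panel is reachable is a plain TCP connection test. This is only a sketch: the function name is made up for illustration, and a successful connection shows that the port is open, not that the Warrior itself is healthy.<br />

```python
import socket


def web_interface_reachable(host: str, port: int = 8001, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example usage (192.168.1.57 is the example address from this section):
#   web_interface_reachable("192.168.1.57")
```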
<br />
== Running more than one warrior on a computer (VirtualBox) ==<br />
<br />
Sometimes the situation might arise that you want to run two or even more warriors on one machine. For this example, we are going to assume you are already running one warrior, and you want to add another one.<br />
<br />
Please note: make sure you have enough resources to run multiple warrior VMs. Each Warrior will use 400 MB of RAM and at most 60 GB of disk space.<br />
<br />
First off, you are going to need to import the warrior instance. If you still have the .ova file you used to import the original warrior, you may import it again.<br />
<br />
While importing, make sure to change the name.<br />
<br />
Once imported, go to the second warrior and go to settings.<br />
<br />
In the settings menu, go to Network, then click the Advanced Tab, and click on Port Forwarding.<br />
<br />
Once in the Port Forwarding menu, change the host port to 8002, but leave the guest one alone.<br />
<br />
Done! You can boot up the second warrior, and administer it through port 8002. You may change the host port to whatever you want, just leave the guest port at 8001.<br />
<br />
--[[User:Crypto|Crypto]] 02:55, 11 August 2014 (EDT)<br />
<br />
That's very useful, but instead of importing a new warrior I just right-clicked on my original one and chose Clone. I created as many warriors as my laptop could afford, taking its specs into consideration, and then followed your steps about Port Forwarding. It's a very easy and straightforward process.<br />
<br />
--[[User:PanoIgano|PanoIgano]] 16:17, 08 May 2015 (EDT)<br />
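<br />
For more than a couple of warriors, the port-forwarding steps in this section can be scripted. The sketch below only builds the <code>VBoxManage</code> command lines and does not run them. It assumes that the forwarding rule is named "Web Interface" (as the dialog described above suggests), that it lives on the first NAT adapter, and that the VM names are your own; check all three in your setup. <code>VBoxManage modifyvm</code> only works while the VM is powered off.<br />

```python
from typing import Dict, List

GUEST_PORT = 8001            # web interface port inside the VM (from this section)
RULE_NAME = "Web Interface"  # assumed name of the existing port-forwarding rule


def plan_host_ports(vm_names: List[str], first_host_port: int = 8001) -> Dict[str, int]:
    """Assign consecutive host ports, one per VM, starting at first_host_port."""
    return {name: first_host_port + i for i, name in enumerate(vm_names)}


def port_forward_commands(vm_name: str, host_port: int) -> List[List[str]]:
    """Build the commands that replace the rule so it listens on host_port."""
    # Rule format: name,protocol,host IP,host port,guest IP,guest port
    rule = f"{RULE_NAME},tcp,,{host_port},,{GUEST_PORT}"
    return [
        # Remove the existing rule first; adding a rule with the same name fails.
        ["VBoxManage", "modifyvm", vm_name, "--natpf1", "delete", RULE_NAME],
        ["VBoxManage", "modifyvm", vm_name, "--natpf1", rule],
    ]


# Example usage (VM names are hypothetical):
#   for name, port in plan_host_ports(["warrior-1", "warrior-2"]).items():
#       for command in port_forward_commands(name, port):
#           print(" ".join(command))
```

Printing the commands first and running them by hand keeps mistakes cheap; the guest port stays at 8001 in every rule, matching the advice above.<br />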
<br />
== Fix broken anchor links ==<br />
<br />
Previous proposal "Split the FAQ into "Problems" and "I'm curious" sections" was done. It has left broken HTML anchor URLs that previously linked to specific sections of the guide e.g. http://archiveteam.org/index.php?title=Warrior#Help.21_The_warrior_is_eating_all_my_bandwidth.21 from here: reddit /help_us_archive_the_steam_users_forum_aka_spuf/<br />
Is there a way to specifically add just anchors to a page?<br />
<br />
[[User:VADemon|VADemon]] ([[User talk:VADemon|talk]]) 14:48, 5 October 2017 (EDT)<br />
<br />
== Some notes on using KVM/QEMU ==<br />
<br />
Putting some stuff here for further looking into.<br />
<br />
* [https://wiki.hackzine.org/sysadmin/kvm-import-ova.html KVM: Importing an OVA appliance]<br />
<br />
----<br />
<br />
== User scripts to launch multiple Warriors ==<br />
<br />
*[https://gist.github.com/Frinkel/0f1d36d8e29820fc593b5876c7895980 Frinkel on DigitalOcean]<br />
*[https://gitlab.com/diggan/archiveteam-infra diggan automated on DigitalOcean]<br />
*[https://pastebin.com/uE5F0XEr tungol on AWS]<br />
<br />
[[User:Nemo_bis/Tumblr]] is only for basic script running<br />
--[[User:Nemo_bis|Nemo]] 21:52, 16 December 2018 (UTC)</div>Kazhttps://wiki.archiveteam.org/index.php?title=Archiveteam:IRC&diff=45494Archiveteam:IRC2020-09-15T12:11:21Z<p>Kaz: Add extbans line</p>
<hr />
<div>'''IRC''' (Internet Relay Chat) is an internet protocol that allows multiple users to connect to a server and chat. A person connects to an IRC "server" and then joins a "channel" for the particular subject they are interested in.<br />
<br />
ArchiveTeam uses IRC as its one-stop shop for coordinating projects.<br />
<br />
'''Before you go ahead and jump in, if there's nothing else you read on this page please at least take a moment to review the [[#Special ArchiveTeam IRC rules|Special ArchiveTeam IRC rules]] section below.'''<br />
<br />
== How do I chat on IRC? ==<br />
<br />
You will need an IRC client, or you can use a web interface.<br />
<br />
EFnet, one of the networks that ArchiveTeam uses, provides a [http://www.efnet.org/ web interface called Webchat]. Enter a nickname (such as your first name, your pet's name, or a cool pseudonym of your choice) and then the channel's name, e.g. <code>#archiveteam</code>.<br />
<br />
Since September 2019, most project-specific channels have been hosted on the [https://www.hackint.org/ hackint IRC network] due to desires like fewer netsplits, channel/nickname registration, secure IRC by default, etc. It also has a [https://webirc.hackint.org/#irc://irc.hackint.org/#archiveteam web interface]. A comprehensive comparison between the two networks by JAA can be found under [[User:JustAnotherArchivist/hackint vs EFnet]].<br />
<br />
{{notice|1=Please learn IRC '''netiquette'''.<br />
<br />
''Do not barge into an IRC channel demanding help or disparage installation instructions.''<br />
<br />
Archive Team is not a professional support team.<br />
}}<br />
<br />
=== Do I have to use IRC? ===<br />
<br />
We prefer IRC because there is no central point of failure, but see the question about social media on [[Frequently Asked Questions]].<br />
<br />
=== Why does IRC need chat logs? ===<br />
<br />
Unlike a bulletin board or SMS, IRC is a transient medium of communication. As a result, if you aren't there to receive the message, you will never receive it at all.<br />
<br />
If you check the chat logs, your question may already be answered. Unfortunately, some channels are not logged. Don't worry if you accidentally interrupt someone's conversation or repeat a question.<br />
<br />
See the section [[#IRC_Logs|IRC Logs]].<br />
<br />
=== I asked a question and waited but it scrolled off the window and was ignored. ===<br />
<br />
''Don't'' get discouraged; '''do''' ask again. Topics get intermixed and timezones break up normal conversations. Be persistent but friendly.<br />
<br />
=== How do I get someone's attention in a public channel? ===<br />
<br />
Some chat clients will alert the user if you say their nickname.<br />
<br />
Some clients support nickname auto-completion. Start typing the first few letters of their nickname and press tab.<br />
<br />
=== Why won't anyone respond? ===<br />
<br />
If no one answers, please be patient. We're volunteers so we can't always respond immediately. We eat, drink, sleep, and archive just like you! Note that IRC channels are '''not''' like Discord, Telegram, Slack, or similar channels - do not expect real-time responses the next second. Wait a few minutes, but be prepared to stay around for a little bit.<br />
<br />
Sometimes it may be the [https://en.wikipedia.org/wiki/Bystander_effect bystander effect]. Try an [https://en.wiktionary.org/wiki/icebreaker icebreaker] to get the conversation going.<br />
<br />
=== I can't wait; I need immediate attention. Who's in charge? ===<br />
<br />
See [[Who We Are]].<br />
<br />
=== Special ArchiveTeam IRC rules ===<br />
<br />
Besides the expectation of being civilized, patient and tactful, there are some rules you should follow when in ArchiveTeam IRC channels. Breaking them will annoy the community, and you can easily find yourself banned. The most pertinent of these are as follows:<br />
<br />
* <code>#archiveteam</code> is generally reserved for short and important information exchange, e.g. concise announcements about websites shutting down, project status updates, easily answerable important questions, etc. '''All general and in-depth archiving-related discussion happens in <code>#archiveteam-bs</code>''' (this channel is monitored and you will very likely not need to wait hours for an answer). Project-specific discussions go in their [[Projects|respective]] channels. General topics not related to computers and/or archiving at all are not welcome even in <code>#archiveteam-bs</code> (try <code>#archiveteam-ot</code> for such topics instead).<ref>https://archive.fart.website/bin/irclogger_log/archiveteam-bs?date=2014-09-09,Tue&sel=110#l106</ref><br />
* Don't ask too many questions, don't demand answers from others. Sometimes you can look it up yourself, sometimes you need to filter your questions for important ones. You can also [https://archive.fart.website/bin/irclogger_logs search the logs].<ref>https://archive.fart.website/bin/irclogger_log/archiveteam-bs?date=2014-09-09,Tue&sel=126#l122</ref><br />
* Don't maliciously/demandingly criticize Archive Team, its members, nor the Internet Archive, especially in general, empty phrases.<ref>https://archive.fart.website/bin/irclogger_log/archiveteam?date=2016-01-02,Sat&sel=143#l139</ref><ref>https://archive.fart.website/bin/irclogger_log/archiveteam-bs?date=2016-02-15,Mon&sel=43#l39</ref> If you have a remark/idea, be concrete and constructive (and polite and patient), and if you can, realize it yourself (we're volunteers otherwise busy). Remember the money-back guarantee!<ref name="moneybackguarantee">https://archive.fart.website/bin/irclogger_log/archiveteam?date=2015-07-05,Sun&sel=170#l166</ref><br />
* Don't try to convince ArchiveTeam that archiving is bad. We make very few exceptions when it's about archiving. Also, our rule of thumb is "archive first, ask questions later".<ref>https://archive.fart.website/bin/irclogger_log/archiveteam?date=2015-06-25,Thu&sel=58#l54</ref><ref>https://archive.fart.website/bin/irclogger_log/archiveteam?date=2015-10-17,Sat&sel=154#l150</ref><ref>https://archive.fart.website/bin/irclogger_log/archiveteam?date=2015-10-17,Sat&sel=231#l227</ref> Our IRC channels are the #1 worst place to ask "why we are keeping this"!<ref>https://archive.fart.website/bin/irclogger_log/archiveteam?date=2015-10-17,Sat&sel=233#l229</ref><br />
* Don't be childish.<ref>https://archive.fart.website/bin/irclogger_log/archiveteam-bs?date=2015-10-28,Wed&sel=7#l3</ref><ref>https://archive.fart.website/bin/irclogger_log/archiveteam?date=2016-01-09,Sat&sel=285#l281</ref><ref name="moneybackguarantee" /><br />
* Don't feed the trolls. (Don't engage in arguments with people not behaving appropriately.)<ref>https://archive.fart.website/bin/irclogger_log/archiveteam?date=2015-06-10,Wed&sel=442#l438</ref><br />
* Don't explain obvious things to us in detail.<ref>https://archive.fart.website/bin/irclogger_log/archiveteam?date=2016-03-10,Thu&sel=425#l421</ref><br />
* Don't let your IRC client flood the channels with join/leave notifications due to your unstable connection.<ref>https://archive.fart.website/bin/irclogger_log/archiveteam?date=2015-12-14,Mon&sel=232#l228</ref><ref>https://archive.fart.website/bin/irclogger_log/archiveteam-bs?date=2016-06-11,Sat&sel=140#l136</ref><br />
<br />
<div style="font-size:80%"><references/></div><br />
<br />
== ArchiveTeam on IRC ==<br />
<br />
Below is a list of ArchiveTeam's '''general-purpose''' IRC channels. '''Project-specific''' channels can be found in the [[Projects]]' list. All the channels listed below are on the [http://www.efnet.org/ EFnet] network. <br />
<br />
(Back then we had a separate list of project-specific IRC channels, under the general channels. For historical interest, you can find them on the [[IRC/Old]] page.)<br />
<br />
{| border="1" align="center" style="text-align:center;" cellpadding="6"<br />
! Channel name !! Channel hashtag !! Channel description<br />
|-<br />
| ArchiveTeam || [irc://irc.efnet.org/archiveteam #archiveteam] || The main ArchiveTeam channel, mainly used for news, announcements and early project planning.<br />
|-<br />
| -bs || [irc://irc.efnet.org/archiveteam-bs #archiveteam-bs] || Lengthy discussion for general archival and projects which don't have a separate channel.<br />
|-<br />
| -dev || [irc://irc.efnet.org/archiveteam-dev #archiveteam-dev] || Discussion about general (i.e. not project-specific) ArchiveTeam software development<br />
|-<br />
| -ot || [irc://irc.efnet.org/archiveteam-ot #archiveteam-ot] || Off-topic discussion<br />
|-<br />
| -twitter || [irc://irc.efnet.org/archiveteam-twitter #archiveteam-twitter] || <s>We have a twitter bot and it owns you.</s> All tweets by us, to us, or about us used to be displayed here by swebb bot, until twitter killed API v1.<br />
|-<br />
| Warrior || [irc://irc.efnet.org/warrior #warrior] || Channel for the discussion and development of the ArchiveTeam Warrior<br />
|-<br />
| ArchiveBot || [irc://irc.efnet.org/archivebot #archivebot] || Channel for controlling [[ArchiveBot]]. Discussions about ArchiveBot development also take place here.<br />
|}<br />
<br />
== IRC Logs ==<br />
<br />
You can log the channels where you are using your client, generally. But if you want a 24/7 bot logging your channel, you can use a script like [https://web.archive.org/web/20100323000206/http://toolserver.org/~bryan/TsLogBot/TsLogBot.py this] (change the server and channel variables).<br />
<br />
[[User:Chfoo|chfoo]] is hosting chat logs of some channels at https://archive.fart.website/bin/irclogger_logs. It also has a search function.<br />
<br />
== hackint specifics ==<br />
This section documents how to properly and successfully run ArchiveTeam channels on the hackint IRC network.<br />
<br />
=== Connecting ===<br />
hackint enforces secure connections with TLS (aka "SSL"). The servers have valid certificates, so do ''not'' disable certificate verification in your client when connecting to hackint (or enable it if disabled by default, e.g. use <code>-ssl_verify</code> on irssi and WeeChat).<br />
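<br />
For scripts or bots that connect without a full IRC client, the same rule applies: keep certificate and hostname verification switched on. The following is a minimal sketch using Python's standard <code>ssl</code> module; the constant and function names are illustrative, and the host and port are the ones used in the WeeChat example in this section.<br />

```python
import ssl

HACKINT_HOST = "irc.hackint.org"  # host from the WeeChat example in this section
HACKINT_TLS_PORT = 6697           # TLS port from the same example


def make_tls_context() -> ssl.SSLContext:
    """Create a TLS context that verifies the server certificate and hostname."""
    context = ssl.create_default_context()
    # Both settings are already the defaults; they are set explicitly so that
    # nobody "fixes" a connection problem by silently turning verification off.
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    return context


# Example usage (requires network access):
#   import socket
#   with socket.create_connection((HACKINT_HOST, HACKINT_TLS_PORT)) as raw:
#       with make_tls_context().wrap_socket(raw, server_hostname=HACKINT_HOST) as tls:
#           print(tls.version())
```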
<br />
==== Weechat ====<br />
<br />
: <code>/server add hackint irc.hackint.org/6697 -ssl -autoconnect</code><br />
: <code>/save</code><br />
: <code>/connect hackint</code><br />
<br />
=== Services & authentication ===<br />
hackint has the usual services one would expect from a sensible IRC network: NickServ to register user accounts, ChanServ to manage channels, HostServ for hostmasks, MemoServ for sending messages (memos) to users currently offline or groups, and GroupServ for groups of people.<br />
<br />
The most important part for most users is registering and authenticating a user account. To register, issue:<br />
<br />
: <code>/msg NickServ REGISTER password email@example.org</code><br />
<br />
A verification email is sent to the email address entered here; this email address is needed for recovery in case you ever lose the authentication data for your nick.<br />
<br />
After registration, you need to authenticate on every connection. There are several ways to do that: SASL PLAIN, SASL EXTERNAL, SASL ECDSA-NIST256P-CHALLENGE, CertFP, or the traditional but least reliable <code>/msg NickServ IDENTIFY password</code>. It is recommended to use SASL since this will authenticate you immediately during the initial connection establishment; CertFP and messaging NickServ may have a delay, which can cause joins to protected channels to fail, for example.<br />
<br />
==== SASL PLAIN (weechat) ====<br />
After you've followed the instructions to register an account, you can set up WeeChat to log you in automatically when connecting to the IRC network. Issue the following commands to configure your account, save the config and then reconnect to test it:<br />
<br />
: <code>/set irc.server.hackint.sasl_mechanism PLAIN</code><br />
: <code>/set irc.server.hackint.sasl_username <login></code><br />
: <code>/set irc.server.hackint.sasl_password <password></code><br />
: <code>/save</code><br />
: <code>/reconnect hackint</code><br />
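<br />
For the curious, this is roughly what a client sends under the hood for SASL PLAIN: the authorization identity, the login and the password joined by NUL bytes and base64-encoded, transmitted after <code>AUTHENTICATE PLAIN</code>. The sketch below assumes the authorization identity is the same as the login, which is a common client choice but not the only valid one, and it does not handle the chunking needed for very long payloads. The function name is illustrative.<br />

```python
import base64


def sasl_plain_payload(username: str, password: str) -> str:
    """Encode credentials for SASL PLAIN as base64(authzid NUL authcid NUL password)."""
    # Assumption: the authorization identity (authzid) equals the login (authcid).
    raw = "\0".join((username, username, password)).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


# Example usage (placeholder credentials):
#   payload = sasl_plain_payload("mynick", "mypassword")
#   the client would then send: AUTHENTICATE <payload>
```

Since base64 is an encoding and not encryption, the password is only protected by the TLS connection, which is one more reason not to disable certificate verification.<br />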
<br />
==== Certificate authentication (SASL EXTERNAL or CertFP) ====<br />
To authenticate using a certificate through either CertFP or SASL EXTERNAL, you need to generate a client certificate. For example:<br />
<br />
: <code>openssl req -nodes -newkey rsa:4096 -keyout /secure/path/nick.key -x509 -days 36500 -out /secure/path/nick.cer</code><br />
<br />
Depending on the client and/or its SSL/TLS library, you may need to combine the two into one file:<br />
<br />
: <code>cat /secure/path/nick.cer /secure/path/nick.key >/secure/path/nick.pem</code><br />
: <code>chmod 600 /secure/path/nick.pem</code><br />
<br />
Then, instruct your client to use this certificate on connecting to hackint. Instructions for various clients can be found on the [https://www.oftc.net/NickServ/CertFP/ CertFP documentation by OFTC].<br />
<br />
On the first connection using this certificate, you need to add its fingerprint to NickServ (after authenticating with <code>IDENTIFY</code>):<br />
<br />
: <code>/msg NickServ CERT ADD</code><br />
<br />
On any later connections, you will get authenticated automatically.<br />
<br />
It is recommended to use SASL EXTERNAL rather than relying on CertFP; only the former will ensure that the authentication happens immediately on connecting and you will be able to join access-restricted channels without issues.<br />
<br />
=== Creating a channel ===<br />
If you're opening a channel for ArchiveTeam usage, it is recommended to register it and set the right flags for the <code>!archiveteam-core</code> group. Make sure you're identified with NickServ.<br />
<br />
Registering the channel is done with:<br />
<br />
: <code>/msg ChanServ REGISTER #example</code><br />
<br />
And setting the flags is done with:<br />
<br />
: <code>/msg ChanServ FLAGS #example !archiveteam-core +*SF</code><br />
<br />
This grants everyone in that group full control over the channel.<br />
<br />
You should also utilise extbans to mirror the banlist from #archiveteam:<br />
<br />
: <code>/mode #example +b $j:#archiveteam</code><br />
<br />
=== Virtual hosts (vhosts) ===<br />
If you are part of the <code>!archiveteam-core</code> group in GroupServ, you can use an ArchiveTeam vhost with:<br />
<br />
: <code>/msg HostServ TAKE archiveteam/$account</code><br />
<br />
(<code>$account</code> is not a placeholder; enter it literally.)<br />
<br />
If you are not part of the group, you can use the general hackint vhost:<br />
<br />
: <code>/msg HostServ TAKE hackint/user/$account</code><br />
<br />
The vhost is activated automatically when you authenticate. Note that your actual host address may still be visible to others.<br />
<br />
<!--<br />
== Unofficial ArchiveTeam QDB (Offline) ==<br />
ArchiveTeamsters are encouraged to visit and contribute to the unofficial [http://www.deaddyingdamned.com/qdb/ ArchiveTeam quote database].<br />
--><br />
<br />
[[Category:Archive Team]]<br />
<br />
{{Navigation box}}</div>Kazhttps://wiki.archiveteam.org/index.php?title=Move_Archiveteam_to_Hackint&diff=45345Move Archiveteam to Hackint2020-08-13T19:49:28Z<p>Kaz: Kaz++</p>
<hr />
<div>__NOTOC__<br />
<big>PROPOSED: MOVE ARCHIVETEAM IRC COMMUNICATION PRIMARILY TO HACKINT FROM EFNET</big><br />
<br />
Proposed by JAA a good long while ago, the proposal is to move Archive Team's IRC channels (and many project sub-channels) from EFNet to HackINT.<br />
<br />
As is typical, we're currently split between the two networks, with many channels in HackInt and many others in EFNet, depending on the preferences and inclinations of various members. Honestly, this can't continue. As most activity is happening in Hackint anyway, and because we might as well use this for a quorum discussion, this page exists for discussion (along with the talk page) *NOW* (Mid-August) to September 30th, at which point it will (hopefully) be very clear which direction we should go. This page is likely to get increased changes and traffic as time goes, so check back often.<br />
<br />
* Information on hackint is here: https://www.hackint.org/<br />
* Information on EFNet is here: http://www.efnet.org/<br />
<br />
= Arguments for moving Archive Team to Hackint =<br />
* IRC Services: No need to micromanage ops in each project channel, ease of administration<br />
* Stability: Handful of netsplits that lasted just a few minutes on Hackint compared to the ones on EFnet<br />
* Support: IRC staff has proven to be very helpful and useful and helped us get running with channel and user administration<br />
* Limits: Per-connection channel limits are much higher on Hackint and do not differ across servers<br />
<br />
= Arguments for keeping Archive Team on EFnet =<br />
* IRC Services: Inherently leads to a centralisation of permissions vs current system.<br />
* It has Always Been EFNet, we shouldn't uproot our long-standing relationship and work with that network.<br />
* EFNet is the longest-lived Network, showing it's here to stay.<br />
* We should just engage with EFNet to make them more hospitable for Archive Team needs.<br />
<br />
== How Moving to HackInt would Work ==<br />
<br />
Jason says "There would almost certainly be a #archiveteam channel on EFNet forever, with some people sitting in related long-time channels like #archiveteam-bs and #archiveteam-ot - but project channels would shut down and move to HackInt. So we'd still have a split, but the channels on EFNet would be more like either social hangouts or represent outreach to guide people to the other location."<br />
<br />
== Signatories ==<br />
<br />
* This is not a vote; this is a show of support in one direction.<br />
<br />
Edit and add your name to one of the lists below if you have a strong opinion one way or another. Describe your thinking, if you'd like. Do not add others if they are not on the Wiki.<br />
<br />
=== In Favor of Move to Hackint ===<br />
* [[User:Kiska]]<br />
* [[User:wessel1512]]<br />
* [[User:Aoede]]<br />
* [[User:Fusl]]<br />
* [[User:Katocala]]<br />
* [[User:ivan]]<br />
* [[User:Flashfire42]]<br />
* [[User:Jake]]<br />
* [[User:Kaz]]<br />
<br />
=== In Favor of Staying at EFNet ===</div>Kazhttps://wiki.archiveteam.org/index.php?title=Move_Archiveteam_to_Hackint&diff=45320Move Archiveteam to Hackint2020-08-11T21:16:12Z<p>Kaz: services!</p>
<hr />
<div><big>PROPOSED: MOVE ARCHIVETEAM IRC COMMUNICATION PRIMARILY TO HACKINT FROM EFNET</big><br />
<br />
Proposed by JAA a good long while ago, the proposal is to move Archive Team's IRC channels (and many project sub-channels) from EFNet to HackINT.<br />
<br />
As is typical, we're currently split between the two networks, with many channels in HackInt and many others in EFNet, depending on the preferences and inclinations of various members. Honestly, this can't continue. As most activity is happening in Hackint anyway, and because we might as well use this for a quorum discussion, this page exists for discussion (along with the talk page) *NOW* (Mid-August) to September 30th, at which point it will (hopefully) be very clear which direction we should go. This page is likely to get increased changes and traffic as time goes, so check back often.<br />
<br />
* Information on hackint is here: https://www.hackint.org/<br />
* Information on EFNet is here: http://www.efnet.org/<br />
<br />
= Arguments for moving Archive Team to Hackint =<br />
* IRC Services: No need to micromanage ops in each project channel, ease of administration<br />
<br />
= Arguments for keeping Archive Team on EFnet =<br />
* IRC Services: Inherently leads to a centralisation of permissions vs current system<br />
<br />
== How Moving to HackInt would Work ==<br />
<br />
Jason says "There would almost certainly be a #archiveteam channel on EFNet forever, with some people sitting in related long-time channels like #archiveteam-bs and #archiveteam-ot - but project channels would shut down and move to HackInt. So we'd still have a split, but the channels on EFNet would be more like either social hangouts or represent outreach to guide people to the other location."</div>Kazhttps://wiki.archiveteam.org/index.php?title=Your_Shot&diff=41454Your Shot2019-10-17T13:06:13Z<p>Kaz: Undo revision 41453 by Kaz (talk)</p>
<hr />
<div>{{Infobox project<br />
| title = yourshot<br />
| image = Yourshot-logo.png<br />
| description = <br />
| URL = https://yourshot.nationalgeographic.com/<br />
| project_status = {{online}}<br />
| irc = gotshot@irc.hackint.org:6697<br />
| tracker = http://tracker.archiveteam.org/yourshot/#show-all<br />
| source = http://github.com/archiveteam/yourshot-grab<br />
| archiving_status = {{in_progress}}<br />
}}<br />
Your Shot is a user-contributed photo site run by National Geographic Magazine.<br />
As part of an acquisition of Fox, Disney has decided to shutter it with the last date of access on Oct 31, 2019.<br />
<br />
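Search returns 10,254,953 results, a count that is still growing but is believed to be accurate.<br />
The latest image IDs are around 14240034, which implies that roughly 4 million images have been deleted (14,240,034 − 10,254,953 = 3,985,081), assuming IDs are assigned sequentially.<br />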
<br />
=== Page types ===<br />
Search: Date, Location, Keyword<br />
* https://yourshot.nationalgeographic.com/photos/?start_date=2019-08-01&end_date=2019-08-01&sort_by=-publication_date<br />
* https://yourshot.nationalgeographic.com/photos/?end_date=2019-09-28&location=los%20angeles&sort_by=-publication_date<br />
* https://yourshot.nationalgeographic.com/photos/?end_date=2019-09-28&keywords=cats&sort_by=-publication_date<br />
Tags / Category <br />
* https://yourshot.nationalgeographic.com/tags/cats/<br />
* https://yourshot.nationalgeographic.com/categories/travel/<br />
Photo Detail<br />
* https://yourshot.nationalgeographic.com/photos/5356403/<br />
User Profile<br />
* https://yourshot.nationalgeographic.com/profile/254418/<br />
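<br />
The search URLs above follow a regular pattern, so a crawl could enumerate them one day at a time. The following is a rough sketch only: it uses just the query parameters visible in the examples above, and the helper names <code>search_url</code> and <code>daily_urls</code> are made up for illustration.<br />
<br />
```python
from datetime import date, timedelta
from urllib.parse import quote, urlencode

BASE = "https://yourshot.nationalgeographic.com/photos/"


def search_url(end_date, start_date=None, location=None, keywords=None):
    """Build a search URL from the parameters seen in the example URLs."""
    params = {}
    if start_date is not None:
        params["start_date"] = start_date.isoformat()
    params["end_date"] = end_date.isoformat()
    if location is not None:
        params["location"] = location
    if keywords is not None:
        params["keywords"] = keywords
    params["sort_by"] = "-publication_date"
    # quote_via=quote encodes spaces as %20, matching the examples.
    return BASE + "?" + urlencode(params, quote_via=quote)


def daily_urls(first, last):
    """Yield one single-day search URL for each day from first to last."""
    day = first
    while day <= last:
        yield search_url(day, start_date=day)
        day += timedelta(days=1)


for url in daily_urls(date(2019, 8, 1), date(2019, 8, 3)):
    print(url)
```
<br />
Whether single-day windows are small enough to page through completely has not been verified here.<br />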
<br />
Need to find:<br />
* Original un-resized images<br />
<br />
=== More content types ===<br />
Any 404 page lists additional content that will need further examination<br />
https://yourshot.nationalgeographic.com/404<br />
<br />
* Homepage<br />
* Assignments<br />
* Stories<br />
* Photos<br />
* Photographers<br />
* Discussions<br />
* Editor's Blog<br />
* Daily Dozen<br />
<br />
<br />
=== Warnings/Caveats/Bugs ===<br />
<br />
* Using a Firefox UA causes timeouts when using Python scripts; a wget UA avoids this behavior.<br />
* Using too narrow or small a window causes a redirect to the mobile version:<br />
https://m.yourshot.nationalgeographic.com/<br />
This will also occur during playback.<br />
* The GDPR pop-up is thought to be triggered by JS checking a cookie or browser storage. A cookie named "NatGeo_Cookie_Consent__fallback" can be seen matching the response of<br />
https://api.nationalgeographic.com/consent-tracking<br />
{<br />
"country_code": "US",<br />
"consent_required": false<br />
} <br />
<br />
{<br />
"country_code": "AT",<br />
"consent_required": true<br />
} <br />
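<br />
To make the two sample responses above concrete, here is a minimal sketch that reads the <code>consent_required</code> flag from such a body. It only parses the JSON shown above and makes no network request; the function name is hypothetical, and how the site's JS actually uses the value is not confirmed.<br />
<br />
```python
import json


def consent_required(response_body):
    """Return the consent_required flag from a consent-tracking response body."""
    data = json.loads(response_body)
    return bool(data["consent_required"])


# The two sample bodies recorded above.
US_SAMPLE = '{"country_code": "US", "consent_required": false}'
AT_SAMPLE = '{"country_code": "AT", "consent_required": true}'

print(consent_required(US_SAMPLE), consent_required(AT_SAMPLE))
```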
<br />
=== References ===<br />
* https://yourshot.nationalgeographic.com/discussions/discussion/2376/your-shot-platform-update/p1?new=1</div>Kazhttps://wiki.archiveteam.org/index.php?title=Your_Shot&diff=41453Your Shot2019-10-17T13:02:02Z<p>Kaz: Move to hackint</p>
<hr />
<div>{{Infobox project<br />
| title = yourshot<br />
| image = Yourshot-logo.png<br />
| description = <br />
| URL = https://yourshot.nationalgeographic.com/<br />
| project_status = {{online}}<br />
| irc-hackint = gotshot<br />
| tracker = http://tracker.archiveteam.org/yourshot/#show-all<br />
| source = http://github.com/archiveteam/yourshot-grab<br />
| archiving_status = {{in_progress}}<br />
}}<br />
Your Shot is a user contributed photo site run by National Geographic Magazine.<br />
As part of an acquisition of Fox, Disney has decided to shutter it with the last date of access on Oct 31, 2019.<br />
<br />
Search returns 10,254,953 results and growing; the count is believed to be accurate.<br />
The latest image ID is 14240034, which implies that roughly 4 million images have been deleted.<br />
<br />
=== Page types ===<br />
Search: Date, Location, Keyword<br />
* https://yourshot.nationalgeographic.com/photos/?start_date=2019-08-01&end_date=2019-08-01&sort_by=-publication_date<br />
* https://yourshot.nationalgeographic.com/photos/?end_date=2019-09-28&location=los%20angeles&sort_by=-publication_date<br />
* https://yourshot.nationalgeographic.com/photos/?end_date=2019-09-28&keywords=cats&sort_by=-publication_date<br />
Tags / Category <br />
* https://yourshot.nationalgeographic.com/tags/cats/<br />
* https://yourshot.nationalgeographic.com/categories/travel/<br />
Photo Detail<br />
* https://yourshot.nationalgeographic.com/photos/5356403/<br />
User Profile<br />
* https://yourshot.nationalgeographic.com/profile/254418/<br />
<br />
Need to find:<br />
* Original un-resized images<br />
<br />
=== More content types ===<br />
Any 404 page lists additional content that will need further examination<br />
https://yourshot.nationalgeographic.com/404<br />
<br />
* Homepage<br />
* Assignments<br />
* Stories<br />
* Photos<br />
* Photographers<br />
* Discussions<br />
* Editor's Blog<br />
* Daily Dozen<br />
<br />
<br />
=== Warnings/Caveats/Bugs ===<br />
<br />
* Using a Firefox UA causes timeouts when using Python scripts; a wget UA avoids this behavior.<br />
* Using too narrow or small a window causes a redirect to the mobile version:<br />
https://m.yourshot.nationalgeographic.com/<br />
This will also occur during playback.<br />
* The GDPR pop-up is thought to be triggered by JS checking a cookie or browser storage. A cookie named "NatGeo_Cookie_Consent__fallback" can be seen matching the response of<br />
https://api.nationalgeographic.com/consent-tracking<br />
{<br />
"country_code": "US",<br />
"consent_required": false<br />
} <br />
<br />
{<br />
"country_code": "AT",<br />
"consent_required": true<br />
} <br />
<br />
=== References ===<br />
* https://yourshot.nationalgeographic.com/discussions/discussion/2376/your-shot-platform-update/p1?new=1</div>Kazhttps://wiki.archiveteam.org/index.php?title=Template:IRC-Hackint&diff=41452Template:IRC-Hackint2019-10-17T13:01:18Z<p>Kaz: create hackint template</p>
<hr />
<div><noinclude><br />
For an IRC channel name with a link to hackint's web IRC interface. The sole parameter is the name of the channel (# sign must be omitted).<br />
<br />
Example:<br />
<nowiki>{{IRC-Hackint|archiveteam}}</nowiki><br />
Result:<br />
<span class="plainlinks">[https://webirc.hackint.org/#irc://irc.hackint.org/#archiveteam #archiveteam]</span><br />
<br />
[[Category:Templates]]<br />
</noinclude><includeonly><span class="plainlinks">[https://webirc.hackint.org/#irc://irc.hackint.org/#{{{1}}} #{{{1}}}]</span></includeonly></div>Kazhttps://wiki.archiveteam.org/index.php?title=Yahoo!_Groups&diff=41440Yahoo! Groups2019-10-16T07:06:11Z<p>Kaz: Adding channel</p>
<hr />
<div>{{Infobox project<br />
| title = Yahoo! Groups<br />
| url = http://groups.yahoo.com/<br />
| image = groups-yahoo-com.png<br />
| logo = yahoo-groups-logo.png<br />
| project_status = {{online}}<br />
| archiving_status = {{inprogress}}<br />
| irc = yahoosucks<br />
}}<br />
<br />
'''Yahoo! Groups''' is Yahoo's mailing list and discussion group service; it's the result of the acquisition of eGroups and some other Yahoo! stuff.<br />
<br />
It's been stable for a long time (since the late 90s), long enough for some specialised software to be developed to do backups of it. (Not many other websites can say ''that''.)<br />
<br />
== Python Yahoo! Group Archiver == <br />
<br />
[https://github.com/csaftoiu/yahoo-groups-backup yahoo-groups-backup] is a Python script that scrapes a group. So far only messages are scraped. It puts all the info and metadata (both rendered message body and raw email) into a Mongo database, and provides a script to dump a static version of the site that can be read off of the filesystem. It works with Neo and with private groups by clunkily using Selenium to do the scraping.<br />
<br />
Another Python-based Archiver is [https://github.com/andrewferguson/YahooGroups-Archiver YahooGroups-Archiver], which is a simple Python script to dump the messages into individual JSON files. No further processing of the messages is done to preserve them in the format Yahoo uses for displaying them. Private groups can be archived by providing the contents of two cookies that Yahoo uses to verify a logged-in user.<br />
<br />
== Perl Yahoo! Group Archiver ==<br />
<br />
Update: Apparently since Yahoo! Groups changed to the neo interface the script no longer functions and is no longer actively maintained.<br />
<br />
<s>The [http://sourceforge.net/projects/grabyahoogroup/ Yahoo Group Archiver] is a Perl script which allows an export of "the messages (without the attachments), everything from the files section and all the images from the photo section along with their hierarchy on Yahoo". <br />
<br />
It appears that, if you get the "Couldn't get message count" error when trying to use it, the solution is to edit the yahoo2maildir.pl file and replace the bottom line <code>my $url = $HTTP::URI_CLASS->new($redirect, $base)->abs($base);</code> (under the heading <code>sub GetJSRedirect</code>) with <code><nowiki>my $url = "http://groups.yahoo.com/group/$group/messages/$begin_msgid"; </nowiki></code><br />
<br />
More frustratingly, it appears that Yahoo blocks your IP temporarily after hitting some invisible limit of data downloaded (the Archiver will continue to "download" messages for a bit, ending up with a bunch of 0-byte files, then stop completely). It's unknown if there is a solution. <br />
<br />
Also: sometimes, some of the downloaded messages, in the middle of an otherwise normal batch, are 0 in size - almost as if Yahoo blocked your IP for a few seconds, then stopped. Watch out for these so that you can re-download them later.</s><br />
<br />
== Site Structure ==<br />
<br />
There’s a convenient JSON API. May require logging in and joining a group to use all endpoints:<br />
<br />
* Group Information: https://groups.yahoo.com/api/v1/groups/concatenative/<br />
* List of Messages: https://groups.yahoo.com/api/v1/groups/concatenative/messages?count=100<br />
* Specific Message: https://groups.yahoo.com/api/v1/groups/concatenative/messages/1/<br />
* Raw Message Content: https://groups.yahoo.com/api/v1/groups/concatenative/messages/1/raw – note that there seems to be a [https://yahoo.uservoice.com/forums/209451-us-groups/suggestions/9644478-displaying-raw-messages-is-not-8-bit-clean message encoding problem]<br />
* List of Topics: https://groups.yahoo.com/api/v1/groups/concatenative/topics?count=100<br />
* Specific Topic: https://groups.yahoo.com/api/v1/groups/concatenative/topics/1<br />
* List of Tables: https://groups.yahoo.com/api/v1/groups/a_furrys_world/database<br />
* Specific Table: https://groups.yahoo.com/api/v1/groups/a_furrys_world/database/1/<br />
* Table Content: https://groups.yahoo.com/api/v1/groups/a_furrys_world/database/1/records<br />
* List of Files: https://groups.yahoo.com/api/v1/groups/a_furrys_world/files<br />
* List of Attachments: https://groups.yahoo.com/api/v1/groups/a_furrys_world/attachments<br />
* List of Polls: https://groups.yahoo.com/api/v1/groups/a_furrys_world/polls?count=100<br />
* Specific Poll: https://groups.yahoo.com/api/v1/groups/a_furrys_world/polls/3549106<br />
* List of Photos: https://groups.yahoo.com/api/v1/groups/a_furrys_world/photos<br />
* List of Albums: https://groups.yahoo.com/api/v1/groups/a_furrys_world/albums<br />
* Specific Album: https://groups.yahoo.com/api/v1/groups/a_furrys_world/albums/1841906391<br />
* List Moderators: https://groups.yahoo.com/api/v1/groups/a_furrys_world/members/moderators<br />
* Members With Incorrect Emails: https://groups.yahoo.com/api/v1/groups/a_furrys_world/members/bouncing<br />
* List of Links: https://groups.yahoo.com/api/v1/groups/a_furrys_world/links<br />
* Search: https://groups.yahoo.com/api/v1/search/groups?offset=0&maxHits=20&sortBy=&query=abcdef – sort can be one of OLDEST, RELEVANCE, MEMBERS, LATEST_ACTIVITY, NEWEST<br />
* Categories: https://groups.yahoo.com/api/v1/dir/categories/0/?start=0<br />
<br />
Note that all paginated responses are limited to the first 500 results and do not return anything new beyond that.<br />
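<br />
The endpoints listed above share one URL layout, so the per-group URLs can be generated from a group name. This is a sketch under that assumption only: the paths are copied from the list above, the helper names are invented for illustration, and no pagination parameter beyond the <code>count=100</code> shown above is assumed.<br />
<br />
```python
API_ROOT = "https://groups.yahoo.com/api/v1/groups"

# Paths copied from the endpoint list above, relative to the group URL.
ENDPOINTS = {
    "info": "",
    "messages": "messages?count=100",
    "topics": "topics?count=100",
    "database": "database",
    "files": "files",
    "attachments": "attachments",
    "polls": "polls?count=100",
    "photos": "photos",
    "albums": "albums",
    "moderators": "members/moderators",
    "bouncing": "members/bouncing",
    "links": "links",
}


def group_urls(group):
    """Return a dict mapping endpoint name to URL for one group."""
    return {name: f"{API_ROOT}/{group}/{path}" for name, path in ENDPOINTS.items()}


def message_url(group, number, raw=False):
    """Return the URL of a specific message, optionally its raw form."""
    return f"{API_ROOT}/{group}/messages/{number}/" + ("raw" if raw else "")


for name, url in group_urls("concatenative").items():
    print(name, url)
```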
<br />
== Statistics ==<br />
<br />
As of 2017-07-16 the [https://groups.yahoo.com/neo/dir directory] lists 5599562 groups. 2752112 of them have been discovered. 1483853 (54%) have public message archives with an estimated number of 2.1 billion messages (1389 messages per group on average so far). 1.8 billion messages (86%) have been archived as of 2018-10-28.<br />
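<br />
The percentages and the message estimate above can be re-derived from the raw counts. The snippet below is only a check of that arithmetic; the 1.8 and 2.1 billion figures are the rounded values quoted above.<br />
<br />
```python
discovered_groups = 2_752_112
public_groups = 1_483_853
avg_messages_per_group = 1389

# Share of discovered groups that have public message archives.
public_share = public_groups / discovered_groups

# Estimated total messages: public groups times the average so far.
estimated_messages = public_groups * avg_messages_per_group

# Share archived, using the rounded figures quoted in the text.
archived_share = 1.8e9 / 2.1e9

print(f"public archives: {public_share:.1%}")
print(f"estimated messages: {estimated_messages:,}")
print(f"archived: {archived_share:.1%}")
```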
<br />
The following graphs are slightly outdated:<br />
<br />
[[File:Yahoo_groups_date_created.png]]<br />
[[File:Yahoo_groups_messages_per_group.png]]<br />
[[File:Yahoo_groups_post_date.png]]<br />
<br />
== Software for backups ==<br />
* [http://sourceforge.net/projects/grabyahoogroup/ Yahoo Group Archiver], Sourceforge<br />
<br />
== External Links ==<br />
<br />
* https://archive.org/details/yahoo_groups<br />
<br />
== References ==<br />
<br />
<references/><br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Yahoo!]]<br />
[[Category:Mailing lists]]</div>Kazhttps://wiki.archiveteam.org/index.php?title=Discord&diff=40360Discord2019-06-15T22:00:50Z<p>Kaz: ++zucc</p>
<hr />
<div>{{Infobox project<br />
| title = Discord<br />
| image = 2018-11-11_discordapp_com.png<br />
| URL = https://discordapp.com/<br />
| description = Free Voice and Text Chat for Gamers<br />
| project_status = {{online}}<br />
| archiving_status = {{specialcase}}<br />
| irc = discard<br />
| lead = <br />
}}<br />
'''Discord''' is basically modern day IRC used by gamers and gamer associates worldwide. It was introduced on 2015-05-13 by the today-eponymous company, Discord Inc. The Discord server and clients are proprietary, the use of Discord is free with paid bonuses.<br />
<br />
Discord enables users to create ''servers'' (called ''guilds'' internally), which further divide into text and voice channels. Those may have different access permissions depending on roles given to users on the server. Discord keeps unlimited logs of text channels. Anybody who joins a Discord server has access to full server history (except for channels with the seldom used "Read Message History" denial). Access to a Discord server is granted through invites in the form of a URL, which may remain valid until disabled or expire automatically.<br />
<br />
Discord's status is essentially ''deep web'', as Discord servers are unable to be indexed by conventional search engines and archival tools, even if invites are posted publicly. However, as it keeps history and has a well documented API, it proves possible for users, particularly admins, to create comprehensive archives of servers they have access to.<br />
<br />
'''Should Discord server archives be posted publicly?''' This is a question worth pondering on. On IRC the implicit agreement is that you don't publish logs, but the privacy story is different on Discord, where everybody who joins gains access to years of logs. Given that closed services like Discord and [[Telegram]] are on the road to displace traditional services like message boards, there is culture worth saving. Archival and noindex publication could be appropriate. Note that Discord explicitly forbids collecting and disclosing end user data in their [https://discordapp.com/developers/docs/legal Developer Terms of Service] (point 2.4 End User Data). See also our attitude to [[Robots.txt]]. Make your call.<br />
<br />
== Software ==<br />
Note that ''use data mining, robots, spiders, or similar data gathering and extraction tools'' is against Discord's [https://discordapp.com/terms Terms of Service]. Using these tools with your account may get your account banned. '''Use at your own risk!'''<br />
<br />
Software for archiving Discord servers includes:<br />
<br />
* '''[https://github.com/tsudoko/pullcord pullcord]''', created by ArchiveTeam user [[User:moufu|moufu]]. pullcord is a Go command line program which supports incremental archival of channel logs, server logs, attachments, avatars, server icons, server splashes, and emoji.<br />
* '''[https://github.com/Tyrrrz/DiscordChatExporter DiscordChatExporter]''' is a GUI program which can be used to export message history from a Discord channel to a file.<br />
* '''[https://dht.chylex.com/ Discord History Tracker]''' is a ''browser plugin'' that lets you save Discord channel logs one by one.<br />
* '''[https://github.com/guineawheek/zucc Zucc]''' is a Discord bot that dumps messages and other data from a guild/server.<br />
<br />
== Discord server aggregators ==<br />
<br />
* [https://disboard.org/ https://disboard.org/] ({{job|cul77udia7xxrtpvlwxso1we9}} on 2019-05-01)<br />
* [https://discordbots.org/ https://discordbots.org/] ({{job|10omwqmi1h5f1efu5mcqkpwb6}} on 2019-05-01)<br />
* [https://discord.me/ https://discord.me/]<br />
<br />
<br />
<br />
{{Template:Instant messengers}}</div>Kazhttps://wiki.archiveteam.org/index.php?title=ArchiveBox&diff=36850ArchiveBox2019-04-17T20:02:54Z<p>Kaz: Revert TheSquashSH changes.</p>
<hr />
<div>{{Historical}}<br />
<br />
'''Note: This is a copy of the ArchiveBox notes [http://piratepad.net/archivebox from PiratePad], duplicated here in case PiratePad temporarily goes down again. For now, PiratePad is still the point of truth - edit that instead of this.'''<br />
<br />
''Archive Team Junior Woodchuck Kit - It's sort of Unix (Quote from SketchCow)''<br />
<br />
This project aims to equip people who have less experience with Linux systems with the needed tools to aid the Archive Team (AT). The project is split into two parts, one part being the live Debian environment with usable archiving tools, the other being a repository for the Archive Team to prepare for new backup projects via packages.<br />
<br />
== Virtual Debian environment ==<br />
A VirtualBox hard drive image that can be booted into a minimal, customized version of Debian Squeeze which will have a terminal environment that downloads according to the "Project Definitions" described in 2.<br />
<br />
Packages in the vanilla Archive Box:<br />
* wget<br />
* curl<br />
* rsync<br />
* archiveteam-console (see 2.1)<br />
* X environment with xfce (I think this should be LXDE, personally: smaller, faster, more lightweight)<br />
* perl<br />
<br />
xfce needs to be configured such that archiveteam-console runs in a terminal on boot. If possible we should avoid the user having to log in to the system before use, just power it on and download awesome shit.<br />
<br />
From a securing-ourselves-from-stupid-users perspective it would make sense to chmod the files in a way that would make them hard to delete. Then, after rsyncing it to the server we could give them a token that would allow deletion of the files. But is that too 'evil'?<br />
<br />
== Debian ArchiveTeam repository ==<br />
The Archive Team repository provides Debian packages with the needed scripts (and packages that are needed to use specific scripts for downloads, for example Perl or Python) to help in any archiving projects. The pros with this approach are that we can easily push updates to AT-provided software, and that we can define package dependencies (if an archiving script was written in some scripting language not available in the vanilla install).<br />
<br />
The server is hosted by JC (jch) and is located at 130.225.236.19 or archiveteam.hackerspaces.dk. We probably need a domain name under .archiveteam.org before going live.<br />
<br />
Project definitions are named under the following hierarchy: archiveteam-FOO (foo being the project name, for example flickrfckr). "archiveteam-" packages will also add a module to archiveteam-console (see 2.1) making it easy to navigate and participate in the project without any knowledge of Linux terminal scripting.<br />
<br />
=== archiveteam-console ===<br />
archiveteam-console is the main application in any Archive Box install. It's the springboard to any AT-related task, as it allows:<br />
* Running archiving scripts in the background (behind the scenes we'd like to use screen perhaps) <br />
* Checking the status of current AT efforts. IDEA: our dumping scripts should answer SIGUSR1 with relevant status info to be shown in archiveteam-console.<br />
* Pushing the downloaded data to a remote server with rsync.<br />
* Requesting remote support from IRC users by means of a reverse shell, granting the helper shell access to the machine. <br />
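<br />
The SIGUSR1 idea in the list above could look roughly like the following on the script side. This is a Unix-only sketch, not an agreed protocol: the status fields, the message format and the function name are all assumptions made for illustration.<br />
<br />
```python
import os
import signal


def install_status_handler(status, emit=print):
    """Report the current status via emit() whenever SIGUSR1 arrives."""
    def handler(signum, frame):
        emit("items done: {done}/{total}".format(**status))
    signal.signal(signal.SIGUSR1, handler)


# A dumping script would update this dict as it makes progress.
status = {"done": 0, "total": 3}
install_status_handler(status)
print("send SIGUSR1 to pid", os.getpid(), "to get a status line")
```
<br />
archiveteam-console could then signal each running script and collect the lines, though how it would read them back (pipe, log file, socket) is left open here.<br />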
<br />
As stated earlier, archiveteam-console will have a directory structure for adding modules into it. This means that "archiveteam-" packages will actively alter the menu structure in archiveteam-console in order to reflect the new module having been installed.<br />
<br />
== Open problems ==<br />
* We need some way to dynamically and reliably resize the file system inside the virtual machine. VirtualBox and other VM hosts can allow this to be done, however, the Archive Box should be able to do this without any user intervention (out-of-the-box). (NOTE from BlueMaxima: Tested importing and exporting an appliance and it seemed that the dynamically expanding hard disk was preserved between import and export.)<br />
* Should we use a VirtualBox disk image or a VirtualBox appliance? It seems that Appliance is the best way to go.<br />
* Is it possible to run into problems like proxies or firewalls? If so, there should be a way to configure the Archive Box to work around these or allow configuration to bypass them.<br />
* Would it make sense to devise a protocol for archiving scripts to auto-report status back to us? It would give us a much better overview of how far our archiving effort has come. This would make sense. The question is, how would someone review this data and control the archiving?<br />
* Would we give the user a root account or a normal user account, to prevent any problems that may pop up (even though this issue could be avoided if the archiveteam-console package is designed right)?<br />
<br />
== Who's working on what ==<br />
* jch: Writes everything, plans everything, etc.. Also, sets up the repository server.<br />
* Auguste: Prepares the actual distribution, currently working on the first version of it.<br />
* SketchCow: Watching us, making sure we know this is a bad idea. :P<br />
* BlueMax(ima): Corrects language, unofficial beta test, unofficial groupie<br />
* Underscor: Able to work on the archiveteam-console app, if people want him to. Python, bash, and ncurses junkie ;D<br />
<br />
{{Navigation box}}</div>Kazhttps://wiki.archiveteam.org/index.php?title=ArchiveToolBox&diff=36849ArchiveToolBox2019-04-17T20:01:47Z<p>Kaz: Kaz moved page ArchiveToolBox to ArchiveBox over redirect: No, changing the page name because your github project isn't getting enough clicks isn't a valid reason.</p>
<hr />
<div>#REDIRECT [[ArchiveBox]]</div>Kazhttps://wiki.archiveteam.org/index.php?title=ArchiveBox&diff=36848ArchiveBox2019-04-17T20:01:46Z<p>Kaz: Kaz moved page ArchiveToolBox to ArchiveBox over redirect: No, changing the page name because your github project isn't getting enough clicks isn't a valid reason.</p>
<hr />
<div>{{Historical}}<br />
<br />
'''Note: This is a copy of the Archive Box notes [http://piratepad.net/archivebox from PiratePad], duplicated here in case PiratePad temporarily goes down again. For now, PiratePad is still the point of truth - edit that instead of this.'''<br />
<br />
''Archive Team Junior Woodchuck Kit - It's sort of Unix (Quote from SketchCow)''<br />
<br />
This project aims to equip people who have less experience with Linux systems with the needed tools to aid the Archive Team (AT). The project is split into two parts, one part being the live Debian environment with usable archiving tools, the other being a repository for the Archive Team to prepare for new backup projects via packages.<br />
<br />
== Virtual Debian environment ==<br />
A VirtualBox hard drive image that can be booted into a minimal, customized version of Debian Squeeze which will have a terminal environment that downloads according to the "Project Definitions" described in 2.<br />
<br />
Packages in the vanilla Archive Box:<br />
* wget<br />
* curl<br />
* rsync<br />
* archiveteam-console (see 2.1)<br />
* X environment with xfce (I think this should be LXDE, personally: smaller, faster, more lightweight)<br />
* perl<br />
<br />
xfce needs to be configured such that archiveteam-console runs in a terminal on boot. If possible we should avoid the user having to log in to the system before use, just power it on and download awesome shit.<br />
<br />
From a securing-ourselves-from-stupid-users perspective it would make sense to chmod the files in a way that would make them hard to delete. Then, after rsyncing it to the server we could give them a token that would allow deletion of the files. But is that too 'evil'?<br />
<br />
== Debian ArchiveTeam repository ==<br />
The Archive Team repository provides Debian packages with the needed scripts (and packages that are needed to use specific scripts for downloads, for example Perl or Python) to help in any archiving projects. The pros with this approach are that we can easily push updates to AT-provided software, and that we can define package dependencies (if an archiving script was written in some scripting language not available in the vanilla install).<br />
<br />
The server is hosted by JC (jch) and is located at 130.225.236.19 or archiveteam.hackerspaces.dk. We probably need a domain name under .archiveteam.org before going live.<br />
<br />
Project definitions are named under the following hierarchy: archiveteam-FOO (foo being the project name, for example flickrfckr). "archiveteam-" packages will also add a module to archiveteam-console (see 2.1) making it easy to navigate and participate in the project without any knowledge of Linux terminal scripting.<br />
<br />
=== archiveteam-console ===<br />
archiveteam-console is the main application in any Archive Box install. It's the springboard to any AT-related task, as it allows:<br />
* Running archiving scripts in the background (behind the scenes we'd like to use screen perhaps) <br />
* Checking the status of current AT efforts. IDEA: our dumping scripts should answer SIGUSR1 with relevant status info to be shown in archiveteam-console.<br />
* Pushing the downloaded data to a remote server with rsync.<br />
* Requesting remote support from IRC users by means of a reverse shell, granting the helper shell access to the machine. <br />
<br />
As stated earlier, archiveteam-console will have a directory structure for adding modules into it. This means that "archiveteam-" packages will actively alter the menu structure in archiveteam-console in order to reflect the new module having been installed.<br />
<br />
== Open problems ==<br />
* We need some way to dynamically and reliably resize the file system inside the virtual machine. VirtualBox and other VM hosts can allow this to be done, however, the Archive Box should be able to do this without any user intervention (out-of-the-box). (NOTE from BlueMaxima: Tested importing and exporting an appliance and it seemed that the dynamically expanding hard disk was preserved between import and export.)<br />
* Should we use a VirtualBox disk image or a VirtualBox appliance? It seems that Appliance is the best way to go.<br />
* Is it possible to run into problems like proxies or firewalls? If so, there should be a way to configure the Archive Box to work around these or allow configuration to bypass them.<br />
* Would it make sense to devise a protocol for archiving scripts to auto-report status back to us? It would give us a much better overview of how far our archiving effort has come. This would make sense. The question is, how would someone review this data and control the archiving?<br />
* Would we give the user a root account or a normal user account, to prevent any problems that may pop up (even though this issue could be avoided if the archiveteam-console package is designed right)?<br />
<br />
== Who's working on what ==<br />
* jch: Writes everything, plans everything, etc.. Also, sets up the repository server.<br />
* Auguste: Prepares the actual distribution, currently working on the first version of it.<br />
* SketchCow: Watching us, making sure we know this is a bad idea. :P<br />
* BlueMax(ima): Corrects language, unofficial beta test, unofficial groupie<br />
* Underscor: Able to work on the archiveteam-console app, if people want him to. Python, bash, and ncurses junkie ;D<br />
<br />
{{Navigation box}}</div>Kazhttps://wiki.archiveteam.org/index.php?title=Twitch.tv&diff=25996Twitch.tv2016-07-20T19:17:59Z<p>Kaz: It's may HAVE, dammit.</p>
<hr />
<div>{{Infobox project<br />
| title = Twitch.tv<br />
| URL = http://twitch.tv<br />
| image = Twitch_homepage_screenshot.png<br />
| logo = Twitch_Logo.png<br />
| project_status = {{specialcase}} (archives of streams actively purged after an amount of time)<br />
| archiving_status = {{partiallysaved}} (popular videos only)<br />
| irc = burnthetwitch<br />
| source = [https://github.com/ArchiveTeam/twitchtv-discovery-grab Phase 1],[https://github.com/ArchiveTeam/twitchtv-grab Phase 2], [https://github.com/ArchiveTeam/twitchtv-items Items], [https://github.com/ArchiveTeam/twitchtv-items Index]<br />
| tracker = [http://tracker.archiveteam.org/twitchdisco/ Phase 1], [http://tracker.archiveteam.org/twitchtv/ Phase 2]<br />
}}<br />
<br />
Justin.tv—sorry, ''cough'', I mean to say—'''Twitch.tv''' is a live video streaming service.<br />
<br />
Twitch was rumored to have been acquired by [[YouTube]]/[[Google]] but [[Amazon]] was the final buyer.<ref>http://www.twitch.tv/p/thankyou</ref><br />
<br />
== Shutdown ==<br />
<br />
<blockquote><br />
<br />
<p>'''Changes To VODs On Twitch'''</p><br />
<p>Aug 06 2014 · Engineering, Tech</p><br />
<br />
<p>Our goal at Twitch is straightforward: deliver the highest quality video. This includes the ability to watch video on demand (VOD) on all of our platforms, not just the website.</p><br />
<br />
<p>In order to create a system that supports live and VOD across the globe and on multiple platforms, we need to make significant changes to the way we’re currently storing video. Today, we’d like to discuss what these changes are, why they’re necessary, and how they benefit the entire Twitch community now and in the future.</p><br />
<br />
<p>[...]</p><br />
<br />
<p>''Looking at Viewership Data''</p><br />
<br />
<p>We found that the vast majority of past broadcast views happen within the first two weeks after they’re created. On the days following, viewership reduces exponentially.</p><br />
<br />
<p><br />
We also discovered that 80% of our storage capacity is filled with past broadcasts that are never watched. That’s multiple petabytes for video that no one has ever viewed.</p><br />
<br />
<p>Highlights, on the other hand, have much more value and longevity. Over their lifetime, highlights get 9x as many views as past broadcasts.</p><br />
<br />
<p>[...]</p><br />
<br />
<p>As for existing past broadcasts, '''beginning three weeks from today, we will begin removing them from Twitch servers'''. If you would like to keep your past broadcasts, we encourage you to begin exporting or making highlights of your best moments so that they’re saved for posterity.</p><br />
<br />
<p>[...]<ref>http://blog.twitch.tv/2014/08/update-changes-to-vods-on-twitch/</ref></p><br />
<br />
</blockquote><br />
<br />
== Site structure ==<br />
<br />
* HTML page requests: http://secure.twitch.tv/swflibs/TwitchPlayer.swf?videoId=a387099879<br />
* Flash requests: https://api.twitch.tv/api/videos/a387099879?as3=t<br />
* You can just type it directly as well: http://www.twitch.tv/twitchplayspokemon/b/503249758 → https://api.twitch.tv/api/videos/a503249758?as3=t<br />
* There's also this: https://api.justin.tv/api/broadcast/by_archive/503249758.json?onsite=true<br />
* JSON file contains list of URLs to their FLV files.<br />
* Highlights: https://api.twitch.tv/api/videos/c2673085?as3=t (notice the start and end offsets)<br />
* http://www.twitchtools.com/video-download.php provides the above service<br />
* <code>youtube-dl -i</code> appears to do some of them<br />
* Scraping: https://api.twitch.tv/kraken/videos/top?limit=20&offset=0&period=all<br />
* Are there any irregularities? Differences between highlights and past broadcasts?<br />
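<br />
The mapping from a video page URL to its API URL, as shown in the list above, can be sketched as follows. Only the <code>/b/</code> form (API prefix <code>a</code>) is taken from the examples above; the <code>/c/</code> page form for highlights is an assumption based on the <code>c</code> prefix in the highlights API URL, and the function name is made up.<br />
<br />
```python
import re

# Page URLs like http://www.twitch.tv/<channel>/b/<id>; /c/ is assumed for highlights.
_PAGE = re.compile(r"^https?://www\.twitch\.tv/(?P<channel>[^/]+)/(?P<kind>[bc])/(?P<id>\d+)/?$")

# Page path letter -> prefix used in the API video ID.
_API_PREFIX = {"b": "a", "c": "c"}


def api_url(page_url):
    """Translate a video page URL into the matching api.twitch.tv URL."""
    match = _PAGE.match(page_url)
    if match is None:
        raise ValueError("unrecognised video page URL: " + page_url)
    video_id = _API_PREFIX[match.group("kind")] + match.group("id")
    return "https://api.twitch.tv/api/videos/" + video_id + "?as3=t"


print(api_url("http://www.twitch.tv/twitchplayspokemon/b/503249758"))
```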
<br />
=== Storage Issues ===<br />
<br />
* How to decide which are important? 10+ views again? Do a discovery crawl first?<br />
* Tahoe-LAFS? Grab ''all'' the videos into temp storage?<br />
* Compress all the unwatched videos into postage stamp sized videos?<br />
<br />
=== Discovery ===<br />
<br />
All items discovered are located at [https://github.com/ArchiveTeam/twitchtv-items twitchtv-items]. A collated [https://archive.org/details/twitchtv_scrape_dataset_2014-09-05 JSON dump is available.]<br />
<br />
== How can I help? ==<br />
<br />
Download and fire up your [[warrior]]! Then select Twitch Phase 2. Better yet, select Archive Team's Choice. <br />
<br />
Alternatively, advanced users can run the scripts manually; see below.<br />
<br />
Don't forget to '''[https://archive.org/donate/ donate to the Internet Archive]''', which will be hosting these files. Disk space is cheap, but maintaining it is not!<br />
<br />
=== For those not using the Warrior ===<br />
<br />
''Advanced User Quick Start''<br />
<br />
Please run these sysctl tweaks to optimize uploads:<br />
<br />
<pre><br />
# Add to /etc/sysctl.conf and run "sysctl -p"<br />
# increase TCP max buffer size settable using setsockopt()<br />
net.core.rmem_max = 16777216<br />
net.core.wmem_max = 16777216<br />
# increase Linux autotuning TCP buffer limit<br />
net.ipv4.tcp_rmem = 4096 87380 16777216<br />
net.ipv4.tcp_wmem = 4096 65536 16777216<br />
</pre><br />
<br />
You can also issue them without modifying /etc/sysctl.conf by running e.g. <code>sysctl net.core.rmem_max=16777216 net.core.wmem_max=16777216</code>, but be aware that those won't stick around across reboots.<br />
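<br />
To confirm that the tweaks took effect, the live values can be read back from <code>/proc/sys</code>, where each dotted sysctl key maps to a path with slashes. The following is a minimal sketch under that assumption; the helper names are hypothetical and are not part of the grab scripts.<br />

```python
# Expected values, copied from the sysctl tweaks above.
WANTED = {
    "net.core.rmem_max": "16777216",
    "net.core.wmem_max": "16777216",
    "net.ipv4.tcp_rmem": "4096 87380 16777216",
    "net.ipv4.tcp_wmem": "4096 65536 16777216",
}


def proc_path(key):
    """Map a sysctl key such as net.core.rmem_max to its /proc/sys file."""
    return "/proc/sys/" + key.replace(".", "/")


def read_file(path):
    with open(path) as handle:
        return handle.read()


def mismatches(wanted, read=read_file):
    """Return {key: actual_value} for every key that differs from `wanted`."""
    bad = {}
    for key, expected in wanted.items():
        # The kernel separates multi-value entries with tabs; normalise to spaces.
        actual = " ".join(read(proc_path(key)).split())
        if actual != expected:
            bad[key] = actual
    return bad


if __name__ == "__main__":
    for key, actual in mismatches(WANTED).items():
        print("{}: wanted {!r}, found {!r}".format(key, WANTED[key], actual))
```

An empty result means all four settings match. The <code>read</code> parameter exists only so the check can be exercised without touching the real <code>/proc</code>.<br />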
<br />
<pre>apt-get install git git-core libgnutls-dev lua5.1 liblua5.1-0 liblua5.1-0-dev screen python-dev python-pip bzip2 zlib1g-dev</pre><br />
<br />
<pre>git clone https://github.com/ArchiveTeam/twitchtv-grab<br />
cd ./twitchtv-grab<br />
pip install seesaw<br />
./get-wget-lua.sh<br />
<br />
...<br />
<br />
pip install requests</pre><br />
<br />
<br />
The wget-lua build may have failed earlier. If so, then:<br />
<br />
<pre>cd get-wget-lua.tmp<br />
mv src/wget ../wget-lua<br />
cd ..</pre><br />
<br />
And finally, to actually run it:<br />
<br />
<pre>run-pipeline pipeline.py --concurrent 2 YOURNICKHERE --disable-web-server</pre><br />
<br />
For troubleshooting and further details, please see the [https://github.com/ArchiveTeam/twitchtv-grab/blob/master/README.md README].<br />
<br />
=== What we are saving ===<br />
<br />
Currently:<br />
<br />
* [https://github.com/ArchiveTeam/twitchtv-items/blob/master/items/video_pages/01_twitchplayspokemon.txt twitchplayspokemon]: "test" run, estimated 3 TB ($6000)<br />
* [https://github.com/ArchiveTeam/twitchtv-items/blob/master/items/video_pages/02_suggestions_19553_100views.txt These videos] selected from [https://github.com/ArchiveTeam/twitchtv-items/blob/master/user_suggestions/01_suggestions_wiki-19553.txt these channels] with 100 or more views: estimated 23 TB ($46000)<br />
* [https://github.com/ArchiveTeam/twitchtv-items/blob/master/items/video_pages/03_suggestions_19553_100views_2.txt These videos] selected from [https://github.com/ArchiveTeam/twitchtv-items/blob/master/user_suggestions/01_suggestions_wiki-19553.txt these channels] with 100 or more views: estimated 0.7 TB ($1400)<br />
* [https://github.com/ArchiveTeam/twitchtv-items/blob/master/items/video_pages/04_suggestions_19616_100views.txt These videos] selected from [https://github.com/ArchiveTeam/twitchtv-items/blob/master/user_suggestions/02_suggestions_wiki-19616.txt these channels] with 100 or more views: estimated 4 TB ($8000)<br />
* [https://github.com/ArchiveTeam/twitchtv-items/blob/master/items/video_pages/05_top_videos_10000views.txt These top videos] which have 10000 or more views: estimated 20 TB ($40000)<br />
* [https://github.com/ArchiveTeam/twitchtv-items/blob/master/items/video_pages/06_suggestions_19795_100views.txt These videos] selected from [https://github.com/ArchiveTeam/twitchtv-items/blob/master/user_suggestions/03_suggestions_wiki-19795.txt these channels] with 100 or more views: estimated 8 TB ($16000)<br />
* [https://github.com/ArchiveTeam/twitchtv-items/blob/master/items/video_pages/07_socialblade_top_5000views.txt These videos] selected from SocialBlade's top Twitch channels with 5000 or more views: estimated 10 TB ($20000)<br />
* [https://github.com/ArchiveTeam/twitchtv-items/blob/master/items/video_pages/08_suggestions_19851_100views.txt These videos] selected from [https://github.com/ArchiveTeam/twitchtv-items/blob/master/user_suggestions/05_suggestions_wiki-19851.txt these channels] with 100 or more views: estimated 15 TB ($30000)<br />
* [https://github.com/ArchiveTeam/twitchtv-items/blob/master/items/video_pages/09_top_vid_per_suggestion_19553-19851.txt These videos], from previous suggestions, that are most viewed per channel: estimated 0.1TB ($200)<br />
* [https://github.com/ArchiveTeam/twitchtv-items/blob/master/items/video_pages/10_suggestions_19874_100views.txt These videos] selected from [https://github.com/ArchiveTeam/twitchtv-items/blob/master/user_suggestions/06_suggestions_wiki-19874.txt these channels] with 100 or more views: estimated 7 TB ($14000)<br />
* [https://github.com/ArchiveTeam/twitchtv-items/blob/master/items/video_pages/11_suggestions_19911_100views_and_top1.txt These videos] selected from [https://github.com/ArchiveTeam/twitchtv-items/blob/master/user_suggestions/07_suggestions_wiki-19911.txt these channels] with 100 or more views or most viewed video per channel: estimated 1.6 TB ($3200)<br />
<br />
Next:<br />
<br />
* Sorry, no more suggestions! More suggestions may be considered if you [https://archive.org/donate/ donate to the Internet Archive].<br />
<br />
Dollar figures are shown to illustrate the cost of permanent archives. They are not actual values, but simplified figures meant to act as a sane budget. Amounts are in USD, at an estimated $2000 per TB (not per TB of disk space alone).<br />
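<br />
The budget arithmetic can be reproduced with a short Python sketch. The size estimates are copied from the "Currently" list above; the variable and function names are hypothetical.<br />

```python
USD_PER_TB = 2000  # budget estimate stated above, not a measured cost

# Estimated sizes in TB, in the order of the "Currently" list.
ESTIMATES_TB = [3, 23, 0.7, 4, 20, 8, 10, 15, 0.1, 7, 1.6]


def cost_usd(terabytes):
    """Budgeted cost of permanently archiving the given number of terabytes."""
    return terabytes * USD_PER_TB


total_tb = sum(ESTIMATES_TB)

if __name__ == "__main__":
    for size in ESTIMATES_TB:
        print("{:>5} TB -> ${:,.0f}".format(size, cost_usd(size)))
    print("total: {:.1f} TB -> ${:,.0f}".format(total_tb, cost_usd(total_tb)))
```

Summed this way, the listed batches come to roughly 92.4 TB, or about $184,800 at the stated rate.<br />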
<br />
Channels not included:<br />
<br />
* speeddemosarchivesda: already in IA<br />
* vinesauce: avoiding duplication, see below<br />
* [https://github.com/ArchiveTeam/twitchtv-items/blob/master/user_blacklists/user_blacklist.txt and others]<br />
<br />
Anything culturally significant to add? Comment on [[Talk:Twitch.tv]]. Don't forget to sign your comments with <code><nowiki>~~~~</nowiki></code>.<br />
<br />
== Archives ==<br />
<br />
=== By Archive Team ===<br />
<br />
Archives will be made available later as [[WARC]] files in the [https://archive.org/details/archiveteam_twitchtv archiveteam_twitchtv] collection at the Internet Archive. You can access them through the Wayback Machine, but you'll need to search an index to find the media files. <br />
<br />
A work-in-progress [http://archive.fart.website/archiveteam/twitchtv-index/html/ '''searchable index''' is now available!]<br />
<br />
=== Renegade Stream Archives ===<br />
<br />
These archives are made in a manual fashion through the efforts of streaming communities. Feel free to expand this list.<br />
<br />
* [[Twitch.tv/Vinesauce|Vinesauce Stream Archival Effort]] - A crowdsourced effort by fans of the Vinesauce Group to archive 1714 of their streams.<br />
** [http://vinesauce.com/vinetalk/index.php?topic=4321.msg81672 Vinesauce Forum link]<br />
* [http://archive.klaxa.eu Klaxa.eu's Archive of The 4chan Cup] - An existing, complete archive of The 4chan Cup, starting from the 2014 Autumn Games up till today.<br />
<br />
=== Other ===<br />
<br />
* https://archive.org/details/archiveteam_twitchtv_espesgrab<br />
<br />
== Butt Controversy ==<br />
In June 2016, Twitch deleted a bunch of custom emoticons on the grounds of obscenity. (See {{url|http://www.dailydot.com/esports/twitch-butt-emotes/}} for details.)<br />
<br />
The emotes can be found in the backend at: <nowiki>https://static-cdn.jtvnw.net/emoticons/v1/<id number>/<size></nowiki><br />
where id number ranges from 1 to 103667 (as of 2016-06-22), with no leading zeroes, and size is 1.0, 2.0 or 3.0.<br />
<br />
Note: sizes 0.5, 1.5, 2.5, 3.5, 4.0, 4.5 and 5.0 are valid as well, but for most (all?) emotes they return the same data as the next-highest whole-number size, or as 3.0 when the requested size is larger than that. That is, requests for 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5 and 5.0 match 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0 and 3.0 respectively.<br />
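<br />
The URL scheme and the size fallback described above can be sketched in Python. This is an illustration only: the function names are hypothetical, and the rounding rule is simply the one implied by the mapping in the note, which is itself reported as holding for most (possibly all) emotes.<br />

```python
import math

BASE = "https://static-cdn.jtvnw.net/emoticons/v1/{emote_id}/{size}"
STORED_SIZES = (1.0, 2.0, 3.0)  # the three sizes that hold distinct data
LAST_KNOWN_ID = 103667          # highest id as of 2016-06-22


def effective_size(requested):
    """Size whose data is returned for a requested size between 0.5 and 5.0."""
    # Round up to the next whole number, capped at the largest stored size.
    return float(min(math.ceil(requested), 3))


def emote_urls(first_id=1, last_id=LAST_KNOWN_ID):
    """Yield every distinct emote URL for the given id range (no leading zeroes)."""
    for emote_id in range(first_id, last_id + 1):
        for size in STORED_SIZES:
            yield BASE.format(emote_id=emote_id, size=size)


if __name__ == "__main__":
    for url in emote_urls(9, 9):
        print(url)
```

Because the half sizes and anything above 3.0 duplicate existing data, listing only the three stored sizes per id is enough for a URL list.<br />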
<br />
All emote graphics and sizes still existing in the backend system up to emote id 103667 were archived through [[Archivebot]] [http://archive.fart.website/archivebot/viewer/job/b8fnv here]; the resulting WARCs can be downloaded through the viewer. Their associated chat 'shortnames' (e.g. "<3" for emote #9) cannot be easily determined and were not archived.<br />
<br />
== See Also == <br />
<br />
* [[Justin.tv]]<br />
<br />
== External links ==<br />
<br />
* [http://www.reddit.com/r/twitchplayspokemon/comments/2d2vvb/tpps_vods_will_be_saved_in_their_entirety/ "TPP'S VODS WILL BE SAVED IN THEIR ENTIRETY"]<br />
* [http://www.businessinsider.com/amazon-in-talks-to-buy-twitch-2014-8 "Amazon Might Buy Video Game Streaming Service Twitch Before Google Can"]<br />
* [http://www.reddit.com/r/twitchplayspokemon/comments/2fksq1/twitch_is_trying_to_restore_deleted_vods/ "Twitch is Trying to Restore Deleted VODs"]<br />
<br />
== References ==<br />
<br />
<references/><br />
<br />
{{navigation box}}<br />
<br />
[[Category:Video hosting]]</div>Kazhttps://wiki.archiveteam.org/index.php?title=Valhalla&diff=20098Valhalla2014-09-18T22:14:11Z<p>Kaz: </p>
<hr />
<div>This wiki page is a collection of ideas for Project '''Valhalla'''.<br />
<br />
<SketchCow> Basically, we have this situation where we have stuff that is being threatened,<br />
and it's huge, and then it's either not so threatened or it's in a weird quantum state.<br />
So, this really stretches the bounds of what IA does. It's a huge amount of data, it's not likely <br />
to be overly touched if the originals are up, and IA will spend/lose a lot of money pulling it into their infrastructure.<br />
So maybe we can discuss actual, not pie-in-the-sky possibilities of what we can do to have some sort of not-IA pile of storage.<br />
<br />
== Options ==<br />
* Tapes<br />
* [http://www.ollydbg.de/Paperbak/index.html PaperBack]<br />
* [http://ronja.twibright.com/optar/ Optar]<br />
* Blu-ray<br />
* [http://en.wikipedia.org/wiki/M-DISC M-DISC]<br />
* Flash media<br />
** Wears out quickly, not-so-good long term storage<br />
** Soliciting donations for old flash media from people, or sponsorship from flash companies?<br />
* Ink-based Consumer Optical Media (Blu-ray, DVD, etc.) <br />
** Differences between Blu-Ray and DVD? DVDs do not last very long.<br />
<br />
== Non-options ==<br />
* BitTorrent Sync<br />
** Proprietary (currently), so not a good idea to use as an archival format/platform<br />
* Amazon Glacier<br />
** Amazon Glacier seems like a great idea, until you realize they mean 1 cent per gigabyte per month. This is $120 per terabyte per year. The transfer out of 100 TB would also run over $10,000 the month it's pulled from the system.</div>Kazhttps://wiki.archiveteam.org/index.php?title=TwitPic&diff=19996TwitPic2014-09-04T18:31:10Z<p>Kaz: irc chan</p>
<hr />
<div>{{Infobox project<br />
| title = TwitPic<br />
| image = Twitpic - Share photos on Twitter 1294869067903.png<br />
| description = TwitPic main page on 2011-01-12<br />
| URL = http://twitpic.com<br />
| project_status = {{closing}}<br />
| archiving_status = {{notsavedyet}}<br />
| irc = quitpic<br />
}}<br />
<br />
'''TwitPic''' is an image hosting service. The service is designed mainly for Twitter users: images uploaded to the service are given short URLs for use in Twitter posts. Twitter carries a 140-character post limit, and the average TwitPic URL is 25 or 26 characters long.<br />
<br />
On September 4, 2014 TwitPic [http://blog.twitpic.com/2014/09/twitpic-is-shutting-down/ announced] they were shutting down.<br />
<br />
== Downloaders ==<br />
* [http://code.google.com/p/emijrp/source/browse/trunk/scrapers/twitpic.py Downloader by tag] (it saves the full resolution image and metadata: uploader, date and description)<br />
<br />
== External links ==<br />
* http://twitpic.com<br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Image hosting services]]</div>Kaz