https://wiki.archiveteam.org/api.php?action=feedcontributions&user=PurpleSymphony&feedformat=atomArchiveteam - User contributions [en]2024-03-29T11:56:38ZUser contributionsMediaWiki 1.37.1https://wiki.archiveteam.org/index.php?title=Move_Archiveteam_to_Hackint&diff=45415Move Archiveteam to Hackint2020-08-23T08:35:42Z<p>PurpleSymphony: /* In Favor of Move to Hackint */ Add myself</p>
<hr />
<div>__NOTOC__<br />
<big>PROPOSED: MOVE ARCHIVETEAM IRC COMMUNICATION PRIMARILY TO HACKINT FROM EFNET</big><br />
<br />
Proposed by JAA a good long while ago, the proposal is to move Archive Team's IRC channels (and many project sub-channels) from EFNet to HackINT.<br />
<br />
As is typical, we're currently split between the two networks, with many channels on HackInt and many others on EFNet, depending on the preferences and inclinations of various members. Honestly, this can't continue. As most activity is happening on HackInt anyway, and because we might as well use this for a quorum discussion, this page (along with the talk page) is open for discussion from *NOW* (mid-August) until September 30th, at which point it will (hopefully) be very clear which direction we should go. This page is likely to see more changes and traffic as time goes on, so check back often.<br />
<br />
* Information on hackint is here: https://www.hackint.org/<br />
* Information on EFNet is here: http://www.efnet.org/<br />
<br />
= Arguments for moving Archive Team to Hackint =<br />
* IRC Services: No need to micromanage ops in each project channel, ease of administration<br />
* Stability: Handful of netsplits that lasted just a few minutes on Hackint compared to the ones on EFnet<br />
* Support: IRC staff has proven to be very helpful and useful and helped us get running with channel and user administration<br />
* Limits: Per-connection channel limits are much higher on Hackint and do not differ across servers (123 is the current joined-channels per connection limit and opers are willing to increase this network-wide limit in case we hit it)<br />
<br />
= Arguments for keeping Archive Team on EFnet =<br />
* IRC Services: Inherently leads to a centralisation of permissions vs current system.<br />
* It has Always Been EFNet, we shouldn't uproot our long-standing relationship and work with that network.<br />
* EFNet is the longest-lived Network, showing it's here to stay.<br />
* We should just engage with EFNet to make them more hospitable for Archive Team needs.<br />
<br />
== How Moving to HackInt would Work ==<br />
<br />
Jason says "There would almost certainly be a #archiveteam channel on EFNet forever, with some people sitting in related long-time channels like #archiveteam-bs and #archiveteam-ot - but project channels would shut down and move to HackInt. So we'd still have a split, but the channels on EFNet would be more like either social hangouts or represent outreach to guide people to the other location."<br />
<br />
== Signatories ==<br />
<br />
* This is not a vote; this is a show of support in one direction.<br />
<br />
Edit and add your name to one of the lists below if you have a strong opinion one way or another. Describe your thinking, if you'd like. Do not add others if they are not on the Wiki.<br />
<br />
=== In Favor of Move to Hackint ===<br />
* [[User:Kiska]]<br />
* [[User:wessel1512]]<br />
* [[User:Aoede]]<br />
* [[User:Fusl]]<br />
* [[User:Katocala]]<br />
* [[User:ivan]]<br />
* [[User:JAA]]<br />
* [[User:Flashfire42]]<br />
* [[User:Jake]]<br />
* [[User:Kaz]]<br />
* [[User:Maxfan8]]<br />
* [[User:Jrwr]]<br />
* [[User:Craigle]]<br />
* [[User:systwi]]<br />
* [[User:PurpleSymphony]]<br />
<br />
=== In Favor of Staying at EFNet ===</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=BBC_Mixital&diff=42989BBC Mixital2020-01-27T21:23:57Z<p>PurpleSymphony: Even more API.</p>
<hr />
<div>{{Infobox project<br />
| title = BBC Mixital<br />
| url = https://www.mixital.co.uk/<br />
| image = mixital.png<br />
| logo = mixital-logo.jpeg<br />
| project_status = {{closing}}<br />
| archiving_status = {{nosavedyet}}<br />
}}<br />
<br />
'''BBC Mixital''' was a site allowing users to create their own games, cartoons, music videos and stories. Often, but not always, these creations would be themed around a BBC television property (for example, the section for [https://www.mixital.co.uk/channel/doctor-who-fanfiction Dr Who fan-fiction]).<br />
<br />
In April 2019 the BBC [https://www.bbc.co.uk/blogs/internet/entries/dbb0e8f3-818a-47cf-875f-75025a3721cb announced] that Mixital was to be closed, citing "We’ve learned a number of valuable lessons that we’ll share internally and with partners, but we feel the time is right to take these lessons and explore how we might apply them in other ways online."<ref name="faqs">https://www.mixital.co.uk/faqs</ref><br />
<br />
From April 2019 the ability to publish new creations on the website has been removed. New users are also unable to create an account. The content will remain online until April 2020 when it "will no longer be publicly available".<ref name="faqs" /><br />
<br />
== Discovery and API ==<br />
<br />
In the shutdown announcement, the BBC stated that Mixital contains "more than one million ... creations".<ref>https://www.bbc.co.uk/blogs/internet/entries/dbb0e8f3-818a-47cf-875f-75025a3721cb</ref><br />
<br />
* Search: https://www.mixital.co.uk/api/search/results?term=&term=&sort=newest&channelId=0&perPage=12&yawfMod=1&over16=1&page=1<br />
* Content: https://www.mixital.co.uk/dmk/digitalmake/play/731<br />
* View count: https://www.mixital.co.uk/api/digitalmake/view/731<br />
* Comments: https://www.mixital.co.uk/digitalmake/731/comments?page=1<br />
* Member’s “makes”: https://www.mixital.co.uk/fetchdom/public-published-makes?member=29239&displayPublic=true&page=1&sort=desc<br />
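<br />
The endpoint URLs listed above can be generated programmatically. The following is a minimal Python sketch that only builds the URLs and makes no requests. The function names are made up for illustration, and the meaning of each query parameter (as well as whether IDs and page numbers can simply be enumerated) is an assumption that has not been verified against the site.<br />

```python
from urllib.parse import urlencode

BASE = "https://www.mixital.co.uk"


def search_url(page, per_page=12, sort="newest"):
    """Build a search API URL mirroring the example query listed above."""
    # "term" appears twice in the observed request, so it is repeated here.
    params = [
        ("term", ""), ("term", ""), ("sort", sort), ("channelId", 0),
        ("perPage", per_page), ("yawfMod", 1), ("over16", 1), ("page", page),
    ]
    return f"{BASE}/api/search/results?{urlencode(params)}"


def content_url(make_id):
    """URL of the embedded player for one creation ("make")."""
    return f"{BASE}/dmk/digitalmake/play/{make_id}"


def view_count_url(make_id):
    """URL that appears to return the view count of one creation."""
    return f"{BASE}/api/digitalmake/view/{make_id}"


def comments_url(make_id, page=1):
    """URL of one page of comments for a creation."""
    return f"{BASE}/digitalmake/{make_id}/comments?{urlencode({'page': page})}"


def member_makes_url(member_id, page=1):
    """URL of one page of a member's published creations."""
    params = {"member": member_id, "displayPublic": "true", "page": page, "sort": "desc"}
    return f"{BASE}/fetchdom/public-published-makes?{urlencode(params)}"


if __name__ == "__main__":
    # Print the first three search result pages as candidate seed URLs.
    for page in range(1, 4):
        print(search_url(page))
```

A list of such URLs could serve as a starting point for discovery; how many result pages exist is not known from this page.<br />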
<br />
== Range of Content ==<br />
<br />
The type of content on Mixital varies wildly. Some items, such as [https://www.mixital.co.uk/digitalmake/ryfwrfl8jw fanfiction] will likely be fairly simple to archive. Other content, such as the [https://www.mixital.co.uk/digitalmake/5iuk9kpm1v dancing robots] will likely require effort to reverse engineer the JavaScript and requests used to make the creation "work".<br />
<br />
== References ==<br />
<references/></div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=BBC_Mixital&diff=42988BBC Mixital2020-01-27T21:17:55Z<p>PurpleSymphony: More API endpoints.</p>
<hr />
<div>{{Infobox project<br />
| title = BBC Mixital<br />
| url = https://www.mixital.co.uk/<br />
| image = mixital.png<br />
| logo = mixital-logo.jpeg<br />
| project_status = {{closing}}<br />
| archiving_status = {{nosavedyet}}<br />
}}<br />
<br />
'''BBC Mixital''' was a site allowing users to create their own games, cartoons, music videos and stories. Often, but not always, these creations would be themed around a BBC television property (for example, the section for [https://www.mixital.co.uk/channel/doctor-who-fanfiction Dr Who fan-fiction]).<br />
<br />
In April 2019 the BBC [https://www.bbc.co.uk/blogs/internet/entries/dbb0e8f3-818a-47cf-875f-75025a3721cb announced] that Mixital was to be closed, citing "We’ve learned a number of valuable lessons that we’ll share internally and with partners, but we feel the time is right to take these lessons and explore how we might apply them in other ways online."<ref name="faqs">https://www.mixital.co.uk/faqs</ref><br />
<br />
From April 2019 the ability to publish new creations on the website has been removed. New users are also unable to create an account. The content will remain online until April 2020 when it "will no longer be publicly available".<ref name="faqs" /><br />
<br />
== Discovery and API ==<br />
<br />
In the shutdown announcement, the BBC stated that Mixital contains "more than one million ... creations".<ref>https://www.bbc.co.uk/blogs/internet/entries/dbb0e8f3-818a-47cf-875f-75025a3721cb</ref><br />
<br />
A likely avenue for discovery is the 'search' feature, which uses the following API, returning JSON results:<br />
<br />
https://www.mixital.co.uk/api/search/results?term=&term=&sort=newest&channelId=0&perPage=12&yawfMod=1&over16=1&page=1<br />
<br />
Content is embedded in an iFrame https://www.mixital.co.uk/dmk/digitalmake/play/731<br />
<br />
https://www.mixital.co.uk/api/digitalmake/view/731 seems to return the view count.<br />
<br />
== Range of Content ==<br />
<br />
The type of content on Mixital varies wildly. Some items, such as [https://www.mixital.co.uk/digitalmake/ryfwrfl8jw fanfiction] will likely be fairly simple to archive. Other content, such as the [https://www.mixital.co.uk/digitalmake/5iuk9kpm1v dancing robots] will likely require effort to reverse engineer the JavaScript and requests used to make the creation "work".<br />
<br />
== References ==<br />
<references/></div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=Chromebot&diff=42865Chromebot2019-12-30T18:53:34Z<p>PurpleSymphony: /* People */ add devops scripts</p>
<hr />
<div>'''chromebot''' aka. '''crocoite''' is an [[IRC]] bot parallel to [[ArchiveBot]] that uses Google Chrome and thus is able to archive JavaScript-heavy and bottomless websites. [[WARC]]s are uploaded twice a day to the [https://archive.org/details/archiveteam_chromebot?sort=-publicdate chromebot collection] on archive.org.<br />
<br />
By default the bot only grabs a single URL. However it supports recursion, which is rather slow, since every single page needs to be loaded and rendered by a browser. A [http://chromebot.6xq.net/ dashboard] is available for watching the progress of such jobs.<br />
<br />
== Usage ==<br />
[https://github.com/PromyLOPh/crocoite/blob/184189f0a535996edca01a68182ed07d32e26e9c/README.rst#IRC-bot crocoite usage documentation on GitHub]<br />
<br />
You can call ''chromebot'' on the {{IRC|archivebot}} IRC channel, which chromebot shares with [[ArchiveBot]]. Both “<code>chromebot</code>” and “<code>chromebot:</code>” work, with or without the colon.<br />
<br />
{| class="wikitable"<br />
|-<br />
! Command !! Description<br />
|- <br />
| style="white-space: nowrap" |<br />
<code>chromebot: a <url> -r <policy> -j <concurrency></code><br />
|| Archive ''<url>'' with ''<concurrency>'' processes according to recursion ''<policy>''.<br />
|-<br />
| <code>chromebot: s <uuid></code> || Get job status for ''<uuid>''.<br />
|-<br />
| <code>chromebot: r <uuid></code> || Revoke or abort running job with ''<uuid>''.<br />
|}<br />
<br />
Please note that the commands are case-sensitive.<br />
<br />
URL lists can be archived using recursion, for example:<br />
<br />
<code>chromebot: a https://transfer.notkiska.pw/inline/UpfR/HollyConrad-tweets -r 1 -j 4</code><br />
<br />
chromebot will assume all lines starting with http(s):// are valid links. Note that the list itself must be returned by the server as an *inline* document, not as a download (attachment).<br />
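<br />
Before uploading a URL list it can help to check which lines would count as links under the rule above. The following Python sketch is an illustration only and is not part of crocoite: <code>extract_links</code> and <code>archive_command</code> are hypothetical helpers, and stripping trailing whitespace before the prefix check is an assumption about how the bot treats each line.<br />

```python
def extract_links(text):
    """Return the lines of a URL list that start with http:// or https://."""
    links = []
    for line in text.splitlines():
        line = line.rstrip()  # assumption: trailing whitespace is irrelevant
        if line.startswith(("http://", "https://")):
            links.append(line)
    return links


def archive_command(url, policy=None, concurrency=None):
    """Format an archive command in the syntax shown in the table above."""
    parts = ["chromebot:", "a", url]
    if policy is not None:
        parts += ["-r", str(policy)]
    if concurrency is not None:
        parts += ["-j", str(concurrency)]
    return " ".join(parts)


if __name__ == "__main__":
    sample = "https://example.org/a\nnot a link\nhttp://example.org/b\n"
    print(extract_links(sample))
    print(archive_command("https://example.org/list.txt", policy=1, concurrency=4))
```

Lines that do not start with one of the two prefixes (comments, blank lines, bare hostnames) are dropped by this helper.<br />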
<br />
== Restrictions ==<br />
=== Instagram ===<br />
chromebot has been blacklisted by [[Instagram]]. When trying to archive any Instagram.com website, chromebot responds with the following error:<br />
''<Instagram.com URL> cannot be queued: Banned by Instagram''<br />
<br />
=== Cloudflare DDoS protection ===<br />
chromebot should be able to circumvent Cloudflare's DDoS protection, but scrolling and other behaviour may be disabled after the reload ([https://github.com/PromyLOPh/crocoite/issues/13 issue #13 on GitHub]).<br />
<br />
== People ==<br />
<br />
[[User:PurpleSymphony|PurpleSym]] maintains [https://github.com/PromyLOPh/crocoite software], [https://github.com/PromyLOPh/chromebot scripts], pays the server bills and has administrative access. katocala is a server administrator.<br />
<br />
[[Category:Bots]]</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=Chromebot&diff=42864Chromebot2019-12-29T10:45:44Z<p>PurpleSymphony: Reflect changes due to server move</p>
<hr />
<div>'''chromebot''' aka. '''crocoite''' is an [[IRC]] bot parallel to [[ArchiveBot]] that uses Google Chrome and thus is able to archive JavaScript-heavy and bottomless websites. [[WARC]]s are uploaded twice a day to the [https://archive.org/details/archiveteam_chromebot?sort=-publicdate chromebot collection] on archive.org.<br />
<br />
By default the bot only grabs a single URL. However it supports recursion, which is rather slow, since every single page needs to be loaded and rendered by a browser. A [http://chromebot.6xq.net/ dashboard] is available for watching the progress of such jobs.<br />
<br />
== Usage ==<br />
[https://github.com/PromyLOPh/crocoite/blob/184189f0a535996edca01a68182ed07d32e26e9c/README.rst#IRC-bot crocoite usage documentation on GitHub]<br />
<br />
You can call ''chromebot'' on the {{IRC|archivebot}} IRC channel, which chromebot shares with [[ArchiveBot]]. Both “<code>chromebot</code>” and “<code>chromebot:</code>” work, with or without the colon.<br />
<br />
{| class="wikitable"<br />
|-<br />
! Command !! Description<br />
|- <br />
| style="white-space: nowrap" |<br />
<code>chromebot: a <url> -r <policy> -j <concurrency></code><br />
|| Archive ''<url>'' with ''<concurrency>'' processes according to recursion ''<policy>''.<br />
|-<br />
| <code>chromebot: s <uuid></code> || Get job status for ''<uuid>''.<br />
|-<br />
| <code>chromebot: r <uuid></code> || Revoke or abort running job with ''<uuid>''.<br />
|}<br />
<br />
Please note that the commands are case-sensitive.<br />
<br />
URL lists can be archived using recursion, for example:<br />
<br />
<code>chromebot: a https://transfer.notkiska.pw/inline/UpfR/HollyConrad-tweets -r 1 -j 4</code><br />
<br />
chromebot will assume all lines starting with http(s):// are valid links. Note that the list itself must be returned by the server as an *inline* document, not as a download (attachment).<br />
<br />
== Restrictions ==<br />
=== Instagram ===<br />
chromebot has been blacklisted by [[Instagram]]. When trying to archive any Instagram.com website, chromebot responds with the following error:<br />
''<Instagram.com URL> cannot be queued: Banned by Instagram''<br />
<br />
=== Cloudflare DDoS protection ===<br />
chromebot should be able to circumvent Cloudflare's DDoS protection, but scrolling and other behaviour may be disabled after the reload ([https://github.com/PromyLOPh/crocoite/issues/13 issue #13 on GitHub]).<br />
<br />
== People ==<br />
<br />
[[User:PurpleSymphony|PurpleSym]] maintains [https://github.com/PromyLOPh/crocoite the software], pays the server bills and has administrative access. katocala is a server administrator.<br />
<br />
[[Category:Bots]]</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=ArchiveBot/Governments/Oman/list&diff=38849ArchiveBot/Governments/Oman/list2019-05-23T11:45:39Z<p>PurpleSymphony: Add social media until mara.gov.om</p>
<hr />
<div>http://baitalbaranda.mm.gov.om/<br />
https://www.cbo.gov.om/<br />
https://cert.gov.om/<br />
https://www.cma.gov.om/<br />
http://www.dm.gov.om/<br />
https://eservices.housing.gov.om/<br />
https://www.fiu.gov.om/<br />
https://home.trc.gov.om/<br />
http://www.ita.gov.om/<br />
https://www.manpower.gov.om/<br />
https://www.mara.gov.om/<br />
http://www.mctmnet.gov.om/<br />
https://meca.gov.om/en/<br />
http://www.mhc.gov.om/<br />
http://www.mm.gov.om/<br />
http://www.mocioman.gov.om/<br />
http://www.mocs.gov.om/<br />
http://www.mod.gov.om/<br />
http://www.moe.gov.om/<br />
http://www.mof.gov.om/<br />
https://www.mofa.gov.om/<br />
http://www.mofw.gov.om/<br />
https://www.moh.gov.om/<br />
http://www.mohe.gov.om/<br />
http://www.moi.gov.om/<br />
https://www.moj.gov.om/<br />
http://www.mola.gov.om/<br />
https://mosa.gov.om/<br />
https://www.mosd.gov.om/<br />
https://www.motc.gov.om/<br />
https://www.mrmwr.gov.om/<br />
http://www.msm.gov.om/<br />
https://www.ncsi.gov.om/<br />
https://www.omantourism.gov.om/<br />
https://www.paca.gov.om/<br />
http://part.gov.om/<br />
http://www.rop.gov.om/<br />
http://www.sai.gov.om/<br />
http://www.shinas.gov.om/<br />
https://shura.om/<br />
https://twitter.com/OmanCERT<br />
https://www.facebook.com/cma.om<br />
https://twitter.com/cmaoman<br />
https://www.instagram.com/cmaoman/<br />
https://twitter.com/DhofarMun<br />
https://www.facebook.com/DhofarMunicipality<br />
https://www.youtube.com/user/DhofarMunicipality<br />
https://www.instagram.com/Dhofar_Municipality/<br />
https://twitter.com/housingomaninfo<br />
https://twitter.com/housingoman<br />
https://www.facebook.com/housingoman<br />
https://www.instagram.com/housingoman/<br />
https://www.youtube.com/user/housingoman<br />
https://www.youtube.com/user/eOmanita<br />
https://www.instagram.com/eoman_ita/<br />
https://twitter.com/eOman_ITA<br />
https://www.facebook.com/eOman.ita<br />
https://www.facebook.com/manpowergov.om<br />
https://twitter.com/manpowergov<br />
https://www.instagram.com/manpowergov/<br />
https://www.youtube.com/user/Omanlabour<br />
https://www.facebook.com/oman.mera<br />
https://www.instagram.com/meraoman/<br />
https://twitter.com/meraoman<br />
https://www.youtube.com/user/meraoman</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=Chromebot&diff=38773Chromebot2019-05-22T12:58:36Z<p>PurpleSymphony: /* Usage */ URL lists</p>
<hr />
<div>'''chromebot''' aka. '''crocoite''' is an [[IRC]] bot parallel to [[ArchiveBot]] that uses Google Chrome and thus is able to archive JavaScript-heavy and bottomless websites. Both the [https://github.com/PromyLOPh/crocoite software] and the bot are maintained by [[User:PurpleSymphony]]. [[WARC]]s are uploaded daily to the [https://archive.org/details/archiveteam_chromebot?sort=-publicdate chromebot collection] on archive.org.<br />
<br />
By default the bot only grabs a single URL. However it supports recursion, which is rather slow, since every single page needs to be loaded and rendered by a browser. A [https://6xq.net/chromebot/ dashboard] is available for watching the progress of such jobs.<br />
<br />
== Usage ==<br />
[https://github.com/PromyLOPh/crocoite/blob/184189f0a535996edca01a68182ed07d32e26e9c/README.rst#IRC-bot crocoite usage documentation on GitHub]<br />
<br />
You can call ''chromebot'' on the {{IRC|archivebot}} IRC channel, which chromebot shares with [[ArchiveBot]]. Both “<code>chromebot</code>” and “<code>chromebot:</code>” work, with or without the colon.<br />
<br />
{| class="wikitable"<br />
|-<br />
! Command !! Description<br />
|- <br />
| style="white-space: nowrap" |<br />
<code>chromebot: a <url> -r <policy> -j <concurrency></code><br />
|| Archive ''<url>'' with ''<concurrency>'' processes according to recursion ''<policy>''.<br />
|-<br />
| <code>chromebot: s <uuid></code> || Get job status for ''<uuid>''.<br />
|-<br />
| <code>chromebot: r <uuid></code> || Revoke or abort running job with ''<uuid>''.<br />
|}<br />
<br />
Please note that the commands are case-sensitive.<br />
<br />
URL lists can be archived using recursion, for example:<br />
<br />
<code>chromebot: a https://transfer.notkiska.pw/inline/UpfR/HollyConrad-tweets -r 1 -j 4</code><br />
<br />
chromebot will assume all lines starting with http(s):// are valid links. Note that the list itself must be returned by the server as an *inline* document, not as a download (attachment).<br />
<br />
== Restrictions ==<br />
=== Instagram ===<br />
chromebot has been blacklisted by [[Instagram]]. When trying to archive any Instagram.com website, chromebot responds with the following error:<br />
''<Instagram.com URL> cannot be queued: Banned by Instagram''<br />
<br />
=== Cloudflare DDoS protection ===<br />
chromebot should be able to circumvent Cloudflare's DDoS protection, but scrolling and other behaviour may be disabled after the reload ([https://github.com/PromyLOPh/crocoite/issues/13 issue #13 on GitHub]).<br />
<br />
[[Category:Bots]]</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=Talk:Chromebot&diff=37946Talk:Chromebot2019-05-08T14:35:29Z<p>PurpleSymphony: /* Handling multiple infinite-scroll boxes? */</p>
<hr />
<div>== What exactly happens when ChromeBot tries to access Instagram's website? ==<br />
<br />
Does [[Chromebot#Instagram.com|Instagram]] just respond with a blank page, a 403 error, 404 error or something else? --[[User:ATrescue|ATrescue]] ([[User talk:ATrescue|talk]]) 21:52, 26 April 2019 (UTC)<br />
<br />
== How well does it handle [http://m.Twitter.com/ Twitter Lite]? ==<br />
<br />
While Twitter's original desktop website still relies heavily on HTML source code, Twitter's Mobile page is a '''“Web ''App''”''', powered by [[Wikipedia:AJAX|AJAX]]. In addition, it causes serious compatibility problems with older versions of browsers (but Twitter redirects them to “Mobile Web (M2)”, their legacy mobile website, anyway).<br />The advantage of the AJAX-powered web-app is that it allows for smoother browsing because, thanks to AJAX, there is no need to reload the entire webpage. But the '''initial''' load obviously takes longer, because more information has to be downloaded into RAM (if not already in the browser cache).<br />
<br />
The downside of AJAX is obvious, especially for YouTube comments. Starting circa 2013, those no longer loaded '''within''' the page itself (included in the HTML source code). See [[YouTube#Comment loading]] for more information. AJAX has been a death sentence for the Wayback Machine, on other websites as well.<br />
<br />Archive.is has been partially able to handle AJAX content, but lost its ability to capture YouTube comments in late 2017 (except for directly linked comments).<br />
<br />
<u><big>But now, there is our mighty '''ChromeBot'''. Thankfully.</big></u><br />
<br />
It is not very likely for Twitter to replace their legacy website (also known as “Twitter Web Client” in tweet source tags) with their new “App” style website (“Twitter Web App”, formerly “Twitter Lite”), but in case it actually happens, or in case it becomes the default and only users who are logged in are able to opt out, '''is ChromeBot prepared?''' …and will it support infinite scroll there too? <br />
<br />
It would be good if Twitter still gives users the choice about which platform to use. If Twitter enforced their AJAX-powered website onto all users, '''''[[ArchiveBot]],''''' (which is more mature and more suited for mass archivals of larger pages rather than [[ChromeBot]] for modern, JS-heavy pages), could be incapacitated.<br />
<br />
––[[User:ATrescue|ATrescue]] ([[User talk:ATrescue|talk]]) 19:08, 30 April 2019 (UTC).<br />
<br />
== JS-Pagination ==<br />
<br />
Some websites that have multiple pages (e.g. Google Desktop website search results) work via URL's that can be put into a list and then fed into [[ArchiveBot]].<br />
<br />
Some websites already load all of their pages into RAM (via the page source code) and access them via offline JavaScript; see the language tabs of {{URL|1=https://www.smart-projects.net/isobuster.php|2='''this''' site.}} These pages are entirely accessible from [[Wayback]] captures and when saved offline.<br />
<br />
Some other websites (e.g. [[YouTube#Comment loading|YouTube]] comments and video lists in 2012, prior to bottomless infinite scrolling) had pages that could not be accessed via URL (but YouTube had /all_comments?v= back then, which supported pages).<br />
<br />
We need to find a way to archive website content in an automated way (manually via [[WARC]] recording is already possible) with content that can only be accessed via clicking (e.g. comment pages that get accessed via AJAX instead of URL). --[[User:ATrescue|ATrescue]] ([[User talk:ATrescue|talk]]) 19:39, 30 April 2019 (UTC)<br />
<br />
: chromebot clicks JS links on some pages, see [https://github.com/PromyLOPh/crocoite/blob/master/crocoite/data/click.yaml] --[[User:PurpleSymphony|PurpleSymphony]] ([[User talk:PurpleSymphony|talk]]) 14:31, 8 May 2019 (UTC)<br />
<br />
== “bajop-” job ID's? New naming system? ==<br />
<br />
* Yesterday (20190506), all job ID's started with “<code>bajop-muton-</code>”.<br />
* Today (20190507), all job ID's start with “<code>bajop-nanap-</code>”.<br />
<br />
Earlier, job ID's had just random ID's.<br />
<br />
Is there a technical explanation for the new job ID's? Is there a new naming system? --[[User:ATrescue|ATrescue]] ([[User talk:ATrescue|talk]]) 12:39, 7 May 2019 (UTC)<br />
<br />
=== Vowels and consonants ===<br />
Job ID's no longer contain numbers.<br />
<br />
Example: In the Job ID “b<u>a</u>j<u>o</u>p-n<u>a</u>n<u>a</u>p-r<u>a</u>n<u>u</u>v-v<u>u</u>k<u>a</u>b” (archival of https://twitter.com/search?q=SonySketch ), letters 2 and 4 of each group of 5 characters are vowels, the other 3 are consonants. Coincidence or deliberate? --[[User:ATrescue|ATrescue]] ([[User talk:ATrescue|talk]]) 12:49, 7 May 2019 (UTC)<br />
<br />
: Yes, new and yes, deliberate, see commit [https://github.com/PromyLOPh/crocoite/commit/0299acfb6edf7d54ed112834a2b639567f782ab4] --[[User:PurpleSymphony|PurpleSymphony]] ([[User talk:PurpleSymphony|talk]]) 14:33, 8 May 2019 (UTC)<br />
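<br />
: The shape described in this thread (hyphen-separated groups of five lowercase letters alternating consonant, vowel, consonant, vowel, consonant) can be checked with a short Python sketch. It only tests the observed pattern; it is not the generator from the linked commit, and the exact alphabet used there is not assumed here. <code>looks_like_job_id</code> is a made-up name.<br />

```python
import re

# Any lowercase letter that is not a, e, i, o or u counts as a consonant here.
CONSONANT = "[b-df-hj-np-tv-z]"
VOWEL = "[aeiou]"
GROUP = f"{CONSONANT}{VOWEL}{CONSONANT}{VOWEL}{CONSONANT}"
JOB_ID = re.compile(f"{GROUP}(?:-{GROUP})*")


def looks_like_job_id(text):
    """Return True if text consists only of hyphen-separated CVCVC groups."""
    return JOB_ID.fullmatch(text) is not None


if __name__ == "__main__":
    print(looks_like_job_id("bajop-nanap-ranuv-vukab"))
    print(looks_like_job_id("1234-abcd"))
```

: The first call matches the example ID quoted above; the second contains digits and does not.<br />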
<br />
== Handling multiple infinite-scroll boxes? ==<br />
<br />
If the page has multiple embedded infinite-scroll parts in {{W2+|iframe|iFrames,}} does ChromeBot also infinite-scroll crawl them or only the main page? --[[User:ATrescue|ATrescue]] ([[User talk:ATrescue|talk]]) 14:05, 8 May 2019 (UTC)<br />
<br />
: Yes, it scrolls all elements, including frames, see [https://github.com/PromyLOPh/crocoite/blob/master/crocoite/data/scroll.js#L26] --[[User:PurpleSymphony|PurpleSymphony]] ([[User talk:PurpleSymphony|talk]]) 14:35, 8 May 2019 (UTC)</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=Talk:Chromebot&diff=37944Talk:Chromebot2019-05-08T14:33:37Z<p>PurpleSymphony: /* “bajop-” job ID's? New naming system? */</p>
<hr />
<div>== What exactly happens when ChromeBot tries to access Instagram's website? ==<br />
<br />
Does [[Chromebot#Instagram.com|Instagram]] just respond with a blank page, a 403 error, 404 error or something else? --[[User:ATrescue|ATrescue]] ([[User talk:ATrescue|talk]]) 21:52, 26 April 2019 (UTC)<br />
<br />
== How well does it handle [http://m.Twitter.com/ Twitter Lite]? ==<br />
<br />
While Twitter's original desktop website still relies heavily on HTML source code, Twitter's Mobile page is a '''“Web ''App''”''', powered by [[Wikipedia:AJAX|AJAX]]. In addition, it causes serious compatibility problems with older versions of browsers (but Twitter redirects them to “Mobile Web (M2)”, their legacy mobile website, anyway).<br />The advantage of the AJAX-powered web-app is that it allows for smoother browsing because, thanks to AJAX, there is no need to reload the entire webpage. But the '''initial''' load obviously takes longer, because more information has to be downloaded into RAM (if not already in the browser cache).<br />
<br />
The downside of AJAX is obvious, especially for YouTube comments. Starting circa 2013, those no longer loaded '''within''' the page itself (included in the HTML source code). See [[YouTube#Comment loading]] for more information. AJAX has been a death sentence for the Wayback Machine, on other websites as well.<br />
<br />Archive.is has been partially able to handle AJAX content, but lost its ability to capture YouTube comments in late 2017 (except for directly linked comments).<br />
<br />
<u><big>But now, there is our mighty '''ChromeBot'''. Thankfully.</big></u><br />
<br />
It is not very likely for Twitter to replace their legacy website (also known as “Twitter Web Client” in tweet source tags) with their new “App” style website (“Twitter Web App”, formerly “Twitter Lite”), but in case it actually happens, or in case it becomes the default and only users who are logged in are able to opt out, '''is ChromeBot prepared?''' …and will it support infinite scroll there too? <br />
<br />
It would be good if Twitter still gives users the choice about which platform to use. If Twitter enforced their AJAX-powered website onto all users, '''''[[ArchiveBot]],''''' (which is more mature and more suited for mass archivals of larger pages rather than [[ChromeBot]] for modern, JS-heavy pages), could be incapacitated.<br />
<br />
––[[User:ATrescue|ATrescue]] ([[User talk:ATrescue|talk]]) 19:08, 30 April 2019 (UTC).<br />
<br />
== JS-Pagination ==<br />
<br />
Some websites that have multiple pages (e.g. Google Desktop website search results) work via URL's that can be put into a list and then fed into [[ArchiveBot]].<br />
<br />
Some websites already load all of their pages into RAM (via the page source code) and access them via offline JavaScript; see the language tabs of {{URL|1=https://www.smart-projects.net/isobuster.php|2='''this''' site.}} These pages are entirely accessible from [[Wayback]] captures and when saved offline.<br />
<br />
Some other websites (e.g. [[YouTube#Comment loading|YouTube]] comments and video lists in 2012, prior to bottomless infinite scrolling) had pages that could not be accessed via URL (but YouTube had /all_comments?v= back then, which supported pages).<br />
<br />
We need to find a way to archive website content in an automated way (manually via [[WARC]] recording is already possible) with content that can only be accessed via clicking (e.g. comment pages that get accessed via AJAX instead of URL). --[[User:ATrescue|ATrescue]] ([[User talk:ATrescue|talk]]) 19:39, 30 April 2019 (UTC)<br />
<br />
: chromebot clicks JS links on some pages, see [https://github.com/PromyLOPh/crocoite/blob/master/crocoite/data/click.yaml] --[[User:PurpleSymphony|PurpleSymphony]] ([[User talk:PurpleSymphony|talk]]) 14:31, 8 May 2019 (UTC)<br />
<br />
== “bajop-” job ID's? New naming system? ==<br />
<br />
* Yesterday (20190506), all job ID's started with “<code>bajop-muton-</code>”.<br />
* Today (20190507), all job ID's start with “<code>bajop-nanap-</code>”.<br />
<br />
Earlier, job ID's had just random ID's.<br />
<br />
Is there a technical explanation for the new job ID's? Is there a new naming system? --[[User:ATrescue|ATrescue]] ([[User talk:ATrescue|talk]]) 12:39, 7 May 2019 (UTC)<br />
<br />
=== Vowels and consonants ===<br />
Job ID's no longer contain numbers.<br />
<br />
Example: In the Job ID “b<u>a</u>j<u>o</u>p-n<u>a</u>n<u>a</u>p-r<u>a</u>n<u>u</u>v-v<u>u</u>k<u>a</u>b” (archival of https://twitter.com/search?q=SonySketch ), letters 2 and 4 of each group of 5 characters are vowels, the other 3 are consonants. Coincidence or deliberate? --[[User:ATrescue|ATrescue]] ([[User talk:ATrescue|talk]]) 12:49, 7 May 2019 (UTC)<br />
<br />
: Yes, new and yes, deliberate, see commit [https://github.com/PromyLOPh/crocoite/commit/0299acfb6edf7d54ed112834a2b639567f782ab4] --[[User:PurpleSymphony|PurpleSymphony]] ([[User talk:PurpleSymphony|talk]]) 14:33, 8 May 2019 (UTC)<br />
<br />
== Handling multiple infinite-scroll boxes? ==<br />
<br />
If the page has multiple embedded infinite-scroll parts in {{W2+|iframe|iFrames,}} does ChromeBot also infinite-scroll crawl them or only the main page? --[[User:ATrescue|ATrescue]] ([[User talk:ATrescue|talk]]) 14:05, 8 May 2019 (UTC)</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=Talk:Chromebot&diff=37943Talk:Chromebot2019-05-08T14:31:58Z<p>PurpleSymphony: /* JS-Pagination */ Answer</p>
<hr />
<div>== What exactly happens when ChromeBot tries to access Instagram's website? ==<br />
<br />
Does [[Chromebot#Instagram.com|Instagram]] just respond with a blank page, a 403 error, 404 error or something else? --[[User:ATrescue|ATrescue]] ([[User talk:ATrescue|talk]]) 21:52, 26 April 2019 (UTC)<br />
<br />
== How well does it handle [http://m.Twitter.com/ Twitter Lite]? ==<br />
<br />
While Twitter's original desktop website still relies heavily on HTML source code, Twitter's Mobile page is a '''“Web ''App''”''', powered by [[Wikipedia:AJAX|AJAX]]. In addition, it causes serious compatibility problems with older versions of browsers (but Twitter redirects them to “Mobile Web (M2)”, their legacy mobile website, anyway).<br />The advantage of the AJAX-powered web-app is that it allows for smoother browsing because, thanks to AJAX, there is no need to reload the entire webpage. But the '''initial''' load obviously takes longer, because more information has to be downloaded into RAM (if not already in the browser cache).<br />
<br />
The downside of AJAX is obvious, especially for YouTube comments. Starting circa 2013, those no longer loaded '''within''' the page itself (included in the HTML source code). See [[YouTube#Comment loading]] for more information. AJAX has been a death sentence for Wayback Machine captures, on other websites as well.<br />
<br />Archive.is has partially been able to handle AJAX content, but lost its ability to capture YouTube comments in late 2017 (except for directly linked comments).<br />
<br />
<u><big>But now, there is our mighty '''ChromeBot'''. Thankfully.</big></u><br />
<br />
It is not very likely for Twitter to replace their legacy website (also known as “Twitter Web Client” in tweet source tags) with their new “App” style website (“Twitter Web App”, formerly “Twitter Lite”), but in case it actually happens, or in case it becomes the default and only users who are logged in are able to opt out, '''is ChromeBot prepared?''' …and will it support infinite scroll there too? <br />
<br />
It would be good if Twitter still gave users the choice of which platform to use. If Twitter forced their AJAX-powered website onto all users, '''''[[ArchiveBot]]''''' (which is more mature and better suited to mass archival of larger sites, whereas [[ChromeBot]] targets modern, JS-heavy pages) could be incapacitated.<br />
<br />
––[[User:ATrescue|ATrescue]] ([[User talk:ATrescue|talk]]) 19:08, 30 April 2019 (UTC).<br />
<br />
== JS-Pagination ==<br />
<br />
Some websites that have multiple pages (e.g. Google Desktop website search results) work via URLs that can be put into a list and then fed into [[ArchiveBot]].<br />
<br />
Some websites already load the multiple pages into RAM (via the page source code) and access them via offline JavaScript; see the language tabs of {{URL|1=https://www.smart-projects.net/isobuster.php|2='''this''' site.}} These pages are entirely accessible from [[Wayback]] captures and when saved offline.<br />
<br />
Some other websites (e.g. [[YouTube#Comment loading|YouTube]] comments and video lists in 2012, prior to bottomless infinite scrolling) had pages that could not be accessed via URL (although YouTube had /all_comments?v= back then, which supported pages).<br />
<br />
We need to find a way to archive website content in an automated way (manually via [[WARC]] recording is already possible) with content that can only be accessed via clicking (e.g. comment pages that get accessed via AJAX instead of URL). --[[User:ATrescue|ATrescue]] ([[User talk:ATrescue|talk]]) 19:39, 30 April 2019 (UTC)<br />
<br />
: chromebot clicks JS links on some pages, see [https://github.com/PromyLOPh/crocoite/blob/master/crocoite/data/click.yaml] --[[User:PurpleSymphony|PurpleSymphony]] ([[User talk:PurpleSymphony|talk]]) 14:31, 8 May 2019 (UTC)<br />
<br />
== “bajop-” job ID's? New naming system? ==<br />
<br />
* Yesterday (20190506), all job IDs started with “<code>bajop-muton-</code>”.<br />
* Today (20190507), all job IDs start with “<code>bajop-nanap-</code>”.<br />
<br />
Earlier, job IDs were just random.<br />
<br />
Is there a technical explanation for the new job IDs? Is there a new naming system? --[[User:ATrescue|ATrescue]] ([[User talk:ATrescue|talk]]) 12:39, 7 May 2019 (UTC)<br />
<br />
=== Vowels and consonants ===<br />
Job IDs no longer contain numbers.<br />
<br />
Example: In the Job ID “b<u>a</u>j<u>o</u>p-n<u>a</u>n<u>a</u>p-r<u>a</u>n<u>u</u>v-v<u>u</u>k<u>a</u>b” (archival of https://twitter.com/search?q=SonySketch ), letters 2 and 4 of every group of 5 characters are vowels, and the other 3 are consonants. Coincidence or deliberate? --[[User:ATrescue|ATrescue]] ([[User talk:ATrescue|talk]]) 12:49, 7 May 2019 (UTC)<br />
<br />
== Handling multiple infinite-scroll boxes? ==<br />
<br />
If the page has multiple embedded infinite-scroll parts in {{W2+|iframe|iFrames,}} does ChromeBot also infinite-scroll crawl them or only the main page? --[[User:ATrescue|ATrescue]] ([[User talk:ATrescue|talk]]) 14:05, 8 May 2019 (UTC)</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=Sketch&diff=37363Sketch2019-04-27T17:26:58Z<p>PurpleSymphony: Screenshot</p>
<hr />
<div>{{Infobox project<br />
| title = Sketch<br />
| image = Sketch_2019-04-27.png<br />
| description = Explore Sketch<br />
| URL = https://sketch.sonymobile.com<br />
| project_status = {{online}}<br />
| irc = archiveteam-bs<br />
}}<br />
<br />
'''Sketch''' is an image drawing and editing application by Sony. The online parts of Sketch will be discontinued on 2019-09-30.<ref>[https://sketch.sonymobile.com/eol EOL notice]</ref><br />
<br />
== API ==<br />
<br />
The following endpoints can be used to list sketches:<br />
<br />
* https://sketch.sonymobile.com/api/1/feed/featured/list/default/<br />
* https://sketch.sonymobile.com/api/1/feed/trending/list/default/<br />
* https://sketch.sonymobile.com/api/1/feed/popular/list/default/<br />
* https://sketch.sonymobile.com/api/1/feed/global/list/default/<br />
<br />
The last one seems to list all (23 million) available sketches.<br />
<br />
* User search (limited to 50 results, no pagination): https://sketch.sonymobile.com/api/1/search/artist/<term><br />
* Tag search: https://sketch.sonymobile.com/api/1/search/tag/<term><br />
* Tags: https://sketch.sonymobile.com/api/1/feed/hashtag/cat/list/<br />
* Single sketch: https://sketch.sonymobile.com/api/1/sharedsketch/be39ea26-ebf6-4dfb-84d5-d9e122d3191e<br />
* Artist (can include multiple uuids, separated by ,): https://sketch.sonymobile.com/api/1/artist/d51e31f9-6aa1-474b-9e0e-3357e9bc2b9c<br />
* Artist’s pictures: https://sketch.sonymobile.com/api/1/feed/artist/92b9afe4-3d4c-48d0-839d-a9fedba8a38a<br />
* Comments: https://sketch.sonymobile.com/api/1/comments/sketch/92b9afe4-3d4c-48d0-839d-a9fedba8a38a/be39ea26-ebf6-4dfb-84d5-d9e122d3191e<br />
* Single image file, redirects to temporary S3 URL: https://storage.sketch.sonymobile.com/feed/6362d4d6-b91a-4603-8062-d59f3629e11d/image<br />
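<br />
As a rough illustration of how the endpoints listed above fit together, here is a small URL-building sketch. The helper names are hypothetical, nothing here contacts the server, and reading the first UUID of the comments path as the artist and the second as the sketch is an assumption based on the example URLs.<br />

```python
from urllib.parse import quote

API_BASE = "https://sketch.sonymobile.com/api/1"


def feed_url(name):
    """Listing feed, e.g. 'featured', 'trending', 'popular' or 'global'."""
    return f"{API_BASE}/feed/{name}/list/default/"


def artist_search_url(term):
    """User search; per the notes above, limited to 50 results."""
    return f"{API_BASE}/search/artist/{quote(term, safe='')}"


def tag_search_url(term):
    return f"{API_BASE}/search/tag/{quote(term, safe='')}"


def artists_url(uuids):
    """Artist lookup; multiple UUIDs are joined with commas."""
    return f"{API_BASE}/artist/{','.join(uuids)}"


def comments_url(artist_uuid, sketch_uuid):
    # Assumption: path order is artist UUID, then sketch UUID.
    return f"{API_BASE}/comments/sketch/{artist_uuid}/{sketch_uuid}"


if __name__ == "__main__":
    print(feed_url("global"))
    print(artist_search_url("some artist"))
```

The resulting URLs could then be written to a list file, one per line, for whatever crawler is used.<br />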
<br />
<br />
== References ==<br />
<br />
<references/></div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=File:Sketch_2019-04-27.png&diff=37362File:Sketch 2019-04-27.png2019-04-27T17:25:44Z<p>PurpleSymphony: sketch.sonymobile.com explore page</p>
<hr />
<div>== Summary ==<br />
sketch.sonymobile.com explore page</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=Sketch&diff=37361Sketch2019-04-27T17:23:44Z<p>PurpleSymphony: Create, API info</p>
<hr />
<div>{{Infobox project<br />
| title = Sketch<br />
| image = <br />
| description = <br />
| URL = https://sketch.sonymobile.com<br />
| project_status = {{online}}<br />
| irc = archiveteam-bs<br />
}}<br />
<br />
'''Sketch''' is an image drawing and editing application by Sony. The online parts of Sketch will be discontinued on 2019-09-30.<ref>[https://sketch.sonymobile.com/eol EOL notice]</ref><br />
<br />
== API ==<br />
<br />
The following endpoints can be used to list sketches:<br />
<br />
* https://sketch.sonymobile.com/api/1/feed/featured/list/default/<br />
* https://sketch.sonymobile.com/api/1/feed/trending/list/default/<br />
* https://sketch.sonymobile.com/api/1/feed/popular/list/default/<br />
* https://sketch.sonymobile.com/api/1/feed/global/list/default/<br />
<br />
The last one seems to list all (23 million) available sketches.<br />
<br />
* User search (limited to 50 results, no pagination): https://sketch.sonymobile.com/api/1/search/artist/<term><br />
* Tag search: https://sketch.sonymobile.com/api/1/search/tag/<term><br />
* Tags: https://sketch.sonymobile.com/api/1/feed/hashtag/cat/list/<br />
* Single sketch: https://sketch.sonymobile.com/api/1/sharedsketch/be39ea26-ebf6-4dfb-84d5-d9e122d3191e<br />
* Artist (can include multiple uuids, separated by ,): https://sketch.sonymobile.com/api/1/artist/d51e31f9-6aa1-474b-9e0e-3357e9bc2b9c<br />
* Artist’s pictures: https://sketch.sonymobile.com/api/1/feed/artist/92b9afe4-3d4c-48d0-839d-a9fedba8a38a<br />
* Comments: https://sketch.sonymobile.com/api/1/comments/sketch/92b9afe4-3d4c-48d0-839d-a9fedba8a38a/be39ea26-ebf6-4dfb-84d5-d9e122d3191e<br />
* Single image file, redirects to temporary S3 URL: https://storage.sketch.sonymobile.com/feed/6362d4d6-b91a-4603-8062-d59f3629e11d/image<br />
<br />
<br />
== References ==<br />
<br />
<references/></div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=ArchiveBot&diff=36294ArchiveBot2019-04-11T08:43:26Z<p>PurpleSymphony: /* Caveats */ low memory recovery</p>
<hr />
<div>[[File:Librarianmotoko.jpg|200px|right|thumb|Imagine Motoko Kusanagi as an archivist.]]<br />
<br />
'''ArchiveBot''' is an [[IRC]] bot designed to automate the archival of smaller websites (e.g. up to a few hundred thousand URLs). You give it a URL to start at, and it grabs all content under that URL, [[Wget_with_WARC_output|records it in a WARC]] file, and then uploads that WARC to ArchiveTeam servers for eventual injection into the [https://archive.org/details/archivebot Internet Archive]'s Wayback Machine (or other archive sites).<br />
<br />
== Details ==<br />
<br />
To use ArchiveBot, drop by the IRC channel [http://chat.efnet.org:9090/?nick=&channels=%23archivebot&Login=Login '''#archivebot'''] on EFNet. To interact with ArchiveBot, you issue [https://archivebot.readthedocs.io/en/latest/commands.html '''commands'''] by typing them into the channel. Note that you will need channel operator (<code>@</code>) or voice (<code>+</code>) permissions in order to issue archiving jobs; please ask for assistance or leave a message describing the website you want to archive.<br />
<br />
The [http://dashboard.at.ninjawedding.org/3 '''ArchiveBot dashboard'''] publicly shows the sites being currently downloaded. The [http://archivebot.at.ninjawedding.org:4567/pipelines pipeline monitor station] shows the status of deployed instances of crawlers. The [http://archive.fart.website/archivebot/viewer/ viewer] assists in browsing and searching archives.<br />
<br />
You can also follow [https://twitter.com/ArchiveBot @ArchiveBot]<ref>Formerly known as [https://twitter.com/ATArchiveBot @ATArchiveBot]</ref> on [[Twitter]], although its tweets may slightly lag behind the current status of the bot. The bot has not tweeted since 12 April 2018.<br />
<br />
== Components ==<br />
<br />
IRC interface<br />
:The bot listens for commands in the IRC channel and then reports back status on the IRC channel. You can ask it to archive a whole website or single webpage, check whether the URL has been saved, change the delay time between requests, or add some ignore rules to avoid crawling certain web cruft. This IRC interface is collaborative, meaning anyone with permission can adjust the parameters of jobs. Note that the bot isn't a chat bot, so it will ignore you if it doesn't understand a command.<br />
<br />
Dashboard<br />
:The [http://dashboard.at.ninjawedding.org/3 '''ArchiveBot dashboard'''] is a web-based front-end displaying the URLs being downloaded by the various web crawls. Each URL line in the dashboard is categorized by its HTTP code into successes, warnings, and errors. Warnings and errors are highlighted in yellow or red. The dashboard also provides RSS feeds.<br />
<br />
Backend<br />
:The backend contains the database of all jobs and several maintenance tasks such as trimming logs and posting Tweets on Twitter. The backend is the centralized portion of ArchiveBot.<br />
<br />
Crawler<br />
:The crawler will download and spider the website into WARC files. The crawler is the distributed portion of ArchiveBot. Volunteers run pipeline nodes connected to the backend. The backend will tell the nodes/pipelines what jobs to run. Once the crawl job has finished, the pipeline reports back to the backend and uploads the WARC files to the staging server. This process is handled by a supervisor script called a pipeline.<br />
<br />
Staging server<br />
:The staging server, known as [[Fortress of Solitude|Fortress of Solitude (FOS)]], is the place where all the WARC files are temporarily uploaded. Once the current batch has been approved, the files will be uploaded to the Internet Archive for consumption by the Wayback Machine.<br />
<br />
ArchiveBot's source code can be found at https://github.com/ArchiveTeam/ArchiveBot. [[Dev|Contributions welcomed]]! Any issues or feature requests may be filed at [https://github.com/ArchiveTeam/ArchiveBot/issues the issue tracker].<br />
<br />
== People ==<br />
<br />
The main server that controls the IRC bot, pipeline manager backend, and web dashboard is operated by [[User:yipdw|yipdw]], although a few other ArchiveTeam members were given SSH access in late 2017. The staging server [[Fortress of Solitude|Fortress of Solitude (FOS)]], where the data sits for final checks before being moved over to the Internet Archive servers, is operated by [[User:jscott|SketchCow]]. The pipelines are operated by various volunteers around the world. Each pipeline typically runs two or three web crawl jobs at any given time.<br />
<br />
== Volunteer to run a Pipeline ==<br />
As of November 2017, ArchiveBot has again started accepting applications from volunteers who want to set up new pipelines. You'll need to have a machine with:<br />
<br />
* lots of disk space (40 GB minimum / 200 GB recommended / 500 GB atypical)<br />
* 512 MB RAM (2 GB recommended, 2 GB swap recommended)<br />
* 10 Mbps upload/download speeds (100 Mbps recommended)<br />
* long-term availability (2 months minimum)<br />
* always-on unrestricted internet access (absolutely no firewall/proxies/censorship/ISP-injected-ads/DNS-redirection/free-cafe-wifi)<br />
<br />
Suggestion: the $40/month Digital Ocean droplets (4 GB memory/2 CPU/60 GB hard drive) running Ubuntu work pretty well.<br />
<br />
If you have a suitable server available and would like to volunteer, please review the [https://github.com/ArchiveTeam/ArchiveBot/blob/master/INSTALL.pipeline Pipeline Install] instructions. Then contact ArchiveTeam members [[User:Asparagirl|Asparagirl]], [[User:astrid|astrid]], [[User:JustAnotherArchivist|JAA]], [[User:yipdw|yipdw]], or other ArchiveTeam members hanging out in #archivebot, and we can hook you up, adding your machine to the list of approved pipelines, so that it will start processing incoming ArchiveBot jobs.<br />
<br />
=== Caveats ===<br />
As of August 2018, there are a few things you need to be aware of when operating an ArchiveBot pipeline:<br />
<br />
* '''Never, ever press ^C on the pipeline.''' Use <code>touch STOP</code> in the <code>ArchiveBot/pipeline</code> directory instead to stop the pipeline.<br />
* Please give access to the pipeline for maintenance work when you're away (e.g. holidays, busy IRL) to someone who's around frequently. This is to avoid situations where jobs or pipelines are stuck for weeks or months without anyone being able to intervene.<br />
* Jobs that crash with an error need to be killed manually using <code>kill -9</code>.<br />
* The log files of jobs that are aborted or crash are not uploaded to the Internet Archive. Please keep the temporary <code>tmp-wpull-*.log.gz</code> files in the pipeline directory, rename them so the filename follows the same format as the JSON file (with extension <code>.log.gz</code> instead of <code>.json</code>), and upload them to FOS manually.<br />
** You can find the job ID for these files in the second line.<br />
** Finding the correct filename can be a bit tricky. You can use the viewer or the [https://github.com/JustAnotherArchivist/archivebot-archives archivebot-archives] repository. Keep in mind that the timestamp in the filename should approximately match the one at the beginning of the log file, though there is usually a difference between the two of at least a few seconds (the log file timestamps being later than the filename timestamp).<br />
** Be careful with the filename if there were multiple jobs for the same URL (i.e. the same job ID).<br />
** Here is a public gist on GitHub explaining step by step how to find the proper log file for your crashed or killed job, how to properly rename it, and how to rsync it up to FOS: [[https://gist.github.com/Asparagirl/155bd3c8ee4b8ad5ed737e45bcad1a5a]]<br />
** Contact [[User:JustAnotherArchivist]] if you need help with this.<br />
* Due to a bug somewhere deep in the network stack, connections get stuck from time to time. This causes jobs to slow down or halt entirely.<br />
** As a workaround, you can use the [https://github.com/JustAnotherArchivist/kill-wpull-connections kill-wpull-connections] script; it requires pgrep, lsof, and gdb. Depending on the machine configuration (specifically, the value of <code>kernel.yama.ptrace_scope</code> in <code>/proc/sys/kernel/yama/ptrace_scope</code>), it may also require root/sudo privileges.<br />
** In very rare cases, you may need to use [http://killcx.sourceforge.net/ killcx] to close the connections.<br />
** [https://github.com/kristrev/tcp_closer tcp_closer] works even when the two methods above fail. It uses the SOCK_DESTROY kernel operation provided by Linux >= 4.5.<br />
* Also due to a bug suspected to be in the network stack, wpull processes sometimes use a lot of RAM (and CPU). If a process uses more than 300 MB continuously, that's likely the case. kill-wpull-connections seems to "fix" this issue, though it takes a while (minutes, rarely even an hour or more) from running the script until the usage actually drops down.<br />
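** If wpull paused due to high RAM usage, try creating a swap file and forcing RAM pages out to swap. wpull only checks RAM usage, so it resumes once enough RAM is free.<br />
<pre># Create a ~1 GB swap file and enable it (mkswap/swapon need root)<br />
dd if=/dev/zero of=swapfile bs=1024 count=1024000<br />
chmod 600 swapfile<br />
mkswap swapfile<br />
swapon swapfile<br />
# Allocate ~1 GB so that other processes' pages get pushed out to swap<br />
perl -e '$tmp = "a" x 999999999'<br />
# Disable and delete the swap file again<br />
swapoff swapfile<br />
rm swapfile</pre><br />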
* Make sure that you don't have any <code>search</code> or <code>domain</code> line in <code>/etc/resolv.conf</code>. We've grabbed a number of copies of the websites of OVH and Online.net as a result of such lines and broken <code>http://www/</code> links... (Cf [https://github.com/ArchiveTeam/ArchiveBot/issues/318 this issue on GitHub])<br />
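<br />
The <code>/etc/resolv.conf</code> caveat above is easy to check with a few lines of script. This is only a minimal sketch and not part of ArchiveBot; the function name is hypothetical.<br />

```python
def offending_resolv_lines(text):
    """Return the 'search' and 'domain' lines found in resolv.conf content."""
    bad = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comments (resolv.conf allows '#' and ';').
        if not stripped or stripped[0] in "#;":
            continue
        if stripped.split()[0] in ("search", "domain"):
            bad.append(stripped)
    return bad


if __name__ == "__main__":
    try:
        with open("/etc/resolv.conf") as fh:
            found = offending_resolv_lines(fh.read())
    except OSError as exc:
        print(f"could not read /etc/resolv.conf: {exc}")
    else:
        for entry in found:
            print(f"please remove: {entry}")
```

Keep in mind that some systems regenerate this file automatically (e.g. via DHCP), so removing the line by hand may not stick.<br />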
<br />
== Installation ==<br />
<br />
Installing the ArchiveBot can be difficult. The [https://github.com/ArchiveTeam/ArchiveBot/blob/master/INSTALL.pipeline Pipeline Install] instructions are online, but are tricky.<br />
<br />
However, there is a [https://github.com/ArchiveTeam/ArchiveBot/blob/master/.travis.yml Travis.yml automated install script] for [https://travis-ci.org/ArchiveTeam/ArchiveBot Travis CI] that is designed to test the ArchiveBot.<br />
<br />
Since it's good enough for testing... it's good enough for installation, right? There must be a way to convert it into an installer script.<br />
<br />
== Disclaimers ==<br />
<br />
# Everything is provided on a best-effort basis; nothing is guaranteed to work. (We're volunteers, not a support team.)<br />
# We can decide to stop a job or ban a user if a job is deemed unnecessary. (We don't want to run up operator bandwidth bills and waste Internet Archive donations on costs.)<br />
# We're not Internet Archive. (We do what we want.)<br />
# We're not the Wayback Machine. Specifically, we are not <code>ia_archiver</code> or <code>archive.org_bot</code>. (We don't run crawlers on behalf of other crawlers.)<br />
<br />
Occasionally, we had to ban blocks of IP addresses from the channel. If you think a ban does not apply to you but cannot join the #archivebot channel, please join the main #archiveteam channel instead.<br />
<br />
== Bad Behavior ==<br />
<br />
If you are a website operator and you notice ArchiveBot misbehaving, please contact us on #archivebot or #archiveteam on EFnet (see top of page for links).<br />
<br />
ArchiveBot understands [[robots.txt]] (please read the article) but does not match any directives. It only uses the file to discover more links, such as sitemaps.<br />
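<br />
To illustrate what using robots.txt purely for link discovery can look like, here is a hedged sketch that pulls the <code>Sitemap:</code> entries out of a robots.txt body and ignores every other directive. It is not ArchiveBot's or wpull's actual code; the function name is hypothetical.<br />

```python
def sitemap_urls(robots_txt):
    """Collect the values of all 'Sitemap:' lines; other directives are ignored."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split "Field: value" once.
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


if __name__ == "__main__":
    example = (
        "User-agent: *\n"
        "Disallow: /private/\n"
        "Sitemap: https://example.org/sitemap.xml\n"
    )
    print(sitemap_urls(example))
```

Note that the <code>Disallow</code> line in the example has no effect on the result, which mirrors the behaviour described above.<br />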
<br />
Also, please remember that '''we are not the [[Internet Archive|Internet Archive]]'''.<br />
<br />
== More ==<br />
<br />
Like ArchiveBot? Check out our [[Main_Page|homepage]] and other [[projects]]!<br />
<br />
== Notes ==<br />
<br />
<references/><br />
<br />
{{archivebot}}<br />
{{navigation_box}}</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=ArchiveBot/Governments/Yemen/list&diff=35419ArchiveBot/Governments/Yemen/list2019-03-20T09:53:50Z<p>PurpleSymphony: </p>
<hr />
<div>http://www.cama.gov.ye/<br />
http://centralbank.gov.ye/<br />
http://www.coca.gov.ye/<br />
http://www.cra.gov.ye/<br />
http://www.customs.gov.ye/<br />
https://www.facebook.com/civlreg1<br />
https://www.facebook.com/cocayem/<br />
https://www.facebook.com/iyygover<br />
https://www.facebook.com/ltaayemen/<br />
https://www.facebook.com/MOIT.News/<br />
https://www.facebook.com/motyemen1/<br />
https://www.facebook.com/YemenTourismPB<br />
http://www.fiu.gov.ye/<br />
http://ghosanaa.gov.ye/<br />
https://www.htb.gov.ye/<br />
https://www.instagram.com/yementourismpb<br />
http://www.ipna.gov.ye/<br />
http://www.ltaa.gov.ye/<br />
http://www.mofa.gov.ye/<br />
http://www.moh.gov.ye/<br />
http://moit.gov.ye/<br />
http://moj.gov.ye/<br />
https://www.mom.gov.ye/<br />
https://www.mot.gov.ye/<br />
http://www.mtevt.gov.ye/<br />
http://mtevt.gov.ye/<br />
http://www.mwe.gov.ye/<br />
https://plus.google.com/+Yementourism<br />
http://www.ptc.gov.ye/<br />
http://saba.ye/<br />
http://sbdma.gov.ye/<br />
http://www.sfp.gov.ye/<br />
http://www.smc.gov.ye/<br />
http://srs-mohe.gov.ye/<br />
https://www.tax.gov.ye/<br />
https://twitter.com/DYXOTE3bcvoUGQy<br />
https://twitter.com/ipnagovye<br />
https://twitter.com/MOITNewsYE<br />
https://twitter.com/MOTyemen1<br />
https://www.twitter.com/YemenTourismPB<br />
https://twitter.com/ypyemen1<br />
http://www.uajnas.gov.ye/<br />
http://www.yemen.gov.ye/<br />
http://yemenparliament.gov.ye/<br />
http://yementourism.com/<br />
http://yiig.gov.ye/<br />
http://www.yipo.gov.ye/<br />
https://www.youtube.com/channel/UC-I7EaYciN4S3kFDIUHz52g<br />
https://www.youtube.com/channel/UC68y3ARh5lvVxeUZ7rYSLJg/videos?view=57&flow=grid<br />
https://www.youtube.com/channel/UC8bvY_P-ysT7kbmUwZWPd0Q<br />
https://www.youtube.com/channel/UCEq92x5ko3qHpltw_KufwNg<br />
https://www.youtube.com/channel/UCG-ynm41IN1ygWFCbVIWBUQ<br />
https://www.youtube.com/user/MOTYemen<br />
http://www.yrsgisc.gov.ye/<br />
http://agricultureyemen.com/</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=ArchiveBot&diff=35389ArchiveBot2019-03-19T09:48:26Z<p>PurpleSymphony: /* Volunteer to run a Pipeline */ tcp_closer</p>
<hr />
<div>[[File:Librarianmotoko.jpg|200px|right|thumb|Imagine Motoko Kusanagi as an archivist.]]<br />
<br />
'''ArchiveBot''' is an [[IRC]] bot designed to automate the archival of smaller websites (e.g. up to a few hundred thousand URLs). You give it a URL to start at, and it grabs all content under that URL, [[Wget_with_WARC_output|records it in a WARC]] file, and then uploads that WARC to ArchiveTeam servers for eventual injection into the [https://archive.org/details/archivebot Internet Archive]'s Wayback Machine (or other archive sites).<br />
<br />
== Details ==<br />
<br />
To use ArchiveBot, drop by the IRC channel [http://chat.efnet.org:9090/?nick=&channels=%23archivebot&Login=Login '''#archivebot'''] on EFNet. To interact with ArchiveBot, you issue [https://archivebot.readthedocs.io/en/latest/commands.html '''commands'''] by typing them into the channel. Note that you will need channel operator (<code>@</code>) or voice (<code>+</code>) permissions in order to issue archiving jobs; please ask for assistance or leave a message describing the website you want to archive.<br />
<br />
The [http://dashboard.at.ninjawedding.org/3 '''ArchiveBot dashboard'''] publicly shows the sites being currently downloaded. The [http://archivebot.at.ninjawedding.org:4567/pipelines pipeline monitor station] shows the status of deployed instances of crawlers. The [http://archive.fart.website/archivebot/viewer/ viewer] assists in browsing and searching archives.<br />
<br />
You can also follow [https://twitter.com/ArchiveBot @ArchiveBot]<ref>Formerly known as [https://twitter.com/ATArchiveBot @ATArchiveBot]</ref> on [[Twitter]], although its tweets may slightly lag behind the current status of the bot. The bot has not tweeted since 12 April 2018.<br />
<br />
== Components ==<br />
<br />
IRC interface<br />
:The bot listens for commands in the IRC channel and then reports back status on the IRC channel. You can ask it to archive a whole website or single webpage, check whether the URL has been saved, change the delay time between requests, or add some ignore rules to avoid crawling certain web cruft. This IRC interface is collaborative, meaning anyone with permission can adjust the parameters of jobs. Note that the bot isn't a chat bot, so it will ignore you if it doesn't understand a command.<br />
<br />
Dashboard<br />
:The [http://dashboard.at.ninjawedding.org/3 '''ArchiveBot dashboard'''] is a web-based front-end displaying the URLs being downloaded by the various web crawls. Each URL line in the dashboard is categorized by its HTTP code into successes, warnings, and errors. Warnings and errors are highlighted in yellow or red. The dashboard also provides RSS feeds.<br />
<br />
Backend<br />
:The backend contains the database of all jobs and several maintenance tasks such as trimming logs and posting Tweets on Twitter. The backend is the centralized portion of ArchiveBot.<br />
<br />
Crawler<br />
:The crawler will download and spider the website into WARC files. The crawler is the distributed portion of ArchiveBot. Volunteers run pipeline nodes connected to the backend. The backend will tell the nodes/pipelines what jobs to run. Once the crawl job has finished, the pipeline reports back to the backend and uploads the WARC files to the staging server. This process is handled by a supervisor script called a pipeline.<br />
<br />
Staging server<br />
:The staging server, known as [[Fortress of Solitude|Fortress of Solitude (FOS)]], is the place where all the WARC files are temporarily uploaded. Once the current batch has been approved, the files will be uploaded to the Internet Archive for consumption by the Wayback Machine.<br />
<br />
ArchiveBot's source code can be found at https://github.com/ArchiveTeam/ArchiveBot. [[Dev|Contributions welcomed]]! Any issues or feature requests may be filed at [https://github.com/ArchiveTeam/ArchiveBot/issues the issue tracker].<br />
<br />
== People ==<br />
<br />
The main server that controls the IRC bot, pipeline manager backend, and web dashboard is operated by [[User:yipdw|yipdw]], although a few other ArchiveTeam members were given SSH access in late 2017. The staging server [[Fortress of Solitude|Fortress of Solitude (FOS)]], where the data sits for final checks before being moved over to the Internet Archive servers, is operated by [[User:jscott|SketchCow]]. The pipelines are operated by various volunteers around the world. Each pipeline typically runs two or three web crawl jobs at any given time.<br />
<br />
== Volunteer to run a Pipeline ==<br />
As of November 2017, ArchiveBot has again started accepting applications from volunteers who want to set up new pipelines. You'll need to have a machine with:<br />
<br />
* lots of disk space (40 GB minimum / 200 GB recommended / 500 GB atypical)<br />
* 512 MB RAM (2 GB recommended, 2 GB swap recommended)<br />
* 10 Mbps upload/download speeds (100 Mbps recommended)<br />
* long-term availability (2 months minimum)<br />
* always-on unrestricted internet access (absolutely no firewall/proxies/censorship/ISP-injected-ads/DNS-redirection/free-cafe-wifi)<br />
<br />
Suggestion: the $40/month Digital Ocean droplets (4 GB memory/2 CPU/60 GB hard drive) running Ubuntu work pretty well.<br />
<br />
If you have a suitable server available and would like to volunteer, please review the [https://github.com/ArchiveTeam/ArchiveBot/blob/master/INSTALL.pipeline Pipeline Install] instructions. Then contact ArchiveTeam members [[User:Asparagirl|Asparagirl]], [[User:astrid|astrid]], [[User:JustAnotherArchivist|JAA]], [[User:yipdw|yipdw]], or other ArchiveTeam members hanging out in #archivebot, and we can hook you up, adding your machine to the list of approved pipelines, so that it will start processing incoming ArchiveBot jobs.<br />
<br />
=== Caveats ===<br />
As of August 2018, there are a few things you need to be aware of when operating an ArchiveBot pipeline:<br />
<br />
* '''Never, ever press ^C on the pipeline.''' Use <code>touch STOP</code> in the <code>ArchiveBot/pipeline</code> directory instead to stop the pipeline.<br />
* Please give access to the pipeline for maintenance work when you're away (e.g. holidays, busy IRL) to someone who's around frequently. This is to avoid situations where jobs or pipelines are stuck for weeks or months without anyone being able to intervene.<br />
* Jobs that crash with an error need to be killed manually using <code>kill -9</code>.<br />
* The log files of jobs that are aborted or crash are not uploaded to the Internet Archive. Please keep the temporary <code>tmp-wpull-*.log.gz</code> files in the pipeline directory, rename them so the filename follows the same format as the JSON file (with extension <code>.log.gz</code> instead of <code>.json</code>), and upload them to FOS manually.<br />
** You can find the job ID for these files in the second line.<br />
** Finding the correct filename can be a bit tricky. You can use the viewer or the [https://github.com/JustAnotherArchivist/archivebot-archives archivebot-archives] repository. Keep in mind that the timestamp in the filename should approximately match the one at the beginning of the log file, though there is usually a difference between the two of at least a few seconds (the log file timestamps being later than the filename timestamp).<br />
** Be careful with the filename if there were multiple jobs for the same URL (i.e. the same job ID).<br />
** Here is a public gist on GitHub explaining step by step how to find the proper log file for your crashed or killed job, how to properly rename it, and how to rsync it up to FOS: [[https://gist.github.com/Asparagirl/155bd3c8ee4b8ad5ed737e45bcad1a5a]]<br />
** Contact [[User:JustAnotherArchivist]] if you need help with this.<br />
* Due to a bug somewhere deep in the network stack, connections get stuck from time to time. This causes jobs to slow down or halt entirely.<br />
** As a workaround, you can use the [https://github.com/JustAnotherArchivist/kill-wpull-connections kill-wpull-connections] script; it requires pgrep, lsof, and gdb. Depending on the machine configuration (specifically, the value of <code>kernel.yama.ptrace_scope</code> in <code>/proc/sys/kernel/yama/ptrace_scope</code>), it may also require root/sudo privileges.<br />
** In very rare cases, you may need to use [http://killcx.sourceforge.net/ killcx] to close the connections.<br />
** [https://github.com/kristrev/tcp_closer tcp_closer] works even when the two methods above fail. It uses the SOCK_DESTROY kernel operation provided by Linux >= 4.5.<br />
* Also due to a bug suspected to be in the network stack, wpull processes sometimes use a lot of RAM (and CPU). If a process uses more than 300 MB continuously, that's likely the case. kill-wpull-connections seems to "fix" this issue, though it takes a while (minutes, rarely even an hour or more) from running the script until the usage actually drops down.<br />
* Make sure that you don't have any <code>search</code> or <code>domain</code> line in <code>/etc/resolv.conf</code>. We've grabbed a number of copies of the websites of OVH and Online.net as a result of such lines and broken <code>http://www/</code> links... (Cf [https://github.com/ArchiveTeam/ArchiveBot/issues/318 this issue on GitHub])<br />
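<br />
Whether kill-wpull-connections needs root depends on the Yama <code>ptrace_scope</code> value mentioned above. The sketch below reads that value and interprets it according to the kernel's documented meanings (0 to 3); the helper names are hypothetical and the script is not part of kill-wpull-connections.<br />

```python
from pathlib import Path

PTRACE_SCOPE_PATH = Path("/proc/sys/kernel/yama/ptrace_scope")

# Meanings as documented for the Yama security module.
MEANINGS = {
    0: "classic ptrace: attaching to your own processes works without root",
    1: "restricted ptrace: attaching to a non-child process needs root",
    2: "admin-only attach: root (CAP_SYS_PTRACE) is required",
    3: "no attach: ptrace attach is disabled, even for root",
}


def needs_root(scope):
    """True if gdb must run as root to attach to an already running process."""
    return scope in (1, 2)


def read_scope(path=PTRACE_SCOPE_PATH):
    """Return the current value, or None if Yama is absent or unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    scope = read_scope()
    if scope is None:
        print("ptrace_scope not available on this system")
    else:
        print(scope, "-", MEANINGS.get(scope, "unknown value"))
```

With a value of 3, the gdb-based workaround cannot work at all until the machine is rebooted with a lower setting.<br />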
<br />
== Installation ==<br />
<br />
Installing the ArchiveBot can be difficult. The [https://github.com/ArchiveTeam/ArchiveBot/blob/master/INSTALL.pipeline Pipeline Install] instructions are online, but are tricky.<br />
<br />
However, there is a [https://github.com/ArchiveTeam/ArchiveBot/blob/master/.travis.yml Travis.yml automated install script] for [https://travis-ci.org/ArchiveTeam/ArchiveBot Travis CI] that is designed to test ArchiveBot.<br />
<br />
Since it's good enough for testing... it's good enough for installation, right? There must be a way to convert it into an installer script.<br />
<br />
== Disclaimers ==<br />
<br />
# Everything is provided on a best-effort basis; nothing is guaranteed to work. (We're volunteers, not a support team.)<br />
# We can decide to stop a job or ban a user if a job is deemed unnecessary. (We don't want to run up operator bandwidth bills and waste Internet Archive donations on costs.)<br />
# We're not Internet Archive. (We do what we want.)<br />
# We're not the Wayback Machine. Specifically, we are not <code>ia_archiver</code> or <code>archive.org_bot</code>. (We don't run crawlers on behalf of other crawlers.)<br />
<br />
Occasionally, we have had to ban blocks of IP addresses from the channel. If you think a ban does not apply to you but you cannot join the #archivebot channel, please join the main #archiveteam channel instead.<br />
<br />
== Bad Behavior ==<br />
<br />
If you are a website operator and you notice ArchiveBot misbehaving, please contact us on #archivebot or #archiveteam on EFnet (see top of page for links).<br />
<br />
ArchiveBot understands [[robots.txt]] (please read the article) but does not follow any of its directives. However, it does use the file to discover additional links, such as sitemaps.<br />
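<br />
As an illustration of reading robots.txt only for link discovery, the sketch below collects <code>Sitemap:</code> entries and ignores every other directive. It is not ArchiveBot's actual code; the function name <code>sitemap_urls</code> is invented here, and the comment handling is simplified (anything after a <code>#</code> is dropped, which would also truncate URLs containing a fragment).<br />

```python
def sitemap_urls(robots_txt):
    """Return the URLs named in 'Sitemap:' lines, ignoring all other directives."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        # Field names are case-insensitive; Allow/Disallow etc. are simply skipped.
        if sep and key.strip().lower() == "sitemap":
            value = value.strip()
            if value:
                urls.append(value)
    return urls
```
<br />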
<br />
Also, please remember that '''we are not the [[Internet Archive]]'''.<br />
<br />
== More ==<br />
<br />
Like ArchiveBot? Check out our [[Main_Page|homepage]] and other [[projects]]!<br />
<br />
== Notes ==<br />
<br />
<references/><br />
<br />
{{archivebot}}<br />
{{navigation_box}}</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=ArchiveBot/Governments/Yemen/list&diff=35341ArchiveBot/Governments/Yemen/list2019-03-17T20:55:29Z<p>PurpleSymphony: Add news agency</p>
<hr />
<div>http://www.cama.gov.ye/<br />
http://centralbank.gov.ye/<br />
http://www.coca.gov.ye/<br />
http://www.cra.gov.ye/<br />
http://www.customs.gov.ye/<br />
https://www.facebook.com/civlreg1<br />
https://www.facebook.com/cocayem/<br />
https://www.facebook.com/iyygover<br />
https://www.facebook.com/ltaayemen/<br />
https://www.facebook.com/MOIT.News/<br />
https://www.facebook.com/motyemen1/<br />
https://www.facebook.com/YemenTourismPB<br />
http://www.fiu.gov.ye/<br />
http://ghosanaa.gov.ye/<br />
https://www.htb.gov.ye/<br />
https://www.instagram.com/yementourismpb<br />
http://www.ipna.gov.ye/<br />
http://www.ltaa.gov.ye/<br />
http://www.mofa.gov.ye/<br />
http://www.moh.gov.ye/<br />
http://moit.gov.ye/<br />
http://moj.gov.ye/<br />
https://www.mom.gov.ye/<br />
https://www.mot.gov.ye/<br />
http://www.mtevt.gov.ye/<br />
http://mtevt.gov.ye/<br />
http://www.mwe.gov.ye/<br />
https://plus.google.com/+Yementourism<br />
http://www.ptc.gov.ye/<br />
http://sbdma.gov.ye/<br />
http://www.sfp.gov.ye/<br />
http://www.smc.gov.ye/<br />
http://srs-mohe.gov.ye/<br />
https://www.tax.gov.ye/<br />
https://twitter.com/DYXOTE3bcvoUGQy<br />
https://twitter.com/ipnagovye<br />
https://twitter.com/MOITNewsYE<br />
https://twitter.com/MOTyemen1<br />
https://www.twitter.com/YemenTourismPB<br />
https://twitter.com/ypyemen1<br />
http://www.uajnas.gov.ye/<br />
http://www.yemen.gov.ye/<br />
http://yemenparliament.gov.ye/<br />
http://yementourism.com/<br />
http://yiig.gov.ye/<br />
http://www.yipo.gov.ye/<br />
https://www.youtube.com/channel/UC-I7EaYciN4S3kFDIUHz52g<br />
https://www.youtube.com/channel/UC68y3ARh5lvVxeUZ7rYSLJg/videos?view=57&flow=grid<br />
https://www.youtube.com/channel/UC8bvY_P-ysT7kbmUwZWPd0Q<br />
https://www.youtube.com/channel/UCEq92x5ko3qHpltw_KufwNg<br />
https://www.youtube.com/channel/UCG-ynm41IN1ygWFCbVIWBUQ<br />
https://www.youtube.com/user/MOTYemen<br />
http://www.yrsgisc.gov.ye/<br />
http://saba.ye/</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=Template:ArchiveBot&diff=35340Template:ArchiveBot2019-03-17T18:27:27Z<p>PurpleSymphony: Add Yemen</p>
<hr />
<div>{{navbox<br />
|name=Template:ArchiveBot<br />
|status={{{status|plain}}}<br />
|title=[[ArchiveBot]]<br />
<br />
|group1=[[GLAM]]<br />
|list1=<br />
[[ArchiveBot/National Archives|National Archives]]{{·}} <br />
[[ArchiveBot/National Film Archives|National Film Archives]]{{·}} <br />
[[ArchiveBot/National Galleries|National Galleries]]{{·}} <br />
[[ArchiveBot/National Libraries|National Libraries]]{{·}} <br />
[[ArchiveBot/National Museums|National Museums]]<br />
<br />
|group2=Governments<br />
|list2=<br />
[[ArchiveBot/Governments/Algeria|Algeria]]{{·}} <br />
[[ArchiveBot/Governments/Cape Verde|Cape Verde]]{{·}} <br />
[[ArchiveBot/Governments/Malta|Malta]]{{·}} <br />
[[ArchiveBot/Governments/Micronesia|Micronesia]]{{·}} <br />
[[ArchiveBot/Governments/Oman|Oman]]{{·}} <br />
[[ArchiveBot/Governments/Yemen|Yemen]]<br />
<br />
|group3=History<br />
|list3=<br />
[[ArchiveBot/Memoria Histórica|Memoria Histórica]]{{·}} <br />
[[ArchiveBot/List of oldest companies|Oldest companies]]<br />
<br />
|group4=[[People]]<br />
|list4=<br />
[[ArchiveBot/Archivists|Archivists]]{{·}} <br />
[[ArchiveBot/Facebook people|Facebook]]{{·}} <br />
[[ArchiveBot/Google people|Google]]{{·}} <br />
[[ArchiveBot/IBM people|IBM]]{{·}} <br />
[[ArchiveBot/Microsoft people|Microsoft]]{{·}} <br />
[[ArchiveBot/Travelers|Travelers]]{{·}} <br />
[[ArchiveBot/People with physical disabilities|People with physical disabilities]]<br />
<br />
|group5=Politics<br />
|list5=<br />
[[ArchiveBot/2018 Brazilian general elections|2018 Brazilian general elections]]{{·}} <br />
[[ArchiveBot/Alternative media (political left)|Alternative media (political left)]]{{·}} <br />
[[ArchiveBot/Venezuela politics|Venezuela politics]]<br />
<br />
|group10=Other<br />
|list10=<br />
[[ArchiveBot/Artificial Intelligence|Artificial Intelligence]]{{·}} <br />
[[ArchiveBot/Futurology|Futurology]]{{·}} <br />
[[ArchiveBot/Knowledge preservation initiatives|Knowledge preservation initiatives]]{{·}} <br />
[[ArchiveBot/Reddit|Reddit]]{{·}} <br />
[[ArchiveBot/Wikis|Wikis]]<br />
<br />
|end=<br />
}}{{#ifeq:{{NAMESPACENUMBER}}|0|[[Category:ArchiveBot]]}}<noinclude>[[Category:Templates]]</noinclude></div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=ArchiveBot/Governments/Oman/list&diff=35301ArchiveBot/Governments/Oman/list2019-03-15T20:19:38Z<p>PurpleSymphony: +Shura</p>
<hr />
<div>http://baitalbaranda.mm.gov.om/<br />
https://www.cbo.gov.om/<br />
https://cert.gov.om/<br />
https://www.cma.gov.om/<br />
http://www.dm.gov.om/<br />
https://eservices.housing.gov.om/<br />
https://www.fiu.gov.om/<br />
https://home.trc.gov.om/<br />
http://www.ita.gov.om/<br />
https://www.manpower.gov.om/<br />
https://www.mara.gov.om/<br />
http://www.mctmnet.gov.om/<br />
https://meca.gov.om/en/<br />
http://www.mhc.gov.om/<br />
http://www.mm.gov.om/<br />
http://www.mocioman.gov.om/<br />
http://www.mocs.gov.om/<br />
http://www.mod.gov.om/<br />
http://www.moe.gov.om/<br />
http://www.mof.gov.om/<br />
https://www.mofa.gov.om/<br />
http://www.mofw.gov.om/<br />
https://www.moh.gov.om/<br />
http://www.mohe.gov.om/<br />
http://www.moi.gov.om/<br />
https://www.moj.gov.om/<br />
http://www.mola.gov.om/<br />
https://mosa.gov.om/<br />
https://www.mosd.gov.om/<br />
https://www.motc.gov.om/<br />
https://www.mrmwr.gov.om/<br />
http://www.msm.gov.om/<br />
https://www.ncsi.gov.om/<br />
https://www.omantourism.gov.om/<br />
https://www.paca.gov.om/<br />
http://part.gov.om/<br />
http://www.rop.gov.om/<br />
http://www.sai.gov.om/<br />
http://www.shinas.gov.om/<br />
https://shura.om/</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=ArchiveBot/Governments/Algeria&diff=35082ArchiveBot/Governments/Algeria2019-03-09T08:53:08Z<p>PurpleSymphony: Created page with "<!-- bot --> <!-- /bot --> {{archivebot}}"</p>
<hr />
<div><!-- bot --><br />
<br />
<!-- /bot --><br />
<br />
{{archivebot}}</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=ArchiveBot/Governments/Algeria/list&diff=35081ArchiveBot/Governments/Algeria/list2019-03-09T08:52:41Z<p>PurpleSymphony: Created page with "http://www.mtp.gov.dz/ https://www.mta.gov.dz/ https://www.facebook.com/algerietourismeartisanat/ https://twitter.com/TourismeArtisan https://www.youtube.com/channel/UC2w-aQkI..."</p>
<hr />
<div>http://www.mtp.gov.dz/<br />
https://www.mta.gov.dz/<br />
https://www.facebook.com/algerietourismeartisanat/<br />
https://twitter.com/TourismeArtisan<br />
https://www.youtube.com/channel/UC2w-aQkIRcP1Bag4AxUiveA?view_as=subscriber<br />
http://www.dcwconstantine.gov.dz/<br />
https://www.cnl.gov.dz/<br />
https://www.mpttn.gov.dz/<br />
http://www.mf.gov.dz/<br />
http://www.andi.gov.dz/<br />
http://www.dcmascara.gov.dz/<br />
http://www.anam.gov.dz/<br />
http://www.mae.gov.dz/<br />
http://www.mtess.gov.dz/<br />
http://www.mf-ctrf.gov.dz/<br />
http://www.education.gov.dz/<br />
http://www.msnfcf.gov.dz/<br />
http://www.mree.gov.dz/<br />
http://www.mree.gov.dz/<br />
http://www.ministerecommunication.gov.dz/<br />
http://www.douane.gov.dz/<br />
https://www.commerce.gov.dz/<br />
http://www.mdipi.gov.dz/<br />
http://www.creg.gov.dz/<br />
http://www.arh.gov.dz/<br />
http://www.sante.gov.dz/<br />
http://www.dge.gov.dz/<br />
http://www.el-mouradia.dz/<br />
http://www.dcwsoukahras.gov.dz/<br />
http://www.mre.gov.dz/<br />
http://www.energy.gov.dz/<br />
https://www.mincommerce.gov.dz/<br />
http://www.mpeche.gov.dz/<br />
http://www.mrp.gov.dz/<br />
https://www.mfep.gov.dz/<br />
http://www.bibans-info.gov.dz/<br />
http://www.premier-ministre.gov.dz/<br />
http://www.sante.gov.dz/<br />
http://www.interieur.gov.dz/<br />
http://www.asal.dz/<br />
http://www.biblionat.dz/<br />
http://www.pt.dz/<br />
http://www.fnadz.org/<br />
http://www.ffs.dz/<br />
http://www.rnd-dz.org/<br />
http://www.pfln.dz/<br />
http://www.majliselouma.dz/<br />
http://www.elmouradia.dz/<br />
http://www.ministere-transports.gov.dz/<br />
http://www.mate.gov.dz/<br />
http://www.dgf.org.dz/<br />
https://www.png-dz.org/<br />
http://www.energy.gov.dz/<br />
http://www.mf.gov.dz/<br />
http://www.mesrs.dz/<br />
http://www.mjustice.dz/<br />
http://www.m-culture.gov.dz/<br />
http://www.sante.dz/<br />
https://www.mpttn.gov.dz/<br />
http://www.apn.gov.dz/<br />
http://www.meer.gov.dz/<br />
http://www.dgfp.gov.dz/<br />
http://www.foncier-finance.gov.dz/<br />
http://www.ministerecommunication.gov.dz/<br />
http://wilaya-batna.gov.dz/</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=ArchiveBot/Governments/Yemen&diff=35053ArchiveBot/Governments/Yemen2019-03-08T20:59:22Z<p>PurpleSymphony: Created page with "<!-- bot --> <!-- /bot --> {{archivebot}}"</p>
<hr />
<div><!-- bot --><br />
<br />
<!-- /bot --><br />
<br />
{{archivebot}}</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=ArchiveBot/Governments/Yemen/list&diff=35052ArchiveBot/Governments/Yemen/list2019-03-08T20:58:38Z<p>PurpleSymphony: Created page with "http://yementourism.com/ https://www.facebook.com/YemenTourismPB https://plus.google.com/+Yementourism https://www.twitter.com/YemenTourismPB https://www.youtube.com/channel/U..."</p>
<hr />
<div>http://yementourism.com/<br />
https://www.facebook.com/YemenTourismPB<br />
https://plus.google.com/+Yementourism<br />
https://www.twitter.com/YemenTourismPB<br />
https://www.youtube.com/channel/UC8bvY_P-ysT7kbmUwZWPd0Q<br />
https://www.instagram.com/yementourismpb<br />
http://www.yemen.gov.ye/<br />
http://www.yrsgisc.gov.ye/<br />
http://www.sfp.gov.ye/<br />
http://www.smc.gov.ye/<br />
https://www.mom.gov.ye/<br />
http://yemenparliament.gov.ye/<br />
https://www.youtube.com/channel/UC-I7EaYciN4S3kFDIUHz52g<br />
https://twitter.com/ypyemen1<br />
http://www.ipna.gov.ye/<br />
https://twitter.com/ipnagovye<br />
http://www.cama.gov.ye/<br />
http://srs-mohe.gov.ye/<br />
http://mtevt.gov.ye/<br />
http://www.fiu.gov.ye/<br />
https://www.facebook.com/iyygover<br />
http://yiig.gov.ye/<br />
https://www.youtube.com/channel/UCEq92x5ko3qHpltw_KufwNg<br />
http://centralbank.gov.ye/<br />
http://www.mwe.gov.ye/<br />
https://www.mot.gov.ye/<br />
https://www.youtube.com/user/MOTYemen<br />
https://twitter.com/MOTyemen1<br />
https://www.facebook.com/motyemen1/<br />
http://moj.gov.ye/<br />
http://www.ltaa.gov.ye/<br />
https://www.youtube.com/channel/UC68y3ARh5lvVxeUZ7rYSLJg/videos?view=57&flow=grid<br />
https://www.facebook.com/ltaayemen/<br />
https://www.tax.gov.ye<br />
http://www.yipo.gov.ye/<br />
http://www.mofa.gov.ye/<br />
http://moit.gov.ye/<br />
https://twitter.com/MOITNewsYE<br />
https://www.facebook.com/MOIT.News/<br />
http://www.cra.gov.ye/<br />
https://www.facebook.com/civlreg1<br />
http://www.ptc.gov.ye/<br />
http://www.coca.gov.ye/<br />
https://www.youtube.com/channel/UCG-ynm41IN1ygWFCbVIWBUQ<br />
https://twitter.com/DYXOTE3bcvoUGQy<br />
https://www.facebook.com/cocayem/<br />
http://www.mtevt.gov.ye/<br />
http://www.uajnas.gov.ye/<br />
http://sbdma.gov.ye/<br />
https://www.htb.gov.ye/<br />
http://www.customs.gov.ye/<br />
http://ghosanaa.gov.ye/<br />
http://www.moh.gov.ye/</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=ArchiveBot/Governments/Oman&diff=34973ArchiveBot/Governments/Oman2019-03-07T08:17:14Z<p>PurpleSymphony: Created page with "Omani government sites. <!-- bot --> <!-- /bot --> {{archivebot}}"</p>
<hr />
<div>Omani government sites.<br />
<br />
<!-- bot --><br />
<!-- /bot --><br />
<br />
{{archivebot}}</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=ArchiveBot/Governments/Oman/list&diff=34960ArchiveBot/Governments/Oman/list2019-03-06T16:19:51Z<p>PurpleSymphony: Created page with "http://baitalbaranda.mm.gov.om/ http://part.gov.om/ https://cert.gov.om/ https://eservices.housing.gov.om/ https://home.trc.gov.om/ https://meca.gov.om/en/ https://mosa.gov.om..."</p>
<hr />
<div>http://baitalbaranda.mm.gov.om/<br />
http://part.gov.om/<br />
https://cert.gov.om/<br />
https://eservices.housing.gov.om/<br />
https://home.trc.gov.om/<br />
https://meca.gov.om/en/<br />
https://mosa.gov.om/<br />
https://www.cbo.gov.om/<br />
https://www.cma.gov.om/<br />
https://www.fiu.gov.om/<br />
https://www.manpower.gov.om/<br />
https://www.mara.gov.om/<br />
https://www.mofa.gov.om/<br />
https://www.moh.gov.om/<br />
https://www.moj.gov.om/<br />
https://www.mosd.gov.om/<br />
https://www.motc.gov.om/<br />
https://www.mrmwr.gov.om/<br />
https://www.ncsi.gov.om/<br />
https://www.omantourism.gov.om/<br />
https://www.paca.gov.om/<br />
http://www.dm.gov.om<br />
http://www.ita.gov.om<br />
http://www.mctmnet.gov.om/<br />
http://www.mhc.gov.om<br />
http://www.mm.gov.om/<br />
http://www.mocioman.gov.om<br />
http://www.mocs.gov.om/<br />
http://www.mod.gov.om/<br />
http://www.moe.gov.om<br />
http://www.mof.gov.om<br />
http://www.mofw.gov.om<br />
http://www.mohe.gov.om<br />
http://www.moi.gov.om<br />
http://www.mola.gov.om<br />
http://www.msm.gov.om<br />
http://www.rop.gov.om/<br />
http://www.sai.gov.om/<br />
http://www.shinas.gov.om</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=IC_datasheets&diff=34585IC datasheets2019-02-10T09:06:12Z<p>PurpleSymphony: /* Existing archives */</p>
<hr />
<div>Datasheets for electronic components are always endangered, since vendors usually don’t keep them online once a product is EOL.<br />
<br />
== Sources ==<br />
<br />
'''Bold''' = Well known<br />
<br />
''Italic'' = No longer exists<br />
<br />
=== Manufacturers ===<br />
<br />
{| class="wikitable sortable"<br />
|-<br />
! Manufacturer<br />
<br />
|-<br />
| [https://www.4dsystems.com.au/ 4D Systems] – [https://www.4dsystems.com.au/products products]<br />
<br />
|-<br />
| [https://www.ablic.com/en/semicon/ ABLIC] – [https://www.ablic.com/en/semicon/products/ products] [https://www.ablic.com/en/semicon/datasheets/ datasheets]<br />
<br />
|-<br />
| [http://www.adda.com.tw/ ADDA]<br />
<br />
|-<br />
| [https://www.aldinc.com/ '''Advanced Linear Devices'''] ('''ALD''')<br />
<br />
|-<br />
| [https://www.alps.com/ ALPS] – [https://www.alps.com/products/ products]<br />
<br />
|-<br />
| [https://www.amphenol.com/ '''Amphenol''']<br />
* [https://www.amphenol-aerospace.com/ Amphenol Aerospace] – [https://www.amphenol-aerospace.com/all datasheets]<br />
* [https://www.amphenol-icc.com/ Amphenol ICC]<br />
<br />
|-<br />
| [https://www.ampleon.com/ Ampleon] – [https://www.ampleon.com/products.html products]<br />
<br />
|-<br />
| [https://www.analog.com/ '''Analog Devices'''] ('''AD''') – [https://www.analog.com/en/products.html products]<br />
* '''''Linear Technology''''' ('''''LT''''')<br />
<br />
|-<br />
| [http://www.angstrem.ru/ Angstrem] (Ангстрем) – [http://www.angstrem.ru/ru/catalog products]<br />
<br />
|-<br />
| [http://www.delevan.com/ API Delevan] (Delevan)<br />
<br />
|-<br />
| [https://www.atlas-scientific.com/ Atlas Scientific]<br />
<br />
|-<br />
| [http://www.bkprecision.com/ B&K Precision] – [http://www.bkprecision.com/products.html products]<br />
<br />
|-<br />
| [https://www.baumer.com/ Baumer] – [https://www.baumer.com/us/en/product-overview/c/276 products]<br />
<br />
|-<br />
| [https://belfuse.com/ Bel] – [https://belfuse.com/home/Products products]<br />
<br />
|-<br />
| [https://www.broadcom.com/ '''Broadcom'''] – [https://www.broadcom.com/products/ products]<br />
<br />
|-<br />
| [https://store.digilentinc.com/ Digilent Inc] (Digilent) – [https://reference.digilentinc.com/ documentation]<br />
<br />
|-<br />
| [https://www.diodes.com/ Diodes] – [https://www.diodes.com/products/ products]<br />
<br />
|-<br />
| [https://www.espressif.com/ Espressif] – [https://www.espressif.com/en/products/hardware products] [https://www.espressif.com/en/support/download/documents datasheets]<br />
<br />
|-<br />
| [https://www.infineon.com/ '''Infineon'''] – [https://www.infineon.com/cms/en/product/ products]<br />
<br />
|-<br />
| [https://www.intel.com/ '''Intel''']<br />
<br />
|-<br />
| [https://global.kyocera.com/ KYOCERA] – [https://global.kyocera.com/prdct/index.html products]<br />
* [https://www.avx.com/ AVX] – [https://www.avx.com/products/ products]<br />
** [https://www.abelektronik.com/ AB Elektronik] – [https://www.abelektronik.com/en/products.html products] [https://www.abelektronik.com/en/service/datasheets.html datasheets]<br />
<br />
|-<br />
| [https://www.littelfuse.com/ Littelfuse] – [https://www.littelfuse.com/products.aspx products]<br />
* [http://www.ixys.com/ IXYS] – [http://www.ixys.com/ProductPortfolio.aspx products]<br />
* [http://www.ixysic.com/ IXYS IC]<br />
<br />
|-<br />
| [https://www.exar.com/ MaxLinear]<br />
<br />
|-<br />
| [https://www.mediatek.com/ '''MediaTek''']<br />
<br />
|-<br />
| [https://www.microchip.com/ '''Microchip'''] – [https://www.microchip.com/products products]<br />
* '''''Atmel'''''<br />
<br />
|-<br />
| [https://www.micron.com/ '''Micron Technology'''] ('''Micron''') – [https://www.micron.com/products products]<br />
* ''Elpida''<br />
<br />
|-<br />
| [https://www.nexperia.com/ Nexperia] – [https://www.nexperia.com/products/ products]<br />
<br />
|-<br />
| [http://www.chemi-con.co.jp/e/ Nippon Chemi-Con] – [http://www.chemi-con.co.jp/e/catalog/index.html products]<br />
<br />
|-<br />
| [https://www.nxp.com/ '''NXP'''] – [https://www.nxp.com/products:PCPRODCAT products]<br />
<br />
|-<br />
| [https://www.onsemi.com/ '''ON Semiconductor'''] ('''ON Semi''') – [https://www.onsemi.com/PowerSolutions/products.do products]<br />
* ''Fairchild Semiconductor'' (''Fairchild'')<br />
<br />
|-<br />
| [https://www.panasonic.com/ '''Panasonic''']<br />
<br />
|-<br />
| [https://www.qualcomm.com/ '''Qualcomm'''] – [https://www.qualcomm.com/products products]<br />
<br />
|-<br />
| [https://www.renesas.com/ '''Renesas'''] – [https://www.renesas.com/us/en/products.html products]<br />
<br />
|-<br />
| [http://www.rubycon.com/ Rubycon] – [http://www.rubycon.co.jp/en/products/index.html products]<br />
<br />
|-<br />
| [https://www.samsung.com/semiconductor/ '''Samsung''']<br />
<br />
|-<br />
| [https://www.sii.co.jp/ Seiko Instruments Inc] (SII)<br />
<br />
|-<br />
| [https://www.skhynix.com/ '''SK Hynix'''] – [https://www.skhynix.com/eng/product/productIndex.jsp products]<br />
<br />
|-<br />
| [http://www.skyworksinc.com/ Skyworks] (Skyworks Solutions Inc)<br />
* [http://www.analogictech.com/ AnalogicTech]<br />
<br />
|-<br />
| '''Sony'''<br />
<br />
|-<br />
| [https://www.st.com/ '''STMicroelectronics'''] ('''STM''') – [https://www.st.com/content/st_com/en/product-selector-welcome.html products]<br />
<br />
|-<br />
| [http://www.taiwanalpha.com/ Taiwan Alpha] (Alpha) – [http://www.taiwanalpha.com/products products]<br />
<br />
|-<br />
| [https://www.te.com/ TE Connectivity] (formerly Tyco Electronics)<br />
<br />
|-<br />
| [https://www.ti.com/ '''Texas Instruments'''] ('''TI''')<br />
<br />
|-<br />
| [https://www.toshiba.co.jp/ '''Toshiba''']<br />
<br />
|-<br />
| [https://www.ttelectronics.com/ TT Electronics] – [https://www.ttelectronics.com/products/ products]<br />
<br />
|-<br />
| [http://www.upi-semi.com/ uPi Semiconductor] (uPi Semi) – [http://www.upi-semi.com/en-category-upi-267 products]<br />
<br />
|-<br />
| [https://www.vishay.com/ '''Vishay'''] – [https://www.vishay.com/products/ products]<br />
<br />
|-<br />
| [https://www.westerndigital.com/ Western Digital] (WD) – [https://www.westerndigital.com/products/all-products products]<br />
<br />
|}<br />
<br />
=== Distributors ===<br />
<br />
{| class="wikitable sortable"<br />
|-<br />
! Distributor !! Notes<br />
<br />
|-<br />
| [http://www.4starelectronics.com/ 4 Star Electronics]<br />
||<br />
<br />
|-<br />
| [https://www.arrow.com/ '''Arrow Electronics Inc'''] ('''Arrow''') – [https://www.arrow.com/en/products products] [https://www.arrow.com/en/datasheets datasheets]<br />
* [https://www.chip1stop.com/ Chip One Stop Inc] (Chip One Stop)<br />
|| Can be enumerated using <nowiki>https://www.arrow.com/en/datasheets/<id></nowiki><br />
<br />
|-<br />
| [https://www.avnet.com/ '''Avnet Inc'''] ('''Avnet''') – [https://www.avnet.com/shop/AllProducts products]<br />
||<br />
<br />
|-<br />
| [https://chip1.com/ Chip 1 Exchange]<br />
||<br />
<br />
|-<br />
| [https://www.conrad.com/ Conrad Electronic International GmbH & Co KG] (Conrad)<br />
* [https://www.rapidonline.com/ Rapid Electronics Ltd] (Rapid)<br />
||<br />
<br />
|-<br />
| [https://www.dfsales.com/ DF Sales Co] (DF Sales)<br />
||<br />
<br />
|-<br />
| [https://www.digikey.com/ '''Digi-Key Electronics'''] ('''Digi-Key''')<br />
|| Links to vendors' websites<br />
<br />
|-<br />
| [http://www.electronics123.com/ Electronics123.com Inc] (Electronics123)<br />
||<br />
<br />
|-<br />
| [https://www.eurotech.co.uk/ Euro-Tech Export Ltd] (Euro-Tech Export)<br />
||<br />
<br />
|-<br />
| [https://www.futureelectronics.com/ Future Electronics Inc] (Future Electronics)<br />
||<br />
<br />
|-<br />
| [http://holdelec.net/ Holdelec SAS] (Holdelec)<br />
||<br />
<br />
|-<br />
| [https://www.jameco.com/ Jameco Electronics] (Jameco)<br />
||<br />
<br />
|-<br />
| [https://lcsc.com/ LCSC] – [https://lcsc.com/products products]<br />
||<br />
<br />
|-<br />
| [https://www.masterelectronics.com/ Master Electronics]<br />
||<br />
<br />
|-<br />
| [https://www.mouser.com/ '''Mouser Electronics Inc'''] ('''Mouser''')<br />
|| Mix between self-hosted datasheets (modified by Mouser, last page appended to PDF) and external links<br />
<br />
|-<br />
| [https://www.onlinecomponents.com/ Online Components]<br />
||<br />
<br />
|-<br />
| [https://www.premierfarnell.com/ Premier Farnell]<br />
* [https://cpc.farnell.com/ Combined Precision Components] (CPC)<br />
* [https://www.farnell.com/ '''Farnell element14'''] ('''Farnell''')<br />
* [https://www.newark.com/ Newark element14] (Newark)<br />
|| Farnell element14: Self-hosted, uses sequential ids: <nowiki>http://www.farnell.com/datasheets/<id>.pdf</nowiki><br />
<br />
|-<br />
| [https://www.reichelt.de/ Reichelt Elektronik GmbH & Co KG] (Reichelt)<br />
|| Self-hosted<br />
<br />
|-<br />
| [http://www.rs-online.com/ '''RS Components Ltd'''] ('''RS Components''')<br />
* [https://www.alliedelec.com/ '''Allied Electronics & Automation'''] ('''Allied''')<br />
|| RS Components: Self-hosted, random(?) hex ids, sometimes no datasheet at all<br />
<br />
|-<br />
| [https://www.rutronik.com/ Rutronik]<br />
||<br />
<br />
|-<br />
| [https://www.swatee.com/ Swatee Electronics] (Swatee)<br />
||<br />
<br />
|-<br />
| [https://www.ttiinc.com/ TTI Inc] (TTI)<br />
||<br />
<br />
|-<br />
| [https://www.waytekwire.com/ Waytek Inc] (Waytek)<br />
||<br />
<br />
|-<br />
| [https://www.wpgholdings.com/ WPG Holdings]<br />
||<br />
<br />
|-<br />
| [http://www.wtmec.com/ WT Microelectronics Co Ltd] (WT Microelectronics)<br />
||<br />
<br />
|}<br />
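<br />
The sequential-id pattern noted for Farnell in the table above lends itself to generating candidate URLs. The following is a minimal sketch only: the id range is arbitrary and hypothetical, many ids may not resolve to a datasheet, and nothing here fetches anything. The Arrow pattern from the table could be handled the same way if its ids turn out to be numeric, which the table does not state.<br />

```python
# URL pattern taken from the Farnell row of the table; "{id}" stands for the numeric id.
FARNELL_TEMPLATE = "http://www.farnell.com/datasheets/{id}.pdf"


def candidate_urls(template, start, stop):
    """Yield one URL per id in the half-open range [start, stop)."""
    for datasheet_id in range(start, stop):
        yield template.format(id=datasheet_id)


if __name__ == "__main__":
    # Hypothetical range, chosen only to show the output format.
    for url in candidate_urls(FARNELL_TEMPLATE, 1, 4):
        print(url)
```

The resulting list is one URL per line, the same shape as the URL lists used elsewhere on this wiki.<br />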
<br />
=== Vendors ===<br />
<br />
* [https://www.arrow.com/en/manufacturers?tab=showall Arrow]<br />
* [https://www.mouser.de/supplierpage/ Mouser]<br />
* [https://my.ecianow.org/eweb/DynamicPage.aspx?Site=ecianow&WebCode=OrgResult&FromSearchControl=Yes ECIA]<br />
<br />
=== Existing archives ===<br />
<br />
* http://www.htmldatasheet.com/<br />
* http://www.alldatasheet.com/<br />
* https://www.datasheetarchive.com/<br />
* http://www.datasheetcatalog.com/<br />
* https://archive.org/details/ic_datasheets<br />
* http://datasheets.chipdb.org/<br />
<br />
== Unsorted list ==<br />
<br />
* [https://www.te.com/ TE Connectivity] (formerly Tyco Electronics)</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=IC_datasheets&diff=34518IC datasheets2019-02-08T07:29:31Z<p>PurpleSymphony: /* Distributors */</p>
<hr />
<div>Datasheets for electronic components are always endangered, since vendors usually don’t keep them online once a product is EOL.<br />
<br />
== Sources ==<br />
<br />
'''Bold''' = Well known<br />
<br />
''Italic'' = No longer exists<br />
<br />
=== Manufacturers ===<br />
<br />
{| class="wikitable sortable"<br />
|-<br />
! Manufacturer<br />
<br />
|-<br />
| [https://www.4dsystems.com.au/ 4D Systems] – [https://www.4dsystems.com.au/products products]<br />
<br />
|-<br />
| [https://www.ablic.com/en/semicon/ ABLIC] – [https://www.ablic.com/en/semicon/products/ products] [https://www.ablic.com/en/semicon/datasheets/ datasheets]<br />
<br />
|-<br />
| [http://www.adda.com.tw/ ADDA]<br />
<br />
|-<br />
| [https://www.aldinc.com/ '''Advanced Linear Devices'''] (ALD)<br />
<br />
|-<br />
| [https://www.alps.com/ ALPS] – [https://www.alps.com/products/ products]<br />
<br />
|-<br />
| [https://www.amphenol.com/ Amphenol]<br />
* [https://www.amphenol-aerospace.com/ Amphenol Aerospace] – [https://www.amphenol-aerospace.com/all datasheets]<br />
* [https://www.amphenol-icc.com/ Amphenol ICC]<br />
<br />
|-<br />
| [https://www.ampleon.com/ Ampleon] – [https://www.ampleon.com/products.html products]<br />
<br />
|-<br />
| [https://www.analog.com/ '''Analog Devices'''] (AD) – [https://www.analog.com/en/products.html products]<br />
* '''''Linear Technology''''' (LT)<br />
<br />
|-<br />
| [http://www.angstrem.ru/ Angstrem] (Ангстрем) – [http://www.angstrem.ru/ru/catalog products]<br />
<br />
|-<br />
| [http://www.delevan.com/ API Delevan] (Delevan)<br />
<br />
|-<br />
| [https://www.atlas-scientific.com/ Atlas Scientific]<br />
<br />
|-<br />
| [http://www.bkprecision.com/ B&K Precision] – [http://www.bkprecision.com/products.html products]<br />
<br />
|-<br />
| [https://www.baumer.com/ Baumer] – [https://www.baumer.com/us/en/product-overview/c/276 products]<br />
<br />
|-<br />
| [https://belfuse.com/ Bel] – [https://belfuse.com/home/Products products]<br />
<br />
|-<br />
| [https://www.broadcom.com/ '''Broadcom'''] – [https://www.broadcom.com/products/ products]<br />
<br />
|-<br />
| [https://www.diodes.com/ Diodes] – [https://www.diodes.com/products/ products]<br />
<br />
|-<br />
| [https://store.digilentinc.com/ Digilent Inc] (Digilent) – [https://reference.digilentinc.com/ documentation]<br />
<br />
|-<br />
| [https://www.espressif.com/ Espressif] – [https://www.espressif.com/en/products/hardware products] [https://www.espressif.com/en/support/download/documents datasheets]<br />
<br />
|-<br />
| [https://www.infineon.com/ '''Infineon'''] – [https://www.infineon.com/cms/en/product/ products]<br />
<br />
|-<br />
| [https://www.intel.com/ '''Intel''']<br />
<br />
|-<br />
| [https://global.kyocera.com/ KYOCERA] – [https://global.kyocera.com/prdct/index.html products]<br />
* [https://www.avx.com/ AVX] – [https://www.avx.com/products/ products]<br />
** [https://www.abelektronik.com/ AB Elektronik] – [https://www.abelektronik.com/en/products.html products] [https://www.abelektronik.com/en/service/datasheets.html datasheets]<br />
<br />
|-<br />
| [https://www.littelfuse.com/ Littelfuse] – [https://www.littelfuse.com/products.aspx products]<br />
* [http://www.ixys.com/ IXYS] – [http://www.ixys.com/ProductPortfolio.aspx products]<br />
* [http://www.ixysic.com/ IXYS IC]<br />
<br />
|-<br />
| [https://www.exar.com/ MaxLinear]<br />
<br />
|-<br />
| [https://www.mediatek.com/ '''MediaTek''']<br />
<br />
|-<br />
| [https://www.microchip.com/ '''Microchip'''] – [https://www.microchip.com/products products]<br />
* '''''Atmel'''''<br />
<br />
|-<br />
| [https://www.micron.com/ '''Micron Technology'''] (Micron) – [https://www.micron.com/products products]<br />
* ''Elpida''<br />
<br />
|-<br />
| [https://www.nexperia.com/ Nexperia] – [https://www.nexperia.com/products/ products]<br />
<br />
|-<br />
| [http://www.chemi-con.co.jp/e/ Nippon Chemi-Con] – [http://www.chemi-con.co.jp/e/catalog/index.html products]<br />
<br />
|-<br />
| [https://www.nxp.com/ '''NXP'''] – [https://www.nxp.com/products:PCPRODCAT products]<br />
<br />
|-<br />
| [https://www.onsemi.com/ '''ON Semiconductor'''] (ON Semi) – [https://www.onsemi.com/PowerSolutions/products.do products]<br />
* ''Fairchild Semiconductor'' (Fairchild)<br />
<br />
|-<br />
| [https://www.panasonic.com/ '''Panasonic''']<br />
<br />
|-<br />
| [https://www.qualcomm.com/ '''Qualcomm'''] – [https://www.qualcomm.com/products products]<br />
<br />
|-<br />
| [https://www.renesas.com/ '''Renesas'''] – [https://www.renesas.com/us/en/products.html products]<br />
<br />
|-<br />
| [http://www.rubycon.com/ Rubycon] – [http://www.rubycon.co.jp/en/products/index.html products]<br />
<br />
|-<br />
| [https://www.samsung.com/semiconductor/ '''Samsung''']<br />
<br />
|-<br />
| [https://www.sii.co.jp/ Seiko Instruments Inc] (SII)<br />
<br />
|-<br />
| [https://www.skhynix.com/ '''SK Hynix'''] – [https://www.skhynix.com/eng/product/productIndex.jsp products]<br />
<br />
|-<br />
| [http://www.skyworksinc.com/ Skyworks] (Skyworks Solutions Inc)<br />
* [http://www.analogictech.com/ AnalogicTech]<br />
<br />
|-<br />
| '''Sony'''<br />
<br />
|-<br />
| [https://www.st.com/ '''STMicroelectronics'''] (STM) – [https://www.st.com/content/st_com/en/product-selector-welcome.html products]<br />
<br />
|-<br />
| [http://www.taiwanalpha.com/ Taiwan Alpha] (Alpha) – [http://www.taiwanalpha.com/products products]<br />
<br />
|-<br />
| [https://www.ti.com/ '''Texas Instruments'''] (TI)<br />
<br />
|-<br />
| [https://www.toshiba.co.jp/ '''Toshiba''']<br />
<br />
|-<br />
| [https://www.ttelectronics.com/ TT Electronics] – [https://www.ttelectronics.com/products/ products]<br />
<br />
|-<br />
| [http://www.upi-semi.com/ uPi Semiconductor] (uPi Semi) – [http://www.upi-semi.com/en-category-upi-267 products]<br />
<br />
|-<br />
| [https://www.vishay.com/ Vishay] – [https://www.vishay.com/products/ products]<br />
<br />
|-<br />
| [https://www.westerndigital.com/ Western Digital] (WD) – [https://www.westerndigital.com/products/all-products products]<br />
<br />
|}<br />
<br />
=== Distributors ===<br />
<br />
{| class="wikitable sortable"<br />
|-<br />
! Distributor !! Notes<br />
<br />
|-<br />
| [https://www.arrow.com/ Arrow] – [https://www.arrow.com/en/products products] [https://www.arrow.com/en/datasheets datasheets]<br />
|| Can be enumerated using <nowiki>https://www.arrow.com/en/datasheets/<id></nowiki><br />
<br />
|-<br />
| [https://www.avnet.com/ Avnet] – [https://www.avnet.com/shop/AllProducts products]<br />
||<br />
<br />
|-<br />
| [https://www.digikey.com/ '''Digi-Key''']<br />
|| Links to vendors' websites<br />
<br />
|-<br />
| [https://lcsc.com/ LCSC] – [https://lcsc.com/products products]<br />
||<br />
<br />
|-<br />
| [https://www.mouser.com/ '''Mouser''']<br />
|| Mix between self-hosted datasheets (modified by Mouser, last page appended to PDF) and external links<br />
<br />
|-<br />
| [https://www.onlinecomponents.com/ Online Components]<br />
||<br />
<br />
|-<br />
| [https://www.premierfarnell.com/ Premier Farnell]<br />
* [https://cpc.farnell.com/ Combined Precision Components] (CPC)<br />
* [https://www.farnell.com/ '''Farnell element14'''] (Farnell)<br />
* [https://www.newark.com/ Newark element14] (Newark)<br />
|| Farnell element14: Self-hosted, uses sequential ids: <nowiki>http://www.farnell.com/datasheets/<id>.pdf</nowiki><br />
<br />
|-<br />
| [https://www.reichelt.de/ Reichelt]<br />
|| Self-hosted<br />
<br />
|-<br />
| [http://www.rs-online.com/ RS Components]<br />
* [https://www.alliedelec.com/ Allied Electronics & Automation] (Allied Elec)<br />
|| RS Components: Self-hosted, random(?) hex ids, sometimes no datasheet at all<br />
<br />
|}<br />
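
The enumeration notes for Arrow and Farnell element14 in the table above could be sketched roughly as follows. This is a minimal illustration, not a tested crawler: it only builds candidate URLs from the two patterns listed in the table and sends no requests. The helper name <code>candidate_urls</code>, the id ranges, and the assumption that ids are plain decimal integers (the table only states this for Farnell) are hypothetical.

```python
from typing import Iterator

# URL patterns copied from the distributor table; "{id}" stands in for "<id>".
FARNELL_PATTERN = "http://www.farnell.com/datasheets/{id}.pdf"
ARROW_PATTERN = "https://www.arrow.com/en/datasheets/{id}"


def candidate_urls(pattern: str, start: int, stop: int) -> Iterator[str]:
    """Yield one candidate URL for every id in the half-open range [start, stop).

    Assumption: ids are decimal integers without zero padding.
    """
    for n in range(start, stop):
        yield pattern.format(id=n)


if __name__ == "__main__":
    # Hypothetical range, chosen only to show the output shape.
    for url in candidate_urls(FARNELL_PATTERN, 1, 4):
        print(url)
```

A list produced this way could be handed to an existing grab tool. Which ids actually resolve, and whether the ranges have gaps, would still have to be checked against the live sites.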
<br />
=== Vendors ===<br />
<br />
* [https://www.arrow.com/en/manufacturers?tab=showall Arrow]<br />
* [https://www.mouser.de/supplierpage/ Mouser]<br />
* [https://my.ecianow.org/eweb/DynamicPage.aspx?Site=ecianow&WebCode=OrgResult&FromSearchControl=Yes ECIA]<br />
<br />
=== Existing archives ===<br />
<br />
* http://www.htmldatasheet.com/<br />
* http://www.alldatasheet.com/<br />
* https://www.datasheetarchive.com/<br />
* http://www.datasheetcatalog.com/<br />
* https://archive.org/details/ic_datasheets<br />
<br />
== Unsorted list ==<br />
<br />
* [https://www.te.com/ TE Connectivity] (formerly Tyco Electronics)</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=Chromebot&diff=34388Chromebot2019-01-31T15:43:38Z<p>PurpleSymphony: Providing some answers on most common questions</p>
<hr />
<div>chromebot is an [[IRC]] bot, parallel to [[ArchiveBot]], that uses Google Chrome and is thus able to archive JavaScript-heavy websites. Both the [https://github.com/PromyLOPh/crocoite software] and the bot are maintained by [[User:PurpleSymphony]]. WARCs are uploaded daily to the [https://archive.org/details/archiveteam_chromebot?sort=-publicdate chromebot collection] on archive.org.<br />
<br />
By default the bot grabs only a single URL. It also supports recursion, but this is rather slow, since every single page has to be loaded and rendered by a browser. A [https://6xq.net/chromebot/ dashboard] is available for watching the progress of such jobs.</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=IC_datasheets&diff=34338IC datasheets2019-01-27T08:12:47Z<p>PurpleSymphony: /* Existing archives */</p>
<hr />
<div>Datasheets for electronic components are always endangered, since vendors usually don’t keep them online once a product is EOL.<br />
<br />
== Sources ==<br />
<br />
=== Manufacturers ===<br />
<br />
'''Bold''' = Well known<br />
<br />
''Italic'' = No longer exists<br />
<br />
* [https://www.4dsystems.com.au/ 4D Systems] – [https://www.4dsystems.com.au/products products]<br />
* [https://www.ablic.com/en/semicon/ ABLIC] – [https://www.ablic.com/en/semicon/products/ products] [https://www.ablic.com/en/semicon/datasheets/ datasheets]<br />
* [http://www.adda.com.tw/ ADDA]<br />
* [https://www.aldinc.com/ '''Advanced Linear Devices'''] (ALD)<br />
* [https://www.alps.com/ ALPS] – [https://www.alps.com/products/ products]<br />
<br />
* [https://www.amphenol.com/ Amphenol]<br />
** [https://www.amphenol-aerospace.com/ Amphenol Aerospace] – [https://www.amphenol-aerospace.com/all datasheets]<br />
** [https://www.amphenol-icc.com/ Amphenol ICC]<br />
<br />
* [https://www.ampleon.com/ Ampleon] – [https://www.ampleon.com/products.html products]<br />
<br />
* [https://www.analog.com/ '''Analog Devices'''] (AD) – [https://www.analog.com/en/products.html products]<br />
** '''''Linear Technology''''' (LT)<br />
<br />
* [http://www.angstrem.ru/ Angstrem] (Ангстрем) – [http://www.angstrem.ru/ru/catalog products]<br />
* [https://www.atlas-scientific.com/ Atlas Scientific]<br />
* [https://www.baumer.com/ Baumer] – [https://www.baumer.com/us/en/product-overview/c/276 products]<br />
* [https://belfuse.com/ Bel] – [https://belfuse.com/home/Products products]<br />
* [http://www.bkprecision.com/ B&K Precision] – [http://www.bkprecision.com/products.html products]<br />
* [https://www.broadcom.com/ '''Broadcom'''] – [https://www.broadcom.com/products/ products]<br />
* [http://www.delevan.com/ API Delevan]<br />
* [https://www.diodes.com/ Diodes] – [https://www.diodes.com/products/ products]<br />
* [https://www.espressif.com/ Espressif] – [https://www.espressif.com/en/products/hardware products] [https://www.espressif.com/en/support/download/documents datasheets]<br />
<br />
* [https://global.kyocera.com/ KYOCERA] – [https://global.kyocera.com/prdct/index.html products]<br />
** [https://www.avx.com/ AVX] – [https://www.avx.com/products/ products]<br />
*** [https://www.abelektronik.com/ AB Elektronik] – [https://www.abelektronik.com/en/products.html products] [https://www.abelektronik.com/en/service/datasheets.html datasheets]<br />
<br />
* [https://www.littelfuse.com/ Littelfuse] – [https://www.littelfuse.com/products.aspx products]<br />
** [http://www.ixys.com/ IXYS] – [http://www.ixys.com/ProductPortfolio.aspx products]<br />
** [http://www.ixysic.com/ IXYS IC]<br />
<br />
* [https://www.infineon.com/ '''Infineon'''] – [https://www.infineon.com/cms/en/product/ products]<br />
* [https://www.intel.com/ '''Intel''']<br />
* [https://www.exar.com/ MaxLinear]<br />
* [https://www.mediatek.com/ '''MediaTek''']<br />
<br />
* [https://www.microchip.com/ '''Microchip'''] – [https://www.microchip.com/products products]<br />
** '''''Atmel'''''<br />
<br />
* [https://www.micron.com/ '''Micron Technology'''] (Micron) – [https://www.micron.com/products products]<br />
** ''Elpida''<br />
<br />
* [http://www.chemi-con.co.jp/e/ Nippon Chemi-Con] – [http://www.chemi-con.co.jp/e/catalog/index.html products]<br />
* [https://www.nxp.com/ '''NXP'''] – [https://www.nxp.com/products:PCPRODCAT products]<br />
<br />
* [https://www.onsemi.com/ '''ON Semiconductor'''] (ON Semi) – [https://www.onsemi.com/PowerSolutions/products.do products]<br />
** ''Fairchild Semiconductor'' (Fairchild)<br />
<br />
* [https://www.panasonic.com/ '''Panasonic''']<br />
* [https://www.qualcomm.com/ '''Qualcomm'''] – [https://www.qualcomm.com/products products]<br />
* [https://www.renesas.com/ '''Renesas'''] – [https://www.renesas.com/us/en/products.html products]<br />
* [http://www.rubycon.com/ Rubycon] – [http://www.rubycon.co.jp/en/products/index.html products]<br />
* [https://www.samsung.com/semiconductor/ '''Samsung''']<br />
* [https://www.sii.co.jp/ Seiko Instruments Inc] (SII)<br />
* [https://www.skhynix.com/ '''SK Hynix'''] – [https://www.skhynix.com/eng/product/productIndex.jsp products]<br />
<br />
* [http://www.skyworksinc.com/ Skyworks] (Skyworks Solutions Inc)<br />
** [http://www.analogictech.com/ AnalogicTech]<br />
<br />
* '''Sony'''<br />
* [https://www.st.com/ '''STMicroelectronics'''] (STM) – [https://www.st.com/content/st_com/en/product-selector-welcome.html products]<br />
* [http://www.taiwanalpha.com/ Taiwan Alpha] (Alpha) – [http://www.taiwanalpha.com/products products]<br />
* [https://www.ti.com/ '''Texas Instruments'''] (TI)<br />
* [https://www.toshiba.co.jp/ '''Toshiba''']<br />
* [https://www.ttelectronics.com/ TT Electronics] – [https://www.ttelectronics.com/products/ products]<br />
* [https://www.vishay.com/ Vishay] – [https://www.vishay.com/products/ products]<br />
* [https://www.westerndigital.com/ Western Digital] (WD) – [https://www.westerndigital.com/products/all-products products]<br />
<br />
=== Distributors ===<br />
<br />
* [https://www.arrow.com/ Arrow]<br />
* [https://www.avnet.com/ Avnet] – [https://www.avnet.com/shop/AllProducts products]<br />
* [https://www.digikey.com/ '''Digi-Key''']: Links to vendors’ websites<br />
<br />
* [https://www.premierfarnell.com/ Premier Farnell]<br />
** [https://cpc.farnell.com/ CPC] (Combined Precision Components)<br />
** [https://www.farnell.com/ '''Farnell element14''']: Self-hosted, uses sequential ids: <nowiki>http://www.farnell.com/datasheets/<id>.pdf</nowiki><br />
** [https://www.newark.com/ Newark element14]<br />
<br />
* [https://www.mouser.com/ '''Mouser''']: Mix between self-hosted datasheets (modified by Mouser, last page appended to PDF) and external links<br />
* [https://www.onlinecomponents.com/ Online Components]<br />
* [https://www.reichelt.de/ Reichelt]: Self-hosted<br />
<br />
* [http://www.rs-online.com/ RS Components]: Self-hosted, random(?) hex ids, sometimes no datasheet at all<br />
** [https://www.alliedelec.com/ Allied Electronics & Automation]<br />
<br />
=== Vendor lists ===<br />
<br />
* [https://www.arrow.com/en/manufacturers?tab=showall Arrow]<br />
* [https://www.mouser.de/supplierpage/ Mouser]<br />
* [https://my.ecianow.org/eweb/DynamicPage.aspx?Site=ecianow&WebCode=OrgResult&FromSearchControl=Yes ECIA]<br />
<br />
=== Existing archives ===<br />
<br />
* http://www.htmldatasheet.com/<br />
* http://www.alldatasheet.com/<br />
* https://www.datasheetarchive.com/<br />
* http://www.datasheetcatalog.com/<br />
* https://archive.org/details/ic_datasheets<br />
<br />
== Unsorted list ==<br />
<br />
* [https://www.te.com/ TE Connectivity] (formerly Tyco Electronics)</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=IC_datasheets&diff=34324IC datasheets2019-01-25T11:19:52Z<p>PurpleSymphony: /* Sources */</p>
<hr />
<div>Datasheets for electronic components are always endangered, since vendors usually don’t keep them online once a product is EOL.<br />
<br />
== Sources ==<br />
<br />
=== Manufacturers ===<br />
<br />
'''Bold''' = Well known<br />
<br />
''Italic'' = No longer exists<br />
<br />
* [https://www.4dsystems.com.au/ 4D Systems] – [https://www.4dsystems.com.au/products products]<br />
* [https://www.ablic.com/en/semicon/ ABLIC] – [https://www.ablic.com/en/semicon/products/ products] [https://www.ablic.com/en/semicon/datasheets/ datasheets]<br />
* [http://www.adda.com.tw/ ADDA]<br />
* [https://www.aldinc.com/ '''Advanced Linear Devices'''] (ALD)<br />
* [https://www.alps.com/ ALPS] – [https://www.alps.com/products/ products]<br />
<br />
* [https://www.amphenol.com/ Amphenol]<br />
** [https://www.amphenol-aerospace.com/ Amphenol Aerospace] – [https://www.amphenol-aerospace.com/all datasheets]<br />
** [https://www.amphenol-icc.com/ Amphenol ICC]<br />
<br />
* [https://www.ampleon.com/ Ampleon] – [https://www.ampleon.com/products.html products]<br />
<br />
* [https://www.analog.com/ '''Analog Devices'''] (AD) – [https://www.analog.com/en/products.html products]<br />
** '''''Linear Technology''''' (LT)<br />
<br />
* [http://www.angstrem.ru/ Angstrem] (Ангстрем) – [http://www.angstrem.ru/ru/catalog products]<br />
* [https://www.atlas-scientific.com/ Atlas Scientific]<br />
* [https://www.baumer.com/ Baumer] – [https://www.baumer.com/us/en/product-overview/c/276 products]<br />
* [https://belfuse.com/ Bel] – [https://belfuse.com/home/Products products]<br />
* [http://www.bkprecision.com/ B&K Precision] – [http://www.bkprecision.com/products.html products]<br />
* [https://www.broadcom.com/ '''Broadcom'''] – [https://www.broadcom.com/products/ products]<br />
* [http://www.delevan.com/ API Delevan]<br />
* [https://www.diodes.com/ Diodes] – [https://www.diodes.com/products/ products]<br />
* [https://www.espressif.com/ Espressif] – [https://www.espressif.com/en/products/hardware products] [https://www.espressif.com/en/support/download/documents datasheets]<br />
<br />
* [https://global.kyocera.com/ KYOCERA] – [https://global.kyocera.com/prdct/index.html products]<br />
** [https://www.avx.com/ AVX] – [https://www.avx.com/products/ products]<br />
*** [https://www.abelektronik.com/ AB Elektronik] – [https://www.abelektronik.com/en/products.html products] [https://www.abelektronik.com/en/service/datasheets.html datasheets]<br />
<br />
* [https://www.littelfuse.com/ Littelfuse] – [https://www.littelfuse.com/products.aspx products]<br />
** [http://www.ixys.com/ IXYS] – [http://www.ixys.com/ProductPortfolio.aspx products]<br />
** [http://www.ixysic.com/ IXYS IC]<br />
<br />
* [https://www.infineon.com/ '''Infineon'''] – [https://www.infineon.com/cms/en/product/ products]<br />
* [https://www.intel.com/ '''Intel''']<br />
* [https://www.exar.com/ MaxLinear]<br />
* [https://www.mediatek.com/ '''MediaTek''']<br />
<br />
* [https://www.microchip.com/ '''Microchip'''] – [https://www.microchip.com/products products]<br />
** '''''Atmel'''''<br />
<br />
* [https://www.micron.com/ '''Micron Technology'''] (Micron) – [https://www.micron.com/products products]<br />
** ''Elpida''<br />
<br />
* [http://www.chemi-con.co.jp/e/ Nippon Chemi-Con] – [http://www.chemi-con.co.jp/e/catalog/index.html products]<br />
* [https://www.nxp.com/ '''NXP'''] – [https://www.nxp.com/products:PCPRODCAT products]<br />
<br />
* [https://www.onsemi.com/ '''ON Semiconductor'''] (ON Semi) – [https://www.onsemi.com/PowerSolutions/products.do products]<br />
** ''Fairchild Semiconductor'' (Fairchild)<br />
<br />
* [https://www.panasonic.com/ '''Panasonic''']<br />
* [https://www.qualcomm.com/ '''Qualcomm'''] – [https://www.qualcomm.com/products products]<br />
* [https://www.renesas.com/ '''Renesas'''] – [https://www.renesas.com/us/en/products.html products]<br />
* [http://www.rubycon.com/ Rubycon] – [http://www.rubycon.co.jp/en/products/index.html products]<br />
* [https://www.samsung.com/semiconductor/ '''Samsung''']<br />
* [https://www.sii.co.jp/ Seiko Instruments Inc] (SII)<br />
* [https://www.skhynix.com/ '''SK Hynix'''] – [https://www.skhynix.com/eng/product/productIndex.jsp products]<br />
<br />
* [http://www.skyworksinc.com/ Skyworks] (Skyworks Solutions Inc)<br />
** [http://www.analogictech.com/ AnalogicTech]<br />
<br />
* '''Sony'''<br />
* [https://www.st.com/ '''STMicroelectronics'''] (STM) – [https://www.st.com/content/st_com/en/product-selector-welcome.html products]<br />
* [http://www.taiwanalpha.com/ Taiwan Alpha] (Alpha) – [http://www.taiwanalpha.com/products products]<br />
* [https://www.ti.com/ '''Texas Instruments'''] (TI)<br />
* [https://www.toshiba.co.jp/ '''Toshiba''']<br />
* [https://www.ttelectronics.com/ TT Electronics] – [https://www.ttelectronics.com/products/ products]<br />
* [https://www.vishay.com/ Vishay] – [https://www.vishay.com/products/ products]<br />
* [https://www.westerndigital.com/ Western Digital] (WD) – [https://www.westerndigital.com/products/all-products products]<br />
<br />
=== Distributors ===<br />
<br />
* [https://www.arrow.com/ Arrow]<br />
* [https://www.avnet.com/ Avnet] – [https://www.avnet.com/shop/AllProducts products]<br />
* [https://www.digikey.com/ '''Digi-Key''']: Links to vendors’ websites<br />
<br />
* [https://www.premierfarnell.com/ Premier Farnell]<br />
** [https://cpc.farnell.com/ CPC] (Combined Precision Components)<br />
** [https://www.farnell.com/ '''Farnell element14''']: Self-hosted, uses sequential ids: <nowiki>http://www.farnell.com/datasheets/<id>.pdf</nowiki><br />
** [https://www.newark.com/ Newark element14]<br />
<br />
* [https://www.mouser.com/ '''Mouser''']: Mix between self-hosted datasheets (modified by Mouser, last page appended to PDF) and external links<br />
* [https://www.onlinecomponents.com/ Online Components]<br />
* [https://www.reichelt.de/ Reichelt]: Self-hosted<br />
<br />
* [http://www.rs-online.com/ RS Components]: Self-hosted, random(?) hex ids, sometimes no datasheet at all<br />
** [https://www.alliedelec.com/ Allied Electronics & Automation]<br />
<br />
=== Vendor lists ===<br />
<br />
* [https://www.arrow.com/en/manufacturers?tab=showall Arrow]<br />
* [https://www.mouser.de/supplierpage/ Mouser]<br />
* [https://my.ecianow.org/eweb/DynamicPage.aspx?Site=ecianow&WebCode=OrgResult&FromSearchControl=Yes ECIA]<br />
<br />
=== Existing archives ===<br />
<br />
* http://www.htmldatasheet.com/<br />
* http://www.alldatasheet.com/<br />
* https://www.datasheetarchive.com/<br />
* http://www.datasheetcatalog.com/<br />
<br />
== Unsorted list ==<br />
<br />
* [https://www.te.com/ TE Connectivity] (formerly Tyco Electronics)</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=Datasheets&diff=34323Datasheets2019-01-25T10:40:38Z<p>PurpleSymphony: PurpleSymphony moved page Datasheets to IC datasheets: Focus on circuits</p>
<hr />
<div>#REDIRECT [[IC datasheets]]</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=IC_datasheets&diff=34322IC datasheets2019-01-25T10:40:38Z<p>PurpleSymphony: PurpleSymphony moved page Datasheets to IC datasheets: Focus on circuits</p>
<hr />
<div>Datasheets for electronic components are always endangered, since vendors usually don’t keep them online once a product is EOL.<br />
<br />
== Sources ==<br />
<br />
=== Manufacturers ===<br />
<br />
'''Bold''' = Well known<br />
<br />
''Italic'' = No longer exists<br />
<br />
* [https://www.4dsystems.com.au/ 4D Systems] – [https://www.4dsystems.com.au/products products]<br />
* [https://www.ablic.com/en/semicon/ ABLIC] – [https://www.ablic.com/en/semicon/products/ products] [https://www.ablic.com/en/semicon/datasheets/ datasheets]<br />
* [http://www.adda.com.tw/ ADDA]<br />
* [https://www.aldinc.com/ '''Advanced Linear Devices'''] (ALD)<br />
* [https://www.alps.com/ ALPS] – [https://www.alps.com/products/ products]<br />
<br />
* [https://www.amphenol.com/ Amphenol]<br />
** [https://www.amphenol-aerospace.com/ Amphenol Aerospace] – [https://www.amphenol-aerospace.com/all datasheets]<br />
** [https://www.amphenol-icc.com/ Amphenol ICC]<br />
<br />
* [https://www.ampleon.com/ Ampleon] – [https://www.ampleon.com/products.html products]<br />
<br />
* [https://www.analog.com/ '''Analog Devices'''] (AD) – [https://www.analog.com/en/products.html products]<br />
** '''''Linear Technology''''' (LT)<br />
<br />
* [http://www.angstrem.ru/ Angstrem] (Ангстрем) – [http://www.angstrem.ru/ru/catalog products]<br />
* [https://www.atlas-scientific.com/ Atlas Scientific]<br />
* [https://www.baumer.com/ Baumer] – [https://www.baumer.com/us/en/product-overview/c/276 products]<br />
* [https://belfuse.com/ Bel] – [https://belfuse.com/home/Products products]<br />
* [http://www.bkprecision.com/ B&K Precision] – [http://www.bkprecision.com/products.html products]<br />
* [https://www.broadcom.com/ '''Broadcom'''] – [https://www.broadcom.com/products/ products]<br />
* [http://www.delevan.com/ API Delevan]<br />
* [https://www.diodes.com/ Diodes] – [https://www.diodes.com/products/ products]<br />
* [https://www.espressif.com/ Espressif] – [https://www.espressif.com/en/products/hardware products] [https://www.espressif.com/en/support/download/documents datasheets]<br />
<br />
* [https://global.kyocera.com/ KYOCERA] – [https://global.kyocera.com/prdct/index.html products]<br />
** [https://www.avx.com/ AVX] – [https://www.avx.com/products/ products]<br />
*** [https://www.abelektronik.com/ AB Elektronik] – [https://www.abelektronik.com/en/products.html products] [https://www.abelektronik.com/en/service/datasheets.html datasheets]<br />
<br />
* [https://www.littelfuse.com/ Littelfuse] – [https://www.littelfuse.com/products.aspx products]<br />
** [http://www.ixys.com/ IXYS] – [http://www.ixys.com/ProductPortfolio.aspx products]<br />
** [http://www.ixysic.com/ IXYS IC]<br />
<br />
* [https://www.infineon.com/ '''Infineon'''] – [https://www.infineon.com/cms/en/product/ products]<br />
* [https://www.intel.com/ '''Intel''']<br />
* [https://www.mediatek.com/ '''MediaTek''']<br />
<br />
* [https://www.microchip.com/ '''Microchip'''] – [https://www.microchip.com/products products]<br />
** '''''Atmel'''''<br />
<br />
* [https://www.micron.com/ '''Micron Technology'''] (Micron) – [https://www.micron.com/products products]<br />
* [http://www.chemi-con.co.jp/e/ Nippon Chemi-Con] – [http://www.chemi-con.co.jp/e/catalog/index.html products]<br />
* [https://www.nxp.com/ '''NXP'''] – [https://www.nxp.com/products:PCPRODCAT products]<br />
<br />
* [https://www.onsemi.com/ '''ON Semiconductor'''] (ON Semi) – [https://www.onsemi.com/PowerSolutions/products.do products]<br />
** ''Fairchild Semiconductor'' (Fairchild)<br />
<br />
* [https://www.panasonic.com/ '''Panasonic''']<br />
* [https://www.qualcomm.com/ '''Qualcomm'''] – [https://www.qualcomm.com/products products]<br />
* [https://www.renesas.com/ '''Renesas'''] – [https://www.renesas.com/us/en/products.html products]<br />
* [http://www.rubycon.com/ Rubycon] – [http://www.rubycon.co.jp/en/products/index.html products]<br />
* [https://www.samsung.com/semiconductor/ '''Samsung''']<br />
* [https://www.sii.co.jp/ Seiko Instruments Inc] (SII)<br />
* [https://www.skhynix.com/ '''SK Hynix'''] – [https://www.skhynix.com/eng/product/productIndex.jsp products]<br />
<br />
* [http://www.skyworksinc.com/ Skyworks] (Skyworks Solutions Inc)<br />
** [http://www.analogictech.com/ AnalogicTech]<br />
<br />
* '''Sony'''<br />
* [https://www.st.com/ '''STMicroelectronics'''] (STM) – [https://www.st.com/content/st_com/en/product-selector-welcome.html products]<br />
* [http://www.taiwanalpha.com/ Taiwan Alpha] (Alpha) – [http://www.taiwanalpha.com/products products]<br />
* [https://www.ti.com/ '''Texas Instruments'''] (TI)<br />
* [https://www.toshiba.co.jp/ '''Toshiba''']<br />
* [https://www.ttelectronics.com/ TT Electronics] – [https://www.ttelectronics.com/products/ products]<br />
* [https://www.vishay.com/ Vishay] – [https://www.vishay.com/products/ products]<br />
* [https://www.westerndigital.com/ Western Digital] (WD) – [https://www.westerndigital.com/products/all-products products]<br />
<br />
=== Distributors ===<br />
<br />
* [https://www.arrow.com/ Arrow]<br />
* [https://www.avnet.com/ Avnet] – [https://www.avnet.com/shop/AllProducts products]<br />
* [https://www.digikey.com/ '''Digi-Key''']: Links to vendors’ websites<br />
<br />
* [https://www.premierfarnell.com/ Premier Farnell]<br />
** [https://cpc.farnell.com/ CPC] (Combined Precision Components)<br />
** [https://www.farnell.com/ '''Farnell element14''']: Self-hosted, uses sequential ids: <nowiki>http://www.farnell.com/datasheets/<id>.pdf</nowiki><br />
** [https://www.newark.com/ Newark element14]<br />
<br />
* [https://www.mouser.com/ '''Mouser''']: Mix between self-hosted datasheets (modified by Mouser, last page appended to PDF) and external links<br />
* [https://www.onlinecomponents.com/ Online Components]<br />
* [https://www.reichelt.de/ Reichelt]: Self-hosted<br />
<br />
* [http://www.rs-online.com/ RS Components]: Self-hosted, random(?) hex ids, sometimes no datasheet at all<br />
** [https://www.alliedelec.com/ Allied Electronics & Automation]<br />
<br />
=== Vendor lists ===<br />
<br />
* [https://www.arrow.com/en/manufacturers?tab=showall Arrow]<br />
* [https://www.mouser.de/supplierpage/ Mouser]<br />
* [https://my.ecianow.org/eweb/DynamicPage.aspx?Site=ecianow&WebCode=OrgResult&FromSearchControl=Yes ECIA]<br />
<br />
=== Existing archives ===<br />
<br />
* http://www.htmldatasheet.com/<br />
* http://www.alldatasheet.com/<br />
* https://www.datasheetarchive.com/<br />
* http://www.datasheetcatalog.com/<br />
<br />
== Unsorted list ==<br />
<br />
* [https://www.te.com/ TE Connectivity] (formerly Tyco Electronics)</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=IC_datasheets&diff=34202IC datasheets2019-01-20T19:15:42Z<p>PurpleSymphony: /* Vendor lists */</p>
<hr />
<div>Datasheets for electronic components are always endangered, since vendors usually don’t keep them online once a product is EOL.<br />
<br />
== Sources ==<br />
<br />
=== Manufacturers ===<br />
<br />
Well-known:<br />
<br />
* [https://www.analog.com/en/products.html Analog Devices]<br />
* [https://www.infineon.com/cms/en/product/ Infineon]<br />
* [https://www.microchip.com/products Microchip]<br />
* [https://www.nxp.com/products:PCPRODCAT NXP]<br />
* [https://www.qualcomm.com/ Qualcomm]<br />
* [https://www.st.com/ STMicroelectronics]<br />
* [https://www.ti.com/ Texas Instruments]<br />
<br />
Small:<br />
<br />
* [https://www.4dsystems.com.au/products 4D Systems]<br />
* [https://www.ablic.com/en/semicon/datasheets/?rf=slide ABLIC]<br />
* [http://www.taiwanalpha.com/products Alpha]<br />
* [https://www.alps.com/ ALPS]<br />
* [https://www.ampleon.com/products.html Ampleon]<br />
* [http://www.angstrem.ru/ru/catalog Angstrem] (Ангстрем)<br />
* [http://www.delevan.com/web/ API Delevan]<br />
* [https://www.diodes.com/products/ Diodes Inc.]<br />
* [http://www.ixys.com/ProductPortfolio.aspx IXYS]<br />
* [http://www.ixysic.com/Index/index.htm IXYS IC]<br />
* [https://www.onsemi.com/PowerSolutions/products.do ON Semiconductor]<br />
* [https://www.ttelectronics.com/products/ TT Electronics]<br />
<br />
=== Distributors ===<br />
<br />
* [https://www.arrow.com/ Arrow]<br />
* [https://www.digikey.com/ Digi-Key]: Links to vendors’ websites<br />
* [https://www.farnell.com/ Farnell]: Self-hosted, uses sequential ids: <nowiki>http://www.farnell.com/datasheets/<id>.pdf</nowiki><br />
* [https://www.mouser.de/ Mouser]: Mix between self-hosted datasheets (modified by Mouser, last page appended to PDF) and external links<br />
* [https://www.reichelt.de/ Reichelt]: Self-hosted<br />
* [https://de.rs-online.com/ RS Components]: Self-hosted, random(?) hex ids, sometimes no datasheet at all<br />
<br />
=== Vendor lists ===<br />
<br />
* [https://www.arrow.com/en/manufacturers?tab=showall Arrow]<br />
* [https://www.mouser.de/supplierpage/ Mouser]<br />
* [https://my.ecianow.org/eweb/DynamicPage.aspx?Site=ecianow&WebCode=OrgResult&FromSearchControl=Yes ECIA]<br />
<br />
=== Existing archives ===<br />
<br />
* http://www.htmldatasheet.com/<br />
* http://www.alldatasheet.com/<br />
* https://www.datasheetarchive.com/<br />
* http://www.datasheetcatalog.com/</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=IC_datasheets&diff=34150IC datasheets2019-01-16T15:51:49Z<p>PurpleSymphony: /* Distributors */</p>
<hr />
<div>Datasheets for electronic components are always endangered, since vendors usually don’t keep them online once a product is EOL.<br />
<br />
== Sources ==<br />
<br />
=== Manufacturers ===<br />
<br />
Well-known:<br />
<br />
* [https://www.analog.com/en/products.html Analog Devices]<br />
* [https://www.infineon.com/cms/en/product/ Infineon]<br />
* [https://www.microchip.com/products Microchip]<br />
* [https://www.nxp.com/products:PCPRODCAT NXP]<br />
* [https://www.st.com/ STMicroelectronics]<br />
* [https://www.ti.com/ Texas Instruments]<br />
<br />
Small:<br />
<br />
* [https://www.4dsystems.com.au/product/ 4D Systems]<br />
* [https://www.ablic.com/en/semicon/datasheets/?rf=slide ABLIC]<br />
* [https://www.alps.com/ ALPS]<br />
* [https://www.ampleon.com/products.html Ampleon]<br />
* [https://www.diodes.com/products/ Diodes Inc.]<br />
* [https://www.onsemi.com/PowerSolutions/products.do ON Semiconductor]<br />
* [https://www.ttelectronics.com/products/ TT Electronics]<br />
* [http://www.angstrem.ru/ru/catalog Angstrem] (Ангстрем)<br />
* [http://www.delevan.com/web/ API Delevan]<br />
* [http://www.ixys.com/ProductPortfolio.aspx IXYS]<br />
* [http://www.ixysic.com/Index/index.htm IXYS IC]<br />
* [http://www.taiwanalpha.com/products Alpha]<br />
<br />
=== Distributors ===<br />
<br />
* [https://www.digikey.com/ Digi-Key]: Links to vendors’ websites<br />
* [http://www.farnell.com/ Farnell]: Self-hosted, uses sequential ids: <nowiki>http://www.farnell.com/datasheets/<id>.pdf</nowiki><br />
* [https://www.mouser.de/ Mouser]: Mix between self-hosted datasheets (modified by Mouser, last page appended to PDF) and external links<br />
* [https://www.reichelt.de/ Reichelt]: Self-hosted<br />
* [https://de.rs-online.com/ RS Components]: Self-hosted, random(?) hex ids, sometimes no datasheet at all<br />
<br />
=== Vendor lists ===<br />
<br />
* [https://www.mouser.de/supplierpage/ Mouser]<br />
* [https://my.ecianow.org/eweb/DynamicPage.aspx?Site=ecianow&WebCode=OrgResult&FromSearchControl=Yes ECIA]<br />
<br />
=== Existing archives ===<br />
<br />
* http://www.htmldatasheet.com/<br />
* http://www.alldatasheet.com/<br />
* https://www.datasheetarchive.com/<br />
* http://www.datasheetcatalog.com/</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=IC_datasheets&diff=34099IC datasheets2019-01-14T15:00:24Z<p>PurpleSymphony: Add more vendors</p>
<hr />
<div>Datasheets for electronic components are always endangered, since vendors usually don’t keep them online once a product is EOL.<br />
<br />
== Sources ==<br />
<br />
=== Manufacturers ===<br />
<br />
Well-known:<br />
<br />
* [https://www.analog.com/en/products.html Analog Devices]<br />
* [https://www.infineon.com/cms/en/product/ Infineon]<br />
* [https://www.microchip.com/products Microchip]<br />
* [https://www.nxp.com/products:PCPRODCAT NXP]<br />
* [https://www.st.com/ STMicroelectronics]<br />
* [https://www.ti.com/ Texas Instruments]<br />
<br />
Small:<br />
<br />
* [https://www.4dsystems.com.au/product/ 4D Systems]<br />
* [https://www.ablic.com/en/semicon/datasheets/?rf=slide ABLIC]<br />
* [https://www.alps.com/ ALPS]<br />
* [https://www.ampleon.com/products.html Ampleon]<br />
* [https://www.diodes.com/products/ Diodes Inc.]<br />
* [https://www.onsemi.com/PowerSolutions/products.do ON Semiconductor]<br />
* [https://www.ttelectronics.com/products/ TT Electronics]<br />
* [http://www.angstrem.ru/ru/catalog Angstrem] (Ангстрем)<br />
* [http://www.delevan.com/web/ API Delevan]<br />
* [http://www.ixys.com/ProductPortfolio.aspx IXYS]<br />
* [http://www.ixysic.com/Index/index.htm IXYS IC]<br />
* [http://www.taiwanalpha.com/products Alpha]<br />
<br />
=== Distributors ===<br />
<br />
* [https://www.digikey.com/ Digi-Key]: Links to vendors’ websites<br />
* [http://www.farnell.com/ Farnell]: Uses sequential ids: <nowiki>http://www.farnell.com/datasheets/<id>.pdf</nowiki><br />
* [https://www.mouser.de/ Mouser]<br />
* [https://www.reichelt.de/ Reichelt]<br />
* [https://de.rs-online.com/ RS Components]<br />
<br />
=== Vendor lists ===<br />
<br />
* [https://www.mouser.de/supplierpage/ Mouser]<br />
* [https://my.ecianow.org/eweb/DynamicPage.aspx?Site=ecianow&WebCode=OrgResult&FromSearchControl=Yes ECIA]<br />
<br />
=== Existing archives ===<br />
<br />
* http://www.htmldatasheet.com/<br />
* http://www.alldatasheet.com/<br />
* https://www.datasheetarchive.com/<br />
* http://www.datasheetcatalog.com/</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=IC_datasheets&diff=34077IC datasheets2019-01-13T08:46:07Z<p>PurpleSymphony: Links</p>
<hr />
<div>Datasheets for electronic components are always endangered, since vendors usually don’t keep them online once a product is EOL.<br />
<br />
== Sources ==<br />
<br />
=== Manufacturers ===<br />
<br />
* [https://www.analog.com/en/products.html Analog Devices]<br />
* [https://www.infineon.com/cms/en/product/ Infineon]<br />
* [https://www.microchip.com/products Microchip]<br />
* [https://www.onsemi.com/PowerSolutions/products.do ON Semiconductor]<br />
* [https://www.st.com/ STMicroelectronics]<br />
* [https://www.ti.com/ Texas Instruments]<br />
<br />
=== Distributors ===<br />
<br />
* [https://www.digikey.com/ Digikey]: Links to vendors’ websites<br />
* [http://www.farnell.com/ Farnell]: Uses sequential ids: <nowiki>http://www.farnell.com/datasheets/<id>.pdf</nowiki><br />
* [https://www.mouser.de/ Mouser]<br />
* [https://www.reichelt.de/ Reichelt]<br />
* [https://de.rs-online.com/ RS Components]<br />
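<br />
Because Farnell uses sequential ids, a list of candidate datasheet URLs can be generated from the pattern above. The following is a minimal sketch; the function name and the id range are illustrative assumptions, and it is not known which ids actually exist.<br />
<br />
```python
from typing import Iterator

# URL pattern taken from the Farnell entry above.
FARNELL_TEMPLATE = "http://www.farnell.com/datasheets/{id}.pdf"


def farnell_datasheet_urls(first_id: int, last_id: int) -> Iterator[str]:
    """Yield one candidate URL per id in [first_id, last_id].

    The range is a guess supplied by the caller; ids that were never
    assigned will simply not resolve to a datasheet.
    """
    for datasheet_id in range(first_id, last_id + 1):
        yield FARNELL_TEMPLATE.format(id=datasheet_id)


if __name__ == "__main__":
    # Print a small URL list that could be fed to a downloader.
    for url in farnell_datasheet_urls(1, 5):
        print(url)
```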
<br />
=== Existing archives ===<br />
<br />
* http://www.htmldatasheet.com/<br />
* http://www.alldatasheet.com/<br />
* https://www.datasheetarchive.com/<br />
* http://www.datasheetcatalog.com/</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=GitHub&diff=32691GitHub2018-11-27T17:02:15Z<p>PurpleSymphony: /* External links */ Software heritage archive</p>
<hr />
<div>{{Infobox project<br />
| title = GitHub<br />
| logo = GitHub_logo.png<br />
| image = GitHub 1303511667338.png<br />
| description = A screen shot of the GitHub home page taken on 2015-11-08<br />
| URL = {{url|1=https://github.com/|2=GitHub}}<br />
| project_status = {{online}}<br />
| archiving_status = {{nosavedyet}}<br />
| irc = getgit<br />
}}<br />
<br />
:''See also [[GitHub Downloads]]''<br />
<br />
'''GitHub''' is a software repository hosting service powered by Git. It does not seem to have any site issues and is rarely down (see [http://status.github.com/ site status]). The outlook is sunny at the moment, but if disaster strikes, archiving the private repositories would be a problem.<br />
<br />
== Size ==<br />
As of 12 August 2012: 1,963,652 people were hosting over 3,460,582 repositories. [https://github.com/search?type=Repositories&q=fork%3Atrue 1,117,147 public repositories] are forks, which greatly reduces the amount of data required to archive it.<br />
<br />
As of 22 November 2015: There are 32,000,000 repositories, with a similar fork ratio. Back-of-the-envelope calculations suggest 120TB of data in git repositories.<br />
<br />
As of June 2018, there are 79.6 million public repositories in 137 million repository IDs, indicating that around 42 % of all repositories ever created are private or have been deleted.<br />
<br />
== Acquisition by Microsoft ==<br />
It was [https://www.bloomberg.com/news/articles/2018-06-03/microsoft-is-said-to-have-agreed-to-acquire-coding-site-github reported by Bloomberg] and [https://news.microsoft.com/2018/06/04/microsoft-to-acquire-github-for-7-5-billion/ confirmed on June 4, 2018] that Microsoft would acquire GitHub for 7.5 billion dollars. On 26 October 2018, the new GitHub CEO, Nat Friedman, [https://blog.github.com/2018-10-26-github-and-microsoft/ announced] that the acquisition was complete.<br />
<br />
A discussion into the feasibility of archiving GitHub has commenced in {{IRC|getgit}}.<br />
* Users in the FOSS community fear a repeat of Microsoft's "embrace, extend, extinguish" schemes of the 1990s and 2000s, and many called for a move to rival [[GitLab]] in the wake of the news.<br />
* [[LinkedIn]] shows how user content can be gradually taken away (by means of paywalls and login walls).<br />
<br />
== Backup tools ==<br />
=== git itself ===<br />
<tt>git clone</tt> is the simplest one (and also works outside of GitHub, obviously). However, it does not get some project data that is not stored in git, including issue reports, comments, pull requests.<br />
<br />
When cloning a repository for archival, it is best to use the <tt>--mirror</tt> option. This mirror will include all branches and even the code associated with pull requests. (Note however that the PR code will get purged eventually by Git's GC when you create a clone from this mirror as the PR commits aren't referenced by any branches, though this can be solved by adding a line like <tt>fetch = +refs/pull/*/head:refs/remotes/origin/pr/*</tt> to the repository config file.)<br />
<br />
To pack a clone/mirror into a single, easily handleable file, use <tt>git bundle create FILE --all</tt> inside the clone/mirror.<br />
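<br />
The mirror-then-bundle steps above can be sketched as a small script. This is only an illustration: the function names, directory layout and bundle file name are assumptions, and only the <tt>git clone --mirror</tt> and <tt>git bundle create FILE --all</tt> invocations come from the text.<br />
<br />
```python
import os
import subprocess
from pathlib import Path
from typing import List


def mirror_commands(clone_url: str, workdir: Path) -> List[List[str]]:
    """Build the git command lines for mirroring and bundling one repository.

    The directory and bundle names are derived from the clone URL; this
    naming scheme is an assumption made for the example.
    """
    # Make the path absolute so it stays valid when git changes directory.
    workdir = Path(os.path.abspath(workdir))
    name = clone_url.rstrip("/").rsplit("/", 1)[-1]
    if not name.endswith(".git"):
        name += ".git"
    mirror_dir = workdir / name
    bundle_file = workdir / (name[: -len(".git")] + ".bundle")
    return [
        # Mirror clone: all branches plus the refs for pull requests.
        ["git", "clone", "--mirror", clone_url, str(mirror_dir)],
        # Pack every ref of the mirror into one file.
        ["git", "-C", str(mirror_dir), "bundle", "create", str(bundle_file), "--all"],
    ]


def archive_repository(clone_url: str, workdir: Path) -> None:
    """Run the commands in order; raises CalledProcessError if git fails."""
    for command in mirror_commands(clone_url, workdir):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    for cmd in mirror_commands("https://github.com/owner/repository.git", Path("archive")):
        print(" ".join(cmd))
```
<br />
The same function should also work for a wiki clone URL of the form <tt>https://github.com/owner/repository.wiki.git</tt>, since the wiki is an ordinary git repository.<br />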
<br />
=== Other tools ===<br />
<br />
[https://github-backup.branchable.com/ github-backup] runs in a git repository and chases down that information, committing it to a "github" branch. It also chases down the forks and efficiently downloads them as well.<br />
<br />
[http://www.githubarchive.org/ githubarchive.org] and [http://ghtorrent.org/ GHTorrent] are both creating archives of the GitHub "timeline", that is, all events like git pushes, forks, created issues, pull requests, etc.<br />
<br />
[http://codearchive.org codearchive.org] is an effort to back up all the versions of all the repos on GitHub and other sources. [https://speakerdeck.com/filosottile/the-code-archive-hope-xi Slides from a talk about it].<br />
<br />
[https://github.com/josegonzalez/python-github-backup python-github-backup] can back up entire users or organisations and retrieves issues, PRs, labels, milestones, hooks, wikis, gists, and LFS data. It can also grab starred repositories and forks.<br />
<br />
See also [[Software Heritage]].<br />
<br />
== GitHub replacement engines ==<br />
<br />
If we ever have to archive the data out of GitHub, the data will need to be exportable to a GitHub-style engine.<br />
<br />
Currently<sup>[when?]</sup>, the best GitHub-style engine that has a Wiki, issues, Git Repo hosting, and is free and open source to use is [http://gitlab.com GitLab]. The engine is used by and paid for by many major organizations, so it is likely to live on in a stable way. Other popular FOSS alternatives to GitHub include [https://gitea.io/en-US/ Gitea] and [https://gogs.io/ Gogs].<br />
<br />
We will need a complete migration system to move a git repository and all related GitHub service information of a repository to GitLab.<br />
<br />
== Things to scrape ==<br />
<br />
In case of emergency, these are the items we need to grab.<br />
<br />
* Git Repository - Accomplished by github-backup<br />
** Forked Repositories - Accomplished by github-backup<br />
** '''Notes on Commits/Lines of Code''' - Not supported by github-backup yet. GitHub API support exists since ca. 2011.<br />
* '''GitHub Gollum Wiki''' - No tool yet, but just clone the whole thing, and then push it to GitLab.<br />
** The wiki is a full-blown git repository, though only few features are exposed on the user interfaces (e.g. no branches). The clone URL is shown on wiki pages and is <tt>https://github.com/owner/repository.wiki.git</tt>.<br />
* '''Releases''' - Tags on GitHub can have binaries attached. These are of high priority to archive.<br />
* Issues + Comments - Accomplished by github-backup<br />
** '''Milestones''' - ''github-backup currently does not archive this yet.''<br />
** '''Labels''' - ''github-backup currently does not archive this yet.''<br />
* '''Hooks''' - Needs some kind of tool to archive GitHub Hooks<br />
<br />
== Lists of repositories ==<br />
<br />
A list of repositories built from GitHub API data is maintained by an Archive Team member at [https://za3k.com/github/ za3k.com]. The scraper runs continuously, and the public downloads are updated once a day. This list does not include gists.<br />
<br />
The Internet Archive item {{IA item|github_repository_index_201806}} contains another crawl of the API from June 2018.<br />
<br />
== GithubArchive ==<br />
<br />
The metadata generated by the GitHub API are archived to Google BigQuery every hour by [https://www.githubarchive.org/ GithubArchive]. <br />
<br />
It obviously doesn't grab events '''dating before 2011''', so a targeted repository scrape may still be ideal.<br />
<br />
But at least it could be possible to grab all info about a single repository using Google BigQuery's free version, since it would use a low amount of CPU power. However, we need to create such an export script for it when the time comes.<br />
<br />
== ArchiveTeam archival efforts ==<br />
In June 2018, a discovery warrior project was started based on the current list of repositories. The goal was to obtain the number of watchers, stars, forks, and the origin repository (for forks) for each repository – all information which is not returned by the [https://developer.github.com/v3/repos/#list-all-public-repositories repositories API endpoint] which was used to collect the list – so that a prioritisation of content according to those numbers would be possible. The origin repository is needed for storing forks efficiently: since the original repository and all its forks are usually mostly identical, this can be stored in a single repository instead of one clone per fork, thus storing the shared revisions only once.<br />
<br />
== External links ==<br />
* {{url|1=https://github.com/|2=GitHub}}<br />
* {{url|1=https://archive.softwareheritage.org/|2=Software Heritage Archive}}<br />
<br />
{{Navigation box}}</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=User:HadeanEon&diff=32181User:HadeanEon2018-11-10T19:33:00Z<p>PurpleSymphony: Note about bot edits</p>
<hr />
<div>Thanks.<br />
<br />
* [[ArchiveGLAM]]<br />
* [[Films and documentaries about archiving]]<br />
* [[Knowledge preservation initiatives]]<br />
* [[List of current heads of state and government]]<br />
* [[Live cams]]<br />
* [[People]]<br />
* [[Wikidata lists]]<br />
<br />
''This account is used for automated bot edits. They are signed with “BOT” in the edit summary. Contact VoynichCr on EFnet in case of malfunction.''<br />
<br />
{{deathwatch}}</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=Yahoo!_Groups&diff=31567Yahoo! Groups2018-10-28T16:08:29Z<p>PurpleSymphony: /* Statistics */ Update stats</p>
<hr />
<div>{{Infobox project<br />
| title = Yahoo! Groups<br />
| url = http://groups.yahoo.com/<br />
| image = groups-yahoo-com.png<br />
| logo = yahoo-groups-logo.png<br />
| project_status = {{online}}<br />
| archiving_status = {{inprogress}}<br />
}}<br />
<br />
'''Yahoo! Groups''' is Yahoo's mailing list service; it's the result of the acquisition of eGroups and some other Yahoo! stuff.<br />
<br />
It's been stable for a long time (since the late 90s), long enough for some specialised software to be developed to do backups of it. (Not many other websites can say ''that''.)<br />
<br />
== Python Yahoo! Group Archiver == <br />
<br />
[https://github.com/csaftoiu/yahoo-groups-backup yahoo-groups-backup] is a Python script for scraping a group. So far only messages are scraped. It puts all the info and metadata (both rendered message body and raw email) into a Mongo database, and provides a script to dump a static version of the site that can be read off of the filesystem. It works with Neo and with private groups by clunkily using Selenium to do the scraping.<br />
<br />
== Yahoo Group Archiver ==<br />
<br />
Update: Apparently since Yahoo! Groups changed to the neo interface the script no longer functions and is no longer actively maintained.<br />
<br />
<s>The [http://sourceforge.net/projects/grabyahoogroup/ Yahoo Group Archiver] is a Perl script which allows an export of "the messages (without the attachments), everything from the files section and all the images from the photo section along with their hierarchy on Yahoo". <br />
<br />
It appears that, if you get the "Couldn't get message count" error when trying to use it, the solution is to edit the yahoo2maildir.pl file and replace the bottom line <code>my $url = $HTTP::URI_CLASS->new($redirect, $base)->abs($base);</code> (under the heading <code>sub GetJSRedirect</code>) with <code><nowiki>my $url = "http://groups.yahoo.com/group/$group/messages/$begin_msgid"; </nowiki></code><br />
<br />
More frustratingly, it appears that Yahoo blocks your IP temporarily after hitting some invisible limit of data downloaded (the Archiver will continue to "download" messages for a bit, ending up with a bunch of 0-byte files, then stop completely). It's unknown if there is a solution. <br />
<br />
Also: sometimes, some of the downloaded messages, in the middle of an otherwise normal batch, are 0 in size - almost as if Yahoo blocked your IP for a few seconds, then stopped. Watch out for these so that you can re-download them later.</s><br />
<br />
== Site Structure ==<br />
<br />
There’s a convenient JSON API. Using all endpoints may require logging in and joining a group:<br />
<br />
* Group Information: https://groups.yahoo.com/api/v1/groups/concatenative/<br />
* List of Messages: https://groups.yahoo.com/api/v1/groups/concatenative/messages?count=100<br />
* Specific Message: https://groups.yahoo.com/api/v1/groups/concatenative/messages/1/<br />
* Raw Message Content: https://groups.yahoo.com/api/v1/groups/concatenative/messages/1/raw – note that there seems to be a [https://yahoo.uservoice.com/forums/209451-us-groups/suggestions/9644478-displaying-raw-messages-is-not-8-bit-clean message encoding problem]<br />
* List of Topics: https://groups.yahoo.com/api/v1/groups/concatenative/topics?count=100<br />
* Specific Topic: https://groups.yahoo.com/api/v1/groups/concatenative/topics/1<br />
* List of Tables: https://groups.yahoo.com/api/v1/groups/a_furrys_world/database<br />
* Specific Table: https://groups.yahoo.com/api/v1/groups/a_furrys_world/database/1/<br />
* Table Content: https://groups.yahoo.com/api/v1/groups/a_furrys_world/database/1/records<br />
* List of Files: https://groups.yahoo.com/api/v1/groups/a_furrys_world/files<br />
* List of Attachments: https://groups.yahoo.com/api/v1/groups/a_furrys_world/attachments<br />
* List of Polls: https://groups.yahoo.com/api/v1/groups/a_furrys_world/polls?count=100<br />
* Specific Poll: https://groups.yahoo.com/api/v1/groups/a_furrys_world/polls/3549106<br />
* List of Photos: https://groups.yahoo.com/api/v1/groups/a_furrys_world/photos<br />
* List of Albums: https://groups.yahoo.com/api/v1/groups/a_furrys_world/albums<br />
* Specific Album: https://groups.yahoo.com/api/v1/groups/a_furrys_world/albums/1841906391<br />
* List Moderators: https://groups.yahoo.com/api/v1/groups/a_furrys_world/members/moderators<br />
* Members With Incorrect Emails: https://groups.yahoo.com/api/v1/groups/a_furrys_world/members/bouncing<br />
* List of Links: https://groups.yahoo.com/api/v1/groups/a_furrys_world/links<br />
* Search: https://groups.yahoo.com/api/v1/search/groups?offset=0&maxHits=20&sortBy=&query=abcdef – sort can be one of OLDEST, RELEVANCE, MEMBERS, LATEST_ACTIVITY, NEWEST<br />
* Categories: https://groups.yahoo.com/api/v1/dir/categories/0/?start=0<br />
<br />
Note that all paginated responses are limited to the first 500 results and do not return anything new beyond that.<br />
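<br />
The endpoint patterns listed above can be assembled programmatically. The sketch below only builds URLs and makes no requests; the helper names are made up for this example, and the offset check is just one reading of the 500-result note above, not documented API behaviour.<br />
<br />
```python
from urllib.parse import quote, urlencode

API_ROOT = "https://groups.yahoo.com/api/v1"
# Sort orders named in the Search entry above.
SORT_ORDERS = {"OLDEST", "RELEVANCE", "MEMBERS", "LATEST_ACTIVITY", "NEWEST"}
# Paginated responses reportedly return nothing new past this many results.
MAX_PAGINATED_RESULTS = 500


def group_url(group: str, *path, **query) -> str:
    """Build a per-group endpoint URL, e.g. group_url("name", "messages", count=100)."""
    parts = [API_ROOT, "groups", quote(group, safe="")]
    parts.extend(quote(str(segment), safe="") for segment in path)
    url = "/".join(parts)
    if query:
        url += "?" + urlencode(query)
    return url


def search_url(query: str, offset: int = 0, max_hits: int = 20, sort_by: str = "") -> str:
    """Build a group search URL using the parameters shown in the list above."""
    if sort_by and sort_by not in SORT_ORDERS:
        raise ValueError("unknown sort order: %r" % sort_by)
    if offset >= MAX_PAGINATED_RESULTS:
        # Assumption: offsets past the cap would not yield new results.
        raise ValueError("offset is beyond the 500-result limit")
    params = {"offset": offset, "maxHits": max_hits, "sortBy": sort_by, "query": query}
    return API_ROOT + "/search/groups?" + urlencode(params)


if __name__ == "__main__":
    print(group_url("concatenative", "messages", count=100))
    print(group_url("concatenative", "messages", 1, "raw"))
    print(search_url("abcdef"))
```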
<br />
== Statistics ==<br />
<br />
As of 2017-07-16 the [https://groups.yahoo.com/neo/dir directory] lists 5599562 groups. 2752112 of them have been discovered. 1483853 (54%) have public message archives with an estimated number of 2.1 billion messages (1389 messages per group on average so far). 1.8 billion messages (86%) have been archived as of 2018-10-28.<br />
<br />
The following graphs are slightly outdated:<br />
<br />
[[File:Yahoo_groups_date_created.png]]<br />
[[File:Yahoo_groups_messages_per_group.png]]<br />
[[File:Yahoo_groups_post_date.png]]<br />
<br />
== Software for backups ==<br />
* [http://sourceforge.net/projects/grabyahoogroup/ Yahoo Group Archiver], Sourceforge<br />
<br />
== External Links ==<br />
<br />
* https://archive.org/details/yahoo_groups<br />
<br />
== References ==<br />
<br />
<references/><br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Yahoo!]]<br />
[[Category:Mailing lists]]</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=Mozilla_Addons&diff=30911Mozilla Addons2018-08-22T16:55:20Z<p>PurpleSymphony: Add screenshot</p>
<hr />
<div>{{Infobox project<br />
| title = Mozilla Addons<br />
| URL = https://addons.mozilla.org/<br />
| project_status = {{specialcase}}<br />
| archiving_status = {{notsaved}}<br />
| irc = outofammo<br />
| image = Amo_screenshot_2018-08-22.png<br />
}}<br />
<br />
'''Mozilla Addons''', also known as '''AMO''' (from its domain, addons.mozilla.org), is a website run by the Mozilla Foundation which hosts extensions and themes for Firefox, Thunderbird, and other Mozilla software.<br />
<br />
Extensions used to be based on XPI until the introduction of WebExtensions around 2016. Since Firefox 57 and Thunderbird 58, only WebExtensions are supported. XPI-based addons (called "legacy") are deprecated but still supported until the end-of-life of Firefox 52 ESR in September 2018. The legacy addons will be removed from AMO in early October 2018<ref>https://blog.mozilla.org/addons/2017/10/03/legacy-add-on-support-on-firefox-esr/#comment-224382</ref><ref>https://blog.mozilla.org/addons/2018/08/21/timeline-for-disabling-legacy-firefox-add-ons/</ref>.<br />
<br />
== References ==<br />
<references/></div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=File:Amo_screenshot_2018-08-22.png&diff=30910File:Amo screenshot 2018-08-22.png2018-08-22T16:53:40Z<p>PurpleSymphony: addons.mozilla.org</p>
<hr />
<div>addons.mozilla.org</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=The_WARC_Ecosystem&diff=29871The WARC Ecosystem2017-11-13T15:51:05Z<p>PurpleSymphony: /* Tools */</p>
<hr />
<div>Everything about the WARC format and the tools that support it.<br />
<br />
== Information ==<br />
* [[wikipedia:Web_ARChive]]<br />
* {{url|1=https://webarchive.jira.com/wiki/pages/viewpage.action?pageId=4817}} - Contains examples of WARC records<br />
* {{url|1=http://bibnum.bnf.fr/WARC|2=The WARC File Format (ISO 28500) - Information, Maintenance, Drafts}}<br />
* {{url|http://archive-access.sourceforge.net/warc/}} - WARC ISO docs<br />
* {{url|http://www.digitalpreservation.gov/formats/fdd/fdd000236.shtml}}<br />
* {{url|1=http://www.netpreserve.org/resources/warc-implementation-guidelines-v1}} <br />
* {{url|1=http://www.netpreserve.org/sites/default/files/resources/WARC_Guidelines_v1.pdf}}<br />
* {{url|http://commoncrawl.org/navigating-the-warc-file-format/}}<br />
* {{url|https://www.taricorp.net/2016/web-history-warc}}<br />
<br />
== Tools ==<br />
<br />
{|class="wikitable"<br />
! name<br />
! license<br />
! lang<br />
! testing<br />
! docs<br />
! # of authors<br />
! description<br />
|-<br />
| [https://www.gnu.org/software/wget/ wget v1.14+]<br />
| GPL v3+ || C<br />
| Has a test suite but does not test any warc functionality<br />
| Man pages, website, blog posts all over the net<br />
| 2+ according to the changelog<br />
| A non-interactive network downloader. wget also generates duplicate record ids in warc files. <br />
More information about flags can be found on the [[Wget with WARC output]] page.<br />
|-<br />
| InternetArchive's [https://github.com/internetarchive/warc warc python library]<br />
| GPL v2 || Python<br />
| looks to have a test suite - https://github.com/internetarchive/warc/blob/master/warc/tests/test_warc.py<br />
| A readme with examples online at http://warc.readthedocs.org/en/latest/<br />
| 3 committers on GitHub<br />
| library to work with WARC files<br />
|-<br />
| [https://github.com/odie5533/WarcMiddleware WarcMiddleware]<br />
| ISC || Python<br />
| Not enough tests<br />
| A readme file + [http://scrapy.org/ Scrapy docs]<br />
| 1 author<br />
| Mirrors websites and saves the results to a WARC file<br />
|-<br />
| [https://github.com/odie5533/WarcProxy WarcProxy]<br />
| ISC || Python<br />
| NO TEST SUITE<br />
| A readme file<br />
| 1 author<br />
| a simple HTTP proxy that saves all HTTP traffic to a file<br />
|-<br />
| [https://github.com/odie5533/WarcMITMProxy WarcMITMProxy] <br />
| ISC<br />
| Python<br />
| NO TEST SUITE<br />
| A readme file<br />
| 1 author<br />
| HTTPS proxy that saves traffic to a WARC file<br />
|-<br />
| [https://github.com/internetarchive/warctools/ warc-tools] <br />
| MIT License<br />
| Python 2.6<br />
| NO TEST SUITE<br />
| A readme file<br />
| 4 committers<br />
| warc validator, dump, search, index, convert arc to warc<br />
<br />
The previous versions can be found at https://code.google.com/p/warc-tools/ and http://code.hanzoarchives.com/warc-tools .<br />
<br />
old: http://code.hanzoarchives.com/warc-tools/src/6e1d36297688/hanzo/warcextract.py<br /><br />
new (untested): http://code.hanzoarchives.com/warc-tools/src/fd3b49a7ee22fe4eee0d51dc841af40d4b9d2e1e/warcunpack_ia.py?at=default<br />
|-<br />
| [https://github.com/alard/warc-proxy WARC viewer] <br />
| no license information<br />
| Python<br />
| NO TEST SUITE<br />
| A readme file<br />
| 1 author<br />
| WARC viewer for browsing the contents of a WARC file.<br />
|-<br />
| [https://github.com/alard/megawarc Megawarc] <br />
| no license information<br />
| Python<br />
| NO TEST SUITE<br />
| A readme file<br />
| 1 author<br />
| Merge many small warcs into a large one<br />
<br />
Checks if WARC files can be un-gzipped before adding them to the megawarc. Does not check anything else.<br />
|-<br />
| [https://github.com/alard/warctozip-service warc to zip] <br />
| no license information<br />
| python<br />
| NO TEST SUITE<br />
| A readme file<br />
| 1 author<br />
| An HTTP-based warc-to-zip converter<br />
|-<br />
| [https://github.com/chfoo/warcat warcat] <br />
| GPL v3<br />
| Python 3<br />
| yes<br />
| A readme file.<br />
| 1 author<br />
| warcat concat, extract, list, pass, split, verify warc files<br />
<br />
Install: pip-3 install warcat<br /><br />
Run: python3 -m warcat verify mysite.warc.gz<br />
<br />
https://github.com/internetarchive/ia-web-commons <br />
<br />
https://github.com/internetarchive/ia-hadoop-tools <br />
|-<br />
| [https://github.com/ArchiveTeam/archiveteam-megawarc-factory Archive Team megawarc factory] <br />
| no license information<br />
| Bash shell scripting<br />
| NO TEST SUITE<br />
| A readme file.<br />
| 1 author<br />
| Generates 50gb warc files from existing warc files<br />
<br />
Uploads to archive.org<br />
|-<br />
| [https://github.com/rajbot/CDX-Writer CDX Writer] <br />
| no license information<br />
| python<br />
| Has a test suite<br />
| A readme file.<br />
| 1 author<br />
| Create CDX index files from WARC files.<br />
|-<br />
| [https://webarchive.jira.com/wiki/display/Heritrix/Heritrix Heritrix] <br />
| Apache v2.0<br />
| java<br />
| Has a test suite<br />
| javadoc, website<br />
| many authors<br />
| Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.<br />
|-<br />
| [https://github.com/openplaces/heritrix-cassandra Heritrix-Cassandra] <br />
| ? || ? || ? || ? || ?<br />
| A library for writing Heritrix 3 output directly to Cassandra as records.<br />
|-<br />
| [http://landsbokasafn.github.io/DeDuplicator/ DeDuplicator (Heritrix add-on)]<br />
| GPL v2.1 <br />
| Java <br />
| Very few tests <br />
| [https://landsbokasafn.github.io/DeDuplicator/started.html Getting Started] page.<br />
| 1 author<br />
| The DeDuplicator is an add-on module (plug-in) for the web crawler Heritrix. It offers a means to reduce the amount of duplicate data collected in a series of snapshot crawls.<br />
|-<br />
| [https://github.com/gwu-libraries/python-heritrix python-heritrix] <br />
| ? || ? || ? || ? || ?<br />
| A simple wrapper around the Heritrix 3.x API. Developed in April 2012 against Heritrix 3.1.0 at GWU Libraries in Washington, DC, USA.<br />
|-<br />
| [http://warcreate.com/ Chrome/Chromium plugin WARCreate] <br />
| GPL v3<br />
| javascript<br />
| ???<br />
| none<br />
| 1 author<br />
| WARCreate is a Google Chrome extension that allows a user to create a Web ARChive (WARC) file from any browseable webpage. [https://github.com/machawk1/warcreate code repo]<br />
|-<br />
| [https://sbforge.org/display/JWAT/JWAT Java Web Archive Toolkit] <br />
| Apache 2.0<br />
| Java<br />
| Partial Test Suite (check coverage profile)<br />
| Online<br />
| 1 author<br />
| jwattools arc2warc, cdx, compress, decompress, extract, interval, pathindex, test, unpack<br />
<br />
[https://bitbucket.org/nclarkekb/jwat/overview code repo]<br />
|-<br />
| [http://matkelly.com/wail/ WAIL] <br />
| CC-BY-SA<br />
| Python, JS<br />
| ???<br />
| Online<br />
| 1<br />
| Web Archiving Integration Layer (WAIL) is a graphical user interface (GUI) atop multiple web archiving tools intended to be used as an easy way for anyone to preserve and replay web pages.<br />
Tools included and accessible through the GUI are Heritrix 3.1.2, Wayback 1.7, and warc-proxy. Support packages include Apache Tomcat, phantomjs and pyinstaller.<br />
<br />
[https://github.com/machawk1/wail code repo]<br />
|-<br />
| [https://github.com/odie5533/pylibwarc/ pylibwarc] <br />
| ISC License<br />
| Python<br />
| ?<br />
| ?<br />
| 1 author<br />
| CDX support<br />
Written by odie5533, who frequents #archiveteam, as another independent WARC library for Python.<br />
|-<br />
| [https://github.com/chfoo/wpull Wpull] <br />
| GPL version 3<br />
| Python 3<br />
| many unit tests (Travis CI registered), simple experimental fuzzer<br />
| a quick start readme, brief usage overview, good docstrings coverage<br />
| 1 core author<br />
| Wget-compatible web downloader.<br />
<br />
Beta quality. Lua/Python scripting. PhantomJS (experimental). Used by [[ArchiveBot]].<br />
|-<br />
| [https://ludios.org/grab-site/ grab-site] <br />
| MIT<br />
| Python 3<br />
| no<br />
| readme<br />
| 1 core author<br />
| wpull launcher with the dashboard and ignore patterns from ArchiveBot<br />
|-<br />
| [https://github.com/ikreymer/pywb pywb]<br />
| GPL version 3<br />
| Python 2<br />
| yes<br />
| readme and wiki<br />
| 1 core author<br />
| A full-fledged Python reimplementation of Wayback Machine web archive replay capabilities. Also provides a live rewriting proxy.<br />
|-<br />
| [https://github.com/ikreymer/pywb-webrecorder pywb-webrecorder]<br />
| MIT<br />
| Python 2<br />
| no<br />
| readme<br />
| 1 core author<br />
| An experimental/demo integration of pywb + warcprox to allow live recording to WARC. Allows instant replay of recorded content from WARC.<br />
|-<br />
| [https://github.com/ikreymer/webarchiveplayer webarchiveplayer]<br />
| GPL version 3<br />
| Python 2<br />
| not yet, though most testable functionality in pywb<br />
| readme<br />
| 1 core author<br />
| Point-and-click wrapper for Windows and OS X for browsing WARC files. Shows a basic file open dialog to select a WARC(s), then starts a server and opens a browser. Also determines HTML pages within a WARC. Built on top of pywb. In beta at the moment (early 2015).<br />
|-<br />
| [https://github.com/lintool/warcbase warcbase]<br />
| Apache License, Version 2.0<br />
| Scala<br />
| ?<br />
| [http://lintool.github.io/warcbase-docs/ yes]<br />
| team of more than 4 researchers at the University of Waterloo<br />
| platform for managing web archives built on Hadoop and HBase.<br />
|- <br />
| [https://github.com/helgeho/ArchiveSpark ArchiveSpark]<br />
| MIT License<br />
| Scala<br />
| ?<br />
| ?<br />
| 2 authors<br />
| Apache Spark framework that facilitates access to Web Archives<br />
|-<br />
| [https://github.com/webrecorder/webrecorderplayer-electron/ Webrecorder Player]<br />
| ?<br />
| JavaScript<br />
| ?<br />
| ?<br />
| ?<br />
| Desktop app for viewing high-fidelity web archives (WARC, HAR and ARC) on a local machine, no internet connection required. Particularly useful for social media, dynamic content. Supports OSX, Windows and Linux (experimental). Related to https://webrecorder.io/<br />
|-<br />
| [https://github.com/webrecorder/warcio warcio]<br />
| Apache 2.0<br />
| Python 2.7+/3.3+<br />
| yes<br />
| README<br />
| 7 contributors<br />
| WARC writer library<br />
|-<br />
! name<br />
! license<br />
! lang<br />
! testing<br />
! docs<br />
! # of authors<br />
! description<br />
|}<br />
<br />
== Deprecated ==<br />
* https://code.google.com/p/warc-tools/ - Old, discontinued shit<br />
* https://github.com/internetarchive/archive-commons - split into 2 new repos: ia-web-commons & ia-hadoop-tools<br />
<br />
== The WARC format ==<br />
<br />
A .warc file is usually a group of one or more WARC records. The first record usually describes the records to follow.<br />
<br />
Compression is optional. If used, each record is compressed separately with gzip and the results are concatenated; this works because a gzip file may contain multiple "members". Compressed WARCs end in .warc.gz. According to the guidelines, WARC files should top out at 1 GB.<br />
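<br />
To illustrate the multi-member gzip layout, here is a minimal sketch using only the Python standard library. The record contents are placeholder bytes rather than valid WARC records, and the function names are invented for this example.<br />
<br />
```python
import gzip
import zlib
from typing import Iterable, Iterator


def write_compressed_records(records: Iterable[bytes]) -> bytes:
    """Compress each record as its own gzip member and concatenate the members."""
    return b"".join(gzip.compress(record) for record in records)


def iter_members(blob: bytes) -> Iterator[bytes]:
    """Decompress a multi-member gzip blob one member at a time."""
    while blob:
        # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
        decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
        data = decompressor.decompress(blob) + decompressor.flush()
        yield data
        # Whatever follows the end of this member is the next member.
        blob = decompressor.unused_data


if __name__ == "__main__":
    # Placeholder payloads standing in for real WARC records.
    records = [b"first record\r\n\r\n", b"second record\r\n\r\n"]
    blob = write_compressed_records(records)
    # Reading the whole file yields the records back to back ...
    assert gzip.decompress(blob) == b"".join(records)
    # ... while member-wise reading recovers each record on its own.
    assert list(iter_members(blob)) == records
```
<br />
Because every record is its own member, a reader that knows a record's byte offset can seek there and decompress just that record.<br />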
<br />
=== WARC record ===<br />
* header <br />
* content block <br />
* two newlines<br />
<br />
=== WARC record header ===<br />
The beginning of a WARC record, consisting of one first line declaring the record to be in the WARC format with a given version number, followed by lines of named fields up to a blank line. The WARC record header format largely follows the tradition of HTTP/1.1 [RFC2616] and [RFC2822] headers, with one major exception, allowing UTF-8 [RFC3629].<br />
<br />
<br />
Example of a 'request' record header:<br />
WARC/1.0<br />
WARC-Type: request<br />
WARC-Target-URI: http://xbox.gamespy.com/<br />
Content-Type: application/http;msgtype=request<br />
WARC-Date: 2013-04-02T16:12:40Z<br />
WARC-Record-ID: <urn:uuid:08d9edb9-0ab8-4352-ba56-6cbbd590f34f><br />
WARC-IP-Address: 213.248.112.146<br />
WARC-Warcinfo-ID: <urn:uuid:2b6ad3f1-efab-4e37-8faa-fc8ad112692f><br />
WARC-Block-Digest: sha1:T6PJSZTTP7HBNA6OFZACXAFK25GGLVT4<br />
Content-Length: 150<br />
<br />
=== WARC named fields ===<br />
* A set of elements consisting of a name, a colon, and a value, with long values continued on indented lines.<br />
* Named fields may appear in any order. <br />
* Field values may contain any UTF-8 character.<br />
* The 'encoded-word' mechanism of [RFC2047] may also be used when writing WARC fields and shall also be understood by WARC reading software.<br />
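<br />
The header rules above (a version line, then named fields up to a blank line, with long values continued on indented lines) can be sketched as a small parser. This is an illustration rather than a validating reader: the function name is an assumption, and the RFC2047 'encoded-word' mechanism is deliberately not handled.<br />
<br />
```python
from typing import List, Tuple


def parse_warc_header(raw: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split a WARC record header into its version line and named fields.

    Fields are returned as a list of (name, value) pairs because some
    fields, such as WARC-Concurrent-To, may be repeated.
    """
    lines = raw.replace("\r\n", "\n").split("\n")
    version = lines[0].strip()
    if not version.startswith("WARC/"):
        raise ValueError("not a WARC record header")
    fields: List[Tuple[str, str]] = []
    for line in lines[1:]:
        if line == "":
            break  # a blank line ends the header
        if line[0] in " \t":
            # An indented line continues the previous field's value.
            if not fields:
                raise ValueError("continuation line without a field")
            name, value = fields[-1]
            fields[-1] = (name, value + " " + line.strip())
            continue
        name, separator, value = line.partition(":")
        if not separator:
            raise ValueError("malformed field line: %r" % line)
        fields.append((name.strip(), value.strip()))
    return version, fields


if __name__ == "__main__":
    # A few lines from the 'request' example above.
    example = "\r\n".join([
        "WARC/1.0",
        "WARC-Type: request",
        "WARC-Target-URI: http://xbox.gamespy.com/",
        "WARC-Date: 2013-04-02T16:12:40Z",
        "WARC-Record-ID: <urn:uuid:08d9edb9-0ab8-4352-ba56-6cbbd590f34f>",
        "Content-Length: 150",
        "",
        "",
    ])
    version, fields = parse_warc_header(example)
    print(version)
    for name, value in fields:
        print(name, "=", value)
```
<br />
Splitting on the first colon only matters here, since values such as WARC-Date and WARC-Record-ID contain colons themselves.<br />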
<br />
==== Defined field names ====<br />
; WARC-Type : ''required'', can be one of 'warcinfo', 'response', 'resource', 'request', 'metadata', 'revisit', 'conversion', or 'continuation'<br />
; WARC-Record-ID : ''required'', unique ID, as a URI<br />
; WARC-Date : ''required''<br />
; Content-Length : ''required''<br />
; Content-Type : mime type<br />
; WARC-Concurrent-To : ''repeatable'', WARC-Record-IDs associated with this one <br />
; WARC-Block-Digest : ''optional'', hash of the record's full content block<br />
; WARC-Payload-Digest : ''optional'', hash of just the payload<br />
; WARC-IP-Address : IP address of the server the content was retrieved from<br />
; WARC-Refers-To : previous WARC-Record-ID this relates to<br />
; WARC-Target-URI : the URL asked for<br />
; WARC-Truncated : why only part of the content was retrieved<br />
; WARC-Warcinfo-ID : WARC-Record-ID of the associated high-level metadata record<br />
; WARC-Filename : ''warcinfo only'', the expected name of the file containing this record<br />
; WARC-Profile : ''revisit only'', the way revisiting was handled, as a URI<br />
; WARC-Identified-Payload-Type : an independently verified mime type of the payload (i.e. not just what it claims to be)<br />
; WARC-Segment-Origin-ID : ''continuation only''<br />
; WARC-Segment-Number : position of this record in a sequence of segmented records<br />
; WARC-Segment-Total-Length : ''continuation only''<br />
<br />
=== WARC content block ===<br />
Part (zero or more octets) of a WARC record that follows the header and that forms the main body of a WARC record.<br />
<br />
== CDX File Format ==<br />
<br />
* http://archive.org/web/researcher/cdx_legend.php<br />
* https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server -- How to query IA's CDX server<br />
<br />
Example of generating a list of URLs in a MegaWARC:<br />
curl -sL 'https://archive.org/download/archiveteam_zapd_20131016071259/zapd_20131016071259.megawarc.warc.os.cdx.gz' \<br />
| gunzip -c | cut -f3 -d' '<br />
<br />
Example of getting a list of all the URLs in the Wayback Machine with a given prefix:<br />
curl 'http://web.archive.org/cdx/search/cdx?fl=statuscode,timestamp,original&collapse=urlkey&matchType=prefix&url=http://www.conchord.org'<br />
<br />
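The prefix query above can also be assembled programmatically. This is only a sketch: the function names are made up, and it assumes the server returns one space-separated line per capture with the fields in the order requested via <code>fl</code>.<br />
<br />
```python
from urllib.parse import urlencode

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"
DEFAULT_FIELDS = ("statuscode", "timestamp", "original")


def cdx_prefix_query(url_prefix, fields=DEFAULT_FIELDS):
    """Build the same prefix query as the curl example above."""
    params = {
        "fl": ",".join(fields),
        "collapse": "urlkey",
        "matchType": "prefix",
        "url": url_prefix,
    }
    # Keep ",", ":" and "/" unescaped so the result matches the hand-written URL.
    return CDX_ENDPOINT + "?" + urlencode(params, safe=",:/")


def parse_cdx_line(line, fields=DEFAULT_FIELDS):
    """Split one result line into a dict (assumes space-separated output)."""
    return dict(zip(fields, line.split(" ")))


print(cdx_prefix_query("http://www.conchord.org"))
```
<br />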
[[Category:Tools]]<br />
<br />
{{Navigation box}}</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=Miiverse&diff=29679Miiverse2017-08-30T07:07:41Z<p>PurpleSymphony: Add IRC</p>
<hr />
<div>{{Infobox project<br />
| title = Miiverse<br />
| logo = miiverselogo.png<br />
| image = miiversehomepage.png<br />
| URL = http://miiverse.nintendo.net<br />
| project_status = {{endangered}}<br />
| archiving_status = {{nosavedyet}}<br />
| irc = miiworse<br />
}}<br />
'''Miiverse''' is Nintendo's social network for the Wii U and 3DS. It has a website, as well as clients on Wii U and 3DS which are restricted web browsers.<br />
<br />
== Vital Signs ==<br />
<br />
<s>Seems to be stable.</s> Miiverse will not be available on the Nintendo Switch and may shut down soon after its release.<br />
<br />
{{url|https://twitter.com/ShinyQuagsire/status/887123601056350209|A recent system update to the Wii U contained a message saying the service has ended}}<br />
<br />
{{url|https://www.nintendo.co.uk/News/2017/August/Important-information-about-the-discontinuation-of-the-Miiverse-service-1261237.html|Nintendo officially confirmed that the service would end on "Wednesday, 08/11/2017 at 06:00 GMT" in an article released on their UK site August 29th, 2017}}, as well as {{url|http://en-americas-support.nintendo.com/app/answers/detail/a_id/27324|their US}} and {{url|https://www.nintendo.co.jp/support/information/2017/0829_miiverse.html|Japanese sites}}. They provided an ability to request a download of your posts and albums, but it wouldn't include your comments or comments made on your posts. It was also apparently going to be delayed until after the shutdown, and "Some posts or screenshots may not be included in your downloadable post history due to their status at the time the service ends." [https://miiverse.nintendo.net/settings/download_request The link was not accessible to users who haven't logged in with a Nintendo Network ID.] [http://i.imgur.com/VNKTzou.png A screenshot taken by Miiverse user Wertercatt/Archive Team user Powerkitten is available]<br />
<br />
== Site structure ==<br />
*https://miiverse.nintendo.net/users/Wertercatt (Wertercatt's User Page (username is not case sensitive))<br />
*https://miiverse.nintendo.net/users/Wertercatt/posts?page_param=%7B%22per_page%22%3A%22250%22%7D (Wertercatt's Posts, autopagerized)<br />
*https://miiverse.nintendo.net/users/Wertercatt/empathies?page_param=%7B%22per_page%22%3A%2225%22%7D (Wertercatt's "Yeahs," or liked posts/comments, autopagerized)<br />
*https://miiverse.nintendo.net/users/Wertercatt/diary (Wertercatt's Play Journal Entries, autopagerized)<br />
*https://miiverse.nintendo.net/users/Wertercatt/album (Wertercatt's Screenshot Album. Most users' albums cannot be seen as they are "not a public album.")<br />
*https://miiverse.nintendo.net/users/Wertercatt/friends (Wertercatt's Friends. This is not autopagerized and will load all 100 or fewer friends a user has.)<br />
*https://miiverse.nintendo.net/users/Wertercatt/following (The users which Wertercatt is following. autopagerized)<br />
*https://miiverse.nintendo.net/users/Wertercatt/followers (Users that are following Wertercatt. autopagerized)<br />
*https://miiverse.nintendo.net/posts/AYMHAAACAAADVHkJ-Xncng (A Post.)<br />
*https://miiverse.nintendo.net/posts/AYMHAAACAAADVHkJ-Xncng/replies?page_param=%7B%22per_page%22%3A%22948%22%7D (Comments on a post. Pagerized and pretty broken looking.)<br />
*https://d3esbfg30x759i.cloudfront.net/ss/WVW69kgcmBQ2Z_kphX (A Screenshot)<br />
*https://d3esbfg30x759i.cloudfront.net/pap/WVW69isW3JMTTmz69D (A Drawing)<br />
*https://miiverse.nintendo.net/replies/AYMHAAADAAB2V0gBDDnwMQ (A comment on a post)<br />
*https://miiverse.nintendo.net/titles/14866558073092847604 (group of similar communities)<br />
*https://miiverse.nintendo.net/titles/14866558073092847604/14866558073092847609?page_param=%7B%22per_page%22%3A%221970%22%7D (community)<br />
*https://miiverse.nintendo.net/titles/14866558073092847604/14866558073092847609/old?page_param=%7B%22per_page%22%3A%221970%22%7D (community posts, pre-redesign, autopagerized)<br />
*https://miiverse.nintendo.net/titles/14866558073092847604/14866558073092847609/diary?page_param=%7B%22per_page%22%3A%222000%22%7D (community posts, play journal, autopagerized)<br />
*https://miiverse.nintendo.net/titles/14866558073092847604/14866558073092847609/artwork?page_param=%7B%22per_page%22%3A%222000%22%7D (community posts, drawings, autopagerized)<br />
*https://miiverse.nintendo.net/titles/14866558073092847604/14866558073092847609/topic?page_param=%7B%22per_page%22%3A%222000%22%7D (community posts, discussions, autopagerized)<br />
*https://miiverse.nintendo.net/titles/14866558073104632926/14866558073104632942/in_game?page_param=%7B%22per_page%22%3A%222000%22%7D (community posts, In-Game, autopagerized)<br />
*https://miiverse.nintendo.net/warawara/14866558073037299863 (posts that appear in warawara plaza)<br />
<br />
Useful URL strings:<br />
*<nowiki>?page_param={"per_page":"2000","order":"desc"}</nowiki> Control the number of posts returned. URLs unfortunately have limits to what you can return at once. See the example URLs here for the maximums. If you put in a per_page value over the maximum for a URL you won't be able to scroll farther to load another batch of posts, so putting the proper maximum in (instead of just a bunch of 9s) is a requirement to be effective.<br />
*<nowiki>?page_param={"per_page":"2000","order":"asc"}</nowiki> Send to get the results in ascending order instead. Scrolling this is buggy, as any additional pages of results you load will be in descending order. We probably shouldn't use this.<br />
*<nowiki>?offset=20</nowiki> Set an offset to keep going beyond the maximum post count. Seems to be limited to /empathies<br />
<br />
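The percent-encoded <code>page_param</code> values in the example URLs are just compact JSON. A small sketch of building them (the helper name is made up; the per-URL maximums still have to be taken from the examples above):<br />
<br />
```python
import json
from urllib.parse import quote


def with_page_param(base_url, per_page, order=None):
    """Append a percent-encoded page_param JSON object to base_url."""
    param = {"per_page": str(per_page)}  # the site uses string values, e.g. "250"
    if order is not None:
        param["order"] = order  # "desc" or "asc", as described above
    compact = json.dumps(param, separators=(",", ":"))
    return f"{base_url}?page_param={quote(compact, safe='')}"


print(with_page_param("https://miiverse.nintendo.net/users/Wertercatt/posts", 250))
```
<br />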
Misc Oddities:<br />
<br />
*https://miiverse.nintendo.net/posts/AYMHAAACAAADVHjJ7DPwSg (Posts can apparently be posted in no communities and only appear on the user's /posts page?)<br />
*https://miiverse.nintendo.net/users/x_The_Storm_x ("This user is hidden")<br />
<br />
{{Navigation box}}</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=Tumblr&diff=29668Tumblr2017-08-29T06:42:22Z<p>PurpleSymphony: Add IRC</p>
<hr />
<div>{{Infobox project<br />
| title = Tumblr<br />
| logo = Tumblr on white.png<br />
| image = Tumblr_staff_blog.png<br />
| URL = <nowiki>http://www.tumblr.com/</nowiki><br />
| project_status = {{online}}<br />
| archiving_status = {{nosavedyet}}<br />
| source = https://github.com/ArchiveTeam/tumblr-grab<br />
| irc = tumbledown<br />
}}<br />
<br />
[[Image:Yahoobuystumblr.gif]]<br />
<br />
'''Tumblr''' is a social networking microblog.<br />
<br />
[[Yahoo!]] has purchased Tumblr for 1.1 billion dollars. Tumblr allegedly [http://blogs.wsj.com/digits/2014/10/21/yahoo-tumblr-to-make-over-100-million-in-revenue-next-year/ doubled in number of blogs in 2014] and will become profitable in 2015.<br />
<br />
In December 2015, Yahoo put their Tumblr service into the "decide on" category in their Action Plan, according to their [http://www.wsj.com/public/resources/documents/yahoopresentation.pdf 2015 shareholder presentation].<br />
<br />
In June 2017, Tumblr tightened up "Safe mode", which restricts "sensitive content" and the viewing of blogs marked as explicit for all users below 18 years old, potentially causing a major move away from Tumblr due to Internet Backdraft from its users. Given Yahoo's tendency to axe things that become less popular than expected, it might be important to keep an eye out for it.<br />
<br />
== Quirks ==<br />
Users can change their account names into the format used for deleted accounts. Specifically, USERNAME-deactivated-[Any amount of digits, 0-9]. Users who do this are inaccessible via their main account page or via direct links to their posts. Their posts will still show up in searches, and their "archive" url will work. This doesn't seem to have an effect on the API, and tumblr-utils will still work just fine. For an example of this tomfoolery, see [http://diediedie3344-deactivated-204913.tumblr.com/archive the archive page of user "diediedie3344-deactivated-204913"].<br />
<br />
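A sketch of spotting names in the "deactivated" format described above. The pattern assumes at least one digit after the marker, and a match cannot tell a user who renamed themselves from an account that really was deleted.<br />
<br />
```python
import re

# USERNAME-deactivated-DIGITS, as described above (one or more digits assumed).
DEACTIVATED_RE = re.compile(r"^(?P<name>.+)-deactivated-(?P<digits>\d+)$")


def split_deactivated(blog_name):
    """Return (original_name, digits) if blog_name uses the format, else None."""
    match = DEACTIVATED_RE.match(blog_name)
    if match is None:
        return None
    return match.group("name"), match.group("digits")


print(split_deactivated("diediedie3344-deactivated-204913"))
```
<br />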
Another quirk is that tumblr accounts that appear to be on a different domain name are still accessible at, and show up in searches as, their account name. Trying to go to any page on the accountname.tumblr.com end redirects you to the same page on the custom-url-here.com page. For an example of this behavior, see [http://homosethsual.tumblr.com user "homosethsual"] which redirects to [http://ranpos.star.is/ ranpos.star.is]<br />
<br />
As of 30th of July 2017<s>,it is no longer possible to access NSFW accounts outside of http://tumblr.com/blog/<name> URLs. Attempting to access an NSFW account normally will now cause infinite redirecting.</s> NSFW marked Tumblrs are inaccessible to signed out users.<br />
<br />
== See also ==<br />
* [http://sourceforge.net/projects/gettumblrpics/ gettumblrpics], simple script to download images from a tumblr feed as they appear in it<br />
* [https://github.com/bbolli/tumblr-utils/ tumblr-utils], tumblr_backup.py can make a local backup of posts (XML default), video, audio and images.<br />
* [https://github.com/woodenphone/tumblrsagi Tumblrsagi], Code to grab blogs from the API and stuff them into a database for rehosting, used by [https://t.archive.horse/blogs this tumblr archive]<br />
* [http://soup.io] can automatically mirror the contents of a tumblr blog as they are posted, which may be useful for maintaining an offsite-copy which can be archived later.<br />
* [https://www.jzab.de/content/tumblthree TumblThree], Can archive an entire blog by feeding it an URL, including asks, text posts and reblogs to XML format and can download all images. [https://github.com/johanneszab/TumblThree/releases/latest Downloadable here.] Windows only until the dev implements mono support.<br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Microblogging services]] [[Category:Yahoo!]]</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=ArchiveBot&diff=29616ArchiveBot2017-08-06T07:16:58Z<p>PurpleSymphony: /* Volunteer a Node */</p>
<hr />
<div>[[File:Librarianmotoko.jpg|200px|right|thumb|Imagine Motoko Kusanagi as an archivist.]]<br />
<br />
'''ArchiveBot''' is an [[IRC]] bot designed to automate the archival of smaller websites (e.g. up to a few hundred thousand URLs). You give it a URL to start at, and it grabs all content under that URL, [[Wget_with_WARC_output|records it in a WARC]], and then uploads that WARC to ArchiveTeam servers for eventual injection into the [https://archive.org/search.php?query=collection%3Aarchivebot&sort=-publicdate Internet Archive] (or other archive sites).<br />
<br />
== Details ==<br />
<br />
To use ArchiveBot, drop by [http://chat.efnet.org:9090/?nick=&channels=%23archivebot&Login=Login '''#archivebot'''] on EFNet. To interact with ArchiveBot, you [http://archivebot.readthedocs.org/en/latest/commands.html issue '''commands'''] by typing them into the channel. Note that you will need channel operator (<code>@</code>) or voice (<code>+</code>) permissions in order to issue archiving jobs; please ask for assistance or leave a message describing the website you want to archive. <br />
<br />
The [http://dashboard.at.ninjawedding.org/3 '''dashboard'''] shows the sites being downloaded currently. The [http://archivebot.at.ninjawedding.org:4567/pipelines pipeline monitor station] shows the status of deployed instances of crawlers. The [http://archive.fart.website/archivebot/viewer/ viewer] assists in browsing and searching archives.<br />
<br />
Follow [https://twitter.com/archivebot @ArchiveBot] on [[Twitter]]!<ref>Formerly known as [https://twitter.com/atarchivebot @ATArchiveBot]</ref><br />
<br />
=== Components ===<br />
<br />
IRC interface<br />
:The bot listens for commands and reports back status on the IRC channel. You can ask it to archive a website or webpage, check whether the URL has been saved, change the delay time between requests, or add some ignore rules. This IRC interface is collaborative, meaning anyone with permission can adjust the parameters of jobs. Note that the bot isn't a chat bot so it will ignore you if it doesn't understand a command.<br />
<br />
Dashboard<br />
:The dashboard displays the URLs being downloaded. Each URL line in the dashboard is categorized into successes, warnings, and errors. It will be highlighted in yellow or red. It also provides RSS feeds.<br />
<br />
Backend<br />
:The backend contains the database of jobs and several maintenance tasks such as trimming logs and posting Tweets on Twitter. The backend is the centralized portion of ArchiveBot.<br />
<br />
Crawler<br />
:The crawler will download and spider the website into WARC files. The crawler is the distributed portion of ArchiveBot. Volunteers run nodes connected to the backend. The backend will tell the nodes what jobs to run. Once the node has finished, it reports back to the backend and uploads the WARC files to the staging server. This process is handled by a supervisor script called a pipeline.<br />
<br />
Staging server<br />
:The staging server is the place where all the WARC files are uploaded temporarily. Once the current batch has been approved, it will be uploaded to the Internet Archive for consumption by the Wayback Machine.<br />
<br />
ArchiveBot's source code can be found at https://github.com/ArchiveTeam/ArchiveBot. [[Dev|Contributions welcomed]]! Any issues or feature requests may be filed at [https://github.com/ArchiveTeam/ArchiveBot/issues the issue tracker]. <br />
<br />
=== People ===<br />
<br />
The IRC bot, backend and dashboard are operated by [[User:yipdw|yipdw]]. The staging server is operated by [[User:jscott|SketchCow]]. The crawlers are operated by various people.<br />
<br />
== Volunteer a Node ==<br />
'''Note''': New nodes are not being accepted right now. (as of August 2017<ref>http://archive.fart.website/bin/irclogger_log/archiveteam?date=2017-08-06,Sun&sel=53#l49</ref>)<br />
<br />
If you have a machine with <br />
<br />
* lots of disk space (40 GB minimum / 200 GB recommended / 500 GB atypical)<br />
* 512 MB RAM (2 GB recommended, 2 GB swap recommended)<br />
* 10 mbps upload/download speeds (100 mbps recommended)<br />
* long-term availability (2 months minimum)<br />
* unrestricted internet access (absolutely no firewall/proxies/censorship/ISP-injected-ads/DNS-redirection/free-cafe-wifi)<br />
<br />
and would like to volunteer, please review the [https://github.com/ArchiveTeam/ArchiveBot/blob/master/INSTALL.pipeline Pipeline Install] instructions and contact [[User:yipdw|yipdw]].<br />
<br />
=== Installation ===<br />
<br />
Installing the ArchiveBot can be difficult.<br />
<br />
But there is a [https://github.com/ArchiveTeam/ArchiveBot/blob/master/.travis.yml Travis.yml automated install script] for [https://travis-ci.org/ArchiveTeam/ArchiveBot Travis CI] that is designed to test the ArchiveBot. <br />
<br />
Since it's good enough for testing... it's good enough for installation, right? There must be a way to convert it into an installer script.<br />
<br />
== Disclaimers ==<br />
<br />
# Everything is provided on a best-effort basis; nothing is guaranteed to work. (We're volunteers, not a support team.)<br />
# We can decide to stop a job or ban a user if a job is deemed unnecessary. (We don't want to run up operator bandwidth bills and waste Internet Archive donations on costs.)<br />
# We're not Internet Archive. (We do what we want.)<br />
# We're not the Wayback Machine. Specifically, we are not <code>ia_archiver</code> or <code>archive.org_bot</code>. (We don't run crawlers on behalf of other crawlers.)<br />
<br />
Occasionally, we have had to ban blocks of IP addresses from the channel. If you think a ban does not apply to you but cannot join the #archivebot channel, please join the main #archiveteam channel instead.<br />
<br />
== Bad Behavior ==<br />
<br />
If you are a website operator and you notice ArchiveBot misbehaving, please contact us on #archivebot or #archiveteam on EFnet (see top of page for links).<br />
<br />
ArchiveBot understands [[robots.txt]] (please read the article) but does not match any directives. It does, however, use the file to discover more links, such as sitemaps.<br />
<br />
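To illustrate what using robots.txt only for link discovery can look like, here is a rough sketch that pulls <code>Sitemap:</code> lines out of a robots.txt body and ignores everything else. It is not ArchiveBot's actual code, and the function name and sample file are made up.<br />
<br />
```python
def sitemap_urls(robots_txt):
    """Collect the URLs from 'Sitemap:' lines; Allow/Disallow rules are ignored."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt content.
sample_robots = (
    "User-agent: *\n"
    "Disallow: /private/\n"
    "Sitemap: http://example.com/sitemap.xml\n"
)
print(sitemap_urls(sample_robots))
```
<br />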
Also, please remember that '''we are not the [[Internet Archive|Internet Archive]]'''.<br />
<br />
== More ==<br />
<br />
Like ArchiveBot? Check out our [[Main_Page|homepage]] and other [[projects]]!<br />
<br />
== Notes ==<br />
<br />
<references/><br />
<br />
{{navigation_box}}</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=File:Yahoo_groups_post_date.png&diff=29610File:Yahoo groups post date.png2017-07-30T07:52:39Z<p>PurpleSymphony: PurpleSymphony uploaded a new version of File:Yahoo groups post date.png</p>
<hr />
<div>Yahoo Groups: Messages grouped by post date</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=File:Yahoo_groups_messages_per_group.png&diff=29609File:Yahoo groups messages per group.png2017-07-30T07:52:09Z<p>PurpleSymphony: PurpleSymphony uploaded a new version of File:Yahoo groups messages per group.png</p>
<hr />
<div>Yahoo Groups: Messages per group for about 2000 groups retrieved.</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=Yahoo!_Groups&diff=29531Yahoo! Groups2017-07-16T06:37:37Z<p>PurpleSymphony: /* Statistics */ Update</p>
<hr />
<div>{{Infobox project<br />
| title = Yahoo! Groups<br />
| url = http://groups.yahoo.com/<br />
| image = groups-yahoo-com.png<br />
| logo = yahoo-groups-logo.png<br />
| project_status = {{online}}<br />
| archiving_status = {{inprogress}}<br />
}}<br />
<br />
'''Yahoo! Groups''' is Yahoo's mailing list and discussion group service; it's the result of the acquisition of eGroups and some other Yahoo! stuff.<br />
<br />
It's been stable for a long time (since the late 90s), long enough for some specialised software to be developed to do backups of it. (Not many other websites can say ''that''.)<br />
<br />
== Python Yahoo! Group Archiver == <br />
<br />
The [https://github.com/csaftoiu/yahoo-groups-backup yahoo-groups-backup] tool is a Python script which scrapes a group. So far only messages are scraped. It puts all the info and metadata (both rendered message body and raw email) into a Mongo database, and provides a script to dump a static version of the site that can be read off of the filesystem. It works with Neo and with private groups by clunkily using Selenium to do the scraping.<br />
<br />
== Yahoo Group Archiver ==<br />
<br />
The [http://sourceforge.net/projects/grabyahoogroup/ Yahoo Group Archiver] is a Perl script which allows an export of "the messages (without the attachments), everything from the files section and all the images from the photo section along with their hierarchy on Yahoo". <br />
<br />
It appears that, if you get the "Couldn't get message count" error when trying to use it, the solution is to edit the yahoo2maildir.pl file and replace the bottom line <code>my $url = $HTTP::URI_CLASS->new($redirect, $base)->abs($base);</code> (under the heading <code>sub GetJSRedirect</code>) with <code><nowiki>my $url = "http://groups.yahoo.com/group/$group/messages/$begin_msgid"; </nowiki></code><br />
<br />
More frustratingly, it appears that Yahoo blocks your IP temporarily after hitting some invisible limit of data downloaded (the Archiver will continue to "download" messages for a bit, ending up with a bunch of 0-byte files, then stop completely). It's unknown if there is a solution. <br />
<br />
Also: sometimes, some of the downloaded messages, in the middle of an otherwise normal batch, are 0 in size - almost as if Yahoo blocked your IP for a few seconds, then stopped. Watch out for these so that you can re-download them later. <br />
<br />
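One way to watch out for the 0-byte messages mentioned above is to scan the download directory after each run. This sketch assumes one file per message and makes no assumptions about file naming; the function name is made up.<br />
<br />
```python
from pathlib import Path


def find_empty_files(directory):
    """Return every zero-byte file under directory, sorted by path."""
    return sorted(
        path
        for path in Path(directory).rglob("*")
        if path.is_file() and path.stat().st_size == 0
    )


if __name__ == "__main__":
    # List candidates for re-downloading in the current directory.
    for path in find_empty_files("."):
        print(path)
```
<br />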
== Site Structure ==<br />
<br />
There’s a convenient JSON API. May require logging in and joining a group to use all endpoints:<br />
<br />
* Group Information: https://groups.yahoo.com/api/v1/groups/concatenative/<br />
* List of Messages: https://groups.yahoo.com/api/v1/groups/concatenative/messages?count=100<br />
* Specific Message: https://groups.yahoo.com/api/v1/groups/concatenative/messages/1/<br />
* Raw Message Content: https://groups.yahoo.com/api/v1/groups/concatenative/messages/1/raw – note that there seems to be a [https://yahoo.uservoice.com/forums/209451-us-groups/suggestions/9644478-displaying-raw-messages-is-not-8-bit-clean message encoding problem]<br />
* List of Topics: https://groups.yahoo.com/api/v1/groups/concatenative/topics?count=100<br />
* Specific Topic: https://groups.yahoo.com/api/v1/groups/concatenative/topics/1<br />
* List of Tables: https://groups.yahoo.com/api/v1/groups/a_furrys_world/database<br />
* Specific Table: https://groups.yahoo.com/api/v1/groups/a_furrys_world/database/1/<br />
* Table Content: https://groups.yahoo.com/api/v1/groups/a_furrys_world/database/1/records<br />
* List of Files: https://groups.yahoo.com/api/v1/groups/a_furrys_world/files<br />
* List of Attachments: https://groups.yahoo.com/api/v1/groups/a_furrys_world/attachments<br />
* List of Polls: https://groups.yahoo.com/api/v1/groups/a_furrys_world/polls?count=100<br />
* Specific Poll: https://groups.yahoo.com/api/v1/groups/a_furrys_world/polls/3549106<br />
* List of Photos: https://groups.yahoo.com/api/v1/groups/a_furrys_world/photos<br />
* List of Albums: https://groups.yahoo.com/api/v1/groups/a_furrys_world/albums<br />
* Specific Album: https://groups.yahoo.com/api/v1/groups/a_furrys_world/albums/1841906391<br />
* List Moderators: https://groups.yahoo.com/api/v1/groups/a_furrys_world/members/moderators<br />
* Members With Incorrect Emails: https://groups.yahoo.com/api/v1/groups/a_furrys_world/members/bouncing<br />
* List of Links: https://groups.yahoo.com/api/v1/groups/a_furrys_world/links<br />
* Search: https://groups.yahoo.com/api/v1/search/groups?offset=0&maxHits=20&sortBy=&query=abcdef – sort can be one of OLDEST, RELEVANCE, MEMBERS, LATEST_ACTIVITY, NEWEST<br />
* Categories: https://groups.yahoo.com/api/v1/dir/categories/0/?start=0<br />
<br />
Note that all paginated responses are limited to the first 500 results and do not return anything new beyond that.<br />
<br />
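The endpoints listed above share one URL layout, so a small helper can build them. This is a sketch only: the helper name is made up, it does not add the trailing slash that some of the example URLs carry, and it does not deal with logging in.<br />
<br />
```python
from urllib.parse import quote, urlencode

API_ROOT = "https://groups.yahoo.com/api/v1/groups"


def group_endpoint(group, *path, **query):
    """Build an API URL such as .../groups/<group>/messages?count=100."""
    parts = [quote(str(part), safe="") for part in (group, *path)]
    url = API_ROOT + "/" + "/".join(parts)
    if query:
        url += "?" + urlencode(query)
    return url


print(group_endpoint("concatenative", "messages", count=100))
print(group_endpoint("concatenative", "messages", 1, "raw"))
```
<br />
Because paginated responses stop at 500 results, listing endpoints alone cannot enumerate a large group.<br />
<br />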
== Statistics ==<br />
<br />
As of 2017-07-16 the [https://groups.yahoo.com/neo/dir directory] lists 5599562 groups. 2752112 of them have been discovered. 1483853 (54%) have public message archives with an estimated number of 6.2 billion messages (~4000 messages per group on average so far). 1 billion messages (17%) have been archived already.<br />
<br />
The following graphs are slightly outdated:<br />
<br />
[[File:Yahoo_groups_date_created.png]]<br />
[[File:Yahoo_groups_messages_per_group.png]]<br />
[[File:Yahoo_groups_post_date.png]]<br />
<br />
== Software for backups ==<br />
* [http://sourceforge.net/projects/grabyahoogroup/ Yahoo Group Archiver], Sourceforge<br />
<br />
== External Links ==<br />
<br />
* https://archive.org/details/yahoo_groups<br />
<br />
== References ==<br />
<br />
<references/><br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Yahoo!]]<br />
[[Category:Mailing lists]]</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=Flickr&diff=26795Flickr2017-01-13T19:41:58Z<p>PurpleSymphony: /* API */ Add dump</p>
<hr />
<div>{{Infobox project<br />
| title = Flickr<br />
| image = Flickr.png<br />
| description = Flickr mainpage in 2010-12-20<br />
| URL = http://www.flickr.com<br />
| project_status = {{online}}<br />
| archiving_status = {{upcoming}}<br />
| tracker = [http://tracker.archiveteam.org/flickr flickr]<br />
| source = [https://github.com/ArchiveTeam/flickr-grab flickr-grab]<br />
| irc = flickrfckr<br />
}}<br />
'''Flickr''' is an image and video hosting website currently owned by [[Yahoo!]]. Any data uploaded can be private or public.<br />
<br />
== Backup Tools ==<br />
<br />
* [http://sunkencity.org/flickredit Flickr Edit] includes Flickr Backup<br />
* [http://hsivonen.iki.fi/photobackup/ Photo and Metadata Backup for Flickr]<br />
* This [http://hivelogic.com/articles/view/backing-up-flickr blog post] documents the [http://github.com/dan/hivelogic-flickrtouchr/tree/master flickrtouchr Flickr backup tool]. It backs up the full sized versions of your images but it doesn't currently backup metadata.<br />
* This [http://tiagovaz.org/posts/flickr_backup_tool_in_Debian/ blog post] documents the [https://github.com/tiagovaz/flickrbackup flickrbackup tool]. Based on flickrtouchr, it also backs up flickr metadata, storing it as EXIF tags within the images.<br />
* http://straup.github.com/parallel-flickr/<br />
* [https://github.com/markdoliner/flickrmirrorer flickrmirrorer] is a small command-line python script that creates a local backup of your Flickr data. It mirrors images, titles, description, tags, albums and collections. It does not backup comments. It also does not preserve order of photos in photostream (though you can emulate it sorting by upload timestamp) and order of albums in collections.<br />
* [https://github.com/drtoast/flickr-backup flickr-backup] backs up Flickr photosets, photos, and JSON metadata. Pagination is not handled yet, so if you have any sets with more than 500 photos or comments they won't all be downloaded. <br />
* [https://github.com/Flimm/backup-all-my-flickr-photos backup-all-my-flickr-photos] downloads all your Flickr photos and videos.<br />
* [https://github.com/lutana-de/easyflickrbackup easyflickrbackup] backs up all your photos from Flickr.<br />
* If you have your iPhoto connected with your Flickr account, you can back up your pictures on Flickr through iPhoto.<br />
<br />
== Vital Signs ==<br />
<br />
Seems to be stable, can probably change at any time.<br />
<br />
May 2013 update: [http://www.flickr.com/help/forum/en-us/72157633547442506/ new interface]; apparently very overloaded and masses of pro users abandoning the place, deleting their profiles and/or wishing to archive it.<br />
<br />
Nov 2014 update: Yahoo's step to [http://online.wsj.com/articles/fight-over-flickrs-use-of-photos-1416875564 sell the CC-licensed pictures] upset a lot of users (who obviously did not understand this license) and may now want to delete their accounts (with up to 10k images each).<br />
<br />
Dec 2015 update: Yahoo!'s Action Plan in their [http://www.wsj.com/public/resources/documents/yahoopresentation.pdf 2015 shareholder presentation] mentions killing Flickr.<br />
<br />
== API ==<br />
<br />
[https://www.flickr.com/services/api/ Documentation]<br />
<br />
* [https://www.flickr.com/services/api/flickr.photos.search.htm flickr.photos.search]<br />
** returns only 1999 unique results<br />
** filtering by upload date is limited to 600s accuracy<br />
** total number of results is inaccurate<br />
** [https://archive.org/details/flickr-metadata-2016 Metadata dump]<br />
<br />
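Given the limits noted above, one possible approach is to walk upload time in 600-second windows and pass each window to flickr.photos.search as <code>min_upload_date</code> and <code>max_upload_date</code>. The sketch below only generates the windows; the function name is made up, how the API treats the window boundaries is not verified here, and a window holding more than 1999 photos still could not be listed completely this way.<br />
<br />
```python
def upload_windows(start_ts, end_ts, step=600):
    """Yield (window_start, window_end) Unix timestamps aligned to step seconds."""
    cursor = start_ts - (start_ts % step)  # align down to the filter granularity
    while cursor < end_ts:
        yield cursor, cursor + step
        cursor += step


for window in upload_windows(1000, 2000):
    print(window)
```
<br />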
== Other ==<br />
<br />
It seems a [https://oc.wikipedia.org/wiki/.flickr .flickr] top-level domain is registered.<br />
<br />
== See also ==<br />
* [[FlickrStats]]<br />
* [[FlickrFckr]]<br />
<br />
== External links ==<br />
* http://www.flickr.com<br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Image hosting]]<br />
[[Category:Video hosting]]<br />
[[Category:Yahoo!]]</div>PurpleSymphonyhttps://wiki.archiveteam.org/index.php?title=Nin.com&diff=26753Nin.com2017-01-03T11:59:18Z<p>PurpleSymphony: /* forum.nin.com */ Got most of it</p>
<hr />
<div>'''nin.com''' was rebuilt at some point in December 2016; as a result the DNS for all subdomains was repointed to a new design company's servers, and a ''lot'' of old stuff became inaccessible, including remix.nin.com (fan submitted remixes).<br />
<br />
At the time of writing, the old hosting company has not switched off the old sites, and they can be accessed by manual DNS hackery[http://www.echoingthesound.org/community/threads/4298-New-nin-com-website?p=328754#post328754]:<br />
<br />
$ host remix.nin.com ns1.sudjam.com<br />
remix.nin.com has address 208.81.236.37<br />
<br />
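Once the old address is known, the old site can be reached either by adding a line to /etc/hosts or by connecting to the IP directly and sending the original hostname in the <code>Host</code> header. A minimal sketch of both, using the address from the lookup above (the function names are made up, and the old server may no longer answer):<br />
<br />
```python
import http.client


def hosts_entry(ip, hostname):
    """Format a line suitable for /etc/hosts."""
    return f"{ip} {hostname}"


def fetch_from_old_server(ip, hostname, path="/", timeout=30):
    """GET path from ip while presenting hostname in the Host header."""
    conn = http.client.HTTPConnection(ip, timeout=timeout)
    try:
        # Supplying Host explicitly stops http.client from using the IP instead.
        conn.request("GET", path, headers={"Host": hostname})
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()


print(hosts_entry("208.81.236.37", "remix.nin.com"))
```
<br />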
__TOC__<br />
<br />
==irc.nin.com ==<br />
<br />
An IRC server with no web content; likely nothing to do here.<br />
<br />
== remix.nin.com ==<br />
<br />
{{Infobox project<br />
| title = remix.nin.com<br />
| URL = http://remix.nin.com<br />
| project_status = {{offline}}<br />
| archiving_status = {{saved}}<br />
}}<br />
<br />
Hostname pointed to IP address 208.81.236.37 before it was taken offline around December 15th 2016. Available discovery endpoints:<br />
<br />
* http://remix.nin.com/feed/playlist?id=something<br />
* http://remix.nin.com/playlist/song_info_data.js?mix_id=73<br />
* http://remix.nin.com/listen/get_rating_data.js?mix_id=73 <br />
* http://remix.nin.com/listen/artist_info_data.js?user_id=109510<br />
* http://remix.nin.com/listen/comment_summary_data.js?mix_id=73<br />
<br />
Archives: <nowiki>https://archive.org/details/@purplesymphony?and[]=subject:"remix.nin.com"</nowiki><br />
<br />
== feeds.nin.com ==<br />
<br />
This was a DNS alias onto Google's feedproxy service and as such seems to be dead<br />
<br />
$ host feeds.nin.com ns1.sudjam.com<br />
Using domain server: <br />
Name: ns1.sudjam.com<br />
Address: 208.81.236.21#53<br />
Aliases:<br />
<br />
feeds.nin.com is an alias for 1f20mvu.feedproxy.ghs.google.com.<br />
Host 1f20mvu.feedproxy.ghs.google.com not found: 5(REFUSED)<br />
Host 1f20mvu.feedproxy.ghs.google.com not found: 5(REFUSED)<br />
<br />
From that ETS link<br />
<br />
216.58.193.115 feeds.nin.com<br />
<br />
"feeds.nin.com is a front for Feedburner. I found most of the feeds, I think:"<br />
<br />
* http://feeds.nin.com/ninremixhighestratedoverall/<br />
* http://feeds.nin.com/ninremixnewestmixes/<br />
* http://feeds.nin.com/ninRemixHighestRatedToday<br />
* http://feeds.nin.com/nindotcomupdates<br />
* http://feeds.nin.com/ninNews<br />
* http://feeds.nin.com/ninphotos<br />
* http://feeds.nin.com/ninvideos<br />
<br />
==forum.nin.com ==<br />
<br />
{{Infobox project<br />
| title = forum.nin.com<br />
| image =<br />
| URL = http://forum.nin.com<br />
| project_status = {{offline}}<br />
| archiving_status = {{partiallysaved}}<br />
}}<br />
<br />
[https://archive.org/details/forum-nin-com-2016 Archives], grabbed with modified /etc/hosts<br />
<br />
==member.nin.com ==<br />
==beta.media.nin.com ==<br />
==help.nin.com ==<br />
==phm.nin.com ==<br />
==profile.nin.com ==<br />
==access.nin.com ==<br />
==media.nin.com ==<br />
==dl.nin.com ==<br />
==tds.nin.com ==<br />
==lightsinthesky.nin.com ==<br />
==discipline.nin.com ==<br />
==theslip.nin.com ==<br />
==ghosts.nin.com ==<br />
==halo22.nin.com ==<br />
==yearzero.nin.com ==</div>PurpleSymphony