Difference between revisions of "Usenet"

From Archiveteam
(Draft)
 
m (→‎top: typos fixed: from it's → from its, supressed → suppressed, propriatery → proprietary)
(9 intermediate revisions by 5 users not shown)
Line 1: Line 1:
{{Infobox project
| title = Usenet
| URL = none
| project_status = {{online}}
| archiving_status = {{inprogress}}
| irc = archiveteam
}}


USENET is a mailing list based collection of assorted forum groups accessed via the NNTP protocol.
'''Usenet''' is a mailing list based collection of assorted forum groups accessed via the NNTP protocol.


Currently the major archive of this important forum is Google Groups...
Currently the major archive of this important forum is Google Groups, which absorbed DejaNews. However,  there are some concerns raised by this.
#  Google could pull Groups at its whim, with no clear donation policy to other archives.
#  Despite Google's claims not to routinely monitor the service,  postings are removed or suppressed for various reasons.
#  Google currently offers little to distinguish USENET from its own proprietary groups.
#  The interface used for Groups has issues with some browsers, and accessing text versions of postings is an involved process.


However, there are some concerns raised by this.
[[Google Video]] proves that Google might be the longest-serving goldmine of the internet, but that doesn't make it a reliable long-term host. Copies must be retained by others in safe locations.


1. Google could pull Groups at it's whim, with no clear donation policy to other archives.
Usenet was and is a distributed system, hence no single company or server will have the whole history of it. Some big players, however, have a good portion. Moreover, everyone with access to a news server should just download everything (non-binary) there is on it and publish it on archive.org; especially the local hierarchies not mirrored in many places. To do this, it's probably enough to put a free software newsreader at work, which saves on a standard open format? Suggestions needed.
2. Despite Google's claims not to routinely monitor the service, postings are removed or supressed for various reasons.
3.  Google currently offers little to distinguish USENET from it's own propriatery groups.
4.  The interface used for Groups has issues with some browsers, and accessing text versions of postings is an involved process.


In the meanwhile, Kahle and the Internet Archive are not resting!
* https://archive.org/details/usenet
* https://archive.org/details/giganews (since January 2014)
It's hard to say how complete those archives will be.


Therefore there should be an alternative.
Tools for sorting USENET and public mailing list archives into big Katamari style archives are available at https://github.com/ZoeB/arcmesg  Improvements are greatly appreciated!


'''It is suggested that Archiveteam members form an effort to begin a parallel archive to groups'''
{{Navigation box}}
 
Such an alternative should offer a credible search facility, indexing by header fields and over date ranges, broadly similar to that offered by the Groups UI. Such a search could also go beyond the current Google offering by allowing grep-style expressions (subject to appropriate limits on resource usage).
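As a rough illustration of the search described above, here is a minimal Python sketch that filters raw articles by a regular expression on one header and by a date range. The function name `search`, its parameters, and the `limit` cap (standing in for the resource limits mentioned above) are assumptions made for illustration, not part of any existing archive. It assumes `start` and `end` are timezone-aware datetimes.

```python
import re
from datetime import datetime, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


def search(articles, header, pattern, start=None, end=None, limit=100):
    """Return raw articles whose `header` matches `pattern` (a regex)
    and whose Date header falls within [start, end].

    `limit` caps the number of results as a crude resource limit.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for raw in articles:
        msg = message_from_string(raw)
        if not regex.search(str(msg.get(header, ""))):
            continue
        if start is not None or end is not None:
            try:
                posted = parsedate_to_datetime(str(msg.get("Date", "")))
            except (TypeError, ValueError):
                continue  # unparseable date: cannot satisfy a date filter
            if posted.tzinfo is None:
                # Dates without a usable zone are treated as UTC here.
                posted = posted.replace(tzinfo=timezone.utc)
            if start is not None and posted < start:
                continue
            if end is not None and posted > end:
                continue
        hits.append(raw)
        if len(hits) >= limit:
            break
    return hits
```

A real service would build an index ahead of time rather than scan every article per query; the linear scan only keeps the sketch short.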
 
Technical issues:
* How big are the current USENET colloquia?
* By how much is this likely to grow?
* How should postings be stored? (Ideally, text postings should be stored as plain text plus headers, as they would typically be on a news server.)
* Should NNTP-style direct access be allowed, or should postings only be accessible via a negotiated read-only API?
* Binaries - Leave as encoded or translate?
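One way to approach the binaries question is to first tell binary postings apart from text. The following Python sketch is a heuristic only: it looks for the opening line of a uuencoded block or a yEnc block in the article body. The function name `looks_binary` is hypothetical, and the check does not cover MIME attachments encoded as base64.

```python
import re

# A uuencoded block opens with "begin <octal mode> <filename>";
# a yEnc block opens with "=ybegin " followed by its parameters.
_UU_BEGIN = re.compile(r"^begin [0-7]{3,4} \S", re.MULTILINE)
_YENC_BEGIN = re.compile(r"^=ybegin ", re.MULTILINE)


def looks_binary(body):
    """Return True if the article body appears to carry an encoded binary."""
    return bool(_UU_BEGIN.search(body) or _YENC_BEGIN.search(body))
```

Postings flagged this way could be stored as encoded and decoded later, so the decision does not have to be made at download time.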
 
Logistic issues:
* How to recover pre-2014 material from alternative sources?
* How to upload and index?
 
 
Non-Technical Issues:
* Spam - Some less-used groups are in effect mostly spam. Is it worth archiving the spam along with genuine postings?
* Cancel messages - Google Groups doesn't respond to them, but some news servers will respond to genuine cancel messages, as well as issuing their own in respect of material found to be in breach of applicable laws.
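For reference, a cancel message is an ordinary article carrying a `Control: cancel <message-id>` header. The Python sketch below shows how an archive might recognise such messages and record their target instead of silently deleting anything. The function name `cancel_target` is hypothetical, and the sketch makes no attempt to verify that a cancel is genuine.

```python
from email import message_from_string


def cancel_target(raw_article):
    """Return the Message-ID that a cancel message targets, or None
    if the article is not a cancel control message."""
    msg = message_from_string(raw_article)
    control = str(msg.get("Control", "")).split()
    if len(control) == 2 and control[0].lower() == "cancel":
        return control[1]
    return None
```

Keeping the cancel alongside the original posting leaves the policy decision (honour it, ignore it, or flag it) to a later step.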
 
* Impersonation of headers - Misattribution of sources is an issue because of the potential for legal consequences.
* Legally questionable material - Should an archive of USENET respect archival principles (and challenge legal threats) or have a takedown procedure?
* The New York 22 Banned list - No responsible archive would support the deliberate inclusion of clearly illegal 'child abuse' images, but these are not always easy to identify as such, and should an archive be the one to report previously unknown crimes?
* Libel (i.e. defamation) - In some countries the 'publisher' of a libel (i.e. an archive) can be held liable for it as well as the original source. Some postings which would be libel are nonetheless retained in the archive as they form part of the public debate. (This is especially true of high-profile cases.) However, libel of course has to be proven in court.
 
 
* Infringement of copyright - Whilst the DMCA has a takedown procedure, it's sometimes overreaching, meaning materials posted in good faith are removed unfairly. Precursors to the DMCA takedown have also been used for SLAPP purposes and to suppress

Revision as of 23:23, 4 December 2017

Usenet
URL none
Status Online!
Archiving status In progress...
Archiving type Unknown
IRC channel #archiveteam (on hackint)

Usenet is a mailing-list-like collection of assorted discussion groups (newsgroups) accessed via NNTP.

Currently the major archive of this important forum is Google Groups, which absorbed DejaNews. However, there are some concerns raised by this.

  1. Google could pull Groups at its whim, with no clear donation policy to other archives.
  2. Despite Google's claims not to routinely monitor the service, postings are removed or suppressed for various reasons.
  3. Google currently offers little to distinguish USENET from its own proprietary groups.
  4. The interface used for Groups has issues with some browsers, and accessing text versions of postings is an involved process.

Google Video proves that Google might be the longest-serving goldmine of the internet, but that doesn't make it a reliable long-term host. Copies must be retained by others in safe locations.

Usenet was and is a distributed system, hence no single company or server will have the whole history of it. Some big players, however, have a good portion. Moreover, everyone with access to a news server should download everything (non-binary) on it and publish it on archive.org, especially the local hierarchies that are not mirrored in many places. To do this, it is probably enough to run a free-software newsreader that saves in a standard open format. Suggestions needed.
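As one possible answer to the "standard open format" question, downloaded articles could be appended to an mbox file. The Python sketch below is only an illustration under stated assumptions: it assumes the raw articles (headers plus body, as bytes) have already been fetched by some NNTP client, which is not shown. The function name `append_articles` is hypothetical, and mbox is just one candidate format.

```python
import mailbox
from email import message_from_bytes


def append_articles(mbox_path, raw_articles):
    """Append raw articles (bytes) to an mbox file, skipping any whose
    Message-ID is already stored, so an interrupted download can be
    re-run safely. Returns the number of articles added."""
    box = mailbox.mbox(mbox_path)  # created if it does not exist
    try:
        seen = {msg["Message-ID"] for msg in box if msg["Message-ID"]}
        added = 0
        for raw in raw_articles:
            msg_id = message_from_bytes(raw)["Message-ID"]
            if msg_id and msg_id in seen:
                continue  # duplicate of an article already archived
            box.add(raw)
            if msg_id:
                seen.add(msg_id)
            added += 1
        box.flush()
        return added
    finally:
        box.close()
```

The resulting file can be read by most mail and news tools, and could then be uploaded to archive.org as a single item per group or hierarchy.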

In the meantime, Kahle and the Internet Archive are not resting!

  * https://archive.org/details/usenet
  * https://archive.org/details/giganews (since January 2014)

It's hard to say how complete those archives will be.

Tools for sorting USENET and public mailing list archives into big Katamari-style archives are available at https://github.com/ZoeB/arcmesg . Improvements are greatly appreciated!