Gna!

From Archiveteam
Revision as of 19:21, 17 April 2017 by JTN (talk | contribs) (→‎Hosted data: Gna members can use SSH instead of insecure anonymous rsync)
URL https://gna.org/
Status Closing
Archiving status In progress...
Archiving type Unknown
IRC channel #gnarm (on hackint)

Gna! is a centralized location where software developers can develop, distribute and maintain free (GPL-compatible) software. It is an instance of the Savane code-hosting platform[1]. It hosts popular free software projects such as Battle for Wesnoth and Freeciv (full list).

It is shutting down due to lack of admin effort, probably by the end of April or early May 2017.

Hosted data

As of 2017-02 it claimed to have 1458 hosted projects. (Many are probably abandoned and will not be saved by their project admins before shutdown.)

Here's a breakdown of the kinds of data stored and what various people can do to grab the data:

  • Third party describes what random anonymous Internet people (e.g., Archive Team) can do
    • Done shows bits that we have already rescued
    • Help shows bits that someone could usefully do
  • Members describes things that only members of the relevant project can do (if better)

Data stored:

  • Code hosting using CVS, Subversion, and Arch (done 2017-02-25, not updated since; see subpage)
    • Third parties can grab all code with full history:
      • All subversion repos available via anonymous rsync: rsync://svn.gna.org/svn/ (ref: bottom of every project's svn page e.g. [1]). (In FSFS format, which is supposed to be portable.)
      • Ditto CVS, it looks like: rsync://svn.gna.org/cvs/
      • Arch/tla [2]: rsync://download.gna.org/arch/
      • Project members can get the same data securely over SSH (for any project, not just their own)
    • There's also a ViewVC web front-end to browse code. (No point grabbing this if you've got the above)
  • Ticket tracking (not saved, help wanted)
    • Up to 4 trackers per project: 'bugs', 'patch', 'task', 'support'
    • Gna members (only) can set up XML export of their own ticket text/metadata ("Export" item on tracker admin menu).
      • Only option for third parties looks like web scraping. Help: can someone look into this? (Someone pointed ArchiveBot at it but it doesn't seem to have grabbed much)
      • Exported XML is published to an unauthenticated URL of the form https://gna.org/export/project/user/number.xml . The number might be a global counter; a recent export had number 66. In principle third parties could mine this namespace, although the search space is rather large (1458 projects * 9116 users * 66 numbers, roughly 877 million candidate URLs), and it would only catch recent or periodic exports, since they are cleared out quickly.
    • There's no supported interface for grabbing issue attachments (such as patches), even for project admins.
      • Third parties can scrape attachments by relying on their increasing integer IDs, e.g. file #29845. It looks like you don't have to get the 'bugs' bit correct, so it's possible to scrape all public files by varying the ID. (done/ongoing by JTN, not uploaded anywhere yet)
    • Individual tickets can be private. (Maybe files too?) But the XML export includes private tickets (yes, to an unauthenticated URL).
  • File hosting at http://download.gna.org/ (done 2017-02-25, not updated since; see subpage)
    • Third parties can do anonymous rsync from rsync://download.gna.org/download/
  • Project websites on home.gna.org (done 2017-02-25, not updated since; see subpage)
  • Mailing lists using Mailman (not saved, help wanted)
    • Because the lists run on Mailman, public archives are available to third parties in mbox format (albeit with email addresses mangled), e.g. [4]. Help: can someone scrape these? Should be easy. (ArchiveBot has something, not sure what.)
      • Note, the most recent mbox link on inactive lists (e.g., [5]) is broken; replace "2014-09.partial.mbox.gz" with "2014-09.mbox.gz" to fix it
      • It may be worth grabbing the HTML archives too, as they contain some info not available in the mboxes, e.g. "X-From-R13" in HTML comments contains reversibly obfuscated From address
    • Some mailing lists are private. Even project admins can't see the archives at the moment (sr 3421).
  • Project metadata: groups, users, news, help topics etc. In a database and probably only available via web scraping. Help: can someone look into this?
  • Usage stats at http://stats.gna.org/
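
The anonymous rsync endpoints listed above could be mirrored with a small wrapper along these lines. This is only a sketch: the destination directory name gna-mirror, the function name, and the choice of the -a and --partial flags are assumptions rather than anything Archive Team is known to have used, and the script only prints the commands instead of running them.

```python
import shlex

# Anonymous rsync sources named in the list above.
RSYNC_SOURCES = {
    "svn": "rsync://svn.gna.org/svn/",
    "cvs": "rsync://svn.gna.org/cvs/",
    "arch": "rsync://download.gna.org/arch/",
    "download": "rsync://download.gna.org/download/",
}


def build_rsync_command(name, dest_root="gna-mirror"):
    """Return the argv list that would mirror one source into dest_root/name/.

    dest_root is a hypothetical local directory; it has to exist before
    rsync is run, because rsync only creates the last path component.
    """
    source = RSYNC_SOURCES[name]
    dest = f"{dest_root}/{name}/"
    # -a keeps permissions and timestamps; --partial keeps interrupted files
    # so a re-run can resume large transfers.
    return ["rsync", "-a", "--partial", source, dest]


if __name__ == "__main__":
    # Print the commands for review rather than executing them.
    for name in RSYNC_SOURCES:
        print(shlex.join(build_rsync_command(name)))
```

Building the argument list separately from running it makes it easy to inspect the commands first, or to hand each list to subprocess.run once the destination directories exist.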

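The broken mbox links on inactive mailing lists, noted above, could be repaired mechanically before fetching. A minimal sketch, assuming the same ".partial.mbox.gz" suffix appears for months other than 2014-09 (the page only confirms that one case); the host lists.example.org and the function name are placeholders.

```python
def fix_partial_mbox_url(url):
    """Rewrite a '...partial.mbox.gz' archive link to the plain '.mbox.gz' form.

    URLs that do not end in the broken suffix are returned unchanged.
    """
    broken_suffix = ".partial.mbox.gz"
    if url.endswith(broken_suffix):
        return url[: -len(broken_suffix)] + ".mbox.gz"
    return url


if __name__ == "__main__":
    # Hypothetical archive link; the real list host is not assumed here.
    print(fix_partial_mbox_url(
        "https://lists.example.org/archives/demo/2014-09.partial.mbox.gz"
    ))
```
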
Gna admins have so far not been responsive to requests for help from at least some project members wishing to migrate or rescue their data, presumably because of the same lack of admin effort that is causing the shutdown. They haven't been approached about Archive Team style bulk backup (or at least JTN has not done so).

Shutdown Notice

  • A notice of pending shutdown / request for takeover was first posted in Nov 2016[2], suggesting a time frame of six months
  • A news item[3] about shutdown was posted to the front page 2017-01-31 linking to the above. A reply to that on 4 Feb suggests shutdown will happen "within 3 months, or when the hardware dies".
  • This suggests shutdown by around the beginning of May 2017.

References