Bitbucket

From Archiveteam

Revision as of 08:31, 26 July 2020

Bitbucket
URL: https://bitbucket.org/
Status: Special case
Archiving status: In progress...
Archiving type: Unknown
Project source: bitbucket-grab (https://github.com/ArchiveTeam/bitbucket-grab)
Project tracker: bitbucket (https://tracker.archiveteam.org/bitbucket/)
IRC channel: #kickthebucket (on hackint)

Bitbucket is a version control repository hosting service, marketed mostly towards proprietary and enterprise software but with a substantial FLOSS presence.

Mercurial repositories

Bitbucket announced on 20 August 2019 that it would be ending Mercurial support to focus exclusively on Git.[1] Creating new Mercurial repositories was disabled on 1 February 2020, and all Mercurial repositories and the Mercurial API will be removed on 1 July 2020.[2]

Archival

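Bitbucket Mercurial web content
Status: Closing
Archiving status: In progress...
Archiving type: Unknown
Project source: bitbucket-grab (https://github.com/ArchiveTeam/bitbucket-grab)
Project tracker: bitbucket (https://tracker.archiveteam.org/bitbucket/)
IRC channel: #kickthebucket (on hackint)

Bitbucket Mercurial repositories
Status: Closing
Archiving status: In progress...
Archiving type: Unknown
Project source: mercurial-grab (https://github.com/ArchiveTeam/mercurial-grab)
Project tracker: mercurial (https://tracker.archiveteam.org/mercurial/)
IRC channel: #kickthebucket (on hackint)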

Existing Mercurial tooling can copy the repositories themselves, but it does not include issues, wikis, and other metadata. (What is the status of PRs?) That data has to be scraped separately from Bitbucket's website and/or API.

We have an enumeration of existing Mercurial repositories, scraped from Bitbucket's search API after the February lockdown. These repositories remain writable until the deletion on 1 July 2020, so they may still change after they were enumerated. The discovery data has an updated-on field, but it is not clear whether it tracks changes to the repository itself or only to its metadata.

Our archival is split into two projects: mercurial retrieves the actual hg repositories, and bitbucket covers the web interface (issues, pull requests, wikis, etc.).

Statistics

  • Total repos online: 245,068
  • Total reported size (fairly accurate): 5.23 TiB (does this include hg compression?)
  • Mean reported size: 22.4 MiB
  • Median reported size: 205 KiB
  • Maximum reported size: 14.4 GiB
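As a rough consistency check, the reported mean should equal the total size divided by the repo count. The sketch below redoes that arithmetic, assuming the figures use binary units (KiB, MiB and TiB as powers of 1024, as the unit names imply). It also compares the mean with the median. They differ by roughly two orders of magnitude, which suggests a heavily skewed distribution in which a minority of large repositories accounts for much of the 5.23 TiB.

```python
# Consistency check on the reported statistics (binary units assumed).
KIB = 1024
MIB = 1024 ** 2
TIB = 1024 ** 4

total_repos = 245_068
total_bytes = 5.23 * TIB     # "Total reported size"
median_bytes = 205 * KIB     # "Median reported size"

mean_bytes = total_bytes / total_repos
mean_mib = mean_bytes / MIB
skew_ratio = mean_bytes / median_bytes   # how many medians fit in the mean

print(f"mean = {mean_mib:.1f} MiB")      # matches the reported 22.4 MiB
print(f"mean / median = {skew_ratio:.0f}x")
```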

Repo cloning

From the hg docs:

In normal clone mode, the remote normalizes repository data into a common exchange format and the receiving end translates this data into its local storage format. --stream activates a different clone mode that essentially copies repository files from the remote with minimal data processing. This significantly reduces the CPU cost of a clone both remotely and locally. However, it often increases the transferred data size by 30-40%. This can result in substantially faster clones where I/O throughput is plentiful, especially for larger repositories. A side-effect of --stream clones is that storage settings and requirements on the remote are applied locally: a modern client may inherit legacy or inefficient storage used by the remote or a legacy Mercurial client may not be able to clone from a modern Mercurial remote.
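The trade-off quoted above comes down to one flag on `hg clone`. This is a minimal sketch, assuming `hg` is on the PATH. The helper names are hypothetical and not part of any Archiveteam tool. Per the quoted docs, `--stream` saves CPU at the cost of roughly 30-40% more transferred data, and it can fail when a legacy client meets a modern remote.

```python
from __future__ import annotations

import subprocess


def hg_clone_argv(url: str, dest: str, stream: bool = False) -> list[str]:
    """Build the argument list for `hg clone`, optionally in --stream mode."""
    argv = ["hg", "clone"]
    if stream:
        # Copy the remote's store files with minimal processing:
        # less CPU on both ends, but more bytes on the wire.
        argv.append("--stream")
    argv += [url, dest]
    return argv


def hg_clone(url: str, dest: str, stream: bool = False) -> None:
    """Run the clone; raises CalledProcessError if hg exits non-zero."""
    subprocess.run(hg_clone_argv(url, dest, stream), check=True)
```

For example, `hg_clone("https://bitbucket.org/example/repo", "repo", stream=True)` would stream-clone a (hypothetical) repository into `repo`.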

hg clone produces a directory with a working copy plus the .hg directory containing the version-control data. On the wire, however, this data is sent as a bundle, which, if captured, can be applied normally with hg unbundle.
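Restoring from a captured bundle could look like the sketch below. It assumes the bundle payload has already been extracted from the WARC and saved as a standalone bundle file. Whether a raw HTTP response body can be used as-is depends on the bundle format the client and server negotiated, which is not verified here. The file names and helper names are hypothetical.

```python
from __future__ import annotations

import subprocess


def restore_from_bundle_argvs(bundle_path: str, dest: str) -> list[list[str]]:
    """Commands to rebuild a repository from a saved bundle file.

    1. `hg init` creates an empty repository at dest.
    2. `hg unbundle -u` applies the bundle and updates the working copy.
    """
    return [
        ["hg", "init", dest],
        ["hg", "-R", dest, "unbundle", "-u", bundle_path],
    ]


def restore_from_bundle(bundle_path: str, dest: str) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in restore_from_bundle_argvs(bundle_path, dest):
        subprocess.run(argv, check=True)
```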

hg network protocols:

Requests over HTTP can be WARCed.
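In version 1 of Mercurial's HTTP wire protocol, each command is an ordinary HTTP request to the repository URL, with the command named in a `cmd` query parameter. A client typically begins with `?cmd=capabilities`, and a WARC-writing crawler can record these exchanges like any other page. The sketch below only builds that first request and reads the space-separated capability list. Later clone commands such as `getbundle` carry more arguments and are not reproduced. Any repository URL passed in is a placeholder.

```python
from __future__ import annotations

import urllib.parse
import urllib.request


def hg_command_url(repo_url: str, cmd: str) -> str:
    """URL for a wire-protocol command, e.g. <repo>?cmd=capabilities."""
    query = urllib.parse.urlencode({"cmd": cmd})
    sep = "&" if "?" in repo_url else "?"
    return f"{repo_url}{sep}{query}"


def fetch_capabilities(repo_url: str, timeout: float = 30.0) -> list[str]:
    """Fetch the server's capability list (space-separated response body)."""
    url = hg_command_url(repo_url, "capabilities")
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("ascii", errors="replace").split()
```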

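Existing discussion and tooling

  • Forum thread: https://community.atlassian.com/t5/Bitbucket-articles/What-to-do-with-your-Mercurial-repos-when-Bitbucket-sunsets/ba-p/1155380
      • Some user questions: https://community.atlassian.com/t5/Bitbucket-articles/What-to-do-with-your-Mercurial-repos-when-Bitbucket-sunsets/ba-p/1155380/page/7#M321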

Site structure

Some API endpoints require authentication; others do not. Rate limits are documented at https://confluence.atlassian.com/bitbucket/rate-limits-668173227.html.
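Bitbucket's 2.0 REST API returns list endpoints as paginated JSON, with a `values` array and a `next` URL that is absent on the last page. A crawler working within the rate limits also has to back off when throttled. The sketch below follows `next` links and retries a page with exponential backoff. It assumes throttling is reported as HTTP 429 and that the caller-supplied `fetch` function raises `RateLimited` in that case. Both `fetch` and `RateLimited` are hypothetical names, and the backoff constants are illustrative rather than taken from Atlassian's documentation.

```python
from __future__ import annotations

import time
from typing import Callable, Iterator


class RateLimited(Exception):
    """Raised by a fetcher when the server answers HTTP 429 (assumed)."""


def iter_pages(
    fetch: Callable[[str], dict],
    first_url: str,
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator:
    """Yield every item from pages shaped like {"values": [...], "next": url}.

    On RateLimited, wait base_delay * 2**attempt seconds and retry the same
    page; re-raise once max_retries retries have been used up.
    """
    url = first_url
    while url:
        for attempt in range(max_retries + 1):
            try:
                page = fetch(url)
                break
            except RateLimited:
                if attempt == max_retries:
                    raise
                sleep(base_delay * 2 ** attempt)
        yield from page.get("values", [])
        url = page.get("next")  # missing on the last page, which ends the loop
```

In practice `fetch` would wrap an HTTP client call, with credentials for endpoints that need authentication, and parse the JSON body. Injecting `sleep` keeps the retry logic testable.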

References

  1. https://bitbucket.org/blog/sunsetting-mercurial-support-in-bitbucket
  2. The original sunset date was 1 June, but on 21 April it was pushed back to 1 July due to the coronavirus pandemic.