How to handle package name conflicts #8

Open
certik opened this issue Jul 22, 2020 · 8 comments


@certik
Member

certik commented Jul 22, 2020

Right now if somebody submits a PR with a duplicate name of a new package, such as:

--- a/registry.toml
+++ b/registry.toml
@@ -2,6 +2,9 @@
 "1.7.0" = {git="https://github.com/wavebitscientific/datetime-fortran", tag="v1.7.0"}
 "latest" = {git="https://github.com/wavebitscientific/datetime-fortran"}
 
+[datetime]
+"1.0.0" = {git="https://github.com/other/my_package", tag="v1.0.0"}
+
 [M_calculator]
 "latest" = {git="https://github.com/urbanjost/M_calculator"}

Then our CI will give an error:

$ python load_registry.py 
Traceback (most recent call last):
  File "load_registry.py", line 3, in <module>
    d = toml.load("registry.toml")
  File "/home/ondrej/miniconda3/envs/reg/lib/python3.8/site-packages/toml/decoder.py", line 134, in load
    return loads(ffile.read(), _dict, decoder)
  File "/home/ondrej/miniconda3/envs/reg/lib/python3.8/site-packages/toml/decoder.py", line 477, in loads
    raise TomlDecodeError("What? " + group +
toml.decoder.TomlDecodeError: What? datetime already exists?{'datetime': {'1.7.0': {'git': 'https://github.com/wavebitscientific/datetime-fortran', 'tag': 'v1.7.0'}, 'latest': {'git': 'https://github.com/wavebitscientific/datetime-fortran'}}} (line 5 column 1 char 171)

Which is great, so there will be no duplicates.
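The decoder error above stops at the first clash and is fairly cryptic. As a rough sketch only (this is not the project's `load_registry.py`; the helper name `find_duplicate_tables` and the regex-based scan are assumptions made for illustration), a CI step could list every duplicated package name in a more readable form:

```python
import re
from collections import Counter

# Matches a top-level TOML table header such as [datetime] or ["my-pkg"].
# Simplification: array-of-tables headers ([[...]]) are deliberately not matched,
# and no attempt is made to handle headers inside multi-line strings.
HEADER = re.compile(r'^\s*\[\s*([^\[\]]+?)\s*\]\s*(?:#.*)?$')


def find_duplicate_tables(text):
    """Return the sorted table names that appear more than once in `text`."""
    names = []
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            names.append(match.group(1).strip('"\''))
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


# Registry content mirroring the diff above (existing entry plus the new PR entry).
REGISTRY = '''
[datetime]
"1.7.0" = {git="https://github.com/wavebitscientific/datetime-fortran", tag="v1.7.0"}
"latest" = {git="https://github.com/wavebitscientific/datetime-fortran"}

[datetime]
"1.0.0" = {git="https://github.com/other/my_package", tag="v1.0.0"}

[M_calculator]
"latest" = {git="https://github.com/urbanjost/M_calculator"}
'''

duplicates = find_duplicate_tables(REGISTRY)
print(duplicates)
```

A real check would still run the TOML parser afterwards; the scan only makes the failure message friendlier.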

However, we should discuss how to handle package name conflicts. Two main options:

  1. All package names will be made unique by prefixing them with their GitHub/GitLab organization name, so M_calculator becomes urbanjost/M_calculator. I believe that is what Go does.

  2. Package names are not prefixed, and we need to provide some guidelines. We can expect a situation similar to Rust's, where people submit lots of new packages with the most obvious names, such as "spline", "harmonics", "fft", "lapack", and those packages do not necessarily end up being the best maintained. And changing the name of a package means breaking people's builds.
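To make option 1 concrete, here is a minimal sketch of how a prefixed name could be derived from a registry entry. The function `qualified_name` is hypothetical, and it assumes the prefix is simply the first path component of the git URL:

```python
from urllib.parse import urlparse


def qualified_name(package, git_url):
    """Prefix `package` with the owner/organization taken from `git_url`.

    Assumption: the owner is the first path component of the URL, which holds
    for typical GitHub and GitLab URLs but not for every hosting layout.
    """
    path = urlparse(git_url).path.strip("/")
    owner = path.split("/")[0]
    return f"{owner}/{package}"


print(qualified_name("M_calculator", "https://github.com/urbanjost/M_calculator"))
print(qualified_name("datetime", "https://github.com/wavebitscientific/datetime-fortran"))
```

Note that the registry name and the repository name can differ (`datetime` lives in `datetime-fortran`), so the sketch keeps the registry name and only borrows the owner from the URL.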

The advantages of 1:

  • We can always switch to 2. later if needed
  • The chance of collision of the GitHub / GitLab / ... prefix is relatively low (if it happens, we'll handle it on a case by case basis)
  • We can list the main package, say fortran-lang/stdlib as well as a fork, say certik/stdlib, if the fork has some versions or branches that the main repository does not

Disadvantages of 1:

  • I remember the package by its name, such as "datetime", and I don't want to remember what the prefix is. Although in practice I assume one would do fpm search datetime to obtain the actual version to use (which I won't remember anyway). And fpm search datetime can print out the exact line to put into fpm.toml, such as wavebitscientific/datetime = "1.7.0". So this might not be any more difficult than just datetime = "1.7.0".
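A sketch of what such a search could print, under stated assumptions: `fpm search` is only proposed in this comment, so the `search` function, the in-memory `REGISTRY`, and the output format below are all hypothetical. One detail worth noting is that TOML bare keys cannot contain `/`, so a prefixed name would need to be quoted in `fpm.toml`.

```python
# Hypothetical registry keyed by prefixed names, with the versions on offer.
REGISTRY = {
    "wavebitscientific/datetime": ["1.7.0", "latest"],
    "urbanjost/M_calculator": ["latest"],
}


def version_key(version):
    """Turn '1.7.0' into (1, 7, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def search(term, registry=REGISTRY):
    """Return ready-to-paste dependency lines for packages matching `term`."""
    lines = []
    for name in sorted(registry):
        if term.lower() not in name.lower():
            continue
        tagged = [v for v in registry[name] if v != "latest"]
        # Prefer the newest tagged version; fall back to "latest" if none exist.
        version = max(tagged, key=version_key) if tagged else "latest"
        # The key is quoted because "/" is not allowed in a bare TOML key.
        lines.append(f'"{name}" = "{version}"')
    return lines


for line in search("datetime"):
    print(line)
```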

Advantages of 2:

  • Shorter names
  • Possible to change where the package lives. For example, if wavebitscientific/datetime is moved to fortran-lang/datetime, then nothing changes in the fpm.toml. With option 1, one would really need to stick to wavebitscientific/datetime as a name, or provide some kind of redirect.

Disadvantages of 2:

  • First come, first served: in a few years all the lucrative names will already be taken
  • Possible conflicts
@milancurcic
Member

milancurcic commented Jul 23, 2020

I like option 2.

Guidelines could be same or similar to the criteria for the package index. So, to submit a package to the registry and get it merged, your package would need to check all the boxes.

Then, it's first come first serve. I don't think this is necessarily bad. If somebody wants to get first to the registry, well, they have to make a relevant, mature, and unique package according to the guidelines. If first come first serve would encourage faster development and release of Fortran packages, I think that's an advantage.

This option also wouldn't preclude you from rolling your own package with the same name as another package that's already in the registry. Consider the datetime package. Once fpm is able to get packages from the registry, you could do:

[dependencies]
datetime = "latest"

or:

[dependencies]
datetime = "1.7.0"

Now, suppose that you roll your own certik/datetime and want your users to use it. You can still do it, you'd just need to instruct them to specify it like this:

[dependencies]
datetime = {git = "https://github.com/certik/datetime"}

Just like Cargo. I think it can work well.
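The lookup rule implied by these three snippets can be sketched as follows. This is an illustration only, not fpm's implementation: the `resolve` function is hypothetical, and the `REGISTRY` dictionary simply mirrors the `datetime` entry from `registry.toml`.

```python
REGISTRY = {
    "datetime": {
        "1.7.0": {"git": "https://github.com/wavebitscientific/datetime-fortran", "tag": "v1.7.0"},
        "latest": {"git": "https://github.com/wavebitscientific/datetime-fortran"},
    },
}


def resolve(name, spec, registry=REGISTRY):
    """Map one [dependencies] entry to the source it should be fetched from."""
    if isinstance(spec, dict):
        # An explicit source such as {git = "..."} bypasses the registry,
        # so a same-named package elsewhere causes no conflict.
        return spec
    try:
        # A plain string is a version (or "latest") looked up by package name.
        return registry[name][spec]
    except KeyError:
        raise LookupError(f"{name} {spec!r} is not in the registry") from None


print(resolve("datetime", "1.7.0"))
print(resolve("datetime", {"git": "https://github.com/certik/datetime"}))
```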

@milancurcic
Member

That being said, if we choose option 2 and decide to follow the guidelines, we'll need a somewhat strict review process, which is not what we did in my PRs to registry.toml today. I merely wanted to get some fpm packages in so we have stuff to look at and think about the format and design. However, it's quite likely that all the packages that are in registry.toml satisfy the criteria already.

@certik
Member Author

certik commented Jul 23, 2020 via email

@arjenmarkus
Member

I like the idea of simple names as well (option 2). It also gives a certain status to the packages, certainly with a reviewing process in place. If we decide at some point in time that a particular package should no longer be used, we can deprecate it and after a decent period rename it to "package-deprecated" or the like.
A review process would hinder the ad libitum submission of half-baked packages (I have seen enough of them - and I am guilty of my share ;)).

@certik
Member Author

certik commented Jul 23, 2020

If we think that a new given package is not mature enough to use a common name like fft, we can suggest for the package to use a more unique name, such as certik_fft or something like that, then there is less of a problem getting it in, and then in a year or two, if the package matures and it is still the best candidate for the fft name, we can consider adding it as fft.

In other words, I suggest the name of the package to be part of the review process.
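Purely as an illustration of how CI could surface this to reviewers, here is a tiny sketch. The set of "common" names is taken from the examples earlier in this thread; the function name and the idea of an automated flag are assumptions, not an agreed policy.

```python
# Names mentioned in this thread as obvious candidates for contention.
GENERIC_NAMES = {"spline", "harmonics", "fft", "lapack"}


def needs_name_review(package_name):
    """Return True if the requested name is generic enough to merit extra review."""
    return package_name.lower() in GENERIC_NAMES


for name in ("fft", "certik_fft"):
    print(name, needs_name_review(name))
```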

@certik
Member Author

certik commented Jul 23, 2020

P.S. Regarding deprecation, we should enforce semantic versioning like Cargo does. Let's say that an fft package should be replaced with a better-maintained alternative. We can simply bump the major semantic version when we replace the package. That way, no code will break. Existing codes will continue using the old version; if they want a newer version of the old package, they use its new name; and if they want the new package, they increase the major semantic version.

So I think this can work.
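A minimal sketch of why the major bump protects existing builds, assuming Cargo-style default ("caret") matching where a requirement such as "1.0.0" accepts any version with the same major number that is at least as new. The helper names are hypothetical, and the sketch ignores Cargo's special handling of 0.x versions and pre-release tags.

```python
def parse(version):
    """Turn '1.2.0' into (1, 2, 0) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def compatible(required, candidate):
    """Same major version, and candidate is not older than the requirement."""
    req, cand = parse(required), parse(candidate)
    return cand[0] == req[0] and cand >= req


def pick(required, available):
    """Choose the newest available version compatible with `required`."""
    matches = [v for v in available if compatible(required, v)]
    return max(matches, key=parse) if matches else None


# 1.x is the old fft package; 2.0.0 is the replacement published under the same name.
available = ["1.0.0", "1.2.0", "2.0.0"]
print(pick("1.0.0", available))  # existing users stay on the old package
print(pick("2.0.0", available))  # opting in to the replacement is explicit
```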

@everythingfunctional
Member

I like option 2. And if anyone really wants their package to have a conflicting name in the future, they can stand up their own registry and have their users specify it like

conflicting_package = { registry = "my.registry.com", version = "1.0.0" }

or

conflicting_package = { git = "https://github.com/somebody/conflicting.git", tag = "v1.0.0" }

@urbanjost
Contributor

As an analogy, people often just had a first name; when ambiguities began as populations grew and frequent communication between groups became common, surnames and titles evolved. But in the modern world you have something akin to a Social Security number.

You might change your name or move (I have renamed some repositories and forgotten how that might affect the registry, actually), but the number remains the same. Maybe a reviewed version should be given a UUID that supersedes the names? "latest" would go unassigned, emphasizing that it might change. And I think "latest" should have to be explicitly requested, not assumed by default, and you should at least get a warning (maybe it should not be allowed) if you use an external package that has an internal "latest" dependency.

To prevent issues with deleted, moved, or renamed packages, should a registered versioned release be archived at the repository? Should there be a "real" repository category where some criteria similar to "likes" could be used to trigger a copy? Not sure if anyone wants to supply that type of infrastructure, but it might be as simple as "subrepositories" in a GitHub site. GitHub has something like that, but I have not tried it and I see a good number of complaints about it, and I believe fees would be required if it exceeded a certain size.

Allowing someone to easily create their own local repository to help ensure something is still there ten years later might place the burden on the user instead of the fpm project. But as I use remote dependencies more, I keep wondering what the best practice is for avoiding problems when the remote package is not available; a local repository would also eliminate some of the problems for sites that are off the grid. Version numbers and tags are awfully easy to abuse.
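The suggested warning for floating dependencies could look roughly like this. It is a sketch under assumptions: `unpinned` is a hypothetical helper, the dependency table shape follows the `fpm.toml` snippets earlier in the thread, and a git dependency counts as pinned only if it carries a `tag`.

```python
def unpinned(dependencies):
    """Return the names of dependencies that can change without notice."""
    flagged = []
    for name, spec in sorted(dependencies.items()):
        if spec == "latest":
            # Registry dependency that follows whatever "latest" points to.
            flagged.append(name)
        elif isinstance(spec, dict) and "git" in spec and "tag" not in spec:
            # Direct git dependency with no tag, i.e. a moving branch head.
            flagged.append(name)
    return flagged


deps = {
    "datetime": "latest",
    "M_calculator": {"git": "https://github.com/urbanjost/M_calculator"},
    "stdlib": {"git": "https://github.com/fortran-lang/stdlib", "tag": "v1.0.0"},
    "fft": "1.2.0",
}

for name in unpinned(deps):
    print(f"warning: dependency '{name}' is not pinned to a version")
```

The `stdlib` tag and the `fft` version in the example data are made up for the demonstration.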
