Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[All] Make username_claim callable (except for CILogon), like it has been in Generic #717

Merged
merged 3 commits into from
Feb 7, 2024

Conversation

yuvipanda
Copy link
Collaborator

While trying to use Auth0 for authentication in one of our hubs, we discovered that the most useful username_claim (sub) produces usernames that look like
oauth2|cilogon|http://cilogon.org/servera/users/43431 (when using auth0 with CILogon). The last part of sub is generally whatever is passed on to auth0, so it's going to be different for different users.

I had thought username_claim was a callable, but turns out that's only true for GenericOAuthenticator. I think it's pretty useful for every authenticator, so I've just moved that functionality out to the base class instead. I also added a test to verify it works. The test is in GenericOAuthenticator because it was the easiest place to put it, but it works across authenticators. This also means it is fully backwards compatible.

While trying to use Auth0 for authentication in one of our
hubs, we discovered that the most useful username_claim (`sub`)
produces usernames that look like
`oauth2|cilogon|http://cilogon.org/servera/users/43431`
(when using auth0 with CILogon). The last part of `sub` is
generally whatever is passed on to auth0, so it's going to
be different for different users.

I had thought `username_claim` was a callable, but turns out
that's only true for GenericOAuthenticator. I think it's
pretty useful for every authenticator, so I've just moved
that functionality out to the base class instead. I also
added a test to verify it works. The test is in GenericOAuthenticator
because it was the easiest place to put it, but it works
across authenticators. This also means it is fully backwards
compatible.
@consideRatio
Copy link
Member

consideRatio commented Jan 18, 2024

I'd like to hash out if we really prefer to make existing config username_claim callable as compared to something else, at least compared to the alternative of adding a new config to pick out the username part from a username_claim and deprecating Generic's username_claim as callable.

Below I'll make a case for adding new config instead of expanding use of username_claim as callable, but I'd like to start with providing a codebase overview.

Codebase overview

The username_claim config

  1. OAuthenticator defines username_claim where its name still directly aligns with its intent - to be the claim from where we derive the username.
  2. Generic and only generic currently overrides username_claim to be callable - and when its callable its not meant to return a username claim anymore but the username itself.
  3. CILogon overrides username_claim to be unusable, pointing users towards defining username_claim under allowed_idps[<idp>].username_derivation instead.

The OAuthenticator.user_info_to_username(self, user_info) function

It picks the username based on config such as username_claim. Generic overrides this in order to respect its callable variable of username_claim. Both user_info_to_username and Generic's callable userinfo_claim gets passed the user_info, but only user_info_to_username is passed self.

The Authenticator.normalize_username function

Defined in the Authenticator base class without being overridden, its called by OAuthenticator.authenticate to process the return value of user_info_to_username.

Why I hesitate on making username_claim as callable

  • Unexpected return value
    Generic's username_claim as callable returns something different than a username claim, but the username itself. This is a confusion point and adds complexity for users and maintainers but also for people inspecting config they haven't read up on yet, like a new JupyterHub admin taking over an existing deployment.
  • Callable config and YAML representation
    Callable config can only be parsed from Python code config, while some deployments provide most authentication config via YAML passthrough config (z2jh's hub.config). While its possible to put this part in hub.extraConfig where Python code strings can be put, it forces config related to auth to be separated which makes the it harder to understand and maintain.

Alternative change proposal 1 - regex dedicated config

  • We deprecate Generic username_claim as a callable
  • We add username_regex as a string for OAuthenticator (and re-implements it in CILogon under allowed_idps.username_derivation), where if its set it will extract the username relevant part specifically.

Alternative change proposal 2 - a regex/callable variation

Like the change proposal above, but where username_regex is named username_picker or similar, that can be either a regex string or callable being passed the user info object.

With this approach, we could deprecate Generic's username_claim as callable with a traitlet validator that sets username_picker instead whenever username_claim as long as we stick with the function signature of passing just user_info.

@consideRatio
Copy link
Member

@yuvipanda and other reviewers, could you try to rank how the strategies below based on what you think will be best?

  1. username_claim as string/callable not only by Generic
  2. username_regex as new string config
  3. username_picker as new string/callable config

@manics
Copy link
Member

manics commented Jan 18, 2024

I'm in favour of minimising the number of configuration properties. How confident are you that username_regex works for all cases? Is username_claim always a string?

@yuvipanda
Copy link
Collaborator Author

Thanks for thinking about this, @consideRatio.

I generally think we should not be introducing regexes wherever possible (https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/). So while I understand the positives of it allowing YAML based config, I think overall introducing regexes to something as sensitive as username validation is not something we should do. Regexes are very very easy to get wrong, and very hard to debug, and escaping mismatches can cause security issues here. So I don't think we should be using regexes here.

I think the pattern of Union(<item>, Callable) where the Callable will return the same thing that could have been statically presented is very prevelant in JupyterHub ecosystem, and also extremely powerful in a clean way. The ability to have arbitrary python code is one of the core strengths of traitlets, and I love us leaning into it. This python code in the test for handling CILogon is actually pretty clean and readable, compared to what one would have to do with a regex. So the Union(<item>, Callable) is a pattern we should keep and promote.

I do understand your concern a bout the property itself being called username_claim, and not actually returning a claim key to be used but the username itself. The name seems fine to me, but I understand it's subjective.

So from #717 (comment) (which is very helpful btw), I'd say we should just not do (1). My preference is for (0), but instead of (2) let me propose a different alternative.

Proposal #4

We have a callable called username_from_user_info that is just a callable. This exists in the base OAuthenticator (and can be ported to CILogon as well). When the admin sets this, username_claim will be ignored (but not forbidden, as the callable may use this setting). This would be a breaking change for GenericOAuthenticator, (username_claim will have to be string only) so it can be consistent. Let's call this option 4.

My preference would be still to just move the functionality as is from GenericOAuthenticator to OAuthenticator (what this PR does), for the following reasons:

  1. No backwards compatibility break anywhere. It's just new functionality. This is the strongest reason for me.
  2. I think the naming is intuitive enough, but I understand this is subjective.
  3. Fewer config options (as @manics said)

So my ordering would be your option (0) and then my proposed option (4).

@consideRatio
Copy link
Member

Thanks for reasoning with me about options, i think option 4 leaves open questions related to having also a function named user_info_to_username.

Okay so going with 0 feels more okay to me now that this has been deliberated on a bit more.

Review feedback for this PR assuming continued path on option 0:

yuvipanda added a commit to yuvipanda/oauthenticator that referenced this pull request Jan 18, 2024
@yuvipanda
Copy link
Collaborator Author

Thanks @consideRatio. I've cleaned up the extra redefention in Generic here. I started working on CILogon, but realized we'll have to figure out how to make the jsonschema based validation we do there work, since it can't handle callables by default. I'm also not sure how to get the unit tests to pass properly. So I've split that out into as a separate PR #718. Do you think this PR can proceed, and we can deal with CILogon separately?

@yuvipanda yuvipanda changed the title Move username_claim being callable to oauth2 autheneticator Move username_claim being callable to OAuth2 base authenticator Jan 20, 2024
@yuvipanda yuvipanda changed the title Move username_claim being callable to OAuth2 base authenticator Make username_claim callable in all Authenticators, not just GenericOAuthenticator Jan 20, 2024
@yuvipanda
Copy link
Collaborator Author

Anything else I can do to get this merged? :)

Copy link
Member

@GeorgianaElena GeorgianaElena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks good to me @yuvipanda!

@consideRatio consideRatio changed the title Make username_claim callable in all Authenticators, not just GenericOAuthenticator Make username_claim callable in all Authenticators except CILogon, like it has been in Generic Feb 7, 2024
@consideRatio consideRatio merged commit 9b96d47 into jupyterhub:main Feb 7, 2024
11 checks passed
@consideRatio
Copy link
Member

Thank you @yuvipanda for taking time to reason about things!

I've opened #728 and updated the title to reflect that CILogon isn't providing a username_claim config that is callable still.

@consideRatio consideRatio changed the title Make username_claim callable in all Authenticators except CILogon, like it has been in Generic [All] Make username_claim callable (except for CILogon), like it has been in Generic Mar 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants