MSAL MSI with Credentials - Authentication Design #5096

Open: wants to merge 8 commits into `main`.

@gladjohn (Contributor) commented Jan 22, 2025

This pull request adds design documents that give detailed guidance on implementing and handling the MSI V2 `/credential` endpoint, covering token acquisition, short-lived credential (SLC) revocation, and probing logic for VM/VMSS. The key changes are summarized below:

Implementation of MSI V2 /credential Endpoint:

  • Added a design document outlining the token acquisition process for MSI V2 using the /credential endpoint, including steps for certificate handling and source detection logic.

SLC Revocation Specification:

  • Introduced a specification for handling short-lived credential (SLC) revocation scenarios, detailing the process for obtaining new credentials when an existing one is revoked or invalid.

VM/VMSS Credential Endpoint Probe Logic:

  • Added documentation on the probe logic to determine the availability of the MSI V2 /credential endpoint in IMDS for VM/VMSS, including handling IMDS restart scenarios and expected responses.

@gladjohn requested a review from a team as a code owner on January 22, 2025, 18:27.

To start the flow, MSAL requires a certificate. MSAL follows these steps:

1. **Check for an existing certificate**: MSAL looks for a specific certificate (`devicecert.mtlsauth.local`).
**Review comment (Member):**

Where does it look?

This section outlines the necessary steps to acquire an access token using the MSI V2 `/credential` endpoint.

### 1. Check for an Existing (Platform) Certificate
- Search for a specific certificate (`devicecert.mtlsauth.local`) in `Cert:\LocalMachine\My`.
**Review comment (Member):**

What about Linux?

- If found, extract its thumbprint and use it for authentication.
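A minimal sketch of step 1, assuming the certificate store has already been enumerated into a list of entries. The `StoredCert` type, the `find_platform_cert` helper, and the `CN=` subject form are hypothetical; skipping expired certificates is an assumption the design does not state, and the Linux location is still an open review question.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional

# Assumed subject form of the platform certificate named in the design.
PLATFORM_CERT_SUBJECT = "CN=devicecert.mtlsauth.local"


@dataclass
class StoredCert:
    """Hypothetical view of one certificate-store entry."""
    subject: str
    not_after: datetime
    der: bytes  # DER-encoded certificate


def find_platform_cert(store: List[StoredCert],
                       now: Optional[datetime] = None) -> Optional[StoredCert]:
    """Return the platform certificate, or None so the caller falls through to step 2."""
    if now is None:
        now = datetime.now(timezone.utc)
    matches = [c for c in store
               if c.subject == PLATFORM_CERT_SUBJECT and c.not_after > now]
    # If several match, prefer the one with the most remaining validity (assumption).
    return max(matches, key=lambda c: c.not_after, default=None)
```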

### 2. Generate a New Certificate (If the Specific Certificate Is Not Found)
- Create a new self-signed certificate with a 90-day validity.
**Review comment (@bgavrilMS, Member, Jan 28, 2025):**

Please detail how to create it: in memory, or created and placed in a certificate store, etc. Also specify what to do after the 90 days.
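The 90-day window itself is simple date arithmetic; a sketch follows. The helper names are hypothetical, and what should happen once `not_after` passes is exactly the open question above, so the code only reports validity and does not pick a renewal policy.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# From the design: self-signed certificates are valid for 90 days.
CERT_VALIDITY = timedelta(days=90)


def validity_window(created_at: datetime) -> Tuple[datetime, datetime]:
    """Return (not_before, not_after) for a certificate generated at created_at."""
    return created_at, created_at + CERT_VALIDITY


def is_within_validity(not_before: datetime, not_after: datetime, now: datetime) -> bool:
    """True while now lies in [not_before, not_after], inclusive as in X.509."""
    return not_before <= now <= not_after


start = datetime(2025, 1, 22, tzinfo=timezone.utc)
not_before, not_after = validity_window(start)
print(not_after.date())  # 2025-04-22
```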

- Send a POST request to the IMDS `/credential` endpoint with the certificate details.
- The request must include:
  - `Metadata: true` header.
  - `X-ms-Client-Request-id` header with a GUID.
  - JSON body containing the certificate's public key in `jwk` format.

**Review comment (Member), on the `X-ms-Client-Request-id` header:**

Presumably this is the correlation ID, similar to CCA's correlation ID? And does it also imply that we should add a `WithCorrelationID` API to `MSIApplication`?
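A sketch of assembling that POST with Python's standard `urllib.request`, without sending it. The IMDS address `169.254.169.254` is the standard link-local endpoint and the path comes from the probe table; the `{"jwk": ...}` body wrapper, the `Content-Type` header, and the `build_credential_request` name are assumptions, and any query parameters (such as an API version) are omitted because the excerpt does not show them.

```python
import json
import urllib.request
import uuid
from typing import Optional

# Standard IMDS link-local address plus the /credential path; query parameters omitted.
CREDENTIAL_URL = "http://169.254.169.254/metadata/identity/credential"


def build_credential_request(jwk: dict,
                             correlation_id: Optional[str] = None) -> urllib.request.Request:
    """Build, but do not send, the POST to the IMDS /credential endpoint."""
    body = json.dumps({"jwk": jwk}).encode("utf-8")  # {"jwk": ...} wrapper is an assumption
    headers = {
        "Metadata": "true",
        # A fresh GUID per request unless the caller supplies one (e.g. a correlation ID).
        "X-ms-Client-Request-id": correlation_id or str(uuid.uuid4()),
        "Content-Type": "application/json",  # assumption
    }
    return urllib.request.Request(CREDENTIAL_URL, data=body, headers=headers, method="POST")
```

`urllib` normalizes header names with `str.capitalize()`, so `X-ms-Client-Request-id` is stored as `X-ms-client-request-id`; HTTP header names are case-insensitive, so the request is unaffected.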

```powershell
# Create a new self-signed certificate (90-day validity, per step 2)
$cert = New-SelfSignedCertificate `
    -Subject $certSubject `
    -CertStoreLocation "Cert:\LocalMachine\My" `
    -NotAfter (Get-Date).AddDays(90)
```
**Review comment (Member):**

But in MSAL, shouldn't we store it in memory?

```powershell
@{
    kty = "RSA"
    use = "sig"
    alg = "RS256"
    kid = $cert.Thumbprint
}
```
**Review comment (Member), on `kid = $cert.Thumbprint`:**

Do not use `Thumbprint`. It is SHA-1 based, and SHA-1 is banned by the Crypto board. Use a SHA-256 thumbprint instead.
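Following that comment, a sketch of a SHA-256 thumbprint over the DER-encoded certificate, in two common encodings: uppercase hex (the same layout as the SHA-1 `Thumbprint` property, with a different hash) and unpadded base64url, which is how a JWK `x5t#S256` member carries it (RFC 7517). Which encoding the design should put in `kid` is left open here; both helper names are hypothetical.

```python
import base64
import hashlib


def sha256_thumbprint_hex(cert_der: bytes) -> str:
    """Uppercase hex SHA-256 of the DER certificate (same layout as the SHA-1 Thumbprint)."""
    return hashlib.sha256(cert_der).hexdigest().upper()


def sha256_thumbprint_b64url(cert_der: bytes) -> str:
    """Unpadded base64url SHA-256 of the DER certificate, the form used by JWK "x5t#S256"."""
    digest = hashlib.sha256(cert_der).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```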

| API Name | Purpose |
|----------------------------------|-----------------------------------------------------------|
| `WithClaims()` | Allows passing of claims (bypasses cache). |
| `GetBindingCertificate()` | Helper method to get the binding certificate. |
| `GetManagedIdentitySourceAsync()`| Helper method to get the managed identity source. |
| `WithProofOfPossession()` | Requests a PoP token instead of a default Bearer token. |
**Review comment (Member), on `WithProofOfPossession()`:**

This is not the actual API name, and I don't think you should include it here.

| API Name | Purpose |
|----------------------------------|-----------------------------------------------------------|
| `WithClientCapabilities()` | Allows specifying client capabilities. |
| `WithClaims()` | Allows passing of claims (bypasses cache). |
| `GetBindingCertificate()` | Helper method to get the binding certificate. |
| `GetManagedIdentitySourceAsync()`| Helper method to get the managed identity source. |
**Review comment (Member), on `GetManagedIdentitySourceAsync()`:**

This already exists in all MSALs, no? For the Azure SDK. Do we add an extra source?


## Summary of New APIs

| API Name | Purpose |
|----------------------------------|-----------------------------------------------------------|
**Review comment (Member):**

You need to specify where these APIs will be exposed.

# Short-Lived Credential (SLC) Revocation Specification
**Review comment (Member):**

There are quite a few docs here. How about creating a directory for MSIv2 and adding a short index.md to organize everything?

| API Name | Purpose |
|----------------------------------|-----------------------------------------------------------|
| `WithClientCapabilities()` | Allows specifying client capabilities. |
| `WithClaims()` | Allows passing of claims (bypasses cache). |
| `GetBindingCertificate()` | Helper method to get the binding certificate. |
**Review comment (Member), on `GetBindingCertificate()`:**

What does this do when not hosted on a VM?

- Convert the certificate to a Base64-encoded string (`x5c`).
- Format the JSON payload containing the certificate details for request authentication.
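A sketch of those two bullets, assuming the certificate is available as DER bytes. The member values come from the JWK excerpt earlier in this thread; `x5c` uses standard, padded base64 rather than base64url, per RFC 7517. The `{"jwk": ...}` wrapper and the helper names are hypothetical, and the RSA `n`/`e` members are omitted because extracting them requires parsing the certificate.

```python
import base64
import json


def build_jwk(cert_der: bytes, kid: str) -> dict:
    """JWK members from the design excerpt, plus the certificate in x5c."""
    return {
        "kty": "RSA",
        "use": "sig",
        "alg": "RS256",
        "kid": kid,  # e.g. a SHA-256 thumbprint, per the review feedback
        # x5c holds standard (padded) base64 of the DER certificate, not base64url.
        "x5c": [base64.b64encode(cert_der).decode("ascii")],
    }


def build_payload(cert_der: bytes, kid: str) -> str:
    """Serialize the request body; the {"jwk": ...} wrapper is an assumption."""
    return json.dumps({"jwk": build_jwk(cert_der, kid)})
```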

### 4. Request MSI Credential
**Review comment (Member):**

What about credential caching? Where do we cache this?

| Step | Action | Condition | Next Step |
|------|--------|-----------|-----------|
| **1️⃣** | Send a **POST request** to `/metadata/identity/credential` **without headers**, using `.` as the body. | | |
| **2️⃣** | **Check HTTP response status.** | | |
| **3️⃣** | If **400 Bad Request**, the `/credential` endpoint **is available**. | | Proceed with token acquisition. |
| **4️⃣** | If **500 Internal Server Error**, check the **Server** header. | `"Microsoft-IIS/10.0"` (no `IMDS/`) | Retry the request (IMDS might be restarting). |
**Review comment (Member):**

Please give details about the retry mechanism.
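A sketch of the decision in steps 2 to 4 for a single probe response, assuming the status code and `Server` header have already been read. Treating any `Microsoft-IIS` server string without `IMDS/` as "restarting" is one reading of the table, and anything else maps to the table's unexpected case; the enum and function names are hypothetical.

```python
from enum import Enum
from typing import Optional


class ProbeResult(Enum):
    AVAILABLE = "available"    # 400: /credential exists, proceed with token acquisition
    RETRY = "retry"            # 500 from IIS without IMDS/: IMDS may be restarting
    UNEXPECTED = "unexpected"  # anything else: log, or fall back to IMDS /token


def classify_probe(status: int, server_header: Optional[str]) -> ProbeResult:
    """Map one probe response to the next action from the probe table."""
    server = server_header or ""
    if status == 400:
        return ProbeResult.AVAILABLE
    if status == 500 and server.startswith("Microsoft-IIS") and "IMDS/" not in server:
        return ProbeResult.RETRY
    return ProbeResult.UNEXPECTED
```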


- **[SLC Design Document](https://microsoft.sharepoint.com/:w:/t/AzureMSI/EURnTEtFXPlDngpYhCUioqUBvbSUWEX7vZjP0nm8bxUsQA?e=Ejok1n&wdLOR=cE6820299-49AF-4D7A-B7F7-F58D65C232B6)**
- **[MSAL EPIC](https://identitydivision.visualstudio.com/Engineering/_workitems/edit/3027078)**

**Review comment (Member):**

There needs to be a telemetry chapter. At a minimum we'd want to know if we are dealing with a self-generated cert or one read from the store. Anything else?

| Step | Action | Condition | Next Step |
|------|--------|-----------|-----------|
| **5️⃣** | If the response does not match the above cases, treat it as **unexpected behavior**. | | Log the issue or fall back to IMDS `/token` if applicable. |

---
**Review comment (Member):**

Telemetry chapter? Something like the probe outcome: "found the endpoint with no retries", "found with retries", and "not found".
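A sketch combining the probe with retries and the outcome labels suggested above. `MAX_RETRIES` and `RETRY_DELAY_SECONDS` are placeholders, since the retry mechanism is still an open question in this review; `send_probe` is a hypothetical callable returning `(status, server_header)`, and the `sleep` parameter exists only so tests can skip the delay.

```python
import time
from typing import Callable, Optional, Tuple

MAX_RETRIES = 3          # placeholder: the retry policy is still under review
RETRY_DELAY_SECONDS = 1  # placeholder


def probe_with_retries(send_probe: Callable[[], Tuple[int, Optional[str]]],
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Probe /credential and return a telemetry label for the outcome."""
    for attempt in range(MAX_RETRIES + 1):
        status, server_header = send_probe()
        if status == 400:
            return "found_no_retries" if attempt == 0 else "found_with_retries"
        server = server_header or ""
        restarting = (status == 500 and server.startswith("Microsoft-IIS")
                      and "IMDS/" not in server)
        if not restarting:
            return "not_found"  # unexpected response: caller may fall back to IMDS /token
        if attempt < MAX_RETRIES:
            sleep(RETRY_DELAY_SECONDS)
    return "not_found"  # IMDS still looked like it was restarting after all retries
```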

## **MSAL Behavior to Relay the Signal to IMDS**

- MSAL can determine only that an `{ "error": "invalid_client" }` response is caused by a credential issue; it cannot handle suberrors explicitly.
- For SLC-related errors, MSAL will retry obtaining a new SLC from IMDS and retry with eSTS.
**Review comment (Member):**

The retry logic needs details.
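A sketch of the SLC refresh path described above, assuming exactly one retry (the count is part of what the reviewer wants specified). `InvalidClientError`, `get_slc`, and `request_token` are hypothetical stand-ins for the eSTS error, the IMDS `/credential` call, and the eSTS token request.

```python
from typing import Callable


class InvalidClientError(Exception):
    """Hypothetical: raised when eSTS returns {"error": "invalid_client"}."""


def acquire_with_slc_refresh(get_slc: Callable[[bool], str],
                             request_token: Callable[[str], dict]) -> dict:
    """Call eSTS with the current SLC; on invalid_client, get a new SLC and retry once."""
    slc = get_slc(False)  # False: a cached SLC is acceptable
    try:
        return request_token(slc)
    except InvalidClientError:
        # MSAL cannot inspect suberrors, so any invalid_client triggers one refresh.
        fresh_slc = get_slc(True)  # True: force a new SLC from IMDS
        return request_token(fresh_slc)  # a second failure propagates to the caller
```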


```
return tokenResponse;
```

**Review comment (Member):**

The telemetry and acceptance-test chapters are missing.


- For claims challenges, MSAL does not get the signal from eSTS; instead, it comes from the app developer, who passes the claims to MSAL.
**Review comment (Member):**

Probably deserves its own chapter?
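A sketch of the claims path: because the signal comes from the app developer rather than from eSTS, the only change is that supplying claims skips the cache lookup, matching the `WithClaims()` row in the API table. The dictionary cache keyed by resource and the `acquire_from_sts` callable are hypothetical.

```python
from typing import Callable, Dict, Optional


def get_token(cache: Dict[str, str], resource: str, claims: Optional[str],
              acquire_from_sts: Callable[[str, Optional[str]], str]) -> str:
    """Serve from the cache unless the app passed claims; claims always go to the network."""
    if claims is None and resource in cache:
        return cache[resource]
    token = acquire_from_sts(resource, claims)
    cache[resource] = token  # the fresh token replaces any cached one
    return token
```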
