Skip to content

Commit

Permalink
vault: limit token delay to not exceed token TTL
Browse files Browse the repository at this point in the history
This commit fixes a bug in the Vault token renewal.
When KES renews its current token `T1` and receives
a new token `T2`, KES waits a certain amount of time
before using `T2` to account for replication lag between
Vault nodes.

Vault tokens are not signed but static secrets. Each Vault
node in a dist. cluster needs to know the token before being
able to verify it. Without the usage delay described above,
KES might send a request to a Vault node that has not received
the new token `T2`, yet.

Now, KES must also not wait longer then the remaining TTL of the
current token `T1`. Otherwise, `T1` expires BEFORE KES starts using
`T2`. This results in auth errors like the following:

```
Error making API request.\n\nURL: GET http://127.0.0.1:8200/v1/kv/data/my-key3\nCode: 403. Errors:\n\n* 2 errors occurred:\n\t* permission denied\n\t* invalid token\n\n" req.method=POST req.path=/v1/key/create/my-key3 req.ip=127.0.0.1 req.identity=a49fea12c5d1c69f1eba1e4697e62ccdbe389a80f317191892711b47d83c3e85
```

This commit limits the max. time KES waits before using the new token `T2`
to either half the remaining TTL of `T1` or 30s - whatever is shorter.
This ensures that `T1` is still valid once KES switches to `T2`.

Signed-off-by: Andreas Auernhammer <[email protected]>
  • Loading branch information
aead committed Jan 29, 2025
1 parent be730c6 commit 9eb2d68
Showing 1 changed file with 11 additions and 3 deletions.
14 changes: 11 additions & 3 deletions internal/keystore/vault/client.go
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@ package vault
import (
"context"
"errors"
"net/http"
"os"
"path"
"strings"
Expand Down Expand Up @@ -195,8 +196,10 @@ func (c *client) RenewToken(ctx context.Context, authenticate authFunc, secret *
continue
}

renewIn := 80 * (ttl / 100) // Renew token after 80% of its TTL has passed
ttl = 0 // Set TTL to zero to trigger an immediate re-authentication in case of auth failure
renewIn := 80 * (ttl / 100) // Renew token after 80% of its TTL has passed
delay := min((ttl-renewIn)/2, Delay) // Delay usage of renewed token but not beyond expiry
ttl = 0

select {
case <-ctx.Done():
return
Expand All @@ -210,6 +213,9 @@ func (c *client) RenewToken(ctx context.Context, authenticate authFunc, secret *
if err == nil {
break
}
if resp, ok := err.(*vaultapi.ResponseError); ok && http.StatusBadRequest <= resp.StatusCode && resp.StatusCode < http.StatusInternalServerError {
break // Don't retry on 4xx responses
}
}
if s == nil {
s, _ = authenticate(ctx)
Expand All @@ -225,10 +231,12 @@ func (c *client) RenewToken(ctx context.Context, authenticate authFunc, secret *
// Wait before we use the new auth. token. This accounts
// for replication lag between the Vault nodes and allows
// them to sync the token across the entire cluster.
// However, we must not wait longer than the remaining lifetime
// of the currently used token.
select {
case <-ctx.Done():
return
case <-time.After(Delay):
case <-time.After(delay):
}
c.SetToken(token) // SetToken is safe to call from different go routines
}
Expand Down

0 comments on commit 9eb2d68

Please sign in to comment.