-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add RPC modifications to include the lock work #17697
Conversation
eb4ef7a
to
41e79b6
Compare
a00bbc8
to
ffae0f3
Compare
302ff54
to
15ac582
Compare
15ac582
to
a044238
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking this work on @Juanadelacuesta, I've given it an initial high level pass with some inline comments as well as those below.
I think we will need different validation logic for the acquire/release lock. When attempting to acquire a lock, we would need to check whether the lock is held or not, before continuing with any potential state change. Releasing, would need to at least ensure the caller is the holder of the lock, before allowing it to continue. There are probably other caveats in this that I am missing.
There would also need to validation on the renew, to ensure the caller is the holder of the lock. I have made some additional notes about the renewal which might help decide how this implementation would look.
Give me a shout if you want to go over any of this in more detail.
aa9fc53
to
c769d5b
Compare
3ac1855
to
950749a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I've left a few comments but nothing super-blocking, especially as this is going to a feature branch and we can review the whole when we're done.
One recommendation in terms of reviewability for future PRs: this is a huge PR but a lot of the diffs are refactoring by moving large blocks of code around. That's fine! But it would help reviewing if you separate the refactoring into separate commits in the same PR rather than mingling it all together. That way reviewers can focus on the important changes.
nomad/variables_endpoint.go
Outdated
reply.Index = encryptedVar.ModifyIndex | ||
return nil | ||
} | ||
|
||
func isVarLocked(req *structs.VariablesApplyRequest, respVarMeta *structs.VariableMetadata) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm understanding correctly, this is really checking if the variable is locked by someone else. The name of this method is a little confusing because we already have a method on Variable to check if the variable is locked.
t.Run("lock-acquire/read token/new", func(t *testing.T) { | ||
must.NotNil(t, svHold) | ||
sv := *sv3 | ||
|
||
sv.Items["upsert"] = "read" | ||
sv.VariableMetadata.Lock = &structs.VariableLock{ | ||
TTL: 24 * time.Hour, | ||
LockDelay: 15 * time.Second, | ||
} | ||
|
||
applyReq := structs.VariablesApplyRequest{ | ||
Op: structs.VarOpLockAcquire, | ||
Var: &sv, | ||
WriteRequest: structs.WriteRequest{ | ||
Region: "global", | ||
AuthToken: readToken.SecretID, | ||
}, | ||
} | ||
|
||
applyResp := new(structs.VariablesApplyResponse) | ||
err := msgpackrpc.CallWithCodec(codec, structs.VariablesApplyRPCMethod, &applyReq, &applyResp) | ||
must.EqError(t, err, structs.ErrPermissionDenied.Error()) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We moved a bunch of test code around but it looks like we're not adding new tests for the logic around whether or not to show the Lock ID for non-management tokens that successfully lock/release a lock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the lock acquire was successful, it will always have the lock ID, and no matter if the release was successful or not, it will never return the lock ID, no matter if the token is management or not. The show or no show is more important for the list and get endpoints and this test is for the apply one. It was more for completing the ACL tests and verifying that it is correctly handled and nothing breaks.
} | ||
|
||
if aux.TTL != nil { | ||
switch v := aux.TTL.(type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this handle invalid JSON like TTL: []
safely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it does:
~ % curl -X PUT -H 'Content-Type: application/json' -d '{"Lock": {"TTL":[], "LockDelay":"10s"}}' "http://localhost:4646/v1/var/nomad/jobs/blah?lock-acquire" | jq '.'
{
"CreateIndex": 12,
"CreateTime": 1690818288003482000,
"Items": {},
"Lock": {
"TTL": "15s",
"LockDelay": "10s",
"ID": "0a575be4-abe1-333f-440d-9b267085df1d"
},
"ModifyIndex": 16,
"ModifyTime": 1690818329917095000,
"Namespace": "default",
"Path": "nomad/jobs/blah"
}
badBuf := encodeBrokenReq(&svLA) | ||
|
||
req, err := http.NewRequest("PUT", "/v1/var/"+sv1.Path+"?cas=1&"+acquireLockQueryParam, badBuf) | ||
require.NoError(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For new tests we're trying to use shoenig/test
Co-authored-by: Tim Gross <[email protected]>
Co-authored-by: Tim Gross <[email protected]>
* temp: remove when real work is added * func: add acquier/release to apply rpc function * Delete lock.go * func: add new rpc endpoint * func: add http wrapper for command agent * func: remove lock related item * fix: sync msg types * func: remove the lock key form the url * fix: fix the flow of the preapply function * remove raft message for renew lock * style: refactor and spell mistakes * func: separate validation for lock acquier * fix: update agent command for lock after rpc endpoint update from query to write * func: add test for validate lock * style: clean unsed code and update comments * func: remove raft message type for renew lock, not needed * fix: update logic to distinguish upsert variable and lock operation on the http endpoint * fix: adjust the look up for query params to only be done on put and posts to simplify the logic * fix: move the rpc metrics up on the renew lock to include auth errors * func: add acquire and release lock functions on fsm * func: fix the query for missing lock to avoid nil points in case of a missing lock * func: add happy path test for acquire and release flows on non existing variable * fix: solve rebase with rpc conflicts * func: add testing for release lock * fix: remove debug file * Update nomad/fsm.go Co-authored-by: Tim Gross <[email protected]> * func: change the return when lock ID is wrong from error to conflict Also hide the variable items * func: add lock ID verification for variable upserts * fix: check for empty lock instead of empty lock ID on setCas to avoid nil pointer * fix: separate the table update for the lock release, the var update cant be used because of the missing lock ID * func: move lock ID check to the inner most tx function, not the CAS wrappers * func: remove the lock ID check from the CAS delete * func: allow for users to overide the variable data when locking, the responsibility is left for teh user * fix: make acquire lock function somewhat idempotente * func: add timer operations to variable locks * style: remove redundant code * Add acquire and release lock functions on the state store (#17794) * fix: move the rpc metrics up on the renew lock to include auth errors * func: add acquire and release lock functions on fsm * func: fix the query for missing lock to avoid nil points in case of a missing lock * func: add happy path test for acquire and release flows on non existing variable * fix: solve rebase with rpc conflicts * func: add testing for release lock * style: typo * fix: remove debug file * Update nomad/fsm.go Co-authored-by: Tim Gross <[email protected]> * func: change the return when lock ID is wrong from error to conflict Also hide the variable items * func: add lock ID verification for variable upserts * fix: check for empty lock instead of empty lock ID on setCas to avoid nil pointer * fix: separate the table update for the lock release, the var update cant be used because of the missing lock ID * func: add tests for delete CAS with lock ID check * func: move lock ID check to the inner most tx function, not the CAS wrappers * fix: add lock not nil check on release to avoid nill pointers * style: function rename * func: remove the lock ID check from the CAS delete * func: allow for users to overide the variable data when locking, the responsibility is left for teh user --------- Co-authored-by: Tim Gross <[email protected]> * func: merge the timer functions with the leader functions and add lock check for var apply * fix: add a restriction for CAS and lock, only one at the time and avoid overwriting data when lock release * fix: add existing timer check for lock acquire * func: add tests for the creation and renewal of the timers * fix: add path to renew lock http handler * fix: avoif panics by reading path and namespace from req and not response * func: add functional tests to lock * fix: remove duplicated code after rebase * fix: remove the lock info from the correct return value on listing and add testing * fix: update test after making TimerNum public * func: only return values after the renew was successful, and add tests * style: typo * fix: add empty lock when no lock is provided for a lock acquire to avoid panics * fix: use lockID function instead of calling the lock ID directly to avoid panics * func: add tests for command agent lock actions * Update nomad/variables_endpoint.go Co-authored-by: Tim Gross <[email protected]> * Update nomad/variables_endpoint.go Co-authored-by: Tim Gross <[email protected]> * fix: typo --------- Co-authored-by: James Rasell <[email protected]> Co-authored-by: Tim Gross <[email protected]>
* temp: remove when real work is added * func: add acquier/release to apply rpc function * Delete lock.go * func: add new rpc endpoint * func: add http wrapper for command agent * func: remove lock related item * fix: sync msg types * func: remove the lock key form the url * fix: fix the flow of the preapply function * remove raft message for renew lock * style: refactor and spell mistakes * func: separate validation for lock acquier * fix: update agent command for lock after rpc endpoint update from query to write * func: add test for validate lock * style: clean unsed code and update comments * func: remove raft message type for renew lock, not needed * fix: update logic to distinguish upsert variable and lock operation on the http endpoint * fix: adjust the look up for query params to only be done on put and posts to simplify the logic * fix: move the rpc metrics up on the renew lock to include auth errors * func: add acquire and release lock functions on fsm * func: fix the query for missing lock to avoid nil points in case of a missing lock * func: add happy path test for acquire and release flows on non existing variable * fix: solve rebase with rpc conflicts * func: add testing for release lock * fix: remove debug file * Update nomad/fsm.go Co-authored-by: Tim Gross <[email protected]> * func: change the return when lock ID is wrong from error to conflict Also hide the variable items * func: add lock ID verification for variable upserts * fix: check for empty lock instead of empty lock ID on setCas to avoid nil pointer * fix: separate the table update for the lock release, the var update cant be used because of the missing lock ID * func: move lock ID check to the inner most tx function, not the CAS wrappers * func: remove the lock ID check from the CAS delete * func: allow for users to overide the variable data when locking, the responsibility is left for teh user * fix: make acquire lock function somewhat idempotente * func: add timer operations to variable locks * style: remove redundant code * Add acquire and release lock functions on the state store (#17794) * fix: move the rpc metrics up on the renew lock to include auth errors * func: add acquire and release lock functions on fsm * func: fix the query for missing lock to avoid nil points in case of a missing lock * func: add happy path test for acquire and release flows on non existing variable * fix: solve rebase with rpc conflicts * func: add testing for release lock * style: typo * fix: remove debug file * Update nomad/fsm.go Co-authored-by: Tim Gross <[email protected]> * func: change the return when lock ID is wrong from error to conflict Also hide the variable items * func: add lock ID verification for variable upserts * fix: check for empty lock instead of empty lock ID on setCas to avoid nil pointer * fix: separate the table update for the lock release, the var update cant be used because of the missing lock ID * func: add tests for delete CAS with lock ID check * func: move lock ID check to the inner most tx function, not the CAS wrappers * fix: add lock not nil check on release to avoid nill pointers * style: function rename * func: remove the lock ID check from the CAS delete * func: allow for users to overide the variable data when locking, the responsibility is left for teh user --------- Co-authored-by: Tim Gross <[email protected]> * func: merge the timer functions with the leader functions and add lock check for var apply * fix: add a restriction for CAS and lock, only one at the time and avoid overwriting data when lock release * fix: add existing timer check for lock acquire * func: add tests for the creation and renewal of the timers * fix: add path to renew lock http handler * fix: avoif panics by reading path and namespace from req and not response * func: add functional tests to lock * fix: remove duplicated code after rebase * fix: remove the lock info from the correct return value on listing and add testing * fix: update test after making TimerNum public * func: only return values after the renew was successful, and add tests * style: typo * fix: add empty lock when no lock is provided for a lock acquire to avoid panics * fix: use lockID function instead of calling the lock ID directly to avoid panics * func: add tests for command agent lock actions * Update nomad/variables_endpoint.go Co-authored-by: Tim Gross <[email protected]> * Update nomad/variables_endpoint.go Co-authored-by: Tim Gross <[email protected]> * fix: typo --------- Co-authored-by: James Rasell <[email protected]> Co-authored-by: Tim Gross <[email protected]>
* temp: remove when real work is added * func: add acquier/release to apply rpc function * Delete lock.go * func: add new rpc endpoint * func: add http wrapper for command agent * func: remove lock related item * fix: sync msg types * func: remove the lock key form the url * fix: fix the flow of the preapply function * remove raft message for renew lock * style: refactor and spell mistakes * func: separate validation for lock acquier * fix: update agent command for lock after rpc endpoint update from query to write * func: add test for validate lock * style: clean unsed code and update comments * func: remove raft message type for renew lock, not needed * fix: update logic to distinguish upsert variable and lock operation on the http endpoint * fix: adjust the look up for query params to only be done on put and posts to simplify the logic * fix: move the rpc metrics up on the renew lock to include auth errors * func: add acquire and release lock functions on fsm * func: fix the query for missing lock to avoid nil points in case of a missing lock * func: add happy path test for acquire and release flows on non existing variable * fix: solve rebase with rpc conflicts * func: add testing for release lock * fix: remove debug file * Update nomad/fsm.go Co-authored-by: Tim Gross <[email protected]> * func: change the return when lock ID is wrong from error to conflict Also hide the variable items * func: add lock ID verification for variable upserts * fix: check for empty lock instead of empty lock ID on setCas to avoid nil pointer * fix: separate the table update for the lock release, the var update cant be used because of the missing lock ID * func: move lock ID check to the inner most tx function, not the CAS wrappers * func: remove the lock ID check from the CAS delete * func: allow for users to overide the variable data when locking, the responsibility is left for teh user * fix: make acquire lock function somewhat idempotente * func: add timer operations to variable locks * style: remove redundant code * Add acquire and release lock functions on the state store (#17794) * fix: move the rpc metrics up on the renew lock to include auth errors * func: add acquire and release lock functions on fsm * func: fix the query for missing lock to avoid nil points in case of a missing lock * func: add happy path test for acquire and release flows on non existing variable * fix: solve rebase with rpc conflicts * func: add testing for release lock * style: typo * fix: remove debug file * Update nomad/fsm.go Co-authored-by: Tim Gross <[email protected]> * func: change the return when lock ID is wrong from error to conflict Also hide the variable items * func: add lock ID verification for variable upserts * fix: check for empty lock instead of empty lock ID on setCas to avoid nil pointer * fix: separate the table update for the lock release, the var update cant be used because of the missing lock ID * func: add tests for delete CAS with lock ID check * func: move lock ID check to the inner most tx function, not the CAS wrappers * fix: add lock not nil check on release to avoid nill pointers * style: function rename * func: remove the lock ID check from the CAS delete * func: allow for users to overide the variable data when locking, the responsibility is left for teh user --------- Co-authored-by: Tim Gross <[email protected]> * func: merge the timer functions with the leader functions and add lock check for var apply * fix: add a restriction for CAS and lock, only one at the time and avoid overwriting data when lock release * fix: add existing timer check for lock acquire * func: add tests for the creation and renewal of the timers * fix: add path to renew lock http handler * fix: avoif panics by reading path and namespace from req and not response * func: add functional tests to lock * fix: remove duplicated code after rebase * fix: remove the lock info from the correct return value on listing and add testing * fix: update test after making TimerNum public * func: only return values after the renew was successful, and add tests * style: typo * fix: add empty lock when no lock is provided for a lock acquire to avoid panics * fix: use lockID function instead of calling the lock ID directly to avoid panics * func: add tests for command agent lock actions * Update nomad/variables_endpoint.go Co-authored-by: Tim Gross <[email protected]> * Update nomad/variables_endpoint.go Co-authored-by: Tim Gross <[email protected]> * fix: typo --------- Co-authored-by: James Rasell <[email protected]> Co-authored-by: Tim Gross <[email protected]>
There are no tests in this PR because some parts of the core implementation are missing, they will be added further on.
Related: #17449
Targets: feature branch