Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add RPC modifications to include the lock work #17697

Merged
merged 55 commits into from
Jul 31, 2023
Merged

Conversation

Juanadelacuesta
Copy link
Member

@Juanadelacuesta Juanadelacuesta commented Jun 23, 2023

There are no tests in this PR because some parts of the core implementation are missing, they will be added further on.

Related: #17449
Targets: feature branch

Copy link
Member

@jrasell jrasell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this work on @Juanadelacuesta, I've given it an initial high level pass with some inline comments as well as those below.

I think we will need different validation logic for the acquire/release lock. When attempting to acquire a lock, we would need to check whether the lock is held or not, before continuing with any potential state change. Releasing, would need to at least ensure the caller is the holder of the lock, before allowing it to continue. There are probably other caveats in this that I am missing.

There would also need to validation on the renew, to ensure the caller is the holder of the lock. I have made some additional notes about the renewal which might help decide how this implementation would look.

Give me a shout if you want to go over any of this in more detail.

nomad/structs/structs.go Outdated Show resolved Hide resolved
nomad/structs/structs.go Outdated Show resolved Hide resolved
nomad/structs/variables.go Outdated Show resolved Hide resolved
nomad/structs/variables.go Outdated Show resolved Hide resolved
nomad/structs/variables.go Outdated Show resolved Hide resolved
nomad/structs/variables.go Outdated Show resolved Hide resolved
nomad/variables_endpoint.go Outdated Show resolved Hide resolved
nomad/variables_endpoint.go Outdated Show resolved Hide resolved
command/agent/variable_endpoint.go Outdated Show resolved Hide resolved
command/agent/variable_endpoint.go Outdated Show resolved Hide resolved
Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I've left a few comments but nothing super-blocking, especially as this is going to a feature branch and we can review the whole when we're done.

One recommendation in terms of reviewability for future PRs: this is a huge PR but a lot of the diffs are refactoring by moving large blocks of code around. That's fine! But it would help reviewing if you separate the refactoring into separate commits in the same PR rather than mingling it all together. That way reviewers can focus on the important changes.

nomad/variables_endpoint.go Outdated Show resolved Hide resolved
nomad/variables_endpoint.go Outdated Show resolved Hide resolved
reply.Index = encryptedVar.ModifyIndex
return nil
}

func isVarLocked(req *structs.VariablesApplyRequest, respVarMeta *structs.VariableMetadata) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding correctly, this is really checking if the variable is locked by someone else. The name of this method is a little confusing because we already have a method on Variable to check if the variable is locked.

Comment on lines +265 to +287
t.Run("lock-acquire/read token/new", func(t *testing.T) {
must.NotNil(t, svHold)
sv := *sv3

sv.Items["upsert"] = "read"
sv.VariableMetadata.Lock = &structs.VariableLock{
TTL: 24 * time.Hour,
LockDelay: 15 * time.Second,
}

applyReq := structs.VariablesApplyRequest{
Op: structs.VarOpLockAcquire,
Var: &sv,
WriteRequest: structs.WriteRequest{
Region: "global",
AuthToken: readToken.SecretID,
},
}

applyResp := new(structs.VariablesApplyResponse)
err := msgpackrpc.CallWithCodec(codec, structs.VariablesApplyRPCMethod, &applyReq, &applyResp)
must.EqError(t, err, structs.ErrPermissionDenied.Error())
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We moved a bunch of test code around but it looks like we're not adding new tests for the logic around whether or not to show the Lock ID for non-management tokens that successfully lock/release a lock.

Copy link
Member Author

@Juanadelacuesta Juanadelacuesta Jul 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the lock acquire was successful, it will always have the lock ID, and no matter if the release was successful or not, it will never return the lock ID, no matter if the token is management or not. The show or no show is more important for the list and get endpoints and this test is for the apply one. It was more for completing the ACL tests and verifying that it is correctly handled and nothing breaks.

}

if aux.TTL != nil {
switch v := aux.TTL.(type) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this handle invalid JSON like TTL: [] safely?

Copy link
Member Author

@Juanadelacuesta Juanadelacuesta Jul 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it does:

~ % curl -X PUT -H 'Content-Type: application/json' -d '{"Lock": {"TTL":[], "LockDelay":"10s"}}' "http://localhost:4646/v1/var/nomad/jobs/blah?lock-acquire" | jq '.'

{
 "CreateIndex": 12,
 "CreateTime": 1690818288003482000,
 "Items": {},
 "Lock": {
   "TTL": "15s",
   "LockDelay": "10s",
   "ID": "0a575be4-abe1-333f-440d-9b267085df1d"
 },
 "ModifyIndex": 16,
 "ModifyTime": 1690818329917095000,
 "Namespace": "default",
 "Path": "nomad/jobs/blah"
}

badBuf := encodeBrokenReq(&svLA)

req, err := http.NewRequest("PUT", "/v1/var/"+sv1.Path+"?cas=1&"+acquireLockQueryParam, badBuf)
require.NoError(t, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For new tests we're trying to use shoenig/test

@Juanadelacuesta Juanadelacuesta merged commit 15696f5 into f-gh-17449 Jul 31, 2023
19 checks passed
@Juanadelacuesta Juanadelacuesta deleted the f-gh-17449-rpc branch July 31, 2023 16:07
Juanadelacuesta added a commit that referenced this pull request Sep 12, 2023
* temp: remove when real work is added

* func: add acquier/release to apply rpc function

* Delete lock.go

* func: add new rpc endpoint

* func: add http wrapper for command agent

* func: remove lock related item

* fix: sync msg types

* func: remove the lock key form the url

* fix: fix the flow of the preapply function

* remove raft message for renew lock

* style: refactor and spell mistakes

* func: separate validation for lock acquier

* fix: update agent command for lock after rpc endpoint update from query to write

* func: add test for validate lock

* style: clean unsed code and update comments

* func: remove raft message type for renew lock, not needed

* fix: update logic to distinguish upsert variable and lock operation on the http endpoint

* fix: adjust the look up for query params to only be done on put and posts to simplify the logic

* fix: move the rpc metrics up on the renew lock to include auth errors

* func: add acquire and release lock functions on fsm

* func: fix the query for missing lock to avoid nil points in case of a missing lock

* func: add happy path test for acquire and release flows on non existing variable

* fix: solve rebase with rpc conflicts

* func: add testing for release lock

* fix: remove debug file

* Update nomad/fsm.go

Co-authored-by: Tim Gross <[email protected]>

* func: change the return when lock ID is wrong from error to conflict
Also hide the variable items

* func: add lock ID verification for variable upserts

* fix: check for empty lock instead of empty lock ID on setCas to avoid nil pointer

* fix: separate the table update for the lock release, the var update cant be used because of the missing lock ID

* func: move lock ID check to the inner most tx function, not the CAS wrappers

* func: remove the lock ID check from the CAS delete

* func: allow for users to overide the variable data when locking, the responsibility is left for teh user

* fix: make acquire lock function somewhat idempotente

* func: add timer operations to variable locks

* style: remove redundant code

* Add acquire and release lock functions on the state store (#17794)

* fix: move the rpc metrics up on the renew lock to include auth errors

* func: add acquire and release lock functions on fsm

* func: fix the query for missing lock to avoid nil points in case of a missing lock

* func: add happy path test for acquire and release flows on non existing variable

* fix: solve rebase with rpc conflicts

* func: add testing for release lock

* style: typo

* fix: remove debug file

* Update nomad/fsm.go

Co-authored-by: Tim Gross <[email protected]>

* func: change the return when lock ID is wrong from error to conflict
Also hide the variable items

* func: add lock ID verification for variable upserts

* fix: check for empty lock instead of empty lock ID on setCas to avoid nil pointer

* fix: separate the table update for the lock release, the var update cant be used because of the missing lock ID

* func: add tests for delete CAS with lock ID check

* func: move lock ID check to the inner most tx function, not the CAS wrappers

* fix: add lock not nil check on release to avoid nill pointers

* style: function rename

* func: remove the lock ID check from the CAS delete

* func: allow for users to overide the variable data when locking, the responsibility is left for teh user

---------

Co-authored-by: Tim Gross <[email protected]>

* func: merge the timer functions with the leader functions and add lock check for var apply

* fix: add a restriction for CAS and lock, only one at the time and avoid overwriting data when lock release

* fix: add existing timer check for lock acquire

* func: add tests for the creation and renewal of the timers

* fix: add path to renew lock http handler

* fix: avoif panics by reading path and namespace from req and not response

* func: add functional tests to lock

* fix: remove duplicated code after rebase

* fix: remove the lock info from the correct return value on listing and add testing

* fix: update test after making TimerNum public

* func: only return values after the renew was successful, and add tests

* style: typo

* fix: add empty lock when no lock is provided for a lock acquire to avoid panics

* fix: use lockID function instead of calling the lock ID directly to avoid panics

* func: add tests for command agent lock actions

* Update nomad/variables_endpoint.go

Co-authored-by: Tim Gross <[email protected]>

* Update nomad/variables_endpoint.go

Co-authored-by: Tim Gross <[email protected]>

* fix: typo

---------

Co-authored-by: James Rasell <[email protected]>
Co-authored-by: Tim Gross <[email protected]>
Juanadelacuesta added a commit that referenced this pull request Sep 13, 2023
* temp: remove when real work is added

* func: add acquier/release to apply rpc function

* Delete lock.go

* func: add new rpc endpoint

* func: add http wrapper for command agent

* func: remove lock related item

* fix: sync msg types

* func: remove the lock key form the url

* fix: fix the flow of the preapply function

* remove raft message for renew lock

* style: refactor and spell mistakes

* func: separate validation for lock acquier

* fix: update agent command for lock after rpc endpoint update from query to write

* func: add test for validate lock

* style: clean unsed code and update comments

* func: remove raft message type for renew lock, not needed

* fix: update logic to distinguish upsert variable and lock operation on the http endpoint

* fix: adjust the look up for query params to only be done on put and posts to simplify the logic

* fix: move the rpc metrics up on the renew lock to include auth errors

* func: add acquire and release lock functions on fsm

* func: fix the query for missing lock to avoid nil points in case of a missing lock

* func: add happy path test for acquire and release flows on non existing variable

* fix: solve rebase with rpc conflicts

* func: add testing for release lock

* fix: remove debug file

* Update nomad/fsm.go

Co-authored-by: Tim Gross <[email protected]>

* func: change the return when lock ID is wrong from error to conflict
Also hide the variable items

* func: add lock ID verification for variable upserts

* fix: check for empty lock instead of empty lock ID on setCas to avoid nil pointer

* fix: separate the table update for the lock release, the var update cant be used because of the missing lock ID

* func: move lock ID check to the inner most tx function, not the CAS wrappers

* func: remove the lock ID check from the CAS delete

* func: allow for users to overide the variable data when locking, the responsibility is left for teh user

* fix: make acquire lock function somewhat idempotente

* func: add timer operations to variable locks

* style: remove redundant code

* Add acquire and release lock functions on the state store (#17794)

* fix: move the rpc metrics up on the renew lock to include auth errors

* func: add acquire and release lock functions on fsm

* func: fix the query for missing lock to avoid nil points in case of a missing lock

* func: add happy path test for acquire and release flows on non existing variable

* fix: solve rebase with rpc conflicts

* func: add testing for release lock

* style: typo

* fix: remove debug file

* Update nomad/fsm.go

Co-authored-by: Tim Gross <[email protected]>

* func: change the return when lock ID is wrong from error to conflict
Also hide the variable items

* func: add lock ID verification for variable upserts

* fix: check for empty lock instead of empty lock ID on setCas to avoid nil pointer

* fix: separate the table update for the lock release, the var update cant be used because of the missing lock ID

* func: add tests for delete CAS with lock ID check

* func: move lock ID check to the inner most tx function, not the CAS wrappers

* fix: add lock not nil check on release to avoid nill pointers

* style: function rename

* func: remove the lock ID check from the CAS delete

* func: allow for users to overide the variable data when locking, the responsibility is left for teh user

---------

Co-authored-by: Tim Gross <[email protected]>

* func: merge the timer functions with the leader functions and add lock check for var apply

* fix: add a restriction for CAS and lock, only one at the time and avoid overwriting data when lock release

* fix: add existing timer check for lock acquire

* func: add tests for the creation and renewal of the timers

* fix: add path to renew lock http handler

* fix: avoif panics by reading path and namespace from req and not response

* func: add functional tests to lock

* fix: remove duplicated code after rebase

* fix: remove the lock info from the correct return value on listing and add testing

* fix: update test after making TimerNum public

* func: only return values after the renew was successful, and add tests

* style: typo

* fix: add empty lock when no lock is provided for a lock acquire to avoid panics

* fix: use lockID function instead of calling the lock ID directly to avoid panics

* func: add tests for command agent lock actions

* Update nomad/variables_endpoint.go

Co-authored-by: Tim Gross <[email protected]>

* Update nomad/variables_endpoint.go

Co-authored-by: Tim Gross <[email protected]>

* fix: typo

---------

Co-authored-by: James Rasell <[email protected]>
Co-authored-by: Tim Gross <[email protected]>
Juanadelacuesta added a commit that referenced this pull request Sep 15, 2023
* temp: remove when real work is added

* func: add acquier/release to apply rpc function

* Delete lock.go

* func: add new rpc endpoint

* func: add http wrapper for command agent

* func: remove lock related item

* fix: sync msg types

* func: remove the lock key form the url

* fix: fix the flow of the preapply function

* remove raft message for renew lock

* style: refactor and spell mistakes

* func: separate validation for lock acquier

* fix: update agent command for lock after rpc endpoint update from query to write

* func: add test for validate lock

* style: clean unsed code and update comments

* func: remove raft message type for renew lock, not needed

* fix: update logic to distinguish upsert variable and lock operation on the http endpoint

* fix: adjust the look up for query params to only be done on put and posts to simplify the logic

* fix: move the rpc metrics up on the renew lock to include auth errors

* func: add acquire and release lock functions on fsm

* func: fix the query for missing lock to avoid nil points in case of a missing lock

* func: add happy path test for acquire and release flows on non existing variable

* fix: solve rebase with rpc conflicts

* func: add testing for release lock

* fix: remove debug file

* Update nomad/fsm.go

Co-authored-by: Tim Gross <[email protected]>

* func: change the return when lock ID is wrong from error to conflict
Also hide the variable items

* func: add lock ID verification for variable upserts

* fix: check for empty lock instead of empty lock ID on setCas to avoid nil pointer

* fix: separate the table update for the lock release, the var update cant be used because of the missing lock ID

* func: move lock ID check to the inner most tx function, not the CAS wrappers

* func: remove the lock ID check from the CAS delete

* func: allow for users to overide the variable data when locking, the responsibility is left for teh user

* fix: make acquire lock function somewhat idempotente

* func: add timer operations to variable locks

* style: remove redundant code

* Add acquire and release lock functions on the state store (#17794)

* fix: move the rpc metrics up on the renew lock to include auth errors

* func: add acquire and release lock functions on fsm

* func: fix the query for missing lock to avoid nil points in case of a missing lock

* func: add happy path test for acquire and release flows on non existing variable

* fix: solve rebase with rpc conflicts

* func: add testing for release lock

* style: typo

* fix: remove debug file

* Update nomad/fsm.go

Co-authored-by: Tim Gross <[email protected]>

* func: change the return when lock ID is wrong from error to conflict
Also hide the variable items

* func: add lock ID verification for variable upserts

* fix: check for empty lock instead of empty lock ID on setCas to avoid nil pointer

* fix: separate the table update for the lock release, the var update cant be used because of the missing lock ID

* func: add tests for delete CAS with lock ID check

* func: move lock ID check to the inner most tx function, not the CAS wrappers

* fix: add lock not nil check on release to avoid nill pointers

* style: function rename

* func: remove the lock ID check from the CAS delete

* func: allow for users to overide the variable data when locking, the responsibility is left for teh user

---------

Co-authored-by: Tim Gross <[email protected]>

* func: merge the timer functions with the leader functions and add lock check for var apply

* fix: add a restriction for CAS and lock, only one at the time and avoid overwriting data when lock release

* fix: add existing timer check for lock acquire

* func: add tests for the creation and renewal of the timers

* fix: add path to renew lock http handler

* fix: avoif panics by reading path and namespace from req and not response

* func: add functional tests to lock

* fix: remove duplicated code after rebase

* fix: remove the lock info from the correct return value on listing and add testing

* fix: update test after making TimerNum public

* func: only return values after the renew was successful, and add tests

* style: typo

* fix: add empty lock when no lock is provided for a lock acquire to avoid panics

* fix: use lockID function instead of calling the lock ID directly to avoid panics

* func: add tests for command agent lock actions

* Update nomad/variables_endpoint.go

Co-authored-by: Tim Gross <[email protected]>

* Update nomad/variables_endpoint.go

Co-authored-by: Tim Gross <[email protected]>

* fix: typo

---------

Co-authored-by: James Rasell <[email protected]>
Co-authored-by: Tim Gross <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants