-
Notifications
You must be signed in to change notification settings - Fork 666
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds citus_pause_node udf #7089
Conversation
Codecov Report
@@ Coverage Diff @@
## main #7089 +/- ##
=======================================
Coverage 93.20% 93.21%
=======================================
Files 274 274
Lines 59211 59230 +19
=======================================
+ Hits 55187 55210 +23
+ Misses 4024 4020 -4 |
|
||
|
||
BackgroundWorkerHandle * | ||
CheckBackgroundWorkerToObtainLocks(int32 lock_cooldown) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function needs a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comment
} | ||
|
||
permutation "s1-begin" "s1-pause-node" "s2-begin" "s2-insert-distributed" "s2-end" "s1-end" | ||
permutation "s1-begin" "s1-pause-node-force" "s2-begin" "s2-insert-distributed" "s2-end" "s1-end" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's show the difference between force and non-force using these tests:
-- s1-pause-node should block
"s1-begin" "s2-begin" "s2-insert-distributed" "s1-pause-node" "s2-end" "s1-end"
-- s1-pause-node-force
"s1-begin" "s2-begin" "s2-insert-distributed" "s1-pause-node-force"(*) "s1-end" "s2-end"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could not make it work this way. I changed the order. Could you check the results?
-- Set statement_timeout for the session (in milliseconds) | ||
SET client_min_messages = 'notice'; | ||
|
||
-- Variable to track if the INSERT statement was successful | ||
DO $$ | ||
BEGIN | ||
|
||
-- Execute the INSERT statement | ||
insert into employee values(11,'e11',3); | ||
|
||
END; | ||
|
||
$$ | ||
LANGUAGE plpgsql; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This insert step can now contain only a single insert statement afaict (same for the insert-reference and delete-distributed step below).
-- Set statement_timeout for the session (in milliseconds) | |
SET client_min_messages = 'notice'; | |
-- Variable to track if the INSERT statement was successful | |
DO $$ | |
BEGIN | |
-- Execute the INSERT statement | |
insert into employee values(11,'e11',3); | |
END; | |
$$ | |
LANGUAGE plpgsql; | |
insert into employee values(11,'e11',3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
SET client_min_messages = 'notice'; | ||
DO $$ | ||
DECLARE | ||
v_shard_id int; | ||
v_node_id int; | ||
v_node_name text; | ||
v_node_port int; | ||
BEGIN | ||
--The first message in the block is being printed on the top of the code block. So adding a dummy message | ||
--to make sure that the first message is printed in correct place. | ||
raise notice ''; | ||
-- Get the shard id for the distribution column | ||
SELECT get_shard_id_for_distribution_column('employee', 3) into v_shard_id; | ||
|
||
--Get the node id for the shard id | ||
SELECT nodename,nodeport into v_node_name,v_node_port FROM citus_shards WHERE shardid = v_shard_id limit 1; | ||
raise notice 'node name is %',v_node_name; | ||
raise notice 'node port is %',v_node_port; | ||
|
||
-- Get the node id for the shard id | ||
SELECT nodeid into v_node_id FROM pg_dist_node WHERE nodename = v_node_name and nodeport = v_node_port limit 1; | ||
|
||
|
||
-- Pause the node | ||
perform pg_catalog.citus_pause_node_within_txn(v_node_id) ; | ||
END; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these notices are very helpful now. Let's remove them.
SET client_min_messages = 'notice'; | |
DO $$ | |
DECLARE | |
v_shard_id int; | |
v_node_id int; | |
v_node_name text; | |
v_node_port int; | |
BEGIN | |
--The first message in the block is being printed on the top of the code block. So adding a dummy message | |
--to make sure that the first message is printed in correct place. | |
raise notice ''; | |
-- Get the shard id for the distribution column | |
SELECT get_shard_id_for_distribution_column('employee', 3) into v_shard_id; | |
--Get the node id for the shard id | |
SELECT nodename,nodeport into v_node_name,v_node_port FROM citus_shards WHERE shardid = v_shard_id limit 1; | |
raise notice 'node name is %',v_node_name; | |
raise notice 'node port is %',v_node_port; | |
-- Get the node id for the shard id | |
SELECT nodeid into v_node_id FROM pg_dist_node WHERE nodename = v_node_name and nodeport = v_node_port limit 1; | |
-- Pause the node | |
perform pg_catalog.citus_pause_node_within_txn(v_node_id) ; | |
END; | |
DO $$ | |
DECLARE | |
v_shard_id int; | |
v_node_id int; | |
v_node_name text; | |
v_node_port int; | |
BEGIN | |
-- Get the shard id for the distribution column | |
SELECT get_shard_id_for_distribution_column('employee', 3) into v_shard_id; | |
--Get the node id for the shard id | |
SELECT nodename,nodeport into v_node_name,v_node_port FROM citus_shards WHERE shardid = v_shard_id limit 1; | |
-- Get the node id for the shard id | |
SELECT nodeid into v_node_id FROM pg_dist_node WHERE nodename = v_node_name and nodeport = v_node_port limit 1; | |
-- Pause the node | |
perform pg_catalog.citus_pause_node_within_txn(v_node_id) ; | |
END; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
BEGIN | ||
select nextval('pg_dist_node_nodeid_seq')::int into v_node_id; | ||
select citus_pause_node_within_txn(v_node_id) ; | ||
EXCEPTION | ||
WHEN SQLSTATE 'P0002' THEN | ||
GET STACKED DIAGNOSTICS v_exception_message = MESSAGE_TEXT; | ||
v_expected_exception_message := 'node ' || v_node_id || ' not found'; | ||
if v_exception_message = v_expected_exception_message then | ||
RAISE NOTICE 'Node not found.'; | ||
end if; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens when you don't transform the ERROR to a NOTICE like this:
BEGIN | |
select nextval('pg_dist_node_nodeid_seq')::int into v_node_id; | |
select citus_pause_node_within_txn(v_node_id) ; | |
EXCEPTION | |
WHEN SQLSTATE 'P0002' THEN | |
GET STACKED DIAGNOSTICS v_exception_message = MESSAGE_TEXT; | |
v_expected_exception_message := 'node ' || v_node_id || ' not found'; | |
if v_exception_message = v_expected_exception_message then | |
RAISE NOTICE 'Node not found.'; | |
end if; | |
BEGIN | |
select nextval('pg_dist_node_nodeid_seq')::int into v_node_id; | |
select citus_pause_node_within_txn(v_node_id) ; |
I think we'll see the error in the test output, which would be a fine way to test, and reduced the testing code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Node id is a changing value and error message includes node id. Therefore, I'm adding this exception to have a constant error message
permutation "s1-begin" "s2-begin" "s1-pause-node" "s2-insert-reference" "s1-end" "s2-end" | ||
permutation "s1-begin" "s1-pause-node" "s1-pause-node" "s1-end" | ||
permutation "s1-begin" "s1-node-not-found" "s1-end" | ||
permutation "s1-begin" "s2-begin" "s2-insert-distributed" "s1-pause-node-force"(*) "s2-end" "s1-end" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be swapped to test that the "force" argument kills the s2 query
permutation "s1-begin" "s2-begin" "s2-insert-distributed" "s1-pause-node-force"(*) "s2-end" "s1-end" | |
permutation "s1-begin" "s2-begin" "s2-insert-distributed" "s1-pause-node-force"(*) "s1-end" "s2-end" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test isolation_setup ... ok 31 ms
test isolation_cluster_management ... ok 77 ms
test isolation_citus_pause_node ... FAILED 11001 ms
In that case it takes 10 seconds to test the case. If we use so, I need to give a timeout; e.g. 1000 ms
Here in this scneario, there are two problems
- tests take too much time and set statement_timeout does not work
- I need to put two expected test result file since in some cases below value is printed twice
FATAL: terminating connection due to administrator command
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used lock_cooldown parameter with the suggestion of @JelteF and it worked
DESCRIPTION: Presenting citus_pause_node UDF enabling pausing by node_id. citus_pause_node takes a node_id parameter and fetches all the shards in that node and puts AccessExclusiveLock on all the shards inside that node. With this lock, insert is disabled, until citus_pause_node transaction is closed. --------- Co-authored-by: Hanefi Onaldi <[email protected]>
DESCRIPTION: Presenting citus_pause_node UDF enabling pausing by node_id.
citus_pause_node takes a node_id parameter and fetches all the shards in that node and puts AccessExclusiveLock on all the shards inside that node. With this lock, insert is disabled, until citus_pause_node transaction is closed.