
Cherry-pick from main to release-13.0 #7841

Merged 14 commits into release-13.0 from release-13.0-naisila on Jan 13, 2025
Conversation

@naisila (Member) commented Jan 12, 2025

Need a PR to trigger tests.

crabhi and others added 6 commits January 12, 2025 22:54
DESCRIPTION: citus_move_shard_placement now fails early when shard
cannot be safely moved

The implementation is quite simplistic: `citus_move_shard_placement(...)` will
fail with an error if there's any new node in the cluster that doesn't have
reference tables yet.

It could have been finer-grained, i.e. erroring only when trying to move a
shard to an uninitialized node. Looking at the related functions,
`replicate_reference_tables()` and `citus_rebalance_start()`, I think this is
acceptable behaviour: those functions also treat any uninitialized node as a
temporary anomaly.

Fixes #7426

---------

Co-authored-by: Jelte Fennema-Nio <[email protected]>
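
A minimal sketch of the new failure mode, assuming a hypothetical cluster in which a freshly added node holds no reference table placements yet (the shard ID, hosts, and ports below are illustrative):

```sql
-- A node was just added and does not have reference tables yet:
SELECT citus_add_node('localhost', 9703);

-- Any shard move now fails up front instead of partway through:
SELECT citus_move_shard_placement(
    102008,              -- shard ID (illustrative)
    'localhost', 9701,   -- source node
    'localhost', 9702);  -- target node
-- ERRORs out (message paraphrased) because a node lacks reference tables.

-- Replicating reference tables first clears the error:
SELECT replicate_reference_tables();
```
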
Store the previous shard cost so that the invariant checking performs as
expected.
… single-node cluster (#7552)

This fixes #7551, reported by Egor Chindyaskin.

Function activate_node_snapshot() is not meant to be called on a cluster
without worker nodes. This commit adds an ERROR report for that case to
prevent a server crash.
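
A minimal illustration of the guarded call (activate_node_snapshot() is an internal test UDF; the error text is paraphrased):

```sql
-- On a cluster with no worker nodes:
SELECT activate_node_snapshot();
-- Before this commit: server crash.
-- After this commit: a regular ERROR is raised instead.
```
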
…ion (#7534)

Fixes #7533.

DESCRIPTION: Fixes incorrect `VALID UNTIL` setting assumption made for
roles when syncing them to new nodes
#7607)

DESCRIPTION: Use macro wrapper to access PGPROC data, to improve compatibility with PostgreSQL forks.
DESCRIPTION: Add a check to see if the given limit is null. 

Fixes a bug by checking whether the limit given in the query is null when the
actual limit is computed with respect to the given offset. Prior to this
change, a null limit was interpreted as 0 during the limit calculation when
both a limit and an offset were given.

Fixes #7663
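
For illustration, the fixed query shape looks like this (the table name is hypothetical):

```sql
-- LIMIT NULL means "no limit" in PostgreSQL:
SELECT * FROM dist_events
ORDER BY id
LIMIT NULL OFFSET 3;
-- Before this fix, the null limit could be treated as 0 while folding the
-- OFFSET into the limit, so the query wrongly returned zero rows; it now
-- returns every row after the first three.
```
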

codecov bot commented Jan 13, 2025

Codecov Report

Attention: Patch coverage is 86.99187% with 16 lines in your changes missing coverage. Please review.

Project coverage is 89.48%. Comparing base (ddef972) to head (c55bc8c).
Report is 14 commits behind head on release-13.0.

Additional details and impacted files
@@              Coverage Diff              @@
##           release-13.0    #7841   +/-   ##
=============================================
  Coverage         89.47%   89.48%           
=============================================
  Files               274      276    +2     
  Lines             59967    60063   +96     
  Branches           7506     7524   +18     
=============================================
+ Hits              53656    53747   +91     
+ Misses             4167     4166    -1     
- Partials           2144     2150    +6     

m3hm3t and others added 7 commits January 13, 2025 15:20
… coordinator nodes concurrently (#7682)

When multiple sessions concurrently attempt to add the same coordinator
node using `citus_set_coordinator_host`, there is a potential race
condition. Both sessions may pass the initial metadata check
(`isCoordinatorInMetadata`), but only one will succeed in adding the
node. The other session will fail with an assertion error
(`Assert(!nodeAlreadyExists)`), causing the server to crash. Even though
the `AddNodeMetadata` function takes an exclusive lock, it appears that
the lock is not preventing the race condition before the initial
metadata check.

- **Issue**: The current logic allows concurrent sessions to pass the
check for existing coordinators, leading to an attempt to insert
duplicate nodes, which triggers the assertion failure.

- **Impact**: This race condition leads to crashes during operations
that involve concurrent coordinator additions, as seen in
#7646.

**Test Plan:**

- Isolation Test Limitation: An isolation test was added to simulate
concurrent additions of the same coordinator node, but due to the
behavior of PostgreSQL locking mechanisms, the test does not trigger the
edge case. The lock applied within the function serializes the
operations, preventing the race condition from occurring in the
isolation test environment.
While the edge case is difficult to reproduce in an isolation test, the
fix addresses the core issue by ensuring concurrency control through
proper locking.

- Existing Tests: All existing tests related to node metadata and
coordinator management have been run to ensure that no regressions were
introduced.

**After the Fix:**

- Concurrent attempts to add the same coordinator node will be
serialized. One session will succeed in adding the node, while the
others will skip the operation without crashing the server.

Co-authored-by: Mehmet YILMAZ <[email protected]>
(cherry picked from commit 4775715)
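
A hypothetical timeline of the race, with both sessions running the same statement (host and port are illustrative):

```sql
-- Session A and session B, concurrently:
SELECT citus_set_coordinator_host('coord-host', 5432);

-- Before the fix: both sessions could pass the isCoordinatorInMetadata
-- check; the losing insert then hit Assert(!nodeAlreadyExists) and
-- crashed the server.
-- After the fix: the exclusive lock is taken before the check, so one
-- session adds the node and the other skips the insert without crashing.
```
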
#7659)

We were writing incorrect data to the target relation in some cases of the
MERGE command, specifically in the repartition case when the source query is a
plain RELATION. We were referring to an incorrect attribute number, which
resulted in this incorrect behavior.

Example:

![image](https://github.com/user-attachments/assets/a101cb36-7976-459c-befb-96a55a5b3dc1)

![image](https://github.com/user-attachments/assets/e5c83b7b-5b8e-4d79-a927-95684dc9ba49)

I have added tests for the fix as part of this PR. Thanks.

(cherry picked from commit 5bad6c6)
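
An illustrative MERGE of the affected shape, where the source is a plain relation that has to be repartitioned (all table and column names are hypothetical):

```sql
MERGE INTO target_dist t
USING source_dist s          -- source query is a plain RELATION
ON (t.id = s.id)
WHEN MATCHED THEN
    UPDATE SET val = s.val
WHEN NOT MATCHED THEN
    INSERT (id, val) VALUES (s.id, s.val);
-- In the repartition path, a wrong attribute number could be read from the
-- source tuples, so incorrect data was written to the target before this fix.
```
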
…n may cause segfault #7705

In MasterAggregateMutator(), when the original Node is a Var, use makeVar()
instead of copyObject() when constructing the Var for the target list of the
combine query. The varnullingrels field of the original Var is ignored because
it is not relevant for the combine query; copying it caused the problem in
issue #7705, where a coordinator query had a Var referencing a non-existent
join relation.

(cherry picked from commit c52f360)
Co-authored-by: Pavel Seleznev <[email protected]>
(cherry picked from commit fe6d198)
…when the application_name changes (#7791)

DESCRIPTION: Fixes a crash that happens because of unsafe catalog access
when re-assigning the global pid after application_name changes.

When application_name changes, we don't actually need to try re-assigning the
global pid for external client backends, because application_name doesn't
affect the global pid for such backends. Moreover, trying to re-assign the
global pid for external client backends would unnecessarily perform a catalog
access when the cached local node id is invalidated, and accessing the catalog
tables is dangerous in certain situations, such as when we're not in a
transaction block. For the other types of backends, i.e. the Citus internal
backends, we do need to re-assign the global pid when application_name
changes, because for such backends we simply extract the global pid inherited
from the originating backend from the application_name (specified by the
originating backend when opening that connection), and this doesn't require
catalog access.

(cherry picked from commit 7341191)
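
For illustration, the two backend types behave differently when application_name changes (the names below are hypothetical, and the internal format is only sketched):

```sql
-- External client backend: application_name does not affect the global
-- pid, so re-assignment (and the catalog access it would need) is skipped.
SET application_name TO 'my_reporting_app';

-- Citus internal backend: the originating backend embeds the global pid
-- in application_name when opening the connection, so re-assignment can
-- parse it back out without any catalog access.
SET application_name TO 'citus_internal gpid=10000000123';
```
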
…stributed transaction: Raise an ERROR instead of a crash

(cherry picked from commit ab7c13b)
DESCRIPTION: Shard moves/isolate report LSNs in LSN format

While investigating an issue with our catchup mechanism on certain Postgres
versions, we noticed we print LSNs in the format of the native long type. This
is an uncommon representation for LSNs in Postgres logs.

This patch changes the output of our log message from the long type
representation to the native LSN type representation, making it easier for
Postgres users to recognize and compare LSNs with other related reports.

Example of the new output:
```
2023-09-25 17:28:47.544 CEST [11345] LOG:  The LSN of the target subscriptions on node localhost:9701 have increased from 0/0 to 0/E1ED20F8 at 2023-09-25 17:28:47.544165+02 where the source LSN is 1/415DCAD0
```

(cherry picked from commit b87fbcb)
@naisila force-pushed the release-13.0-naisila branch 2 times, most recently from 520d9d7 to 1468ae3 on January 13, 2025 at 15:07
Propagates SECURITY LABEL ON ROLE stmt (#7304)
We propagate `SECURITY LABEL [for provider] ON ROLE rolename IS
labelname` to the worker nodes.
We also make sure to run the relevant `SecLabelStmt` commands on a
newly added node by looking at roles found in `pg_shseclabel`.

See the official docs for an explanation of how this command works:
https://www.postgresql.org/docs/current/sql-security-label.html
This command stores the role label in the `pg_shseclabel` catalog table.

This commit also fixes the regex string in the
`check_gucs_are_alphabetically_sorted.sh` script so that it escapes the dot.
Previously it was matching all strings starting with "citus" instead of
"citus.", as it should.

To test this feature, I currently make use of a special GUC to control label
provider registration in `_PG_init` when creating the Citus extension.

(cherry picked from commit 0d1f188)

Co-authored-by: Naisila Puka <[email protected]>
(cherry picked from commit 686d2b4)
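
A short usage sketch, with a hypothetical provider and role:

```sql
-- Runs on the coordinator and is now propagated to all worker nodes:
SECURITY LABEL FOR my_provider ON ROLE app_user IS 'classified';

-- The label lands in the shared catalog that Citus also scans when
-- applying existing role labels to a newly added node:
SELECT provider, label
FROM pg_shseclabel
WHERE objoid = 'app_user'::regrole;
```
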
@naisila force-pushed the release-13.0-naisila branch from 1468ae3 to c55bc8c on January 13, 2025 at 16:56
@naisila merged commit c55bc8c into release-13.0 on Jan 13, 2025
153 of 155 checks passed
@naisila deleted the release-13.0-naisila branch on January 13, 2025 at 17:22