
Cherry-pick from main to release-13.0 #7840

Merged

naisila merged 5 commits into release-13.0 from release-13.0-naisila on Jan 12, 2025

Conversation

@naisila (Member) commented Jan 10, 2025

No description provided.


codecov bot commented Jan 10, 2025

Codecov Report

Attention: Patch coverage is 81.81818% with 6 lines in your changes missing coverage. Please review.

Please upload report for BASE (release-13.0@3e924db). Learn more about missing BASE report.

Additional details and impacted files
@@               Coverage Diff               @@
##             release-13.0    #7840   +/-   ##
===============================================
  Coverage                ?   89.47%           
===============================================
  Files                   ?      274           
  Lines                   ?    59967           
  Branches                ?     7506           
===============================================
  Hits                    ?    53656           
  Misses                  ?     4167           
  Partials                ?     2144           

@naisila naisila marked this pull request as ready for review January 12, 2025 18:49
emelsimsek and others added 5 commits January 12, 2025 22:05
…ceiver timeouts during large shard splits. (#7229)

DESCRIPTION: Send keepalive messages during the logical replication
phase of large shard splits to avoid timeouts.

During the logical replication part of the shard split process, the split
decoder filters out the WAL records produced by the initial copy. If the
number of WAL records is large, the split decoder ends up processing for
a long time before sending out any WAL records through pgoutput. Hence
the WAL receiver may time out and restart repeatedly, causing the
catch-up logic in our split driver code to fail.

Notes:

1. If `wal_receiver_timeout` is set to a very small value, e.g. 600ms,
the receiver may still time out before receiving the keepalives. My tests
show that this code works best when `wal_receiver_timeout` is set to
1 minute, which is the default value.

2. Once a logical replication worker times out, a new one gets launched.
The new logical replication worker resets the pg_stat_subscription columns
to their initial values, e.g. latest_end_lsn is set to 0. Our driver logic
in `WaitForGroupedLogicalRepTargetsToCatchUp` cannot handle the LSN value
going backwards, which is the main reason it gets stuck in an infinite
loop.

(cherry picked from commit e9035f6)
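A minimal sketch of the keepalive idea described in the shard split commit above, assuming a pgoutput-style decoding plugin and the PostgreSQL 15/16 `OutputPluginUpdateProgress()` API; the callback, predicate, and threshold names here are illustrative, not the actual Citus code:

```c
#include "postgres.h"

#include "replication/logical.h"
#include "replication/output_plugin.h"
#include "replication/reorderbuffer.h"
#include "utils/rel.h"

/* how many filtered-out changes to skip before reporting progress (assumed) */
#define CHANGES_PER_KEEPALIVE 10000

static uint64 SkippedChangeCount = 0;

/*
 * Hypothetical predicate: does this change belong to the shard being split?
 * A real implementation would consult the shard split metadata.
 */
static bool
ChangeBelongsToTargetShard(Relation relation, ReorderBufferChange *change)
{
	return false;	/* placeholder: the sketch focuses on the keepalive path */
}

static void
shard_split_change_cb(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
					  Relation relation, ReorderBufferChange *change)
{
	if (!ChangeBelongsToTargetShard(relation, change))
	{
		/*
		 * We are dropping this record. Periodically report progress so the
		 * walsender still produces keepalive traffic while nothing is being
		 * streamed, keeping the subscriber's wal_receiver_timeout from firing.
		 */
		if (++SkippedChangeCount % CHANGES_PER_KEEPALIVE == 0)
		{
			OutputPluginUpdateProgress(ctx, true);	/* PG 15/16 signature */
		}

		return;
	}

	/* otherwise forward the change to the wrapped pgoutput handler (omitted) */
}
```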
When executing a prepared CALL, which is not pure SQL but is available with
some drivers such as npgsql and pgjdbc, Citus entered a code path where no
plan is defined while trying to increase the plan's cost, hence a SIGSEGV
(signal 11) when the plan is a NULL pointer.

Fix by only increasing the plan cost when the plan is not NULL.

However, it is a bit suspicious to get here with a NULL plan, and maybe a
better change would be to not call
ShardPlacementForFunctionColocatedWithDistTable() with a NULL plan at
all (in call.c:134).
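A minimal sketch of the shape of that guard, with an assumed variable name (`localPlan`) and cost constant; not the exact Citus patch:

```c
#include "postgres.h"

#include "nodes/plannodes.h"
#include "optimizer/cost.h"

/*
 * Sketch only: discourage the local plan only when one actually exists, so a
 * prepared CALL that reaches this path with localPlan == NULL no longer
 * dereferences a NULL pointer. DISABLE_COST stands in for whatever bump the
 * real code applies.
 */
static void
DiscourageLocalPlanIfAny(PlannedStmt *localPlan)
{
	if (localPlan != NULL)
	{
		localPlan->planTree->total_cost += DISABLE_COST;
	}
}
```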

The bug was hit with, for example:
```java
CallableStatement proc = con.prepareCall("{CALL p(?)}");
proc.registerOutParameter(1, java.sql.Types.BIGINT);
proc.setInt(1, -100);
proc.execute();
```

where `p(bigint)` is a distributed "function" and the parameter is the
distribution key (also in a distributed table); see #7242 for details.

Fixes #7242

(cherry picked from commit 0678a2f)
This fixes #7230.

First of all, using HeapTupleHeaderGetDatumLength(heapTuple) is
definitely wrong: it gives a number that is 4 times smaller than the
correct tuple size (heapTuple.t_len). See

https://github.com/postgres/postgres/blob/REL_16_0/src/include/access/htup_details.h#L455-L456

https://github.com/postgres/postgres/blob/REL_16_0/src/include/varatt.h#L279

https://github.com/postgres/postgres/blob/REL_16_0/src/include/varatt.h#L225-L226
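Following those definitions, HeapTupleHeaderGetDatumLength() is just VARSIZE(), and VARSIZE_4B() shifts the 4-byte header right by two bits to strip the varlena flag bits; passing a HeapTuple (whose first field is t_len) instead of a HeapTupleHeader therefore returns roughly t_len / 4, which is where the "4 times less" comes from. An illustrative snippet (not the Citus code):

```c
#include "postgres.h"

#include "access/htup_details.h"

/* Illustration only: contrast the wrong and the correct size computation. */
static Size
IntermediateResultTupleSize(HeapTuple heapTuple)
{
	/* wrong: VARSIZE() of the struct reads t_len and shifts it, ~t_len / 4 */
	Size wrongSize = HeapTupleHeaderGetDatumLength(heapTuple);

	/* right: the actual length of the heap tuple */
	Size rightSize = heapTuple->t_len;

	Assert(wrongSize <= rightSize);
	return rightSize;
}
```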

When I fixed it, the limit_intermediate_size test failed, so I tried to
understand what was going on there. In the original commit fd546cf these
queries were supposed to fail. Then in b3af63c three of the queries that
were supposed to fail suddenly worked, and the tests were changed to pass
without understanding why the output had changed or how to keep the test
testing what it was meant to test. Even the comments saying that these
queries should fail were left untouched. The commit message gives no clue
about why exactly the test output changed:

> It seems that when we use adaptive executor instead of task tracker, we
> exceed the intermediate result size less in the test. Therefore updated
> the tests accordingly.

Then 3fda2c3 also blindly raised the limit for one of the queries to
keep it working:

3fda2c3#diff-a9b7b617f9dfd345318cb8987d5897143ca1b723c87b81049bbadd94dcc86570R19

When that HeapTupleHeaderGetDatumLength(heapTuple) call was finally added
in fe3caf3, one of those test queries started failing again.

The other two of them are now also failing after the fix. I don't
understand how exactly the calculation of the "intermediate result size"
that is limited by citus.max_intermediate_result_size changed through
b3af63c and fe3caf3, but these numbers are now closer to what they
originally were when this limitation was added in fd546cf. So these
queries should fail, as in the original version of the
limit_intermediate_size test.

Co-authored-by: Karina Litskevich <[email protected]>
(cherry picked from commit 20dc58c)
LoadShardList is called twice, which is not necessary, and there is no
need to sort the shard placement list since we only want to know the list
length.

(cherry picked from commit 8e979f7)
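A minimal sketch of that refactor; LoadShardList is the function named in the commit above, while `ShardPlacementList` and the other identifiers are assumed for illustration:

```c
#include "postgres.h"

#include "nodes/pg_list.h"

/* LoadShardList is named above; ShardPlacementList is an assumed helper. */
extern List *LoadShardList(Oid relationId);
extern List *ShardPlacementList(uint64 shardId);

/* Sketch only: load the shard list once, and count placements without sorting. */
static int
FirstShardPlacementCount(Oid relationId)
{
	List *shardIdList = LoadShardList(relationId);
	uint64 *firstShardId = (uint64 *) linitial(shardIdList);
	List *placementList = ShardPlacementList(*firstShardId);

	return list_length(placementList);	/* only the length is needed */
}
```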
This change refactors the code to use generate_qualified_relation_name to
build the relation name directly from the relation id, instead of calling
a sequence of functions to construct it.

Fixes #6602

(cherry picked from commit ee11492)
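A sketch of that simplification; generate_qualified_relation_name is the helper named in the commit above (assumed to be declared in Citus' headers), and the replaced call chain in the comment is an assumption about the old code:

```c
#include "postgres.h"

/* helper named in the commit above; declaration assumed for this sketch */
extern char *generate_qualified_relation_name(Oid relationId);

/*
 * Sketch only: derive "schema"."relation" straight from the relation id,
 * instead of assembling it from several lookups (e.g. get_rel_name(),
 * get_namespace_name(), quote_qualified_identifier()).
 */
static char *
QualifiedRelationName(Oid relationId)
{
	return generate_qualified_relation_name(relationId);
}
```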
@naisila naisila force-pushed the release-13.0-naisila branch from 73610dd to ddef972 Compare January 12, 2025 19:09
@naisila naisila merged commit ddef972 into release-13.0 Jan 12, 2025
153 of 155 checks passed
@naisila naisila deleted the release-13.0-naisila branch January 12, 2025 19:35
@naisila naisila changed the title Bump pg versions cherry-pick PR Jan 12, 2025
@naisila naisila changed the title cherry-pick PR cherry-pick from main to release-13.0 PR Jan 12, 2025
@naisila naisila changed the title cherry-pick from main to release-13.0 PR Cherry-pick from main to release-13.0 Jan 12, 2025