Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug Fix] : writing incorrect data to target Merge repartition Command #7659

Merged
merged 8 commits into from
Sep 13, 2024

Conversation

paragikjain
Copy link
Contributor

We were writing incorrect data to target collection in some cases of merge command. In case of repartition when source query it RELATION. We were referring to incorrect attribute number that was resulting into this incorrect behavior.

Example :
image
image

I have added fixed tests as part of this PR , Thanks.

Copy link

codecov bot commented Jul 16, 2024

Codecov Report

Attention: Patch coverage is 71.42857% with 2 lines in your changes missing coverage. Please review.

Project coverage is 52.71%. Comparing base (3c467e6) to head (92191f0).

❗ There is a different number of reports uploaded between BASE (3c467e6) and HEAD (92191f0). Click for more details.

HEAD has 61 uploads less than BASE
Flag BASE (3c467e6) HEAD (92191f0)
14_regress_check-enterprise-failure 1 0
14_16_upgrade 1 0
14_15_upgrade 1 0
15_16_upgrade 1 0
16_regress_check-split 1 0
16_regress_check-enterprise-failure 1 0
16_regress_check-enterprise-isolation-logicalrep-2 1 0
14_regress_check-enterprise-isolation-logicalrep-1 1 0
14_regress_check-enterprise-isolation 1 0
15_regress_check-enterprise-isolation-logicalrep-1 1 0
15_regress_check-enterprise-failure 1 0
15_regress_check-operations 1 0
16_regress_check-enterprise-isolation-logicalrep-1 1 0
16_regress_check-vanilla 1 0
15_regress_check-multi-mx 1 0
16_regress_check-failure 1 0
16_regress_check-enterprise-isolation 1 0
15_regress_check-failure 1 0
14_regress_check-multi-mx 1 0
_upgrade 17 1
14_regress_check-vanilla 1 0
16_regress_check-enterprise 1 0
15_regress_check-enterprise 1 0
15_regress_check-vanilla 1 0
14_regress_check-split 1 0
14_regress_check-enterprise 1 0
14_regress_check-operations 1 0
16_regress_check-multi-1 1 0
15_regress_check-multi-1 1 0
14_regress_check-multi-1 1 0
16_regress_check-operations 1 0
15_regress_check-enterprise-isolation 1 0
14_regress_check-columnar 1 0
15_regress_check-enterprise-isolation-logicalrep-2 1 0
16_regress_check-columnar 1 0
15_regress_check-split 1 0
16_regress_check-multi 1 0
15_regress_check-multi 1 0
14_regress_check-isolation 1 0
15_regress_check-isolation 1 0
14_regress_check-failure 1 0
16_cdc_installcheck 1 0
15_cdc_installcheck 1 0
16_regress_check-multi-mx 1 0
16_regress_check-isolation 1 0
14_regress_check-multi 1 0
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #7659       +/-   ##
===========================================
- Coverage   89.70%   52.71%   -36.99%     
===========================================
  Files         283      283               
  Lines       60506    60481       -25     
  Branches     7539     7531        -8     
===========================================
- Hits        54276    31884    -22392     
- Misses       4073    25772    +21699     
- Partials     2157     2825      +668     

@@ -361,8 +362,9 @@ CreateAllTargetListForRelation(Oid relationId, List *requiredAttributes)
}
else
{
int varAttNum = isMergeQuery ? attrNum : colAppendIdx++;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that there is an off-by-one error in the column values inserted into the table but I really would like understand the Postgres-internals fact that requires us doing something specific for MERGE while it's not needed for other types of commands.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Merge Command we should refer actual table attNum but in other cases we are building query like below which refers attnum of a subquery. not the actual table.
(SELECT t.a, NULL, NULL FROM (SELECT a FROM table) t).

That is the reason this is applicable only for merge command.

@@ -857,8 +857,11 @@ ConvertRelationRTEIntoSubquery(Query *mergeQuery, RangeTblEntry *sourceRte,
/* set the FROM expression to the subquery */
newRangeTableRef->rtindex = SINGLE_RTE_INDEX;
sourceResultsQuery->jointree = makeFromExpr(list_make1(newRangeTableRef), NULL);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requiredAttributes = RequiredAttrNumbersForRelationInternal(mergeQuery, relationRestriction->index);
Here were we getting 1, 3?

@tejeswarm
Copy link
Contributor

Likely the problem is stemming from the fact that ConvertRelationRTEIntoSubquery() is really designed for

SELECT * FROM the relation

but in this case we are selecting column 1 and 3. May be we should change the function to handle

SELECT <col-list> FROM the relation

@tejeswarm
Copy link
Contributor

tejeswarm commented Sep 12, 2024

I think I know what the problem is, can you please do me a small favor

Please change the below code

sourceResultsQuery->targetList =
                CreateAllTargetListForRelation(sourceRte->relid, requiredAttributes);

to

sourceResultsQuery->targetList =
                CreateFilteredTargetListForRelation(rteRelation->relid, requiredAttributes);

and see how things work

@paragikjain
Copy link
Contributor Author

ou please do me

I think I know what the problem is, can you please do me a small favor

Please change the below code

sourceResultsQuery->targetList =
                CreateAllTargetListForRelation(sourceRte->relid, requiredAttributes);

to

sourceResultsQuery->targetList =
                CreateFilteredTargetListForRelation(rteRelation->relid, requiredAttributes);

and see how things work

Yes this works fine !! modified the changes with this function.

Copy link
Contributor

@tejeswarm tejeswarm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, can you please add more test scenarios

In the existing test scenarios, the order of columns in both target and source are matching

INSERT (salary, id) VALUES (source.salary, source.id);
INSERT (id, salary) VALUES (source.id, source.salary);

Can we add tests where order of columns in target and source are not same i.e INSERT target(col1, col3) FROM source(col3, col1)

Semantically they may be incorrect i.e. you may be inserting age into id, but we want to make sure that the "correct columns" are picked up on both sides.

@tejeswarm tejeswarm merged commit 5bad6c6 into citusdata:main Sep 13, 2024
150 of 151 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants