Skip to content

Commit

Permalink
merge: reduce prevalence of nested conflicts with diff3
Browse files Browse the repository at this point in the history
This is a pretty small code change, but one that perhaps deserves a
lengthy explanation...

When merging, it is possible to have nested conflicts.  This most
frequently happens when using merge.conflictStyle=diff3 (or zdiff3) and
doing so in a case where there is more than one merge base.  For
example:

          L1---L2
         /  \ /  \
        B    X    ?
         \  / \  /
          R1---R2

Here on branches L and R there are many commits omitted, but L1 and R1
are both valid merge bases for a merge of L2 and R2.  This reason we end
up with two valid merge bases is because we have both a merge from L
into R and a merge from R into L (each merge occurring before or at L2
and R2, respectively).  When merging L2 and R2 using the diff3 conflict
style, today you might get a conflict of the form:

    Non-conflicting leading content
    <<<<<<< e11e11e1 (First line of commit message of L2)
    L2:conflicting region
    ||||||| merged common ancestors
    <<<<<<<<< Temporary merge branch 1
    L1:conflicting region
    ||||||||| ba5eba11
    M:conflicting region
    =========
    R1:conflicting region
    =======
    R2:conflicting region
    >>>>>>> 52525252 (First line of commit message of R2)
    Non-conflicting trailing content

where "COMMIT: conflicting region" above stands for several (or even
hundreds) of lines of content from the (relevant file of) the relevant
commit.

You could get another layer of nesting here, if you found that there was
more than one merge base of the merge bases.  In fact, the number of
layers of nesting is not limited.  In effect, the higher the depth of
recursion needed for merging, the more the "base" version in the diff3
output expands.

Reports over the years suggest the presence of nested conflicts diminish
the value of having the base version available; the greater the nesting
(and perhaps also the longer the length of each region of lines when
there is a nested conflict), the more diminished the value is.  In fact,
it might be preferable for these particular conflicts to have used
merge.conflictStyle=merge instead, i.e. to provide 0 context lines for
the base version, while still using merge.conflictStyle=diff3 for other
cases that don't have conflicts between the merge bases.

However, there is an alternative way to handle the recursive merges that
would approximate merge.conflictStyle=merge as the number of nesting
levels increases: resolve the merge of merge bases not by using a
conflicted merge of the two merge bases, but by using their base
version.

This alternative strategy works because we have some latitude in how the
virtual merge base is selected.  Using the base version of the merge
bases is something we have done before in specific contexts, and in each
case doing so fixed actual bugs.  For more details, see:
    816147e (merge-recursive: add a bunch of FIXME comments
                  documenting known bugs, 2021-03-20) -- particularly
                  the cases where resolution for merge bases are wrong
    4ef88fc (merge-ort: add handling for different types of files
                  at same path, 2021-01-01)
    c73cda7 (merge-ort: copy and adapt merge_submodule() from
                  merge-recursive.c, 2021-01-01)
    62fdec1 (merge-ort: flesh out implementation of
                  handle_content_merge(), 2021-01-01)
    ec61d14 (merge-recursive: Fix modify/delete resolution in the
                  recursive case, 2011-08-11)
    a129d96 (Allow specifying specialized merge-backend per path.,
                  2007-04-16) -- particularly the "common ancestor"
                  comment and associated code

If this explanation feels like "magic" to you, there's an alternative
rules-based approach by which we can evaluate the choice of how to
create a virtual merge base.  We want any virtual merge base to follow
these rules:

  Rule 1) If within a certain range of lines, all merge bases match
          each other, then use those lines from any of them in the
          virtual merge base.
  Rule 2) If within a certain range of lines, there is at most one
          version of those lines that does not match the merge base
          of the merge bases, then use that unique version of those
          lines in the virtual merge base.
  Rule 3) In lines of the file that disagree between two or more
          merge bases (and which also disagree with the base of the
          merge bases), fill those lines in the virtual merge base
          with something that matches none of the merge bases.

The first two rules simply let us resolve cases that are clearly
unambiguous.  The third rule may look funny but is necessary to avoid
the virtual merge base accidentally matching one of the two sides in the
outer merge.  (If the virtual merge base matches one of the two sides in
the outer merge, the merge machinery will think that side of the outer
merge made no change and thus that there is no conflict in the outer
merge, despite the fact that the two sides of the outer merge may
disagree with each other.)

If we are using merge.conflictStyle=merge, then these three rules are
sufficient; anything else we do will be irrelevant to the end result.
In that case, we could even satisfy rule 3 by ignoring the conflicting
lines and replacing them with totally random lines.  However, for
merge.conflictStyle=diff3, we want something that looks more like a
"base version" of the relevant file.  That gives us a goal for the
virtual merge base:

  Goal 4) In lines of the file falling under rule 3, try to pick
          something that looks like a base version.

For Goal 4, both merging the conflicted portions of the merge bases and
taking the base of the merge bases satisfy this goal.  Both have their
plusses and minuses.  But both become less and less useful when there is
a deeply nested recursive merge.  For a deeply nested recursive merge,
the conflicted contents gives a highly nested conflict showing every
version of the file going back to the eventual common point in history.
In contrast, the "base of the merge bases" strategy instead only gives
the single version of the file from that final common point in history.
Since codebases tend to grow over time, odds are that the more deeply
recursive the merge has to go, the smaller the context that will be
provided with the "base of the merge bases" strategy.  In the limit,
the original version of the lines far enough back in history may have
been empty, so the "base of the merge bases" strategy effectively makes
recursive merges look like merge.conflictStyle=merge for deep
recursions, while still providing some "base version" context for
more shallow recursions.  As noted near the beginning of this commit
message, having something that approaches no context in the special
cases of deep recursions is exactly what we'd prefer.  So:

  Goal 5) In lines of the file falling under Rule 3, the more deep the
          recursion is, the less likely relevant context can be kept;
          prefer small (or even empty) context regions over very
          complicated ones.

This all sounds great, but there is one gotcha -- since we iteratively
merge the merge-bases pairwise, we don't have an easy way to distinguish
between Rule 2 and Rule 3 at times.  For example, if we have three merge
bases and they all disagree on some line, the conflicted-content
solution avoids an ambiguity, but taking the "base of the merge bases"
introduces one.  In particular, for this case of three merge bases that
disagree on some line, merging the first two merge bases yields an
interim virtual merge base that matches the base, making it look like
the virtual merge base has not been modified relative to its base.  Then
when we merge the third merge base with the interim merge base, we'd
think it cleanly resolved to that line from the third merge base,
against our wishes.  Since all three merge bases differed on that line,
we'd want to use the base of the merge bases, but the pairwise merging
made that difficult.  To avoid this problem, only use the "base of the
merge bases" strategy when we have two merge bases.  That will limit
where this new virtual merge base strategy will help us, but since two
merge bases is the most common case for recursive merges, it should
still provide significant benefit.

Signed-off-by: Elijah Newren <[email protected]>
  • Loading branch information
newren committed Jan 21, 2025
1 parent 47291da commit 35331a2
Show file tree
Hide file tree
Showing 3 changed files with 49 additions and 36 deletions.
4 changes: 2 additions & 2 deletions merge-ll.h
Original file line number Diff line number Diff line change
Expand Up @@ -61,9 +61,9 @@ struct ll_merge_options {
* Resolve local conflicts automatically in favor of one side or the other
* (as in 'git merge-file' `--ours`/`--theirs`/`--union`). Can be `0`,
* `XDL_MERGE_FAVOR_OURS`, `XDL_MERGE_FAVOR_THEIRS`,
* or `XDL_MERGE_FAVOR_UNION`.
* or `XDL_MERGE_FAVOR_UNION`, `XDL_MERGE_FAVOR_BASE`.
*/
unsigned variant : 2;
unsigned variant : 3;

/**
* Resmudge and clean the "base", "theirs" and "ours" files before merging.
Expand Down
16 changes: 15 additions & 1 deletion merge-ort.c
Original file line number Diff line number Diff line change
Expand Up @@ -401,6 +401,9 @@ struct merge_options_internal {
/* call_depth: recursion level counter for merging merge bases */
int call_depth;

/* vmb_favor: preferred resolution variant for virtual merge base */
unsigned vmb_favor;

/* field that holds submodule conflict information */
struct string_list conflicted_submodules;
};
Expand Down Expand Up @@ -2070,7 +2073,7 @@ static int merge_3way(struct merge_options *opt,

if (opt->priv->call_depth) {
ll_opts.virtual_ancestor = 1;
ll_opts.variant = 0;
ll_opts.variant = opt->priv->vmb_favor;
} else {
switch (opt->recursive_variant) {
case MERGE_VARIANT_OURS:
Expand Down Expand Up @@ -5186,6 +5189,17 @@ static void merge_ort_internal(struct merge_options *opt,
ancestor_name = "empty tree";
} else if (merge_bases) {
ancestor_name = "merged common ancestors";
/*
* If there were more than two virtual merge bases, we just
* fall back to our virtual merge base having conflict markers
* between versions. But if there are only two, then we can
* resolve conflicts by taking the version from the merge
* base of our merge bases.
*/
if (merge_bases->next)
opt->priv->vmb_favor = 0;
else
opt->priv->vmb_favor = XDL_MERGE_FAVOR_BASE;
} else {
strbuf_add_unique_abbrev(&merge_base_abbrev,
&merged_merge_bases->object.oid,
Expand Down
65 changes: 32 additions & 33 deletions t/t6416-recursive-corner-cases.sh
Original file line number Diff line number Diff line change
Expand Up @@ -213,13 +213,12 @@ test_expect_success 'git detects differently handled merges conflict' '
git cat-file -p C:new_a >ours &&
git cat-file -p C:a >theirs &&
>empty &&
test_must_fail git merge-file \
git merge-file --base \
-L "Temporary merge branch 1" \
-L "" \
-L "Temporary merge branch 2" \
ours empty theirs &&
sed -e "s/^\([<=>]\)/\1\1\1/" ours >ours-tweaked &&
git hash-object ours-tweaked >expect &&
git hash-object ours >expect &&
git rev-parse >>expect \
D:new_a E:new_a &&
git rev-parse >actual \
Expand Down Expand Up @@ -275,13 +274,12 @@ test_expect_success 'git detects differently handled merges conflict, swapped' '
git cat-file -p C:a >ours &&
git cat-file -p C:new_a >theirs &&
>empty &&
test_must_fail git merge-file \
git merge-file --base \
-L "Temporary merge branch 1" \
-L "" \
-L "Temporary merge branch 2" \
ours empty theirs &&
sed -e "s/^\([<=>]\)/\1\1\1/" ours >ours-tweaked &&
git hash-object ours-tweaked >expect &&
git hash-object ours >expect &&
git rev-parse >>expect \
D:new_a E:new_a &&
git rev-parse >actual \
Expand Down Expand Up @@ -1564,11 +1562,13 @@ test_expect_failure 'check conflicting modes for regular file' '
# - R2 renames 'a' to 'm'
#
# In the end, in file 'm' we have four different conflicting files (from
# two versions of 'b' and two of 'a'). In addition, if
# merge.conflictstyle is diff3, then the base version also has
# conflict markers of its own, leading to a total of three levels of
# conflict markers. This is a pretty weird corner case, but we just want
# to ensure that we handle it as well as practical.
# two versions of 'b' and two of 'a'). In addition, if we didn't use
# XDL_MERGE_FAVOR_BASE and merge.conflictstyle is diff3, then the base
# version would also have conflict markers of its own, leading to a total of
# three levels of conflict markers. However, we do use XDL_MERGE_FAVOR_BASE
# for recursive merges, so this should keep conflict nestings to two. This
# is a pretty weird corner case, but we just want to ensure that we handle
# it as well as practical.

test_expect_success 'setup nested conflicts' '
git init nested_conflicts &&
Expand Down Expand Up @@ -1646,7 +1646,7 @@ test_expect_success 'setup nested conflicts' '
)
'

test_expect_success 'check nested conflicts' '
test_expect_success 'check nested merges without nested conflicts' '
(
cd nested_conflicts &&
Expand All @@ -1670,26 +1670,28 @@ test_expect_success 'check nested conflicts' '
git cat-file -p main:a >base &&
git cat-file -p L1:a >ours &&
git cat-file -p R1:a >theirs &&
test_must_fail git merge-file --diff3 \
git merge-file --base \
-L "Temporary merge branch 1" \
-L "$MAIN" \
-L "Temporary merge branch 2" \
ours \
base \
theirs &&
sed -e "s/^\([<|=>]\)/\1\1/" ours >vmb_a &&
mv ours vmb_a &&
test_cmp base vmb_a &&
git cat-file -p main:b >base &&
git cat-file -p L1:b >ours &&
git cat-file -p R1:b >theirs &&
test_must_fail git merge-file --diff3 \
git merge-file --base \
-L "Temporary merge branch 1" \
-L "$MAIN" \
-L "Temporary merge branch 2" \
ours \
base \
theirs &&
sed -e "s/^\([<|=>]\)/\1\1/" ours >vmb_b &&
mv ours vmb_b &&
test_cmp base vmb_b &&
# Compare :2:m to expected values
git cat-file -p L2:m >ours &&
Expand Down Expand Up @@ -1749,14 +1751,15 @@ test_expect_success 'check nested conflicts' '
# L<n> and R<n> resolve the conflicts differently.
#
# X<n> is an auto-generated merge-base used when merging L<n+1> and R<n+1>.
# By construction, X1 has conflict markers due to conflicting versions.
# X2, due to using merge.conflictstyle=3, has nested conflict markers.
# By construction, X1 has two handle conflicting versions. X2 is both
# based on a merge base that had to ahndle conflicting versions, and is
# trying to resolve conflicting versions between L2 and R2.
#
# So, merging R3 into L3 using merge.conflictstyle=3 should show the
# nested conflict markers from X2 in the base version -- that means we
# have three levels of conflict markers. Can we distinguish all three?
# So, merging R3 into L3 using merge.conflictstyle=3 has a merge base where
# conflicts had to dealth with multiple levels back. Do we get a reasonable
# diff3 output for it?

test_expect_success 'setup virtual merge base with nested conflicts' '
test_expect_success 'setup virtual merge base where multiple levels back had conflicts' '
git init virtual_merge_base_has_nested_conflicts &&
(
cd virtual_merge_base_has_nested_conflicts &&
Expand Down Expand Up @@ -1815,7 +1818,7 @@ test_expect_success 'setup virtual merge base with nested conflicts' '
)
'

test_expect_success 'check virtual merge base with nested conflicts' '
test_expect_success 'check virtual merge base where multiple levels back had conflicts' '
(
cd virtual_merge_base_has_nested_conflicts &&
Expand All @@ -1839,34 +1842,30 @@ test_expect_success 'check virtual merge base with nested conflicts' '
git rev-parse :2:content :3:content >actual &&
test_cmp expect actual &&
# Imitate X1 merge base, except without long enough conflict
# markers because a subsequent sed will modify them. Put
# result into vmb.
# Imitate X1 merge base, Put result into vmb.
git cat-file -p main:content >base &&
git cat-file -p L:content >left &&
git cat-file -p R:content >right &&
cp left merged-once &&
test_must_fail git merge-file --diff3 \
git merge-file --base \
-L "Temporary merge branch 1" \
-L "$MAIN" \
-L "Temporary merge branch 2" \
merged-once \
base \
right &&
sed -e "s/^\([<|=>]\)/\1\1\1/" merged-once >vmb &&
mv merged-once vmb &&
# Imitate X2 merge base, overwriting vmb. Note that we
# extend both sets of conflict markers to make them longer
# with the sed command.
# Imitate X2 merge base, overwriting vmb.
cp left merged-twice &&
test_must_fail git merge-file --diff3 \
git merge-file --base \
-L "Temporary merge branch 1" \
-L "merged common ancestors" \
-L "Temporary merge branch 2" \
merged-twice \
vmb \
right &&
sed -e "s/^\([<|=>]\)/\1\1\1/" merged-twice >vmb &&
mv merged-twice vmb &&
# Compare :1:content to expected value
git cat-file -p :1:content >actual &&
Expand Down

0 comments on commit 35331a2

Please sign in to comment.