-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
git for-each-ref: is-base atom and base branches #1768
Conversation
There are issues in commit 3ff144e: |
There are issues in commit 22b72ab: |
f937f24
to
febf73f
Compare
There are issues in commit 3ff144e: |
There are issues in commit 6b1e6c4: |
febf73f
to
7631c3d
Compare
There are issues in commit 23f18f1: |
There are issues in commit 9ccfdd6: |
There are issues in commit 7631c3d: |
7631c3d
to
db87434
Compare
/submit |
Submitted as [email protected] To fetch this version into
To fetch this version to local tag
|
On the Git mailing list, Junio C Hamano wrote (reply to this): "Derrick Stolee via GitGitGadget" <[email protected]> writes:
> Derrick Stolee (3):
> commit-reach: add get_branch_base_for_tip
> for-each-ref: add 'is-base' token
> p1500: add is-base performance tests
>
> commit-reach.c | 118 ++++++++++++++++++++++++++++++++++++
> commit-reach.h | 17 ++++++
> ref-filter.c | 78 +++++++++++++++++++++++-
> ref-filter.h | 15 +++++
> t/helper/test-reach.c | 2 +
> t/perf/p1500-graph-walks.sh | 31 ++++++++++
> t/t6600-test-reach.sh | 94 ++++++++++++++++++++++++++++
> 7 files changed, 354 insertions(+), 1 deletion(-)
I was expecting to see an documentation update to for-each-ref (and
probably branch and tag) so that what this new atom means. Is it
that %(is-base:<commit>) interpolates to <commit> for a ref that is an
ancestor of <commit>, and interpolates to an empty string for a ref
that is not, or something?
Thanks.
|
db87434
to
899eaa5
Compare
On the Git mailing list, Derrick Stolee wrote (reply to this): On 8/1/24 7:06 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <[email protected]> writes:
> >> Derrick Stolee (3):
>> commit-reach: add get_branch_base_for_tip
>> for-each-ref: add 'is-base' token
>> p1500: add is-base performance tests
>>
>> commit-reach.c | 118 ++++++++++++++++++++++++++++++++++++
>> commit-reach.h | 17 ++++++
>> ref-filter.c | 78 +++++++++++++++++++++++-
>> ref-filter.h | 15 +++++
>> t/helper/test-reach.c | 2 +
>> t/perf/p1500-graph-walks.sh | 31 ++++++++++
>> t/t6600-test-reach.sh | 94 ++++++++++++++++++++++++++++
>> 7 files changed, 354 insertions(+), 1 deletion(-)
> > I was expecting to see an documentation update to for-each-ref (and
> probably branch and tag) so that what this new atom means. Is it
> that %(is-base:<commit>) interpolates to <commit> for a ref that is an
> ancestor of <commit>, and interpolates to an empty string for a ref
> that is not, or something?
You are absolutely right that I missed this crucial detail. I will
eventually send a v2 with this oversight corrected. For now, please
consider this documentation diff, and I look forward to other review
comments that I can use to improve this series before sending v2.
diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index c1dd12b93cf..5154ba3e2a7 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -264,6 +264,16 @@ ahead-behind:<committish>::
commits ahead and behind, respectively, when comparing the output
ref to the `<committish>` specified in the format.
+is-base:<committish>::
+ In at most one row, `(<committish>)` will appear to indicate the ref
+ that minimizes the number of commits in the first-parent history of
+ `<committish>` and not in the first-parent history of the ref. Ties
+ are broken by favoring the earliest ref in the list. Note that this
+ token will not appear if the first-parent history of `<committish>`
+ does not intersect the first-parent histories of the filtered refs.
+ This can be used as a heuristic to guess which of the filtered refs
+ was used as the base of the branch that produced `<committish>`.
+
describe[:options]::
A human-readable name, like linkgit:git-describe[1];
empty string for undescribable commits. The `describe` string may
|
On the Git mailing list, Junio C Hamano wrote (reply to this): Derrick Stolee <[email protected]> writes:
> consider this documentation diff, and I look forward to other review
> comments that I can use to improve this series before sending v2.
>
> diff --git a/Documentation/git-for-each-ref.txt
> b/Documentation/git-for-each-ref.txt
> index c1dd12b93cf..5154ba3e2a7 100644
> --- a/Documentation/git-for-each-ref.txt
> +++ b/Documentation/git-for-each-ref.txt
> @@ -264,6 +264,16 @@ ahead-behind:<committish>::
> commits ahead and behind, respectively, when comparing the output
> ref to the `<committish>` specified in the format.
>
> +is-base:<committish>::
> + In at most one row, `(<committish>)` will appear to indicate the ref
> + that minimizes the number of commits in the first-parent history of
> + `<committish>` and not in the first-parent history of the ref. Ties
> + are broken by favoring the earliest ref in the list. Note that this
> + token will not appear if the first-parent history of `<committish>`
> + does not intersect the first-parent histories of the filtered refs.
> + This can be used as a heuristic to guess which of the filtered refs
> + was used as the base of the branch that produced `<committish>`.
> +
OK. Knowing what definition you used is crucial when reading the
implementation, as we cannot tell what you wanted to implement
without it ;-)
Thanks. |
On the Git mailing list, Junio C Hamano wrote (reply to this): Junio C Hamano <[email protected]> writes:
>> +is-base:<committish>::
>> + In at most one row, `(<committish>)` will appear to indicate the ref
>> + that minimizes the number of commits in the first-parent history of
>> + `<committish>` and not in the first-parent history of the ref.
This was a bit too dense for me to grok. So if I have a <commit>
that is at the tip of a branch B forked from 'master', and then
'master' advanced by a lot since the branch forked, the number this
is minimizing for 'master' is the commits on the branch B, but when
showing 'maint', then even though the branch B may have the tip of
'maint' as an ancestor, the number for 'maint' would be a lot more
than the number for 'master'. If there were another branch C that
was forked from 'master' and shared some (or all) commits that are
near the tip of branch B, e.g.
---o---o---o---o---o---o---o---o---o 'master'
\
o---o---o---o 'C'
\
o---o---o---o 'B'
then the number may be even smaller for branch 'C' than 'master'.
And for at most one ref, %(is-base:<commit>) becomes "(<commit>)";
for all other refs, it becomes an empty string.
OK.
> OK. Knowing what definition you used is crucial when reading the
> implementation, as we cannot tell what you wanted to implement
> without it ;-)
>
> Thanks. |
This patch series was integrated into seen via git@6021df1. |
This patch series was integrated into seen via git@cff39cd. |
This branch is now known as |
This patch series was integrated into seen via git@6912159. |
This patch series was integrated into seen via git@813637c. |
There was a status update in the "New Topics" section about the branch 'git for-each-ref' learned a new "--format" atom to find the branch that the history leading to a given commit "%(is-base:<commit>)" is likely based on. Expecting a reroll. source: <[email protected]> |
This patch series was integrated into seen via git@2de25db. |
This patch series was integrated into seen via git@f1df888. |
This patch series was integrated into seen via git@259918f. |
There was a status update in the "Cooking" section about the branch 'git for-each-ref' learned a new "--format" atom to find the branch that the history leading to a given commit "%(is-base:<commit>)" is likely based on. Expecting a reroll. source: <[email protected]> |
This patch series was integrated into seen via git@21eb9b8. |
The lookup_commit_reference_by_name() method uses lookup_commit_reference() without an option to use lookup_commit_reference_gently(). Create a gentle version of the method so it can be used in locations where non-commits may be found but error messages should be silenced. Signed-off-by: Derrick Stolee <[email protected]>
757c200
to
d001334
Compare
The previous change introduced the get_branch_base_for_tip() method in commit-reach.c. The motivation of that change was about using a heuristic to deteremine the base branch for a source commit from a list of candidate commit tips. This change makes that algorithm visible to users via a new atom in the 'git for-each-ref' format. This change is very similar to the chang in 49abcd2 (for-each-ref: add ahead-behind format atom, 2023-03-20). Introduce the 'is-base:<source>' atom, which will indicate that the algorithm should be computed and the result of the algorithm is reported using an indicator of the form '(<source>)'. For example, using '%(is-base:HEAD)' would result in one line having the token '(HEAD)'. Use the sorted order of refs included in the ref filter to break ties in the algorithm's heuristic. In the previous change, the motivating examples include using an L0 trunk, long-lived L1 branches, and temporary release branches. A caller could communicate the ordered preference among these categories using the input refpecs and avoiding a different sort mechanism. This sorting behavior is tested in the test scripts. It is important to include this atom as a special case to can_do_iterative_format() to match the expectations created in bd98f97 (ref-filter.c: filter & format refs in the same callback, 2023-11-14). The ahead-behind atom was one of the special cases, and this similarly requires using an algorithm across all input refs before starting the format of any single ref. In the test script, the format tokens use colons or lack whitespace to avoid Git complaining about trailing whitespace errors. Signed-off-by: Derrick Stolee <[email protected]>
The previous two changes introduced a commit walking heuristic for finding the most likely base branch for a given source. This algorithm walks first-parent histories until reaching a collision. This walk _should_ be very fast. Exceptions include cases where a commit-graph file does not exist, leading to a full walk of all reachable commits to compute generation numbers, or a case where no collision in the first-parent history exists, leading to a walk of all first-parent history to the root commits. The p1500 test script guarantees a complete commit-graph file during its setup, so we will not test that scenario. Do create a new root commit in an effort to test the scenario of parallel first-parent histories. Even with the extra root commit, these tests take no longer than 0.02 seconds on my machine for the Git repository. However, the results are slightly more interesting in a copy of the Linux kernel repository: Test --------------------------------------------------------------- 1500.2: ahead-behind counts: git for-each-ref 0.12 1500.3: ahead-behind counts: git branch 0.12 1500.4: ahead-behind counts: git tag 0.12 1500.5: contains: git for-each-ref --merged 0.04 1500.6: contains: git branch --merged 0.04 1500.7: contains: git tag --merged 0.04 1500.8: is-base check: test-tool reach (refs) 0.03 1500.9: is-base check: test-tool reach (tags) 0.03 1500.10: is-base check: git for-each-ref 0.03 1500.11: is-base check: git for-each-ref (disjoint-base) 0.07 Signed-off-by: Derrick Stolee <[email protected]>
d001334
to
cce9921
Compare
/submit |
Submitted as [email protected] To fetch this version into
To fetch this version to local tag
|
There was a status update in the "Cooking" section about the branch 'git for-each-ref' learned a new "--format" atom to find the branch that the history leading to a given commit "%(is-base:<commit>)" is likely based on. Comments? source: <[email protected]> |
There was a status update in the "Cooking" section about the branch 'git for-each-ref' learned a new "--format" atom to find the branch that the history leading to a given commit "%(is-base:<commit>)" is likely based on. Comments? source: <[email protected]> |
This patch series was integrated into seen via git@58280ea. |
This patch series was integrated into seen via git@996b87a. |
On the Git mailing list, Junio C Hamano wrote (reply to this): "Derrick Stolee via GitGitGadget" <[email protected]> writes:
> There are benefits to users both on client-side and server-side. In an
> internal monorepo, this base branch detection algorithm is used to determine
> a long-lived branch based on the HEAD commit, mapping to a group within the
> organizational structure of the repository, which determines a set of
> projects that the user will likely need to build; this leads to
> automatically selecting an initial sparse-checkout definition based on the
> build dependencies required. An upcoming feature in Azure Repos will use
> this algorithm to automatically create a pull request against the correct
> target branch, reducing user pain from needing to select a different branch
> after a large commit diff is rendered against the default branch. This atom
> unlocks that ability for Git hosting services that use Git in their backend.
Thanks for an update. This iteration looks good to me.
|
There was a status update in the "Cooking" section about the branch 'git for-each-ref' learned a new "--format" atom to find the branch that the history leading to a given commit "%(is-base:<commit>)" is likely based on. Will merge to 'next'. source: <[email protected]> |
On the Git mailing list, Derrick Stolee wrote (reply to this): On 8/19/24 3:52 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <[email protected]> writes:
> >> There are benefits to users both on client-side and server-side. In an
>> internal monorepo, this base branch detection algorithm is used to determine
>> a long-lived branch based on the HEAD commit, mapping to a group within the
>> organizational structure of the repository, which determines a set of
>> projects that the user will likely need to build; this leads to
>> automatically selecting an initial sparse-checkout definition based on the
>> build dependencies required. An upcoming feature in Azure Repos will use
>> this algorithm to automatically create a pull request against the correct
>> target branch, reducing user pain from needing to select a different branch
>> after a large commit diff is rendered against the default branch. This atom
>> unlocks that ability for Git hosting services that use Git in their backend.
> > Thanks for an update. This iteration looks good to me.
Thank you for your careful review.
-Stolee
|
This patch series was integrated into seen via git@918c769. |
This patch series was integrated into next via git@dd5da48. |
This patch series was integrated into seen via git@d9c0d90. |
This patch series was integrated into seen via git@c42f26f. |
There was a status update in the "Cooking" section about the branch 'git for-each-ref' learned a new "--format" atom to find the branch that the history leading to a given commit "%(is-base:<commit>)" is likely based on. Will merge to 'master'. source: <[email protected]> |
This patch series was integrated into seen via git@36102ce. |
This patch series was integrated into seen via git@f271dbd. |
There was a status update in the "Cooking" section about the branch 'git for-each-ref' learned a new "--format" atom to find the branch that the history leading to a given commit "%(is-base:<commit>)" is likely based on. Will merge to 'master'. source: <[email protected]> |
This patch series was integrated into seen via git@a8e7fad. |
This patch series was integrated into seen via git@3222718. |
This patch series was integrated into master via git@3222718. |
This patch series was integrated into next via git@3222718. |
Closed via 3222718. |
This change introduces a new 'git for-each-ref' atom, 'is-base', in a very similar way to the 'ahead-behind' atom. As detailed carefully in the first change, this is motivated by the need to detect the concept of a "base branch" in a repository with multiple long-lived branches.
This change is motivated by a third-party tool created to make this detection with the same optimization mechanism, but using a much slower technique due to the limitations of the Git CLI not presenting this information. The existing algorithm involves using
git rev-list --first-parent -<N>
in batches for the collection of considered references, comparing those lists, and increasing<N>
as needed until finding a collision. This new use of 'git for-each-ref' will allow determining this mechanism within a single process and walking a minimal number of commits.There are benefits to users both on client-side and server-side. In an internal monorepo, this base branch detection algorithm is used to determine a long-lived branch based on the HEAD commit, mapping to a group within the organizational structure of the repository, which determines a set of projects that the user will likely need to build; this leads to automatically selecting an initial sparse-checkout definition based on the build dependencies required. An upcoming feature in Azure Repos will use this algorithm to automatically create a pull request against the correct target branch, reducing user pain from needing to select a different branch after a large commit diff is rendered against the default branch. This atom unlocks that ability for Git hosting services that use Git in their backend.
Thanks, -Stolee
Updates in v2
Updates in v3
cc: [email protected]
cc: [email protected]