diff --git a/Documentation/git-filter-repo.txt b/Documentation/git-filter-repo.txt index 604b2d78..3befbbe9 100644 --- a/Documentation/git-filter-repo.txt +++ b/Documentation/git-filter-repo.txt @@ -54,6 +54,9 @@ can be overridden, but they are all on by default): * pruning commits which become empty due to the above filters (also handles edge cases like pruning of merge commits which become degenerate and empty) + * rewriting stashes + * baking the changes made by refs/replace/ refs into the permanent + history and removing the replace refs * stripping of original history to avoid mixing old and new history * repacking the repository post-rewrite to shrink the repo for the user @@ -379,14 +382,14 @@ directory. These files are overwritten unconditionally on every run. Commit map ~~~~~~~~~~ -The `.git/filter-repo/commit-map` file contains a mapping of how all +The `$GIT_DIR/filter-repo/commit-map` file contains a mapping of how all commits were (or were not) changed. * A header is the first line with the text "old" and "new" * Commit mappings are in no particular order * All commits in range of the rewrite will be listed, even commits - that are unchanged (e.g. because the commit pre-dated when the - large file(s) were introduced to the repo). + that are unchanged (e.g. because the commit pre-dated when files + the filtering operation are removing were introduced to the repo). * An all-zeros hash, or null SHA, represents a non-existent object. When in the "new" column, this means the commit was removed entirely. @@ -394,7 +397,7 @@ commits were (or were not) changed. Reference map ~~~~~~~~~~~~~ -The `.git/filter-repo/ref-map` file contains a mapping of which local +The `$GIT_DIR/filter-repo/ref-map` file contains a mapping of which local references were changed. * A header is the first line with the text "old", "new" and "ref" @@ -402,6 +405,33 @@ references were changed. * An all-zeros hash, or null SHA, represents a non-existent object. When in the "new" column, this means the ref was removed entirely. +Already Ran +~~~~~~~~~~~ + +The `$GIT_DIR/filter-repo/already_ran` file contains a file recording that +git-filter-repo has been run. When this file is present, future runs will +be treated as an extension of the previous filtering operation. + +Concretely, this means: + * The "Fresh Clone" check is bypassed + + This is done because past runs would cause the repository to no longer + look like a fresh clone, and thus fail the fresh clone check, but doing + filtering via multiple invocations of git-filter-repo is an intended + and support usecase. You already passed or bypassed the "Fresh Clone" + check on your initial run. + +However, if the already_ran file exists but is older than 1 day when they +invoke git-filter-repo, the user will be prompted for whether the new run +should be considered a continuation of the previous run. If they do not +answer in the affirmative, then the above bullet will not apply. +This prompt exists because users might do a history rewrite in a repository, +forget about it and leave the $GIT_DIR/filter-repo directory around, and +then some months or years later need to do another rewrite. If commits +have been made public and shared from the previous rewrite, then the next +filter-repo run should not be considered a continuation of the previous +filtering run. + [[FRESHCLONE]] FRESH CLONE SAFETY CHECK AND --FORCE ------------------------------------ diff --git a/git-filter-repo b/git-filter-repo index 59294d18..a00adff7 100755 --- a/git-filter-repo +++ b/git-filter-repo @@ -354,17 +354,24 @@ class ProgressWriter(object): class _IDs(object): """ A class that maintains the 'name domain' of all the 'marks' (short int - id for a blob/commit git object). The reason this mechanism is necessary - is because the text of fast-export may refer to an object using a different - mark than the mark that was assigned to that object using IDS.new(). This - class allows you to translate the fast-export marks (old) to the marks - assigned from IDS.new() (new). - - Note that there are two reasons why the marks may differ: (1) The - user manually creates Blob or Commit objects (for insertion into the - stream) (2) We're reading the data from two different repositories - and trying to combine the data (git fast-export will number ids from - 1...n, and having two 1's, two 2's, two 3's, causes issues). + id for a blob/commit git object). There are two reasons this mechanism + is necessary: + (1) the output text of fast-export may refer to an object using a different + mark than the mark that was assigned to that object using IDS.new(). + (This class allows you to translate the fast-export marks, "old" to + the marks assigned from IDS.new(), "new"). + (2) when we prune a commit, its "old" id becomes invalid. Any commits + which had that commit as a parent needs to use the nearest unpruned + ancestor as its parent instead. + + Note that for purpose (1) above, this typically comes about because the user + manually creates Blob or Commit objects (for insertion into the stream). + It could also come about if we attempt to read the data from two different + repositories and trying to combine the data (git fast-export will number ids + from 1...n, and having two 1's, two 2's, two 3's, causes issues; granted, we + this scheme doesn't handle the two streams perfectly either, but if the first + fast export stream is entirely processed and handled before the second stream + is started, this mechanism may be sufficient to handle it). """ def __init__(self): @@ -399,7 +406,7 @@ class _IDs(object): """ Record that old_id is being renamed to new_id. """ - if old_id != new_id: + if old_id != new_id or old_id in self._translation: # old_id -> new_id self._translation[old_id] = new_id @@ -434,7 +441,12 @@ class _IDs(object): rv += " %d -> %s\n" % (k, self._translation[k]) rv += "Reverse translation:\n" - for k in sorted(self._reverse_translation): + reverse_keys = list(self._reverse_translation.keys()) + if None in reverse_keys: # pragma: no cover + reverse_keys.remove(None) + reverse_keys = sorted(reverse_keys) + reverse_keys.append(None) + for k in reverse_keys: rv += " " + str(k) + " -> " + str(self._reverse_translation[k]) + "\n" return rv @@ -2935,7 +2947,21 @@ class RepoFilter(object): # Determine if this is second or later run of filter-repo tmp_dir = self.results_tmp_dir(create_if_missing=False) - already_ran = os.path.isfile(os.path.join(tmp_dir, b'already_ran')) + ran_path = os.path.join(tmp_dir, b'already_ran') + already_ran = os.path.isfile(ran_path) + if already_ran: + current_time = time.time() + file_mod_time = os.path.getmtime(ran_path) + file_age = current_time - file_mod_time + if file_age > 86400: # file older than a day + msg = (f"The previous run is older than a day ({decode(ran_path)} already exists).\n" + f"See \"Already Ran\" section in the manual for more information.\n" + f"Treat this run as a continuation of filtering in the previous run (Y/N)? ") + response = input(msg) + + if response.lower() != 'y': + os.remove(ran_path) + already_ran = False # Default for --replace-refs if not self._args.replace_refs: @@ -3146,10 +3172,10 @@ class RepoFilter(object): return fi_input, fi_output = self._import_pipes while self._pending_renames: - orig_id, ignore = self._pending_renames.popitem(last=False) - new_id = fi_output.readline().rstrip() - self._commit_renames[orig_id] = new_id - if old_hash == orig_id: + orig_hash, ignore = self._pending_renames.popitem(last=False) + new_hash = fi_output.readline().rstrip() + self._commit_renames[orig_hash] = new_hash + if old_hash == orig_hash: return if limit and len(self._pending_renames) < limit: return @@ -3914,7 +3940,7 @@ class RepoFilter(object): refs_to_nuke = set() if refs_to_nuke and self._args.debug: print("[DEBUG] Deleting the following refs:\n "+ - decode(b"\n ".join(refs_to_nuke))) + decode(b"\n ".join(sorted(refs_to_nuke)))) p.stdin.write(b''.join([b"delete %s\n" % x for x in refs_to_nuke])) diff --git a/pyproject.toml b/pyproject.toml index 9a689a2b..7157e75f 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -6,7 +6,7 @@ authors = [ ] readme = "README.md" classifiers = [ - "Development Status :: 4 - Beta", + "Development Status :: 5 - Production/Stable", "Operating System :: OS Independent", "Programming Language :: Python", "License :: OSI Approved :: MIT License", diff --git a/t/t9393-rerun.sh b/t/t9393-rerun.sh new file mode 100755 index 00000000..1bc88dce --- /dev/null +++ b/t/t9393-rerun.sh @@ -0,0 +1,79 @@ +#!/bin/bash + +test_description='filter-repo tests with reruns' + +. ./test-lib.sh + +export PATH=$(dirname $TEST_DIRECTORY):$PATH # Put git-filter-repo in PATH + +DATA="$TEST_DIRECTORY/t9393" +DELETED_SHA="0000000000000000000000000000000000000000" # FIXME: sha256 support + +test_expect_success 'a re-run that is treated as a clean slate' ' + test_create_repo clean_slate_rerun && + ( + cd clean_slate_rerun && + git fast-import --quiet <$DATA/simple && + + FIRST_ORPHAN=$(git rev-parse orphan-me~1) && + FINAL_ORPHAN=$(git rev-parse orphan-me) && + FILE_A_CHANGE=$(git rev-list -1 HEAD -- fileA) && + FILE_B_CHANGE=$(git rev-list -1 HEAD -- fileB) && + FILE_C_CHANGE=$(git rev-list -1 HEAD -- fileC) && + FILE_D_CHANGE=$(git rev-list -1 HEAD -- fileD) && + ORIGINAL_TAG=$(git rev-parse v1.0) && + + git filter-repo --invert-paths --path fileB --force && + NEW_FILE_C_CHANGE=$(git rev-list -1 HEAD -- fileC) && + NEW_FILE_D_CHANGE=$(git rev-list -1 HEAD -- fileD) && + FINAL_TAG=$(git rev-parse v1.0) && + + cat <<-EOF | sort >sha-expect && + ${FIRST_ORPHAN} ${FIRST_ORPHAN} + ${FINAL_ORPHAN} ${FINAL_ORPHAN} + ${FILE_A_CHANGE} ${FILE_A_CHANGE} + ${FILE_B_CHANGE} ${DELETED_SHA} + ${FILE_C_CHANGE} ${NEW_FILE_C_CHANGE} + ${FILE_D_CHANGE} ${NEW_FILE_D_CHANGE} + EOF + printf "%-40s %s\n" old new >expect && + cat sha-expect >>expect && + test_cmp <(sort expect) <(sort .git/filter-repo/commit-map) && + + cat <<-EOF | sort -k 3 >sha-expect && + ${FILE_D_CHANGE} ${NEW_FILE_D_CHANGE} $(git symbolic-ref HEAD) + ${FINAL_ORPHAN} ${FINAL_ORPHAN} refs/heads/orphan-me + ${ORIGINAL_TAG} ${FINAL_TAG} refs/tags/v1.0 + EOF + printf "%-40s %-40s %s\n" old new ref >expect && + cat sha-expect >>expect && + test_cmp expect .git/filter-repo/ref-map && + + touch -t 197001010000 .git/filter-repo/already_ran && + echo no | git filter-repo --invert-paths --path fileC --force && + FINAL_FILE_D_CHANGE=$(git rev-list -1 HEAD -- fileD) && + REALLY_FINAL_TAG=$(git rev-parse v1.0) && + + cat <<-EOF | sort >sha-expect && + ${FIRST_ORPHAN} ${FIRST_ORPHAN} + ${FINAL_ORPHAN} ${FINAL_ORPHAN} + ${FILE_A_CHANGE} ${FILE_A_CHANGE} + ${NEW_FILE_C_CHANGE} ${DELETED_SHA} + ${NEW_FILE_D_CHANGE} ${FINAL_FILE_D_CHANGE} + EOF + printf "%-40s %s\n" old new >expect && + cat sha-expect >>expect && + test_cmp <(sort expect) <(sort .git/filter-repo/commit-map) && + + cat <<-EOF | sort -k 3 >sha-expect && + ${NEW_FILE_D_CHANGE} ${FINAL_FILE_D_CHANGE} $(git symbolic-ref HEAD) + ${FINAL_ORPHAN} ${FINAL_ORPHAN} refs/heads/orphan-me + ${FINAL_TAG} ${REALLY_FINAL_TAG} refs/tags/v1.0 + EOF + printf "%-40s %-40s %s\n" old new ref >expect && + cat sha-expect >>expect && + test_cmp expect .git/filter-repo/ref-map + ) +' + +test_done diff --git a/t/t9393/simple b/t/t9393/simple new file mode 100644 index 00000000..3206e02c --- /dev/null +++ b/t/t9393/simple @@ -0,0 +1,96 @@ +feature done +# Simple repo with a few files, and two branches with no common history. +# Note that the original-oid directives are very fake, but make it easy to +# track things. +blob +mark :1 +original-oid 0000000000000000000000000000000000000001 +data 16 +file 1 contents + +blob +mark :2 +original-oid 0000000000000000000000000000000000000002 +data 16 +file 2 contents + +blob +mark :3 +original-oid 0000000000000000000000000000000000000003 +data 16 +file 3 contents + +blob +mark :4 +original-oid 0000000000000000000000000000000000000004 +data 16 +file 4 contents + +reset refs/heads/orphan-me +commit refs/heads/orphan-me +mark :5 +original-oid 0000000000000000000000000000000000000009 +author Little O. Me 1535228562 -0700 +committer Little O. Me 1535228562 -0700 +data 8 +Initial +M 100644 :1 nuke-me + +commit refs/heads/orphan-me +mark :6 +original-oid 000000000000000000000000000000000000000A +author Little 'ol Me 1535229544 -0700 +committer Little 'ol Me 1535229544 -0700 +data 9 +Tweak it +from :5 +M 100644 :4 nuke-me + +reset refs/heads/master +commit refs/heads/master +mark :7 +original-oid 000000000000000000000000000000000000000B +author Little O. Me 1535229523 -0700 +committer Little O. Me 1535229523 -0700 +data 15 +Initial commit +M 100644 :1 fileA + +commit refs/heads/master +mark :8 +original-oid 000000000000000000000000000000000000000C +author Lit.e Me 1535229559 -0700 +committer Lit.e Me 1535229580 -0700 +data 10 +Add fileB +from :7 +M 100644 :2 fileB + +commit refs/heads/master +mark :9 +original-oid 000000000000000000000000000000000000000D +author Little Me 1535229601 -0700 +committer Little Me 1535229601 -0700 +data 10 +Add fileC +from :8 +M 100644 :3 fileC + +commit refs/heads/master +mark :10 +original-oid 000000000000000000000000000000000000000E +author Little Me 1535229618 -0700 +committer Little Me 1535229618 -0700 +data 10 +Add fileD +from :9 +M 100644 :4 fileD + +tag v1.0 +from :10 +original-oid 000000000000000000000000000000000000000F +tagger Little John 1535229637 -0700 +data 5 +v1.0 + +done