Skip to content

Commit

Permalink
Fix an old bug that can manifest during factoring.
Browse files Browse the repository at this point in the history
`AddFoldedRange()` can terminate prematurely if the character class
already contains the rune. For example, if it contains `a` and we
want to add folded `a`, it sees `a` and stops without adding `A`.
To avoid that, we use an empty character class and then merge it.

Fixes #467.

Change-Id: I1f26b182e23b8a03da5d98107705bdf443d931d6
Reviewed-on: https://code-review.googlesource.com/c/re2/+/62270
Reviewed-by: Alex Chernyakhovsky <[email protected]>
Reviewed-by: Paul Wankadia <[email protected]>
  • Loading branch information
junyer committed Dec 12, 2023
1 parent 71857b0 commit 9d0b5bf
Show file tree
Hide file tree
Showing 2 changed files with 18 additions and 1 deletion.
12 changes: 11 additions & 1 deletion re2/parse.cc
Original file line number Diff line number Diff line change
Expand Up @@ -1177,7 +1177,17 @@ void FactorAlternationImpl::Round3(Regexp** sub, int nsub,
for (CharClass::iterator it = cc->begin(); it != cc->end(); ++it)
ccb.AddRange(it->lo, it->hi);
} else if (re->op() == kRegexpLiteral) {
ccb.AddRangeFlags(re->rune(), re->rune(), re->parse_flags());
if (re->parse_flags() & Regexp::FoldCase) {
// AddFoldedRange() can terminate prematurely if the character class
// already contains the rune. For example, if it contains 'a' and we
// want to add folded 'a', it sees 'a' and stops without adding 'A'.
// To avoid that, we use an empty character class and then merge it.
CharClassBuilder tmp;
tmp.AddRangeFlags(re->rune(), re->rune(), re->parse_flags());
ccb.AddCharClass(&tmp);
} else {
ccb.AddRangeFlags(re->rune(), re->rune(), re->parse_flags());
}
} else {
LOG(DFATAL) << "RE2: unexpected op: " << re->op() << " "
<< re->ToString();
Expand Down
7 changes: 7 additions & 0 deletions re2/testing/parse_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -356,6 +356,13 @@ Test prefix_tests[] = {
"cat{lit{a}alt{emp{}cat{str{ardvark}alt{emp{}lit{s}}}"
"cat{str{ba}alt{cat{lit{c}alt{cc{0x69 0x6b}cat{str{us}alt{emp{}str{es}}}}}"
"str{ft}cat{str{lone}alt{emp{}lit{s}}}}}}}" },
// As per https://github.com/google/re2/issues/467,
// these should factor identically, but they didn't
// because AddFoldedRange() terminated prematurely.
{ "0A|0[aA]", "cat{lit{0}cc{0x41 0x61}}" },
{ "0a|0[aA]", "cat{lit{0}cc{0x41 0x61}}" },
{ "0[aA]|0A", "cat{lit{0}cc{0x41 0x61}}" },
{ "0[aA]|0a", "cat{lit{0}cc{0x41 0x61}}" },
};

// Test that prefix factoring works.
Expand Down

0 comments on commit 9d0b5bf

Please sign in to comment.