Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normalize char before pattern lookup in FuzzyMatchV1 #4252

Merged
merged 1 commit into from
Feb 17, 2025

Conversation

alexeisersun
Copy link
Contributor

There is an edge-case in FuzzyMatchV1 during backward scan, related to normalization: if string is initially denormalized (e.g. Unicode symbol), backward scan will proceed further to the next char; however, when the score is computed, the string is normalized first, then scanned based on the pattern. This leads to accessing pattern index increment, which itself leads to out-of-bound index access, resulting in a panic.

To illustrate the process, here's the sequence of operations when search is perfored:

  1. during backward scan by "minim" pattern
xxxxx Minímal example
      ^^^^^^^^^^^^
      ||||||||||||
      miniiiiiiiim <- compute score for this substring
  1. during compute score by "minim" pattern
      Minímal exam
      minimal exam <- normalize chars before computing the score
      ^^^^^^
      ||||||
      minim <- at this point the pattern is already fully scanned and index
              is out-of-the-bound

In this commit the char is normalized during backward scan, to detect properly the boundaries for the pattern.

@alexeisersun alexeisersun force-pushed the normalized-long-string branch from efe3ef2 to 102da78 Compare February 17, 2025 11:29
There is an edge-case in FuzzyMatchV1 during backward scan, related to
normalization: if string is initially denormalized (e.g. Unicode symbol),
backward scan will proceed further to the next char; however, when the
score is computed, the string is normalized first, then scanned based on
the pattern. This leads to accessing pattern index increment, which
itself leads to out-of-bound index access, resulting in a panic.

To illustrate the process, here's the sequence of operations when search
is perfored:

1. during backward scan by "minim" pattern

```
xxxxx Minímal example
      ^^^^^^^^^^^^
      ||||||||||||
      miniiiiiiiim <- compute score for this substring
```
2. during compute score by "minim" pattern
```
      Minímal exam
      minimal exam <- normalize chars before computing the score
      ^^^^^^
      ||||||
      minim <- at this point the pattern is already fully scanned and index
              is out-of-the-bound
```

In this commit the char is normalized during backward scan, to detect
properly the boundaries for the pattern.
@alexeisersun alexeisersun force-pushed the normalized-long-string branch from 102da78 to 93746a7 Compare February 17, 2025 11:32
@junegunn
Copy link
Owner

Thanks, I can reproduce the problem.

echo 'Minímal example' | fzf --algo=v1 --tiebreak=end -f minim
panic: runtime error: index out of range [5] with length 5

goroutine 33 [running]:
github.com/junegunn/fzf/src/algo.calculateScore(0x0, 0x1, 0x14000246008, {0x1400023e030, 0x5, 0x0?}, 0x0, 0xc, 0x0)
        github.com/junegunn/fzf/src/algo/algo.go:674 +0x5f0
github.com/junegunn/fzf/src/algo.FuzzyMatchV1(0x0, 0x1, 0x0, 0x14000246008, {0x1400023e030, 0x5, 0x5}, 0x0, 0x0?)
        github.com/junegunn/fzf/src/algo/algo.go:785 +0x470
github.com/junegunn/fzf/src.(*Pattern).iter(0x0?, 0x102b7aec8, {0x1400011cc38?, 0x0?, 0x0?}, 0x0, 0x1, 0x0, {0x1400023e030, 0x5, ...}, ...)
        github.com/junegunn/fzf/src/pattern.go:448 +0xc4
github.com/junegunn/fzf/src.(*Pattern).extendedMatch(0x140002300b0, 0x14000246008?, 0x0, 0x14000226180)
        github.com/junegunn/fzf/src/pattern.go:394 +0x35c
github.com/junegunn/fzf/src.(*Pattern).MatchItem(0x140002300b0, 0x14000246008, 0x0?, 0x0?)
        github.com/junegunn/fzf/src/pattern.go:347 +0x30
github.com/junegunn/fzf/src.(*Pattern).matchChunk(0x140002300b0, 0x14000246008, {0x0?, 0x102b30860?, 0x0?}, 0x14000226180)
        github.com/junegunn/fzf/src/pattern.go:308 +0x3ac
github.com/junegunn/fzf/src.(*Pattern).Match(0x140002300b0, 0x14000246008, 0x14000226180)
        github.com/junegunn/fzf/src/pattern.go:293 +0x88
github.com/junegunn/fzf/src.(*Matcher).scan.func1(0x0, 0x14000226180, {0x14000224010, 0x1, 0x0?})
        github.com/junegunn/fzf/src/matcher.go:186 +0xf8
created by github.com/junegunn/fzf/src.(*Matcher).scan in goroutine 1
        github.com/junegunn/fzf/src/matcher.go:181 +0x4a8

Just out of curiosity, how did you find this problem?

@junegunn junegunn added the bug label Feb 17, 2025
@junegunn junegunn merged commit 01d9d9c into junegunn:master Feb 17, 2025
5 checks passed
@junegunn
Copy link
Owner

Merged, thanks!

@alexeisersun
Copy link
Contributor Author

Just out of curiosity, how did you find this problem?

@junegunn It was quite a journey. 😄

After an upgrade of fzf (I guess it was an update from 0.49 to 0.59), the fuzzy search using Nvim, fzf and fzf.vim started to throw errors. Namely, if search was performed by term "cli, fzf.vim would report an error like Error running cd '/Users/alexeisersun/foobar' && '/Users/alexeisersun/.config/nvim/plugged/fzf/bin/fzf' --tmux 90%,70% .... For simplicity of debugging I switched Tmux off and debugged fzf Vim plugin relevant crash report from fzf binary, using on_stdout callback from Vim.
With Go backtraces at hands, I found out what was the offending line -- it turned out to be a line from a file with veeery long rows (60-70k symbols). With divide-and-conquer technique, I found the minimal substring that resulted in a crash. It was clear that some Unicode symbols were the offender -- the only question that was left is where the bug is. 😅
Initially I thought the bug was inside calculateScore, but sidx and eidx (which happened to be too big and cause the panic) were passed as arguments, so I searched 1 layer up. That's where I noticed that pattern lookup was normalized for forward scan, but denormalized for backwards one. And here we are.

@junegunn
Copy link
Owner

Wow, great job! Thanks for the help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants