feature proposal: heaviest common subsequence algorithm for linear token diffs #772

dgetu · 2024-10-12T20:19:33Z

I saw that this wiki page describes a problem with the longest common subsequence (LCS) approach to linear token diffs. This paper seems to have a solution for the related heaviest common subsequence (HCS) problem. If tokens are weighted by the number of characters they contain (maybe with keywords at a reduced weight and whitespace at zero weight), I'd think that would result in good diffs in most cases. Per-language weight tuning would be possible as well.
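To make the weighting idea concrete, here is a minimal sketch of what I have in mind. It uses a plain quadratic dynamic program, not the algorithm from the paper, and everything in it is an assumption for illustration: the names `token_weight`, `heaviest_common_subsequence`, and `KEYWORDS`, the 0.5 keyword factor, and the tiny keyword set are hypothetical and not taken from any existing codebase.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical keyword set; a real one would be chosen per language.
KEYWORDS = {"if", "else", "for", "while", "return", "def", "class"}


def token_weight(tok: str) -> float:
    """Assumed weighting: character count, halved for keywords, zero for whitespace."""
    if tok == "" or tok.isspace():
        return 0.0
    if tok in KEYWORDS:
        return 0.5 * len(tok)
    return float(len(tok))


def heaviest_common_subsequence(
    a: Sequence[str],
    b: Sequence[str],
    weight: Callable[[str], float] = token_weight,
) -> Tuple[float, List[Tuple[int, int]]]:
    """Return (total weight, matched index pairs) of a heaviest common subsequence.

    Simple O(len(a) * len(b)) dynamic program over suffixes of a and b.
    """
    n, m = len(a), len(b)
    # best[i][j] = max total weight of a common subsequence of a[i:] and b[j:]
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            skip = max(best[i + 1][j], best[i][j + 1])
            if a[i] == b[j]:
                best[i][j] = max(skip, weight(a[i]) + best[i + 1][j + 1])
            else:
                best[i][j] = skip

    # Walk the table from the start to recover one optimal set of matches.
    pairs: List[Tuple[int, int]] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j] and best[i][j] == weight(a[i]) + best[i + 1][j + 1]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif best[i + 1][j] >= best[i][j + 1]:
            i += 1
        else:
            j += 1
    return best[0][0], pairs
```

For example, with old tokens `["a", "b", "longname"]` and new tokens `["longname", "a", "b"]`, an unweighted LCS would keep `a` and `b` (two tokens), while this weighting keeps the single token `longname` (weight 8 versus 2), which is the kind of behavior I'd expect to read better in a diff. The weights here are multiples of 0.5, so the exact float comparison in the traceback is safe; arbitrary weights would need integer weights or a stored back-pointer table instead.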
