We consider the example of W = "ABCDABD" first. We will see that the table construction follows much the same pattern as the main search, and is efficient for similar reasons. We set T[0] = -1. To find T[1], we must discover a proper suffix of "A" which is also a prefix of the pattern W. But there are no non-empty proper suffixes of "A", so we set T[1] = 0. To find T[2], we see that the substring from W[0] to W[1] ("AB") has a proper suffix "B". However, "B" is not a prefix of the pattern W. Therefore, we set T[2] = 0.
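As a rough illustration of the definition used above, the following Python sketch finds, by brute force, the length of the longest proper suffix of a string that is also a prefix of W. The function name `longest_border` is an assumption made here for illustration only, and the sketch deliberately ignores the later refinement that produces entries of -1.

```python
def longest_border(prefix: str, pattern: str) -> int:
    """Length of the longest non-empty proper suffix of `prefix`
    that is also a prefix of `pattern` (0 if there is none)."""
    # Try the longest candidate first; a proper suffix is shorter than `prefix`.
    for length in range(len(prefix) - 1, 0, -1):
        if prefix[-length:] == pattern[:length]:
            return length
    return 0


W = "ABCDABD"
print(longest_border(W[:1], W))  # "A" has no non-empty proper suffix
print(longest_border(W[:2], W))  # "B" is not a prefix of W
```

Both calls print 0, matching T[1] = 0 and T[2] = 0 in the walkthrough.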
Continuing to T[3], we first check the proper suffix of length 1, and as in the previous case it fails. Should we also check longer suffixes? No, because there is a shortcut that avoids checking all suffixes. Suppose we had found a proper suffix of length 2 (the maximum possible) ending at W[2] that is also a proper prefix of W (a proper prefix of a string is not equal to the string itself). Dropping its last character would leave a suffix of length 1 ending at W[1] that is also a prefix of W, so we would already have found T[2] = 1. Since T[2] = 0, no such suffix exists. Hence the shortcut rule at each stage: a suffix of size m+1 needs to be checked only if a valid suffix of size m was found at the previous stage (i.e. T[x] = m), and sizes m+2, m+3, etc. need not be checked at all.
Therefore, we need not consider suffixes of length 2 at all, and since the only candidate of length 1 ("C") is not a prefix of W, T[3] = 0.
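The shortcut rule can be sketched in Python as follows. This is a minimal illustration, not the final table-building routine: the name `border_lengths` is hypothetical, and the function returns the unrefined suffix lengths, so for W = "ABCDABD" it yields 0 at index 4 and 1 at index 5, where the refined table has T[4] = -1 and T[5] = 0.

```python
def border_lengths(pattern: str) -> list[int]:
    """lengths[i] is the length of the longest proper suffix of pattern[:i]
    that is also a prefix of pattern, before any refinement."""
    lengths = [0] * (len(pattern) + 1)
    for i in range(2, len(pattern) + 1):
        # Shortcut rule: a suffix of size m + 1 is possible only if a suffix
        # of size m was valid at the previous stage, so start at m + 1.
        for length in range(lengths[i - 1] + 1, 0, -1):
            if pattern[i - length:i] == pattern[:length]:
                lengths[i] = length
                break
    return lengths


print(border_lengths("ABCDABD"))
```

The inner loop never tries a length greater than the previous stage's result plus one, which is exactly the bound the argument above establishes.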
We pass to the subsequent W[4], 'A'. The same logic shows that the longest suffix we need to consider has length 1, and as in the previous case it fails since "D" is not a prefix of W. But instead of setting T[4] = 0, we can do better by noting that W[4] = W[0], and also that a look-up of T[4] implies that the corresponding character of the searched text S, S[m+4], was a mismatch, and therefore S[m+4] ≠ 'A' (here m is the position in S where the current match attempt begins, not the suffix size used above). Thus there is no point in restarting the search at S[m+4]; we should begin one position ahead. This means that we may shift the pattern W by the match length plus one character, so T[4] = -1.
Considering now the next character, W[5], which is 'B': though by inspection the longest substring would appear to be 'A', we still set T[5] = 0. The reasoning is similar to why T[4] = -1. W[5] itself extends the prefix match begun with W[4], and we can assume that the corresponding character in S, S[m+5] ≠ 'B'. So backtracking before W[5] is pointless, but S[m+5] may be 'A', hence T[5] = 0.
Finally, we see that the next character in the ongoing segment starting at W[4] = 'A' would be 'B', and indeed this is also W[5]. Furthermore, the same argument as above shows that we need not look before W[4] to find a segment for W[6], so the segment "AB" of length 2 is the longest one available. Since W[6] = 'D' differs from W[2] = 'C', the refinement used for T[4] and T[5] does not apply, and we take T[6] = 2.

| i | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| W[i] | A | B | C | D | A | B | D |
| T[i] | -1 | 0 | 0 | 0 | -1 | 0 | 2 |

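The whole walkthrough can be condensed into a short routine. The Python sketch below is one common way to build such a table, assuming a non-empty pattern; the names `build_table`, `pos` and `cnd` are choices made here for illustration. It returns one trailing entry beyond the table shown above, which is only needed when a search continues after a full match.

```python
def build_table(pattern: str) -> list[int]:
    """Build the refined partial match table for a non-empty pattern."""
    table = [0] * (len(pattern) + 1)
    table[0] = -1
    cnd = 0  # length of the prefix currently being extended
    for pos in range(1, len(pattern)):
        if pattern[pos] == pattern[cnd]:
            # Same character as in the prefix: a mismatch here would also
            # mismatch there, so reuse the earlier entry (T[4], T[5] cases).
            table[pos] = table[cnd]
        else:
            table[pos] = cnd
            # Fall back to shorter prefixes until one can be extended.
            while cnd >= 0 and pattern[pos] != pattern[cnd]:
                cnd = table[cnd]
        cnd += 1
    table[len(pattern)] = cnd  # extra entry, used only after a full match
    return table


print(build_table("ABCDABD")[:7])  # [-1, 0, 0, 0, -1, 0, 2]
```

The first branch is where T[4] = -1 and T[5] = 0 come from: because W[4] = W[0] and W[5] = W[1], the entries are copied from T[0] and T[1] instead of being set to the plain suffix length.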
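
Another example, for W = "ABACABABC":

| i | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| W[i] | A | B | A | C | A | B | A | B | C |
| T[i] | -1 | 0 | -1 | 1 | -1 | 0 | -1 | 3 | 2 |
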
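To see how a search might consume these entries, here is a hedged Python sketch that looks for the first occurrence of W in a text S. The function name `kmp_find_first` and the sample text are assumptions made for illustration; the table values are copied from the first table above. Because the function stops at the first match, it never needs an entry beyond T[6].

```python
def kmp_find_first(text: str, pattern: str, table: list[int]) -> int:
    """Return the index of the first occurrence of `pattern` in `text`,
    or -1 if there is none, using a precomputed partial match table."""
    j = 0  # position in text
    k = 0  # position in pattern; the current attempt starts at m = j - k
    while j < len(text):
        if pattern[k] == text[j]:
            j += 1
            k += 1
            if k == len(pattern):
                return j - k
        else:
            k = table[k]
            if k < 0:
                # An entry of -1 means text[j] cannot start a match either,
                # so move one position ahead in the text.
                j += 1
                k += 1
    return -1


W = "ABCDABD"
T = [-1, 0, 0, 0, -1, 0, 2]      # values from the first table above
S = "ABC ABCDAB ABCDABCDABDE"    # illustrative text, not from the original
print(kmp_find_first(S, W, T))   # 15
```

Note that the text index `j` never moves backwards; on a mismatch only `k` is reset from the table, which is what makes the refined entries for T[4] and T[5] worthwhile.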
Partial match table
Difficulty: understanding T[4] and T[5]