Understanding of next array in KMP algorithm
The meaning of next array
Here next[j] = k; then the light blue area in front of k is the same as the light blue area in front of j;
next[j] means that when the string at position j does not match the main string, the next string that needs to be compared with the main string is at next[j]; there is the following figure:
If the current position j does not match a character in the main string, the next comparison is between K and the current position of the main string, and this K is next[j]; since the two light blue areas are the same, the area in front of K is the same. It must be the same as the main string and does not need to be compared; as shown below:
As can be seen from the above figure, the area in front of K does not need to be compared;
Derivation of next array
From the meaning expressed by the next array, we can ask next[j], we must first find a K, the light blue area in front of this K is the same as the light blue area in front of j; as shown below:
According to the regulations next[1] = 0; then find other next[j];
For next[2], it must be next[2] = 1;
As shown in the figure below: when the second element does not match, j will return to 1 for comparison, so next[2] must be 1;
Next is the derivation of the general case. Here, the recursive method is used for derivation, that is, given next[j], find next[j + 1];
If next[j] = k, there is the following figure:
Since next[j = k, it can be seen that the light blue part is the same; the next two cases are discussed;
ch[K] == ch[j], in this case, the following figure can be obtained;
It can be seen from the figure: for j + 1, a K + 1 can be found such that the light blue area is the same, then when j + 1 does not match, the next time K + 1 and the main string will be compared; therefore next[j + 1] = K + 1 = next[j] + 1;
ch[K] != ch[j], this situation becomes complicated, which is also the most difficult part to understand in the whole KMP algorithm;
As can be seen from the beginning of this section, the most critical point in finding next[j + 1] is to find how long the suffix and prefix match before j + 1, that is, how much light blue area matches; what we are facing now The picture is as follows:
Our aim is to find a K1 such that: find K1 such that the light blue parts are the same;
To make the light blue part the same, divide it into two parts so that 1 and 2 are the same, and K1 and j are the same;
Trying to make 1 and 2 the same is difficult to compare, but it can be transformed into another problem, as shown below:
Trying to find the same area as 1 and 3 is equivalent to finding the same area as 1 and 2; why? Because next[j] = K, the front of j is the same as the front of K as shown below:
This equivalence relationship is very important and is the key to this part of the derivation; it is extracted separately as shown below:
So how to get K1 such that 1 and 2 are the same? Returning to the meaning represented by next[j] at the beginning of the text, next[j] = k; then the light blue area in front of k is the same as the light blue area in front of j, and next[K] is in front of j. Known, so we can get K1 = next[K], the K1 obtained at this time can satisfy 1 and 3 the same;
At this point, the problem that 1 and 3 are equal has been solved. If K1 and j are the same directly, the following figure can be obtained;
那么 next[j + 1] = K1 + 1 = next[K] + 1 = next[next[j]] + 1;
So what if ch[j] != ch[K1]? Then it evolves into the following problem:
This picture is the same as the picture at the beginning of this section, so follow this method to solve it;
可得结果:next[j + 1] = next[K1] + 1 = next[next[K]] + 1 = next[next[next[j]]] + 1
If the next time K2 is still not equal to j, then the recursion can continue; until Kn = 0;
one example
Next, use the above conclusion to calculate the next array of a string;
There is an array ababaaababaa transformed into the following table:
serial number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
next | 0 | 1 | 1 | 2 | 3 | 4 | 2 | 2 | 3 | 4 | 5 | 6 |
Fill in the next column of the form in the following order:
For next[1], it is specified as 0, according to the previous analysis: next[2] = 1;
For next[3], observe that 2 and next[2] = 1, that is, b and a are not equal; and next[next[2]] = 0, so next[3] = 1;
For next[4], observe 3 and next[3] = 1, that is, a and a are equal, so next[4] = next[3] + 1 = 2;
For next[5], observe 4 and next[4] = 2, that is, b and b are equal, so next[5] = next[4] + 1 = 3;
For next[6], observe 5 and next[5] = 3, that is, a and a are equal, so next[6] = next[5] + 1 = 4;
For next[7], observe that 6 and next[6] = 4, namely a and b, are not equal, next[next[6]] = 2, compare with 6, that is, b and a, are not equal, continue recursion next[next [next[6]]] = next[next[4]] = next[2] = 1; compares 1 and 6 i.e. a and a, equal, so next[6] = next[next[next[6]] + 1 = 2;
For next[8], observe that 7 and next[7] = 2, i.e. a and b, are not equal, next[next[7]] = 1, 1 and 7 are equal, so next[8] = next[next[7 ]] + 1 = next[2] + 1 = 2;
For next[9], observe 8 and next[8] = 2, that is, b and b are equal, so next[9] = next[8] + 1 = 3;
For next[10], observe 9 and next[9] = 1, that is, a and a are equal, so next[10] = next[9] + 1 = 4;
For next[11], observe 10 and next[10] = 2, that is, b and b are equal, so next[11] = next[10] + 1 = 5;
For next[12], observe 11 and next[11] = 3, that is, a and a are equal, so next[12] = next[11] + 1 = 6;
Code
The code for obtaining the next array obtained through the above analysis is as follows:
void get_next(String T, int next[]) {
int k = 0, j = 1;
next[1] = 0;
while (j < T.length)
{
if(k == 0 || T.ch[k] == T.ch[j]) {
k++;
j++;
next[j] = k;
} else {
k = next[k];
}
}}
Then the following KMP algorithm code is easier:
int KMP(String S, String T, int next[]) {
int i = 1, j = 1;
while (i <= S.length && j <= T.length)
{
if(j == 0 || S.ch[i] == T.ch[j]) {
i++;
j++;
} else {
j = next[j];
}
}
if(j > T.length) {
return i - T.length;
} else {
return 0;
}}
The test code is as follows:
#include<iostream>using namespace std;const int MAX = 255;typedef struct {
char ch[MAX];
int length;} String;void InitiString(String &s, char chars[]) {
int len = 0;
while(chars[len] != '\0') {
s.ch[len + 1] = chars[len];
len++;
}
s.length = len;}void get_next(String T, int next[]) {
int k = 0, j = 1;
next[1] = 0;
while (j < T.length)
{
if(k == 0 || T.ch[k] == T.ch[j]) {
k++;
j++;
next[j] = k;
} else {
k = next[k];
}
}}int KMP(String S, String T, int next[]) {
int i = 1, j = 1;
while (i <= S.length && j <= T.length)
{
if(j == 0 || S.ch[i] == T.ch[j]) {
i++;
j++;
} else {
j = next[j];
}
}
if(j > T.length) {
return i - T.length;
} else {
return 0;
}}int main() {
char char1[20] = "aabaabaabaac";
char char2[20] = "aabaac";
String S, T;
InitiString(S, char1);
InitiString(T, char2);
int next[MAX];
get_next(T, next);
int index = KMP(S, T, next);
printf("%d", index);
return 0;}
Output result:
7
0 Comments