## Understanding of next array in KMP algorithm

## The meaning of next array

Here next[j] = k; then the light blue area in front of k is the same as the light blue area in front of j;

next[j] means that when the string at position j does not match the main string, the next string that needs to be compared with the main string is at next[j]; there is the following figure:

If the current position j does not match a character in the main string, the next comparison is between K and the current position of the main string, and this K is next[j]; since the two light blue areas are the same, the area in front of K is the same. It must be the same as the main string and does not need to be compared; as shown below:

As can be seen from the above figure, the area in front of K does not need to be compared;

## Derivation of next array

From the meaning expressed by the next array, we can ask next[j], we must first find a K, the light blue area in front of this K is the same as the light blue area in front of j; as shown below:

According to the regulations next[1] = 0; then find other next[j];

For next[2], it must be next[2] = 1;

As shown in the figure below: when the second element does not match, j will return to 1 for comparison, so next[2] must be 1;

Next is the derivation of the general case. Here, the recursive method is used for derivation, that is, given next[j], find next[j + 1];

If next[j] = k, there is the following figure:

Since next[j = k, it can be seen that the light blue part is the same; the next two cases are discussed;

ch[K] == ch[j], in this case, the following figure can be obtained;

It can be seen from the figure: for j + 1, a K + 1 can be found such that the light blue area is the same, then when j + 1 does not match, the next time K + 1 and the main string will be compared; therefore next[j + 1] = K + 1 = next[j] + 1;

ch[K] != ch[j], this situation becomes complicated, which is also the most difficult part to understand in the whole KMP algorithm;

As can be seen from the beginning of this section, the most critical point in finding next[j + 1] is to find how long the suffix and prefix match before j + 1, that is, how much light blue area matches; what we are facing now The picture is as follows:

Our aim is to find a K1 such that: find K1 such that the light blue parts are the same;

To make the light blue part the same, divide it into two parts so that 1 and 2 are the same, and K1 and j are the same;

Trying to make 1 and 2 the same is difficult to compare, but it can be transformed into another problem, as shown below:

**Trying to find the same area as 1 and 3 is equivalent to finding the same area as 1 and 2;** why? Because next[j] = K, the front of j is the same as the front of K as shown below:

This equivalence relationship is very important and is the key to this part of the derivation; it is extracted separately as shown below:

So how to get K1 such that 1 and 2 are the same? Returning to the meaning represented by next[j] at the beginning of the text, **next[j] = k; then the light blue area in front of k is the same as the light blue area in front of j,** and next[K] is in front of j. Known, so we can get K1 = next[K], the K1 obtained at this time can satisfy 1 and 3 the same;

At this point, the problem that 1 and 3 are equal has been solved. If K1 and j are the same directly, the following figure can be obtained;

那么 next[j + 1] = K1 + 1 = next[K] + 1 = next[next[j]] + 1；

So what if ch[j] != ch[K1]? Then it evolves into the following problem:

This picture is the same as the picture at the beginning of this section, so follow this method to solve it;

可得结果：next[j + 1] = next[K1] + 1 = next[next[K]] + 1 = next[next[next[j]]] + 1

If the next time K2 is still not equal to j, then the recursion can continue; until Kn = 0;

## one example

Next, use the above conclusion to calculate the next array of a string;

There is an array ababaaababaa transformed into the following table:

serial number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |

next | 0 | 1 | 1 | 2 | 3 | 4 | 2 | 2 | 3 | 4 | 5 | 6 |

Fill in the next column of the form in the following order:

For next[1], it is specified as 0, according to the previous analysis: next[2] = 1;

For next[3], observe that 2 and next[2] = 1, that is, b and a are not equal; and next[next[2]] = 0, so next[3] = 1;

For next[4], observe 3 and next[3] = 1, that is, a and a are equal, so next[4] = next[3] + 1 = 2;

For next[5], observe 4 and next[4] = 2, that is, b and b are equal, so next[5] = next[4] + 1 = 3;

For next[6], observe 5 and next[5] = 3, that is, a and a are equal, so next[6] = next[5] + 1 = 4;

For next[7], observe that 6 and next[6] = 4, namely a and b, are not equal, next[next[6]] = 2, compare with 6, that is, b and a, are not equal, continue recursion next[next [next[6]]] = next[next[4]] = next[2] = 1; compares 1 and 6 i.e. a and a, equal, so next[6] = next[next[next[6]] + 1 = 2;

For next[8], observe that 7 and next[7] = 2, i.e. a and b, are not equal, next[next[7]] = 1, 1 and 7 are equal, so next[8] = next[next[7 ]] + 1 = next[2] + 1 = 2;

For next[9], observe 8 and next[8] = 2, that is, b and b are equal, so next[9] = next[8] + 1 = 3;

For next[10], observe 9 and next[9] = 1, that is, a and a are equal, so next[10] = next[9] + 1 = 4;

For next[11], observe 10 and next[10] = 2, that is, b and b are equal, so next[11] = next[10] + 1 = 5;

For next[12], observe 11 and next[11] = 3, that is, a and a are equal, so next[12] = next[11] + 1 = 6;

## Code

The code for obtaining the next array obtained through the above analysis is as follows:

void get_next(String T, int next[]) { int k = 0, j = 1; next[1] = 0; while (j < T.length) { if(k == 0 || T.ch[k] == T.ch[j]) { k++; j++; next[j] = k; } else { k = next[k]; } }}

Then the following KMP algorithm code is easier:

int KMP(String S, String T, int next[]) { int i = 1, j = 1; while (i <= S.length && j <= T.length) { if(j == 0 || S.ch[i] == T.ch[j]) { i++; j++; } else { j = next[j]; } } if(j > T.length) { return i - T.length; } else { return 0; }}

The test code is as follows:

#include<iostream>using namespace std;const int MAX = 255;typedef struct { char ch[MAX]; int length;} String;void InitiString(String &s, char chars[]) { int len = 0; while(chars[len] != '\0') { s.ch[len + 1] = chars[len]; len++; } s.length = len;}void get_next(String T, int next[]) { int k = 0, j = 1; next[1] = 0; while (j < T.length) { if(k == 0 || T.ch[k] == T.ch[j]) { k++; j++; next[j] = k; } else { k = next[k]; } }}int KMP(String S, String T, int next[]) { int i = 1, j = 1; while (i <= S.length && j <= T.length) { if(j == 0 || S.ch[i] == T.ch[j]) { i++; j++; } else { j = next[j]; } } if(j > T.length) { return i - T.length; } else { return 0; }}int main() { char char1[20] = "aabaabaabaac"; char char2[20] = "aabaac"; String S, T; InitiString(S, char1); InitiString(T, char2); int next[MAX]; get_next(T, next); int index = KMP(S, T, next); printf("%d", index); return 0;}

Output result:

7

## 0 Comments