• notice
  • Congratulations on the launch of the Sought Tech site

Understanding of next array in KMP algorithm

The meaning of next array

Here next[j] = k; then the light blue area in front of k is the same as the light blue area in front of j;

next[j] means that when the string at position j does not match the main string, the next string that needs to be compared with the main string is at next[j]; there is the following figure:

If the current position j does not match a character in the main string, the next comparison is between K and the current position of the main string, and this K is next[j]; since the two light blue areas are the same, the area in front of K is the same. It must be the same as the main string and does not need to be compared; as shown below:

As can be seen from the above figure, the area in front of K does not need to be compared;

Derivation of next array

From the meaning expressed by the next array, we can ask next[j], we must first find a K, the light blue area in front of this K is the same as the light blue area in front of j; as shown below:

According to the regulations next[1] = 0; then find other next[j];

For next[2], it must be next[2] = 1;

As shown in the figure below: when the second element does not match, j will return to 1 for comparison, so next[2] must be 1;

Next is the derivation of the general case. Here, the recursive method is used for derivation, that is, given next[j], find next[j + 1];

If next[j] = k, there is the following figure:

Since next[j = k, it can be seen that the light blue part is the same; the next two cases are discussed;

  • ch[K] == ch[j], in this case, the following figure can be obtained;

It can be seen from the figure: for j + 1, a K + 1 can be found such that the light blue area is the same, then when j + 1 does not match, the next time K + 1 and the main string will be compared; therefore next[j + 1] = K + 1 = next[j] + 1;

  • ch[K] != ch[j], this situation becomes complicated, which is also the most difficult part to understand in the whole KMP algorithm;

    As can be seen from the beginning of this section, the most critical point in finding next[j + 1] is to find how long the suffix and prefix match before j + 1, that is, how much light blue area matches; what we are facing now The picture is as follows:

Our aim is to find a K1 such that: find K1 such that the light blue parts are the same;

To make the light blue part the same, divide it into two parts so that 1 and 2 are the same, and K1 and j are the same;

Trying to make 1 and 2 the same is difficult to compare, but it can be transformed into another problem, as shown below:

Trying to find the same area as 1 and 3 is equivalent to finding the same area as 1 and 2; why? Because next[j] = K, the front of j is the same as the front of K as shown below:

This equivalence relationship is very important and is the key to this part of the derivation; it is extracted separately as shown below:

So how to get K1 such that 1 and 2 are the same? Returning to the meaning represented by next[j] at the beginning of the text, next[j] = k; then the light blue area in front of k is the same as the light blue area in front of j, and next[K] is in front of j. Known, so we can get K1 = next[K], the K1 obtained at this time can satisfy 1 and 3 the same;

At this point, the problem that 1 and 3 are equal has been solved. If K1 and j are the same directly, the following figure can be obtained;

那么 next[j + 1] = K1 + 1 = next[K] + 1 = next[next[j]] + 1;

So what if ch[j] != ch[K1]? Then it evolves into the following problem:

This picture is the same as the picture at the beginning of this section, so follow this method to solve it;

可得结果:next[j + 1] = next[K1] + 1 = next[next[K]] + 1 = next[next[next[j]]] + 1

If the next time K2 is still not equal to j, then the recursion can continue; until Kn = 0;

one example

Next, use the above conclusion to calculate the next array of a string;

There is an array ababaaababaa transformed into the following table:

Sababaaababaa
serial number123456789101112
next011234223456

Fill in the next column of the form in the following order:

  • For next[1], it is specified as 0, according to the previous analysis: next[2] = 1;

  • For next[3], observe that 2 and next[2] = 1, that is, b and a are not equal; and next[next[2]] = 0, so next[3] = 1;

  • For next[4], observe 3 and next[3] = 1, that is, a and a are equal, so next[4] = next[3] + 1 = 2;

  • For next[5], observe 4 and next[4] = 2, that is, b and b are equal, so next[5] = next[4] + 1 = 3;

  • For next[6], observe 5 and next[5] = 3, that is, a and a are equal, so next[6] = next[5] + 1 = 4;

  • For next[7], observe that 6 and next[6] = 4, namely a and b, are not equal, next[next[6]] = 2, compare with 6, that is, b and a, are not equal, continue recursion next[next [next[6]]] = next[next[4]] = next[2] = 1; compares 1 and 6 i.e. a and a, equal, so next[6] = next[next[next[6]] + 1 = 2;

  • For next[8], observe that 7 and next[7] = 2, i.e. a and b, are not equal, next[next[7]] = 1, 1 and 7 are equal, so next[8] = next[next[7 ]] + 1 = next[2] + 1 = 2;

  • For next[9], observe 8 and next[8] = 2, that is, b and b are equal, so next[9] = next[8] + 1 = 3;

  • For next[10], observe 9 and next[9] = 1, that is, a and a are equal, so next[10] = next[9] + 1 = 4;

  • For next[11], observe 10 and next[10] = 2, that is, b and b are equal, so next[11] = next[10] + 1 = 5;

  • For next[12], observe 11 and next[11] = 3, that is, a and a are equal, so next[12] = next[11] + 1 = 6;

Code

The code for obtaining the next array obtained through the above analysis is as follows:

void get_next(String T, int next[]) {
    int k = 0, j = 1;
    next[1] = 0;
    while (j < T.length)
    {
        if(k == 0 || T.ch[k] == T.ch[j]) {
            k++;
            j++;
            next[j] = k;
        } else {
            k = next[k];
        }
    }}

Then the following KMP algorithm code is easier:

int KMP(String S, String T, int next[]) {
    int i = 1, j = 1;
    while (i <= S.length && j <= T.length)
    {
        if(j == 0 || S.ch[i] == T.ch[j]) {
            i++;
            j++;
        } else {
            j = next[j];
        }
    }
    if(j > T.length) {
        return i - T.length;
    } else {
        return 0;
    }}

The test code is as follows:

#include<iostream>using namespace std;const int MAX = 255;typedef struct {
    char ch[MAX];
    int length;} String;void InitiString(String &s, char chars[]) {
    int len = 0;
    while(chars[len] != '\0') {
        s.ch[len + 1] = chars[len];
        len++;
    }
    s.length = len;}void get_next(String T, int next[]) {
    int k = 0, j = 1;
    next[1] = 0;
    while (j < T.length)
    {
        if(k == 0 || T.ch[k] == T.ch[j]) {
            k++;
            j++;
            next[j] = k;
        } else {
            k = next[k];
        }
    }}int KMP(String S, String T, int next[]) {
    int i = 1, j = 1;
    while (i <= S.length && j <= T.length)
    {
        if(j == 0 || S.ch[i] == T.ch[j]) {
            i++;
            j++;
        } else {
            j = next[j];
        }
    }
    if(j > T.length) {
        return i - T.length;
    } else {
        return 0;
    }}int main() {
    char char1[20] = "aabaabaabaac";
    char char2[20] = "aabaac";
    String S, T;
    InitiString(S, char1);
    InitiString(T, char2);
    int next[MAX];
    get_next(T, next);
    int index  = KMP(S, T, next);
    printf("%d", index);
    return 0;}

Output result:

7


Tags

Technical otaku

Sought technology together

Related Topic

0 Comments

Leave a Reply

+