Rabin-Karp , The String Matching Algorithm.

Bhakti Thaker
4 min readSep 8, 2019

--

The problem represents that: The TEXT is given and the Pattern is given and one need to find out that whether that Pattern is a subset of Text or we can say that ,whether that Pattern is a part of the Text or not?

There are various Algorithms For String Matching, just like those this is also one String Matching Algorithm.

This algorithm resolves the drawback of Naive String-Matching Algorithm.So here will see that actually it works practically.

- Hash Function: Using which we can generate the hash Code

- The code that we obtained from the Hash Function is called Hash Code.

- Here we have to find that whether pattern is presented inside the Text or not ?

HOW?

1> By assuming give the numeric value to the alphabets.

2> Then create the hash function using those values, using which we will obtain Hash Code.

3> Give the values to the Text characters individually according to that assumption Table Values into the Group of Three characters.

4> Using Hash Value generates the Hash code of the selected Text and compares it with the Hash Code of the Pattern.

5> If the Hash code of both pattern and text Matches that means — The particular Pattern is a part of the Text and pattern is present inside the Text itself.

6> But what if Don’t Matches then, slide to the Next character by subtracting the starting character and then again using hash function compare the hash code of the newly computed hash code with hash code of the pattern and so on.

7> Sliding is known as Rolling Hash Function

Lets catch other examples to know its drawback.

Drawback:

  • Just by applying the hash function based on the table created , then sometimes even though Hash Code matched of both pattern and Text but the charcters of the Pattern dont matches to the Text.

(Just like in above example [d+b+a=7] also [c+c+a=7] but that doesn’t mean that the pattern is matched with the Text)

So here, even though the hash code matches but the pattern wont so this condition is known as SPURIOUS HIT. That is the waste of hit.

Its complexity: theta(m*n) of worst case

Solution:

So as we saw in above example that, how there occur spurious hits due to Weak Hashing Function so to avoid that we must use some Strong Hashing function.

HOW the strong hashing function is created:

1> Do use the Base and its Power. What base value one need to select is totally depends on the our self but the Power value must be in decreasing order from (total no of pattern characters -1) up to 0. If the pattern is d b a then there can be used 10² then 10¹ then 10⁰ and like wise.

(P[1]*(10^(m-1))) + (p[2]* (10^(m-2))) + (p[2]* (10^(m-3)))

this is the formula can be applied if m=3 .that is, if pattern contains 3 characters.

By doing this to each and every Text in the pair of 3 . If one gets the match of the hash code of both Pattern and Text , then that means Pattern is found inside the Text and If NOT then Go for Rolling Hash Function, that slides one character to the next, again comparing the newly generated Hash Code with the Hash Code of the pattern.

Rolling Hash Function:[331 — (3 * (10²) ) ] * 10 + ( 3 * (10²) )

(P[1]*(10^(m-1))) + (p[2]* (10^(m-2))) + (p[2]* (10^(m-3)))

then for sliding to next do follow following steps:

i> subtract the value of 1st character from the summed value:

i.e. if ,(P[1]*(10^(m-1))) + (p[2]* (10^(m-2))) + (p[2]* (10^(m-3))) = x

then do[ x-(P[1]*(10^(m-1)))] where x= summation of first 3 character of the Text

ii> multiply the obtained answer from step 1 with 10

iii> add (p[4]* (10^(m-1))) to the answer obtained from step 2.

By following this steps of rolling hash function , all the characters of the text would be able to slide the characters to the next character and using Strong Hash Function the Pattern would be compared to the Text.

Time complexity: orderOf(n -(m+1))

Additional: If the base ,that are chosen are larger values the obviously the hash code we obtain will also be greater in number. To avoid that we can also use MODULO to the answer obtained to reduce the length. But by doing that it can also occur Spuriuos hits which can cause again the Time Complexity of (m*n)

This is how one do String ,Matching by using String Matching algorithm Like “Rabin Karp String Matching Algorithm”

--

--