Ranin-Karp and Naive Approach
import random
import timeit
def generate_dna_sequence(length):
bases = ["A", "T", "G", "C"]
return "".join([random.choice(bases) for _ in range(length)])
def naive_search(text, pattern):
for i in range(len(text) - len(pattern) + 1):
if text[i:i+len(pattern)] == pattern:
return i
return None
def rabin_karp(text, pattern):
p_hash = hash(pattern)
for i in range(len(text) - len(pattern) + 1):
t_hash = hash(text[i:i+len(pattern)])
if t_hash == p_hash and text[i:i+len(pattern)] == pattern:
return i
return None
dna_sequence = generate_dna_sequence(1000000)
pattern = generate_dna_sequence(10000)
print("Naive search took", timeit.timeit(lambda: naive_search(dna_sequence, pattern), number=5), "seconds.")
print("Rabin-Karp took", timeit.timeit(lambda: rabin_karp(dna_sequence, pattern), number=5), "seconds.")
I am trying to compare the time it takes to search a DNA sequence using Rabin-Karp and Naive approach. Theoretically, Rabin-Karp is supposed to be a faster algorithm but when I run my code, the Naive approach seems to be doing better for any length pattern and dna sequence to be searched. I have tried using different hashing functions to bring down the time but I got the best results using python's inbuilt hash function. What could I be doing wrong? I am thankful for any help
I think "hash" is not doing what you expect it do.
check out this : https://www.geeksforgeeks.org/python-program-for-rabin-karp-algorithm-for-pattern-searching/
i tried it with your example and it was faster than the naive approach.
Naive search took 1.5969108999997843 seconds.
Rabin-Karp took 10.925225599989062 seconds.
RK from link above (after converting to python3) took 1.4543468999909237 seconds.
Thank you very much for your help, I have realized that a specific type of hash function has to be used and not any hash function