Ranin-Karp and Naive Approach

import random
import timeit


def generate_dna_sequence(length):
    bases = ["A", "T", "G", "C"]
    return "".join([random.choice(bases) for _ in range(length)])

def naive_search(text, pattern):
    for i in range(len(text) - len(pattern) + 1):
        if text[i:i+len(pattern)] == pattern:
            return i
    return None

def rabin_karp(text, pattern):
    p_hash = hash(pattern)
    for i in range(len(text) - len(pattern) + 1):
        t_hash = hash(text[i:i+len(pattern)])
        if t_hash == p_hash and text[i:i+len(pattern)] == pattern:
            return i
    return None

dna_sequence = generate_dna_sequence(1000000)
pattern = generate_dna_sequence(10000)

print("Naive search took", timeit.timeit(lambda: naive_search(dna_sequence, pattern), number=5), "seconds.")
print("Rabin-Karp took", timeit.timeit(lambda: rabin_karp(dna_sequence, pattern), number=5), "seconds.")

I am trying to compare the time it takes to search a DNA sequence using Rabin-Karp and Naive approach. Theoretically, Rabin-Karp is supposed to be a faster algorithm but when I run my code, the Naive approach seems to be doing better for any length pattern and dna sequence to be searched. I have tried using different hashing functions to bring down the time but I got the best results using python's inbuilt hash function. What could I be doing wrong? I am thankful for any help

I think "hash" is not doing what you expect it do.

check out this : https://www.geeksforgeeks.org/python-program-for-rabin-karp-algorithm-for-pattern-searching/

i tried it with your example and it was faster than the naive approach.

Naive search took 1.5969108999997843 seconds.
Rabin-Karp took 10.925225599989062 seconds.
RK from link above (after converting to python3) took 1.4543468999909237 seconds.

Thank you very much for your help, I have realized that a specific type of hash function has to be used and not any hash function

Svara