daxus4/kmp_project

KMP Project

Pattern-matching (PM) algorithms find occurrences of a pattern string (P) in another string (S), which is usually orders of magnitude longer. With the advent of bioinformatics, the need for PM algorithms with linear time and memory complexity has grown, because some applications must match multiple patterns against massive strings, for example the entire human genome. Knuth, Morris and Pratt developed a PM algorithm, often called KMP in the literature, that achieves linear time and memory performance by building an index of P [1].
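As an illustration of the idea, here is a minimal sketch of a textbook KMP search. It is not necessarily the code in this repository: the function names `build_failure_table` and `kmp_search` are hypothetical, and the "index of P" is assumed to be the usual failure (longest prefix-suffix) table.

```python
from __future__ import annotations


def build_failure_table(pattern: str) -> list[int]:
    """failure[i] is the length of the longest proper prefix of
    pattern[:i + 1] that is also a suffix of it."""
    failure = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        # On a mismatch, fall back to the next shorter candidate prefix.
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return the start index of every occurrence of pattern in text,
    including overlapping occurrences."""
    if not pattern:
        return []
    failure = build_failure_table(pattern)
    matches = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        # The text index i never moves backwards; only k is reset.
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]  # keep going to find overlapping matches
    return matches
```

Because the text index only moves forward and the table is built once from P, the search takes time proportional to len(S) + len(P) and uses extra memory proportional to len(P), plus the list of matches.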

In this project I reproduced the algorithm in Python and then implemented a parallel version that splits S into overlapping substrings and runs an instance of KMP on each substring. The substrings overlap because an occurrence of the pattern can straddle the boundary between two substrings; the algorithm works correctly only if adjacent substrings overlap by a suitable number of characters.
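The splitting scheme can be sketched as follows. This is an assumption-laden illustration rather than the repository's actual code: the names `find_all`, `split_with_overlap` and `parallel_search` are hypothetical, the overlap is assumed to be len(P) - 1 characters (the smallest overlap that keeps every occurrence wholly inside one chunk), and a `str.find`-based helper stands in for the per-chunk KMP routine so that the example is self-contained. A thread pool is used to keep the sketch simple; in CPython, real CPU parallelism would need a process pool instead.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor


def find_all(text: str, pattern: str) -> list[int]:
    """Stand-in per-chunk matcher (assumption): a KMP routine would go here."""
    hits = []
    pos = text.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = text.find(pattern, pos + 1)
    return hits


def split_with_overlap(text: str, pattern_len: int, n_chunks: int) -> list[tuple[int, str]]:
    """Return (offset, chunk) pairs. Each chunk covers a core range of the
    text plus pattern_len - 1 extra characters borrowed from the next chunk."""
    core = max(1, -(-len(text) // n_chunks))  # ceiling division
    overlap = pattern_len - 1
    return [
        (start, text[start:start + core + overlap])
        for start in range(0, len(text), core)
    ]


def parallel_search(text: str, pattern: str, n_chunks: int = 4) -> list[int]:
    """Search each overlapping chunk independently and merge the results."""
    if not pattern:
        return []
    chunks = split_with_overlap(text, len(pattern), n_chunks)

    def search_chunk(item: tuple[int, str]) -> list[int]:
        offset, chunk = item
        # Translate chunk-local indices back to indices in the full text.
        return [offset + i for i in find_all(chunk, pattern)]

    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        per_chunk = pool.map(search_chunk, chunks)
        return sorted(pos for hits in per_chunk for pos in hits)
```

With an overlap of exactly len(P) - 1, an occurrence that starts inside a chunk's core range always fits inside that chunk, and an occurrence that starts in the next core range never does, so each match is reported once and no deduplication step is needed.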

I tested both the standard and the parallel versions, paying particular attention to the boundary-straddling case described above.

[1] Knuth, Donald; Morris, James H.; Pratt, Vaughan (1977). "Fast pattern matching in strings". SIAM Journal on Computing. 6 (2): 323–350. CiteSeerX 10.1.1.93.8147. doi:10.1137/0206024.

