KMP Project

Pattern-matching (PM) algorithms can find occurrences of a pattern string (P) in another string (S), which is usually orders of magnitude longer. With the advent of bioinformatics, the need for PM algorithms that can work in linear time and memory complexity increased because there are applications that require to match multiple patterns on massive strings, for example the entire human genome. Knutt, Morris and Pratt developed a PM algorithm, often called KMP in literature, that, by building an index of P, has linear time and memory performance [1].

In this project I reproduced the algorithm in Python and then implementing a parallel version that split S in overlapped substring and can run an instance of KMP on each substring. The substrings are overlapped because occurrences of the pattern can be found straddling two strings and only if those two string overlap with a determined number of character this algorithm can work correctly.

I tested both the standard and the parallel version, with particular attention when I was dealing with the problem mentioned above.

[1] Knuth, Donald; Morris, James H.; Pratt, Vaughan (1977). "Fast pattern matching in strings". SIAM Journal on Computing. 6 (2): 323–350. CiteSeerX 10.1.1.93.8147. doi:10.1137/0206024.

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
kmp.ipynb		kmp.ipynb
test.ipynb		test.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

KMP Project

About

Releases

Packages

Languages

License

daxus4/kmp_project

Folders and files

Latest commit

History

Repository files navigation

KMP Project

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages