A tool that helps detect plagiarism among documents using MPI and Pthreads.
To build:
mpicc *.c -o copyBGone
To run:
mpirun -np <num-ranks> ./copyBGone [-k|--kgram <num-kgrams>] [-t|--threads <num-threads-per-rank>] [-w|--window <window-size>] file1 [file2...]
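The k-gram and window options suggest a fingerprinting scheme in the style of winnowing, where every run of k characters is hashed and the smallest hash in each window is kept as a fingerprint. The project does not confirm this, so the block below is only a minimal sketch under that assumption. The names hash_kgram and fingerprint, the FNV-1a hash, and the reading of -k as the k-gram length are all hypothetical and are not taken from the copyBGone sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: FNV-1a hash of one k-gram (k consecutive bytes). */
uint32_t hash_kgram(const char *s, size_t k)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < k; i++) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;
    }
    return h;
}

/* Hypothetical winnowing-style selection (an assumption, not the
 * project's confirmed method). For each window of w consecutive k-gram
 * hashes, keep the minimum hash (rightmost on ties), recording each
 * selected position only once. Returns the number of fingerprints
 * written to out, at most max_out. */
size_t fingerprint(const char *text, size_t len, size_t k, size_t w,
                   uint32_t *out, size_t max_out)
{
    if (k == 0 || w == 0 || len < k)
        return 0;

    size_t num_hashes = len - k + 1;
    if (w > num_hashes)
        w = num_hashes; /* short input: treat it as a single window */

    size_t count = 0;
    size_t last_pos = (size_t)-1; /* position of the last kept hash */

    for (size_t start = 0; start + w <= num_hashes; start++) {
        size_t min_pos = start;
        uint32_t min_hash = hash_kgram(text + start, k);

        /* Hashes are recomputed per window for clarity, not speed. */
        for (size_t j = start + 1; j < start + w; j++) {
            uint32_t h = hash_kgram(text + j, k);
            if (h <= min_hash) {
                min_hash = h;
                min_pos = j;
            }
        }

        if (min_pos != last_pos) {
            if (count == max_out)
                break;
            out[count++] = min_hash;
            last_pos = min_pos;
        }
    }
    return count;
}
```

Under this assumption, two documents that share a long enough passage would produce some identical fingerprints, which could then be compared across ranks and threads. A real implementation would likely use a rolling hash instead of rehashing every k-gram.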
Currently, this program works only with text documents. The goal is to generalize it to code, images, and even audio. Doing so requires a tokenizer for each data type to parse the input into comparable units.
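One way to picture such a tokenizer is a common function signature that each data type implements. The following is a minimal sketch for plain text only. The names tokenizer_fn, tokenize_text, and MAX_TOKEN_LEN are hypothetical and do not come from the copyBGone sources, and the normalization rules (lowercasing, splitting on non-alphanumeric characters) are assumptions.

```c
#include <ctype.h>
#include <stddef.h>

#define MAX_TOKEN_LEN 64 /* hypothetical limit, not from the project */

/* Hypothetical tokenizer signature: fills tokens with up to max_tokens
 * normalized tokens from input and returns how many were written. */
typedef size_t (*tokenizer_fn)(const char *input,
                               char tokens[][MAX_TOKEN_LEN],
                               size_t max_tokens);

/* Text tokenizer sketch: splits on non-alphanumeric characters and
 * lowercases each token. Tokens longer than MAX_TOKEN_LEN - 1
 * characters are truncated. */
size_t tokenize_text(const char *input,
                     char tokens[][MAX_TOKEN_LEN],
                     size_t max_tokens)
{
    size_t count = 0; /* tokens completed so far */
    size_t len = 0;   /* length of the token being built */

    for (const char *p = input; ; p++) {
        unsigned char c = (unsigned char)*p;

        if (isalnum(c)) {
            /* Append only while there is room for the token and its NUL. */
            if (count < max_tokens && len < MAX_TOKEN_LEN - 1)
                tokens[count][len++] = (char)tolower(c);
        } else {
            if (len > 0) {
                tokens[count][len] = '\0';
                count++;
                len = 0;
            }
            if (c == '\0')
                break;
        }
    }
    return count;
}
```

A tokenizer for code, images, or audio could then be supplied as another function of type tokenizer_fn, leaving the comparison stage unchanged.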