TF_IDF_search_engine

src/tf_idf_search.py:

This search engine ranks documents using only the TF-IDF measure, given a collection of documents, a query and the desired range of results. If no documents are supplied, the default file data/train.txt is used.
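The ranking described above can be sketched in plain Python. This is a minimal illustration, not the repository's TfIdfSearch class. The function names (tokenize, tfidf_vectors, cosine, tfidf_search) are hypothetical. The smoothed idf formula, log((1 + N) / (1 + df)) + 1, and cosine similarity as the ranking score are assumptions, chosen because they match scikit-learn's defaults. The start/end arguments are assumed to behave like a Python slice over the ranked results.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into word tokens (assumed tokenisation).
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: tf * idf} dict per document, plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed idf (assumption, same form as scikit-learn's default).
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenized]
    return vectors, idf


def cosine(a, b):
    # Cosine similarity between two sparse {term: weight} vectors.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def tfidf_search(docs, query, start, end):
    """Rank docs by similarity to query and return ranks start..end-1."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the corpus carry no weight.
    q_vec = {t: c * idf[t] for t, c in Counter(tokenize(query)).items() if t in idf}
    scores = [cosine(q_vec, v) for v in vectors]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in order[start:end]]


docs = [
    "i feel so excited about it",
    "the weather is cold today",
    "i think i was feeling so excited today",
    "my cat sleeps all day",
]
print(tfidf_search(docs, "i am so excited to see it", 0, 2))
# → ['i feel so excited about it', 'i think i was feeling so excited today']
```

The shorter document wins because it shares the rare term "it" with the query, and its weights are spread over fewer terms.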

Result:

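import pandas as pd
from tf_idf_search import TfIdfSearch  # run from src/, next to tf_idf_search.py
s = TfIdfSearch(doc=pd.read_csv("../data/train.txt", sep=';').iloc[:, 0], q="i am so excited to see it")
s.search(0, 3)  # first three results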

will return:

['i feel so excited about it', 'i think i was feeling so excited today', 'i feel so excited for college']

src/tf_idf_clustering.py:

Uses the Batch K-means algorithm to cluster the TF-IDF vectors of data/train.txt and visualise how the data is spread.
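The clustering step can be illustrated with a small, dependency-free sketch of batch (Lloyd's) K-means, where every point is reassigned on each iteration. It is an assumption that this is the variant the script uses. The script may instead rely on a library implementation such as scikit-learn's MiniBatchKMeans. The function names kmeans and squared_distance are hypothetical. In the real pipeline the points would be TF-IDF vectors rather than the 2-D toy points shown here.

```python
import random


def squared_distance(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Batch (Lloyd's) K-means: returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialise the centroids with k distinct points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
    return labels, centroids


points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids = kmeans(points, k=2)
print(labels)
```

On well-separated toy data the two groups end up in different clusters. On real TF-IDF vectors the clusters can overlap, which is what the conclusion below reports.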

Result:

Conclusion:

The clusters overlap with one another, which suggests that the model could perform better with different parameters. Other classification approaches based on deep learning will be added in the future.

Acknowledgements:

Elvis (https://lnkd.in/eXJ8QVB) and the Hugging Face team, under the CC BY-SA 4.0 license.

xruifan/TF_IDF_search_engine