This repository contains four approaches to entity resolution on one of the ICIJ datasets containing personal information about people involved in the Panama Papers leak.
The four proposed methods are Fuzzy 1 feat, TFIDF 1 feat, TFIDF 4 feats, and TFIDF 4 feats + FUZZY 1 feat (the hybrid method). They are explained in the report included in the repository, which also contains the notebooks to run these experiments.
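As a rough illustration of how a TF-IDF score, a fuzzy score and a hybrid of the two can rank candidate matches, here is a minimal sketch on a single name field. It is not the code from the notebooks. The exact features, n-gram size, fuzzy matcher and weighting are described in the report, so everything below is an assumption: the helper names (`char_ngrams`, `tfidf_vectors`, `fuzzy_ratio`, `rank`), the character trigrams, the equal `alpha` weighting, and the use of `difflib.SequenceMatcher` as a standard-library stand-in for a fuzzy matching library are all hypothetical.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def char_ngrams(text, n=3):
    """Lowercase, collapse whitespace, pad with spaces and cut into character n-grams."""
    text = " " + re.sub(r"\s+", " ", text.lower().strip()) + " "
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(docs, n=3):
    """Build one sparse TF-IDF vector (dict: n-gram -> weight) per document."""
    grams = [Counter(char_ngrams(d, n)) for d in docs]
    df = Counter(g for counts in grams for g in counts)  # document frequency of each n-gram
    n_docs = len(docs)
    vectors = []
    for counts in grams:
        # Smoothed idf, same form as scikit-learn's default: ln((1 + N) / (1 + df)) + 1
        vectors.append({g: tf * (math.log((1 + n_docs) / (1 + df[g])) + 1)
                        for g, tf in counts.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def fuzzy_ratio(a, b):
    """Fuzzy string similarity in [0, 1]; difflib is a stand-in for a dedicated fuzzy library."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rank(query, names, alpha=0.5, n=3):
    """Rank names against the query with a hybrid score.

    alpha=1.0 gives a pure TF-IDF ranking, alpha=0.0 a pure fuzzy ranking.
    """
    vectors = tfidf_vectors([query] + names, n)
    q = vectors[0]
    scored = []
    for name, vec in zip(names, vectors[1:]):
        hybrid = alpha * cosine(q, vec) + (1 - alpha) * fuzzy_ratio(query, name)
        scored.append((round(hybrid, 3), name))
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Made-up example names, not records from the ICIJ dataset
    candidates = ["John A. Smith", "SMITH, JOHN", "Jane Smyth", "Mossack Fonseca"]
    for score, name in rank("John Smith", candidates):
        print(score, name)
```

Character n-grams make the TF-IDF score tolerant to reordered tokens ("SMITH, JOHN"), while the fuzzy ratio rewards strings that are close character by character. Blending the two is one simple way to get a hybrid score. Extending the sketch to several features would mean computing one such score per field and combining them.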
You can also find "check_one_person_4_feats.py" and "check_one_person_1_feat.ipynb", two simple, interactive tools for searching for a person's information in our subset of the Panama Papers in under 3 seconds!