LucaCellamare/ER-panama-papers

ER-panama-papers

Repository containing four possible approaches to entity resolution on one of the ICIJ datasets, which contains personal information about people involved in the Panama Papers leak.

The four proposed methods are Fuzzy 1 feat, TFIDF 1 feat, TFIDF 4 feats, and TFIDF 4 feats + FUZZY 1 feat (the hybrid method). They are explained in the report inside the repository, which also contains the notebooks to run these experiments.
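As a rough illustration of the two building blocks behind these method names, the sketch below scores name pairs with a fuzzy string ratio and with TF-IDF over character trigrams followed by cosine similarity. It is not the code from the notebooks: the sample names are made up, `difflib.SequenceMatcher` stands in for whatever fuzzy matcher the report uses, and the trigram size and smoothed IDF weighting are assumptions chosen to keep the example dependency-free.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def fuzzy_score(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (assumed stand-in for the fuzzy matcher)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Split a padded, lower-cased string into overlapping character n-grams."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def tfidf_vectors(names: list[str]) -> list[dict[str, float]]:
    """Build one sparse TF-IDF vector (a dict of n-gram weights) per name."""
    counts_per_name = [Counter(char_ngrams(name)) for name in names]
    # Document frequency: in how many names does each n-gram appear?
    doc_freq = Counter(gram for counts in counts_per_name for gram in counts)
    n_docs = len(names)
    return [
        {
            gram: tf * (math.log((1 + n_docs) / (1 + doc_freq[gram])) + 1.0)
            for gram, tf in counts.items()
        }
        for counts in counts_per_name
    ]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(gram, 0.0) for gram, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up records; the real dataset and its column names are not shown here.
names = ["John A. Smith", "Jon Smith", "Maria Garcia"]
vectors = tfidf_vectors(names)

fuzzy_close = fuzzy_score(names[0], names[1])
tfidf_close = cosine(vectors[0], vectors[1])
tfidf_far = cosine(vectors[0], vectors[2])
print(f"fuzzy={fuzzy_close:.2f} tfidf_close={tfidf_close:.2f} tfidf_far={tfidf_far:.2f}")
```

In this sketch a pair would be treated as the same entity when its score exceeds a chosen threshold; extending it to several features would mean computing one such score per column and combining them, which the report describes in detail.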

You can also find "check_one_person_4_feats.py" and "check_one_person_1_feat.ipynb", two simple interactive tools for searching for a person's information in our subset of the Panama Papers in less than 3 seconds.
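The scripts themselves are not reproduced here. As a hedged sketch of what a single-person lookup might look like, the example below searches a tiny in-memory table by name using the standard library's `difflib.get_close_matches`. The records, the field names, the `find_person` helper, and the cutoff value are all hypothetical and do not come from the repository.

```python
from difflib import get_close_matches

# Hypothetical records standing in for the Panama Papers subset.
records = {
    "john a. smith": {"country": "Exampleland", "address": "1 Sample Street"},
    "maria garcia": {"country": "Testonia", "address": "22 Placeholder Ave"},
    "wei chen": {"country": "Demoland", "address": "7 Mock Road"},
}


def find_person(query: str, max_results: int = 3, cutoff: float = 0.6):
    """Return (name, record) pairs whose name is close to the query."""
    matches = get_close_matches(
        query.lower().strip(), list(records), n=max_results, cutoff=cutoff
    )
    return [(name, records[name]) for name in matches]


results = find_person("Jon Smith")
for name, info in results:
    print(name, info)
```

Lowering `cutoff` returns more candidates at the cost of more false matches, which is the usual trade-off in this kind of lookup.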
