This project explores the use of machine learning (ML) in the legal sector.
- A legal document is provided in PDF format, usually spanning multiple pages.
- A function extracts the text from the legal document (it can be modified to extract images too, but that is beyond the scope of this work).
- A series of functions uses list comprehensions and regex to clean the text.
- Two analyses are run on the cleaned text: (a) topic modelling with LatentDirichletAllocation (LDA) to extract the subject matter and themes of the document, and (b) a soft-text summarization, supported by a word cloud display.
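
The cleaning step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names (`clean_page`, `tokenize`, `clean_document`), the small `STOP_WORDS` set, and the sample pages are all assumptions made for the example. It only shows how regex and list comprehensions might be combined to turn raw page text into tokens ready for topic modelling.

```python
import re

# Hypothetical, deliberately tiny stop-word set for illustration only;
# a real pipeline would likely use a fuller list.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "by", "this"}


def clean_page(text: str) -> str:
    """Lowercase one page, replace non-letters with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # digits and punctuation become spaces
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    """Split cleaned text into words, dropping stop words and very short words."""
    return [w for w in text.split() if w not in STOP_WORDS and len(w) > 2]


def clean_document(pages: list[str]) -> list[list[str]]:
    """Clean every page of a document; returns one token list per page."""
    return [tokenize(clean_page(p)) for p in pages]


# Made-up sample pages standing in for text extracted from a PDF.
pages = [
    "This AGREEMENT is made on 12 March 2020,\nby the Lessor and the Lessee.",
    "Section 4.1: The Tenant shall pay rent of $1,500 monthly.",
]
tokens = clean_document(pages)
print(tokens)
```

The resulting per-page token lists could then be joined back into strings and passed to a vectorizer before fitting an LDA model, or counted to drive a word cloud; those steps are left out here to keep the sketch free of third-party dependencies.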