Extract emails from rtf, txt, text, doc, docx and PDF files

  • Install Python 3 and pip3
  • pip3 install -r requirements.txt
  • python extract_emails.py --help

Note: If your file has a doc extension, you must have the following installed:

  • On Windows, install pypiwin32
  • On Linux or Mac, install LibreOffice

pypiwin32 is a Windows-only Python module, so ignore its install error on Linux-based operating systems.

Options

  • --dir: absolute path of the directory/folder to scan; defaults to the current folder
  • --file: scan only a single file
  • --ext: restrict scanning to the given file extensions; defaults to all supported extensions
  • --dst: output file name; by default, results are printed to the console

NOTE: Use a different output file for each run; otherwise the existing results will be overwritten.
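To make the shape of these options concrete, here is a minimal argparse sketch that accepts the same four flags. It is an illustration only and not the code in extract_emails.py: the function name `build_parser`, the help strings, and the exact defaults are assumptions.

```python
import argparse
import os


def build_parser():
    """Build a parser mirroring the four documented options (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Extract emails from files")
    # Assumed default: the current working directory, per "default is current folder".
    parser.add_argument("--dir", default=os.getcwd(), help="directory/folder to scan")
    parser.add_argument("--file", default=None, help="scan only this file")
    # nargs="+" lets several extensions follow the flag, as in "--ext pdf doc".
    parser.add_argument("--ext", nargs="+", default=None,
                        help="extensions to scan (default: all supported)")
    # Assumed: None means "print to the console".
    parser.add_argument("--dst", default=None, help="output file name")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--dir=XYZ", "--dst=emails.txt", "--ext", "pdf", "doc"])
print(args.ext)  # → ['pdf', 'doc']
```

Using `nargs="+"` is one way to accept both the `--ext pdf` and `--ext pdf doc` forms shown under Usage.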

Usage

Extract emails from a specific file xyz.pdf

python extract_emails.py --file=xyz.pdf --dst=emails.txt

Extract emails from all files in a folder/directory XYZ

python extract_emails.py --dir=XYZ --dst=emails.txt

When scanning a folder/directory you can also restrict the file extensions. For example, to scan only pdf files:

python extract_emails.py --dir=XYZ --dst=emails.txt --ext pdf

Scan a directory but parse only doc and pdf files

python extract_emails.py --dir=XYZ --dst=emails.txt --ext pdf doc
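For readers curious how extension filtering and email matching might work together, the following is a rough sketch under stated assumptions. It is not taken from extract_emails.py: the names `EMAIL_RE`, `find_emails`, and `matching_files` are hypothetical, the regular expression is a deliberately simple pattern that will miss some valid addresses, and walking subfolders recursively is an assumption. It only covers text that has already been read into a string; converting pdf or doc files to text is out of scope here.

```python
import os
import re

# Simplified pattern (assumption): good enough for common addresses, not RFC-complete.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text):
    """Return the unique addresses found in text, in first-seen order."""
    seen = []
    for match in EMAIL_RE.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


def matching_files(directory, extensions=None):
    """Yield file paths under directory whose extension is in extensions.

    With extensions=None every file is yielded. Matching is case-insensitive.
    """
    wanted = {e.lower().lstrip(".") for e in extensions} if extensions else None
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            ext = os.path.splitext(name)[1].lower().lstrip(".")
            if wanted is None or ext in wanted:
                yield os.path.join(root, name)


print(find_emails("Contact a@b.com or c.d@e.org. Again: a@b.com"))
# → ['a@b.com', 'c.d@e.org']
```

Calling `matching_files("XYZ", ["pdf", "doc"])` would correspond to the last command above, yielding only the pdf and doc files in XYZ.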