unppt

An extremely fast tool that extracts text from MS-PPT (PowerPoint) files.

Install

  git clone https://github.com/icecraft/unppt
  cd unppt && pip install -r requirements.txt
  chmod +x bin/extractor

Usage

Usage: main.py [OPTIONS]

Options:
  -p, --path PATH    ppt file path or directory which contains ppt files
                     [required]
  -o, --output PATH  output directory  [required]
  --help             Show this message and exit.

To extract a single file:

  python main.py -p some.ppt -o output

or, to extract every ppt file in a directory:

  python main.py -p some_directory -o output
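
The same invocation can be scripted. The following is a minimal sketch, not part of unppt itself: `build_command` and `run_unppt` are hypothetical helper names, and the sketch assumes it is run from the repository root, where `main.py` lives. It uses only the `-p` and `-o` options documented above.

```python
import subprocess
import sys


def build_command(path, output, script="main.py"):
    """Build the argument list for the documented -p / -o options.

    `script` defaults to main.py in the current directory (an assumption).
    """
    return [sys.executable, str(script), "-p", str(path), "-o", str(output)]


def run_unppt(path, output, script="main.py"):
    """Run the CLI; raises subprocess.CalledProcessError on a non-zero exit."""
    return subprocess.run(build_command(path, output, script), check=True)


# Show the command that would be executed for a single file.
print(build_command("some.ppt", "output"))
```

Calling `run_unppt("some_directory", "output")` would launch the directory form of the command in the same way.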

TODOS

[ ] extract images (bmp, jpg, png, etc.)

[ ] extract audio and video

Metric

page count    time cost (seconds)    speed (pages/sec)
20            0.166                  120
183           7.0                    26
260           2.69                   96
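
The speed figures appear to be the page count divided by the time cost, truncated to a whole number. A short sketch that reproduces them from the numbers in the table (the variable and function names are illustrative only):

```python
# (page count, time cost in seconds), copied from the table above
runs = [(20, 0.166), (183, 7.0), (260, 2.69)]


def pages_per_second(pages, seconds):
    """Throughput as pages divided by elapsed seconds."""
    return pages / seconds


# Truncate to whole pages per second, which matches the listed speeds.
speeds = [int(pages_per_second(p, s)) for p, s in runs]

for (pages, seconds), speed in zip(runs, speeds):
    print(f"{pages} pages in {seconds} s: {speed} page/sec")
```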