Simple Document File Text Extraction Library for Rust


anvie/dotext

Document File Text Extractor


A simple Rust library for extracting readable text from specific document formats, such as Word documents (docx). Only the formats listed below are currently supported; others are planned.

Supported Documents

  • Microsoft Word (docx)
  • Microsoft Excel (xlsx)
  • Microsoft PowerPoint (pptx)
  • OpenOffice Writer (odt)
  • OpenOffice Spreadsheet (ods)
  • OpenDocument Presentation (odp)
  • PDF
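
As an illustration of how a caller might decide whether a file falls into one of the formats listed above, here is a minimal sketch that maps a file extension to a format. The `DocKind` enum and the `kind_from_path` function are hypothetical names invented for this example; they are not part of dotext, and the sketch uses only the standard library.

```rust
use std::path::Path;

/// Hypothetical enum mirroring the supported-document list above.
/// This type is an assumption for illustration, not a dotext type.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum DocKind {
    Docx,
    Xlsx,
    Pptx,
    Odt,
    Ods,
    Odp,
    Pdf,
}

/// Hypothetical helper: guess the document kind from the file extension,
/// ignoring ASCII case. Returns `None` for unknown or missing extensions.
pub fn kind_from_path(path: &str) -> Option<DocKind> {
    let ext = Path::new(path).extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "docx" => Some(DocKind::Docx),
        "xlsx" => Some(DocKind::Xlsx),
        "pptx" => Some(DocKind::Pptx),
        "odt" => Some(DocKind::Odt),
        "ods" => Some(DocKind::Ods),
        "odp" => Some(DocKind::Odp),
        "pdf" => Some(DocKind::Pdf),
        _ => None,
    }
}
```

Checking the extension only is a cheap first filter; it does not verify that the file content actually matches the format.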

Usage

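use dotext::*;      // brings Docx and its open() into scope
use std::io::Read;  // provides read_to_string()

fn main() {
    // Open the document; unwrap() panics if the file cannot be opened.
    let mut file = Docx::open("samples/sample.docx").unwrap();
    let mut isi = String::new(); // buffer for the extracted text
    let _ = file.read_to_string(&mut isi);
    println!("CONTENT:");
    println!("----------BEGIN----------");
    println!("{}", isi);
    println!("----------EOF----------");
}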
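
The usage snippet discards the result of `read_to_string` with `let _ =`, so a read failure would go unnoticed. A minimal sketch of a variant that propagates the error instead is shown below. The `extract_text` function is a hypothetical helper written for this example, not a dotext API; it is generic over `std::io::Read`, which the snippet above suggests `Docx` implements.

```rust
use std::io::{self, Read};

/// Hypothetical helper (not part of dotext): drain any reader into a String,
/// returning the I/O error instead of silently ignoring it.
pub fn extract_text<R: Read>(mut reader: R) -> io::Result<String> {
    let mut text = String::new();
    // read_to_string fails on I/O errors and on data that is not valid UTF-8.
    reader.read_to_string(&mut text)?;
    Ok(text)
}
```

Assuming `Docx::open` returns a reader as in the usage snippet, a call could look like `extract_text(Docx::open("samples/sample.docx").unwrap())`, with the caller deciding how to handle the returned `io::Result`.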

Test

$ cargo test

Or run the example:

$ cargo run --example readdocx data/sample.docx

[] Robin Sy.
