Reconstruct chunking to be on tokenized documents level #18

alisafaya · 2020-03-16T12:13:05Z

Chunking is currently performed on untokenized text, so the same documents end up being tokenized twice. Performing chunking after the texts have been tokenized would remove this redundant pass and make the pipeline more efficient.
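A minimal sketch of the proposed ordering, in which each document is tokenized exactly once and the resulting token list is split into chunks. It assumes documents are plain strings, the tokenizer is any callable that returns a list of tokens, and chunks are fixed-size token windows. `tokenize_then_chunk` and `chunk_size` are hypothetical names used for illustration only, not part of the project's code.

```python
from typing import Callable, Iterable, List


def tokenize_then_chunk(
    documents: Iterable[str],
    tokenize: Callable[[str], List[str]],
    chunk_size: int,
) -> List[List[str]]:
    """Hypothetical helper: tokenize each document once, then chunk its tokens.

    Assumes fixed-size chunks; the last chunk of a document may be shorter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks: List[List[str]] = []
    for doc in documents:
        tokens = tokenize(doc)  # single tokenization pass per document
        # Slice the token list into consecutive windows of chunk_size tokens.
        for start in range(0, len(tokens), chunk_size):
            chunks.append(tokens[start:start + chunk_size])
    return chunks


if __name__ == "__main__":
    # Whitespace splitting stands in for a real (e.g. subword) tokenizer.
    docs = ["the quick brown fox jumps", "over the lazy dog"]
    print(tokenize_then_chunk(docs, str.split, chunk_size=2))
```

Because chunk boundaries are computed on token positions rather than on raw characters, the chunks never need to be re-tokenized, and their lengths are measured in the same units the model consumes.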
