[Community]: Ability to pass a tokenizer directly to TokenTextSplitter and update from_tiktoken_encoder, from_huggingface_tokenizer #27036

Open
wants to merge 7 commits into
base: master

Conversation

keenborder786 (Contributor) commented Oct 1, 2024

  • Description: TokenTextSplitter does not allow passing a tokenizer instance directly; this PR adds that ability. In addition, the from_huggingface_tokenizer and from_tiktoken_encoder class methods of TextSplitter appear to build a length_function that TokenTextSplitter never uses. This PR fixes that by having both class methods pass the tokenizer itself rather than a length_function.
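To illustrate the idea in the description, here is a minimal, dependency-free sketch of a token splitter that takes a tokenizer instance directly instead of a length function. It is not the code in this PR and not the langchain API: `Tokenizer`, `WhitespaceTokenizer`, and `SketchTokenSplitter` are hypothetical names, and the only assumption about the tokenizer is that it exposes `encode` and `decode`, as tiktoken encodings and Hugging Face tokenizers do.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


class Tokenizer(Protocol):
    """Hypothetical minimal interface: any object with encode/decode."""

    def encode(self, text: str) -> List[int]: ...

    def decode(self, ids: List[int]) -> str: ...


class WhitespaceTokenizer:
    """Toy stand-in for a real tokenizer; one token per whitespace-separated word."""

    def __init__(self) -> None:
        self._vocab: Dict[str, int] = {}
        self._words: List[str] = []

    def encode(self, text: str) -> List[int]:
        ids = []
        for word in text.split():
            if word not in self._vocab:
                self._vocab[word] = len(self._words)
                self._words.append(word)
            ids.append(self._vocab[word])
        return ids

    def decode(self, ids: List[int]) -> str:
        return " ".join(self._words[i] for i in ids)


@dataclass
class SketchTokenSplitter:
    """Hypothetical splitter that receives the tokenizer itself."""

    tokenizer: Tokenizer
    chunk_size: int = 4
    chunk_overlap: int = 1

    def __post_init__(self) -> None:
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("chunk_overlap must be in [0, chunk_size)")

    def split_text(self, text: str) -> List[str]:
        # Split on token ids, then decode each window back to text.
        ids = self.tokenizer.encode(text)
        step = self.chunk_size - self.chunk_overlap
        chunks: List[str] = []
        start = 0
        while start < len(ids):
            window = ids[start:start + self.chunk_size]
            chunks.append(self.tokenizer.decode(window))
            if start + self.chunk_size >= len(ids):
                break
            start += step
        return chunks


splitter = SketchTokenSplitter(WhitespaceTokenizer(), chunk_size=4, chunk_overlap=1)
chunks = splitter.split_text("a b c d e f g")
```

Because the splitter works on token ids, a function that only returns a length is not enough for it: it needs `encode` to produce the ids and `decode` to turn each window back into text, which is why passing the tokenizer itself is the natural fit here.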

@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Oct 1, 2024

@dosubot dosubot bot added the Ɑ: text splitters Related to text splitters package label Oct 1, 2024
Labels
  • size:M (This PR changes 30-99 lines, ignoring generated files.)
  • Ɑ: text splitters (Related to text splitters package)
Projects
Status: Triage

1 participant