feat: Adding Semantic Text Splitter Component (Text Splitters) #4254

Open
wants to merge 1 commit into
base: main

joaoguilhermeS
Collaborator

Feat: Semantic Text Splitter with Advanced Threshold Controls

Overview
Introduces a new Semantic Text Splitter component that provides flexible text chunking with statistical threshold controls and regex support.

Features:

  • Multiple threshold control methods:
    • Percentile-based splitting
    • Standard deviation thresholds
    • Interquartile range
  • Configurable chunk size and count
  • Optional regex-based splitting
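
A minimal sketch of how the three threshold methods listed above could turn distances between adjacent sentence embeddings into split points. It uses only the standard library. The function names, the default amounts (95, 3, 1.5), and the sample distances are assumptions for illustration, not code from this PR.

```python
import statistics


def breakpoint_threshold(distances, method="percentile", amount=None):
    """Return the distance above which a split is placed (illustrative only).

    `distances` holds one value per pair of adjacent sentences, for example
    the cosine distance between their embeddings. Needs at least two values.
    """
    if len(distances) < 2:
        raise ValueError("need at least two distances")

    if method == "percentile":
        pct = 95.0 if amount is None else amount  # assumed default
        ordered = sorted(distances)
        # Linear interpolation between the two closest ranks.
        rank = (len(ordered) - 1) * pct / 100
        lo = int(rank)
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)

    if method == "standard_deviation":
        k = 3.0 if amount is None else amount  # assumed default
        return statistics.mean(distances) + k * statistics.pstdev(distances)

    if method == "interquartile":
        k = 1.5 if amount is None else amount  # assumed default
        q1, _, q3 = statistics.quantiles(distances, n=4, method="inclusive")
        return statistics.mean(distances) + k * (q3 - q1)

    raise ValueError(f"unknown method: {method}")


def split_indices(distances, threshold):
    """Indices of sentence gaps whose distance exceeds the threshold."""
    return [i for i, d in enumerate(distances) if d > threshold]


# Hypothetical distances between six consecutive sentences.
sample = [0.1, 0.2, 0.1, 0.9, 0.15]
cut = breakpoint_threshold(sample, method="interquartile")
print(split_indices(sample, cut))
```

A higher amount yields fewer, larger chunks; a lower one splits more aggressively. The chunk size, chunk count, and regex options from the list are not modelled here.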

@dosubot (bot) added the size:L (this PR changes 100-499 lines, ignoring generated files), enhancement (new feature or request), and python (pull requests that update Python code) labels on Oct 23, 2024
@github-actions github-actions bot added enhancement New feature or request and removed enhancement New feature or request labels Oct 23, 2024
Collaborator

@edwinjosechittilappilly edwinjosechittilappilly left a comment

@joaoguilhermeS Is langchain_experimental already listed in Langflow's pyproject? If not, @ogabrielluiz, can we add it?
@joaoguilhermeS Can you confirm whether uv has already added it to the pyproject file?

Also, I suggest setting beta to True on the component, since it is part of langchain_experimental.

2 participants