Improve clusters and several bug fixes #1141

Merged
merged 6 commits into from
Jan 29, 2024

@dsmilkov dsmilkov commented Jan 29, 2024

https://huggingface.co/spaces/lilacai/daniel_staging

Clustering (backend):

  • Lower min_cluster_size to 5 for the categories, so that the categories are more coherent
  • Add a 7-second timeout on OpenAI calls (the 99th-percentile response latency for OpenAI is roughly 3-4 seconds) to avoid hitting the 10-minute timeout. We can now title the clusters of 1M docs (11k clusters) in 8 minutes.
  • Disable the internal OpenAI retries (we used to retry at two layers)
  • Replace "request" with "snippet" in the prompt to avoid biasing towards users' requests; this improves clustering of forum, email, and general text data
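
To illustrate the timeout and single-layer retry points above, here is a minimal sketch, not the code from this PR. The helper name `call_with_timeout`, its parameters, and the backoff policy are all hypothetical; only the 7-second figure comes from the description. It caps each attempt with a timeout and retries in exactly one place, so the wrapped call should have its own retries turned off.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def call_with_timeout(
    make_request: Callable[[], Awaitable[T]],
    timeout_s: float = 7.0,   # per-attempt cap; 7 s is the value named in this PR
    max_attempts: int = 3,    # hypothetical retry budget
    backoff_s: float = 0.5,   # hypothetical base delay between attempts
) -> T:
    """Run `make_request` with a per-attempt timeout and one layer of retries."""
    last_error: Optional[BaseException] = None
    for attempt in range(max_attempts):
        try:
            # wait_for cancels the pending request once the timeout expires.
            return await asyncio.wait_for(make_request(), timeout=timeout_s)
        except (asyncio.TimeoutError, ConnectionError) as err:
            last_error = err
            # Exponential backoff before the next attempt.
            await asyncio.sleep(backoff_s * (2 ** attempt))
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

With the OpenAI Python client (v1), the comparable settings are the `timeout` and `max_retries` constructor arguments; whether this PR uses them is not stated here. For scale, 11k clusters in 8 minutes works out to roughly 23 titles per second, which implies the requests run concurrently.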

UI:

  • Make the histograms reactive to the currently selected group in "group by"
  • Make the pivot view reactive to searches (e.g. keyword search, metadata search)
  • Remember the schema and nav bar state when switching between cluster view and item view
  • Fix a bug where the search box state was wrong after a page refresh

lilac/data/clustering.py (2 review threads, resolved)
@dsmilkov dsmilkov enabled auto-merge (squash) January 29, 2024 16:57
@dsmilkov dsmilkov merged commit 4219c42 into main Jan 29, 2024
4 checks passed
@dsmilkov dsmilkov deleted the ds-fix-ui branch January 29, 2024 17:01

3 participants