-
Notifications
You must be signed in to change notification settings - Fork 595
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(wren-ai-service): column pruning step in retrieval pipeline and correct code for sql generation in evaluation #1225
Conversation
WalkthroughThe pull request introduces modifications to the AI service's pipeline classes, focusing on enhancing SQL generation and retrieval processes. The changes primarily involve updating the Changes
Sequence DiagramsequenceDiagram
participant Retrieval
participant GenerationPipeline
participant AskPipeline
Retrieval->>GenerationPipeline: Pass SQL generation reasoning
Retrieval->>AskPipeline: Pass SQL generation reasoning
GenerationPipeline->>GenerationPipeline: Process with reasoning
AskPipeline->>AskPipeline: Process with reasoning
Possibly related PRs
Suggested reviewers
Poem
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
wren-ai-service/eval/pipelines.py (2)
184-186
: Add a clarifying comment.
The newly introduced_table_description_indexing
variable is self-explanatory, but consider adding a brief docstring or inline comment explaining its role in this pipeline for future maintainers.
253-253
: Validate default for missing reasoning.
Usingprediction.get("reasoning", "")
silently defaults to an empty string. If this information is critical, consider alerting or logging a warning when it's missing, rather than defaulting to an empty value.wren-ai-service/src/pipelines/retrieval/retrieval.py (1)
419-421
: Safeguard against fragile model name checks.
Relying on substring matching ("gpt-4o"
or"gpt-4o-mini"
) may break if model names evolve. Consider mapping known model names to the chosen encoding in a config or dictionary for better maintainability.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
wren-ai-service/eval/pipelines.py
(3 hunks)wren-ai-service/src/pipelines/retrieval/retrieval.py
(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: pytest
- GitHub Check: pytest
- GitHub Check: Analyze (go)
🔇 Additional comments (3)
wren-ai-service/eval/pipelines.py (2)
344-344
: Same missing reasoning concern as above.
This echoes the comment on line 253. Make sure the default empty string is intentional and won't mask missing data.
180-180
: Ensure renamed variable is used uniformly.
Since_indexing
was replaced by_db_schema_indexing
, please confirm that all upstream and downstream references now point to_db_schema_indexing
to avoid runtime errors.✅ Verification successful
Variable rename verification successful
The variable_db_schema_indexing
is used consistently throughout the codebase with no remaining references to the old variable name_indexing
.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Verify references to _db_schema_indexing throughout the codebase rg -A 5 '_db_schema_indexing'Length of output: 2108
Script:
#!/bin/bash # Search for any remaining instances of _indexing rg -A 5 '_indexing' wren-ai-service/eval/pipelines.pyLength of output: 1330
wren-ai-service/src/pipelines/retrieval/retrieval.py (1)
271-271
: Dropping the second return value.
You are now exclusively capturing the DDL portion ([0]
) frombuild_table_ddl
. If the boolean flag is needed elsewhere (e.g., to detect calculated fields), confirm it’s handled correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
wren-launcher/utils/docker.go (1)
162-165
: Consider using a dedicated YAML parser to set the property accurately.By using
strings.ReplaceAll()
, there's a risk of partial or multiple unintended replacements if the target string appears more than once or in a slightly different context. A more robust approach is to parse the YAML, update the specific property programmatically, and then re-serialize it, preventing accidental changes and improving maintainability.Example diff using a YAML library (pseudo-code):
+import ( + "gopkg.in/yaml.v3" +) // ... - config = strings.ReplaceAll(config, "allow_using_db_schemas_without_pruning: false", "allow_using_db_schemas_without_pruning: true") +var parsedCfg map[string]interface{} +err = yaml.Unmarshal([]byte(config), &parsedCfg) +if err != nil { + return err +} +parsedCfg["allow_using_db_schemas_without_pruning"] = true +newCfgBytes, err := yaml.Marshal(parsedCfg) +if err != nil { + return err +} +config = string(newCfgBytes)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
wren-ai-service/src/pipelines/retrieval/retrieval.py
(1 hunks)wren-launcher/utils/docker.go
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- wren-ai-service/src/pipelines/retrieval/retrieval.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
This PR introduces several improvements to the retrieval and generation pipelines:
Changes
Pre-indexing Pipeline Enhancement
_indexing
to_db_schema_indexing
for better claritySQL Generation Updates
sql_generation_reasoning
field to both Generation and Ask pipelines for EvaluationDB Schema Processing Optimization
allow_using_db_schemas_without_pruning
flagSummary by CodeRabbit
New Features
Refactor
Bug Fixes