feature(LLMJudge): adding set of integration test where OpanAI is used as LLM Judge #1546

scaliseraoul · 2025-01-23T17:30:09Z

Tests added and passed if fixing a bug or adding a new feature.
All code checks passed.

Important

Add integration tests using OpenAI as LLM Judge and update dependencies.

Tests:
- Add test_agent_chat.py for integration tests using OpenAI as LLM Judge.
- Add test_agent_llm_judge.py for evaluating code snippets with OpenAI.
- Tests cover numeric, loan-related, heart stroke-related, and combined questions.
Dependencies:
- Add openai to pyproject.toml dependencies.
Misc:
- Minor error message change in execute_query() in loader.py.

^{This description was created by}^{for 55bcfc3. It will automatically update as commits are pushed.}

…into release/v3 * 'release/v3' of https://github.com/Sinaptik-AI/pandas-ai: fix(VirtualDataframe): fixing virtual dataframe name conflict (Sinaptik-AI#1531)

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* commit '4db35d2b278559eef8b6d5ad15bdc1d8a3936855': cd: fix poetry version cd: use python 3.10 Release v3.0.0-beta.0 docs: udpated gif in readme 3.0 (Sinaptik-AI#1536)

* commit '7967d07f90866edc71f12fdac61761d0946712f8': fix: dependency issue and updated lock files (Sinaptik-AI#1539)

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

… into fix/SIN-310 * 'fix/SIN-310' of https://github.com/scaliseraoul/pandas-ai: Update pandasai/data_loader/loader.py

…d as LLM Judge

ellipsis-dev

👍 Looks good to me! Reviewed everything up to 55bcfc3 in 1 minute and 59 seconds

More details

Looked at 473 lines of code in 4 files
Skipped 1 files when reviewing.
Skipped posting 5 drafted comments based on config settings.

1. tests/unit_tests/agent/test_agent_chat.py:4

Draft comment:

from typing import List, Tuple, Union

Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The code is using Python 3.10+ type union syntax with the | operator and imports UnionType from types module. There's no usage of typing.Union in the code. The current imports are correct for the code's needs. Adding Union would be unnecessary and potentially confusing since it's not being used.
Could there be a reason to prefer typing.Union over the | operator for compatibility or readability reasons?
No - the | operator is the modern, preferred syntax in Python 3.10+ and is more readable. The code is clearly using this modern syntax consistently.
The comment should be deleted as the current imports are correct for how unions are being used in the code.

2. tests/unit_tests/agent/test_agent_llm_judge.py:147

Draft comment:

        """Test multiple heart stroke-related questions."""

Reason this comment was not posted:
Confidence changes required: 50%
The test method test_heart_strokes_questions has a docstring that incorrectly mentions 'loan-related questions' instead of 'heart stroke-related questions'. This is a copy-paste error and should be corrected for clarity.

3. pandasai/data_loader/loader.py:173

Draft comment:

                f"Failed to execute query for source type '{source_type}' with query: {formatted_query}"

Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The original error message is slightly more verbose but both versions communicate the same information effectively. The new version is more concise without losing clarity. The change doesn't impact functionality or comprehension. This seems like a purely stylistic preference.
The original message might be marginally more explicit by including the words "source type" and "query". Could this extra clarity be important for debugging?
The new message still clearly indicates what failed (query execution), what the source type was, and includes the full query. The removed words don't add meaningful clarity.
This comment should be deleted as it suggests reverting a minor stylistic change that doesn't impact functionality or clarity of the error message.

4. tests/unit_tests/agent/test_agent_chat.py:233

Draft comment:

        Test heart stroke related questions to ensure the response types match the expected ones.

Reason this comment was not posted:
Confidence changes required: 50%
The test file test_agent_chat.py has a typo in the docstring for test_heart_strokes_questions_type. It should be corrected for clarity.

5. tests/unit_tests/agent/test_agent_llm_judge.py:147

Draft comment:

        Test multiple heart stroke-related questions.

Reason this comment was not posted:
Confidence changes required: 50%
The test file test_agent_llm_judge.py has a typo in the docstring for test_heart_strokes_questions. It should be corrected for clarity.

Workflow ID: wflow_7fPe5Kw1Ui01BqyD

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

tests/unit_tests/agent/test_agent_chat.py

scaliseraoul and others added 16 commits January 20, 2025 12:09

fix(VirtualDataframe): fixing virtual dataframe name conflict

0b838f3

feature(View): enabling view in SemanticLayerSchema

65dc12a

Merge branch 'release/v3' of https://github.com/Sinaptik-AI/pandas-ai …

cf142de

…into release/v3 * 'release/v3' of https://github.com/Sinaptik-AI/pandas-ai: fix(VirtualDataframe): fixing virtual dataframe name conflict (Sinaptik-AI#1531)

Update pandasai/data_loader/semantic_layer_schema.py

57b8900

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

Merge branch 'release/v3' into release/v3

5351e46

feature(View): enabling view in loader

3575417

Merge commit '4db35d2b278559eef8b6d5ad15bdc1d8a3936855' into release/v3

8250bdd

* commit '4db35d2b278559eef8b6d5ad15bdc1d8a3936855': cd: fix poetry version cd: use python 3.10 Release v3.0.0-beta.0 docs: udpated gif in readme 3.0 (Sinaptik-AI#1536)

Merge commit '7967d07f90866edc71f12fdac61761d0946712f8' into release/v3

e2ba54b

* commit '7967d07f90866edc71f12fdac61761d0946712f8': fix: dependency issue and updated lock files (Sinaptik-AI#1539)

feature(View): adding integration tests

db0acf0

feature(ChatTest): adding first set of numeric questions

4fcac45

feature(ChatTest): adding set of questions for loans and heart csv

1b20e4c

Update pandasai/data_loader/loader.py

dc63131

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

fix(ChatTest): typo in heart_stroke_path

c766564

Merge branch 'fix/SIN-310' of https://github.com/scaliseraoul/pandas-ai…

223502e

… into fix/SIN-310 * 'fix/SIN-310' of https://github.com/scaliseraoul/pandas-ai: Update pandasai/data_loader/loader.py

Merge branch 'release/v3' into fix/SIN-310

7c20370

feature(LLMJudge): adding set of integration test where OpanAI is use…

55bcfc3

…d as LLM Judge

dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Jan 23, 2025

Merge branch 'release/v3' into fix/SIN-310

ee6f14a

ellipsis-dev bot reviewed Jan 23, 2025

View reviewed changes

gventuri requested a review from ArslanSaleem January 24, 2025 10:09

ArslanSaleem reviewed Jan 24, 2025

View reviewed changes

tests/unit_tests/agent/test_agent_chat.py Outdated Show resolved Hide resolved

scaliseraoul added 2 commits January 24, 2025 15:46

fix(LLMJudge): using root path function, global average result

d1d3379

fix: sanitize name of dataset passed as param in pai.create

2040426

gventuri merged commit d42fafd into Sinaptik-AI:release/v3 Jan 24, 2025
12 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feature(LLMJudge): adding set of integration test where OpanAI is used as LLM Judge #1546

feature(LLMJudge): adding set of integration test where OpanAI is used as LLM Judge #1546

scaliseraoul commented Jan 23, 2025 •

edited by ellipsis-dev bot

Loading

ellipsis-dev bot left a comment

feature(LLMJudge): adding set of integration test where OpanAI is used as LLM Judge #1546

feature(LLMJudge): adding set of integration test where OpanAI is used as LLM Judge #1546

Conversation

scaliseraoul commented Jan 23, 2025 • edited by ellipsis-dev bot Loading

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

scaliseraoul commented Jan 23, 2025 •

edited by ellipsis-dev bot

Loading