Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature(LLMJudge): adding set of integration test where OpanAI is used as LLM Judge #1546

Merged
merged 19 commits into from
Jan 24, 2025

Conversation

scaliseraoul
Copy link

@scaliseraoul scaliseraoul commented Jan 23, 2025


Important

Add integration tests using OpenAI as LLM Judge and update dependencies.

  • Tests:
    • Add test_agent_chat.py for integration tests using OpenAI as LLM Judge.
    • Add test_agent_llm_judge.py for evaluating code snippets with OpenAI.
    • Tests cover numeric, loan-related, heart stroke-related, and combined questions.
  • Dependencies:
    • Add openai to pyproject.toml dependencies.
  • Misc:
    • Minor error message change in execute_query() in loader.py.

This description was created by Ellipsis for 55bcfc3. It will automatically update as commits are pushed.

scaliseraoul and others added 16 commits January 20, 2025 12:09
…into release/v3

* 'release/v3' of https://github.com/Sinaptik-AI/pandas-ai:
  fix(VirtualDataframe): fixing virtual dataframe name conflict (Sinaptik-AI#1531)
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
* commit '4db35d2b278559eef8b6d5ad15bdc1d8a3936855':
  cd: fix poetry version
  cd: use python 3.10
  Release v3.0.0-beta.0
  docs: udpated gif in readme 3.0 (Sinaptik-AI#1536)
* commit '7967d07f90866edc71f12fdac61761d0946712f8':
  fix: dependency issue and updated lock files (Sinaptik-AI#1539)
Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
… into fix/SIN-310

* 'fix/SIN-310' of https://github.com/scaliseraoul/pandas-ai:
  Update pandasai/data_loader/loader.py
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Jan 23, 2025
Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Reviewed everything up to 55bcfc3 in 1 minute and 59 seconds

More details
  • Looked at 473 lines of code in 4 files
  • Skipped 1 files when reviewing.
  • Skipped posting 5 drafted comments based on config settings.
1. tests/unit_tests/agent/test_agent_chat.py:4
  • Draft comment:
from typing import List, Tuple, Union
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The code is using Python 3.10+ type union syntax with the | operator and imports UnionType from types module. There's no usage of typing.Union in the code. The current imports are correct for the code's needs. Adding Union would be unnecessary and potentially confusing since it's not being used.
    Could there be a reason to prefer typing.Union over the | operator for compatibility or readability reasons?
    No - the | operator is the modern, preferred syntax in Python 3.10+ and is more readable. The code is clearly using this modern syntax consistently.
    The comment should be deleted as the current imports are correct for how unions are being used in the code.
2. tests/unit_tests/agent/test_agent_llm_judge.py:147
  • Draft comment:
        """Test multiple heart stroke-related questions."""
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The test method test_heart_strokes_questions has a docstring that incorrectly mentions 'loan-related questions' instead of 'heart stroke-related questions'. This is a copy-paste error and should be corrected for clarity.
3. pandasai/data_loader/loader.py:173
  • Draft comment:
                f"Failed to execute query for source type '{source_type}' with query: {formatted_query}"
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The original error message is slightly more verbose but both versions communicate the same information effectively. The new version is more concise without losing clarity. The change doesn't impact functionality or comprehension. This seems like a purely stylistic preference.
    The original message might be marginally more explicit by including the words "source type" and "query". Could this extra clarity be important for debugging?
    The new message still clearly indicates what failed (query execution), what the source type was, and includes the full query. The removed words don't add meaningful clarity.
    This comment should be deleted as it suggests reverting a minor stylistic change that doesn't impact functionality or clarity of the error message.
4. tests/unit_tests/agent/test_agent_chat.py:233
  • Draft comment:
        Test heart stroke related questions to ensure the response types match the expected ones.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The test file test_agent_chat.py has a typo in the docstring for test_heart_strokes_questions_type. It should be corrected for clarity.
5. tests/unit_tests/agent/test_agent_llm_judge.py:147
  • Draft comment:
        Test multiple heart stroke-related questions.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The test file test_agent_llm_judge.py has a typo in the docstring for test_heart_strokes_questions. It should be corrected for clarity.

Workflow ID: wflow_7fPe5Kw1Ui01BqyD


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@gventuri gventuri requested a review from ArslanSaleem January 24, 2025 10:09
@gventuri gventuri merged commit d42fafd into Sinaptik-AI:release/v3 Jan 24, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:L This PR changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants