
feat(llms): support OpenAI API v1 #753

Merged
merged 5 commits into from
Nov 15, 2023

Conversation

mspronesti
Contributor

@mspronesti mspronesti commented Nov 14, 2023

Hi @gventuri, this PR addresses #737.
As it doesn't cover Azure OpenAI yet, I've temporarily marked its tests as "skipped". I will open another PR for Azure OpenAI once this gets merged.

Hence, please don't cut a release right after merging this PR!

Summary by CodeRabbit

  • New Features

    • Introduced a function to check compatibility with OpenAI API versions.
    • Added new parameters to enhance request customization in API interactions.
    • Implemented a new method for handling API calls to OpenAI's language models.
  • Enhancements

    • Updated methods to support generic responses and additional parameters.
    • Improved parameter handling to include new options like "deployment_name."
  • Bug Fixes

    • Adjusted attribute checks to be more robust and prevent errors.
  • Refactor

    • Streamlined API token handling by requiring it at initialization.
    • Consolidated API call methods for different model types.
    • Removed deprecated methods and attributes.
  • Tests

    • Added new test cases and updated existing ones to reflect changes in the codebase.
    • Skipped tests conditionally based on module availability.
    • Employed mocking to better isolate and test functionality.

Contributor

coderabbitai bot commented Nov 14, 2023


Walkthrough

The code changes introduce support for OpenAI API v1, with updates to handle the new API structure and endpoints. This includes the addition of new parameters and properties to classes, refactoring of methods to accommodate the updated API, and changes to tests to align with the new codebase. The changes aim to ensure compatibility with both OpenAI and Azure's OpenAI services.
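The whole compatibility story hinges on detecting which major version of the `openai` package is installed. The PR's helper isn't reproduced in this summary, so here is a minimal sketch of the idea; the function name `is_openai_v1_like` and the string-parsing strategy are illustrative, and the real helper would read `openai.__version__` rather than take the version as an argument:

```python
def is_openai_v1_like(version_string: str) -> bool:
    # True when the installed openai package is 1.x or later.
    # The version is passed in so this sketch has no dependency
    # on having the openai package installed.
    major = version_string.split(".", 1)[0]
    return major.isdigit() and int(major) >= 1
```

Callers can then branch between the legacy module-level API and the v1 client-based API on the result of this check.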

Changes

  • pandasai/helpers/openai.py: introduces the is_openai_v1 function to check the OpenAI API version.
  • pandasai/helpers/openai_info.py: updates the __call__ method to handle a generic response parameter.
  • pandasai/llm/base.py: adds new attributes and methods to the BaseOpenAI class for API v1 support.
  • pandasai/llm/openai.py: updates OpenAI class initialization and removes deprecated methods.
  • tests/helpers/test_openai_info.py: updates tests with the new OpenAIObject class and its instantiation.
  • tests/llms/test_azure_openai.py: adds a module-level skip and updates test assertions.
  • tests/llms/test_openai.py: refactors tests, adds a new OpenAIObject class, and removes outdated tests.

Assessment against linked issues (Beta)

Objectives (all from #737):

  • Support OpenAI API v1
  • Update deprecated openai.ChatCompletion to openai.OpenAI.chat.completions
  • Include support for openai.AzureOpenAI.chat.completions
  • Submit PR with proposed changes (the assessment is based on the provided summary, not on the actual submission of a PR)

🐇 In the brisk November air, we hop and leap,
Code refactored, no old APIs to keep.
With every line and function neat,
Our code now dances to v1's beat. 🍂🎉




Contributor

@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 4

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR, between 0712f84 and 0375b94.
Files ignored due to filter (2)
  • poetry.lock
  • pyproject.toml
Files selected for processing (7)
  • pandasai/helpers/openai.py (1 hunks)
  • pandasai/helpers/openai_info.py (2 hunks)
  • pandasai/llm/base.py (10 hunks)
  • pandasai/llm/openai.py (4 hunks)
  • tests/helpers/test_openai_info.py (11 hunks)
  • tests/llms/test_azure_openai.py (3 hunks)
  • tests/llms/test_openai.py (7 hunks)
Additional comments: 32
pandasai/helpers/openai_info.py (2)
  • 2-3: The removal of an unused import is good practice: it reduces clutter and avoids confusion for future maintainers of the code.

  • 134-140: The change to the __call__ method to use hasattr instead of a direct key check is a safer approach to attribute access. This helps prevent AttributeError exceptions if the usage object does not have the total_tokens attribute, which can be the case with different versions of the OpenAI API or unexpected response structures.

tests/llms/test_azure_openai.py (3)
  • 8-9: The use of pytest.skip at the module level will skip all tests in this file. Ensure that this is the intended behavior, as it will prevent any tests from running in this module. If this is temporary, consider adding a TODO comment to remind the team to re-enable tests when appropriate.

  • 105-108: The addition of seed=openai.seed to the test_completion function is good for ensuring reproducibility of test results. However, ensure that openai.seed is being set correctly somewhere in the test setup to avoid using an uninitialized variable.

  • 135-138: The modification in the test_chat_completion function changes the assertion to compare the result with the expected response object directly. This is a more robust way to ensure that the function returns the correct response object. However, ensure that the expected_response object is constructed with all necessary fields to accurately represent a real response from the Azure OpenAI service.

tests/llms/test_openai.py (4)
  • 10-12: The OpenAIObject class is a simple wrapper to convert a dictionary into an object with attributes. This is a common pattern in testing to mock responses from APIs. However, it's important to ensure that the attributes accessed on instances of this class are expected to exist in the actual responses from the OpenAI API to avoid discrepancies in tests.

  • 71-86: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [59-79]

The test_completion method has been refactored to use mocker.patch.object for mocking the completion method of the OpenAI class. This is a good practice as it isolates the test from the actual implementation and allows for testing the behavior of the completion method in isolation. Ensure that the mocker fixture is properly set up in the test configuration.

  • 71-86: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [81-102]

Similar to the test_completion method, test_chat_completion is also refactored to use mocking. It's important to verify that the chat_completion method is being called with the correct arguments and that the mocked response is what the method is expected to return.

  • 104-113: The test_call_with_unsupported_model method has been updated to use pytest.raises with an updated exception message. This change improves the clarity of what the test is expecting and ensures that the correct exception is being tested for. It's important to verify that the exception message matches the actual message that would be raised by the OpenAI class when an unsupported model is specified.
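The OpenAIObject wrapper mentioned above is typically only a few lines. The exact implementation in the PR may differ; a common minimal version, with an illustrative response shape, looks like this:

```python
class OpenAIObject:
    # Turns a plain dict into an object whose keys read as attributes,
    # mimicking the attribute-style access of the v1 SDK's response objects.
    def __init__(self, data: dict):
        self.__dict__.update(data)

# Illustrative mock response; field names mirror typical completion payloads.
response = OpenAIObject(
    {
        "choices": [OpenAIObject({"text": "print('hello')"})],
        "usage": OpenAIObject({"total_tokens": 7}),
    }
)
print(response.choices[0].text)     # print('hello')
print(response.usage.total_tokens)  # 7
```

Note that, as observed in the review, this bypasses the usual AttributeError protections, so tests must only access attributes the real responses would carry.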
tests/helpers/test_openai_info.py (10)
  • 4-5: The import statement for OpenAICallbackHandler and get_openai_callback is split across two hunks, which could potentially be a merge conflict artifact. Ensure that the import statement is complete and correct in the final version of the file.

  • 14-16: The OpenAIObject class is a simple wrapper to convert a dictionary into an object with attributes. This is a common pattern in Python, but it's worth noting that it bypasses the usual attribute access protections (like raising an AttributeError for inaccessible attributes). If this behavior is intentional and acceptable for the use case, then it's fine.

  • 27-37: The test case test_handler is well-structured and checks the correct assignment of token counts and model name. However, it's not clear how handler.total_cost is being calculated. Ensure that there is a mechanism within OpenAICallbackHandler to calculate the cost based on the model and token counts, and that it's being tested here.

  • 43-54: The test case test_handler_unknown_model correctly asserts that the cost should be 0.0 for an unknown model. This is a good practice to ensure that the system behaves predictably when encountering unexpected input.

  • 66-85: The parameterized test cases for different model names and expected costs are a good use of pytest.mark.parametrize to test multiple scenarios. However, ensure that the expected costs are up-to-date with the current pricing of the OpenAI models.

  • 94-103: The test case test_handler_openai is similar to the previous ones but parameterized for different models and costs. It's important to ensure that the handler object is correctly calculating the cost based on the total_tokens and the specific model used.

  • 109-124: Similar to the previous comment, ensure that the expected costs for these models are accurate and reflect any recent changes to the OpenAI pricing model.

  • 153-163: The test case test_handler_finetuned_model is testing the cost calculation for fine-tuned models. Ensure that the expected costs are correct and that the handler object can distinguish between standard and fine-tuned models for cost calculation.

  • 180-193: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [168-187]

The test test_openai_callback is using a mock to simulate the response from the OpenAI client. This is a good practice to isolate the test from external dependencies. However, ensure that the mock is returning realistic data and that the create method of llm.client is being called with the correct parameters in the actual implementation.

  • 192-193: The use of get_openai_callback context manager is a good pattern to ensure that the callback is only active during the execution of the block. However, ensure that the PandasAI object is correctly using the callback and that the callback is correctly capturing the token and cost information.
pandasai/llm/openai.py (4)
  • 63-93: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [50-90]

The __init__ method has been updated to require an api_token and to set up the client attribute based on the model type. This is a significant change that requires all instantiations of the OpenAI class to provide an api_token directly or through an environment variable. Ensure that all parts of the codebase that instantiate this class are updated accordingly. Additionally, the logic to determine the client attribute based on the model type and OpenAI API version is a critical change that should be thoroughly tested to ensure compatibility with both versions of the OpenAI API.

  • 65-68: The error handling for a missing API key is appropriate and follows the fail-fast principle by raising an exception if the api_token is not provided or found in the environment variables.

  • 70-72: Setting a proxy for the openai library is a good feature for users behind a corporate firewall or similar. However, it's important to ensure that this proxy setting does not unintentionally override user settings in other parts of their application. Consider documenting this behavior or providing a way to opt-out.

  • 97-102: The _default_params property and type property are correctly implemented and provide a clear way to access default parameters and the type of the API being used. This is good for maintainability and clarity of the code.

pandasai/llm/base.py (9)
  • 17-21: The import statements have been updated to include Union, Mapping, and Tuple from the typing module. This is a standard practice to support type hinting and should be beneficial for code readability and maintainability.

  • 31-31: The addition of the is_openai_v1 function from ..helpers.openai is a good practice to ensure compatibility with the new OpenAI API version. This function will likely be used to conditionally handle API calls based on the installed OpenAI library version.

  • 217-237: New attributes have been added to the BaseOpenAI class to accommodate additional API parameters. This is a necessary change to support the new features of the OpenAI API v1. However, it's important to ensure that all instances of BaseOpenAI are updated to handle these new attributes correctly.

  • 207-246: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [239-255]

The _set_params method has been updated to include "deployment_name" as a valid parameter. This change should be verified across the codebase to ensure that any calls to _set_params now include the "deployment_name" parameter if necessary.

  • 265-285: The _default_params property has been updated to include "n" and "best_of" if their values are not default. This is a logical change that aligns with the new attributes added to the BaseOpenAI class. It's important to ensure that the default values for "n" and "best_of" are set appropriately elsewhere in the code.

  • 288-310: New properties _invocation_params and _client_params have been introduced. These properties are likely used to configure the parameters for API calls and the HTTP client, respectively. It's crucial to ensure that these properties are used consistently throughout the codebase and that their values are validated before use.

  • 356-389: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [335-364]

The completion and chat_completion methods have been refactored to use the self.client attribute instead of the openai module directly. This change is likely part of the effort to support OpenAI's API v1 and should be checked for consistency and correctness in the context of the entire codebase.

  • 366-386: A new call method has been added to the BaseOpenAI class. This method appears to be a high-level interface for making API calls using either the chat_completion or completion methods based on the _is_chat_model attribute. It's important to ensure that this method is integrated correctly with the rest of the codebase and that the instruction argument is being handled properly.

  • 411-417: The HuggingFaceLLM class's _setup method has been updated to retrieve the API token from either the passed kwargs or the environment variable HUGGINGFACE_API_KEY. This is a standard approach to handle configuration and should work well, provided that the environment variable is set in the deployment environment.
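The dispatch behaviour of the new call method can be sketched as below. The class body and return strings are illustrative stand-ins; only the `_is_chat_model` / `last_prompt` attributes and the chat-vs-completion routing come from the review above:

```python
class BaseOpenAISketch:
    # In the real class _is_chat_model would be set by the concrete subclass.
    _is_chat_model = True
    last_prompt = None

    def chat_completion(self, value: str) -> str:
        return f"chat:{value}"

    def completion(self, value: str) -> str:
        return f"completion:{value}"

    def call(self, instruction: str) -> str:
        # Single entry point: remember the prompt, then dispatch on model type.
        self.last_prompt = instruction
        if self._is_chat_model:
            return self.chat_completion(instruction)
        return self.completion(instruction)

llm = BaseOpenAISketch()
print(llm.call("describe the dataframe"))  # chat:describe the dataframe
```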

Resolved review threads: pandasai/helpers/openai.py, pandasai/llm/openai.py (two threads), tests/helpers/test_openai_info.py
@mspronesti mspronesti changed the title Feat/openai v1 feat(llms): support OpenAI API v1 Nov 14, 2023
Contributor

@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 3

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR, between 0375b94 and 11df147.
Files ignored due to filter (1)
  • pyproject.toml
Files selected for processing (2)
  • pandasai/llm/base.py (10 hunks)
  • tests/llms/test_openai.py (7 hunks)
Additional comments: 13
tests/llms/test_openai.py (3)
  • 29-34: The test test_type_with_token and test_proxy are good examples of unit tests that check the functionality of the OpenAI class with different configurations. It's important to ensure that the openai.proxy settings are reset after the test to avoid side effects on other tests.

  • 39-45: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [39-57]

The test_params_setting method is well-structured and tests the setting of various parameters on the OpenAI class. It's important to ensure that all parameters are being tested, especially if new ones have been added in the pull request.

  • 103-112: The test_call_with_unsupported_model method correctly tests the behavior when an unsupported model is used. It's important to ensure that the error message is accurate and that the UnsupportedModelError is raised as expected.
pandasai/llm/base.py (10)
  • 17-25: The import statements have been updated to include Union, Mapping, and Tuple. This is a standard update to support type hinting and should not affect existing functionality.

  • 28-34: The import of is_openai_v1 is a new addition to check the OpenAI API version. This is part of the feature to support OpenAI API v1. Ensure that the function is_openai_v1 is thoroughly tested to prevent any version compatibility issues.

  • 133-141: The _extract_tag_text method uses regex to extract text between tags. This is a common approach, but it's important to ensure that the regex pattern is well-tested, especially with different and potentially malformed inputs to prevent any unexpected behavior.

  • 210-238: New attributes have been added to the BaseOpenAI class. These attributes (best_of, n, request_timeout, max_retries, default_headers, default_query, http_client, _is_chat_model) are likely to support new features and configurations for the OpenAI API calls. It's crucial to ensure that these new configurations are documented and that their interactions with existing attributes are tested.

  • 251-257: The _set_params method has been updated to accept "deployment_name" instead of "deployment_id". This change should be cross-checked with the OpenAI API documentation to ensure that it aligns with the expected parameter names and that all references to the old parameter name are updated accordingly.

  • 267-313: The _default_params and _client_params properties have been added to manage model invocation parameters. It's important to ensure that these properties are returning the correct parameters and that they are being used correctly in the API call methods. Also, check if the handling of the is_openai_v1 condition is correct and if it aligns with the OpenAI API's requirements.

  • 323-336: The completion method has been refactored to use the self.client.create method instead of openai.Completion.create. This change should be verified to ensure that the self.client is properly configured and that the method call is equivalent to the previous implementation. Additionally, the handling of the openai_callback_var should be reviewed to ensure that it is still functioning as intended after the refactor.

  • 359-392: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [347-367]

Similar to the completion method, the chat_completion method has been refactored to use self.client.create. The same considerations apply here: ensure that self.client is properly configured and that the method call is equivalent to the previous implementation. Also, verify the handling of openai_callback_var.

  • 369-390: The call method has been updated to handle both chat and completion models based on the _is_chat_model attribute. Ensure that this logic correctly determines the type of model and that the appropriate method (chat_completion or completion) is called. Also, verify that the last_prompt attribute is being used correctly and that the response is handled as expected.

  • 414-420: The HuggingFaceLLM class has been updated to raise an APIKeyNotFoundError if the API token is not provided. This is a good practice for error handling and ensures that the user is aware of missing credentials. Ensure that this error is caught and handled appropriately wherever the HuggingFaceLLM class is instantiated.
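The token-resolution order described above can be sketched like this. The `resolve_hf_token` name and error message are illustrative; only the HUGGINGFACE_API_KEY environment variable and the APIKeyNotFoundError behaviour come from the review:

```python
import os

class APIKeyNotFoundError(Exception):
    pass

def resolve_hf_token(**kwargs) -> str:
    # Explicit argument wins; the environment is the fallback; anything
    # else fails loudly so misconfiguration surfaces immediately.
    token = kwargs.get("api_token") or os.getenv("HUGGINGFACE_API_KEY")
    if not token:
        raise APIKeyNotFoundError("HuggingFace API key not found")
    return token
```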

Resolved review threads: tests/llms/test_openai.py (two threads)
@codecov-commenter

codecov-commenter commented Nov 14, 2023

Codecov Report

Attention: 7 lines in your changes are missing coverage. Please review.

Comparison: base (0712f84) 85.57% vs. head (0720a6a) 84.82%.
The report is 1 commit behind head on main.

pandasai/llm/base.py: 86.48% patch coverage, 5 lines missing ⚠️
pandasai/llm/openai.py: 86.66% patch coverage, 2 lines missing ⚠️


Additional details and impacted files
@@            Coverage Diff             @@
##             main     #753      +/-   ##
==========================================
- Coverage   85.57%   84.82%   -0.76%     
==========================================
  Files          73       74       +1     
  Lines        3599     3630      +31     
==========================================
- Hits         3080     3079       -1     
- Misses        519      551      +32     


Contributor

@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 3

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR, between 11df147 and 0720a6a.
Files selected for processing (1)
  • pandasai/llm/base.py (10 hunks)
Additional comments: 9
pandasai/llm/base.py (9)
  • 17-21: The import statements have been updated to include Union, Mapping, and Tuple from the typing module. This is likely in preparation for the new parameters and type hints added to the BaseOpenAI class.

  • 28-34: New exceptions and helper functions have been imported, which suggests that the BaseOpenAI class will be using these in its updated implementation.

  • 217-238: New attributes have been added to the BaseOpenAI class, including best_of, n, request_timeout, max_retries, default_headers, default_query, and http_client. These changes are likely to support more advanced configuration options for the OpenAI API calls.

  • 268-288: The _default_params property has been updated to include the new parameters and to remove the max_tokens parameter. This change aligns with the updated class attributes and ensures that the default parameters for API calls are correctly configured.

  • 291-313: Two new properties, _invocation_params and _client_params, have been added. These properties are likely used to configure the OpenAI client and the parameters for model invocation. The use of is_openai_v1 suggests that there is a version check to ensure compatibility with different versions of the OpenAI API.

  • 323-339: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [315-336]

The completion method has been updated to use the client attribute for making API calls instead of directly using openai.Completion. This change abstracts the API call mechanism and allows for more flexible client configuration.

  • 359-392: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [338-367]

Similarly, the chat_completion method has been updated to use the client attribute. The method now constructs the parameters for the API call using the _invocation_params property and handles the response using a callback if one is provided.

  • 369-389: The call method has been added to the BaseOpenAI class. This method determines whether to use the chat_completion or completion method based on the _is_chat_model attribute. This change centralizes the logic for making API calls and provides a single entry point for invoking the model.

  • 414-420: The HuggingFaceLLM class has been updated to include a new _setup method that sets the api_token and _max_retries attributes. The api_token is now sourced from either the kwargs or the environment variable HUGGINGFACE_API_KEY. This change improves the configurability and error handling of the class.

Comment on lines 135 to 141
        if match := re.search(
            f"(<{tag}>)(.*)(</{tag}>)",
            response,
            re.DOTALL | re.MULTILINE,
        ):
            return match[2]
        return None
The _extract_tag_text method has been updated to use the walrus operator (:=) which is a feature introduced in Python 3.8. This change simplifies the code by combining the assignment and the return statement.
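A runnable, self-contained version of the pattern under discussion (the standalone function name `extract_tag_text` is illustrative; in the PR it is a method):

```python
import re
from typing import Optional

def extract_tag_text(response: str, tag: str) -> Optional[str]:
    # The walrus operator binds the match object and tests it in one
    # expression, replacing the older two-step assign-then-check idiom.
    if match := re.search(
        f"(<{tag}>)(.*)(</{tag}>)", response, re.DOTALL | re.MULTILINE
    ):
        return match[2]  # group 2 is the text between the tags
    return None

print(extract_tag_text("<code>df.head()</code>", "code"))  # df.head()
```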

Comment on lines 240 to 247
     def _set_params(self, **kwargs):
         """
         Set Parameters
         Args:
-            **kwargs: ["model", "engine", "deployment_id", "temperature","max_tokens",
-                "top_p", "frequency_penalty", "presence_penalty", "stop", "seed", ]
+            **kwargs: ["model", "deployment_name", "temperature","max_tokens",
+                "top_p", "frequency_penalty", "presence_penalty", "stop", "seed"]

         Returns:

The _set_params method signature has been modified to accept arbitrary keyword arguments (**kwargs). This change increases the flexibility of the method, allowing it to handle a wider range of parameters without needing to change the method signature.

Comment on lines 252 to 257
valid_params = [
"model",
"engine",
"deployment_id",
"deployment_name",
"temperature",
"max_tokens",
"top_p",

The valid_params list within the _set_params method has been updated to reflect the new parameters that can be set on the BaseOpenAI class. However, max_tokens is included here but it has been removed from the class attributes (line 220). This discrepancy should be resolved.

-            "max_tokens",

Committable suggestion

[!IMPORTANT]
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
valid_params = [
"model",
"engine",
"deployment_id",
"deployment_name",
"temperature",
"max_tokens",
"top_p",
valid_params = [
"model",
"deployment_name",
"temperature",
"top_p",

@gventuri gventuri merged commit 68163d5 into sinaptik-ai:main Nov 15, 2023
@gventuri
Collaborator

@mspronesti thanks a lot for the improvement, great to be able to unlock the new features from OpenAI!

Development

Successfully merging this pull request may close these issues.

Support OpenAI API v1
3 participants