feat(llms): support OpenAI API v1 #753
Conversation
Walkthrough

The code changes introduce support for OpenAI API v1, with updates to handle the new API structure and endpoints. This includes the addition of new parameters and properties to classes, refactoring of methods to accommodate the updated API, and changes to tests to align with the new codebase. The changes aim to ensure compatibility with both OpenAI's and Azure's OpenAI services.
Review Status
Actionable comments generated: 4
Configuration used: CodeRabbit UI
Files ignored due to filter (2)
- poetry.lock
- pyproject.toml
Files selected for processing (7)
- pandasai/helpers/openai.py (1 hunks)
- pandasai/helpers/openai_info.py (2 hunks)
- pandasai/llm/base.py (10 hunks)
- pandasai/llm/openai.py (4 hunks)
- tests/helpers/test_openai_info.py (11 hunks)
- tests/llms/test_azure_openai.py (3 hunks)
- tests/llms/test_openai.py (7 hunks)
Additional comments: 32
pandasai/helpers/openai_info.py (2)

- 2-3: Removing an unused import is good practice: it reduces clutter and potential confusion for future maintainers of the code.
- 134-140: The change to the `__call__` method to use `hasattr` instead of a direct key check is a safer approach to attribute access. It prevents `AttributeError` exceptions when the `usage` object does not have the `total_tokens` attribute, which can happen with different versions of the OpenAI API or unexpected response structures.

tests/llms/test_azure_openai.py (3)
- 8-9: The use of `pytest.skip` at the module level will skip all tests in this file. Ensure that this is the intended behavior, as it will prevent any tests from running in this module. If this is temporary, consider adding a TODO comment to remind the team to re-enable the tests when appropriate.
- 105-108: The addition of `seed=openai.seed` to the `test_completion` function is good for ensuring reproducibility of test results. However, ensure that `openai.seed` is set correctly somewhere in the test setup to avoid using an uninitialized variable.
- 135-138: The modification in the `test_chat_completion` function changes the assertion to compare the result with the expected response object directly. This is a more robust way to ensure the function returns the correct response object. However, ensure that the `expected_response` object is constructed with all the fields needed to accurately represent a real response from the Azure OpenAI service.

tests/llms/test_openai.py (4)
- 10-12: The `OpenAIObject` class is a simple wrapper that converts a dictionary into an object with attributes. This is a common pattern for mocking API responses in tests. However, make sure the attributes accessed on instances of this class actually exist in real responses from the OpenAI API, to avoid discrepancies in tests.
- 71-86: (Note: this review was outside of the patch, so it was mapped to the patch with the greatest overlap; original lines 59-79.) The `test_completion` method has been refactored to use `mocker.patch.object` for mocking the `completion` method of the `OpenAI` class. This is good practice: it isolates the test from the actual implementation and allows testing the behavior of the `completion` method in isolation. Ensure that the `mocker` fixture is properly set up in the test configuration.
- 71-86: (Note: this review was outside of the patch, so it was mapped to the patch with the greatest overlap; original lines 81-102.) Similar to the `test_completion` method, `test_chat_completion` is also refactored to use mocking. Verify that the `chat_completion` method is called with the correct arguments and that the mocked response is what the method is expected to return.
- 104-113: The `test_call_with_unsupported_model` method has been updated to use `pytest.raises` with an updated exception message. This improves clarity about what the test expects and ensures the correct exception is tested for. Verify that the exception message matches the actual message raised by the `OpenAI` class when an unsupported model is specified.

tests/helpers/test_openai_info.py (10)
- 4-5: The import statement for `OpenAICallbackHandler` and `get_openai_callback` is split across two hunks, which could be a merge-conflict artifact. Ensure the import statement is complete and correct in the final version of the file.
- 14-16: The `OpenAIObject` class is a simple wrapper that converts a dictionary into an object with attributes. This is a common pattern in Python, but note that it bypasses the usual attribute-access protections (such as raising `AttributeError` for missing attributes). If that behavior is intentional and acceptable for this use case, it is fine.
- 27-37: The test case `test_handler` is well structured and checks the correct assignment of token counts and model name. However, it is not clear how `handler.total_cost` is calculated. Ensure that `OpenAICallbackHandler` has a mechanism to calculate cost based on the model and token counts, and that it is tested here.
- 43-54: The test case `test_handler_unknown_model` correctly asserts that the cost should be `0.0` for an unknown model. This is good practice for ensuring the system behaves predictably on unexpected input.
- 66-85: The parameterized test cases for different model names and expected costs are a good use of `pytest.mark.parametrize` to cover multiple scenarios. However, ensure the expected costs are up to date with current OpenAI model pricing.
- 94-103: The test case `test_handler_openai` is similar to the previous ones but parameterized for different models and costs. Ensure the `handler` object correctly calculates cost from `total_tokens` and the specific model used.
- 109-124: As with the previous comment, ensure the expected costs for these models are accurate and reflect any recent changes to OpenAI pricing.
- 153-163: The test case `test_handler_finetuned_model` tests cost calculation for fine-tuned models. Ensure the expected costs are correct and that the `handler` object can distinguish between standard and fine-tuned models when calculating cost.
- 180-193: (Note: this review was outside of the patch, so it was mapped to the patch with the greatest overlap; original lines 168-187.) The test `test_openai_callback` uses a mock to simulate the response from the OpenAI client, which nicely isolates the test from external dependencies. However, ensure the mock returns realistic data and that the `create` method of `llm.client` is called with the correct parameters in the actual implementation.
- 192-193: Using the `get_openai_callback` context manager is a good pattern to ensure the callback is only active during execution of the block. However, ensure that the `PandasAI` object actually uses the callback and that the callback correctly captures the token and cost information.

pandasai/llm/openai.py (4)
- 63-93: (Note: this review was outside of the patch, so it was mapped to the patch with the greatest overlap; original lines 50-90.) The `__init__` method has been updated to require an `api_token` and to set up the `client` attribute based on the model type. This is a significant change: all instantiations of the `OpenAI` class must now provide an `api_token` directly or through an environment variable. Ensure all parts of the codebase that instantiate this class are updated accordingly. Additionally, the logic that selects the `client` attribute based on the model type and OpenAI API version is a critical change that should be thoroughly tested for compatibility with both versions of the OpenAI API.
- 65-68: The error handling for a missing API key is appropriate and follows the fail-fast principle by raising an exception if the `api_token` is not provided or found in the environment variables.
- 70-72: Setting a proxy for the `openai` library is a useful feature for users behind a corporate firewall. However, ensure that this proxy setting does not unintentionally override user settings in other parts of their application. Consider documenting this behavior or providing a way to opt out.
- 97-102: The `_default_params` property and `type` property are correctly implemented and provide a clear way to access the default parameters and the type of API being used. This is good for maintainability and clarity of the code.

pandasai/llm/base.py (9)
- 17-21: The import statements have been updated to include `Union`, `Mapping`, and `Tuple` from the `typing` module. This is standard practice for type hinting and benefits code readability and maintainability.
- 31: The addition of the `is_openai_v1` function from `..helpers.openai` is a good way to ensure compatibility with the new OpenAI API version. It will likely be used to conditionally handle API calls based on the installed OpenAI library version.
- 217-237: New attributes have been added to the `BaseOpenAI` class to accommodate additional API parameters. This is necessary to support the new features of OpenAI API v1. However, ensure that all instances of `BaseOpenAI` handle these new attributes correctly.
- 207-246: (Note: this review was outside of the patch, so it was mapped to the patch with the greatest overlap; original lines 239-255.) The `_set_params` method has been updated to include "deployment_name" as a valid parameter. Verify across the codebase that any calls to `_set_params` now pass "deployment_name" where necessary.
- 265-285: The `_default_params` property has been updated to include "n" and "best_of" when their values are not the defaults. This aligns with the new attributes on the `BaseOpenAI` class. Ensure the default values for "n" and "best_of" are set appropriately elsewhere in the code.
- 288-310: New properties `_invocation_params` and `_client_params` have been introduced, likely to configure the parameters for API calls and the HTTP client, respectively. Ensure these properties are used consistently throughout the codebase and that their values are validated before use.
- 356-389: (Note: this review was outside of the patch, so it was mapped to the patch with the greatest overlap; original lines 335-364.) The `completion` and `chat_completion` methods have been refactored to use the `self.client` attribute instead of the `openai` module directly. This is likely part of the effort to support OpenAI API v1 and should be checked for consistency and correctness in the context of the entire codebase.
- 366-386: A new `call` method has been added to the `BaseOpenAI` class. It appears to be a high-level interface that invokes either `chat_completion` or `completion` based on the `_is_chat_model` attribute. Ensure this method integrates correctly with the rest of the codebase and that the `instruction` argument is handled properly.
- 411-417: The `HuggingFaceLLM` class's `_setup` method has been updated to retrieve the API token from either the passed `kwargs` or the `HUGGINGFACE_API_KEY` environment variable. This is a standard approach to configuration and should work well, provided the environment variable is set in the deployment environment.
Review Status
Actionable comments generated: 3
Configuration used: CodeRabbit UI
Files ignored due to filter (1)
- pyproject.toml
Files selected for processing (2)
- pandasai/llm/base.py (10 hunks)
- tests/llms/test_openai.py (7 hunks)
Additional comments: 13
tests/llms/test_openai.py (3)
- 29-34: The tests `test_type_with_token` and `test_proxy` are good examples of unit tests that check the `OpenAI` class with different configurations. Ensure the `openai.proxy` settings are reset after the test to avoid side effects on other tests.
- 39-45: (Note: this review was outside of the patch, so it was mapped to the patch with the greatest overlap; original lines 39-57.) The `test_params_setting` method is well structured and tests the setting of various parameters on the `OpenAI` class. Ensure all parameters are tested, especially any added in this pull request.
- 103-112: The `test_call_with_unsupported_model` method correctly tests the behavior when an unsupported model is used. Ensure the error message is accurate and that `UnsupportedModelError` is raised as expected.

pandasai/llm/base.py (10)
- 17-25: The import statements have been updated to include `Union`, `Mapping`, and `Tuple`. This is a standard update to support type hinting and should not affect existing functionality.
- 28-34: The import of `is_openai_v1` is a new addition to check the OpenAI API version, part of the feature to support OpenAI API v1. Ensure the `is_openai_v1` function is thoroughly tested to prevent version-compatibility issues.
- 133-141: The `_extract_tag_text` method uses a regex to extract text between tags. This is a common approach, but make sure the pattern is well tested, especially against different and potentially malformed inputs, to prevent unexpected behavior.
- 210-238: New attributes have been added to the `BaseOpenAI` class (`best_of`, `n`, `request_timeout`, `max_retries`, `default_headers`, `default_query`, `http_client`, `_is_chat_model`), likely to support new features and configurations for OpenAI API calls. Ensure these new configurations are documented and that their interactions with existing attributes are tested.
- 251-257: The `_set_params` method has been updated to accept "deployment_name" instead of "deployment_id". Cross-check this against the OpenAI API documentation to confirm it matches the expected parameter names, and update all references to the old parameter name accordingly.
- 267-313: The `_default_params` and `_client_params` properties have been added to manage model-invocation parameters. Ensure these properties return the correct parameters and are used correctly in the API call methods. Also check that the handling of the `is_openai_v1` condition aligns with the OpenAI API's requirements.
- 323-336: The `completion` method has been refactored to use `self.client.create` instead of `openai.Completion.create`. Verify that `self.client` is properly configured and that the method call is equivalent to the previous implementation. Additionally, review the handling of `openai_callback_var` to ensure it still functions as intended after the refactor.
- 359-392: (Note: this review was outside of the patch, so it was mapped to the patch with the greatest overlap; original lines 347-367.) Similar to the `completion` method, `chat_completion` has been refactored to use `self.client.create`. The same considerations apply: ensure `self.client` is properly configured, the method call is equivalent to the previous implementation, and `openai_callback_var` is handled correctly.
- 369-390: The `call` method has been updated to handle both chat and completion models based on the `_is_chat_model` attribute. Ensure this logic correctly determines the model type and calls the appropriate method (`chat_completion` or `completion`). Also verify that the `last_prompt` attribute is used correctly and that the response is handled as expected.
- 414-420: The `HuggingFaceLLM` class has been updated to raise an `APIKeyNotFoundError` if the API token is not provided. This is good error-handling practice and makes missing credentials obvious to the user. Ensure this error is caught and handled appropriately wherever `HuggingFaceLLM` is instantiated.
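The `_is_chat_model` dispatch discussed above can be sketched like this. It is a minimal illustration only: the real `BaseOpenAI.call` also builds prompts and handles memory, and the class name here is invented for the example.

```python
class FakeLLM:
    """Minimal sketch of routing on _is_chat_model (illustrative names)."""

    def __init__(self, is_chat_model: bool):
        self._is_chat_model = is_chat_model
        self.last_prompt = None

    def completion(self, prompt: str) -> str:
        # Stand-in for a legacy completions-endpoint call.
        return f"completion:{prompt}"

    def chat_completion(self, prompt: str) -> str:
        # Stand-in for a chat-endpoint call.
        return f"chat:{prompt}"

    def call(self, instruction: str) -> str:
        # Record the prompt, then route to the matching endpoint.
        self.last_prompt = instruction
        if self._is_chat_model:
            return self.chat_completion(instruction)
        return self.completion(instruction)
```

Centralizing the routing in one `call` method keeps callers agnostic of which endpoint a given model requires.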
Codecov Report

Attention: ❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files:

```diff
@@ Coverage Diff @@
##             main     #753      +/-   ##
==========================================
- Coverage   85.57%   84.82%   -0.76%
==========================================
  Files          73       74       +1
  Lines        3599     3630      +31
==========================================
- Hits         3080     3079       -1
- Misses        519      551      +32
```

☔ View full report in Codecov by Sentry.
Review Status
Actionable comments generated: 3
Configuration used: CodeRabbit UI
Files selected for processing (1)
- pandasai/llm/base.py (10 hunks)
Additional comments: 9
pandasai/llm/base.py (9)

- 17-21: The import statements have been updated to include `Union`, `Mapping`, and `Tuple` from the `typing` module, likely in preparation for the new parameters and type hints added to the `BaseOpenAI` class.
- 28-34: New exceptions and helper functions have been imported, which suggests the `BaseOpenAI` class will use them in its updated implementation.
- 217-238: New attributes have been added to the `BaseOpenAI` class, including `best_of`, `n`, `request_timeout`, `max_retries`, `default_headers`, `default_query`, and `http_client`. These changes likely support more advanced configuration options for OpenAI API calls.
- 268-288: The `_default_params` property has been updated to include the new parameters and to remove the `max_tokens` parameter. This aligns with the updated class attributes and ensures the default parameters for API calls are correctly configured.
- 291-313: Two new properties, `_invocation_params` and `_client_params`, have been added, likely to configure the OpenAI client and the parameters for model invocation. The use of `is_openai_v1` suggests a version check to ensure compatibility with different versions of the OpenAI API.
- 323-339: (Note: this review was outside of the patch, so it was mapped to the patch with the greatest overlap; original lines 315-336.) The `completion` method has been updated to make API calls through the `client` attribute instead of using `openai.Completion` directly. This abstracts the API call mechanism and allows more flexible client configuration.
- 359-392: (Note: this review was outside of the patch, so it was mapped to the patch with the greatest overlap; original lines 338-367.) Similarly, the `chat_completion` method has been updated to use the `client` attribute. It now builds the API-call parameters from the `_invocation_params` property and handles the response through a callback when one is provided.
- 369-389: The `call` method has been added to the `BaseOpenAI` class. It chooses between `chat_completion` and `completion` based on the `_is_chat_model` attribute, centralizing the logic for making API calls behind a single entry point.
- 414-420: The `HuggingFaceLLM` class has been updated with a new `_setup` method that sets the `api_token` and `_max_retries` attributes. The `api_token` is now sourced from either `kwargs` or the `HUGGINGFACE_API_KEY` environment variable. This improves the configurability and error handling of the class.
```python
if match := re.search(
    f"(<{tag}>)(.*)(</{tag}>)",
    response,
    re.DOTALL | re.MULTILINE,
):
    return match[2]
return None
```
The `_extract_tag_text` method has been updated to use the walrus operator (`:=`), a feature introduced in Python 3.8. This simplifies the code by combining the assignment with the conditional check.
```diff
 def _set_params(self, **kwargs):
     """
     Set Parameters
     Args:
-        **kwargs: ["model", "engine", "deployment_id", "temperature", "max_tokens",
-        "top_p", "frequency_penalty", "presence_penalty", "stop", "seed", ]
+        **kwargs: ["model", "deployment_name", "temperature", "max_tokens",
+        "top_p", "frequency_penalty", "presence_penalty", "stop", "seed"]

     Returns:
```
The `_set_params` method signature has been modified to accept arbitrary keyword arguments (`**kwargs`). This increases the method's flexibility, allowing it to handle a wider range of parameters without changing the signature.
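A minimal sketch of the whitelist-plus-`**kwargs` idea follows. The parameter list is abbreviated, and silently ignoring unknown keys is an assumption for the example — which also illustrates the main risk of this pattern: a typo in a keyword is dropped without error.

```python
class Config:
    # Abbreviated whitelist of settable parameters (illustrative).
    VALID_PARAMS = ["model", "deployment_name", "temperature", "top_p"]

    def set_params(self, **kwargs):
        # Only whitelisted keys become attributes; anything else is ignored.
        for key, value in kwargs.items():
            if key in self.VALID_PARAMS:
                setattr(self, key, value)


cfg = Config()
cfg.set_params(model="gpt-4", temperatur=0.5)  # note the typo is dropped
```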
```python
valid_params = [
    "model",
    "engine",
    "deployment_id",
    "deployment_name",
    "temperature",
    "max_tokens",
    "top_p",
```
The `valid_params` list within the `_set_params` method has been updated to reflect the new parameters that can be set on the `BaseOpenAI` class. However, `max_tokens` is included here even though it has been removed from the class attributes (line 220). This discrepancy should be resolved.
Committable suggestion (review carefully before committing):

```diff
 valid_params = [
     "model",
-    "engine",
-    "deployment_id",
     "deployment_name",
     "temperature",
-    "max_tokens",
     "top_p",
```
@mspronesti thanks a lot for the improvement, great to be able to unlock the new features from OpenAI! |
Hi @gventuri, this PR addresses #737.
As it doesn't cover Azure OpenAI yet, I've temporarily marked its tests as "skipped". I will open another PR for Azure OpenAI once this gets merged.
Hence, please don't release after this PR!
Summary by CodeRabbit

- New Features
- Enhancements
- Bug Fixes
- Refactor
- Tests