-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(LogSerialization): dataframe timestamp serialization #751
Conversation
WalkthroughThe recent updates involve enhancing a Python codebase, specifically focusing on the handling of dataframes within a query execution tracker. A new method for converting dataframes to dictionaries has been introduced, streamlining the process across different functions. Additionally, the test suite has been expanded with new tests to ensure the correct formatting of responses, especially when dealing with datetime fields in dataframes. Changes
Poem
TipsChat with CodeRabbit Bot (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 1
Configuration used: CodeRabbit UI
Files selected for processing (2)
- pandasai/helpers/query_exec_tracker.py (3 hunks)
- tests/test_query_tracker.py (3 hunks)
Additional comments: 5
tests/test_query_tracker.py (2)
1-4: The import statements have been reorganized, which is a common practice for readability and to follow PEP 8 guidelines. Ensure that the newly added imports (
json
anddatetime
) are used in the test cases.10-13: The addition of
datetime
is necessary for the new test case that deals with datetime fields in DataFrames. This is a good practice to ensure that the test cases cover the new functionality introduced in the code.pandasai/helpers/query_exec_tracker.py (3)
88-93: The new
convert_dataframe_to_dict
method is a good addition for centralizing the DataFrame to dictionary conversion logic. It uses theto_json
method withorient='split'
anddate_format='iso'
which is appropriate for ensuring that timestamps are serialized correctly. This should address the issue mentioned in the pull request summary.100-106: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [95-103]
The
add_dataframes
method has been updated to use the newconvert_dataframe_to_dict
method. This change ensures that all DataFrames added to the_dataframes
list are consistently converted to dictionaries. It's important to ensure that thehead_df
attribute used in line 102 is a property of theSmartDataFrame
and that it returns a DataFrame that can be passed toconvert_dataframe_to_dict
.
- 202-210: The
_format_response
method has been updated to handle the serialization of DataFrames and plots. For DataFrames, it now uses theconvert_dataframe_to_dict
method, which is consistent with the changes made inadd_dataframes
. For plots, it reads the file, encodes it in Base64, and returns it as a data URI. This is a common approach for serializing images to be sent over JSON. However, ensure that the file atresult["value"]
exists and is accessible before attempting to open it. Also, consider handling potential exceptions that may arise from file I/O operations.
tests/test_query_tracker.py
Outdated
assert len(formatted_response["value"]["headers"]) == 3 | ||
assert len(formatted_response["value"]["rows"]) == 10 | ||
|
||
def test_format_response_dataframe_with_datetime_field( | ||
self, tracker: QueryExecTracker, sample_df: pd.DataFrame | ||
): | ||
# Add a date column with random dates for demonstration | ||
start_date = datetime(2023, 1, 1) | ||
date_range = [start_date + timedelta(days=x) for x in range(len(sample_df))] | ||
|
||
sample_df["date"] = date_range | ||
|
||
print("Sample", sample_df) | ||
column_dtypes = sample_df.dtypes | ||
|
||
# Display the result | ||
print(column_dtypes) | ||
# Create a sample ResponseType for a dataframe | ||
response = {"type": "dataframe", "value": sample_df} | ||
|
||
# Format the response using _format_response | ||
formatted_response = tracker._format_response(response) | ||
|
||
# Validate dataframe json serialization | ||
json.dumps(formatted_response) | ||
|
||
# Check if the response is formatted correctly | ||
assert formatted_response["type"] == "dataframe" | ||
assert len(formatted_response["value"]["headers"]) == 4 | ||
assert len(formatted_response["value"]["rows"]) == 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new test case test_format_response_dataframe_with_datetime_field
is designed to ensure that the _format_response
method can handle DataFrames with datetime fields correctly. This is a crucial test since datetime serialization can often be problematic.
However, there are a couple of print statements (lines 134 and 138) that seem to be used for debugging purposes. These should be removed before merging the code to maintain a clean and professional codebase.
- print("Sample", sample_df)
- column_dtypes = sample_df.dtypes
- print(column_dtypes)
Commitable suggestion
[!IMPORTANT]
Carefully review the code before committing. Make sure it correctly replaces the highlighted code, has no missing lines and indentaion issues.
assert len(formatted_response["value"]["headers"]) == 3 | |
assert len(formatted_response["value"]["rows"]) == 10 | |
def test_format_response_dataframe_with_datetime_field( | |
self, tracker: QueryExecTracker, sample_df: pd.DataFrame | |
): | |
# Add a date column with random dates for demonstration | |
start_date = datetime(2023, 1, 1) | |
date_range = [start_date + timedelta(days=x) for x in range(len(sample_df))] | |
sample_df["date"] = date_range | |
print("Sample", sample_df) | |
column_dtypes = sample_df.dtypes | |
# Display the result | |
print(column_dtypes) | |
# Create a sample ResponseType for a dataframe | |
response = {"type": "dataframe", "value": sample_df} | |
# Format the response using _format_response | |
formatted_response = tracker._format_response(response) | |
# Validate dataframe json serialization | |
json.dumps(formatted_response) | |
# Check if the response is formatted correctly | |
assert formatted_response["type"] == "dataframe" | |
assert len(formatted_response["value"]["headers"]) == 4 | |
assert len(formatted_response["value"]["rows"]) == 10 | |
assert len(formatted_response["value"]["headers"]) == 3 | |
assert len(formatted_response["value"]["rows"]) == 10 | |
def test_format_response_dataframe_with_datetime_field( | |
self, tracker: QueryExecTracker, sample_df: pd.DataFrame | |
): | |
# Add a date column with random dates for demonstration | |
start_date = datetime(2023, 1, 1) | |
date_range = [start_date + timedelta(days=x) for x in range(len(sample_df))] | |
sample_df["date"] = date_range | |
# Create a sample ResponseType for a dataframe | |
response = {"type": "dataframe", "value": sample_df} | |
# Format the response using _format_response | |
formatted_response = tracker._format_response(response) | |
# Validate dataframe json serialization | |
json.dumps(formatted_response) | |
# Check if the response is formatted correctly | |
assert formatted_response["type"] == "dataframe" | |
assert len(formatted_response["value"]["headers"]) == 4 | |
assert len(formatted_response["value"]["rows"]) == 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (1)
- tests/test_query_tracker.py (3 hunks)
Files skipped from review due to trivial changes (1)
- tests/test_query_tracker.py
Codecov ReportAll modified and coverable lines are covered by tests ✅
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## main #751 +/- ##
==========================================
+ Coverage 85.57% 85.59% +0.01%
==========================================
Files 73 73
Lines 3599 3603 +4
==========================================
+ Hits 3080 3084 +4
Misses 519 519 ☔ View full report in Codecov by Sentry. |
Summary by CodeRabbit
New Features
Tests