fix(LogSerialization): dataframe timestamp serialization #751

Merged
merged 3 commits into from
Nov 15, 2023

Conversation

ArslanSaleem
Collaborator

@ArslanSaleem ArslanSaleem commented Nov 14, 2023

Summary by CodeRabbit

  • New Features

    • Implemented a method for converting dataframes to dictionary format to enhance data handling.
  • Tests

    • Added new tests to validate the correct formatting of dataframes with datetime fields.
    • Updated existing tests to include date columns, ensuring comprehensive coverage.

Contributor

coderabbitai bot commented Nov 14, 2023

Walkthrough

The recent updates involve enhancing a Python codebase, specifically focusing on the handling of dataframes within a query execution tracker. A new method for converting dataframes to dictionaries has been introduced, streamlining the process across different functions. Additionally, the test suite has been expanded with new tests to ensure the correct formatting of responses, especially when dealing with datetime fields in dataframes.

Changes

File Path | Change Summary
pandasai/helpers/query_exec_tracker.py | Introduced convert_dataframe_to_dict method; updated add_dataframes and _format_response to use this new method.
tests/test_query_tracker.py | Added json and datetime imports; included new test test_format_response_dataframe_with_datetime_field; updated test_format_response_dataframe to handle a date column.
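For orientation, here is a minimal sketch of what a helper like convert_dataframe_to_dict might look like. It is not the code from the PR: it assumes the helper wraps DataFrame.to_json with orient="split" and date_format="iso" (as the review comments in this thread describe) and returns the "headers" and "rows" keys that the tests index into. The sample column names and values are made up for illustration.

```python
import json
from datetime import datetime, timedelta

import pandas as pd


def convert_dataframe_to_dict(df: pd.DataFrame) -> dict:
    """Hypothetical sketch: turn a DataFrame into JSON-safe headers and rows."""
    # to_json renders datetime columns as ISO 8601 strings; json.loads turns
    # the resulting text back into plain Python lists and scalars.
    payload = json.loads(df.to_json(orient="split", date_format="iso"))
    return {"headers": payload["columns"], "rows": payload["data"]}


# Illustrative data only: three rows with one datetime column.
start = datetime(2023, 1, 1)
df = pd.DataFrame(
    {
        "name": ["a", "b", "c"],
        "value": [1, 2, 3],
        "date": [start + timedelta(days=i) for i in range(3)],
    }
)

converted = convert_dataframe_to_dict(df)
# No Timestamp objects remain in the result, so the standard encoder accepts it.
serialized = json.dumps(converted)
```

Round-tripping through to_json and json.loads costs an extra parse, but it leaves pandas in charge of every dtype conversion instead of handling each column type by hand.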

Poem

In the code where dataframes dwell,
A rabbit hopped, casting a spell. 🐰✨
With autumn's crisp and coding flair,
It shaped the tests with utmost care. 🍂📊


Tips

Chat with CodeRabbit Bot (@coderabbitai)

  • If you reply to a review comment from CodeRabbit, the bot will automatically respond.
  • To engage with CodeRabbit bot directly around the specific lines of code in the PR, mention @coderabbitai in your review comment
  • Note: Review comments are made on code diffs or files, not on the PR overview.
  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Note: For conversation with the bot, please use the review comments on code diffs or files.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

Contributor

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 1

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between 0712f84 and d82ecd3.
Files selected for processing (2)
  • pandasai/helpers/query_exec_tracker.py (3 hunks)
  • tests/test_query_tracker.py (3 hunks)
Additional comments: 5
tests/test_query_tracker.py (2)
  • 1-4: The import statements have been reorganized, which is a common practice for readability and to follow PEP 8 guidelines. Ensure that the newly added imports (json and datetime) are used in the test cases.

  • 10-13: The addition of datetime is necessary for the new test case that deals with datetime fields in DataFrames. This is a good practice to ensure that the test cases cover the new functionality introduced in the code.

pandasai/helpers/query_exec_tracker.py (3)
  • 88-93: The new convert_dataframe_to_dict method is a good addition for centralizing the DataFrame to dictionary conversion logic. It uses the to_json method with orient='split' and date_format='iso' which is appropriate for ensuring that timestamps are serialized correctly. This should address the issue mentioned in the pull request summary.

  • 100-106: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [95-103]

The add_dataframes method has been updated to use the new convert_dataframe_to_dict method. This change ensures that all DataFrames added to the _dataframes list are consistently converted to dictionaries. It's important to ensure that the head_df attribute used in line 102 is a property of the SmartDataFrame and that it returns a DataFrame that can be passed to convert_dataframe_to_dict.

  • 202-210: The _format_response method has been updated to handle the serialization of DataFrames and plots. For DataFrames, it now uses the convert_dataframe_to_dict method, which is consistent with the changes made in add_dataframes. For plots, it reads the file, encodes it in Base64, and returns it as a data URI. This is a common approach for serializing images to be sent over JSON. However, ensure that the file at result["value"] exists and is accessible before attempting to open it. Also, consider handling potential exceptions that may arise from file I/O operations.
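The file-handling advice in the last comment can be sketched as follows. This is an illustrative helper, not the PR's code: the name plot_to_data_uri, the image/png media type, and the choice to return None on failure are all assumptions made for the example.

```python
import base64
import os
import tempfile
from typing import Optional


def plot_to_data_uri(path: str) -> Optional[str]:
    """Return a PNG file as a Base64 data URI, or None if it cannot be read."""
    try:
        with open(path, "rb") as image_file:
            encoded = base64.b64encode(image_file.read()).decode("utf-8")
    except OSError:
        # Covers a missing file, a directory path, and permission errors.
        return None
    return f"data:image/png;base64,{encoded}"


# Usage: write a few bytes to a temporary file and encode them.
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp:
    tmp.write(b"\x89PNG\r\n\x1a\n")
    tmp_path = tmp.name

uri = plot_to_data_uri(tmp_path)
os.remove(tmp_path)
# The file no longer exists, so the helper reports failure instead of raising.
missing = plot_to_data_uri(tmp_path)
```

Catching OSError around the open call avoids a separate existence check, which could still race with the file being removed between the check and the read.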

Comment on lines 122 to 151
        assert len(formatted_response["value"]["headers"]) == 3
        assert len(formatted_response["value"]["rows"]) == 10

    def test_format_response_dataframe_with_datetime_field(
        self, tracker: QueryExecTracker, sample_df: pd.DataFrame
    ):
        # Add a date column with random dates for demonstration
        start_date = datetime(2023, 1, 1)
        date_range = [start_date + timedelta(days=x) for x in range(len(sample_df))]

        sample_df["date"] = date_range

        print("Sample", sample_df)
        column_dtypes = sample_df.dtypes

        # Display the result
        print(column_dtypes)
        # Create a sample ResponseType for a dataframe
        response = {"type": "dataframe", "value": sample_df}

        # Format the response using _format_response
        formatted_response = tracker._format_response(response)

        # Validate dataframe json serialization
        json.dumps(formatted_response)

        # Check if the response is formatted correctly
        assert formatted_response["type"] == "dataframe"
        assert len(formatted_response["value"]["headers"]) == 4
        assert len(formatted_response["value"]["rows"]) == 10

The new test case test_format_response_dataframe_with_datetime_field is designed to ensure that the _format_response method can handle DataFrames with datetime fields correctly. This is a crucial test since datetime serialization can often be problematic.

However, there are a couple of print statements (lines 134 and 138) that seem to be used for debugging purposes. These should be removed before merging the code to maintain a clean and professional codebase.

-        print("Sample", sample_df)
-        column_dtypes = sample_df.dtypes
-        print(column_dtypes)
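To illustrate why datetime serialization is a common trouble spot, here is a small standard-library sketch, separate from the PR's code. It shows json.dumps rejecting a datetime value and accepting the same value once it is converted to an ISO 8601 string. The same rejection is expected for pandas Timestamp values, which subclass datetime.

```python
import json
from datetime import datetime

record = {"date": datetime(2023, 1, 1)}

try:
    json.dumps(record)
    failed = False
except TypeError:
    # The standard encoder has no rule for datetime objects.
    failed = True

# Converting to an ISO 8601 string first makes the value JSON-safe.
safe = json.dumps({"date": record["date"].isoformat()})
```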

Committable suggestion

[!IMPORTANT]
Carefully review the code before committing. Make sure it correctly replaces the highlighted code, has no missing lines, and has no indentation issues.

Suggested change
-        assert len(formatted_response["value"]["headers"]) == 3
-        assert len(formatted_response["value"]["rows"]) == 10
-    def test_format_response_dataframe_with_datetime_field(
-        self, tracker: QueryExecTracker, sample_df: pd.DataFrame
-    ):
-        # Add a date column with random dates for demonstration
-        start_date = datetime(2023, 1, 1)
-        date_range = [start_date + timedelta(days=x) for x in range(len(sample_df))]
-        sample_df["date"] = date_range
-        print("Sample", sample_df)
-        column_dtypes = sample_df.dtypes
-        # Display the result
-        print(column_dtypes)
-        # Create a sample ResponseType for a dataframe
-        response = {"type": "dataframe", "value": sample_df}
-        # Format the response using _format_response
-        formatted_response = tracker._format_response(response)
-        # Validate dataframe json serialization
-        json.dumps(formatted_response)
-        # Check if the response is formatted correctly
-        assert formatted_response["type"] == "dataframe"
-        assert len(formatted_response["value"]["headers"]) == 4
-        assert len(formatted_response["value"]["rows"]) == 10
+        assert len(formatted_response["value"]["headers"]) == 3
+        assert len(formatted_response["value"]["rows"]) == 10
+    def test_format_response_dataframe_with_datetime_field(
+        self, tracker: QueryExecTracker, sample_df: pd.DataFrame
+    ):
+        # Add a date column with random dates for demonstration
+        start_date = datetime(2023, 1, 1)
+        date_range = [start_date + timedelta(days=x) for x in range(len(sample_df))]
+        sample_df["date"] = date_range
+        # Create a sample ResponseType for a dataframe
+        response = {"type": "dataframe", "value": sample_df}
+        # Format the response using _format_response
+        formatted_response = tracker._format_response(response)
+        # Validate dataframe json serialization
+        json.dumps(formatted_response)
+        # Check if the response is formatted correctly
+        assert formatted_response["type"] == "dataframe"
+        assert len(formatted_response["value"]["headers"]) == 4
+        assert len(formatted_response["value"]["rows"]) == 10

Contributor

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between d82ecd3 and cd87101.
Files selected for processing (1)
  • tests/test_query_tracker.py (3 hunks)
Files skipped from review due to trivial changes (1)
  • tests/test_query_tracker.py

@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (0712f84) 85.57% compared to head (cd87101) 85.59%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #751      +/-   ##
==========================================
+ Coverage   85.57%   85.59%   +0.01%     
==========================================
  Files          73       73              
  Lines        3599     3603       +4     
==========================================
+ Hits         3080     3084       +4     
  Misses        519      519              


@gventuri gventuri merged commit 6de6b39 into main Nov 15, 2023
9 checks passed
3 participants