Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

task(connector-sdk): CSV data validation example #78

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

fivetran-bhargavpatel
Copy link
Collaborator

@fivetran-bhargavpatel fivetran-bhargavpatel commented Feb 24, 2025

Closes https://fivetran.height.app/T-858709

What does the change do?

New example is added

How did you validate the change?

Run the fivetran debug and data is getting saved into DB.
Screenshot 2025-03-07 at 11 16 02 AM

Copy link

height bot commented Feb 24, 2025

This pull request has been linked to and will mark 1 task as "Done" when merged:

  • T-858709 Add SDK connector example to process csv file with basic data validation (unlink task)

💡Tip: You can link multiple Height tasks to a pull request.

Copy link
Collaborator

@fivetran-satvikpatil fivetran-satvikpatil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added few comments, please take a look.

@@ -0,0 +1,7 @@
{
"aws_access_key_id": "",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please follow a similar format to the one used for other examples.
e.g.: "aws_access_key_id": "YOUR_AWS_ACCESS_KEY_ID",

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -0,0 +1,201 @@
# This is a simple example for how to work with the fivetran_connector_sdk module.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please mention this example in README file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

"string_value": "STRING",
"json_value": "JSON",
"native_date_value": "NAIVE_DATE",
"native_date_time_value": "NAIVE_DATETIME"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just confirming, did you want to mention it as naive_date_time_value instead of native_date_time_value, same for previous line.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed

for index, row in df.iterrows():
for result in upsert_csv_row(row):
if result is None:
break
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a log mentioning that we skipped a row and mention the line number and file name for which it was skipped.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


def validate_int_value(row, value: str):
try:
return int(value), True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to return two values (value and boolean), instead can't we just return None in case of failure, and have the check in upsert_csv_row for the value, and if it is None, return None.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

def validate_json_value(row, value: str):
try:
json.loads(value) # Attempt to parse the JSON string
return value, True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we return parsed value here (json.loads(value))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated. but in duckDB it is saved as STRING value

except ValueError:
print_error_message(row, f"Invalid integer value: '{value}'")
return None, False
except TypeError:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If both log and return value are same for both the caught exception, lets merge both of them.
e.g. except (ValueError, TypeError) as e:

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


def validate_native_date_time_value(row, value: str):
try:
return datetime.strptime(value, "%Y-%m-%d %H:%M:%S"), True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add logic to trim the whitespace before using value to identify the datetime?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

# Adding this code to your `connector.py` allows you to test your connector by running your file directly from your IDE:
connector.debug(configuration=configuration)

# Resulting table:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change this table.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Table removed

@fivetran-satvikpatil
Copy link
Collaborator

Also, please add details of testing in description.

@fivetran-bhargavpatel
Copy link
Collaborator Author

Also, please add details of testing in description.

Done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants