task(connector-sdk): CSV data validation example #78
base: main
This pull request has been linked to and will mark 1 task as "Done" when merged:
2d21ff2 to cbcf3c5
Added a few comments; please take a look.
@@ -0,0 +1,7 @@
{
  "aws_access_key_id": "",
Please follow a similar format to the one used for other examples.
e.g.: "aws_access_key_id": "YOUR_AWS_ACCESS_KEY_ID",
Done
@@ -0,0 +1,201 @@
# This is a simple example for how to work with the fivetran_connector_sdk module.
Please mention this example in the README file.
Done
"string_value": "STRING",
"json_value": "JSON",
"native_date_value": "NAIVE_DATE",
"native_date_time_value": "NAIVE_DATETIME"
Just confirming: did you mean naive_date_time_value instead of native_date_time_value? The same applies to the previous line.
Renamed
for index, row in df.iterrows():
    for result in upsert_csv_row(row):
        if result is None:
            break
Can we add a log message saying that a row was skipped, including the line number and the file name it was skipped in?
Done
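A minimal sketch of the skip logging requested above. It uses the standard `logging` module and plain dicts so that it runs on its own; the actual example iterates with `df.iterrows()` and may log differently. `process_rows`, the stand-in `upsert_csv_row`, the `data.csv` file name, and the assumption that line 1 of the CSV is a header are all hypothetical.

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("csv_validation_example")  # hypothetical logger name


def upsert_csv_row(row):
    """Hypothetical stand-in: yields None when the row fails validation."""
    value = row.get("int_value")
    if not isinstance(value, str) or not value.strip().isdigit():
        yield None
        return
    yield {"int_value": int(value)}


def process_rows(rows, file_name):
    """Return (valid records, skipped line numbers) for one CSV file."""
    valid, skipped = [], []
    # Assumption: line 1 of the CSV is the header, so data starts on line 2.
    for line_number, row in enumerate(rows, start=2):
        for result in upsert_csv_row(row):
            if result is None:
                logger.warning(
                    "Skipped line %d of %s: validation failed",
                    line_number,
                    file_name,
                )
                skipped.append(line_number)
                break
            valid.append(result)
    return valid, skipped


rows = [{"int_value": "1"}, {"int_value": "abc"}, {"int_value": "3"}]
valid, skipped = process_rows(rows, "data.csv")
```

Collecting the skipped line numbers alongside the log output makes the behaviour easy to check in a test without parsing log text.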
def validate_int_value(row, value: str):
    try:
        return int(value), True
Do we need to return two values (the value and a boolean)? Can't we just return None on failure, check the value in upsert_csv_row, and return None there if it is None?
Updated
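One way the single-return-value version could look, as a sketch rather than the code that was actually merged. The `row` argument and the error logging from the diff are left out for brevity, and `build_record` is a hypothetical caller standing in for the check inside `upsert_csv_row`.

```python
def validate_int_value(value):
    """Return the parsed int, or None when value is not a valid integer."""
    try:
        return int(value)
    except (ValueError, TypeError):
        return None


def build_record(row):
    """Hypothetical caller: returns None as soon as a field fails validation."""
    int_value = validate_int_value(row.get("int_value"))
    # Compare against None explicitly: 0 is a valid integer but is falsy.
    if int_value is None:
        return None
    return {"int_value": int_value}
```

The explicit `is None` comparison matters here, because a plain truthiness check would reject the valid value 0.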
def validate_json_value(row, value: str):
    try:
        json.loads(value)  # Attempt to parse the JSON string
        return value, True
Shouldn't we return the parsed value here (json.loads(value))?
Updated, but in DuckDB it is saved as a STRING value.
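A sketch of a validator that returns the parsed value, as suggested in the comment above. The simplified signature (no `row` argument, no logging) is an assumption made to keep the example self-contained.

```python
import json


def validate_json_value(value):
    """Return the parsed JSON value, or None when value is not valid JSON."""
    try:
        return json.loads(value)
    except (ValueError, TypeError):
        # json.JSONDecodeError is a subclass of ValueError;
        # TypeError covers non-string input such as None.
        return None
```

One caveat of using None as the failure marker: the valid JSON document `null` also parses to None, so this sketch cannot tell it apart from a parse failure.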
    except ValueError:
        print_error_message(row, f"Invalid integer value: '{value}'")
        return None, False
    except TypeError:
If the log message and the return value are the same for both caught exceptions, let's merge the two handlers,
e.g. except (ValueError, TypeError) as e:
Done
def validate_native_date_time_value(row, value: str):
    try:
        return datetime.strptime(value, "%Y-%m-%d %H:%M:%S"), True
Should we add logic to trim whitespace from value before parsing it as a datetime?
Done
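A sketch of the whitespace trimming discussed above, assuming the same `"%Y-%m-%d %H:%M:%S"` format as in the diff. The function name is taken from the diff as shown; the simplified signature and the None return on failure are assumptions.

```python
from datetime import datetime


def validate_native_date_time_value(value):
    """Return a naive datetime, or None when value does not match the format."""
    try:
        # strip() removes leading and trailing whitespace, including newlines.
        return datetime.strptime(value.strip(), "%Y-%m-%d %H:%M:%S")
    except (ValueError, TypeError, AttributeError):
        # AttributeError covers non-string input such as None.
        return None
```

Only surrounding whitespace is removed; whitespace inside the value still has to match the single space between the date and time parts of the format.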
# Adding this code to your `connector.py` allows you to test your connector by running your file directly from your IDE:
connector.debug(configuration=configuration)

# Resulting table:
Please change this table.
Table removed
Also, please add details of the testing to the description.
30c0bab to 6a1d6bb
6a1d6bb to d978e1d
d978e1d to 1862965
Done
Closes https://fivetran.height.app/T-858709
What does the change do?
A new example is added.
How did you validate the change?
Ran fivetran debug and verified that the data is saved into the DB.