You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on May 17, 2024. It is now read-only.
Describe the bug
When performing a diff between a table in postres and snowflake -- the diff function will fail if Snowflake primary key field is a TEXT field and the table is empty.
I noticed in the logs for successful diffs that tables that were not empty were reporting that the id field was a String_UUID -- for example:
I dug in a little deeper and noticed that the types are determined by sample some of the data via that function _refine_coltypes. It feels a bit like without any values to sample it defaults the type to the type in the db and then eventually raises this NotImplementedError
Make sure to include the following (minus sensitive information):
We are importing the package and calling it from a python script in airflow -- the abreviated code looks like this:
File "/usr/local/airflow/dags/warehouse/data_engineering/avalanche_data_integrity/data_differ.py", line 60, in compare_dbs
control_flow(
File "/usr/local/airflow/dags/warehouse/data_engineering/avalanche_data_integrity/helpers.py", line 945, in control_flow
diff_magic(
File "/usr/local/airflow/dags/warehouse/data_engineering/avalanche_data_integrity/helpers.py", line 744, in diff_magic
for sign, columns in diff_result_wrapper:
File "/usr/local/lib/python3.11/site-packages/data_diff/diff_tables.py", line 95, in __iter__
for i in self.diff:
File "/usr/local/lib/python3.11/site-packages/data_diff/diff_tables.py", line 266, in _diff_tables_wrapper
raise error
File "/usr/local/lib/python3.11/site-packages/data_diff/diff_tables.py", line 239, in _diff_tables_wrapper
yield from self._diff_tables_root(table1, table2, info_tree)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/data_diff/diff_tables.py", line 274, in _diff_tables_root
return self._bisect_and_diff_tables(table1, table2, info_tree)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/data_diff/diff_tables.py", line 299, in _bisect_and_diff_tables
raise NotImplementedError(f"Cannot use a column of type {kt} as a key")
NotImplementedError: Cannot use a column of type Text(_notes=[], collation=None) as a key
If possible, please paste these as text, and not a screenshot.
Describe the environment
Running in airflow with data-diff==0.11.1
The text was updated successfully, but these errors were encountered:
Thank you for trying out data-diff and for taking the time to open this issue. We made a hard decision to sunset the data-diff package and won't provide further development or support. Diffing functionality will continue to be available in Datafold Cloud. We have completely rewritten the diffing engine in the cloud over the past few months and have solved the fundamental issues with the original algorithm used in the data-diff package. Feel free to take it for a trial or contact us at [email protected] if you have any questions.
Describe the bug
When performing a diff between a table in postres and snowflake -- the diff function will fail if Snowflake primary key field is a
TEXT
field and the table is empty.I noticed in the logs for successful diffs that tables that were not empty were reporting that the id field was a
String_UUID
-- for example:For the table that failed the log looked like this:
I dug in a little deeper and noticed that the types are determined by sample some of the data via that function
_refine_coltypes
. It feels a bit like without any values to sample it defaults the type to the type in the db and then eventually raises thisNotImplementedError
Make sure to include the following (minus sensitive information):
We are importing the package and calling it from a python script in airflow -- the abreviated code looks like this:
Here is an excerpt of the traceback:
If possible, please paste these as text, and not a screenshot.
Describe the environment
Running in airflow with
data-diff==0.11.1
The text was updated successfully, but these errors were encountered: