Skip to content
This repository has been archived by the owner on May 17, 2024. It is now read-only.

PostgreSQL and MySQL interpret string ordering differently #762

Closed
mrn3 opened this issue Nov 2, 2023 · 5 comments
Closed

PostgreSQL and MySQL interpret string ordering differently #762

mrn3 opened this issue Nov 2, 2023 · 5 comments
Labels
bug Something isn't working stale_immune Immunity to stale bot

Comments

@mrn3
Copy link

mrn3 commented Nov 2, 2023

Describe the bug
Data diff results are not correct when comparing PostgreSQL and MySQL. This is because of the difference in the sorting order of hypens.

alphanum is being used to convert a string to number with that alphanum as base, so if we have hyphen - in the alphanum it is used to generate a set of strings based on range and that range will change across databases as the hyphen is treated differently by the databases.

This is also very related to #513 and may be a duplicate.

Make sure to include the following (minus sensitive information):

  • This issue is caused when issuing data-diff comparison of a PostgreSQL and MySQL database table.

Describe the environment

Debian Linux with data-diff 0.9.7.

@mrn3 mrn3 added the bug Something isn't working label Nov 2, 2023
@github-actions github-actions bot added the triage label Nov 2, 2023
@mrn3
Copy link
Author

mrn3 commented Nov 2, 2023

I am working on #763 to resolve this issue.

@mrn3
Copy link
Author

mrn3 commented Nov 2, 2023

This fixes the issue, but isn't ideal yet because it breaks other cases. We plan to make it so either a user will choose which kinds of alphanums they want and pass it in as a parameter, or it will automatically detect it based on the type of database.

@dlawin
Copy link
Contributor

dlawin commented Nov 2, 2023

Data diff results are not correct when comparing PostgreSQL and MySQL. This is because of the difference in the sorting order of hypens.

@mrn3, I wonder if we can set the collation to something equivalent?

Only tangentially related, but I was considering setting mysql collation in this PR: #712

https://dev.mysql.com/doc/refman/8.0/en/charset-mysql.html

@dlawin dlawin added stale_immune Immunity to stale bot and removed triage labels Nov 2, 2023
@mrn3
Copy link
Author

mrn3 commented Nov 2, 2023

Yeah, we considered that - so you are thinking that maybe just setting the collation will resolve our original issue?

I do know that the fix I have that PR for does fix the issue.

@glebmezh
Copy link
Contributor

Hi @mrn3,

I'm sorry for the delay in following up on this. Thank you for opening this issue and looking into the potential solutions!

We made a hard decision to sunset the data-diff package and won't provide further development or support.

Over the past few months, we have completely rewritten the diffing engine in the cloud and have solved the fundamental issues with the original algorithm used in the data-diff package, as well as collation support, etc. Feel free to take it for a trial or contact us at [email protected] if you have any questions.

-Gleb

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working stale_immune Immunity to stale bot
Projects
None yet
Development

No branches or pull requests

3 participants