Add MySQL DataType handler to transform datatype value #5165

Merged 6 commits on Nov 4, 2024

@dinujoh dinujoh commented Nov 1, 2024

Description

This PR introduces a DataType handling system to properly transform MySQL data types from binary format to their corresponding string representations. This is essential for both change stream processing and parquet file export processing where data from events may be encoded in bytes.

Some handler implementations are stubbed. A follow-up PR will add the correct implementations once they have been tested.
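To make the shape of such a handler system concrete, here is a minimal sketch of category-based dispatch with a stub fallback. All names in it (`Category`, `ValueHandler`, `HandlerRegistrySketch`) are hypothetical and are not taken from this PR's code; the fallback behavior for unregistered categories is also an assumption.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical categories; the PR's own enum may define different values.
enum Category { NUMERIC, STRING, TEMPORAL }

// Hypothetical handler contract: turn a raw column value into its string form.
interface ValueHandler {
    String handle(Object value);
}

final class HandlerRegistrySketch {
    // Stub used for categories that have no real handler yet (assumption).
    private static final ValueHandler STUB = value -> String.valueOf(value);

    private static final Map<Category, ValueHandler> HANDLERS = new EnumMap<>(Category.class);

    static {
        HANDLERS.put(Category.NUMERIC, value -> value.toString());
        HANDLERS.put(Category.STRING, value -> value.toString());
        // TEMPORAL is intentionally left unregistered and falls back to STUB.
    }

    private HandlerRegistrySketch() {
    }

    // Looks up the handler for the category and applies it; null stays null.
    static String transform(final Category category, final Object value) {
        if (value == null) {
            return null;
        }
        return HANDLERS.getOrDefault(category, STUB).handle(value);
    }
}
```

Keeping the lookup in one map means a stubbed category can later be given a real handler without touching the callers.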

Issues Resolved

Contributes to #4561

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dinujoh dinujoh changed the title Add MySQL DataType handler to transform datatype value Add MySQL Datatype handler to transform datatype value Nov 1, 2024
@dinujoh dinujoh changed the title Add MySQL Datatype handler to transform datatype value Add MySQL DataType handler to transform datatype value Nov 1, 2024
chenqi0805 previously approved these changes Nov 1, 2024
Collaborator

@oeyh oeyh left a comment

The design looks nice! Thanks for working on this.

A few small comments/questions below:

return handleNumericType(columnType, (Number) value);
}

private String handleNumericType(final MySQLDataType columnType, final Number value) {
Collaborator

Are we converting all numeric types to String type here? For numbers that OpenSearch supports, should we keep them as numbers?
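One way to read this question: pass through the Java types that map directly onto common OpenSearch numeric field types, and fall back to strings for the rest. The sketch below assumes a hypothetical `NumericValueSketch.toOutputValue` helper that is not part of this PR; which types stay numeric is an assumption for illustration, not a decision recorded in this thread.

```java
import java.math.BigDecimal;

final class NumericValueSketch {
    private NumericValueSketch() {
    }

    // Keeps primitive-wrapper numbers as numbers; renders everything else as text.
    static Object toOutputValue(final Number value) {
        if (value == null) {
            return null;
        }
        if (value instanceof Byte || value instanceof Short
                || value instanceof Integer || value instanceof Long
                || value instanceof Float || value instanceof Double) {
            return value; // assumed to fit an OpenSearch numeric field as-is
        }
        if (value instanceof BigDecimal) {
            // toPlainString avoids scientific notation in the output.
            return ((BigDecimal) value).toPlainString();
        }
        return value.toString(); // e.g. BigInteger values outside the long range
    }
}
```

With this shape the method returns `Object` rather than `String`, so callers would need to accept either kind of value.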

Comment on lines +25 to +26
VARCHAR("varchar", DataCategory.STRING, DataSubCategory.CHAR),
TINYTEXT("tinytext", DataCategory.STRING, DataSubCategory.BYTES),
Collaborator

@oeyh oeyh Nov 4, 2024

I'm wondering what the difference is between the CHAR and BYTES subcategories. Are they represented differently in binlog events?

Member Author

For the ones tagged with DataSubCategory.BYTES, the value is encoded as bytes in the binlog.

Collaborator

Is VARCHAR encoded in bytes in binlog as well, or not?

Member Author

No, CHAR and VARCHAR are not encoded in bytes.
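The distinction discussed in this thread can be sketched as follows. `SubCategory` is a hypothetical stand-in for the PR's `DataSubCategory`, and `StringValueSketch` is an illustrative helper, not code from the PR. The sketch assumes BYTES values arrive as `byte[]` and that UTF-8 is the right charset, which may not hold for every column.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for DataSubCategory, limited to the two values discussed above.
enum SubCategory { CHAR, BYTES }

final class StringValueSketch {
    private StringValueSketch() {
    }

    // Returns the text form of a string-category column value.
    static String toText(final SubCategory subCategory, final Object value) {
        if (value == null) {
            return null;
        }
        if (subCategory == SubCategory.BYTES && value instanceof byte[]) {
            // Assumption: the bytes are UTF-8 encoded text.
            return new String((byte[]) value, StandardCharsets.UTF_8);
        }
        // CHAR and VARCHAR values are assumed to arrive already decoded.
        return value.toString();
    }
}
```

For example, `toText(SubCategory.BYTES, bytes)` decodes a TINYTEXT payload, while `toText(SubCategory.CHAR, "abc")` returns the string unchanged.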

@dinujoh dinujoh merged commit e49c997 into opensearch-project:main Nov 4, 2024
45 of 47 checks passed
3 participants