Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ingest/cassandra): Add support for Cassandra as a source #11822

Conversation

sagar-salvi-apptware
Copy link
Contributor

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@jjoyce0510
Copy link
Collaborator

Things to confirm:

  • Platform Instance is supported, and is included when generating keyspace (container) AND table (dataset) URNs
  • Make sure that UDT column types are correctly flattened and supported
  • Make sure the column type mapping matches the following
Cassandra Data Type DataHub Data Type
ascii StringType
bigint NumberType
blob BytesType
boolean BooleanType
counter NumberType
date DateType
decimal NumberType
double NumberType
float NumberType
inet StringType
int NumberType
list ArrayType
map MapType
set ArrayType
smallint NumberType
text StringType
time TimeType
timestamp DateType
timeuuid StringType
tinyint NumberType
tuple ArrayType
uuid StringType
varchar StringType
varint NumberType
frozen<map<text, text>> MapType
frozen<list> ArrayType
frozen<set> ArrayType

@jjoyce0510
Copy link
Collaborator

set -> ArrayType list -> ArrayType
map<?> -> MapType

And UDT -> Flattened structure

@jjoyce0510
Copy link
Collaborator

jjoyce0510 commented Nov 8, 2024

Views & Tables should have the same aspects

  • Data Platform Instance
  • Sub Types
  • Dataset Properties
  • Container
  • Browse Paths V2

And ensure we have table + column comments as description fields.

.....And

Views have

  • View Properties

@jjoyce0510
Copy link
Collaborator

Table + Column Comments should be QA'd locally.

@jjoyce0510
Copy link
Collaborator

Also please make sure error handling is clear. And try-excepts are around areas where we need. This connector should only fail in known cases. We should not have ANY uncaught exceptions!

Copy link
Collaborator

@mayurinehate mayurinehate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good

@mayurinehate mayurinehate merged commit fd2da83 into datahub-project:master Nov 15, 2024
78 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ingestion PR or Issue related to the ingestion of metadata product PR or Issue related to the DataHub UI/UX
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants