Add primary key auto-detection for semantic sdtypes #1731

Merged: 3 commits into main from issue-1724-metadata-autodetection, Jan 10, 2024

fealho (Member) commented Jan 5, 2024

CU-86ayxgv0h, Resolve #1724

sdv-team (Contributor) commented Jan 5, 2024

codecov-commenter commented Jan 5, 2024

Codecov Report

Attention: 1 line in your changes is missing coverage. Please review.

Comparison is base (64e8df2) 97.13% compared to head (5391578) 97.11%.
Report is 1 commit behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| sdv/metadata/single_table.py | 83.33% | 1 Missing ⚠️ |


Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1731      +/-   ##
==========================================
- Coverage   97.13%   97.11%   -0.02%     
==========================================
  Files          48       48              
  Lines        4507     4512       +5     
==========================================
+ Hits         4378     4382       +4     
- Misses        129      130       +1     


fealho marked this pull request as ready for review January 5, 2024 17:38
fealho requested a review from a team as a code owner January 5, 2024 17:38
fealho requested review from rwedge and frances-h and removed request for a team January 5, 2024 17:38
Comment on lines +423 to +425
# When no primary key column was set, choose the first pii field
if self.primary_key is None and first_pii_field:
    self.primary_key = first_pii_field
Contributor review comment:

Just to clarify, is the logic we want here to set the primary key to the first ID or PII column we find? Or is it to fall back on PII columns only if we don't find an ID column in the table?

fealho (Member, Author) replied Jan 5, 2024:

The issue doesn't specify, so I implemented the second approach: fall back on a PII column only when no ID column is found. I could change this if we have evidence that, in real-world datasets, PII columns are just as likely as ID columns to be selected as primary keys.
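
For readers comparing the two options discussed in this thread, here is a minimal standalone sketch of each selection rule. It is not the code in `sdv/metadata/single_table.py`; the function names, the `ID_SDTYPES` / `PII_SDTYPES` sets, and the plain column-to-sdtype mapping are all hypothetical, and any uniqueness or validity checks on candidate columns are assumed to happen elsewhere.

```python
from typing import Mapping, Optional

# Hypothetical sdtype groupings, for illustration only.
ID_SDTYPES = {"id"}
PII_SDTYPES = {"email", "ssn", "phone_number"}


def pick_with_pii_fallback(columns: Mapping[str, str]) -> Optional[str]:
    """Second approach: prefer the first ID column, else the first PII column."""
    first_pii_field = None
    for name, sdtype in columns.items():
        if sdtype in ID_SDTYPES:
            # An ID column always wins, even if a PII column came earlier.
            return name
        if sdtype in PII_SDTYPES and first_pii_field is None:
            first_pii_field = name
    # No ID column was found: fall back on the first PII column (may be None).
    return first_pii_field


def pick_first_id_or_pii(columns: Mapping[str, str]) -> Optional[str]:
    """First approach: take whichever ID or PII column appears first."""
    for name, sdtype in columns.items():
        if sdtype in ID_SDTYPES or sdtype in PII_SDTYPES:
            return name
    return None


table = {"email": "email", "age": "numerical", "user_id": "id"}
print(pick_with_pii_fallback(table))  # user_id
print(pick_first_id_or_pii(table))    # email
```

The two rules only disagree when a PII column precedes an ID column, as in the sample table above; when the table has no ID column at all, both return the first PII column.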

fealho requested a review from frances-h January 5, 2024 23:14
frances-h (Contributor) left a comment:

LGTM!

fealho merged commit 15f18fd into main Jan 10, 2024
37 checks passed
fealho deleted the issue-1724-metadata-autodetection branch January 10, 2024 17:25
Linked issue: Metadata auto-detection should find primary keys of semantic sdtypes
5 participants