-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(models) : Joins (Datasets) schema, resolvers and UI #8325
feat(models) : Joins (Datasets) schema, resolvers and UI #8325
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice! this is looking solid, thank you so much for your effort here we're really appreciative.
I left some comments, some more serious than others.
In my first review I only had time to get through the modeling and backend changes. I'll take more of a look at datahub-web-react soon!
@@ -701,6 +710,9 @@ private void configureDataPlatformInstanceResolvers(final RuntimeWiring.Builder | |||
|
|||
private void configureQueryResolvers(final RuntimeWiring.Builder builder) { | |||
builder.type("Query", typeWiring -> typeWiring | |||
.dataFetcher("join", new AuthenticatedResolver<>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AuthenticatedResolver
is actually deprecated and an authentication check is handled elsewhere so if the user isn't logged in they can't get to any of these endpoints! so feel free to remove it from here as well as below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also i see duplicate "join"
entries here in configureQueryResolvers
so I think you can just remove this one entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed
- container | ||
- globalTags | ||
- glossaryTerms | ||
- browsePaths |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm i'm not sure I see browsePaths
on joins right now since they won't have their own entity profile page. also unsure if we need container
or institutionalMemory
aspects either
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the present, I removed BrowsePaths and Container
InstitutionalMemory is needed since we are providing a place for the documentation.
private com.linkedin.common.urn.JoinUrn getJoinUrn(String urnStr) { | ||
try { | ||
return JoinUrn.createFromString(urnStr); | ||
} catch (URISyntaxException e) { | ||
throw new RuntimeException(String.format("Failed to retrieve data product with urn %s, invalid urn", urnStr)); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this method and getUrn
above look to be duplicated. Also we have this method getUrn
to do just this in a generic UrnUtils
class you can use!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still a nit but I don't think you need this method but can just use the UrnUtils.getUrn
instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed
""" | ||
List all Join | ||
""" | ||
listJoin(input: ListJoinInput!): ListJoinResult |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm I don't see this implemented anywhere in GmsGraphQLEngine
- are you still planning on creating this endpoint?
and when would we list all joins in general? won't it always be in the context of a dataset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed, In the beginning made stubs for all endpoints, yeah not needed.
""" | ||
Edges extending from this entity grouped by direction in the lineage graph | ||
""" | ||
lineage(input: LineageInput!): EntityLineageResult |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there lineage on Joins? I don't think so right? there are Relationships
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct - no lineage on Joins but relationships between datasets and joins. Removed lineage
""" | ||
Display name of the Dataset | ||
""" | ||
name: String | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm is this intended? I don't think we're trying to allow dataset names to be editable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops - removed.
"owners" : [ "urn:li:corpuser:fbar", "urn:li:corpuser:bfoo" ], | ||
"fields" : [ { | ||
"name" : "joinId", | ||
"doc" : "Join native name e.g. <db>.<table>, /dir/subdir/<name>, or <name>", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will be a UUID right? otherwise we wouldn't be able to have two joins with the same name and that could be problematic between all of the joins and datasets out there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep.
@Searchable = { | ||
"fieldType": "TEXT_PARTIAL", | ||
"enableAutocomplete": true, | ||
"boostScore": 10.0 | ||
} | ||
afield: SchemaFieldPath | ||
|
||
/** | ||
* All fields from dataset B that are required for the join, maps to aFields 1:1 | ||
*/ | ||
@Searchable = { | ||
"fieldType": "TEXT_PARTIAL", | ||
"enableAutocomplete": true, | ||
"boostScore": 10.0 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's make these TEXT
fields instead of TEXT_PARTIAL
since that's what we have on other schema fields. also we don't want to enableAutoComplete
so we can just remove that. also i think you can remove boostScore
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved
"fieldType": "DATETIME" | ||
} | ||
} | ||
lastModified: optional TimeStamp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about using AuditStamp
instead of TimeStamp
? it just gives us some more optional fields to be more flexible in the future and requires the actor
field to be populated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed.
@Searchable = { | ||
"fieldType": "TEXT", | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this to be searchable? it looks like right now it's just a random UUID and we're making the name
searchable anyways for joins
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah agreed... removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got through most of the frontend code as well! I'm going to pull this down and test locally later this week and may have some more feedback then as well. In general, frontend code could be broken down into smaller, more consumable chunks. I'm pretty concerned about the number of queries we're making and all of the updates to our .graphql
files as well that seem unrelated (changing these files can have big impact on the performance of our queries)
@@ -96,6 +97,10 @@ export class DatasetEntity implements Entity<Dataset> { | |||
name: 'Schema', | |||
component: SchemaTab, | |||
}, | |||
{ | |||
name: 'Relationships', | |||
component: RelationTab, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can rename this component RelationshipsTab
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed
// const fieldSortInput: FieldSortInput = { | ||
// field: 'lastModifiedAt', | ||
// sortOrder: Sort.Desc, | ||
// }; | ||
const tabs = [ | ||
{ | ||
key: 'joinsTab', | ||
tab: 'Joins', | ||
}, | ||
// { | ||
// key: 'pkFkTab', | ||
// tab: 'PK/FK', | ||
// }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we remove this commented out code? or do we still want this tab here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed commented code.
joinData = joins.search.searchResults.map((r) => r.entity as Join); | ||
} | ||
|
||
const joinPreview = (record: Join): JSX.Element => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this RelationTab
is getting pretty big and complicated. I think moving this joinPreview
function with all of its rendering logic below out to its own component would be nice! or you can simply move this name logic into the existing JoinPreview
component
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored.
)} | ||
{loadingJoin && ( | ||
<div> | ||
Joins <LoadingOutlined />{' '} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: I think we usually just show <LoadingOutlined />
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed.
const { data: table2Dataset } = useGetDatasetQuery({ | ||
variables: { | ||
urn: selectDataset || 'urn:li:dataset:(urn:li:dataPlatform:snowflake,testData,DEV)', | ||
}, | ||
}); | ||
const { data: table2Schema } = useGetDatasetSchemaQuery({ | ||
variables: { | ||
urn: selectDataset || 'urn:li:dataset:(urn:li:dataPlatform:snowflake,testData,DEV)', | ||
}, | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm we're hardcoding these urns as backup here? also i believe we'll be immediately firing these queries when we load this page. I think these should probably make these lazy queries to only fire once they select a dataset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also why do we need to call useGetDatasetQuery
and useGetDatasetSchemaQuery
? we're already getting the dataset when we search for it right? we could store that in state. I get needing to make a query to show the new dataset's schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in general we want to minimize the number of queries we make and only make queries that are (1) necessary and (2) when we need them (as opposed to on page load)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both graphql queries removed and utilized lazy queries.
selectedEntities={selectedEntities} | ||
/> | ||
</TabToolbar> | ||
{!singleSelect && ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm hiding this bar based on singleSelect
feels off. maybe we need another optional prop for this component saying that we should hide this select bar if it's true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed.
export const JoinTab = ({ properties }: { properties?: any }) => { | ||
const { data: entityData } = useGetJoinQuery({ variables: { urn: properties } }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm confused - why is properties
any
type and then we pass that into the query as the urn
parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @chriscollins3456 for review and valuable feedback!
While we are working on other, for this one:
i'm confused - why is
properties
any
type and then we pass that into the query as theurn
parameter?
Here JoinTab is called from JoinEntity and tab type is 'EntityTab' with definition as below:
export type EntityTab = {
name: string;
component: React.FunctionComponent<{ properties?: any }>;
display?: {
visible: (GenericEntityProperties, T) => boolean; // Whether the tab is visible on the UI. Defaults to true.
enabled: (GenericEntityProperties, T) => boolean; // Whether the tab is enabled on the UI. Defaults to true.
};
properties?: any;
};
Component can support 'React.FunctionComponent<{ properties?: any }>;' this signature only.
This has properties to pass any argument to Tab and I wanted to get urn from it.
Please feel free to add any suggestion here. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed as suggested- utilized useEntityData from EntityContext.
incoming: relationships( | ||
input: { types: ["joinA", "joinB"], direction: INCOMING, start: 0, count: 100 } | ||
) { | ||
...fullRelationshipResults | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we will then be doing a relationships query here for all dataset queries - can we move this out and create a whole new query for getting joins associated with this dataset? also, aren't we making a query to get joins where a given dataset is either joinA
or joinB
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We removed this fragment and we are fetching joins with graphql search query.
fragment searchResultFieldsWithRelationships on Entity { | ||
urn | ||
type | ||
... on Dataset { | ||
name | ||
origin | ||
uri | ||
platform { | ||
...platformFields | ||
} | ||
dataPlatformInstance { | ||
...dataPlatformInstanceFields | ||
} | ||
editableProperties { | ||
name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we adding all of this stuff in our graphql files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed entries which are not required.
name | ||
description | ||
} | ||
platform { | ||
...platformFields | ||
} | ||
ownership { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also confused why this lineage.graphql
file is being updated? Joins aren't lineage, they are relationships, but we'll never get them in a lineage specific query
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
Rtekal pl join UI er model relation
Rtekal pl join UI er model relation
Yep Chris. Took care of all your review comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you so much for this!! also thank you so much for being so helpful on working with me on these requested changes - you all are real troopers no doubt about it.
I have one tiny final change that I'd like to request just to update the casing in one field in the model, which I checked should require only updates to a couple other places.
However, i'm approving this to get CI going to make sure everything is green before we merge this in! that's the only real blocker now to make sure tests and linting are passed. I'll help you guys out and point out changes to make CI pass
Unfortunately I don't have write access to poorvi's repo so I can't push code to get CI green with you unless I get access. Either way I'm happy to help guide you on anything that we might need to do :)
/** | ||
* ERModelRelationFieldMapping (in future we can make it an array) | ||
*/ | ||
relationshipfieldMappings: array [RelationshipFieldMapping] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one more tiny one! would you mind updating the casing of this model to be relationshipFieldMappings
with a capital "F"? then i checked the code and there's just a couple of places that would have to be changed to reflect then then it's good!
but it actually looks like you recently merged in master and the CI is looking solid! so really we should only have to worry about the casing request in that final comment and then we're merging this bad boy I went ahead and merged in master one more time to make sure we're up to date |
Rtekal pl join UI er model relation
Rtekal pl join UI er model relation
Sure Chris. Thank you. I went ahead and did the last review changes. Thank you again for all your help and extreme patience with our contribution. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice! i think we're good to merge :)
…ject#8325) Co-authored-by: Raj Tekal <[email protected]> Co-authored-by: Raj Tekal <[email protected]> Co-authored-by: Raj Tekal <[email protected]> Co-authored-by: Chris Collins <[email protected]>
Checklist