add regex support
anshbansal committed Sep 22, 2023
1 parent 2bad891 commit 4620cb0
Showing 2 changed files with 27 additions and 5 deletions.
25 changes: 24 additions & 1 deletion metadata-ingestion/docs/transformer/dataset_transformer.md
@@ -7,14 +7,37 @@ The below table shows transformer which can transform aspects of entity [Dataset
| Dataset Aspect | Transformer |
|---------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `status` | - [Mark Dataset status](#mark-dataset-status) |
| `ownership` | - [Simple Add Dataset ownership](#simple-add-dataset-ownership)<br/> - [Pattern Add Dataset ownership](#pattern-add-dataset-ownership)<br/> - [Simple Remove Dataset Ownership](#simple-remove-dataset-ownership) |
| `ownership` | - [Simple Add Dataset ownership](#simple-add-dataset-ownership)<br/> - [Pattern Add Dataset ownership](#pattern-add-dataset-ownership)<br/> - [Simple Remove Dataset Ownership](#simple-remove-dataset-ownership)<br/> - [Extract Ownership from Tags](#extract-ownership-from-tags) |
| `globalTags` | - [Simple Add Dataset globalTags ](#simple-add-dataset-globaltags)<br/> - [Pattern Add Dataset globalTags](#pattern-add-dataset-globaltags)<br/> - [Add Dataset globalTags](#add-dataset-globaltags) |
| `browsePaths` | - [Set Dataset browsePath](#set-dataset-browsepath) |
| `glossaryTerms` | - [Simple Add Dataset glossaryTerms ](#simple-add-dataset-glossaryterms)<br/> - [Pattern Add Dataset glossaryTerms](#pattern-add-dataset-glossaryterms) |
| `schemaMetadata` | - [Pattern Add Dataset Schema Field glossaryTerms](#pattern-add-dataset-schema-field-glossaryterms)<br/> - [Pattern Add Dataset Schema Field globalTags](#pattern-add-dataset-schema-field-globaltags) |
| `datasetProperties` | - [Simple Add Dataset datasetProperties](#simple-add-dataset-datasetproperties)<br/> - [Add Dataset datasetProperties](#add-dataset-datasetproperties) |
| `domains` | - [Simple Add Dataset domains](#simple-add-dataset-domains)<br/> - [Pattern Add Dataset domains](#pattern-add-dataset-domains) |

## Extract Ownership from Tags
### Config Details
| Field | Required | Type | Default | Description |
|-----------------------------|----------|---------|---------------|---------------------------------------------|
| `semantics` | | enum | `OVERWRITE` | Whether to OVERWRITE or PATCH the entity present on DataHub GMS. |
| `tag_prefix` | | str | | Regex matched against each tag. The matched prefix is removed, and the rest of the tag is used as the owner ID when creating the owner URN. |
| `is_user` | | bool | `true` | Whether the owner is a user. If `false`, the owner is treated as a group. |
| `email_domain` | | str | | If set, this domain is appended to the owner ID when creating the owner URN. |
| `owner_type` | | str | `TECHNICAL_OWNER` | Ownership type. |
| `owner_type_urn` | | str | `None` | Set to a custom ownership type's URN if using custom ownership. |

Matches each tag against the configured tag prefix and treats the part of the tag after that prefix as the owner when creating ownership.

```yaml
transformers:
  - type: "extract_ownership_from_tags"
    config:
      tag_prefix: "dbt:techno-genie:"
      is_user: true
      email_domain: "coolcompany.com"
```
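
As a rough illustration of what the config above does to a single tag, here is a minimal sketch in plain Python. The helper name `owner_id_from_tag` and the sample tag `dbt:techno-genie:jdoe` are hypothetical, and joining the domain with `@` is an assumption: the table only says the domain is "appended".

```python
import re
from typing import Optional


def owner_id_from_tag(
    tag: str, tag_prefix: str, email_domain: Optional[str] = None
) -> Optional[str]:
    """Return the owner ID derived from a tag, or None if the prefix does not match."""
    match = re.search(tag_prefix, tag)
    if match is None:
        return None
    # Everything after the matched prefix is treated as the owner ID.
    owner = tag[match.end():].strip()
    # Assumption: the domain is joined with "@"; the docs only say "appended".
    if email_domain:
        owner = f"{owner}@{email_domain}"
    return owner


owner = owner_id_from_tag(
    "dbt:techno-genie:jdoe", "dbt:techno-genie:", "coolcompany.com"
)
print(owner)  # jdoe@coolcompany.com
```

Tags that do not contain the prefix yield `None` in this sketch, which corresponds to the transformer skipping them.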
## Mark Dataset Status
### Config Details
| Field | Required | Type | Default | Description |
@@ -67,10 +67,9 @@ def transform_aspect(
         for tag_class in tags:
             tag_urn = TagUrn.create_from_string(tag_class.tag)
             tag_str = tag_urn.get_entity_id()[0]
-            if re.match(self.config.tag_prefix, tag_str):
-                result = re.search(self.config.tag_prefix, tag_str)
-                print(result.span)
-                owner_str = tag_str[len(self.config.tag_prefix) :]
+            re_match = re.search(self.config.tag_prefix, tag_str)
+            if re_match:
+                owner_str = tag_str[re_match.end():].strip()
                 owner_urn_str = self.get_owner_urn(owner_str)
                 if self.config.is_user:
                     owner_urn = str(CorpuserUrn.create_from_id(owner_urn_str))
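
The hunk above switches from slicing by `len(self.config.tag_prefix)` to slicing at `re_match.end()`. The standalone sketch below shows why that matters once `tag_prefix` is a real regex: the length of the pattern text generally differs from the length of the text it matches. The pattern `dbt:[\w-]+:`, the sample tag, and the two function names are made up for illustration and do not come from the commit.

```python
import re
from typing import Optional


def owner_by_pattern_length(tag_prefix: str, tag_str: str) -> str:
    """Previous behaviour: cut off as many characters as the pattern text has."""
    return tag_str[len(tag_prefix):]


def owner_by_match_end(tag_prefix: str, tag_str: str) -> Optional[str]:
    """New behaviour: cut at the end of the text the regex actually matched."""
    re_match = re.search(tag_prefix, tag_str)
    if re_match is None:
        return None
    return tag_str[re_match.end():].strip()


tag_prefix = r"dbt:[\w-]+:"          # hypothetical prefix: any dbt project name
tag_str = "dbt:techno-genie:alice"   # hypothetical tag

old_owner = owner_by_pattern_length(tag_prefix, tag_str)
new_owner = owner_by_match_end(tag_prefix, tag_str)

print(old_owner)  # genie:alice
print(new_owner)  # alice
```

Note that `re.search` finds the pattern anywhere in the tag, not only at the start, so a pattern meant strictly as a prefix may need a leading `^`.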