Commit

Merge branch 'master' into origin/FixFlakyAddUserTest
jjoyce0510 authored Jul 25, 2023
2 parents cb2a2a9 + b12de09 commit 5b21eb3
Showing 99 changed files with 6,768 additions and 3,330 deletions.
@@ -49,6 +49,8 @@ public CompletableFuture<BrowseResultsV2> get(DataFetchingEnvironment environmen
final int start = input.getStart() != null ? input.getStart() : DEFAULT_START;
final int count = input.getCount() != null ? input.getCount() : DEFAULT_COUNT;
final String query = input.getQuery() != null ? input.getQuery() : "*";
// escape forward slash since it is a reserved character in Elasticsearch
final String sanitizedQuery = ResolverUtils.escapeForwardSlash(query);

return CompletableFuture.supplyAsync(() -> {
try {
@@ -64,7 +66,7 @@ public CompletableFuture<BrowseResultsV2> get(DataFetchingEnvironment environmen
maybeResolvedView != null
? SearchUtils.combineFilters(filter, maybeResolvedView.getDefinition().getFilter())
: filter,
query,
sanitizedQuery,
start,
count,
context.getAuthentication()
21 changes: 17 additions & 4 deletions datahub-web-react/src/app/home/AcrylDemoBanner.tsx
@@ -33,13 +33,17 @@ const StyledLink = styled(Link)`
font-weight: 700;
`;

const TextContent = styled.div`
max-width: 1025px;
`;

export default function AcrylDemoBanner() {
return (
<BannerWrapper>
<Logo src={AcrylLogo} />
<TextWrapper>
<Title>Schedule a Demo of Managed Datahub</Title>
<span>
<Title>Schedule a Demo of Managed DataHub</Title>
<TextContent>
DataHub is already the industry&apos;s #1 Open Source Data Catalog.{' '}
<StyledLink
href="https://www.acryldata.io/datahub-sign-up"
@@ -48,8 +52,17 @@
>
Schedule a demo
</StyledLink>{' '}
of Acryl Cloud to see the advanced features that take it to the next level!
</span>
of Acryl DataHub to see the advanced features that take it to the next level or purchase Acryl Cloud
on{' '}
<StyledLink
href="https://aws.amazon.com/marketplace/pp/prodview-ratzv4k453pck?sr=0-1&ref_=beagle&applicationId=AWSMPContessa"
target="_blank"
rel="noopener noreferrer"
>
AWS Marketplace
</StyledLink>
!
</TextContent>
</TextWrapper>
</BannerWrapper>
);
1 change: 1 addition & 0 deletions docs-website/sidebars.js
@@ -486,6 +486,7 @@ module.exports = {
"docs/how/add-custom-ingestion-source",
"docs/how/add-custom-data-platform",
"docs/advanced/browse-paths-upgrade",
"docs/browseV2/browse-paths-v2",
],
},
],
51 changes: 51 additions & 0 deletions docs/browseV2/browse-paths-v2.md
@@ -0,0 +1,51 @@
import FeatureAvailability from '@site/src/components/FeatureAvailability';

# Generating Browse Paths (V2)

<FeatureAvailability/>

## Introduction

Browse (V2) is a way for users to explore and dive deeper into their data. Its integration with the search experience allows users to combine search queries and filters with nested folders organized by entity type and platform.

Most entities should have a browse path, which powers the left side panel on the search page and lets users find groups of entities under the folders it defines. Below, you can see an example of the sidebar with some new browse paths.

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/browseV2/browseV2Sidebar.png"/>
</p>

This new browse sidebar always starts with Entity Type, then optionally shows Environment (PROD, DEV, etc.) if there are two or more Environments, then Platform. Below the Platform level, we render folders that come directly from an entity's [browsePathsV2](https://datahubproject.io/docs/generated/metamodel/entities/dataset#browsepathsv2) aspects.

## Generating Custom Browse Paths

A `browsePathsV2` aspect has a field called `path` which contains a list of `BrowsePathEntry` objects. Each object in the path represents one level of the entity's browse path where the first entry is the highest level and the last entry is the lowest level.

If an entity has this aspect filled out, its browse path will show up in the browse sidebar so that you can navigate its folders and select one to filter search results down.

For example, in the browse sidebar on the left of the image above, there are 10 Dataset entities from the BigQuery Platform that have `browsePathsV2` aspects that look like the following:

```
[ { id: "bigquery-public-data" }, { id: "covid19_public_forecasts" } ]
```

The `id` in a `BrowsePathEntry` is required and is what will be shown in the UI unless the optional `urn` field is populated. If the `urn` field is populated, we will try to resolve this path entry into an entity object and display that entity's name. We will also show a link to allow you to open up the entity profile.

The `urn` field should only be populated if there is an entity in your DataHub instance that belongs in that entity's browse path. This most commonly applies to Datasets, whose browse paths contain Container entities, as well as a few other cases such as a DataFlow appearing in a DataJob's browse path. For any other situation, feel free to leave `urn` empty and populate `id` with the text you want shown in the UI for your entity's path.
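
To set this aspect yourself, here is a minimal sketch using the Python SDK; the dataset URN and GMS endpoint below are illustrative placeholders:

```
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import BrowsePathEntryClass, BrowsePathsV2Class

# Illustrative URN and endpoint -- substitute your own.
dataset_urn = "urn:li:dataset:(urn:li:dataPlatform:bigquery,bigquery-public-data.covid19_public_forecasts,PROD)"
emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

# Each entry is one folder level, ordered from highest to lowest.
aspect = BrowsePathsV2Class(
    path=[
        BrowsePathEntryClass(id="bigquery-public-data"),
        BrowsePathEntryClass(id="covid19_public_forecasts"),
    ]
)

emitter.emit(MetadataChangeProposalWrapper(entityUrn=dataset_urn, aspect=aspect))
```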

## Additional Resources

### GraphQL

* [browseV2](../../graphql/queries.md#browsev2)

## FAQ and Troubleshooting

**How are browsePathsV2 aspects created?**

By default, when you ingest your data, we create `browsePathsV2` aspects for every entity that should have one, unless the aspect is already provided. The default path is derived from separator characters that appear within an entity's URN.

Our ingestion sources also produce `browsePathsV2` aspects directly as of CLI version v0.10.5.

### Related Features

* [Search](../how/search.md)
3 changes: 3 additions & 0 deletions docs/how/updating-datahub.md
@@ -15,11 +15,14 @@ This file documents any backwards-incompatible changes in DataHub and assists pe
certain column-level metrics. Instead, set `profile_table_level_only` to `false` and
individually enable / disable desired field metrics.
- #8451: The `bigquery-beta` and `snowflake-beta` source aliases have been dropped. Use `bigquery` and `snowflake` as the source type instead.
- #8472: Ingestion runs created with `Pipeline.create` will show up in the DataHub ingestion tab as CLI-based runs. To revert to the previous behavior of not showing these runs in DataHub, pass `no_default_report=True`.
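
  For example, a minimal sketch of opting out (the recipe contents are illustrative):

  ```
  from datahub.ingestion.run.pipeline import Pipeline

  # Illustrative recipe -- substitute your own source and sink.
  recipe = {
      "source": {"type": "file", "config": {"filename": "mces.json"}},
      "sink": {"type": "console"},
  }

  # Suppress the default run report so this run does not appear
  # in the DataHub ingestion tab.
  pipeline = Pipeline.create(recipe, no_default_report=True)
  pipeline.run()
  ```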

### Potential Downtime

### Deprecations

- #8198: In the Python SDK, the `PlatformKey` class has been renamed to `ContainerKey`.
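
  A minimal migration sketch; the old name remains available as a deprecated alias, so existing code keeps working:

  ```
  # Before (still works via the deprecated alias):
  from datahub.emitter.mcp_builder import PlatformKey

  # After:
  from datahub.emitter.mcp_builder import ContainerKey

  class MySourceContainerKey(ContainerKey):  # illustrative subclass
      workspace: str
  ```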

### Other notable Changes

## 0.10.4
10 changes: 10 additions & 0 deletions metadata-ingestion/examples/library/create_mlmodel.py
@@ -31,6 +31,16 @@
description="my feature",
groups=model_group_urns,
mlFeatures=feature_urns,
trainingMetrics=[
models.MLMetricClass(
name="accuracy", description="accuracy of the model", value="1.0"
)
],
hyperParams=[
models.MLHyperParamClass(
name="hyper_1", description="hyper_1", value="0.102"
)
],
),
)

3 changes: 2 additions & 1 deletion metadata-ingestion/src/datahub/cli/check_cli.py
@@ -61,7 +61,8 @@ def metadata_file(json_file: str, rewrite: bool, unpack_mces: bool) -> None:
"type": "file",
"config": {"filename": out_file.name},
},
}
},
no_default_report=True,
)

pipeline.run()
2 changes: 1 addition & 1 deletion metadata-ingestion/src/datahub/cli/docker_cli.py
@@ -985,7 +985,7 @@ def ingest_sample_data(path: Optional[str], token: Optional[str]) -> None:
if token is not None:
recipe["sink"]["config"]["token"] = token

pipeline = Pipeline.create(recipe)
pipeline = Pipeline.create(recipe, no_default_report=True)
pipeline.run()
ret = pipeline.pretty_print_summary()
sys.exit(ret)
2 changes: 1 addition & 1 deletion metadata-ingestion/src/datahub/cli/ingest_cli.py
@@ -253,7 +253,7 @@ def mcps(path: str) -> None:
},
}

pipeline = Pipeline.create(recipe)
pipeline = Pipeline.create(recipe, no_default_report=True)
pipeline.run()
ret = pipeline.pretty_print_summary()
sys.exit(ret)
21 changes: 19 additions & 2 deletions metadata-ingestion/src/datahub/emitter/mce_builder.py
@@ -6,7 +6,17 @@
import time
from enum import Enum
from hashlib import md5
from typing import Any, List, Optional, Type, TypeVar, Union, cast, get_type_hints
from typing import (
TYPE_CHECKING,
Any,
List,
Optional,
Type,
TypeVar,
Union,
cast,
get_type_hints,
)

import typing_inspect

@@ -50,6 +60,9 @@
os.getenv("DATAHUB_DATASET_URN_TO_LOWER", "false") == "true"
)

if TYPE_CHECKING:
from datahub.emitter.mcp_builder import DatahubKey


# TODO: Delete this once lower-casing is the standard.
def set_dataset_urn_to_lower(value: bool) -> None:
@@ -132,7 +145,11 @@ def dataset_key_to_urn(key: DatasetKeyClass) -> str:
)


def make_container_urn(guid: str) -> str:
def make_container_urn(guid: Union[str, "DatahubKey"]) -> str:
from datahub.emitter.mcp_builder import DatahubKey

if isinstance(guid, DatahubKey):
guid = guid.guid()
return f"urn:li:container:{guid}"


29 changes: 18 additions & 11 deletions metadata-ingestion/src/datahub/emitter/mcp_builder.py
@@ -54,7 +54,9 @@ def guid(self) -> str:
return _stable_guid_from_dict(bag)


class PlatformKey(DatahubKey):
class ContainerKey(DatahubKey):
"""Base class for container guid keys. Most users should use one of the subclasses instead."""

platform: str
instance: Optional[str] = None

@@ -81,20 +83,27 @@ def guid_dict(self) -> Dict[str, str]:
def property_dict(self) -> Dict[str, str]:
return self.dict(by_alias=True, exclude_none=True)

def as_urn(self) -> str:
return make_container_urn(guid=self.guid())


# DEPRECATION: Keeping the `PlatformKey` name around for backwards compatibility.
PlatformKey = ContainerKey


class DatabaseKey(PlatformKey):
class DatabaseKey(ContainerKey):
database: str


class SchemaKey(DatabaseKey):
db_schema: str = Field(alias="schema")


class ProjectIdKey(PlatformKey):
class ProjectIdKey(ContainerKey):
project_id: str


class MetastoreKey(PlatformKey):
class MetastoreKey(ContainerKey):
metastore: str


@@ -110,11 +119,11 @@ class BigQueryDatasetKey(ProjectIdKey):
dataset_id: str


class FolderKey(PlatformKey):
class FolderKey(ContainerKey):
folder_abs_path: str


class BucketKey(PlatformKey):
class BucketKey(ContainerKey):
bucket_name: str


@@ -127,7 +136,7 @@ def default(self, obj: Any) -> Any:
return json.JSONEncoder.default(self, obj)


KeyType = TypeVar("KeyType", bound=PlatformKey)
KeyType = TypeVar("KeyType", bound=ContainerKey)


def add_domain_to_entity_wu(
@@ -188,7 +197,7 @@ def gen_containers(
container_key: KeyType,
name: str,
sub_types: List[str],
parent_container_key: Optional[PlatformKey] = None,
parent_container_key: Optional[ContainerKey] = None,
extra_properties: Optional[Dict[str, str]] = None,
domain_urn: Optional[str] = None,
description: Optional[str] = None,
@@ -199,9 +208,7 @@
created: Optional[int] = None,
last_modified: Optional[int] = None,
) -> Iterable[MetadataWorkUnit]:
container_urn = make_container_urn(
guid=container_key.guid(),
)
container_urn = container_key.as_urn()
yield MetadataChangeProposalWrapper(
entityUrn=f"{container_urn}",
# entityKeyAspect=ContainerKeyClass(guid=parent_container_key.guid()),
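A quick usage sketch of the new `ContainerKey.as_urn()` helper added above; the key values are illustrative:

```
from datahub.emitter.mce_builder import make_container_urn
from datahub.emitter.mcp_builder import SchemaKey

# Illustrative key values.
key = SchemaKey(platform="snowflake", database="analytics", schema="public")

# as_urn() is shorthand for the previous two-step construction.
assert key.as_urn() == make_container_urn(guid=key.guid())
print(key.as_urn())  # urn:li:container:<stable-guid>
```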
2 changes: 1 addition & 1 deletion metadata-ingestion/src/datahub/ingestion/api/source.py
@@ -251,7 +251,7 @@ def _get_browse_path_processor(self, dry_run: bool) -> MetadataWorkUnitProcessor

platform_instance: Optional[str] = None
if isinstance(config, PlatformInstanceConfigMixin) and config.platform_instance:
platform_instance = platform_instance
platform_instance = config.platform_instance

return partial(
auto_browse_path_v2,
25 changes: 13 additions & 12 deletions metadata-ingestion/src/datahub/ingestion/graph/client.py
@@ -57,12 +57,12 @@ class DatahubClientConfig(ConfigModel):
"""Configuration class for holding connectivity to datahub gms"""

server: str = "http://localhost:8080"
token: Optional[str]
timeout_sec: Optional[int]
retry_status_codes: Optional[List[int]]
retry_max_times: Optional[int]
extra_headers: Optional[Dict[str, str]]
ca_certificate_path: Optional[str]
token: Optional[str] = None
timeout_sec: Optional[int] = None
retry_status_codes: Optional[List[int]] = None
retry_max_times: Optional[int] = None
extra_headers: Optional[Dict[str, str]] = None
ca_certificate_path: Optional[str] = None
disable_ssl_verification: bool = False

_max_threads_moved_to_sink = pydantic_removed_field(
Expand All @@ -88,6 +88,12 @@ class RemovedStatusFilter(enum.Enum):
"""Search only soft-deleted entities."""


@dataclass
class RelatedEntity:
urn: str
relationship_type: str


def _graphql_entity_type(entity_type: str) -> str:
"""Convert the entity types into GraphQL "EntityType" enum values."""

@@ -769,11 +775,6 @@ class RelationshipDirection(str, enum.Enum):
INCOMING = "INCOMING"
OUTGOING = "OUTGOING"

@dataclass
class RelatedEntity:
urn: str
relationship_type: str

def get_related_entities(
self,
entity_urn: str,
@@ -794,7 +795,7 @@ def get_related_entities(
},
)
for related_entity in response.get("entities", []):
yield DataHubGraph.RelatedEntity(
yield RelatedEntity(
urn=related_entity["urn"],
relationship_type=related_entity["relationshipType"],
)
2 changes: 1 addition & 1 deletion metadata-ingestion/src/datahub/ingestion/run/pipeline.py
@@ -328,7 +328,7 @@ def create(
dry_run: bool = False,
preview_mode: bool = False,
preview_workunits: int = 10,
report_to: Optional[str] = None,
report_to: Optional[str] = "datahub",
no_default_report: bool = False,
raw_config: Optional[dict] = None,
) -> "Pipeline":
@@ -11,7 +11,6 @@
support_status,
)
from datahub.ingestion.api.source import Source
from datahub.ingestion.api.source_helpers import auto_workunit_reporter
from datahub.ingestion.api.workunit import MetadataWorkUnit
from datahub.ingestion.source.aws.sagemaker_processors.common import (
SagemakerSourceConfig,
@@ -57,9 +56,6 @@ def create(cls, config_dict, ctx):
config = SagemakerSourceConfig.parse_obj(config_dict)
return cls(config, ctx)

def get_workunits(self) -> Iterable[MetadataWorkUnit]:
return auto_workunit_reporter(self.report, self.get_workunits_internal())

def get_workunits_internal(self) -> Iterable[MetadataWorkUnit]:
# get common lineage graph
lineage_processor = LineageProcessor(