fix(doc): improving docs across multiple sources (#4815)
Showing 20 changed files with 581 additions and 247 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
source: | ||
type: athena | ||
config: | ||
# Coordinates | ||
aws_region: my_aws_region | ||
work_group: primary | ||
|
||
# Options | ||
s3_staging_dir: "s3://my_staging_athena_results_bucket/results/" | ||
|
||
sink: | ||
# sink configs |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
## Integration Details | ||
|
||
<!-- Plain-language description of what this integration is meant to do. --> | ||
<!-- Include details about where metadata is extracted from (i.e., logs, source API, manifest, etc.) --> | ||
|
||
The DataHub Pulsar source plugin extracts `topic` and `schema` metadata from an Apache Pulsar instance and ingests it into DataHub. The plugin uses the [Pulsar admin REST API](https://pulsar.apache.org/admin-rest-api/#) to interact with the Pulsar instance. The following APIs are used to: | ||
- [Get the list of existing tenants](https://pulsar.apache.org/admin-rest-api/#tag/tenants) | ||
- [Get the list of namespaces associated with each tenant](https://pulsar.apache.org/admin-rest-api/#tag/namespaces) | ||
- [Get the list of topics associated with each namespace](https://pulsar.apache.org/admin-rest-api/#tag/persistent-topic) | ||
- persistent topics | ||
- persistent partitioned topics | ||
- non-persistent topics | ||
- non-persistent partitioned topics | ||
- [Get the latest schema associated with each topic](https://pulsar.apache.org/admin-rest-api/#tag/schemas) | ||
|
||
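The traversal described by the list above can be sketched as follows. This is a minimal illustration, not the plugin's actual implementation: the `/admin/v2/...` paths are assumptions based on the Pulsar admin REST API v2 layout and should be checked against your Pulsar version, and `get_json` is a hypothetical callable that takes a path and returns the decoded JSON response (or `None` when nothing is found, for example when a topic has no schema).

```python
from typing import Any, Callable, Iterator, List, Optional, Tuple

# Assumed Pulsar admin REST API v2 paths (verify against your Pulsar version).
# "{ns}" is a "tenant/namespace" string as returned by the namespaces endpoint.
TOPIC_ENDPOINTS = [
    "/admin/v2/persistent/{ns}",
    "/admin/v2/persistent/{ns}/partitioned",
    "/admin/v2/non-persistent/{ns}",
    "/admin/v2/non-persistent/{ns}/partitioned",
]


def schema_path(topic: str) -> str:
    """Map 'persistent://tenant/ns/name' to the assumed schema endpoint path."""
    _, _, rest = topic.partition("://")
    return f"/admin/v2/schemas/{rest}/schema"


def walk_topics(
    get_json: Callable[[str], Any],
    tenants: Optional[List[str]] = None,
) -> Iterator[Tuple[str, Any]]:
    """Yield (topic, latest_schema_or_None) for every topic that is found."""
    # Use the configured tenant list if given, otherwise ask the cluster.
    for tenant in tenants or get_json("/admin/v2/tenants") or []:
        for ns in get_json(f"/admin/v2/namespaces/{tenant}") or []:
            for endpoint in TOPIC_ENDPOINTS:
                for topic in get_json(endpoint.format(ns=ns)) or []:
                    yield topic, get_json(schema_path(topic))
```

In practice `get_json` would wrap an authenticated HTTP GET against the `web_service_url`; keeping it as a parameter lets the traversal be exercised without a running Pulsar instance.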
The data is extracted on a per-`tenant` and per-`namespace` basis. Topics, together with their corresponding schema (if available), are ingested into DataHub as [Dataset](docs/generated/metamodel/entities/dataset.md) entities. Additional values such as `schema description`, `schema_version`, `schema_type` and `partitioned` are included as `DatasetProperties`. | ||
|
||
|
||
### Concept Mapping | ||
|
||
<!-- This should be a manual mapping of concepts from the source to the DataHub Metadata Model --> | ||
<!-- Authors should provide as much context as possible about how this mapping was generated, including assumptions made, known shortcuts, & any other caveats --> | ||
|
||
This ingestion source maps the following Source System Concepts to DataHub Concepts: | ||
|
||
<!-- Remove all unnecessary/irrelevant DataHub Concepts --> | ||
|
||
|
||
| Source Concept | DataHub Concept | Notes | | ||
|----------------|--------------------------------------------------------------------|---------------------------------------------------------------------------| | ||
| `pulsar` | [Data Platform](docs/generated/metamodel/entities/dataPlatform.md) | | | ||
| Pulsar Topic | [Dataset](docs/generated/metamodel/entities/dataset.md) | _subType_: `topic` | | ||
| Pulsar Schema | [SchemaField](docs/generated/metamodel/entities/schemaField.md) | Maps to the fields defined within the `Avro` or `JSON` schema definition. | | ||
|
||
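To make the Topic-to-Dataset mapping above concrete, here is a small sketch of how a dataset URN for a Pulsar topic could be assembled. The general `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)` shape follows DataHub's URN convention; using the full topic name as the dataset name and prefixing it with the platform instance are assumptions made for illustration, and `dataset_urn` is a hypothetical helper, not a DataHub function.

```python
from typing import Optional


def dataset_urn(
    platform: str,
    name: str,
    env: str,
    platform_instance: Optional[str] = None,
) -> str:
    """Build a DataHub-style dataset URN (illustrative helper only)."""
    # Assumption: a platform instance, when set, is prepended to the name.
    full_name = f"{platform_instance}.{name}" if platform_instance else name
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{full_name},{env})"


urn = dataset_urn("pulsar", "persistent://tenant_1/ns/sales", "TEST", "local")
```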
|
||
## Metadata Ingestion Quickstart | ||
|
||
For context on getting started with ingestion, check out our [metadata ingestion guide](../../../../metadata-ingestion/README.md). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
source: | ||
type: "pulsar" | ||
config: | ||
env: "TEST" | ||
platform_instance: "local" | ||
## Pulsar client connection config ## | ||
web_service_url: "https://localhost:8443" | ||
verify_ssl: "/opt/certs/ca.cert.pem" | ||
# Issuer url for auth document, for example "http://localhost:8083/realms/pulsar" | ||
issuer_url: <issuer_url> | ||
client_id: ${CLIENT_ID} | ||
client_secret: ${CLIENT_SECRET} | ||
# Tenant list to scrape | ||
tenants: | ||
- tenant_1 | ||
- tenant_2 | ||
# Topic filter pattern | ||
topic_patterns: | ||
allow: | ||
- ".*sales.*" | ||
|
||
sink: | ||
# sink configs |
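The `topic_patterns` option in the recipe above filters topics by regular expression. A minimal sketch of allow/deny filtering is shown below, assuming patterns are matched from the start of the topic name with `re.match` and that deny rules take precedence. `filter_topics` is a hypothetical helper, and whether the plugin matches against the full topic name or only part of it is not stated in the source.

```python
import re
from typing import Iterable, List, Sequence


def filter_topics(
    topics: Iterable[str],
    allow: Sequence[str] = (".*",),
    deny: Sequence[str] = (),
) -> List[str]:
    """Keep topics that match an allow pattern and no deny pattern."""

    def allowed(topic: str) -> bool:
        # Deny rules win over allow rules (an assumption of this sketch).
        if any(re.match(pattern, topic) for pattern in deny):
            return False
        return any(re.match(pattern, topic) for pattern in allow)

    return [topic for topic in topics if allowed(topic)]


topics = [
    "persistent://tenant_1/ns/sales-orders",
    "persistent://tenant_1/ns/inventory",
]
sales_only = filter_topics(topics, allow=[".*sales.*"])
```

With the `".*sales.*"` pattern from the recipe, only topics whose name contains `sales` are kept.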
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
To get all metadata from Snowflake, you need to use two plugins: `snowflake` and `snowflake-usage`. Both are described on this page, and each requires its own recipe, so you will need two separate recipes. We understand this is not ideal, and we plan to make this easier in the future. |