docs(ingest): clarify adding source guide (#9161)
hsheth2 authored Nov 6, 2023
1 parent 81daae8 commit 0215666
Showing 1 changed file with 18 additions and 14 deletions.
32 changes: 18 additions & 14 deletions metadata-ingestion/adding-source.md
@@ -6,7 +6,7 @@ There are two ways of adding a metadata ingestion source.
2. You are writing the custom source for yourself and are not going to contribute back (yet).

If you are going for case (1) just follow the steps 1 to 9 below. In case you are building it for yourself you can skip
steps 4-8 (but maybe write tests and docs for yourself as well) and follow the documentation
on [how to use custom ingestion sources](../docs/how/add-custom-ingestion-source.md)
without forking DataHub.

@@ -27,6 +27,7 @@ from `ConfigModel`. The [file source](./src/datahub/ingestion/source/file.py) is
We use [pydantic](https://pydantic-docs.helpmanual.io) conventions for documenting configuration flags. Use the `description` attribute to write rich documentation for your configuration field.

For example, the following code:

```python
from pydantic import Field
from datahub.api.configuration.common import ConfigModel
@@ -49,12 +50,10 @@ generates the following documentation:
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/metadata-ingestion/generated_config_docs.png"/>
</p>


:::note
Inline markdown and code snippets are not yet supported in field-level documentation.
:::


### 2. Set up the reporter

The reporter interface enables the source to report statistics, warnings, failures, and other information about the run.
@@ -71,6 +70,8 @@ some [convenience methods](./src/datahub/emitter/mce_builder.py) for commonly us

### 4. Set up the dependencies

Note: Steps 4-8 are only required if you intend to contribute the source back to the DataHub project.

Declare the source's pip dependencies in the `plugins` variable of the [setup script](./setup.py).

### 5. Enable discoverability
@@ -119,37 +120,38 @@ from datahub.ingestion.api.decorators import (
@capability(SourceCapability.LINEAGE_COARSE, "Enabled by default")
class FileSource(Source):
    """
    The File Source can be used to produce all kinds of metadata from a generic metadata events file.
    :::note
    Events in this file can be in MCE form or MCP form.
    :::
    """

    # ... source code goes here

```


#### 7.2 Write custom documentation

- Create a copy of [`source-docs-template.md`](./source-docs-template.md) and edit all relevant components.
- Name the document `<plugin>.md` and move it to `metadata-ingestion/docs/sources/<platform>/<plugin>.md`. For example, for the Kafka platform, under the `kafka` plugin, move the document to `metadata-ingestion/docs/sources/kafka/kafka.md`.
- Add a quickstart recipe corresponding to the plugin under `metadata-ingestion/docs/sources/<platform>/<plugin>_recipe.yml`. For example, for the Kafka platform, under the `kafka` plugin, there is a quickstart recipe located at `metadata-ingestion/docs/sources/kafka/kafka_recipe.yml`.
- To write platform-specific documentation (that is cross-plugin), write the documentation under `metadata-ingestion/docs/sources/<platform>/README.md`. For example, cross-plugin documentation for the BigQuery platform is located under `metadata-ingestion/docs/sources/bigquery/README.md`.
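
As a minimal sketch of the layout described in the list above, the following Python helper computes where the documentation files for a plugin would live. The function name `source_doc_paths` and the dictionary keys are hypothetical, chosen only for illustration; the directory layout itself is taken from the steps above.

```python
from pathlib import PurePosixPath

# Root of the per-source docs, as given in the list above.
DOCS_ROOT = PurePosixPath("metadata-ingestion/docs/sources")


def source_doc_paths(platform: str, plugin: str) -> dict:
    """Return the expected doc locations for one plugin (hypothetical helper)."""
    platform_dir = DOCS_ROOT / platform
    return {
        # Custom documentation, started from a copy of source-docs-template.md
        "plugin_doc": platform_dir / f"{plugin}.md",
        # Quickstart recipe for the plugin
        "recipe": platform_dir / f"{plugin}_recipe.yml",
        # Optional cross-plugin, platform-level documentation
        "platform_readme": platform_dir / "README.md",
    }


paths = source_doc_paths("kafka", "kafka")
for name, path in paths.items():
    print(f"{name}: {path}")
```

For the Kafka example this yields `metadata-ingestion/docs/sources/kafka/kafka.md` and `metadata-ingestion/docs/sources/kafka/kafka_recipe.yml`, matching the paths named in the list.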

#### 7.3 Viewing the Documentation

Documentation for the source can be viewed by running the documentation generator from the `docs-website` module.

##### Step 1: Build the Ingestion docs

```console
# From the root of DataHub repo
./gradlew :metadata-ingestion:docGen
```

If this finishes successfully, you will see output messages like:

```console
Ingestion Documentation Generation Complete
############################################
@@ -170,14 +172,16 @@ Ingestion Documentation Generation Complete
You can also find documentation files generated at `./docs/generated/ingestion/sources` relative to the root of the DataHub repo. You should be able to locate your specific source's markdown file here and investigate it to make sure things look as expected.

##### Step 2: Build the Entire Documentation

To view how this documentation looks in the browser, there is one more step: build the entire Docusaurus site from the `docs-website` module.

```console
# From the root of DataHub repo
./gradlew :docs-website:build
```

This will generate messages like:

```console
...
> Task :docs-website:yarnGenerate
@@ -219,15 +223,15 @@ BUILD SUCCESSFUL in 35s
36 actionable tasks: 16 executed, 20 up-to-date
```

After this, run the following commands to serve the built site from the `docs-website` module.

```console
cd docs-website
npm run serve
```

Now open http://localhost:3000, or whichever port npm is serving on, to browse the docs.
Your source should show up on the left sidebar under `Metadata Ingestion / Sources`.
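
The preview workflow in this section can be summarized as a short script. The sketch below only assembles the commands quoted above and, by default, prints them rather than executing anything. The names `DocStep`, `preview_steps`, and `run_steps` are hypothetical, and the script assumes it is given the root of a DataHub checkout.

```python
import subprocess
from pathlib import Path
from typing import List, NamedTuple


class DocStep(NamedTuple):
    """One command in the docs preview workflow (hypothetical helper type)."""

    description: str
    cwd: str  # directory relative to the repo root
    argv: List[str]  # command and its arguments


def preview_steps() -> List[DocStep]:
    """Return the commands from this section, in the order they are run."""
    return [
        DocStep("Build the ingestion docs", ".", ["./gradlew", ":metadata-ingestion:docGen"]),
        DocStep("Build the entire documentation", ".", ["./gradlew", ":docs-website:build"]),
        DocStep("Serve the built site", "docs-website", ["npm", "run", "serve"]),
    ]


def run_steps(repo_root: Path, dry_run: bool = True) -> None:
    """Print each step, and execute it only when dry_run is False."""
    for step in preview_steps():
        workdir = repo_root / step.cwd
        print(f"[{step.description}] (in {workdir}) {' '.join(step.argv)}")
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(step.argv, cwd=workdir, check=True)


# Dry run: only prints the commands, so it is safe to try anywhere.
run_steps(Path("."), dry_run=True)
```

With `dry_run=False`, the last step keeps running for as long as the site is being served, so it is deliberately placed last.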

### 8. Add SQLAlchemy mapping (if applicable)
