Merge branch 'master' into jackson-upgrade
david-leifker authored Aug 25, 2023
2 parents cde66bb + 04ecf4f commit aa1464f
Showing 298 changed files with 1,446 additions and 206 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -69,6 +69,7 @@ metadata-ingestion/generated/**

# docs
docs/generated/
docs-website/versioned_docs/
tmp*
temp/**

6 changes: 5 additions & 1 deletion README.md
@@ -80,7 +80,11 @@ Please follow the [DataHub Quickstart Guide](https://datahubproject.io/docs/quic

If you're looking to build & modify datahub please take a look at our [Development Guide](https://datahubproject.io/docs/developers).

[![DataHub Demo GIF](docs/imgs/entity.png)](https://demo.datahubproject.io/)
<p align="center">
<a href="https://demo.datahubproject.io/">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/entity.png"/>
</a>
</p>

## Source Code and Repositories

4 changes: 3 additions & 1 deletion datahub-web-react/README.md
@@ -126,7 +126,9 @@ for functional configurability should reside.
to render a view associated with a particular entity type (user, dataset, etc.).


![entity-registry](./entity-registry.png)
<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/entity-registry.png"/>
</p>

**graphql** - The React App talks to the `datahub-frontend` server using GraphQL. This module is where the *queries* issued
against the server are defined. Once defined, running `yarn run generate` will code-gen TypeScript objects to make invoking
66 changes: 55 additions & 11 deletions docker/airflow/local_airflow.md
@@ -138,35 +138,75 @@ Successfully added `conn_id`=datahub_rest_default : datahub_rest://:@http://data

Navigate the Airflow UI to find the sample Airflow dag we just brought in

![Find the DAG](../../docs/imgs/airflow/find_the_dag.png)

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/airflow/find_the_dag.png"/>
</p>


By default, Airflow loads all DAG-s in paused status. Unpause the sample DAG to use it.
![Paused DAG](../../docs/imgs/airflow/paused_dag.png)
![Unpaused DAG](../../docs/imgs/airflow/unpaused_dag.png)

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/airflow/paused_dag.png"/>
</p>


<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/airflow/unpaused_dag.png"/>
</p>


Then trigger the DAG to run.

![Trigger the DAG](../../docs/imgs/airflow/trigger_dag.png)

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/airflow/trigger_dag.png"/>
</p>


After the DAG runs successfully, go over to your DataHub instance to see the Pipeline and navigate its lineage.

![DataHub Pipeline View](../../docs/imgs/airflow/datahub_pipeline_view.png)

![DataHub Pipeline Entity](../../docs/imgs/airflow/datahub_pipeline_entity.png)
<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/airflow/datahub_pipeline_view.png"/>
</p>



<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/airflow/datahub_pipeline_entity.png"/>
</p>

![DataHub Task View](../../docs/imgs/airflow/datahub_task_view.png)

![DataHub Lineage View](../../docs/imgs/airflow/datahub_lineage_view.png)

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/airflow/datahub_task_view.png"/>
</p>



<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/airflow/datahub_lineage_view.png"/>
</p>


## TroubleShooting

Most issues are related to connectivity between Airflow and DataHub.

Here is how you can debug them.

![Find the Task Log](../../docs/imgs/airflow/finding_failed_log.png)

![Inspect the Log](../../docs/imgs/airflow/connection_error.png)
<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/airflow/finding_failed_log.png"/>
</p>



<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/airflow/connection_error.png"/>
</p>


In this case, clearly the connection `datahub-rest` has not been registered. Looks like we forgot to register the connection with Airflow!
Let's execute Step 4 to register the datahub connection with Airflow.
@@ -175,4 +215,8 @@ In case the connection was registered successfully but you are still seeing `Fai

After re-running the DAG, we see success!

![Pipeline Success](../../docs/imgs/airflow/successful_run.png)

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/airflow/successful_run.png"/>
</p>

9 changes: 7 additions & 2 deletions docs-website/build.gradle
@@ -77,7 +77,12 @@ task yarnGenerate(type: YarnTask, dependsOn: [yarnInstall,
args = ['run', 'generate']
}

task yarnStart(type: YarnTask, dependsOn: [yarnInstall, yarnGenerate]) {
task downloadHistoricalVersions(type: Exec) {
workingDir '.'
commandLine 'python3', 'download_historical_versions.py'
}

task yarnStart(type: YarnTask, dependsOn: [yarnInstall, yarnGenerate, downloadHistoricalVersions]) {
args = ['run', 'start']
}
task fastReload(type: YarnTask) {
@@ -105,7 +110,7 @@ task serve(type: YarnTask, dependsOn: [yarnInstall] ) {
}


task yarnBuild(type: YarnTask, dependsOn: [yarnLint, yarnGenerate]) {
task yarnBuild(type: YarnTask, dependsOn: [yarnLint, yarnGenerate, downloadHistoricalVersions]) {
inputs.files(projectMdFiles)
inputs.file("package.json").withPathSensitivity(PathSensitivity.RELATIVE)
inputs.dir("src").withPathSensitivity(PathSensitivity.RELATIVE)
5 changes: 5 additions & 0 deletions docs-website/docusaurus.config.js
@@ -69,6 +69,11 @@ module.exports = {
label: "Roadmap",
position: "right",
},
{
type: 'docsVersionDropdown',
position: 'right',
dropdownActiveClassDisabled: true,
},
{
href: "https://slack.datahubproject.io",
"aria-label": "Slack",
60 changes: 60 additions & 0 deletions docs-website/download_historical_versions.py
@@ -0,0 +1,60 @@
import os
import tarfile
import urllib.request
import json

repo_url = "https://api.github.com/repos/datahub-project/static-assets"


def download_file(url, destination):
    with urllib.request.urlopen(url) as response:
        with open(destination, "wb") as f:
            while True:
                chunk = response.read(8192)
                if not chunk:
                    break
                f.write(chunk)


def fetch_tar_urls(repo_url, folder_path):
    api_url = f"{repo_url}/contents/{folder_path}"
    response = urllib.request.urlopen(api_url)
    data = response.read().decode('utf-8')
    tar_urls = [
        file["download_url"] for file in json.loads(data) if file["name"].endswith(".tar.gz")
    ]
    print(tar_urls)
    return tar_urls


def main():
    folder_path = "versioned_docs"
    destination_dir = "versioned_docs"
    if not os.path.exists(destination_dir):
        os.makedirs(destination_dir)

    tar_urls = fetch_tar_urls(repo_url, folder_path)

    for url in tar_urls:
        filename = os.path.basename(url)
        destination_path = os.path.join(destination_dir, filename)

        version = '.'.join(filename.split('.')[:3])
        extracted_path = os.path.join(destination_dir, version)
        print("extracted_path", extracted_path)
        if os.path.exists(extracted_path):
            print(f"{extracted_path} already exists, skipping downloads")
            continue
        try:
            download_file(url, destination_path)
            print(f"Downloaded {filename} to {destination_dir}")
            with tarfile.open(destination_path, "r:gz") as tar:
                tar.extractall()
            os.remove(destination_path)
        except urllib.error.URLError as e:
            print(f"Error while downloading {filename}: {e}")
            continue


if __name__ == "__main__":
    main()
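
The two pure-logic steps of `download_historical_versions.py` can be exercised offline. The following is a minimal sketch, not part of the commit: it mirrors how the script filters the contents listing for `.tar.gz` entries and how it derives the extracted directory name from a tarball name. The helper names `select_tar_urls` and `version_from_filename`, the `sample_listing` payload, and the `example.invalid` URLs are assumptions made for illustration; only the `name` and `download_url` keys come from the script itself.

```python
import json
import os

# Hypothetical payload shaped like the listing the script parses;
# the file names and URLs are illustrative assumptions.
sample_listing = json.dumps([
    {
        "name": "version-0.10.5.tar.gz",
        "download_url": "https://example.invalid/versioned_docs/version-0.10.5.tar.gz",
    },
    {
        "name": "README.md",
        "download_url": "https://example.invalid/versioned_docs/README.md",
    },
])


def select_tar_urls(listing_json):
    """Keep only the download URLs of .tar.gz entries, as fetch_tar_urls does."""
    return [
        entry["download_url"]
        for entry in json.loads(listing_json)
        if entry["name"].endswith(".tar.gz")
    ]


def version_from_filename(filename):
    """Mirror the script's rule: join the first three dot-separated parts."""
    return ".".join(filename.split(".")[:3])


tar_urls = select_tar_urls(sample_listing)
names = [version_from_filename(os.path.basename(url)) for url in tar_urls]
print(names)  # ['version-0.10.5']
```

Because the rule always keeps three dot-separated parts, a tarball with a two-part version such as `version-1.0.tar.gz` would map to `version-1.0.tar`, so the "already exists" check in the script would likely only behave as intended for three-part version names.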
15 changes: 10 additions & 5 deletions docs-website/src/pages/docs/_components/SearchBar/index.jsx
@@ -303,11 +303,16 @@ function SearchBar() {
strokeLinejoin="round"
></path>
</svg>

{docsSearchVersionsHelpers.versioningEnabled && <SearchVersionSelectList docsSearchVersionsHelpers={docsSearchVersionsHelpers} />}
</form>

<div className={styles.searchResultsColumn}>{!!searchResultState.totalResults && documentsFoundPlural(searchResultState.totalResults)}</div>
{docsSearchVersionsHelpers.versioningEnabled && (
<SearchVersionSelectList
docsSearchVersionsHelpers={docsSearchVersionsHelpers}
/>
)}
<div className={styles.searchResultsColumn}>
{!!searchResultState.totalResults &&
documentsFoundPlural(searchResultState.totalResults)}
</div>

{searchResultState.items.length > 0 ? (
<main>
@@ -369,4 +374,4 @@ function SearchBar() {
);
}

export default SearchBar;
export default SearchBar;
@@ -21,13 +21,21 @@
height: 1.5rem;
}

.searchQueryInput {
padding: 0.8rem 0.8rem 0.8rem 3rem;
}

.searchVersionInput {
padding: 0.8rem 2rem 0.8rem 2rem;
text-align: center;
}

.searchQueryInput,
.searchVersionInput {
border-radius: 1000em;
border-style: solid;
border-color: transparent;
font: var(--ifm-font-size-base) var(--ifm-font-family-base);
padding: 0.8rem 0.8rem 0.8rem 3rem;
width: 100%;
background: var(--docsearch-searchbox-background);
color: var(--docsearch-text-color);
@@ -93,6 +101,7 @@
@media only screen and (max-width: 996px) {
.searchVersionColumn {
max-width: 40% !important;
margin: auto;
}

.searchResultsColumn {
@@ -113,9 +122,15 @@
.searchVersionColumn {
max-width: 100% !important;
padding-left: var(--ifm-spacing-horizontal) !important;
margin: auto;
}
}

.searchVersionColumn {
margin: auto;
}


.loadingSpinner {
width: 3rem;
height: 3rem;