Add example for a job writing to a Unity Catalog volume #51

Merged: 13 commits merged on Dec 20, 2024
1 change: 1 addition & 0 deletions knowledge_base/write_from_job_to_volume/.gitignore
@@ -0,0 +1 @@
.databricks/
20 changes: 20 additions & 0 deletions knowledge_base/write_from_job_to_volume/README.md
@@ -0,0 +1,20 @@
# Save job result to volume

This example demonstrates how to define and use a Unity Catalog Volume in a Databricks Asset Bundle.

Specifically, we define a `hello_world_job` job that writes "Hello World!"
to a file in a Unity Catalog volume.

The bundle also defines the volume that the job writes to, along with the schema that contains it.

## Prerequisites

* Databricks CLI v0.236.0 or above

## Usage

Set the `host` field under `workspace` in `databricks.yml` to the URL of the Databricks workspace you want to deploy to.

Run `databricks bundle deploy` to deploy the job, schema, and volume.

Run `databricks bundle run hello_world_job` to run the job and write its output to the Unity Catalog volume.
12 changes: 12 additions & 0 deletions knowledge_base/write_from_job_to_volume/databricks.yml
@@ -0,0 +1,12 @@
bundle:
  name: write_from_job_to_volume

include:
  - resources/*.yml

workspace:
  host: https://e2-dogfood.staging.cloud.databricks.com

targets:
  dev:
    default: true
@@ -0,0 +1,16 @@
resources:
  jobs:
    hello_world_job:
      name: hello_world_job

      # No job cluster is configured. The job will run on serverless compute.
      # You can explicitly configure job compute here if your workspace does
      # not have serverless compute enabled.
      tasks:
        - task_key: hello_world_job_task
          notebook_task:
            notebook_path: ../src/hello.ipynb

      parameters:
        - name: file_path
          default: /Volumes/main/${resources.schemas.hello_world_schema.name}/${resources.volumes.my_volume.name}/hello_world.txt
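The default `file_path` above relies on bundle interpolation: `${resources.schemas.hello_world_schema.name}` and `${resources.volumes.my_volume.name}` stand for the deployed schema and volume names, and the schema name defined in this bundle itself embeds `${workspace.current_user.short_name}`. The sketch below is a toy resolver, not the Databricks CLI's implementation, that expands such references recursively from a plain dictionary. The short name `jdoe` is a hypothetical value; the result follows the `/Volumes/<catalog>/<schema>/<volume>/<file>` layout of Unity Catalog volume paths.

```python
import re

# Matches ${dotted.key} references, a simplified form of bundle interpolation syntax.
_REF = re.compile(r"\$\{([^}]+)\}")


def resolve(template: str, values: dict, depth: int = 0) -> str:
    """Expand every ${key} in `template`, recursively expanding looked-up values too.

    Raises KeyError for an unknown key and ValueError if nesting gets too deep
    (for example, because of a reference cycle).
    """
    if depth > 10:
        raise ValueError("references nested too deeply; possible cycle")
    return _REF.sub(lambda m: resolve(values[m.group(1)], values, depth + 1), template)


# Hypothetical values; "jdoe" stands in for the deploying user's short name.
values = {
    "workspace.current_user.short_name": "jdoe",
    "resources.schemas.hello_world_schema.name": "${workspace.current_user.short_name}_hello_world",
    "resources.volumes.my_volume.name": "my_volume",
}

default_path = (
    "/Volumes/main/${resources.schemas.hello_world_schema.name}"
    "/${resources.volumes.my_volume.name}/hello_world.txt"
)

print(resolve(default_path, values))
# → /Volumes/main/jdoe_hello_world/my_volume/hello_world.txt
```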
@@ -0,0 +1,5 @@
resources:
  schemas:
    hello_world_schema:
      catalog_name: main
      name: ${workspace.current_user.short_name}_hello_world
@@ -0,0 +1,9 @@
resources:
  volumes:
    my_volume:
      catalog_name: main
      # We use the ${resources.schemas...} interpolation syntax to force the creation
      # of the schema before the volume. Using the ${resources.schemas...} syntax
      # lets Databricks Asset Bundles build a dependency graph between resources.
      schema_name: ${resources.schemas.hello_world_schema.name}
      name: my_volume
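The comment above says that referencing `${resources.schemas...}` lets Databricks Asset Bundles order resource creation. As a rough model of that idea, and not how the CLI actually implements it, the sketch below scans a parsed copy of this bundle's resources for `${resources.<type>.<key>...}` references and sorts them with Python's `graphlib.TopologicalSorter`, so each resource comes after the resources it references. The `resources` dictionary is a hand-written, simplified stand-in for the parsed YAML files.

```python
import re
from graphlib import TopologicalSorter

# Captures <type> and <key> from references like ${resources.schemas.hello_world_schema.name}.
_RESOURCE_REF = re.compile(r"\$\{resources\.(\w+)\.(\w+)\.[^}]*\}")


def find_refs(value):
    """Yield (type, key) pairs for every ${resources...} reference in a nested config value."""
    if isinstance(value, str):
        for m in _RESOURCE_REF.finditer(value):
            yield (m.group(1), m.group(2))
    elif isinstance(value, dict):
        for v in value.values():
            yield from find_refs(v)
    elif isinstance(value, list):
        for v in value:
            yield from find_refs(v)


def creation_order(resources):
    """Return (type, key) IDs so every resource follows the ones it references."""
    graph = {}
    for rtype, items in resources.items():
        for key, config in items.items():
            # Map each resource to the set of resources it depends on.
            graph[(rtype, key)] = set(find_refs(config))
    return list(TopologicalSorter(graph).static_order())


# Simplified, hand-written copy of this bundle's resources.
resources = {
    "jobs": {
        "hello_world_job": {
            "parameters": [{
                "name": "file_path",
                "default": "/Volumes/main/${resources.schemas.hello_world_schema.name}"
                           "/${resources.volumes.my_volume.name}/hello_world.txt",
            }],
        },
    },
    "volumes": {
        "my_volume": {
            "catalog_name": "main",
            "schema_name": "${resources.schemas.hello_world_schema.name}",
            "name": "my_volume",
        },
    },
    "schemas": {
        "hello_world_schema": {
            "catalog_name": "main",
            "name": "${workspace.current_user.short_name}_hello_world",
        },
    },
}

print(creation_order(resources))
```

The schema comes first, the volume second, and the job last, because the job's `file_path` default references both the schema and the volume.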
21 changes: 21 additions & 0 deletions knowledge_base/write_from_job_to_volume/src/hello.ipynb
@@ -0,0 +1,21 @@
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "file_path = dbutils.widgets.get(\"file_path\")\n",
Contributor

This doesn't actually work, does it, without a `dbutils.widgets.text()` call and/or a widgets section in the ipynb JSON below?

Contributor Author

Works fine:

[Screenshot, 2024-12-09 10:18:18 PM]
(.venv) ➜  cli git:(detect/schema-dep) databricks fs cat dbfs:/Volumes/main/shreyas_goenka_hello_world/my_volume/hello_world.txt -p dogfood
Hello World!%

    "dbutils.fs.put(file_path, \"Hello World!\", overwrite=True)"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
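The notebook reads `file_path` with `dbutils.widgets.get` without first calling `dbutils.widgets.text()`, which the review thread questions; the author's run shows it works, because job parameters are passed down to notebook tasks and read through widgets. Unity Catalog volumes are also reachable as ordinary file paths under `/Volumes/...` on Databricks compute, so the write step can be expressed with plain Python file I/O. The sketch below is an illustrative alternative to `dbutils.fs.put(file_path, "Hello World!", overwrite=True)`, not the PR's code; `write_hello` is a hypothetical helper, and a temporary directory stands in for the volume so the example runs anywhere.

```python
import tempfile
from pathlib import Path


def write_hello(file_path: str, text: str = "Hello World!") -> None:
    """Write `text` to `file_path`, replacing any existing content.

    Plain-Python counterpart to dbutils.fs.put(file_path, text, overwrite=True).
    On Databricks compute, file_path could be a /Volumes/<catalog>/<schema>/<volume>/...
    path; elsewhere it is just a local path. No trailing newline is added, and
    the parent directory must already exist.
    """
    Path(file_path).write_text(text)


# A temporary directory stands in for /Volumes/main/<schema>/my_volume.
with tempfile.TemporaryDirectory() as volume_dir:
    path = Path(volume_dir) / "hello_world.txt"
    write_hello(str(path), "old contents")
    write_hello(str(path))  # Overwrites the earlier contents instead of appending.
    content = path.read_text()

print(content)  # → Hello World!
```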