Loving the extension! Huge improvement for using engineering best practices and integrating Databricks compute with the larger ecosystem of locally executed tools.
I'd like to see support for executing arbitrary notebook code (not just Spark calls) on remote Databricks clusters. This would allow local developers to seamlessly take advantage of Databricks compute for heavy, non-Spark workflows (model training for example).
Two approaches come to mind:
1. Pipe commands to the Command Execution API, possibly using a local Jupyter kernel to interop between the notebook environment and Databricks.
2. Connect to the driver node's Jupyter kernel over SSH.

Command Execution API

The Databricks Power Tools extension solves this by using the Command Execution API.

I don't know Rust, but as far as I can tell this article, Connecting Jupyter with Databricks, aims to wrap the API with a local Jupyter kernel (which would allow connections to any Jupyter client).
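To make the idea concrete, here's a rough, untested sketch of the flow I'm imagining: create an execution context on the cluster, submit a command, and poll for the result via the 1.2 Command Execution API. The environment variable names and the example command are placeholders; a kernel wrapper would presumably drive the same calls behind the Jupyter messaging protocol.

```python
# Untested sketch of the Command Execution API flow.
# DATABRICKS_HOST / DATABRICKS_TOKEN / DATABRICKS_CLUSTER_ID are placeholder names.
import os
import time
import requests

HOST = os.environ["DATABRICKS_HOST"]  # e.g. https://<workspace>.cloud.databricks.com
HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}
CLUSTER_ID = os.environ["DATABRICKS_CLUSTER_ID"]

# 1. Create a Python execution context on the cluster.
ctx = requests.post(
    f"{HOST}/api/1.2/contexts/create",
    headers=HEADERS,
    json={"clusterId": CLUSTER_ID, "language": "python"},
).json()["id"]

# 2. Submit an arbitrary (non-Spark) command to run on the driver.
cmd = requests.post(
    f"{HOST}/api/1.2/commands/execute",
    headers=HEADERS,
    json={
        "clusterId": CLUSTER_ID,
        "contextId": ctx,
        "language": "python",
        "command": "import platform; print(platform.node())",
    },
).json()["id"]

# 3. Poll until the command finishes, then print whatever it returned.
while True:
    status = requests.get(
        f"{HOST}/api/1.2/commands/status",
        headers=HEADERS,
        params={"clusterId": CLUSTER_ID, "contextId": ctx, "commandId": cmd},
    ).json()
    if status["status"] in ("Finished", "Error", "Cancelled"):
        print(status.get("results"))
        break
    time.sleep(1)
```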
SSH
This seems the most straightforward in terms of net new code required. It also seems essentially identical to the deprecated (for security reasons?) jupyterlab-integration.
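For the SSH route, the tunnelling half could look roughly like the sketch below. It assumes SSH access to the driver is enabled on port 2200 (the port Databricks uses for driver SSH) and that a Jupyter server or kernel is already listening on the driver at 8888; the hostname, key path, and ports are all placeholders. A local Jupyter client would then connect through the forwarded port, which is more or less what jupyterlab-integration automated.

```python
# Untested sketch: forward the driver's Jupyter port to localhost over SSH.
# Assumes driver SSH is enabled (port 2200) and something Jupyter-shaped is
# already running on the driver at 127.0.0.1:8888 -- both are assumptions here.
from sshtunnel import SSHTunnelForwarder

tunnel = SSHTunnelForwarder(
    ("<driver-public-dns-or-ip>", 2200),      # Databricks driver SSH port
    ssh_username="ubuntu",                    # default user for driver SSH
    ssh_pkey="~/.ssh/databricks_driver_key",  # key registered in the cluster config
    remote_bind_address=("127.0.0.1", 8888),  # Jupyter on the driver
    local_bind_address=("127.0.0.1", 8888),   # same port locally
)
tunnel.start()
print(f"Driver Jupyter reachable at http://localhost:{tunnel.local_bind_port}")
# ...point a local Jupyter client at the forwarded port; tunnel.stop() when done.
```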
lukeSmth changed the title to "[Feature] Add support for cluster execution of arbitrary notebook code" on Dec 14, 2023.
@kartikgupta-db - If you have a rough understanding of what would need to change for this to be implemented and would accept a PR, I'd be willing to have a go. I just need some guidance on getting started.