Loving the extension! Huge improvement for using engineering best practices and integrating Databricks compute with the larger ecosystem of locally executed tools.
I'd like to see support for executing arbitrary notebook code (not just Spark calls) on remote Databricks clusters. This would allow local developers to seamlessly take advantage of Databricks compute for heavy, non-Spark workflows (model training for example).
Two approaches come to mind:
1. Pipe commands to the Command Execution API, possibly using a local Jupyter kernel to interop between the notebook environment and Databricks.
2. Connect to the driver node's Jupyter kernel over SSH.

Command Execution API

The Databricks Power Tools extension solves this by using the Command Execution API.

I don't know Rust, but as far as I can tell this article, Connecting Jupyter with Databricks, aims to wrap the API with a local Jupyter kernel (which would allow connections to any Jupyter client).
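To make the idea concrete, here's a rough, untested sketch of the flow I'm imagining: create an execution context on the cluster, submit a command, and poll for the result via the 1.2 Command Execution API. The environment variable names and the example command are placeholders; a kernel wrapper would presumably drive the same calls behind the Jupyter messaging protocol.

```python
# Untested sketch of the Command Execution API flow.
# DATABRICKS_HOST / DATABRICKS_TOKEN / DATABRICKS_CLUSTER_ID are placeholder names.
import os
import time
import requests

HOST = os.environ["DATABRICKS_HOST"]  # e.g. https://<workspace>.cloud.databricks.com
HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}
CLUSTER_ID = os.environ["DATABRICKS_CLUSTER_ID"]

# 1. Create a Python execution context on the cluster.
ctx = requests.post(
    f"{HOST}/api/1.2/contexts/create",
    headers=HEADERS,
    json={"clusterId": CLUSTER_ID, "language": "python"},
).json()["id"]

# 2. Submit an arbitrary (non-Spark) command to run on the driver.
cmd = requests.post(
    f"{HOST}/api/1.2/commands/execute",
    headers=HEADERS,
    json={
        "clusterId": CLUSTER_ID,
        "contextId": ctx,
        "language": "python",
        "command": "import platform; print(platform.node())",
    },
).json()["id"]

# 3. Poll until the command finishes, then print whatever it returned.
while True:
    status = requests.get(
        f"{HOST}/api/1.2/commands/status",
        headers=HEADERS,
        params={"clusterId": CLUSTER_ID, "contextId": ctx, "commandId": cmd},
    ).json()
    if status["status"] in ("Finished", "Error", "Cancelled"):
        print(status.get("results"))
        break
    time.sleep(1)
```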
SSH
This seems the most straightforward in terms of net new code required. It also seems essentially identical to the deprecated (for security reasons?) jupyterlab-integration.
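For the SSH route, the tunnelling half could look roughly like the sketch below. It assumes SSH access to the driver is enabled on port 2200 (the port Databricks uses for driver SSH) and that a Jupyter server or kernel is already listening on the driver at 8888; the hostname, key path, and ports are all placeholders. A local Jupyter client would then connect through the forwarded port, which is more or less what jupyterlab-integration automated.

```python
# Untested sketch: forward the driver's Jupyter port to localhost over SSH.
# Assumes driver SSH is enabled (port 2200) and something Jupyter-shaped is
# already running on the driver at 127.0.0.1:8888 -- both are assumptions here.
from sshtunnel import SSHTunnelForwarder

tunnel = SSHTunnelForwarder(
    ("<driver-public-dns-or-ip>", 2200),      # Databricks driver SSH port
    ssh_username="ubuntu",                    # default user for driver SSH
    ssh_pkey="~/.ssh/databricks_driver_key",  # key registered in the cluster config
    remote_bind_address=("127.0.0.1", 8888),  # Jupyter on the driver
    local_bind_address=("127.0.0.1", 8888),   # same port locally
)
tunnel.start()
print(f"Driver Jupyter reachable at http://localhost:{tunnel.local_bind_port}")
# ...point a local Jupyter client at the forwarded port; tunnel.stop() when done.
```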
lukeSmth changed the title to "[Feature] Add support for cluster execution of arbitrary notebook code" on Dec 14, 2023.
@kartikgupta-db - If you have a rough understanding of what would need to change for this to be implemented and would accept a PR, I'd be willing to have a go. I just need some guidance on getting started.