README: Add sparkmagic instructions
utkarshgupta137 committed Apr 13, 2022
1 parent f49a1f7 commit 0678141
Showing 2 changed files with 22 additions and 3 deletions.
23 changes: 20 additions & 3 deletions README.md
@@ -22,11 +22,11 @@ SparkMonitor is an extension for Jupyter Notebook & Lab that enables the live mo
## Requirements

- Jupyter Lab 3 OR Jupyter Notebook 4.4.0 or higher
- pyspark 2 or 3
- Local pyspark 2/3 or [sparkmagic](https://github.com/jupyter-incubator/sparkmagic) to connect to a remote spark instance

## Features

- Automatically displays a live monitoring tool below cells that run Spark jobs in a Jupyter notebook
- Automatically displays a live monitoring tool below cells that run Spark jobs
- A table of jobs and stages with progressbars
- A timeline which shows jobs, stages, and tasks
- A graph showing number of active tasks & executor cores vs time
@@ -57,6 +57,8 @@ jupyter nbextension enable sparkmonitor --py
# The jupyterlab extension is automatically enabled
```

### Connecting to a local spark instance

With the extension installed, a `SparkConf` object called `conf` will be usable from your notebooks. You can use it as follows:

```python
@@ -75,6 +75,21 @@ spark = SparkSession.builder\
.getOrCreate()
```
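
Most of this code block is collapsed in the diff above; for reference, a minimal sketch of passing the injected `conf` object to a `SparkSession` (the app name is a placeholder):

```python
from pyspark.sql import SparkSession

# `conf` is the SparkConf object injected into the notebook
# namespace by the extension; "MyApp" is a placeholder app name.
spark = SparkSession.builder \
    .config(conf=conf) \
    .appName("MyApp") \
    .getOrCreate()
```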

### Connecting to a remote spark instance via sparkmagic

First, set up sparkmagic and verify that it is working correctly.

Then copy the required jar file to the remote host and set
`spark.driver.extraClassPath` and `spark.extraListeners` as described above, as sketched below.
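
With sparkmagic, one way to apply these settings is through its `%%configure` magic before the session starts. A minimal sketch: the jar path is a placeholder for wherever you copied the jar on the remote host, and the listener class name is assumed to match the one used in the local setup:

```
%%configure -f
{
  "conf": {
    "spark.driver.extraClassPath": "/path/on/remote/host/listener.jar",
    "spark.extraListeners": "sparkmonitor.listener.JupyterSparkMonitorListener"
  }
}
```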

Next, set the `SPARKMONITOR_SERVER_PORT` environment variable for the Jupyter instance, e.g. `SPARKMONITOR_SERVER_PORT=8000 jupyter lab` (the port value is an arbitrary example).

Finally, set two more environment variables for the Spark instance:
- `SPARKMONITOR_KERNEL_HOST`: the host IP of the Jupyter instance
- `SPARKMONITOR_KERNEL_PORT`: the same value as `SPARKMONITOR_SERVER_PORT`

For YARN, you can use [spark.yarn.appMasterEnv](https://spark.apache.org/docs/2.4.6/running-on-yarn.html#spark-properties) to set these variables, as sketched below.
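
A minimal sketch, again via sparkmagic's `%%configure` magic; the host IP is a placeholder and the port matches the example value above:

```
%%configure -f
{
  "conf": {
    "spark.yarn.appMasterEnv.SPARKMONITOR_KERNEL_HOST": "10.0.0.5",
    "spark.yarn.appMasterEnv.SPARKMONITOR_KERNEL_PORT": "8000"
  }
}
```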

## Development

If you'd like to develop the extension:
@@ -114,4 +131,4 @@ This repository is published to pypi as [sparkmonitor](https://pypi.org/project/

- 2.x see the [github releases page](https://github.com/swan-cern/sparkmonitor/releases) of this repository

- 1.x and below were published from [swan-cern/jupyter-extensions](https://github.com/swan-cern/jupyter-extensions) and some initial versions from [krishnan-r/sparkmonitor](https://github.com/krishnan-r/sparkmonitor)
2 changes: 2 additions & 0 deletions sparkmonitor/kernelextension.py
@@ -199,6 +199,8 @@ def load_ipython_extension(ipython):
'swan_spark_conf': conf, # For backward compatibility with fork
}
) # Add to users namespace
elif not port:
logger.warning("Could not import spark. Please see README")


def configure(conf):
