README: Add sparkmagic instructions
utkarshgupta137 committed Apr 13, 2022
1 parent f49a1f7 commit 0678141
Showing 2 changed files with 22 additions and 3 deletions.
23 changes: 20 additions & 3 deletions README.md
@@ -22,11 +22,11 @@ SparkMonitor is an extension for Jupyter Notebook & Lab that enables the live mo
## Requirements

- Jupyter Lab 3 OR Jupyter Notebook 4.4.0 or higher
- pyspark 2 or 3
- Local pyspark 2/3 or [sparkmagic](https://github.com/jupyter-incubator/sparkmagic) to connect to a remote spark instance

## Features

- Automatically displays a live monitoring tool below cells that run Spark jobs in a Jupyter notebook
- Automatically displays a live monitoring tool below cells that run Spark jobs
- A table of jobs and stages with progressbars
- A timeline which shows jobs, stages, and tasks
- A graph showing number of active tasks & executor cores vs time
@@ -57,6 +57,8 @@ jupyter nbextension enable sparkmonitor --py
# The jupyterlab extension is automatically enabled
```

### Connecting to a local spark instance

With the extension installed, a `SparkConf` object called `conf` will be usable from your notebooks. You can use it as follows:

```python
@@ -75,6 +75,21 @@ spark = SparkSession.builder\
.getOrCreate()
```
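
Most of this code block is collapsed in the diff above; for reference, a minimal sketch of passing the injected `conf` object to a `SparkSession` (the app name is a placeholder):

```python
from pyspark.sql import SparkSession

# `conf` is the SparkConf object injected into the notebook
# namespace by the extension; "MyApp" is a placeholder app name.
spark = SparkSession.builder \
    .config(conf=conf) \
    .appName("MyApp") \
    .getOrCreate()
```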

### Connecting to a remote spark instance via sparkmagic

First, set up sparkmagic and verify that it is working correctly.

Then copy the required jar file to the remote host and set
`spark.driver.extraClassPath` and `spark.extraListeners` as described above, as sketched below.
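
With sparkmagic, one way to apply these settings is through its `%%configure` magic before the session starts. A minimal sketch: the jar path is a placeholder for wherever you copied the jar on the remote host, and the listener class name is assumed to match the one used in the local setup:

```
%%configure -f
{
  "conf": {
    "spark.driver.extraClassPath": "/path/on/remote/host/listener.jar",
    "spark.extraListeners": "sparkmonitor.listener.JupyterSparkMonitorListener"
  }
}
```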

Next, set the `SPARKMONITOR_SERVER_PORT` environment variable for the Jupyter instance, e.g. `SPARKMONITOR_SERVER_PORT=8000 jupyter lab` (the port value is an arbitrary example).

Finally, set two more environment variables for the Spark instance:
- `SPARKMONITOR_KERNEL_HOST`: the host IP of the Jupyter instance
- `SPARKMONITOR_KERNEL_PORT`: the same value as `SPARKMONITOR_SERVER_PORT`

For YARN, you can use [spark.yarn.appMasterEnv](https://spark.apache.org/docs/2.4.6/running-on-yarn.html#spark-properties) to set these variables, as sketched below.
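
A minimal sketch, again via sparkmagic's `%%configure` magic; the host IP is a placeholder and the port matches the example value above:

```
%%configure -f
{
  "conf": {
    "spark.yarn.appMasterEnv.SPARKMONITOR_KERNEL_HOST": "10.0.0.5",
    "spark.yarn.appMasterEnv.SPARKMONITOR_KERNEL_PORT": "8000"
  }
}
```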

## Development

If you'd like to develop the extension:
@@ -114,4 +131,4 @@ This repository is published to pypi as [sparkmonitor](https://pypi.org/project/

- 2.x see the [github releases page](https://github.com/swan-cern/sparkmonitor/releases) of this repository

- 1.x and below were published from [swan-cern/jupyter-extensions](https://github.com/swan-cern/jupyter-extensions) and some initial versions from [krishnan-r/sparkmonitor](https://github.com/krishnan-r/sparkmonitor)
2 changes: 2 additions & 0 deletions sparkmonitor/kernelextension.py
@@ -199,6 +199,8 @@ def load_ipython_extension(ipython):
'swan_spark_conf': conf, # For backward compatibility with fork
}
) # Add to users namespace
elif not port:
logger.warning("Could not import spark. Please see README")


def configure(conf):
