Vagrant template to provision a standalone Spark cluster with lean defaults. This is a great way to set up a Spark cluster on your laptop that can easily be deleted later without leaving any changes on your machine.
- See `Vagrantfile` for details and to make changes.
- Spark running in standalone cluster mode. Tested with Spark 2.1.x and 2.2.x.
- One Ubuntu 16.04 head node and `N` worker (slave) machines.
To spin up your own local Spark cluster, first clone this repository. Next, download a pre-built Spark package, rename it to `spark.tgz`, and place it in this directory.
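For example, assuming Spark 2.2.1 built for Hadoop 2.7 (the version used by the `SparkPi` example below; any 2.1.x or 2.2.x package should work):

```sh
wget https://archive.apache.org/dist/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz
mv spark-2.2.1-bin-hadoop2.7.tgz spark.tgz
```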
Next, open `Vagrantfile` in a text editor and change the `N_WORKERS` variable near the top of the file. Vagrant will spin up one head node and `N_WORKERS` worker nodes in a Spark standalone cluster. Feel free to make other changes, e.g. the RAM and CPU allotted to each machine.
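As a rough sketch, the relevant part of the `Vagrantfile` looks something like this (only `N_WORKERS` is named in this README; the provider settings below are assumptions based on Vagrant's standard VirtualBox options):

```ruby
# Number of worker nodes to create; one head node (hn0) is always built.
N_WORKERS = 2

Vagrant.configure("2") do |config|
  # Assumed provider settings -- adjust RAM and CPU per machine here.
  config.vm.provider "virtualbox" do |vb|
    vb.memory = 2048  # MB of RAM per VM
    vb.cpus = 2       # virtual CPUs per VM
  end
end
```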
When you're ready, just run `vagrant up` in the directory the `Vagrantfile` is in. Wait a few minutes and your Spark cluster will be ready.
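Once it finishes, you can sanity-check that every machine came up (worker names beyond `wn0` are an assumption based on the naming scheme):

```sh
vagrant status   # hn0 and each worker should show "running"
```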
SSH in using `vagrant ssh hn0` or `vagrant ssh wn0`. You'll also be able to see the Spark WebUI at http://172.28.128.150:8080.
Shut down the cluster with `vagrant halt` and delete it with `vagrant destroy`. You can always run `vagrant up` again to turn it back on or build a brand-new cluster.
To run `SparkPi` on the cluster, run the following commands:

```sh
vagrant ssh hn0
spark-submit --class org.apache.spark.examples.SparkPi ~/spark/examples/jars/spark-examples_2.11-2.2.1.jar 1000
```
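From the head node you can also open an interactive shell against the cluster. A hedged example: the master URL below combines the head node's IP (taken from the WebUI address above) with Spark's default master port, 7077; the WebUI header shows the exact URL to use:

```sh
vagrant ssh hn0
# spark://<head-node-ip>:7077 is assumed from the WebUI address and Spark's default master port.
spark-shell --master spark://172.28.128.150:7077
```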
See the LICENSE.txt file.