Interactive docker image #709
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
FROM mcr.microsoft.com/dotnet-spark:2.4.6-0.12.1-interactive |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# .NET for Apache Spark Interactive | ||
|
||
This interactive notebook allows you to explore .NET for Apache Spark in your web browser. | ||
|
||
To launch it, just click the button below: | ||
|
||
[![Binder](./dotnet-spark-binder.svg)](https://mybinder.org/v2/gh/indy-3rdman/spark/docker_images_init?urlpath=lab/tree/nb/) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
# .NET for Apache Spark interactive Docker image | ||
|
||
## Description | ||
|
||
This directory contains the source code to build an interactive Docker image that uses [jupyter/base-notebook](https://hub.docker.com/r/jupyter/base-notebook) as its foundation. | ||
|
||
## Building | ||
|
||
To build the image, just execute the [build.sh](build.sh) bash script. By default, it builds an image using the latest supported versions of .NET Core, Apache Spark and .NET for Apache Spark. | ||
|
||
You can also build for different versions by specifying one or both of the following options: | ||
|
||
```bash | ||
-a, --apache-spark | ||
-d, --dotnet-spark | ||
``` | ||
|
||
For more details, please run: | ||
|
||
```bash | ||
build.sh -h | ||
``` | ||
|
||
Please note that not all version combinations are supported. | ||
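
As an illustration, a build for a non-default Apache Spark version might be invoked as sketched below. The version numbers are taken from the supported lists in build.sh; the `build_cmd` array is a hypothetical dry-run aid and not part of the script.

```bash
#!/usr/bin/env bash
# Assemble the build command as an array so it can be inspected before running.
# 2.4.7 and 1.0.0 come from the supported version lists in build.sh.
build_cmd=(./build.sh --apache-spark 2.4.7 --dotnet-spark 1.0.0)

# Print the command; replace the echo with "${build_cmd[@]}" to start the build.
echo "${build_cmd[*]}"
```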
|
||
## The image build stages | ||
|
||
Splitting the build into stages makes it efficient to build multiple images that share the same .NET Core SDK but use different .NET for Apache Spark or Apache Spark versions. | ||
Shared dependencies (e.g. the .NET Core SDK) are then downloaded only once instead of for every image version, which saves time and bandwidth. | ||
|
||
The three stages used in the build process are: | ||
|
||
- ### **dotnet-interactive** | ||
|
||
Builds on the jupyter/base-notebook image and installs the .NET Core SDK, along with Microsoft.DotNet.Interactive. | ||
|
||
- ### **dotnet-spark-base (interactive)** | ||
|
||
Adds the specified .NET for Apache Spark version to the dotnet-interactive image and also copies/builds the HelloSpark example into the image. HelloSpark is also used to install the correct microsoft-spark-*.jar version that is required to start a spark-submit session in debug mode. | ||
|
||
- ### **dotnet-spark (interactive)** | ||
|
||
Gets/installs the specified Apache Spark version and adds the example notebooks. | ||
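
To make the reuse between stages concrete, the sketch below prints the image tag each stage would produce for two Apache Spark versions. The tag naming follows build.sh in this pull request; `stage_tags` itself is a hypothetical helper used only for illustration.

```bash
#!/usr/bin/env bash
# Print the image tag produced by each of the three build stages.
stage_tags() {
  local dotnet_core_version="$1" dotnet_spark_version="$2" apache_spark_version="$3"
  local image_repository="${4:-3rdman}"

  echo "dotnet-interactive:${dotnet_core_version}"
  echo "dotnet-spark-base-interactive:${dotnet_spark_version}"
  echo "${image_repository}/dotnet-spark:${dotnet_spark_version}-${apache_spark_version}-interactive"
}

# Both builds share the first two tags; only the last stage differs.
stage_tags 3.1 1.0.0 2.4.7
stage_tags 3.1 1.0.0 3.0.1
```

Because the first two tags are identical for both calls, Docker can presumably reuse those images and only rebuild the final stage.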
|
||
## Docker Run Example | ||
|
||
To start a new container based on the dotnet-spark interactive image, just run the following command. | ||
|
||
```bash | ||
docker run --name dotnet-spark-interactive -d -p 8888:8888 3rdman/dotnet-spark:interactive-latest | ||
``` | ||
|
||
After that, examine the logs of the container to get the correct URL that is required to connect to Jupyter using the authentication token. | ||
|
||
```bash | ||
docker logs -f dotnet-spark-interactive | ||
``` | ||
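
If you prefer not to scan the log output by eye, a small filter along the following lines could pull out the URL. This is a sketch only: `extract_jupyter_url` is a hypothetical helper, the sample log line and token are made up, and it assumes the URL is logged as `http://...token=...`.

```bash
#!/usr/bin/env bash
# Print the first URL that carries a token from log text read on stdin.
extract_jupyter_url() {
  grep -o 'http://[^[:space:]]*token=[^[:space:]]*' | head -n 1
}

# Typical use (needs the running container from the example above):
#   docker logs dotnet-spark-interactive 2>&1 | extract_jupyter_url

# Demonstration with a made-up log line; the token is not real.
sample_log='    or http://127.0.0.1:8888/?token=0123abcd'
printf '%s\n' "${sample_log}" | extract_jupyter_url
```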
|
||
![launch](img/dotnet-interactive-docker-launch.gif) | ||
|
||
It is important to start the .NET for Apache Spark backend in debug mode first, before using it in any of the notebooks. | ||
|
||
The helper script start-spark-debug.sh can do this for you, as demonstrated below. | ||
|
||
![debug](img/dotnet-interactive-start-debug.gif) | ||
|
||
Once the backend is running, please open 02-basic-example.ipynb to learn how you can use .NET for Apache Spark in your own notebooks. | ||
|
||
![example](img/dotnet-interactive-basic-example.gif) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,251 @@ | ||
#!/usr/bin/env bash | ||
|
||
# Create different versions of the .NET for Apache Spark interactive docker image | ||
# based on the Apache Spark and .NET for Apache Spark version. | ||
|
||
set -o errexit # abort on nonzero exitstatus | ||
set -o nounset # abort on unbound variable | ||
set -o pipefail # don't hide errors within pipes | ||
|
||
readonly image_repository='3rdman' | ||
readonly supported_apache_spark_versions=( | ||
"2.3.0" "2.3.1" "2.3.2" "2.3.3" "2.3.4" | ||
"2.4.0" "2.4.1" "2.4.3" "2.4.4" "2.4.5" "2.4.6" "2.4.7" | ||
"3.0.0" "3.0.1" | ||
) | ||
readonly supported_dotnet_spark_versions=("1.0.0") | ||
readonly dotnet_core_version=3.1 | ||
|
||
dotnet_spark_version=1.0.0 | ||
dotnet_spark_jar="" | ||
apache_spark_version=3.0.1 | ||
apache_spark_short_version="${apache_spark_version:0:3}" | ||
|
||
main() { | ||
# Parse the options and set the related variables | ||
while [[ "$#" -gt 0 ]]; do | ||
case $1 in | ||
-a|--apache-spark) opt_check_apache_spark_version "$2"; shift ;; | ||
-d|--dotnet-spark) opt_check_dotnet_spark_version "$2"; shift ;; | ||
-h|--help) print_help | ||
exit 0 ;; | ||
*) echo "Unknown parameter passed: $1"; exit 1 ;; | ||
esac | ||
shift | ||
done | ||
|
||
echo "Building .NET for Apache Spark ${dotnet_spark_version} interactive image with Apache Spark ${apache_spark_version}" | ||
|
||
# execute the different build stages | ||
cleanup | ||
|
||
set_dotnet_spark_jar | ||
build_dotnet_interactive | ||
build_dotnet_spark_base_interactive | ||
build_dotnet_spark_interactive | ||
|
||
trap finish EXIT ERR | ||
|
||
exit 0 | ||
} | ||
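
One detail of the option loop worth noting: with `set -o nounset` active, passing `-a` or `-d` without a value makes the script abort on the unbound `$2`. A minimal sketch of a friendlier check is shown below; `has_option_value` and `parse_demo` are hypothetical names and not part of build.sh.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if an option is followed by a value.
has_option_value() {
  # $1 is the option name, $2 (if present) is its value.
  [[ "$#" -ge 2 && -n "${2:-}" && "${2:-}" != -* ]]
}

# Reduced version of the option loop, handling a single option.
parse_demo() {
  local spark="3.0.1"
  while [[ "$#" -gt 0 ]]; do
    case "$1" in
      -a|--apache-spark)
        if ! has_option_value "$@"; then
          echo "Option $1 requires a version number." >&2
          return 1
        fi
        spark="$2"; shift ;;
      *) echo "Unknown parameter passed: $1" >&2; return 1 ;;
    esac
    shift
  done
  echo "${spark}"
}

parse_demo -a 2.4.7
```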
|
||
####################################### | ||
# Checks if the provided Apache Spark version number is supported | ||
# Arguments: | ||
# The version number string | ||
# Result: | ||
# Sets the global variable apache_spark_version if supported, | ||
# otherwise exits with a related message | ||
####################################### | ||
opt_check_apache_spark_version() { | ||
local provided_version="${1}" | ||
local valid_version="" | ||
|
||
for value in "${supported_apache_spark_versions[@]}" | ||
do | ||
[[ "${provided_version}" = "$value" ]] && valid_version="${provided_version}" | ||
done | ||
|
||
if [ -z "${valid_version}" ] | ||
then | ||
echo "${provided_version} is an unsupported Apache Spark version." | ||
exit 1 ; | ||
else | ||
apache_spark_version="${valid_version}" | ||
apache_spark_short_version="${apache_spark_version:0:3}" | ||
fi | ||
} | ||
|
||
####################################### | ||
# Checks if the provided .NET for Apache Spark version number is supported | ||
# Arguments: | ||
# The version number string | ||
# Result: | ||
# Sets the global variable dotnet_spark_version if supported, | ||
# otherwise exits with a related message | ||
####################################### | ||
opt_check_dotnet_spark_version() { | ||
local provided_version="${1}" | ||
local valid_version="" | ||
|
||
for value in "${supported_dotnet_spark_versions[@]}" | ||
do | ||
[[ "${provided_version}" = "$value" ]] && valid_version="${provided_version}" | ||
done | ||
|
||
if [ -z "${valid_version}" ] | ||
then | ||
echo "${provided_version} is an unsupported .NET for Apache Spark version." | ||
exit 1 ; | ||
else | ||
dotnet_spark_version="${valid_version}" | ||
fi | ||
} | ||
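
The two check functions above differ only in the list they search and the variable they set. A minimal sketch of a shared membership test is shown below; `is_supported_version` is a hypothetical helper, and the version list is shortened for the demonstration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: return 0 if the first argument equals any later argument.
is_supported_version() {
  local provided_version="${1}"
  shift
  local value
  for value in "$@"; do
    [[ "${provided_version}" == "${value}" ]] && return 0
  done
  return 1
}

# Shortened list, for the demo only.
supported_apache_spark_versions=("2.4.7" "3.0.0" "3.0.1")

if is_supported_version "3.0.1" "${supported_apache_spark_versions[@]}"; then
  echo "3.0.1 is supported"
fi
if ! is_supported_version "2.4.2" "${supported_apache_spark_versions[@]}"; then
  echo "2.4.2 is an unsupported Apache Spark version."
fi
```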
|
||
####################################### | ||
# Replaces every occurrence of search_string by replacement_string in a file | ||
# Arguments: | ||
# The file name | ||
# The string to search for | ||
# The string to replace the search string with | ||
# Result: | ||
# An updated file with the replaced string | ||
####################################### | ||
replace_text_in_file() { | ||
local filename=${1} | ||
local search_string=${2} | ||
local replacement_string=${3} | ||
|
||
sh -c 'sed -i.bak "s/$1/$2/g" "$3" && rm "$3.bak"' _ "${search_string}" "${replacement_string}" "${filename}" | ||
} | ||
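
The following sketch shows the same `sed -i.bak` technique on a throwaway file. It is a simplified variant without the `sh -c` wrapper, and the file content is made up. Note that both strings end up inside a sed `s/.../.../` expression, so a `/` has to be escaped (as the `<\/TargetFramework>` arguments further down do) and an unescaped `.` matches any character.

```bash
#!/usr/bin/env bash
# Simplified variant of replace_text_in_file: in-place edit with a backup
# suffix (accepted by both GNU and BSD sed), then removal of the backup.
replace_text_in_file() {
  local filename=${1} search_string=${2} replacement_string=${3}
  sed -i.bak "s/${search_string}/${replacement_string}/g" "${filename}" && rm "${filename}.bak"
}

demo_file="$(mktemp)"
echo 'dotnet build -f netcoreappX.X' > "${demo_file}"

# The search string is a regular expression, so the dot is escaped here.
replace_text_in_file "${demo_file}" "netcoreappX\.X" "netcoreapp3.1"
cat "${demo_file}"
rm -f "${demo_file}"
```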
|
||
####################################### | ||
# Sets the microsoft-spark JAR name based on the Apache Spark version | ||
####################################### | ||
set_dotnet_spark_jar() { | ||
local scala_version="2.11" | ||
local short_spark_version="${apache_spark_short_version//./-}" | ||
|
||
case "${apache_spark_version:0:1}" in | ||
2) scala_version=2.11 ;; | ||
3) scala_version=2.12 ;; | ||
esac | ||
|
||
dotnet_spark_jar="microsoft-spark-${short_spark_version}_${scala_version}-${dotnet_spark_version}.jar" | ||
} | ||
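
As a worked example of the naming rule, the sketch below derives the JAR name for two version combinations. `dotnet_spark_jar_name` is a hypothetical, argument-based rewrite of the function above that prints the name instead of setting a global variable.

```bash
#!/usr/bin/env bash
# Derive the microsoft-spark JAR name the same way set_dotnet_spark_jar does.
dotnet_spark_jar_name() {
  local apache_spark_version="${1}" dotnet_spark_version="${2}"
  local short_spark_version="${apache_spark_version:0:3}"   # e.g. 3.0.1 -> 3.0
  local scala_version="2.11"

  # Spark 3.x builds use Scala 2.12; Spark 2.x builds use Scala 2.11.
  case "${apache_spark_version:0:1}" in
    3) scala_version="2.12" ;;
  esac

  echo "microsoft-spark-${short_spark_version//./-}_${scala_version}-${dotnet_spark_version}.jar"
}

dotnet_spark_jar_name 2.4.7 1.0.0   # microsoft-spark-2-4_2.11-1.0.0.jar
dotnet_spark_jar_name 3.0.1 1.0.0   # microsoft-spark-3-0_2.12-1.0.0.jar
```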
|
||
####################################### | ||
# Runs the docker build command with the related build arguments | ||
# Arguments: | ||
# The image name (incl. tag) | ||
# Result: | ||
# A local docker image with the specified name | ||
####################################### | ||
build_image() { | ||
local image_name="${1}" | ||
local build_args="--build-arg dotnet_core_version=${dotnet_core_version} | ||
--build-arg dotnet_spark_version=${dotnet_spark_version} | ||
--build-arg SPARK_VERSION=${apache_spark_version} | ||
--build-arg DOTNET_SPARK_JAR=${dotnet_spark_jar}" | ||
local cmd="docker build ${build_args} -t ${image_name} ." | ||
|
||
echo "Building ${image_name}" | ||
|
||
${cmd} | ||
} | ||
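
`build_image` relies on word splitting of an unquoted string, which works as long as no argument contains whitespace. An alternative, shown here only as a sketch, is to assemble the command as a bash array. `docker_build_command` is a hypothetical name, and the function prints the arguments instead of invoking docker so that it can be tried without a Docker daemon.

```bash
#!/usr/bin/env bash
# Assemble the docker build command as an array instead of a flat string.
docker_build_command() {
  local image_name="${1}"
  local cmd=(docker build
    --build-arg "dotnet_core_version=${dotnet_core_version}"
    --build-arg "dotnet_spark_version=${dotnet_spark_version}"
    --build-arg "SPARK_VERSION=${apache_spark_version}"
    --build-arg "DOTNET_SPARK_JAR=${dotnet_spark_jar}"
    -t "${image_name}" .)

  # Print one argument per line; run "${cmd[@]}" instead to start the build.
  printf '%s\n' "${cmd[@]}"
}

# Default values as used in build.sh.
dotnet_core_version=3.1
dotnet_spark_version=1.0.0
apache_spark_version=3.0.1
dotnet_spark_jar="microsoft-spark-3-0_2.12-1.0.0.jar"

docker_build_command "dotnet-interactive:${dotnet_core_version}"
```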
|
||
####################################### | ||
# Use the Dockerfile in the sub-folder dotnet-interactive to build the image of the first stage | ||
# Result: | ||
# A dotnet-interactive docker image tagged with the .NET core version | ||
####################################### | ||
build_dotnet_interactive() { | ||
local image_name="dotnet-interactive:${dotnet_core_version}" | ||
|
||
cd dotnet-interactive | ||
build_image "${image_name}" | ||
cd ~- | ||
} | ||
|
||
####################################### | ||
# Use the Dockerfile in the sub-folder dotnet-spark-base to build the image of the second stage | ||
# The image contains the specified .NET for Apache Spark version | ||
# Result: | ||
# A dotnet-spark-base-interactive docker image tagged with the .NET for Apache Spark version | ||
####################################### | ||
build_dotnet_spark_base_interactive() { | ||
local image_name="dotnet-spark-base-interactive:${dotnet_spark_version}" | ||
|
||
cd dotnet-spark-base | ||
build_image "${image_name}" | ||
cd ~- | ||
} | ||
|
||
####################################### | ||
# Use the Dockerfile in the sub-folder dotnet-spark to build the image of the last stage | ||
# The image contains the specified Apache Spark version | ||
# Result: | ||
# A dotnet-spark docker image tagged with the .NET for Apache Spark version, Apache Spark version and the suffix -interactive | ||
####################################### | ||
build_dotnet_spark_interactive() { | ||
local image_name="${image_repository}/dotnet-spark:${dotnet_spark_version}-${apache_spark_version}-interactive" | ||
|
||
cd dotnet-spark | ||
cp --recursive templates/scripts ./bin | ||
cp --recursive templates/HelloSpark ./HelloSpark | ||
|
||
replace_text_in_file HelloSpark/HelloSpark.csproj "<TargetFramework><\/TargetFramework>" "<TargetFramework>netcoreapp${dotnet_core_version}<\/TargetFramework>" | ||
replace_text_in_file HelloSpark/HelloSpark.csproj "PackageReference Include=\"Microsoft.Spark\" Version=\"\"" "PackageReference Include=\"Microsoft.Spark\" Version=\"${dotnet_spark_version}\"" | ||
|
||
replace_text_in_file HelloSpark/README.txt "netcoreappX.X" "netcoreapp${dotnet_core_version}" | ||
replace_text_in_file HelloSpark/README.txt "spark-X.X.X" "spark-${apache_spark_short_version}.x" | ||
replace_text_in_file HelloSpark/README.txt "microsoft-spark-${apache_spark_short_version}.x-X.X.X.jar" "${dotnet_spark_jar}" | ||
|
||
replace_text_in_file bin/start-spark-debug.sh "microsoft-spark-X.X.X.jar" "${dotnet_spark_jar}" | ||
|
||
replace_text_in_file 02-basic-example.ipynb "nuget: Microsoft.Spark,X.X.X" "nuget: Microsoft.Spark,${dotnet_spark_version}" | ||
|
||
build_image "${image_name}" | ||
cd ~- | ||
} | ||
|
||
####################################### | ||
# Remove the temporary folders created during the different build stages | ||
####################################### | ||
cleanup() | ||
{ | ||
cd dotnet-spark | ||
rm --recursive --force bin | ||
rm --recursive --force HelloSpark | ||
cd ~- | ||
} | ||
|
||
finish() | ||
{ | ||
result=$? | ||
cleanup | ||
exit ${result} | ||
} | ||
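
A trap only covers commands that run after it has been registered. In `main`, `trap finish EXIT ERR` comes after the build stages, so a build step that fails under `errexit` would end the script before `finish` is installed. The sketch below illustrates the general pattern of registering the handler first; `failing_build` and the temporary directory are made up for the demonstration.

```bash
#!/usr/bin/env bash
# Runs in a subshell: registers the cleanup handler first, then fails.
failing_build() (
  work_dir="$(mktemp -d)"
  trap 'rm -rf "${work_dir}"' EXIT   # registered before any build step runs

  echo "${work_dir}"   # report the directory so the caller can check it
  exit 1               # simulate a failing build step
)

if ! work_dir_used="$(failing_build)"; then
  echo "build failed"
fi
[ -d "${work_dir_used}" ] || echo "temporary directory was removed"
```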
|
||
####################################### | ||
# Display the help text | ||
####################################### | ||
print_help() { | ||
cat <<HELPMSG | ||
Usage: build.sh [OPTIONS] | ||
|
||
Builds a .NET for Apache Spark interactive docker image | ||
|
||
Options: | ||
-a, --apache-spark A supported Apache Spark version to be used within the image | ||
-d, --dotnet-spark The .NET for Apache Spark version to be used within the image | ||
-h, --help Show this usage help | ||
|
||
If -a or -d is not defined, default values are used | ||
|
||
Apache Spark: $apache_spark_version | ||
.NET for Apache Spark: $dotnet_spark_version | ||
HELPMSG | ||
} | ||
|
||
main "${@}" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
FROM jupyter/base-notebook:ubuntu-18.04 | ||
LABEL maintainer="Martin Kandlbinder <[email protected]>" | ||
|
||
ARG DOTNET_CORE_VERSION=3.1 | ||
ENV DOTNET_CORE_VERSION=$DOTNET_CORE_VERSION \ | ||
Review comment: Per the Dockerfile Best Practices, sort multi-line instructions to improve readability where possible (e.g. cross dependencies) |
||
DOTNET_RUNNING_IN_CONTAINER=true \ | ||
DOTNET_USE_POLLING_FILE_WATCHER=true \ | ||
NUGET_XMLDOC_MODE=skip \ | ||
PATH="${PATH}:${HOME}/.dotnet/tools" | ||
|
||
USER root | ||
|
||
RUN apt-get update \ | ||
&& apt-get install -y --no-install-recommends \ | ||
apt-utils \ | ||
Review comment: What is requiring all of these native dependencies? Several are already provided by the base image, so they don't seem necessary to declare.
Reply: This should be cleaned up now. Java is obviously required by Spark. |
||
dialog \ | ||
libgssapi-krb5-2 \ | ||
libicu60 \ | ||
openjdk-8-jdk \ | ||
software-properties-common \ | ||
unzip \ | ||
&& wget -q https://packages.microsoft.com/config/ubuntu/18.04/packages-microsoft-prod.deb -O packages-microsoft-prod.deb \ | ||
&& dpkg -i packages-microsoft-prod.deb \ | ||
&& add-apt-repository universe \ | ||
&& apt-get install -y apt-transport-https \ | ||
&& apt-get update \ | ||
&& apt-get install -y dotnet-sdk-$DOTNET_CORE_VERSION \ | ||
&& apt-get clean && rm -rf /var/lib/apt/lists/* \ | ||
&& rm -rf packages-microsoft-prod.deb | ||
|
||
COPY ./nuget.config ${HOME}/nuget.config | ||
|
||
USER ${NB_USER} | ||
|
||
RUN pip install nteract_on_jupyter \ | ||
&& dotnet tool install -g Microsoft.dotnet-interactive \ | ||
&& dotnet interactive jupyter install |
Review comment: This may be a question for the Spark team. Any thoughts on how to keep this version list, and the other versions included in this script, up to date? It feels like there should be long-term plans for getting them updated "automatically" as part of the release process. Without this, they will become stale and/or a maintenance burden.
Reply: Agreed