
Compiling MsPASS from source code

Gary Pavlis edited this page Sep 28, 2021 · 53 revisions

Preliminaries

When you might want to build a local copy

Anyone planning serious development aimed at extending MsPASS will find a local copy useful for testing and debugging.
Cases of particular note that would benefit from building a local copy are:

  1. If you want to use tools that are not part of the standard development docker container, you will definitely find building a local copy helpful. For example, if you use an IDE for python development, you will likely find it faster to prototype new algorithms with a local copy.
  2. If you aim to adapt a set of legacy code written in a compiled language (e.g. C/C++ or FORTRAN), working with a local copy will almost certainly be advantageous.

Limitations

These instructions are aimed primarily at building a local copy that will not use the parallel schedulers (i.e. Spark and Dask). In general, we recommend developing prototype processing functions that can be reduced to a python function call implementing the algorithm. Related documentation shows how to parallelize any such function with Spark or Dask and how to use MsPASS decorators to add additional MsPASS features.

Unless you are a MongoDB expert, we recommend that local development use an instance of MongoDB running in the MsPASS container with docker. Instructions for launching docker and the MsPASS container are found here. Most users will want to run MongoDB with a local file system mounted to hold the data files MongoDB maintains. Use this incantation from the page linked above:

docker run --name MsPASS -d -p 27017:27017 --mount src=`pwd`,target=/home,type=bind wangyinz/mspass

Details will vary with your situation; see the link above for context. The key point is that this step lets you avoid the painful process of installing MongoDB on your system.
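Once the container is running, you can quickly confirm that something is listening on the MongoDB port published by -p 27017:27017. A plain TCP connection test is enough for this. The sketch below uses only the Python standard library, and the function name port_is_open is hypothetical. A successful connection shows only that the port is open, not that MongoDB itself is healthy.

```python
import socket


def port_is_open(host="localhost", port=27017, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds.

    This only shows that something is listening on the port (e.g. mongod in
    the MsPASS container); it does not speak the MongoDB protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Something is listening on localhost:27017 (the MsPASS container?)")
    else:
        print("Nothing on localhost:27017; is the MsPASS container running?")
```

A full health check would use a MongoDB client such as pymongo. The socket test has the advantage of needing nothing beyond the standard library.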

Requirements

MsPASS assumes it will be running under some flavor of Unix. This includes macOS and the various flavors of Unix running on current-generation HPC systems. It may work on newer versions of Windows that can run Ubuntu alongside Windows, but at this writing it is unknown whether that is feasible.

Before trying to install MsPASS from source code, be sure the following are installed on your system:

  1. A C++ compiler that supports the C++17 standard or later. Unless your compiler is very old, it should support C++17. On Linux systems any reasonably recent version of the gcc compiler suite will work. For macOS there are complexities discussed in the following section.
  2. We use some open source packages that have components written in FORTRAN, so you will need a FORTRAN compiler installed on your local system. That is another reason macOS installs are more complicated: Apple does not appear to support FORTRAN in newer releases.
  3. You will need the cross-platform build system called cmake. If you do not already have it installed, it is an open-source package available here. Binary packages are available for most flavors of modern Unix and for macOS; in the worst case it can be built from source code. On macOS, I (glp) found it necessary to launch the CMake macOS "Application", which is a GUI front end for CMake. Then follow the instructions found under the menu item Tools->How to Install for Command Line Use.
  4. You will need to be running a version of python 3; MsPASS will not work with a python 2 interpreter. We suggest Anaconda as it includes prebuilt versions of most scientific libraries. In our experience, the "individual" edition is sufficient for development work on a local copy of MsPASS.
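Before going further, you can save time by confirming these tools are on your PATH. The sketch below is a hypothetical helper, not part of MsPASS. The executable names it searches for (c++, g++, gfortran, cmake) are assumptions. It only checks that the programs exist, not that the compiler actually supports C++17.

```python
import shutil
import sys

# Candidate executable names for each requirement (assumptions; adjust as needed).
REQUIRED_TOOLS = {
    "C++ compiler": ["c++", "g++"],
    "FORTRAN compiler": ["gfortran"],
    "cmake": ["cmake"],
}


def check_prerequisites(tools=REQUIRED_TOOLS):
    """Map each requirement to the first matching executable on PATH, or None."""
    report = {}
    for requirement, candidates in tools.items():
        found = [shutil.which(name) for name in candidates]
        report[requirement] = next((path for path in found if path), None)
    # MsPASS needs python 3; this is trivially true when the script runs under python 3.
    report["python3"] = sys.executable if sys.version_info[0] >= 3 else None
    return report


if __name__ == "__main__":
    for requirement, path in check_prerequisites().items():
        print(f"{requirement:17s} {path or 'NOT FOUND'}")
```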

Installing gcc on MacOS

Intel Processor Machines

TODO - fill this in later when I do this process on an older machine.

Installing gcc on ARM64 Mac machines

The newest Apple computers have an optional ARM64 processor. If you aren't sure which you have, check the "About this Mac" entry under the Apple menu. If it says "Chip Apple M1", you will have some complexities to deal with. At the time of this writing (Sept 2021) the procedure here worked, but some of this complexity is likely to disappear as more ARM64 machines enter the pipeline.

As in the Intel processor case, the problem is that the standard Xcode compiler, clang, does not play well with any common FORTRAN compiler. Hence, we need to install the gcc compiler suite, including gfortran. The main added complication is that the Homebrew (brew) command line tool has to be set up as described here. Then you can run the following modified version of the procedure described in this article:

arch -x86_64 brew update
arch -x86_64 brew upgrade
arch -x86_64 brew info gcc
arch -x86_64 brew install gcc
arch -x86_64 brew cleanup

That should install the full gcc compiler suite including gfortran.
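Homebrew usually installs the GNU compilers under versioned names (for example g++-11 or gfortran-11). On macOS, plain gcc and g++ typically still invoke Apple's clang. The hypothetical helper below scans the directories on your PATH for the highest-numbered tool-N executable, so you can confirm what brew installed. The naming pattern is an assumption based on typical Homebrew installs.

```python
import os
import re
from pathlib import Path


def find_versioned(tool, search_dirs=None):
    """Return the path of the highest-versioned `tool-N` executable, or None.

    For example, find_versioned("g++") might return "/usr/local/bin/g++-11".
    search_dirs defaults to the directories listed in PATH.
    """
    if search_dirs is None:
        search_dirs = os.environ.get("PATH", "").split(os.pathsep)
    # Match names like "g++-11" exactly; "gcc-ar-11" does not match tool "gcc".
    pattern = re.compile(re.escape(tool) + r"-(\d+)")
    best = None  # (major_version, path) of the best match so far
    for d in search_dirs:
        directory = Path(d)
        if not d or not directory.is_dir():
            continue
        try:
            entries = list(directory.iterdir())
        except OSError:
            continue  # unreadable directory
        for entry in entries:
            m = pattern.fullmatch(entry.name)
            if m and os.access(entry, os.X_OK):
                version = int(m.group(1))
                if best is None or version > best[0]:
                    best = (version, str(entry))
    return best[1] if best else None


if __name__ == "__main__":
    for tool in ("gcc", "g++", "gfortran"):
        print(f"{tool:9s} {find_versioned(tool) or 'no versioned copy found'}")
```

If cmake picks up clang instead of gcc, you can point it at the compilers explicitly. For example, pass -DCMAKE_C_COMPILER=<path> -DCMAKE_CXX_COMPILER=<path> -DCMAKE_Fortran_COMPILER=<path> on the first cmake run, using the paths the helper reports.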

Download mspass

If you haven't already done so, download the source code for mspass from GitHub. In a terminal window running a Unix shell, cd to the directory where you plan to install mspass. Then enter the standard command:

git clone git@github.com:mspass-team/mspass.git

which will give you a working copy of the repository. In the future, we expect to have formal releases and this procedure will change.

Spark & Dask Installation

MsPASS uses Spark or Dask to process data in parallel. If you want to run parallel workflows with a local copy, you need to set up Spark or Dask. We recommend following the installation instructions on the official websites. Since MsPASS uses Python to drive workflows, PySpark is the Spark interface we use.

For PySpark installation, refer to this page. It includes instructions for installing PySpark with pip or conda, downloading it manually, and building it from source.

Installing Dask is much the same as installing PySpark; here is the link to refer to.

Installation is normally straightforward if you follow the instructions. However, you might need to configure some files or parameters on your machine before you can run MsPASS. For example, users who don’t have localhost defined in /etc/hosts should set the SPARK_LOCAL_IP environment variable to localhost or to their hostname. You might also need to set other environment variables, such as SPARK_HOME, if they were not set properly during installation.
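The /etc/hosts check described above can be automated from Python. The sketch below is a hypothetical helper, not part of MsPASS or PySpark. It sets SPARK_LOCAL_IP to localhost only when /etc/hosts has no localhost entry and the variable is not already set. It must run before PySpark launches Spark (i.e. before you create a SparkContext or SparkSession), because Spark reads the setting when it starts.

```python
import os


def hosts_defines_localhost(hosts_text):
    """True if a non-comment line of /etc/hosts-style text lists the name 'localhost'."""
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        # fields[0] is the IP address; the remaining fields are host names/aliases.
        if len(fields) >= 2 and "localhost" in fields[1:]:
            return True
    return False


def configure_spark_local_ip(env=None, hosts_path="/etc/hosts"):
    """Set SPARK_LOCAL_IP=localhost when hosts_path lacks a localhost entry.

    An existing SPARK_LOCAL_IP is left untouched. Returns the value in effect, or None.
    """
    env = os.environ if env is None else env
    try:
        with open(hosts_path) as f:
            has_localhost = hosts_defines_localhost(f.read())
    except OSError:
        has_localhost = False  # missing or unreadable file: be conservative
    if not has_localhost:
        env.setdefault("SPARK_LOCAL_IP", "localhost")
    return env.get("SPARK_LOCAL_IP")
```

Calling configure_spark_local_ip() near the top of a script, before any Spark session is created, modifies os.environ in place.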

If you hit a problem not covered above that you are unable to resolve, feel free to report an issue.

Python Dependencies Installation

In the implementation of MsPASS we make good use of open-source python libraries so that we don’t have to reinvent the wheel. We also rely on widely used python libraries for data processing and seismology, such as numpy and obspy. These packages are listed in the requirements.txt file and should be installed through pip beforehand.

To install them, run the following pip command at the root of the MsPASS directory:

pip install -r requirements.txt

This installs the listed libraries along with their dependencies. Anaconda users can use conda instead of pip (e.g. conda install --file requirements.txt), though some packages may not be available from the default conda channels.
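To confirm afterwards that everything in requirements.txt is present in the active environment, a sketch like the following can help. It is a hypothetical helper using the standard library's importlib.metadata (Python 3.8 or later). Its parsing of requirements.txt is deliberately simplified: it skips comments, option lines, and URL lines, and it ignores version specifiers.

```python
import re
from importlib import metadata  # standard library in Python 3.8+


def requirement_names(lines):
    """Extract distribution names from requirements.txt-style lines (simplified)."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        # Skip blanks, options such as "-r other.txt", and URL/VCS requirements.
        if not line or line.startswith("-") or "://" in line:
            continue
        m = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if m:
            names.append(m.group(0))
    return names


def missing_requirements(lines):
    """Return the requirement names that have no installed distribution."""
    missing = []
    for name in requirement_names(lines):
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Pass the lines of requirements.txt (opened from the root of the MsPASS tree) to missing_requirements. It returns the names still to be installed; an empty list means all were found. On older Python versions, differences in name spelling (case, "-" versus "_") may cause false reports.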

pybind11 usage and installation

MsPASS uses the pybind11 package to bind C and C++ code for use by the python interpreter, so that properties and functions of the C++ code can be accessed from python. At present, all C/C++ code is bound to a single module we call mspasspy.ccore. For those interested in how pybind11 works in MsPASS, this particular example illustrates how we adapt an existing C++ algorithm to the MsPASS framework.

The following figure shows more concisely where the C/C++ code lives relative to the python code and the pybind11 binding code.

[Figure: pybind11]

To install pybind11, we recommend following the instructions on the pybind11 wiki page here.
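After installing pybind11, and again once MsPASS has been built and installed, check that python can import the relevant modules. The helper below is a hypothetical sketch that simply attempts the import. A False result for mspasspy.ccore can mean either that the module was never built or that it was built but cannot be loaded.

```python
import importlib


def module_available(name):
    """Return True if the (possibly dotted) module name can be imported."""
    try:
        importlib.import_module(name)
    except ImportError:  # includes ModuleNotFoundError and failed extension loads
        return False
    return True


if __name__ == "__main__":
    for mod in ("pybind11", "mspasspy.ccore"):
        status = "ok" if module_available(mod) else "NOT importable"
        print(f"{mod:15s} {status}")
```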

Add environment variables

MsPASS uses some environment variables to check or validate paths. For example, the SchemaBase class and some of the pybind11 binding code check whether the environment variable MSPASS_HOME exists in order to read the YAML file correctly. To set it, run the following command:

export MSPASS_HOME=/mspass

Here /mspass is only an example path; replace it with the absolute path of the directory where git downloaded MsPASS.
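Below is a hypothetical sketch of the kind of MSPASS_HOME check described above; it is not the SchemaBase implementation. It reads MSPASS_HOME and requires an absolute path to an existing directory. As an extra sanity check, it also looks for the cxx subdirectory used in the build steps on this page (an assumption added for illustration).

```python
import os
from pathlib import Path


def mspass_home(env=None):
    """Return MSPASS_HOME as a Path after basic validation (hypothetical helper)."""
    env = os.environ if env is None else env
    value = env.get("MSPASS_HOME")
    if not value:
        raise RuntimeError("MSPASS_HOME is not set; export it as shown above")
    home = Path(value).expanduser()
    if not home.is_absolute():
        raise RuntimeError(f"MSPASS_HOME must be an absolute path, got {value!r}")
    if not home.is_dir():
        raise RuntimeError(f"MSPASS_HOME={value!r} is not a directory")
    # Assumed sanity check: the clone should contain the cxx source directory.
    if not (home / "cxx").is_dir():
        raise RuntimeError(f"{home} has no cxx directory; is it the mspass clone?")
    return home
```

Calling mspass_home() with no arguments checks the current os.environ. The error messages say which part of the setting is wrong.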

Running cmake

First, note that we use cmake only as a configuration tool; the traditional Unix make utility drives the compilation and linking needed to build MsPASS from the source code you just downloaded. Like many larger packages, we keep the source code isolated from the binaries. We recommend calling the top of the binary tree "build" and placing it in the cxx directory of the mspass source code tree. Specifically, assuming your shell is at the top of the mspass tree (i.e. in the mspass directory), type the following into your terminal window:

cd cxx
mkdir build
cd build
ccmake ..

You must create the build directory (or one with some other convenient name) before running cmake.

The ccmake command (the curses interface to cmake) brings up a form in the terminal window. Type c to do an initial configuration. When that completes, you may want to edit some of the configuration parameters. You will almost certainly want to change CMAKE_INSTALL_PREFIX to an appropriate location for your system. When finished, type c again. You will then see the output of cmake as it checks for required features. When configuration succeeds, type g to generate the Unix makefiles. If it fails, you will need to do the usual sorting through the error log to troubleshoot the problem. When cmake has run successfully, type the following in the same "build" directory:

make

and wait for compilation to complete. As with compiling any package, this may require some troubleshooting if anything fails.
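If you prefer to skip the interactive form, cmake accepts configuration parameters on the command line with -D, so the same steps can be scripted. The sketch below is hypothetical. It assumes the cxx/build layout recommended above, the install prefix you pass in, and a default of four parallel make jobs (make -j).

```python
import subprocess
from pathlib import Path


def build_commands(install_prefix, jobs=4):
    """Non-interactive equivalents of the steps above, run from cxx/build."""
    return [
        # -DCMAKE_INSTALL_PREFIX sets what the text edits in the curses form.
        ["cmake", f"-DCMAKE_INSTALL_PREFIX={install_prefix}", ".."],
        ["make", f"-j{jobs}"],
    ]


def build_mspass(repo_root, install_prefix, jobs=4, dry_run=False):
    """Create <repo_root>/cxx/build and run the commands there.

    Returns (build_dir, commands). With dry_run=True nothing is created or executed.
    """
    build_dir = Path(repo_root) / "cxx" / "build"
    commands = build_commands(install_prefix, jobs)
    if not dry_run:
        build_dir.mkdir(parents=True, exist_ok=True)
        for cmd in commands:
            subprocess.run(cmd, cwd=build_dir, check=True)  # raises on failure
    return build_dir, commands
```

For example, build_mspass(".", "/opt/mspass", dry_run=True) run from the top of the mspass tree shows what would be executed without touching anything.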

How MsPASS compiles C/C++ code into .so files [TODO]

  1. The compiler creates .o files in the build directory.
  2. The .o files are packed with ar into static library (.a) files.
  3. The .so files are created by linking the binding code against the .a libraries.
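The three steps can be illustrated with the command lines a compiler driver would run. This is a hypothetical sketch: the source, archive, and module file names are invented, and the real MsPASS makefiles are generated by cmake. A loadable python extension would also need the python and pybind11 include directories (python3 -m pybind11 --includes prints them). On macOS, it would also need the -undefined dynamic_lookup linker option.

```python
import shlex


def so_build_steps(sources, binding_source, module="ccore", cxx="c++", includes=()):
    """Return (compile_cmds, ar_cmd, link_cmd) for the .o -> .a -> .so pipeline.

    All file names are hypothetical; the real MsPASS makefiles are generated by cmake.
    """
    inc = [f"-I{d}" for d in includes]
    objects = [src.rsplit(".", 1)[0] + ".o" for src in sources]
    # Step 1: compile each source file to an object file. -fPIC makes the code
    # position independent so it can end up inside a shared object.
    compile_cmds = [
        [cxx, "-std=c++17", "-fPIC", *inc, "-c", src, "-o", obj]
        for src, obj in zip(sources, objects)
    ]
    # Step 2: pack the object files into a static library with ar.
    archive = f"lib{module}_impl.a"
    ar_cmd = ["ar", "rcs", archive, *objects]
    # Step 3: link the binding code against the static library into a .so;
    # the archive comes after the binding source so its symbols get resolved.
    link_cmd = [cxx, "-std=c++17", "-fPIC", "-shared", *inc,
                binding_source, archive, "-o", f"{module}.so"]
    return compile_cmds, ar_cmd, link_cmd


if __name__ == "__main__":
    compile_cmds, ar_cmd, link_cmd = so_build_steps(["algorithm.cc"], "bindings.cc")
    for cmd in [*compile_cmds, ar_cmd, link_cmd]:
        print(" ".join(shlex.quote(arg) for arg in cmd))
```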

Troubleshooting

Some issues users may encounter: