Compiling MsPASS from source code
Building a copy on your local machine is recommended only if you are planning a serious development effort that will extend the package. In particular, if you have an addition or change for the git repository, working with a local copy is fairly essential.
If the additions/changes you are planning are purely python we suggest you consider an alternative before undertaking the full build. We supply a development version of the standard docker container that you can obtain by running the following command:
docker pull mspass/mspass:dev
This container is similar to the standard run container with two main additions. First, the container has a number of standard debugging tools including gdb and pdb. Second, the C++ code in this container is built in debug mode, so debugging the C++ code with gdb using line numbers and symbols is possible.
Anyone planning serious development with MsPASS aiming to extend the package will find a local copy useful for testing and debugging. Cases of particular note that would benefit from building a local copy are:
- If you want to use tools that are not part of the standard development docker container you will definitely find building a local copy helpful. For example, if you use any IDE for python development you will likely find it faster to prototype new algorithms by building a local copy.
- If you are aiming to adapt a set of legacy code in a compiled language (i.e. C/C++ or FORTRAN) working with a local copy will almost certainly be advantageous.
If you are working on a new algorithm to be used with MsPASS we recommend you do initial prototyping of the algorithm as a python function. Best practice is to design a test script to drive the function initially without the baggage of interacting with MongoDB and one of the parallel schedulers (spark or dask).
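As a minimal sketch of this workflow, the example below prototypes an algorithm as a plain Python function and drives it from a small test function. The name `remove_mean` and the use of a plain list of floats to stand in for a seismic trace are assumptions made for illustration; neither is part of the MsPASS API.

```python
def remove_mean(samples):
    """Return a copy of samples with the mean subtracted (hypothetical prototype)."""
    if not samples:
        return []
    mean = sum(samples) / len(samples)
    return [x - mean for x in samples]


def test_remove_mean():
    """Test driver that needs neither MongoDB nor a parallel scheduler."""
    assert remove_mean([1.0, 2.0, 3.0]) == [-1.0, 0.0, 1.0]
    assert remove_mean([]) == []


if __name__ == "__main__":
    test_remove_mean()
    print("remove_mean prototype passed")
```

Keeping the function free of database and scheduler calls means the same test can be rerun unchanged after the function is later wrapped for use with MongoDB, spark, or dask.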
When you are confident your function is stable, or when the function needs to utilize MongoDB, you should consider a stage of testing without using the parallel schedulers. Unless you are a MongoDB expert, for local development we recommend using an instance of MongoDB running in the MsPASS container with docker. Instructions for launching the MsPASS container and docker are found here. We note that most will want to run MongoDB mounting a local file system to contain the data files maintained by MongoDB. Use this incantation found in the URL noted above:
docker run --name MsPASS -d -p 27017:27017 --mount src=`pwd`,target=/home,type=bind mspass/mspass
where details will vary with your situation - see the above URL link for context. The key point is you can use that step to avoid the painful process of installing MongoDB on your system. We also emphasize MongoDB can be run from the container for both local serial and parallel jobs.
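After launching the container, it can be useful to confirm that something is accepting connections on the published port. The sketch below uses only the Python standard library and assumes the port mapping from the command above (27017 on localhost). It checks only that a TCP connection succeeds, not that the listener is actually MongoDB; the function name `port_is_open` is hypothetical.

```python
import socket


def port_is_open(host="localhost", port=27017, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Something is listening on localhost:27017")
    else:
        print("Nothing is listening on localhost:27017; is the container running?")
```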
MsPASS assumes it will be running under some flavor of Unix. This includes macOS and various flavors of Unix running on all current-generation HPC systems. This may work on newer versions of Windows that coexist with Ubuntu, but at this writing, it is unknown if that is feasible.
You will need to be sure the system has the following elements installed before trying to install MsPASS from source code:
- A C++ compiler that supports the C++17 standard or later. Unless your compiler is really ancient, C++17 should be supported automatically. On Linux systems that will not be a problem if you use any reasonably recent version of the gcc compiler suite. For MacOS there are complexities discussed in the following section.
- We use some open source packages that have components written in FORTRAN. For that reason you will need to have a FORTRAN compiler installed on your local system. That is another reason MacOS installs are more complicated - Apple clang does not appear to support FORTRAN in newer releases.
- You will need the cross-platform build system called cmake. If you do not have it installed already, it can be downloaded here. There are binary packages available for most flavors of modern Unix and MacOS. In the worst case it can be built from source code. On macOS, I (glp) found it necessary to launch the CMake MacOS "Application" icon, which is a GUI front end for CMake on MacOS. Follow the instructions found by clicking on the menu item Tools->How to Install for Command Line Use. If all goes well, that process will add the directory where the cmake command line tool is installed to your shell path.
- You will need to be running a version of python 3. MsPASS will not work with a python 2 interpreter. We suggest anaconda as it includes prebuilt versions of most scientific libraries, but we recommend you still use pip3 as the package manager even if you install anaconda. In our experience, the "individual" version is sufficient for development work on a local copy of MsPASS.
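The list above can be checked quickly with a short script. This is a sketch under stated assumptions: the executable names `g++`, `gfortran`, `cmake`, and `make` are guesses at what the tools are called on your system (a Homebrew gcc, for example, may install versioned names such as `g++-11`), and the function names are hypothetical.

```python
import shutil
import sys


def check_prerequisites(tools=("g++", "gfortran", "cmake", "make")):
    """Map each tool name to its location on PATH, or None if it is not found."""
    return {tool: shutil.which(tool) for tool in tools}


def python_is_supported(version_info=sys.version_info):
    """MsPASS will not work with a python 2 interpreter."""
    return version_info[0] >= 3


if __name__ == "__main__":
    for tool, path in check_prerequisites().items():
        print(f"{tool}: {path if path else 'NOT FOUND'}")
    print("python 3 interpreter:", python_is_supported())
```

A tool reported as not found may still be installed under a different name, so treat the output as a hint rather than a verdict.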
If you don't already have "homebrew" installed do so. There are many sources for how to do that on the web for standard mac machines. Once you have homebrew installed you need only execute the following commands to install the latest gcc compiler suite:
brew update
brew upgrade
brew info gcc
brew install gcc
brew cleanup
which should install the full gcc compiler suite, including gfortran, on your system.
The newest Apple computers have an optional ARM64 processor. If you aren't sure check the "About this Mac" entry under the Apple icon and if it says "Chip Apple M1" you will have some complexities to deal with. At the time of this writing (Sept 2021) the procedure here worked, but some of this complexity is likely to disappear as more ARM64 machines enter the pipeline.
As in the Intel processor case, the problem is that the standard Xcode compiler, clang, does not play well with any common FORTRAN compiler. Hence, we need to install the gcc compiler suite including gfortran. The main complication added to the above is that the "homebrew (brew)" command line tool has to be set up as described here. Then you can run the following (modified) version of the procedure described in this article:
arch -x86_64 brew update
arch -x86_64 brew upgrade
arch -x86_64 brew info gcc
arch -x86_64 brew install gcc
arch -x86_64 brew cleanup
That should install the full gcc compiler suite including gfortran.
If you haven't already done so, download the source code for mspass from Github. In a terminal window running a Unix shell cd to the directory where you plan to install mspass. Then enter the standard command:
git clone git@github.com:mspass-team/mspass.git
which will give you a working copy of the repository. In the future, we expect to have formal releases and this procedure will change.
MsPASS uses Spark/Dask to process data in parallel, which means that if you are running a local copy, you need to set up Spark and/or Dask if you are going to do a local test with the parallel schedulers. For the installation, we recommend you follow the instructions on the official website. Since we are using Python to drive the workflow, PySpark is the one we will be using in MsPASS.
For PySpark installation, you could refer to this page. It includes instructions for installing PySpark by using pip, conda, downloading manually, and building from the source. We do not currently recommend using anaconda with macos as we have had issues getting the components to play together.
For Dask installation, it’s pretty much the same as installing PySpark and here is the link you could refer to. Actually, if you are using pip to install Dask, you could ignore this step because it will be downloaded in the following section, which installs all the dependencies we rely on in MsPASS through the requirements list.
Those installation procedures should cause no more problems than a typical python package install. That is, they usually work, but if you have worked with python you will be familiar with the issue of incompatible module versions required by different already-installed packages. For that reason, we strongly urge you to install pyspark and/or dask with the "--user" option of pip. In addition, you might need to configure some files or parameters on your machine before you can run with MsPASS. For example, users who don’t have localhost defined in /etc/hosts on their machines should set the SPARK_LOCAL_IP environment variable to localhost or whatever your hostname is. You might also need to set other environment variables like SPARK_HOME if it is not set properly during installation.
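The environment checks just described can be sketched as a small diagnostic. The function name `spark_env_warnings` and the wording of its messages are hypothetical; the script only reports possible problems and does not change your environment. Whether SPARK_HOME is actually needed depends on how Spark was installed, so that warning is advisory.

```python
import os
import socket


def spark_env_warnings(environ=None, resolver=socket.gethostbyname):
    """Return a list of warnings about localhost and Spark environment variables."""
    environ = os.environ if environ is None else environ
    warnings = []
    try:
        resolver("localhost")
    except OSError:
        # localhost is not resolvable, e.g. it is missing from /etc/hosts.
        if "SPARK_LOCAL_IP" not in environ:
            warnings.append("localhost does not resolve and SPARK_LOCAL_IP is not set")
    if "SPARK_HOME" not in environ:
        warnings.append("SPARK_HOME is not set")
    return warnings


if __name__ == "__main__":
    for message in spark_env_warnings():
        print("warning:", message)
```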
If you are unable to resolve a problem, feel free to report it in the issues section of the GitHub repository for this package.
MsPASS is entirely based on open-source packages so the setup process from source code will not encounter any licensing issues. Also, we rely on some widely used python libraries in data processing and engineering in seismology like numpy and obspy. A convenient way to see the full dependencies is in the file requirements.txt found at the root of the MsPASS source tree.
Rather than install the required packages one by one, users can simply run the following command by pip at the root of the MsPASS directory
pip3 install -r requirements.txt
which will install the listed libraries along with their own dependencies.
For users using Anaconda, make sure you are running in the right virtual environment and then enter the following similar command:
conda install --file requirements.txt
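To see which of the listed packages are still missing after either command, a sketch like the following can help. It assumes a simple requirements file in which each line starts with a package name; the parsing is deliberately naive (it ignores URLs and environment markers), and the function names are hypothetical. It relies on `importlib.metadata`, which is in the standard library from Python 3.8 onward.

```python
import re
from importlib import metadata
from pathlib import Path


def requirement_names(text):
    """Extract bare package names from the text of a requirements file."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -r or -e
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names for which no installed distribution is found."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    req = Path("requirements.txt")
    if req.is_file():
        print("Missing:", missing_packages(requirement_names(req.read_text())))
```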
MsPASS uses the pybind11 package to bind C++ or C code and APIs so that properties can be accessed and functions called from the python interpreter. For the present, all C/C++ code is bound to a single module we call mspasspy.ccore. For those who are interested in how pybind11 works in MsPASS, this particular example illustrates how we adapt an existing C++ algorithm to the MsPASS framework.
Here is a figure that could show the structure of where the C/C++ code lives relative to python code and pybind11 binding code more concisely and clearly.
For installation, we recommend you follow the instructions on the pybind11 wiki page here
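Once the build and install are complete, one way to check that the compiled bindings are visible to python is to ask the import system whether it can locate the module. The sketch below uses only the standard library; the helper name `module_available` is hypothetical, and the only MsPASS-specific name it relies on is the mspasspy.ccore module mentioned above.

```python
import importlib.util


def module_available(name):
    """Return True if the import system can locate the named module.

    For a dotted name, the parent packages are imported in the process.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or has no valid spec.
        return False


if __name__ == "__main__":
    print("mspasspy.ccore available:", module_available("mspasspy.ccore"))
```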
MsPASS uses some environment variables to check or validate paths. For example, the SchemaBase class and some pybind11 bundle code check whether the environment variable MSPASS_HOME exists in order to read the YAML file correctly. To set it, run the following command:
export MSPASS_HOME=/mspass
Here /mspass is an example path. In your case, you should replace it with the absolute path of the directory where you downloaded MsPASS through git.
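A quick sanity check of this setting can be written in a few lines of python. This is only a sketch: the function name `mspass_home` is hypothetical, and it verifies nothing beyond the variable being set to an absolute path of an existing directory. It makes no assumption about what MsPASS expects to find inside that directory.

```python
import os
from pathlib import Path


def mspass_home(environ=None):
    """Return MSPASS_HOME as a Path, or raise RuntimeError if it is unusable."""
    environ = os.environ if environ is None else environ
    value = environ.get("MSPASS_HOME")
    if not value:
        raise RuntimeError("MSPASS_HOME is not set")
    path = Path(value)
    if not path.is_absolute():
        raise RuntimeError(f"MSPASS_HOME is not an absolute path: {value}")
    if not path.is_dir():
        raise RuntimeError(f"MSPASS_HOME is not an existing directory: {value}")
    return path


if __name__ == "__main__":
    try:
        print("MSPASS_HOME =", mspass_home())
    except RuntimeError as err:
        print("problem:", err)
```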
First, the user should recognize that we use cmake as a configuration tool. You will use the traditional Unix make utility to drive the compilation and linking needed to build MsPASS from the source code you just downloaded. We use a common organization for larger packages in which we keep the source code isolated from the binaries. We recommend calling the top of this tree "build" and placing it in the cxx directory of the mspass source code tree. Specifically, assuming your shell is at the top of the mspass tree (i.e. in the mspass directory), type the following into your terminal window:
cd cxx
mkdir build
cd build
cmake ..
It is essential to create the build directory (or a directory with some other convenient name) and to run cmake from inside it.
The last command should bring up a curses-based terminal form if you run the curses front end (ccmake ..); plain cmake configures non-interactively and prints its output directly. On the first entry, type c in the window to do an initial configuration. When that completes, you may want to edit some of the configuration parameters, and you will almost certainly want to change CMAKE_INSTALL_PREFIX to an appropriate location for your system. When finished, type c in the window again. You will then see the output of cmake as it checks for required features and builds Unix makefiles. If it fails, you will need to do the usual sorting through the error log to troubleshoot the problem. When cmake runs successfully, type the following in the same "build" directory:
make
and wait for compilation to complete. As always in compiling a package this may require some troubleshooting if anything fails.
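For repeated builds it can be convenient to drive the same steps from a script. The sketch below is one hedged way to do that, not the project's own build procedure: the function names are hypothetical, the build directory is assumed to be cxx/build as in the commands above, and CMAKE_INSTALL_PREFIX is passed with cmake's standard -D option instead of being edited interactively. By default it performs a dry run and only reports what it would execute.

```python
import subprocess
from pathlib import Path


def build_commands(install_prefix=None):
    """Return the cmake and make command lines in the order they are run."""
    cmake = ["cmake"]
    if install_prefix is not None:
        cmake.append(f"-DCMAKE_INSTALL_PREFIX={install_prefix}")
    cmake.append("..")
    return [cmake, ["make"]]


def run_build(source_root, install_prefix=None, dry_run=True):
    """Configure and compile in <source_root>/cxx/build, or just report the plan."""
    build_dir = Path(source_root) / "cxx" / "build"
    commands = build_commands(install_prefix)
    if not dry_run:
        build_dir.mkdir(parents=True, exist_ok=True)
        for command in commands:
            # check=True stops at the first failing step.
            subprocess.run(command, cwd=build_dir, check=True)
    return build_dir, commands


if __name__ == "__main__":
    directory, plan = run_build(".", dry_run=True)
    for step in plan:
        print(f"(in {directory}) {' '.join(step)}")
```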
First of all, we need to explain a bit about CMake. CMake is a tool for managing the build process of software. In other words, it prepares a list of commands to be performed to generate the executable. Under Linux, we usually use CMake to generate a GNU makefile, which then uses gcc/g++ to compile the C++ source code and create the executable. For those who are new to CMake, we recommend starting with the CMake documentation.
Then what are .so files? A file with .so file extension is a shared library file. It contains compiled code that can be linked to a program at run-time. It is the Linux equivalent of a Windows DLL(dynamic link library).
In MsPASS, we use CMake to compile the C++ code and bundle it into .so files so that other programs can link against the shared library and call the methods inside. Here is a figure showing how MsPASS organizes the CMake files.
- compiler creates .o files in the build directory
- the .o files are packed with ar into static library files
- the .so files are created by linking against the binding code and the .a libraries.
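To see the three stages above reflected on disk after a build, you can list the object files, static libraries, and shared libraries under the build directory. This is a small sketch using only the standard library; the function name `find_artifacts` and the default location cxx/build are assumptions for illustration.

```python
from pathlib import Path


def find_artifacts(build_dir):
    """Group files under build_dir by suffix: .o objects, .a archives, .so libraries."""
    build_dir = Path(build_dir)
    suffixes = (".o", ".a", ".so")
    if not build_dir.is_dir():
        return {suffix: [] for suffix in suffixes}
    return {suffix: sorted(build_dir.rglob(f"*{suffix}")) for suffix in suffixes}


if __name__ == "__main__":
    for suffix, paths in find_artifacts("cxx/build").items():
        print(f"{suffix}: {len(paths)} file(s)")
```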
Some issues may happen to users: