Compiling MsPASS from source code
Building a copy on your local machine is recommended only if you are planning a serious development effort that will extend the package. In particular, if you have an addition or change for the git repository, working with a local copy is fairly essential.
If the additions/changes you are planning are purely python we suggest you consider an alternative before undertaking the full build. We supply a development version of the standard docker container that you can obtain by running the following command:
docker pull mspass/mspass:dev
This container is similar to the standard run container with two main additions. First, the container has a number of standard debugging tools including gdb and pdb. Second, the C++ code in this container is built in debug mode, so debugging the C++ code with gdb using line numbers and symbols is possible.
Before proceeding through these instructions it is important to decide if you are going to do this build with anaconda or the stock python installation with the pip command. You should recognize the mspass container is currently built on anaconda, but uses pip to assemble the python packages it needs to run. That does NOT, however, mean that this is necessarily the best route for you in a local development setup. In fact, we strongly recommend using conda and setting up an environment for doing any MsPASS development. The reason is simply that MsPASS has a large number of dependencies and we have found it easy to get inconsistent behavior from unexpected module collisions. Users familiar with python will recognize this as one of python's great weaknesses if not managed correctly. This is why, for working with a large package like MsPASS, anaconda environments are currently the best way we know to manage the problem. Furthermore, the version of python may matter, so it is a good idea to start by creating a bare environment that forces use of the same python version as that running in the container. At the moment our default is 3.10 so you should use an incantation similar to the following (changing the numbers if necessary):
conda create -n mspass_py310 python=3.10 anaconda
conda activate mspass_py310
You should then do any python related steps below (including compiling MsPASS) in a terminal where the environment you just created has been set with the activate command.
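A quick way to confirm that a terminal is using the intended environment is to ask the interpreter itself. The following is a minimal sketch, not part of MsPASS: the helper name describe_interpreter is hypothetical, and the expected version (3.10) is simply the default quoted above. It relies on the CONDA_DEFAULT_ENV variable that conda sets when an environment is activated.

```python
import os
import sys

# Assumption: 3.10 is the python version used in the MsPASS container,
# as stated in the text; change this tuple if the default changes.
EXPECTED_VERSION = (3, 10)


def describe_interpreter(expected=EXPECTED_VERSION):
    """Return (version_ok, env_name) for the running interpreter."""
    version_ok = tuple(sys.version_info[:2]) == tuple(expected)
    # conda sets CONDA_DEFAULT_ENV when an environment is activated;
    # the value is None if no conda environment is active.
    env_name = os.environ.get("CONDA_DEFAULT_ENV")
    return version_ok, env_name


if __name__ == "__main__":
    ok, env = describe_interpreter()
    print("python", ".".join(map(str, sys.version_info[:3])))
    print("matches expected version:", ok)
    print("active conda environment:", env)
```

If the reported environment is not the one you created (mspass_py310 in the example above), run the activate command again before continuing.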
Anyone planning serious development with MsPASS aiming to extend the package will find a local copy useful for testing and debugging. Cases of particular note that would benefit from building a local copy are:
- If you want to use tools that are not part of the standard development docker container you will definitely find building a local copy helpful. For example, if you use any IDE for python development you will likely find it faster to prototype new algorithms by building a local copy.
- If you are aiming to adapt a set of legacy code in a compiled language (e.g. C/C++ or FORTRAN) working with a local copy will almost certainly be advantageous.
If you are working on a new algorithm to be used with MsPASS we recommend you do initial prototyping of the algorithm as a python function. Best practice is to design a test script to drive the function initially without the baggage of interacting with MongoDB and one of the parallel schedulers (spark or dask).
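As an illustration of that workflow, here is a minimal sketch of a prototype function with its own driver. The function remove_mean and its test are hypothetical examples, not MsPASS code; they use only the python standard library and plain lists so that nothing depends on MongoDB or a scheduler.

```python
import math


def remove_mean(samples):
    """Return a copy of samples with the arithmetic mean subtracted."""
    if not samples:
        return []
    mean = math.fsum(samples) / len(samples)
    return [x - mean for x in samples]


def test_remove_mean():
    # Synthetic input: one cycle of a sine wave riding on an offset of 5.0.
    n = 100
    data = [5.0 + math.sin(2.0 * math.pi * i / n) for i in range(n)]
    out = remove_mean(data)
    assert len(out) == n
    # The mean of the output should be zero to within rounding error.
    assert abs(math.fsum(out) / n) < 1e-12
    # Empty input is returned as an empty list rather than raising.
    assert remove_mean([]) == []


if __name__ == "__main__":
    test_remove_mean()
    print("remove_mean passed")
```

Once a function like this behaves correctly on synthetic data, the same test can be kept as a regression check when the function is later adapted to MsPASS data objects.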
When you are confident your function is stable or when the function needs to utilize MongoDB you should consider a stage of testing without using the parallel schedulers. Unless you are a MongoDB expert, for local development we recommend utilizing an instance of MongoDB running locally in the MsPASS container with docker. Instructions for launching the MsPASS container and docker are found here. We note that most will want to run MongoDB mounting a local file system to contain the data files maintained by MongoDB. Use this incantation found in the URL noted above:
docker run --name MsPASS -d -p 27017:27017 --mount src=`pwd`,target=/home,type=bind mspass/mspass
where details will vary with your situation - see the above URL link for context. The key point is you can use that step to avoid the painful process of installing MongoDB on your system. We also emphasize MongoDB can be run from the container for both local serial and parallel jobs.
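Before running a test script against the container it can save time to confirm that the published port is reachable. The sketch below is an assumption-laden convenience, not part of MsPASS: port_is_open is a hypothetical helper, and the host and port defaults simply mirror the -p 27017:27017 mapping in the command above. It only tests that something accepts TCP connections on the port; it does not verify that the listener is MongoDB.

```python
import socket


def port_is_open(host="localhost", port=27017, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError (or a subclass) on refusal,
        # timeout, or name resolution failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling port_is_open() with no arguments checks the default mapping; a False result usually means the container is not running or the port was mapped differently.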
MsPASS assumes it will be running under some flavor of Unix. This includes macOS and various flavors of Unix running on all current-generation HPC systems. This may work on newer versions of Windows that coexist with Ubuntu, but at this writing, it is unknown if that is feasible.
You will need to be sure the system has the following elements installed before trying to install MsPASS from source code:
- A C++ compiler that supports the C++17 and higher standards. Unless you have something really ancient this C++17 standard should automatically be supported. On linux systems that will not be a problem if you use any reasonably recent version of the gcc compiler suite. For MacOS there are complexities discussed in the following section.
- We use some open source packages that have components written in FORTRAN. For that reason you will need to have a FORTRAN compiler installed on your local system. That is another reason MacOS installs are more complicated - Apple clang does not appear to support FORTRAN in newer releases.
- You will need the cross-platform build system called cmake. If you do not have it installed already, it can be downloaded here. There are binary packages available for most flavors of modern unix and MacOS. In the worst case it can be built from source code. On macOS, I (glp) found it necessary to launch the CMake MacOS "Application" icon, which is a GUI front end for CMake on MacOS. Follow the instructions found by clicking on the Menu item Tools->How to Install for Command Line Use. If all goes well that process will add the directory where the cmake command line tool is installed to your shell path.
- You will need to be running a version of python 3. MsPASS will not work with a python 2 interpreter. We suggest anaconda as it includes prebuilt versions of most scientific libraries, but we recommend you still use pip3 as the package manager even if you install anaconda. In our experience, the "individual" version is sufficient for development work on a local copy of MsPASS.
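The list above can be checked mechanically before starting the build. This is a minimal sketch under stated assumptions: find_tools is a hypothetical helper, and the command names in REQUIRED_TOOLS are the usual ones but may differ on your system (homebrew, for example, may install versioned compiler names). It only reports whether each command is on your shell path, not whether the version is adequate.

```python
import shutil

# Assumption: these are the usual command names for the prerequisites
# listed above; adjust them if your system uses versioned names.
REQUIRED_TOOLS = ("g++", "gfortran", "cmake", "python3")


def find_tools(names=REQUIRED_TOOLS):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name:10s} {path if path else 'NOT FOUND'}")
```

Any tool reported as not found needs to be installed, or its directory added to your shell path, before you continue.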
If you don't already have "homebrew" installed do so. There are many sources for how to do that on the web for standard mac machines. Once you have homebrew installed you need only execute the following commands to install the latest gcc compiler suite:
brew update
brew upgrade
brew info gcc
brew install gcc
brew cleanup
which should install the full gcc compiler suite, including gfortran, on your system.
The newest Apple computers have an optional ARM64 processor. If you aren't sure check the "About this Mac" entry under the Apple icon and if it says "Chip Apple M1" you will have some complexities to deal with. At the time of this writing (Sept 2021) the procedure here worked, but some of this complexity is likely to disappear as more ARM64 machines enter the pipeline.
As in the intel processor case the problem is that the standard Xcode compiler, clang, does not play well with any common fortran compiler. Hence, we need to install the gcc compiler suite including gfortran. The main complication added to the above is that the "homebrew (brew)" command line tool has to be set up as described here. Then you can run the following (modified) version of the procedure described in this article:
arch -x86_64 brew update
arch -x86_64 brew upgrade
arch -x86_64 brew info gcc
arch -x86_64 brew install gcc
arch -x86_64 brew cleanup
That should install the full gcc compiler suite including gfortran.
If you haven't already done so, download the source code for mspass from GitHub. In a terminal window running a Unix shell cd to the directory where you plan to install mspass. Then enter the standard command:
git clone git@github.com:mspass-team/mspass.git
which will give you a working copy of the repository. In the future, we expect to have formal releases and this procedure will change.
MsPASS uses Spark/Dask to process data in parallel, which means if you are running a local copy, you need to set up Spark and/or Dask if you are going to do a local test with the parallel schedulers. For the installation, we recommend you follow the instructions on the official website. Since we are using Python to drive the workflow, PySpark is the one we will be using in MsPASS. Dask has no such complication because it is a pure python package.
For PySpark installation, you could refer to this page. It includes instructions for installing PySpark by using pip, conda, downloading manually, and building from the source. We do not currently recommend using anaconda with macos as we have had issues getting the components to play together.
For Dask installation, it’s pretty much the same as installing PySpark and here is the link you could refer to. Actually, if you are using pip to install Dask, you could ignore this step because it will be downloaded in the following section, which installs all the dependencies we rely on in MsPASS through the requirements.txt file.
Those installation procedures should cause no more problems than a typical python package install. That is, they usually work, but if you have worked with python you will be familiar with the issue of incompatible modules required by different, already installed packages. For that reason, we strongly urge you to install pyspark and/or dask with the "--user" option of pip. In addition, you might need to configure some files or parameters on your machine before you can run with MsPASS. For example, users who don’t have localhost defined in /etc/hosts on their machines should set the SPARK_LOCAL_IP environment variable to localhost or whatever your hostname is. You might also need to set other environment variables like SPARK_HOME if it is not set properly during installation.
If you encounter a problem you are unable to resolve, feel free to report it in the issues section of github for this package.
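The SPARK_LOCAL_IP advice above can be sketched in python as follows. This is an illustration only: choose_spark_local_ip and localhost_resolves are hypothetical helpers, and falling back to the machine's hostname is our assumption based on the "whatever your hostname is" wording. Setting the variable in os.environ only affects the current python process and its children, so it would have to run before Spark is started from that process.

```python
import os
import socket


def localhost_resolves():
    """Return True if the name 'localhost' can be resolved on this machine."""
    try:
        socket.gethostbyname("localhost")
        return True
    except OSError:
        return False


def choose_spark_local_ip(env, resolves, hostname):
    """Return the value SPARK_LOCAL_IP should have, or None to leave it unset."""
    if "SPARK_LOCAL_IP" in env:
        return env["SPARK_LOCAL_IP"]  # respect an existing setting
    if resolves:
        return None  # localhost is defined, nothing to do
    return hostname  # assumption: fall back to this machine's hostname


if __name__ == "__main__":
    value = choose_spark_local_ip(
        os.environ, localhost_resolves(), socket.gethostname()
    )
    if value is not None:
        os.environ["SPARK_LOCAL_IP"] = value
    print("SPARK_LOCAL_IP:", os.environ.get("SPARK_LOCAL_IP", "(left unset)"))
```

For a permanent fix it is usually simpler to set the variable in your shell startup file instead.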
MsPASS is based entirely on open-source packages so the setup process from source code should not encounter any licensing issues. Also, we rely on some python libraries that are widely used for data processing and engineering in seismology, like numpy and obspy. A convenient way to see the full dependencies is to inspect the file called requirements.txt found at the root of the MsPASS source tree.
Rather than install the required packages one by one, users can simply run the following pip command at the root of the MsPASS directory
pip3 install -r requirements.txt
which will install the specified libraries along with their own dependencies.
For users using Anaconda, make sure you are running in the right virtual environment and then enter the following similar command:
conda install --file requirements.txt
Note that at present there are some packages in the requirements.txt file that are not available through the conda channel. If you get the following line:
PackagesNotFoundError: The following packages are not available from current channels:
followed by a list of package names do the following:
- Copy requirements.txt to a different name (e.g. requirements_conda).
- Edit the file you just created, removing the packages listed following the line shown above.
- Rerun conda install using your edited file.
- Install the offending packages with pip.
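The manual edit in the steps above can also be scripted. The following is a minimal sketch, not a tool shipped with MsPASS: split_requirements and requirement_name are hypothetical helpers, and the parsing assumes simple requirement lines (a package name optionally followed by a version specifier). Name matching is case-insensitive but does not treat hyphens and underscores as equivalent.

```python
import re


def requirement_name(line):
    """Return the lowercase package name from a requirements line, or None."""
    stripped = line.split("#", 1)[0].strip()
    # Blank lines, comments, and option lines (e.g. "-r other.txt") have no name.
    if not stripped or stripped.startswith("-"):
        return None
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", stripped)
    return match.group(0).lower() if match else None


def split_requirements(lines, missing):
    """Split lines into (for_conda, for_pip) given the names conda rejected."""
    missing = {name.lower() for name in missing}
    for_conda, for_pip = [], []
    for line in lines:
        if requirement_name(line) in missing:
            for_pip.append(line)
        else:
            for_conda.append(line)
    return for_conda, for_pip
```

To use it, read requirements.txt with readlines(), pass the package names printed after the PackagesNotFoundError line as missing, and write the two returned lists to separate files: one for conda install --file and one for pip3 install -r.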
In MsPASS, we are using pybind11 to bind the C++ code to be used with python. For the present all C/C++ code is bound to a set of modules with the prefix mspasspy.ccore. For those who are interested in how pybind11 works in MsPASS, this particular example illustrates how we adapt an existing C++ algorithm to the MsPASS framework.
Here is a figure that illustrates where the C/C++ code is found in the directory tree relative to the python code and the pybind11 binding code.
The pybind code should be installed automatically as part of the cmake process described below.
In MsPASS, we use some environment variables to check or validate paths, notably in the SchemaBase class and in some C++ functions bound with pybind11. The SchemaBase class, for example, checks if the environment variable MSPASS_HOME exists and uses it to read the YAML file defining the specified schema. bash users should run the following command:
export MSPASS_HOME=/mspass
while csh/tcsh users would use
setenv MSPASS_HOME /mspass
Here in both cases /mspass is an example path. In your case, you should replace it with the absolute path where you downloaded MsPASS through git.
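A small check run from python can catch a missing or mistyped MSPASS_HOME before it causes a confusing failure later. This is a sketch only: mspass_home_problems is a hypothetical helper, not part of MsPASS, and it tests only that the variable is set to an absolute path of an existing directory, following the description above.

```python
import os


def mspass_home_problems(env=os.environ):
    """Return a list of problems with MSPASS_HOME; an empty list means it looks usable."""
    home = env.get("MSPASS_HOME")
    if not home:
        return ["MSPASS_HOME is not set"]
    problems = []
    if not os.path.isabs(home):
        problems.append(f"MSPASS_HOME is not an absolute path: {home}")
    if not os.path.isdir(home):
        problems.append(f"MSPASS_HOME is not an existing directory: {home}")
    return problems


if __name__ == "__main__":
    found = mspass_home_problems()
    print("\n".join(found) if found else "MSPASS_HOME looks usable")
```

Run it in the same terminal you will use for the build, since the variable is only visible in shells where the export or setenv command was issued.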
First, the user should recognize that we use cmake as a configuration tool. The configuration files run cmake in a mode where it generates unix makefiles that are used to drive the final compilation and linking. We use a common organization for larger packages in which we keep the source code isolated from the binaries. We recommend calling the top of this tree "build" and placing it in the cxx directory found at the top of the mspass source code tree. Specifically, assuming your shell is at the top of the mspass tree (i.e. in the mspass directory) type the following into your terminal window to use cmake to configure the build:
cd cxx
mkdir build
cd build
ccmake .. # cmake can also be used but it will use a default configuration you may not want
It's essential that you create the build directory (or a directory with some other convenient name) in which to run cmake, because cmake will create a series of subdirectories under build that define the build configuration and are used to hold the compiled libraries.
The last command should bring up a curses based terminal window form. On the first entry type c in the window to do an initial configuration. When that completes, you may want to edit some of the configuration parameters. If you need to access the C/C++ include files and static libraries, you will almost certainly want to change CMAKE_INSTALL_PREFIX to an appropriate location for your system. The default install location is /usr/local which nearly always requires root access and is often a bad idea anyway because so many open source packages default to that location and it can get bloated.
When finished defining the configuration again type c in the window and then g to "generate" the configuration. You will then see the output of cmake as it checks for required features and builds Unix makefiles. This step can take a while as it usually needs to download and compile several large open source packages. If it fails you will need to do the usual sorting through the error log to troubleshoot the problem.
When cmake or ccmake completes successfully, in the same top-level "build" directory run
make
and wait for compilation to complete. As always in compiling a package this may require some troubleshooting if anything fails.
If you are going to be compiling your own C/C++ code or want to link to the C/C++ libraries of MsPASS you will need to make sure you have CMAKE_INSTALL_PREFIX set to an appropriate directory and then in the build directory type
make install
That will install the following standard directories under the directory CMAKE_INSTALL_PREFIX defines: (a) include will contain the include files needed for C/C++ compilations, (b) lib contains static libraries (unix ".a" files) you can use to link C/C++ programs that need to use MsPASS libraries, and (c) data contains default yaml and pf data files used by some MsPASS modules.
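After make install finishes you can confirm that the three directories described above were created. The sketch below is a convenience check, not part of the MsPASS build: missing_install_dirs is a hypothetical helper, and the directory names are taken from the description above. Pass it whatever path you used for CMAKE_INSTALL_PREFIX.

```python
import os

# Directory names taken from the description of "make install" above.
EXPECTED_SUBDIRS = ("include", "lib", "data")


def missing_install_dirs(prefix, expected=EXPECTED_SUBDIRS):
    """Return the expected subdirectories that do not exist under prefix."""
    return [
        name for name in expected
        if not os.path.isdir(os.path.join(prefix, name))
    ]
```

An empty list means all three directories are present; any names returned point to a step of the install that did not complete.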
Note that if you are using anaconda, we reiterate that you should do all the steps in this section with the anaconda environment you set up for mspass activated. If you don't, the build may be incompatible with your environment.
The final step is to install python components using pip3. The way we set up MsPASS is with a setup.py file. The setup.py file is what describes our package, and tells setuptools how to package, build and install it. It is python code, so you can add anything custom you need to it. What setup.py does is the following:
- define version and package metadata
- include a list of packages, dependencies, and other files
- specify a list of extensions to be compiled.
Users can simply run the following pip command at the root of the MsPASS directory to install MsPASS:
pip3 install --user ./ -v
If you have root access and are building MsPASS on a machine used only by yourself, you can drop the --user flag. The --user flag causes the package to be installed under ~/.local so it will only affect your personal environment.
Depending on the machine, it can take several minutes to compile and build the entire package.
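Once the install finishes, a simple sanity check is to ask python whether it can locate the package and its compiled components. This is a minimal sketch: module_available is a hypothetical helper, and mspasspy.ccore is the module prefix named earlier in this document. Note that looking up a dotted name imports the parent package, so a broken install of mspasspy can surface here as an import failure, which the helper reports as False.

```python
import importlib.util


def module_available(name):
    """Return True if python can locate the named module."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package (e.g. mspasspy) is missing or
        # fails to import; ModuleNotFoundError is a subclass of ImportError.
        return False


if __name__ == "__main__":
    for name in ("mspasspy", "mspasspy.ccore"):
        print(name, "found" if module_available(name) else "NOT found")
```

If mspasspy is found but mspasspy.ccore is not, the C++ extensions most likely failed to build; rerun the pip3 command with -v and inspect the compiler output.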
First of all, we need to explain a bit about CMake. CMake is a tool for managing the build process of software. In other words, it prepares a list of commands to be performed to generate the executable. Under Linux, we usually use CMake to generate a GNU make file which then uses gcc/g++ to compile the C++ source code and create the executable. For those who are new to CMake, there is extensive documentation online. We note, however, that as with many generic packages it is easy to be overcome by information overload from the core documentation. As with many modern large packages, the best advice is to examine our CMakeLists.txt configuration files in the directory chain and use the manual to explain some of the incantations required to make it work.
The cmake/make process and the related setup.py file create a set of unix ".so" files that are what the python interpreter loads when you issue an "import mspasspy.ccore.XXX" command in python. Then what are .so files? A file with .so file extension is a shared library file. It contains compiled code that can be dynamically loaded and linked to a program at run-time. Windows uses this same concept in what Microsoft calls a DLL (dynamic link library).
In MsPASS, we use CMake to compile and bundle into .so files so that other programs could link the shared library and call the methods inside. Here is a figure showing how MsPASS organizes the CMake files.
- compiler creates .o files in the build directory
- the .o files are packed with ar into static library files
- the .so files are created by linking against the binding code and the .a libraries.
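To see which shared library files the build actually produced, you can search a directory tree for files whose names end in one of the suffixes your python interpreter accepts for compiled extension modules. This is a sketch only: find_extension_modules is a hypothetical helper, and which directory to search (the build directory or the installed mspasspy package) depends on your setup.

```python
import importlib.machinery
import os


def find_extension_modules(root):
    """Return sorted paths of compiled extension modules found under root."""
    # EXTENSION_SUFFIXES lists the file endings this interpreter will load
    # as extension modules (on Linux and macOS these end in ".so").
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith(suffixes):
                found.append(os.path.join(dirpath, filename))
    return sorted(found)
```

An empty result for the directory where you expected the mspasspy.ccore modules suggests the final linking step did not complete.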
Some issues may happen to users: