BES Configuration

When you launch your Hyrax Data Server, the BES loads the bes.conf file located in $prefix/etc/bes/, which instructs it to read the configuration files defined for each module, located in $prefix/etc/bes/modules/. There are 27 configuration files! For example, the $prefix/etc/bes/modules/h5.conf file declares all configuration options for the HDF5 handler.

Note
By default, $prefix is /usr/local on Linux, or simply / if you followed the Docker installation.

The final directive in the bes.conf file is to read the $prefix/etc/bes/site.conf file, if it exists. So, when the default configuration does not suit your needs, or those of your data users, you can customize the BES by creating a site.conf file and redefining configuration parameters there. Any parameter set in the site.conf file overrides the value set in the bes.conf file.

Note
By default, there is no site.conf file, and thus Hyrax uses the default configurations.

The main advantages of having a separate site.conf file are:

  • The bes.conf is static (unaltered), providing a way to check the default configurations.

  • The site.conf file consolidates all of the used configurations into a single file. This is preferable over making changes across multiple files.

  • The site.conf file persists through Hyrax updates.

To learn how to create and configure a site.conf file, with many examples, jump to the Custom Module Configuration subsection below.

Basic format of parameters

One way to set parameters in a BES configuration file is:

Name=Value1
Name+=Value2

The above assigns both Value1 and Value2 to Name, due to the += operator. If the second line used = instead of +=, Name would be overwritten and hold only Value2.
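The difference between = and += can be sketched in a few lines of Python. This is only an illustration of the semantics described above, not the BES parser itself; the function name apply_assignments is hypothetical, and treating an empty value as a reset to an empty list is an assumption.

```python
def apply_assignments(lines):
    """Apply 'Name=Value' and 'Name+=Value' lines in order.

    '=' replaces whatever is stored under the name, '+=' appends to it.
    Illustrative only: this is not the parser used by the BES.
    """
    params = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        eq = line.find("=")
        if eq < 0:
            continue  # not an assignment
        append = eq > 0 and line[eq - 1] == "+"
        name = line[: eq - 1 if append else eq].strip()
        value = line[eq + 1:].strip()
        if append:
            params.setdefault(name, []).append(value)
        else:
            # Assumption: an empty value resets the parameter to an empty list.
            params[name] = [value] if value else []
    return params


print(apply_assignments(["Name=Value1", "Name+=Value2"]))
print(apply_assignments(["Name=Value1", "Name=Value2"]))
```

The first call keeps both values under Name, while the second keeps only Value2.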

The bes.conf file includes all .conf files in the modules directory with the following:

BES.Include=modules/.*\.conf$
Note
Regular expressions can be used in the Include parameter to match a set of files.

To include another configuration file, use the following:

BES.Include=/path/to/configuration/file/blee.conf

Another way to define configuration parameters is by using key/value pairs. This applies to many of the available parameters, but not all. For example:

SupportEmail = support@opendap.org

BES.ServerAdministrator = email:support@opendap.org
BES.ServerAdministrator += organization:OPeNDAP Inc.
BES.ServerAdministrator += street:165 NW Dean Knauss Dr.
BES.ServerAdministrator += city:Narragansett
BES.ServerAdministrator+=region:RI
BES.ServerAdministrator+=postalCode:02882
BES.ServerAdministrator+=country:US
BES.ServerAdministrator+=telephone:+1.401.575.4835
BES.ServerAdministrator+=website:http://www.opendap.org
Note
The parser for the configuration file ignores spaces around the '=' and '+=' operators.
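To make the key/value form concrete, here is a minimal Python sketch that splits accumulated BES.ServerAdministrator values into a dictionary. The helper name admin_fields is hypothetical, and splitting on the first colon only is an assumption chosen so that values such as URLs, which contain colons themselves, stay intact.

```python
def admin_fields(values):
    """Split 'key:value' strings on the first colon only."""
    fields = {}
    for item in values:
        key, _, value = item.partition(":")
        fields[key.strip()] = value.strip()
    return fields


values = [
    "email:support@opendap.org",
    "organization:OPeNDAP Inc.",
    "telephone:+1.401.575.4835",
    "website:http://www.opendap.org",
]
fields = admin_fields(values)
print(fields["organization"])
print(fields["website"])
```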

Custom Module Configuration via site.conf

The site.conf file is a special configuration file that persists through Hyrax updates, and it is where you can store custom module configurations. Below we provide instructions on how to customize a module’s configuration with site.conf.

Configuration Instructions

The following instructions work generally for any way you install Hyrax. In addition, we provide instructions for using site.conf when running Hyrax via Docker.

  1. Create an empty site.conf in $prefix/etc/bes/ with the following commands:

    cd $prefix/etc/bes/
    sudo touch site.conf
  2. Locate the default .conf file for the module that you would like to customize in $prefix/etc/bes/modules.

  3. Copy the configuration parameters that you would like to customize from the module’s configuration file into the site.conf.

  4. Save your updates to site.conf.

  5. Restart the server.

Note
Included in $prefix/etc/bes/ is a template site.conf file, called site.conf.proto. To take advantage of this template, copy it with the following command: cp site.conf.proto site.conf. Then uncomment the configuration parameters that you want to modify and update them.

Instructions When Running Hyrax via Docker

If you will be running Hyrax with Docker, you can first create an empty site.conf file on your machine and point to it when running Hyrax. This allows the site.conf file to persist through any Hyrax update or restart, and even through an accidental removal of Hyrax. To do this, follow steps similar to those above:

  1. Create a local site.conf file.

  2. Run Hyrax via Docker, adding the following line to the docker run command in Step 2 of the Docker Hub installation instructions:

    --volume $prefix/path_to_your_configuration/site.conf:/etc/bes/site.conf \
  3. Open a bash shell in the Docker container by running:

    docker exec -it hyrax bash

    This will allow you to navigate the Docker container, and therefore Hyrax’s directories.

  4. Continue with Steps 2-4 of the general configuration instructions above.

  5. Restarting the server is no longer required. When you run the server again, make sure to repeat Step 2.

Example: Pointing to data

There are two general ways to point to data, depending on how you prefer to install and run Hyrax. When installing or running Hyrax via Docker, Step 2 of the Docker instructions describes how to point Hyrax to your data. Namely, add the following line to your docker run command:

--volume /full/path/data/root/directory:/usr/share/hyrax

By default, Hyrax will read data from /usr/share/hyrax. The /full/path/data/root/directory should be the root directory of your data catalog.

When installing Hyrax from source or (pre-compiled) binaries, you will have to set the following values

BES.Catalog.catalog.RootDirectory=/full/path/data/root/directory
BES.Data.RootDirectory=/dev/null

in the site.conf.

The next step is to (re)configure any mapping between data source names and data handlers. This is usually taken care of for you already, so you probably won’t have to set this parameter unless you would like to set a new configuration. Each data handler module (netcdf, hdf4, hdf5, freeform, etc.) sets this mapping based on the file extensions of the data files it handles.

For example, in the nc.conf, for the netcdf data handler module, you’ll find the line:

BES.Catalog.catalog.TypeMatch+=nc:.*\.nc(\.bz2|\.gz|\.Z)?$;
Note
When the BES is asked to perform some commands on a particular data source, it uses regular expressions to figure out which data handler should be used to carry out the commands. The value of the BES.Catalog.catalog.TypeMatch parameter holds the set of regular expressions. The value of this parameter is a list of entries of the form handler:expression; where each entry ends with a semicolon. The regular expressions used by the BES are like those used by grep on Unix and are somewhat cryptic, but once you see the pattern it’s not that bad.

The following three examples show what each TypeMatch entry tells the BES:

  1. Any data source with a name that ends in .nc should be handled by the netcdf (nc) handler.

    BES.Catalog.catalog.TypeMatch+=nc:.*\.nc(\.bz2|\.gz|\.Z)?$;
  2. Any file with a .hdf, .HDF, or .eos suffix should be processed using the HDF4 handler (note that case matters).

    BES.Catalog.catalog.TypeMatch+=h4:.*\.(hdf|HDF|eos)(\.bz2|\.gz|\.Z)?$;
  3. Data sources ending in .dat should use the FreeForm handler.

    BES.Catalog.catalog.TypeMatch+=ff:.*\.dat(\.bz2|\.gz|\.Z)?$;

If you fail to configure this correctly, the BES will return error messages stating that the type information has to be provided. It won’t tell you this when it starts, however, only when the OLFS (or some other software) makes a data request. This is because it is possible to use BES commands in place of these regular expressions, although Hyrax won’t.
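A quick way to check a TypeMatch value before restarting the server is to try it against a few file names. The sketch below does this with Python's re module, which is not the regular expression engine the BES uses, so treat it as an approximation. The function names are hypothetical, and the choice that the first matching entry wins is an assumption.

```python
import re

# The three entries discussed above, joined into one TypeMatch value.
TYPE_MATCH = (
    r"nc:.*\.nc(\.bz2|\.gz|\.Z)?$;"
    r"h4:.*\.(hdf|HDF|eos)(\.bz2|\.gz|\.Z)?$;"
    r"ff:.*\.dat(\.bz2|\.gz|\.Z)?$;"
)


def parse_type_match(value):
    """Turn 'handler:regex;handler:regex;' into (handler, compiled regex) pairs."""
    entries = []
    for entry in value.split(";"):
        if not entry:
            continue  # skip the empty piece after the last semicolon
        handler, _, pattern = entry.partition(":")
        entries.append((handler, re.compile(pattern)))
    return entries


def find_handler(entries, name):
    """Return the handler of the first matching entry, or None (assumed order)."""
    for handler, pattern in entries:
        if pattern.match(name):
            return handler
    return None


entries = parse_type_match(TYPE_MATCH)
for name in ["sst.nc.gz", "granule.HDF", "granule.Hdf", "table.dat", "sst.nc4"]:
    print(name, "->", find_handler(entries, name))
```

Names that map to None, such as granule.Hdf and sst.nc4 here, are the ones for which a data request would fail with the missing type information error.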

NetCDF-4 files and the HDF5 Handler

In the NetCDF example above, the .nc suffix implicitly refers to NetCDF-3 files (i.e. NetCDF classic). NetCDF-3 is an older data model, and as such does not incorporate many of the data types now widely used by the scientific community. As a result, data producers often opt for the Enhanced Data Model instead, i.e. NetCDF-4. Unfortunately, NetCDF-3 and NetCDF-4 data files can share the same suffix, .nc.

Although it is becoming common practice to assign the .nc4 suffix to NetCDF-4 files, you can expect to find many NetCDF-4 files with a .nc suffix. Since Hyrax’s netcdf handler only covers the NetCDF-3 model, any attributes or variable types that are only part of the NetCDF-4 data model will not be properly handled by Hyrax’s data server. At worst, Hyrax will be unable to serve the dataset.

To successfully serve NetCDF-4 data, the HDF5 handler should be assigned to any such file. This approach works because NetCDF-4 uses the HDF5 library as its backend. However, when your data includes both NetCDF-3 and NetCDF-4 files, we strongly recommend renaming any NetCDF-4 file to use the .nc4 suffix. This will facilitate the mapping between NetCDF-4 data and the HDF5 handler. To find out whether your .nc data file is NetCDF-3 or NetCDF-4, you can use ncdump.
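If ncdump is not at hand (its -k option prints the file kind), the first bytes of a file are usually enough to tell the two formats apart. The following Python sketch relies on the published file signatures: NetCDF classic files start with the bytes CDF followed by a version byte, and HDF5 files (which NetCDF-4 files are) start with the 8-byte HDF5 signature. The function names are hypothetical, and the sketch only looks at offset 0, so an HDF5 file with a user block would be reported as unknown.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def classify_header(header):
    """Classify a file from its first 8 bytes."""
    if header.startswith(HDF5_SIGNATURE):
        return "netCDF-4 or other HDF5"
    # Version byte 1 = classic, 2 = 64-bit offset, 5 = CDF-5.
    if header[:3] == b"CDF" and header[3:4] in (b"\x01", b"\x02", b"\x05"):
        return "netCDF classic"
    return "unknown"


def classify_file(path):
    """Read the first 8 bytes of a file and classify them."""
    with open(path, "rb") as f:
        return classify_header(f.read(8))


print(classify_header(b"CDF\x01\x00\x00\x00\x00"))
print(classify_header(HDF5_SIGNATURE))
```

A .nc file reported as "netCDF-4 or other HDF5" is a candidate for renaming to .nc4 so that the h5 handler picks it up.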

The mapping assigning the HDF5 handler to any .nc4 file should be defined in the site.conf file as follows:

BES.Catalog.catalog.TypeMatch+=h5:.*\.nc4(\.bz2|\.gz|\.Z)?$;

Below, we provide a concrete example of a site.conf file when serving NetCDF-4 datasets with Groups. Groups are part of both NetCDF4 and HDF5 data models.

Example: Groups in NetCDF4 and HDF5

By default, the Group representation of a dataset is flattened to accommodate CF 1.7 conventions. In addition, the default NC handler that is used for any .nc4 dataset is based on the "Classic NetCDF model" (netCDF-3), which does not incorporate many of the Enhanced NetCDF model (netCDF-4) features. As a result, to serve .nc4 data that may contain DAP4 elements not present in DAP2 (see diagram for comparison with DAP2), or to serve H5 datasets with an unflattened Group representation, you must make the following 2 changes to the default configuration:

  1. Set H5.EnableCF=false and H5.EnableCFDMR=true.

  2. Assign the h5 handler when serving .nc4 data via Hyrax.

To enable these changes the site.conf must have the following parameters:

BES.Catalog.catalog.TypeMatch=
BES.Catalog.catalog.TypeMatch+=csv:.*\.csv(\.bz2|\.gz|\.Z)?$;
BES.Catalog.catalog.TypeMatch+=reader:.*\.(dds|dods|data_ddx|dmr|dap)$;
BES.Catalog.catalog.TypeMatch+=dmrpp:.*\.(dmrpp)(\.bz2|\.gz|\.Z)?$;
BES.Catalog.catalog.TypeMatch+=ff:.*\.dat(\.bz2|\.gz|\.Z)?$;
BES.Catalog.catalog.TypeMatch+=gdal:.*\.(tif|TIF)$|.*\.grb\.(bz2|gz|Z)?$|.*\.jp2$|.*/gdal/.*\.jpg$;
BES.Catalog.catalog.TypeMatch+=h4:.*\.(hdf|HDF|eos|HDFEOS)(\.bz2|\.gz|\.Z)?$;
BES.Catalog.catalog.TypeMatch+=ncml:.*\.ncml(\.bz2|\.gz|\.Z)?$;

BES.Catalog.catalog.TypeMatch+=h5:.*\.(HDF5|h5|he5|H5)(\.bz2|\.gz|\.Z)?$;
BES.Catalog.catalog.TypeMatch+=h5:.*\.nc4(\.bz2|\.gz|\.Z)?$;

H5.EnableCF=false
H5.EnableCFDMR=true

Including and Excluding files and directories

Finally, you can configure the types of information that the BES sends back when a client requests catalog information. The Include and Exclude parameters provide this mechanism, also using a list of regular expressions (with each element of the list separated by a semicolon). In the example below, files that begin with a dot are excluded. These parameters are set in the dap.conf configuration file.

The Include expressions are applied to the node first, followed by the Exclude expressions. For collections of nodes, only the Exclude expressions are applied.

BES.Catalog.catalog.Include=;
BES.Catalog.catalog.Exclude=^\..*;
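The effect of these two parameters can be approximated with a short Python sketch. It assumes that an empty Include list means "include everything", that a node is hidden as soon as one Exclude expression matches, and that Python's re module is close enough to the BES regular expressions for these simple patterns. The function names are hypothetical.

```python
import re


def split_patterns(value):
    """Split a semicolon-separated list of regular expressions."""
    return [re.compile(p) for p in value.split(";") if p]


def is_listed(name, include, exclude, is_collection=False):
    """Decide whether a catalog node would appear in a listing.

    Include is checked first (skipped for collections), then Exclude.
    """
    inc = split_patterns(include)
    exc = split_patterns(exclude)
    if not is_collection and inc and not any(p.search(name) for p in inc):
        return False
    return not any(p.search(name) for p in exc)


for name in ["sst.nc", ".DS_Store", ".git"]:
    print(name, is_listed(name, include=";", exclude=r"^\..*;"))
```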

Example: Administrator parameters

The following steps detail how you can update the BES’s server administrator configuration parameters with your organization’s information:

  1. Locate the existing server administrator configuration in /etc/bes/bes.conf:

    BES.ServerAdministrator=email:support@opendap.org
    BES.ServerAdministrator+=organization:OPeNDAP Inc.
    BES.ServerAdministrator+=street:165 NW Dean Knauss Dr.
    BES.ServerAdministrator+=city:Narragansett
    BES.ServerAdministrator+=region:RI
    BES.ServerAdministrator+=postalCode:02882
    BES.ServerAdministrator+=country:US
    BES.ServerAdministrator+=telephone:+1.401.575.4835
    BES.ServerAdministrator+=website:http://www.opendap.org
    Tip
    When adding parameters to the ServerAdministrator configuration, notice how, following the first line, we use += instead of just = to add new key/value pairs. The += operator tells the BES that we are adding new configuration parameters, rather than replacing those that were already loaded. Had we used just = in the above example, the only configured parameter would have been website.
  2. Copy the above block of text from its default .conf file to site.conf.

  3. In site.conf, update the block of text with your organization’s information; for example…​

    BES.ServerAdministrator=email:smootchy@woof.org
    BES.ServerAdministrator+=organization:Mogogogo Inc.
    BES.ServerAdministrator+=street:165 Buzzknucker Blvd.
    BES.ServerAdministrator+=city:KnockBuzzer
    BES.ServerAdministrator+=region:OW
    BES.ServerAdministrator+=postalCode:00007
    BES.ServerAdministrator+=country:MG
    BES.ServerAdministrator+=telephone:+1.800.555.1212
    BES.ServerAdministrator+=website:http://www.mogogogo.org
  4. Save your changes to site.conf.

  5. Restart the server.

Administration & Logging

In the bes.conf or site.conf file, the BES.ServerAdministrator parameter is the address used in various mail messages returned to clients. Set this so that the email’s recipient will be able to fix problems and/or respond to user questions. Also set the log file and log level. If the BES.LogName is set to a relative path, it will be treated as relative to the directory where the BES is started. (That is, if the BES is installed in /usr/local/bin but you start it in your home directory using the parameter value below, the log file will be bes.log in your home directory.)

BES.ServerAdministrator=webmaster@some.place.edu
BES.LogName=./bes.log
BES.LogVerbose=no
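The relative-path rule for BES.LogName amounts to joining the value with the directory the BES was started from. Here is a minimal Python sketch of that rule; the function name and the /home/alice directory are illustrative assumptions, and posixpath is used so the result is the same on any platform.

```python
import posixpath


def resolve_log_path(log_name, start_dir):
    """Resolve a BES.LogName value against the directory the BES was started in."""
    # An absolute log_name replaces start_dir entirely; a relative one is appended.
    return posixpath.normpath(posixpath.join(start_dir, log_name))


print(resolve_log_path("./bes.log", "/home/alice"))
print(resolve_log_path("/var/log/bes.log", "/home/alice"))
```

Using an absolute path for BES.LogName avoids the log file moving around depending on where the server is started.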

Because the BES is a server in its own right, you will need to tell it which network port and interface to use. Assuming you are running the BES and OLFS (i.e., all of Hyrax) on one machine, set the user, group, and networking parameters described in the following subsections.

User and Group Parameters

The BES must be started as root. One of the first things the BES does is start a listener that waits for requests to the BES. This listener is started as root, but the User and Group of the process are then set using parameters in the bes.conf or site.conf file:

BES.User=user_name
BES.Group=group_name

You can also set these to a user id and a group id. For example:

BES.User=#172
BES.Group=#14

Setting the Networking Parameters

In the bes.conf or site.conf configuration file, we have settings for how the BES should listen for requests:

BES.ServerPort=10022
# BES.ServerUnixSocket=/tmp/opendap.socket

The BES.ServerPort parameter tells the BES which TCP/IP port to use when listening for commands. Ports numbered below 1024 are privileged; otherwise you can use any number under 65536. Stick with the default unless you know you need to change it.
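If you do change the port, a small sanity check like the Python sketch below can catch an out-of-range or privileged value before you restart the server. The function name and the returned labels are hypothetical.

```python
def check_server_port(port):
    """Classify a candidate BES.ServerPort value."""
    if not 1 <= port <= 65535:
        return "invalid"      # outside the TCP port range
    if port < 1024:
        return "privileged"   # normally requires root to bind
    return "ok"


for port in (10022, 80, 70000):
    print(port, check_server_port(port))
```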

In the default bes.conf file we have commented the ServerUnixSocket parameter, which disables I/O over that device. If you need UNIX socket I/O, uncomment this line, otherwise leave it commented. The fewer open network I/O ports, the easier it is to make sure the server is secure.

If both ServerPort and ServerUnixSocket are defined, the BES listens on both the TCP port and the Unix socket. Local clients on the same machine as the BES can use the Unix socket for a faster connection. Clients on other machines connect to the BES using the BES.ServerPort value.

Note
The OLFS always uses only the TCP socket, even if the UNIX socket is present.

Debugging Tip

In bes.conf, use the BES.ProcessManagerMethod parameter to control whether the BES acts like a normal Unix server. The default value of multiple causes the BES to accept many connections at once, like a typical server. The value single causes it to accept a single connection (process the commands sent to it and exit), greatly simplifying troubleshooting.

BES.ProcessManagerMethod=multiple

Controlling how compressed files are treated

Compression parameters are configured in the bes.conf configuration file.

The BES will automatically recognize compressed files using the bz2, gzip, and Unix compress (Z) compression schemes. However, you need to configure the BES to accept these file types as valid data by making sure that the filenames are associated with a data handler. For example, if you’re serving netCDF files, you would set BES.Catalog.catalog.TypeMatch so that it includes nc:.*\.(nc|NC)(\.gz|\.bz2|\.Z)?$;. The first part of the regular expression must match both the filename and the '.nc' extension, and the second part must match the suffix, indicating the file is compressed (either .gz, .bz2 or .Z).

When the BES is asked to serve a file that has been compressed, it must first decompress it before passing it to the correct data handler (except for those formats which support 'internal' compression, such as HDF4). The BES.CacheDir parameter tells the BES where to store the uncompressed file. Note that the default value of /tmp is probably less safe than a directory that is used only by the BES for this purpose. You might, for example, want to set this to $prefix/var/bes/cache.

The BES.CachePrefix parameter is used to set a prefix for the cached files so that when a directory like /tmp is used, it is easy for the BES to recognize which files are its responsibility.

The BES.CacheSize parameter sets the size of the cache in megabytes. When the size of the cached files exceeds this value, the cache will be purged using a least-recently-used approach, where the file’s access time is the 'use time'. Because it is usually impossible to determine the sizes of data files before decompressing them, there may be times when the cache holds more data than this value. Ideally this value should be several times the size of the largest file you plan to serve.
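The purge policy described above can be sketched as follows. This is an illustration of a least-recently-used purge by access time, not the BES cache code; the function name, the tuple layout, and the sample numbers are assumptions.

```python
def files_to_purge(files, max_size_mb):
    """Pick cached files to delete, least recently accessed first.

    files is a list of (name, size_mb, access_time) tuples.
    Deletion stops as soon as the total size fits under max_size_mb.
    """
    total = sum(size for _, size, _ in files)
    purge = []
    for name, size, _ in sorted(files, key=lambda f: f[2]):
        if total <= max_size_mb:
            break
        purge.append(name)
        total -= size
    return purge


cache = [
    ("a.nc", 300, 100),  # oldest access time
    ("b.nc", 400, 300),  # newest access time
    ("c.nc", 500, 200),
]
print(files_to_purge(cache, max_size_mb=500))
```

With these numbers the two least recently used files are removed, which is why a cache size several times the size of your largest file helps avoid evicting files that are still in active use.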

Loading Software Modules

Virtually all of the BES’s functions are contained in modules that are loaded when the server starts up. Each module is a shared-object library. The configuration for each of these modules is contained in its own configuration file and is stored in a directory called modules. This directory is located in the same directory as the bes.conf file: $prefix/etc/bes/modules/.

By default, all .conf files located in the modules directory are loaded by the BES per this parameter in the bes.conf configuration file:

BES.Include=modules/.*\.conf$

So, if you don’t want one of the modules to be loaded, simply change its name to, say, nc.conf.sav and it won’t be loaded.
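The renaming trick works because the Include expression is anchored at the end of the file name. A short Python check makes this visible; it assumes the expression is matched against the path relative to the directory holding bes.conf, and it uses Python's re module rather than the BES regular expression engine. The function name is hypothetical.

```python
import re

# The expression from BES.Include in bes.conf.
INCLUDE = re.compile(r"modules/.*\.conf$")


def is_loaded(relative_path):
    """Return True if the path matches the BES.Include expression."""
    return INCLUDE.match(relative_path) is not None


for path in ["modules/nc.conf", "modules/nc.conf.sav", "modules/h5.conf"]:
    print(path, is_loaded(path))
```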

For example, if you are installing the general purpose server module (the dap-server module) then a dap-server.conf file will be installed in the modules directory. Also, most installations will include the dap module, allowing the BES to serve OPeNDAP data. This configuration file, called dap.conf, is also included in the modules directory. For a data handler, say netcdf, there will be an nc.conf file located in the modules directory.

Each module’s configuration file should contain lines that tell the BES to load the module at startup:

BES.modules+=nc
BES.module.nc=/usr/local/lib/bes/libnc_module.so

Module-specific parameters are included in the module’s own configuration file. For example, any parameters specific to the netcdf data handler are included in the nc.conf file.

If you would like symbolic links to be followed when retrieving data and when viewing catalog entries, you need to set two parameters: BES.FollowSymLinks and BES.Catalog.catalog.FollowSymLinks. The BES.FollowSymLinks parameter applies to non-catalog containers and is used in conjunction with the BES.RootDirectory parameter; it is not a general setting. The BES.Catalog.catalog.FollowSymLinks parameter applies to catalog requests and to data containers in the catalog, and is used in conjunction with the BES.Catalog.catalog.RootDirectory parameter above. The default is No in the installed configuration files. To allow symbolic links to be followed, set the value to Yes.

The following is set in the bes.conf file:

BES.FollowSymLinks=No|Yes

And this one is set in the dap.conf file in the modules directory:

BES.Catalog.catalog.FollowSymLinks=No|Yes

Parameters for Specific Handlers

Parameters for specific modules can be added to the BES configuration file for that specific module. No module-specific parameters should be added to bes.conf.