lib_cpp_h5_writer

This library is used for creating C++-based stream writers for H5 files. It focuses on the functionality and performance needed for integrating high-performance detectors.

Key features:

  • Get data from a ZMQ stream (Array-1.0 protocol) - htypes specification
  • Write NeXus-compliant H5 files (user specified) - nexus format
  • Specify additional ZMQ stream parameters to write to the file.
  • Receive additional parameters to write to the file via the REST API.
  • Interact with the writer over the REST API (stop, kill, get_statistics).

Table of contents

  1. Quick start for using the library
  2. Build
    1. Conda build
    2. Local build
  3. Basic concepts
    1. ProcessManager
    2. ZmqReceiver
    3. H5Writer
    4. H5Format
    5. WriterManager
    6. RingBuffer
  4. REST interface
  5. Examples

Quick start for using the library

To create your own stream writer you need to specify:

  • The H5 file format you want to write.
  • The mapping of REST input variables to your H5 format.
  • Additional H5 format fields with default values or calculated fields (based on input or default values).
  • The mapping between the stream header metadata and your H5 file format.
  • Additional metadata that is transferred in the stream message header.

Under sf/ and csaxs/ you can see examples of this. Feel free to use any of these folders as a template.

IMPORTANT: We are using a monorepo for this project (all implementations should live in this git repository). To create a new implementation, please add a folder to the root of the project (like sf/ and csaxs/).

The minimum you need to implement your own writer is:

  • Writer runner (example: csaxs/csaxs_h5_writer.cpp)
  • File format (example: csaxs/CsaxsFormat.cpp)
  • Build file (example: csaxs/Makefile)

Writer runner

Example: csaxs/csaxs_h5_writer.cpp

The runner is the actual executable you will run to create files. In the writer runner you:

  • Specify and parse input parameters.
  • Prepare your system for writing (creating folders, switching the process user, etc.)
  • Instantiate the file format object.
  • Define the parameters that come in the stream header.
  • Start the writer (mostly boilerplate code, if you do not need any special implementations).

File format

Example: csaxs/CsaxsFormat.cpp

This is a class that extends the H5Format class. You need to specify:

  • input_value_type (REST API value name to type mapping)
  • default_values (Fields in the file format that have default values)
  • dataset_move_mapping (Move datasets to another place in the file if needed)
  • file_format (The hierarchical structure of your H5 format)

It is best to specify all the values above in the class constructor. Some values (all except file_format) can be empty, but they should not be null.
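As a rough illustration of the "empty but not null" rule, the sketch below models three of the mappings with plain standard-library containers. Everything in it (`FormatSketch`, `ValueType`, the helper functions, and the field names) is a hypothetical stand-in and not the library's `H5Format` interface; see csaxs/CsaxsFormat.cpp for a real implementation. It only shows that each mapping is allocated in the constructor, even when it has no entries.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in types; the real H5Format interface differs.
enum class ValueType { Int64, Float64, String };

struct FormatSketch {
    // REST API value name -> expected type.
    std::shared_ptr<std::unordered_map<std::string, ValueType>> input_value_type;
    // Format field name -> default value (kept as text for simplicity).
    std::shared_ptr<std::unordered_map<std::string, std::string>> default_values;
    // Temporary dataset path -> final dataset path.
    std::shared_ptr<std::unordered_map<std::string, std::string>> dataset_move_mapping;

    FormatSketch()
        : input_value_type(std::make_shared<std::unordered_map<std::string, ValueType>>()),
          default_values(std::make_shared<std::unordered_map<std::string, std::string>>()),
          dataset_move_mapping(std::make_shared<std::unordered_map<std::string, std::string>>())
    {
        // Illustrative entries only; these names are made up.
        (*input_value_type)["sample_name"] = ValueType::String;
        (*default_values)["detector_name"] = "example_detector";
        // dataset_move_mapping stays empty, but it is still a valid object.
    }
};

// True when every mapping has been allocated, even if it holds no entries.
inline bool all_mappings_allocated(const FormatSketch& format) {
    return format.input_value_type && format.default_values && format.dataset_move_mapping;
}

// Number of move rules; the pointer must be non-null even when there are none.
inline std::size_t move_rule_count(const FormatSketch& format) {
    assert(format.dataset_move_mapping != nullptr);
    return format.dataset_move_mapping->size();
}
```

Allocating the containers up front means code that reads them never has to check for a null pointer before iterating.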

The current cSAXS and SF formats are quite simple. As a reference, you can check the old cSAXS file format implementation: csaxs_cpp_h5_writer

Build file

Example: csaxs/Makefile

If you want to use Makefiles, you can basically copy one from an existing implementation (csaxs/) and rename the executable. In case you want something more sophisticated you will have to provide it yourself.

In addition, you can also deploy your writer as an Anaconda package. In this case you will need to include the conda-recipe folder as well (see csaxs/conda-recipe).

Build

You need your compiler to support C++11.

The easiest way to build the library is via Anaconda. If you are not familiar with Anaconda (and do not want to learn), you can also install all the dependencies directly in your OS.

The base library is located in lib/. Change your current directory to lib/ and run:

  • make (build the library for production)
  • make clean (clean the previous build)
  • make deploy (deploy the library to your local conda environment)
  • make debug (build library with debug prints in the standard output)
  • make perf (build the library with performance measurements in the standard output)
  • make test (build the tests)

The usual procedure would be:

  • make test (build the tests)
  • ./bin/execute_tests (execute the tests)
  • make deploy (deploy the library)

You can then start building your executable. It is also a good idea to automate the base library build from your executable build system (see csaxs/Makefile, lib target for example).

Conda build

If you use conda, you can create an environment with the needed library by running:

conda create -c paulscherrerinstitute --name <env_name> make cppzmq==4.3.0 hdf5==1.10.4 boost==1.61.0 gtest==1.8.1

After that you can activate your newly created environment:

conda activate <env_name> 

and start linking your builds against the libraries. To do that you can use the environment variables Anaconda sets:

-L${CONDA_PREFIX}/lib (for linking libraries you have installed with Anaconda)

To run your executables inside the Anaconda environment, you will also need to export the lib/ path in your environment variables:

export LD_LIBRARY_PATH=${CONDA_PREFIX}/lib

Local build

If you decide not to use Anaconda, you will have to install the following libraries in your system:

  • make
  • cppzmq==4.3.0
  • hdf5==1.10.4
  • boost==1.61.0

Basic concepts

In this chapter we describe the basic concepts you need to get a hold of in order to use the library. In case more advanced knowledge is needed, please feel free to browse the code. The most important components are discussed in the subchapters below.

General overview

The process and thread management is taken care of by the ProcessManager. The process manager initializes, starts and stops the 3 threads discussed below.

The writer has 3 threads:

  • ZMQ receiving thread (listens for incoming ZMQ stream messages).
  • H5 writer thread (writes the received data to disk).
    • H5Writer is the base writer implementation that can be extended at will.
  • REST thread (listens to incoming REST requests).

The communication bridges between threads are:

To have a central place for setting fine-tuning parameters, the config.cpp file is used.

The ZMQ thread receives data from the stream, extracts it, and packs it (with additional metadata) into the ring buffer. Meanwhile, the H5 thread listens for data in the ring buffer. When new data arrives, it writes the data into temporary datasets (for performance reasons, the file format is written at the end).
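The hand-off between the receiving thread and the writing thread can be pictured as a bounded first-in, first-out queue. The sketch below is a minimal illustration of that idea under stated assumptions, not the library's RingBuffer class: `RingBufferSketch`, `try_push` and `try_pop` are hypothetical names, and the real buffer carries image data plus metadata rather than a generic `T`.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Illustrative fixed-capacity FIFO; not the library's RingBuffer class.
template <typename T>
class RingBufferSketch {
public:
    explicit RingBufferSketch(std::size_t capacity)
        : slots_(capacity), head_(0), size_(0)
    {
        assert(capacity > 0);
    }

    // Producer side (receiving thread): returns false when the buffer is full.
    bool try_push(T item) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (size_ == slots_.size()) return false;
        slots_[(head_ + size_) % slots_.size()] = std::move(item);
        ++size_;
        return true;
    }

    // Consumer side (writing thread): returns false when there is nothing to read.
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (size_ == 0) return false;
        out = std::move(slots_[head_]);
        head_ = (head_ + 1) % slots_.size();
        --size_;
        return true;
    }

private:
    std::vector<T> slots_;   // Pre-allocated storage, reused in a circle.
    std::size_t head_;       // Index of the oldest stored item.
    std::size_t size_;       // Number of items currently stored.
    std::mutex mutex_;       // Guards all members above.
};
```

Because the storage is allocated once and reused, a slow writer shows up as a full buffer (`try_push` returning false) instead of unbounded memory growth.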

When the end of writing is triggered (via the REST API, when the desired number of frames has been received, or when the user terminates the process), the writer attempts to write the file format. If this succeeds, the temporary datasets are moved to their final place in the file format. If it fails for any reason, the data remains in the temporary datasets and the user needs to fix the file manually (the goal is to preserve as much data as possible).
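The finalization logic described above can be sketched with an in-memory stand-in for the file. This is an illustration only: `FileSketch`, `finalize_sketch` and the dataset paths used with it are hypothetical, and no HDF5 calls are made. The point is the ordering: datasets are moved only after the format step has succeeded, and a failure leaves the temporary datasets untouched.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// In-memory stand-in for a file: dataset path -> stored values.
typedef std::map<std::string, std::vector<int>> FileSketch;

// Moves each temporary dataset to its final path, but only when the format
// step succeeded. On failure the file is left untouched so no data is lost.
// Returns the number of datasets that were moved.
inline int finalize_sketch(FileSketch& file,
                           bool format_written,
                           const std::unordered_map<std::string, std::string>& move_mapping)
{
    if (!format_written) return 0;

    int moved = 0;
    for (const auto& rule : move_mapping) {
        assert(!rule.second.empty());             // A final path is required.
        if (rule.first == rule.second) continue;  // Already in place.
        auto source = file.find(rule.first);
        if (source == file.end()) continue;       // Nothing was written here.
        file[rule.second] = std::move(source->second);
        file.erase(source);
        ++moved;
    }
    return moved;
}
```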

ProcessManager

Not yet here :(

ZmqReceiver

The stream receiver that gets your data from the stream. This is PSI specific, and currently supports only the Array-1.0 protocol. You pass the ZmqReceiver you would like to use in your writer runner, so it should be easy to implement your own if needed.

The protocol specification can be found here: htypes specification

Stream header values

In addition to the image in the stream, the receiver can also pass data defined in the stream header to the writer, for example:

  • pulse_id (The pulse id for the current image)
  • source (The source of the current image)
  • etc.

These fields are specific to your input stream, and you specify them in your writer runner. You can define both scalars and arrays (see csaxs/sf_h5_writer.cpp, variable header_values for an example).

The allowed data types for these values are:

  • "uint8"
  • "uint16"
  • "uint32"
  • "uint64"
  • "int8"
  • "int16"
  • "int32"
  • "int64"
  • "float32"
  • "float64"
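Each type name above implies a fixed element width, which determines how many bytes a scalar or array header value occupies. The sketch below is a hypothetical helper, not part of the library: `header_type_size` and `header_value_bytes` are assumed names used only to make the mapping concrete.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Bytes per element for each allowed type name; returns 0 for unknown names.
inline std::size_t header_type_size(const std::string& type_name) {
    static const std::unordered_map<std::string, std::size_t> sizes = {
        {"uint8", 1}, {"uint16", 2}, {"uint32", 4}, {"uint64", 8},
        {"int8", 1},  {"int16", 2},  {"int32", 4},  {"int64", 8},
        {"float32", 4}, {"float64", 8}
    };
    auto found = sizes.find(type_name);
    return found == sizes.end() ? 0 : found->second;
}

// Total bytes for a header value with n_elements entries (1 for a scalar).
inline std::size_t header_value_bytes(const std::string& type_name, std::size_t n_elements) {
    assert(n_elements > 0);
    return header_type_size(type_name) * n_elements;
}
```

For example, an "int64" array of 4 elements would take 32 bytes under this mapping, while an unrecognized type name yields 0 and can be rejected by the caller.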

These stream header parameters need to be specified when constructing your ZmqReceiver instance:

auto header_values = shared_ptr<unordered_map<string, HeaderDataType>>(new unordered_map<string, HeaderDataType> {
    {"frame", HeaderDataType("uint64")}, // Scalar for frame number
    {"module_number", HeaderDataType("int64", n_modules)} // Array of n_modules elements for module_number.
});

// Pass the header_values to the ZmqReceiver constructor.
ZmqReceiver receiver(connect_address, n_io_threads, receive_timeout, header_values);

Read the H5Writer chapter to see where this data is written in the H5 file. Knowing where the data is written is important to properly set up the dataset_move_mapping in the file format. See the H5Format chapter for more info.

H5Writer

Not yet here :(

H5Format

The H5Format is the base class you need to extend to implement your file format. It specifies that the following variables need to be set:

  • input_value_type (REST API value name to type mapping)
  • default_values (Fields in the file format that have default values)
  • dataset_move_mapping (Move datasets to another place in the file if needed)
  • file_format (The hierarchical structure of your H5 format)

We will discuss each one in detail in this chapter.

input_value_type

Not yet here :(

default_values

Not yet here :(

dataset_move_mapping

Not yet here :(

file_format

Not yet here :(

WriterManager

Not yet here :(

RingBuffer

Not yet here :(

REST interface

Not yet here :(

Examples

Not yet here :(