
Toolbox MVP

Christina Holt edited this page Jan 2, 2024 · 11 revisions

Note: This document is a living document meant to reflect the plans for the Unified Workflow project as of Sept 2022. It is not intended to be documentation for the existing tools. Where appropriate, we will link to the up-to-date documentation.

Overview

The Unified Workflow Team is creating a toolbox of standalone tools that address common technical challenges in implementing any numerical weather prediction (NWP) workflow. These tools will be Python-based and will follow modern best practices for design and interoperability. This approach lets the team deliver tools on a semi-regular cadence, adding value for UFS Apps as early as possible in the development cycle.

Potential Users

  • UFS Apps Teams: global_workflow, SRW, HAFS, UFS Weather Model
  • UW Team

The UW Tools

The tools that the UW Team plans to prioritize aim to simplify and unify the approach to the general problems (listed below) faced by users of NWP systems. Here are a few high-level requirements of such a toolbox.

Requirements

The general requirements for all tools are as follows. Specific requirements for each tool are specified in the next section.

  • Each tool has its own user interface.
  • Each tool follows Python standards and design best practices for object-oriented coding. This includes but is not limited to
  • Each tool can be called as a utility from existing bash and Python scripts in workflows via a CLI and API, respectively.
  • Only approved third-party libraries will be adopted, and they must be installable via pip (for NCO use).
  • Documentation should be included for each tool.
  • Each tool is 100% tested by unit and functional tests.
  • Tools must support a variety of configuration methods, including global environment variables from the user's shell.
  • Each tool fails, when necessary, with informative feedback to the user, and exits with status 0 when successful.
  • Each tool performs appropriate logging at various levels – verbose, normal, quiet, etc.
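As a rough illustration of how the CLI/API, exit-status, and logging requirements can fit together, the sketch below separates a Python API function from a thin CLI wrapper. Everything in it (`run_tool`, `main`, the `name` key, and the crude hand-parsing of `-q`, `-v`, and `key=value` arguments) is hypothetical and does not reflect the actual uwtools interface.

```python
import logging
import sys

# Hypothetical tool name; real uwtools loggers are configured differently.
log = logging.getLogger("hypothetical_tool")


def setup_logging(quiet: bool = False, verbose: bool = False) -> None:
    """Map quiet/normal/verbose onto standard logging levels."""
    if quiet:
        level = logging.CRITICAL + 1  # above every level: nothing is printed
    elif verbose:
        level = logging.DEBUG  # print all messages
    else:
        level = logging.INFO  # normal output
    logging.basicConfig(format="%(levelname)s %(message)s", force=True)
    log.setLevel(level)


def run_tool(config: dict) -> str:
    """API entry point: Python scripts import and call this directly."""
    if "name" not in config:
        raise ValueError("config is missing required key 'name'")
    log.debug("Rendering with config %s", config)
    return f"Hello, {config['name']}"


def main(argv: list) -> int:
    """CLI entry point: bash scripts call the tool and check its exit status."""
    setup_logging(quiet="-q" in argv, verbose="-v" in argv)
    # Crude parsing for illustration only: key=value pairs become the config.
    config = dict(arg.split("=", 1) for arg in argv if "=" in arg)
    try:
        print(run_tool(config))
    except ValueError as err:
        log.error("Failed: %s", err)  # informative feedback for the user
        return 1  # nonzero exit status signals failure
    return 0  # 0 signals success to the calling shell


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

From bash, a workflow would check `$?` after invoking the tool; from Python, it would call `run_tool` directly and handle the exception itself.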

The Tools

Configuration file management

These CLI tools are available in the v1.0.0 release of the uwtools package as submodes of uw config.

  • These tools will give users the ability to treat configuration files as templates (using Jinja2-style templating). For dictable configuration files (those that store key/value pairs), users can also take full control of the contents by providing any key/value pair to update, add, or remove settings as needed.
  • The existing UFS Weather Model configuration files are summarized here: UFS Weather Model Configuration File Types.
  • Benefits
    • Apps can source configuration files directly from the UFS Weather Model regression tests, reducing the overhead of maintaining copies in App code.
    • All configuration settings can be stored in a single language (YAML is the standard), reducing the need for users to learn which syntax to use for which type of configuration data.
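For dictable files, the update/add/remove behavior described above can be sketched as a recursive merge over nested dictionaries, such as a namelist parsed into sections. The `update_config` function and its convention that a `None` value removes a key are assumptions made for this illustration, not the uwtools API.

```python
import copy


def update_config(base: dict, changes: dict) -> dict:
    """Return a copy of `base` with `changes` applied recursively.

    Hypothetical convention: a value of None removes the key, nested dicts
    are merged section by section, and any other value updates or adds the key.
    """
    result = copy.deepcopy(base)  # never modify the caller's data
    for key, value in changes.items():
        if value is None:
            result.pop(key, None)  # remove the setting if present
        elif isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = update_config(result[key], value)  # merge a section
        else:
            result[key] = copy.deepcopy(value)  # update or add the setting
    return result


# Namelist-style sections (names and values are illustrative only).
base = {"fv_core_nml": {"k_split": 2, "n_split": 6}, "atmos_model_nml": {"blocksize": 32}}
changes = {"fv_core_nml": {"n_split": 8, "hydrostatic": False}, "atmos_model_nml": None}
updated = update_config(base, changes)
# updated == {"fv_core_nml": {"k_split": 2, "n_split": 8, "hydrostatic": False}}
```

One consequence of this convention is that a setting cannot be explicitly set to null; a real tool would need a distinct removal marker.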

Schema tool for configuration validation

This CLI tool is available in the v1.0.0 release of the uwtools package as uw config validate.

  • This tool will provide the basic infrastructure for using the third-party JSON Schema package to validate that a configuration meets the needs of a given Schema-enabled tool. To be Schema-enabled, a component must describe the requirements for each of its configuration options in a JSON file that defines acceptable entries, data types, bounds checking, etc.
  • Benefits
    • Users can perform inexpensive sanity checks to see whether their configuration file should work with the tool being configured.
    • When validation is applied to model configuration files, it requires neither model compilation nor HPC resources.
    • Unifies the approach for validating the configuration of any tool in the UW Toolbox.
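To show what such a schema encodes, the sketch below pairs a small JSON Schema document (written as a Python dict) with a hand-rolled checker for just four keywords: `type`, `minimum`, `required`, and `properties`. This checker is a stand-in for illustration only; in practice the JSON Schema library does this job, e.g. `jsonschema.validate(instance=config, schema=SCHEMA)`, which raises `ValidationError` on failure. The `nodes` and `walltime` settings belong to an imaginary component.

```python
from typing import Any

# A JSON Schema document for an imaginary Schema-enabled component.
SCHEMA = {
    "type": "object",
    "required": ["nodes", "walltime"],
    "properties": {
        "nodes": {"type": "integer", "minimum": 1},
        "walltime": {"type": "string"},
    },
}


def _is_type(value: Any, name: str) -> bool:
    """Check a JSON Schema type name; bool is excluded from 'integer'."""
    if name == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    return isinstance(value, {"object": dict, "string": str}[name])


def validate(instance: Any, schema: dict, path: str = "config") -> list:
    """Check a tiny subset of JSON Schema: type, minimum, required, properties.

    Returns a list of error messages; an empty list means the check passed.
    """
    expected = schema.get("type")
    if expected and not _is_type(instance, expected):
        return [f"{path}: expected {expected}, got {type(instance).__name__}"]
    errors = []
    if "minimum" in schema and isinstance(instance, (int, float)):
        if instance < schema["minimum"]:  # bounds checking
            errors.append(f"{path}: {instance} is less than minimum {schema['minimum']}")
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required key '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:  # recurse into each described setting
                errors.extend(validate(instance[key], subschema, f"{path}.{key}"))
    return errors
```

Because the check runs on the configuration alone, it needs no compiled model and no HPC allocation.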

Batch script creation

Over the course of the project's design, batch script creation has become a duty handled by drivers; it is not available as a generic standalone tool through the CLI.

  • This tool enables a user to generate a batch script as configured by the Application.
  • Benefits
    • Users of Rocoto-based systems can readily provide "sandbox" cases to colleagues for easier debugging, along with a job submission script identical to settings used by Rocoto.
    • Provides a unified approach to build these files for Apps that do not use Rocoto.
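Because this duty now lives in the drivers, the following is only a sketch of the general idea: map App-level resource settings onto scheduler directives, then append the commands to run. It assumes a Slurm system; the config keys (`account`, `nodes`, and so on) and the `batch_script` function are hypothetical, although the `#SBATCH` options themselves are standard Slurm flags.

```python
# Hypothetical mapping from App-level resource keys to standard Slurm options.
_SLURM_OPTIONS = {
    "account": "--account",
    "partition": "--partition",
    "nodes": "--nodes",
    "tasks_per_node": "--ntasks-per-node",
    "walltime": "--time",
    "job_name": "--job-name",
}


def batch_script(resources: dict, commands: list) -> str:
    """Render a Slurm batch script from resource settings and commands."""
    lines = ["#!/bin/bash"]
    for key, value in resources.items():
        option = _SLURM_OPTIONS.get(key)
        if option is None:
            raise ValueError(f"No Slurm option is known for resource '{key}'")
        lines.append(f"#SBATCH {option}={value}")
    lines.append("")  # blank line between directives and commands
    lines.extend(commands)
    return "\n".join(lines) + "\n"
```

Rendering the script from the same settings a Rocoto task uses is what would let a colleague resubmit a "sandbox" case with identical resources.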

Data ingest

Status: Still in the works.

  • Along with a file mover that knows how to move files between various common data archives (disk, HPSS, NODD, cloud storage, FTP, etc.), this tool will interface to a database (not necessarily relational) of the known file locations for data needed by UFS Applications.
  • Benefits
    • Unifies the language around how external data is retrieved
    • Unifying the knowledge base of common datasets reduces maintenance.
    • Testing data retrieval can be decoupled from workflow tests, reducing HPC resources needed to test end-to-end applications.
    • Decouples dataset definition from tooling for easier user interactions.
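A minimal sketch of the "knowledge base of known file locations" idea follows, assuming each dataset maps to an ordered list of (archive, path template) pairs that are resolved for a given cycle and forecast hour. The dataset name, archive labels, and paths are placeholders invented for illustration, not real archive locations.

```python
from datetime import datetime

# Hypothetical knowledge base: dataset name -> (archive, path template) pairs,
# listed in order of preference. All paths are placeholders.
KNOWN_DATA = {
    "gfs_grib2": [
        ("disk", "/hypothetical/disk/gfs.{cycle:%Y%m%d}/{cycle:%H}/gfs.t{cycle:%H}z.f{fhr:03d}"),
        ("hpss", "/hypothetical/hpss/gfs_{cycle:%Y%m%d%H}.tar"),
    ],
}


def locate(dataset: str, cycle: datetime, fhr: int) -> list:
    """Resolve every known location of one file for a cycle and forecast hour."""
    if dataset not in KNOWN_DATA:
        raise KeyError(f"unknown dataset: {dataset}")
    return [
        (archive, template.format(cycle=cycle, fhr=fhr))  # datetime honors strftime specs
        for archive, template in KNOWN_DATA[dataset]
    ]
```

A file mover could then try each location in turn. Changing a dataset's entries changes where data comes from without touching the mover, which is the decoupling of dataset definition from tooling described above.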

Workflow definition creation

Status: A Rocoto YAML interface will be available in uwtools v2.0.0.

  • Several workflow engines are used among the UFS Apps (ecFlow, Rocoto, and cylc), and each expresses user account settings, platform settings, resource requirements, and task dependencies in its own language. This tool will translate a common definition of this information (YAML is the standard) into the files needed to run a workflow with the desired engine.
  • Benefits
    • Reduced overhead in learning nuances of different workflow engines.
    • Ease of switching between workflow engines for the same App (often required in R2O transition)
  • Run scripts
    • This set of tools unifies the way in which each app runs a given UFS component (weather model, DA, pre-processing, etc.).
    • They should be designed around the concepts of strict, contractual interfaces and flexible configuration.
    • Benefits
      • Can be replaced one by one without impact on existing workflows (requires an appropriate user interface and following the NCO-required J-Jobs/ex-script relationships)
      • Unification of run scripts (in addition to unification of configuration) provides the same look and feel throughout all UFS Apps.
      • Provides a foundation for interchangeable parts when coupled with the standardization of component configuration.
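To make the translation idea concrete, the sketch below turns engine-neutral task definitions (a dict standing in for loaded YAML) into simplified Rocoto `<task>` elements using only the standard library. The task keys (`command`, `cores`, `walltime`, `depends_on`) and the `to_rocoto_tasks` function are hypothetical and are not the uwtools v2.0.0 Rocoto YAML schema; the output also omits cycle definitions and other elements a complete Rocoto workflow document requires.

```python
import xml.etree.ElementTree as ET

# Hypothetical engine-neutral task definitions (what a YAML file might load into).
TASKS = {
    "prep": {"command": "run_prep.sh", "cores": 1, "walltime": "00:10:00"},
    "forecast": {
        "command": "run_fcst.sh",
        "cores": 48,
        "walltime": "01:00:00",
        "depends_on": ["prep"],
    },
}


def to_rocoto_tasks(tasks: dict) -> str:
    """Translate task definitions into simplified Rocoto <task> elements."""
    workflow = ET.Element("workflow")
    for name, spec in tasks.items():
        task = ET.SubElement(workflow, "task", name=name)
        for tag in ("command", "cores", "walltime"):
            ET.SubElement(task, tag).text = str(spec[tag])
        deps = spec.get("depends_on", [])
        if deps:
            dependency = ET.SubElement(task, "dependency")
            # Rocoto combines multiple conditions with an <and> element.
            parent = ET.SubElement(dependency, "and") if len(deps) > 1 else dependency
            for dep in deps:
                ET.SubElement(parent, "taskdep", task=dep)
    return ET.tostring(workflow, encoding="unicode")
```

Supporting ecFlow or cylc from the same `TASKS` dict would need only another translator function, which is where the benefit of a single definition comes from.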

Basic Tool Design

Many of the standalone tools will follow the same basic design pattern, shown in the diagram below. In most cases, a user provides a template or base file: a template or namelist for configuration management, a YAML file of known data locations for data retrieval, or a YAML file defining a default cold-start workflow. For batch script creation, this file would not be necessary.

The user will need to provide a Configuration Object (a Python dictionary) for all tools, and this will be standardized through the use of an interface that builds the dictionary from a variety of sources – a dictable file(s), user environment variables, and command-line arguments.
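The layering of sources into one Configuration Object can be sketched as below. The precedence order (file, then environment, then command line), the `UW_` environment prefix, and the `build_config` function are assumptions for illustration; the `config.key=value` form follows the command-line convention listed under User Interface.

```python
import copy
import os
from typing import Mapping, Optional


def _set_dotted(config: dict, dotted_key: str, value: str) -> None:
    """Set config["a"]["b"] from "a.b", creating sections as needed."""
    *sections, leaf = dotted_key.split(".")
    for section in sections:
        config = config.setdefault(section, {})
    config[leaf] = value


def build_config(file_data: dict, cli_args: list, prefix: str = "UW_",
                 environ: Optional[Mapping[str, str]] = None) -> dict:
    """Layer file data, environment variables, and CLI overrides.

    Assumed precedence (lowest to highest): file, environment, command line.
    Values from the environment and command line stay strings.
    """
    environ = os.environ if environ is None else environ
    config = copy.deepcopy(file_data)  # 1. dictable file(s)
    for name, value in environ.items():  # 2. prefixed environment variables
        if name.startswith(prefix):
            config[name[len(prefix):].lower()] = value
    for arg in cli_args:  # 3. config.key=value arguments
        key, sep, value = arg.partition("=")
        if sep and key.startswith("config."):
            _set_dotted(config, key[len("config."):], value)
    return config
```

Passing `environ` explicitly keeps the function testable without touching the real shell environment.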

Once the user provides the desired Configuration Object and optional "base" file, the tool parses the file, gathers any configurables from it or from its own knowledge base of required configurables, and verifies that the Configuration Object provides them. It then renders the output in memory and offers an option to write the result to a file.

Template Tool - Generic Tool Design
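The parse, gather, verify, render, and optionally write steps of this design can be sketched with a toy renderer. It only recognizes simple `{{ name }}` placeholders via a regular expression and is a stand-in for Jinja2, not how uwtools renders templates; the `render` function is hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

# Matches simple Jinja2-style placeholders such as {{ fcst_len }}.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, config: dict, output: Optional[Path] = None) -> str:
    """Gather configurables, verify the config supplies them, then render."""
    needed = set(_PLACEHOLDER.findall(template))  # configurables in the base file
    missing = sorted(needed - set(config))  # verify the Configuration Object
    if missing:
        raise ValueError(f"Configuration Object is missing: {', '.join(missing)}")
    rendered = _PLACEHOLDER.sub(lambda m: str(config[m.group(1)]), template)
    if output is not None:
        output.write_text(rendered)  # optional: write the in-memory result
    return rendered
```

Verifying before rendering means a missing value is reported by name instead of producing a half-filled file.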

User Interface

Note: This section was used as a design guide during development. Each command line tool has its own set of required and optional arguments. Please reference either the User Guide or the -h option on the command line to see the most current, specific details.

The tools that follow this design will attempt to standardize the command line arguments. They will almost always need these common arguments:

  • -i, --input to describe the input template or base file
  • -o, --output to describe the path to the output file
  • -c, --config to indicate path(s) to an input config file
  • -s, --schema to describe any additional validation or default file
  • --dry-run to write the result to standard output instead of to a file
  • config.[key]=value positional arguments to specify settings on the command line that should be wrapped into the Configuration Object
  • -h, --help to print usage information
  • --values-needed to write to standard output which keys have completed values, which keys have unfilled Jinja2 templates, and which keys are empty
  • --input-file-type to convert the given input file to the provided type. Accepts YAML, INI, or NML.
  • --config-file-type to convert the given config file to the provided type. Accepts YAML, INI, or NML.
  • --output-file-type to convert the given output file to the provided type. Accepts YAML, INI, or NML.
  • -q, --quiet to prevent logging messages from printing out.
  • -v, --verbose to print all logging messages.
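The argument set above maps naturally onto Python's `argparse`. The following is a sketch under that assumption, not the parser uwtools actually ships (use `-h` to see that); the mutually exclusive `-q`/`-v` group and repeating `-c` once per config file are design choices made here for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of the common arguments; argparse adds -h/--help automatically."""
    parser = argparse.ArgumentParser(description="Hypothetical template-style tool")
    parser.add_argument("-i", "--input", help="input template or base file")
    parser.add_argument("-o", "--output", help="path to the output file")
    # action="append" allows several config files: -c a.yaml -c b.yaml
    parser.add_argument("-c", "--config", action="append", help="path to a config file")
    parser.add_argument("-s", "--schema", help="validation or default file")
    parser.add_argument("--dry-run", action="store_true", help="print instead of writing")
    parser.add_argument("--values-needed", action="store_true",
                        help="report filled, unfilled, and empty keys")
    for kind in ("input", "config", "output"):
        parser.add_argument(f"--{kind}-file-type", choices=["YAML", "INI", "NML"])
    # Assumption: quiet and verbose are mutually exclusive.
    verbosity = parser.add_mutually_exclusive_group()
    verbosity.add_argument("-q", "--quiet", action="store_true")
    verbosity.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("overrides", nargs="*", metavar="config.KEY=VALUE",
                        help="settings to wrap into the Configuration Object")
    return parser
```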

Discussion and Feedback

Discussion and feedback pages for the wiki can be found here.