rulemanager

Management system for rule-driven statistical production systems.

Status: EXPERIMENTAL

In a rule-driven production system, statisticians do not touch the data. In stead, data processing is fully automated, where all data transformations are driven by externally defined rules. Generally we distinguish between two types of rules:

Data validation rules. These are rules that express demands on a data set. A rule-driven system can ingest these rules and apply the transformations necessary to make a data set satisfy those rules.
Data transformation rules. these are rules that describe how to alter existing values, fill in missing data, derive new variables, or derive new data structures (e.g. aggregation, modeling).

The system in this repository offers a minimal subset of rule management features that focus on CRUD and reproducibility of production. The idea is to offer a data structure and an extensible API that offer fundamental operations on rules and rule sets that can be combined to build applications.

See also: MPJ. van der Loo, KO ten Bosch and E. de Jonge, Rule Management UNECE Expert Meeting on Statistical Data Editing 2022.

Installation

You need to have git and R installed. If you are on Windows, you also need to have Rtools installed.

Open a command-line interface (e.g. Powershell on Windows, bash on Linux/Mac).

git clone https://SNStatComp/rulemanager
cd rule manager
make doc
make install

User stories

The R package rulemanager aims to support the following user stories.

As a statistics producer, I want to

Create, Update, and Delete rules so I can fix my current understanding of a statistical domain in the form of a formal ruleset.
Select a set of rules so I can apply them to my data.
Determine the order of rule execution so I have full control over data processing and validation.
Be able to trace the evolution of my rules and rule sets so I can (1) give full account of my production runs, and (2) reproduce production runs.
Temporarily remove a rule from one or more rule sets so I can handle exceptional and transient data circumstances. This temporary removal should be documented.
Temporarily update a rule from one or more rulesets so I can handle exceptional and transient data circumstances. This temoporary update should be documented.

As a statistical organization, I want to

Promote reuse of rules.
Promote transparency and learning accross production systems, by comparing
Compare and contrast production systems; benchmark production systems.

General considerations

A rule repository can hold rules for multiple production systems. In such a repository it is not important that rules are mutually consistent (non-contradictory) and irredundant. When a set of rules is selected to be applied to data, this consistency becomes important.

Design considerations and conditions for this package

Build an API that supports basic operations/user stories. These can then be used to build applications that support specific workflows.
(Todo) Independence of database implementation. By default SQLite is used but all fundamental operations are polymorphic and can be extended to other databases.

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
UNECE2022		UNECE2022
pkg		pkg
.gitignore		.gitignore
Makefile		Makefile
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

rulemanager

Installation

User stories

General considerations

Design considerations and conditions for this package

About

Releases

Packages

Languages

SNStatComp/rulemanager

Folders and files

Latest commit

History

Repository files navigation

rulemanager

Installation

User stories

General considerations

Design considerations and conditions for this package

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages