OPF Guide cleanup and link fixes #3700

Merged
merged 3 commits into from
Jun 12, 2017
103 changes: 52 additions & 51 deletions docs/source/guides/opf.md
@@ -19,7 +19,7 @@ Here are some examples of applications using the OPF interface:

[__Metrics__](../api/opf/metrics.html) take input values and predictions and output scalar representations of the quality of the predictions. Different metrics are suitable for different problems.

__Clients__ take input data and feed it through encoders, models, and metrics and store or report the resulting predictions or metric results.
[__Clients__](../api/opf/clients.html) take input data and feed it through encoders, models, and metrics and store or report the resulting predictions or metric results.

## What does the OPF do?

@@ -38,68 +38,65 @@ Each of these 3 components is in a separate set of modules. Metrics and writing

## What doesn’t the OPF do?

- The OPF does not create models. It is up to the client code to figure out how many models to run, and to instantiate the correct types of models
- The OPF does not create models. It is up to the client code to figure out how many models to run, and to instantiate the correct types of models.
- The OPF does not run models automatically. All the models in the OPF operate under a “push” model. The client is responsible for getting records from some data source, feeding records into the model, and handling the output of models.
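
The "push" arrangement described above can be sketched in a few lines. This is only an illustration: `EchoModel` and `runClient` are hypothetical stand-ins invented for this example, not NuPIC classes, and the result is a plain dictionary rather than a real `ModelResult`.

```python
class EchoModel(object):
    """Hypothetical stand-in for an OPF model: predicts the last value seen."""

    def __init__(self):
        self._last = None

    def run(self, inputRecord):
        # Emit the prediction made before this record arrived, then update.
        prediction = self._last
        self._last = inputRecord["value"]
        return {"rawInput": inputRecord, "prediction": prediction}


def runClient(model, records):
    """The client owns the loop: it fetches records and pushes them in."""
    results = []
    for record in records:
        results.append(model.run(record))
    return results


results = runClient(EchoModel(), [{"value": 1}, {"value": 2}, {"value": 3}])
predictions = [r["prediction"] for r in results]
```

The model never asks for data; everything it sees arrives through `run()`, and the client decides what to do with each result.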

## Models

### The Model Interface

The OPF defines the abstract "Model" interface for the implementation of any online learning model. Implementers typically subclass the [base class](../api/opf/models.html#model) provided. All models must implement the following methods:
The OPF defines the abstract "Model" interface for the implementation of any online learning model. Implementers typically subclass the [base class](../api/opf/models.html#nupic.frameworks.opf.model.Model) provided. All models must implement the following methods:

- **\_\_init\_\_(modelDescription, inferenceType)**
- **[`__init__(inferenceType)`](../api/opf/models.html#nupic.frameworks.opf.model.Model)**

Constructor for the model. Must take a modelDescription dictionary, which contains all the parameters necessary to instantiate the model, and an InferenceType value (see below). *A model’s \_\_init\_\_() method should always call the \_\_init\_\_() method of the superclass.*
Constructor for the model. Must take an [`InferenceType`](../api/opf/utils.html#nupic.frameworks.opf.opf_utils.InferenceType) value (see below). *A model’s ``__init__()`` method should always call the `__init__()` method of the superclass.*

- **run(inputRecord)**
- **[`run(inputRecord)`](../api/opf/models.html#nupic.frameworks.opf.model.Model.run)**

The main function for the model that does all the computation required for a new input record. Because the OPF only deals with online streaming models, each record is fed to the model one at a time
Returns: A populated ModelResult object (see below)
The main function for the model that does all the computation required for a new input record. Because the OPF only deals with online streaming models, each record is fed to the model one at a time. Returns: A populated ModelResult object (see below)

- **getFieldInfo()**
- **[`getFieldInfo()`](../api/opf/models.html#nupic.frameworks.opf.model.Model.getFieldInfo)**

Returns a list of metadata about each of the translated fields (see below about translation). Each entry in the list is a FieldMetaInfo object, which contains information about the field, such as name and data type
Returns: A list of FieldMetaInfo objects

- **finishLearning()**
- **[`finishLearning()`](../api/opf/models.html#nupic.frameworks.opf.model.Model.finishLearning)**

This is a signal from the client code that the model may be placed in a permanent "finished learning" mode where it will not be able to learn from subsequent input records. This allows the model to perform optimizations and clean up any learning-related state. Returns: Nothing

- **resetSequenceStates()**
- **[`resetSequenceStates()`](../api/opf/models.html#nupic.frameworks.opf.model.Model.resetSequenceStates)**

Signals the model that a logical sequence has finished. The model should not treat the subsequent input record as subsequent to the previous record.
Returns: Nothing

- **mapInputRecord()** - not used

- **getRuntimeStats()** – [can be a no-op]
- **[`getRuntimeStats()`](../api/opf/models.html#nupic.frameworks.opf.model.Model.getRuntimeStats)** – [can be a no-op]

Get runtime statistics specific to this model. Examples include “number of records seen” or “average cell overlap”

Returns: A dictionary where the keys are the statistic names, and the values are the
statistic values

- **\_getLogger()** – [used by parent class]
- **`_getLogger()`** – [used by parent class]

Returns: The logging object for this class. This is used so that the operations in the superclass use the same logger object.

It also provides the following functionality, common to all models:

- **enableLearning()/disableLearning()**
- **[`enableLearning()`](../api/opf/models.html#nupic.frameworks.opf.model.Model.enableLearning) / [`disableLearning()`](../api/opf/models.html#nupic.frameworks.opf.model.Model.disableLearning)**

Sets the learning flag for the model. This can be queried internally and externally using the isLearningEnabled() method.

- **enableInference(inferenceArgs=None)/disableInference()**
- **[`enableInference(inferenceArgs=None)`](../api/opf/models.html#nupic.frameworks.opf.model.Model.enableInference) / [`disableInference()`](../api/opf/models.html#nupic.frameworks.opf.model.Model.disableInference)**

Enables/Disables inference output for this model. Enabling inference takes an optional argument inferenceArgs, which is a dictionary with extra parameters that affect how inference is performed. For instance, an anomaly detection model may have a boolean parameter “doPrediction”, which toggles whether or not a prediction is computed in addition to the anomaly score.

The inference state of a model can be queried internally and externally using the isInferenceEnabled() method. The inference arguments can be queried using the getInferenceArgs() method.

- **save(saveModelDir)**
- **[`save(saveModelDir)`](../api/opf/models.html#nupic.frameworks.opf.model.Model.save)**

Saves the model state via pickle and stores the resulting object in the saveModelDir directory.

- **\_serializeExtraData(extaDataDir)/\_deSerializeExtraData(extraDataDir)**
- **`_serializeExtraData(extraDataDir)` / `_deSerializeExtraData(extraDataDir)`**

If there is state that cannot be pickled and needs to be saved separately, this can be done by overriding these methods (implemented as no-ops by default).
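
To show how the pieces of the interface fit together, here is a minimal sketch that does not import NuPIC. `SketchModel` and `CountingModel` are hypothetical classes written for this example. They copy the method names listed above, but the bodies are assumptions, and a plain string stands in for an `InferenceType` value.

```python
class SketchModel(object):
    """Hypothetical base class with the shared learning/inference switches."""

    def __init__(self, inferenceType):
        self._inferenceType = inferenceType
        self._learningEnabled = True
        self._inferenceEnabled = True
        self._inferenceArgs = None

    def enableLearning(self):
        self._learningEnabled = True

    def disableLearning(self):
        self._learningEnabled = False

    def isLearningEnabled(self):
        return self._learningEnabled

    def enableInference(self, inferenceArgs=None):
        self._inferenceEnabled = True
        self._inferenceArgs = inferenceArgs

    def disableInference(self):
        self._inferenceEnabled = False

    def isInferenceEnabled(self):
        return self._inferenceEnabled

    def getInferenceArgs(self):
        return self._inferenceArgs


class CountingModel(SketchModel):
    """Toy subclass: counts records and reports the count as a runtime stat."""

    def __init__(self, inferenceType):
        # Always call the superclass constructor first.
        super(CountingModel, self).__init__(inferenceType)
        self._numRecords = 0

    def run(self, inputRecord):
        self._numRecords += 1
        return {"rawInput": inputRecord, "inferences": None}

    def getRuntimeStats(self):
        return {"numRecords": self._numRecords}


model = CountingModel("TemporalAnomaly")  # placeholder for an InferenceType
model.run({"value": 1})
model.enableInference({"doPrediction": True})
model.disableLearning()
```

The subclass only supplies model-specific behaviour (`run`, `getRuntimeStats`); the switches come from the base class, which is why the superclass constructor has to run.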

@@ -115,9 +112,9 @@ Certain field types need to be converted into primitive input types. For example
#### Encoding
Additionally, for some model types (such as the CLA model), the translated inputs are quantized (put into buckets) and converted into binary vector representation. This process is called **_encoding_** and is handled by [encoders](Encoders) (specific encoders for different data types exist). Most models may not need to encode the input (or, more likely, they will just need to quantize the input).
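
As a rough illustration of quantizing and encoding, the sketch below buckets a scalar and turns the bucket into a binary vector with a run of active bits. `encodeScalar` and its parameters are hypothetical and much simpler than the encoders that ship with NuPIC.

```python
def encodeScalar(value, minVal, maxVal, numBuckets, width):
    """Quantize value into a bucket, then set `width` contiguous bits."""
    # Clip to the supported range so out-of-range inputs still get a bucket.
    value = min(max(value, minVal), maxVal)
    fraction = (value - minVal) / float(maxVal - minVal)
    bucket = min(int(fraction * numBuckets), numBuckets - 1)

    # Neighbouring buckets share bits, so similar values get similar vectors.
    bits = [0] * (numBuckets + width - 1)
    for i in range(bucket, bucket + width):
        bits[i] = 1
    return bucket, bits


bucket, bits = encodeScalar(25.0, 0.0, 100.0, numBuckets=10, width=3)
```

A model that only needs quantization can stop at `bucket`; one that needs a binary representation uses `bits`.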

### Model Output: The ModelResult Object
### Model Output: The [`ModelResult`](../api/opf/results.html#nupic.frameworks.opf.opf_utils.ModelResult) Object

The ModelResult object is the main data container in the OPF. When a record is fed to a model, it instantiates a new ModelResult instance, which contains model input and inferences, and is shuttled around to the various OPF modules. Below is a description of each of the ModelResult attributes. They default to **None** when the ModelResult is instantiated, and must be populated by the Model object.
The [`ModelResult`](../api/opf/results.html#nupic.frameworks.opf.opf_utils.ModelResult) object is the main data container in the OPF. When a record is fed to a model, it instantiates a new [`ModelResult`](../api/opf/results.html#nupic.frameworks.opf.opf_utils.ModelResult) instance, which contains model input and inferences, and is shuttled around to the various OPF modules. Below is a description of each of the ModelResult attributes. They default to **None** when the ModelResult is instantiated, and must be populated by the Model object.

- **rawInput**: This is the exact record that is fed into the model. It is a dictionary-like object where the keys are the input field names, and the values are input values of the fields. All the input values maintain their original types.
- **sensorInput**: The translated input record, as well as auxiliary information about the input (See below)
@@ -137,9 +134,9 @@ binary numpy arrays, one for each field in dataRow.

### Inference Elements

The concept of InferenceElements is a key part of the OPF. A model's inference may have multiple parts to it. For example, a model may output both a prediction and an anomaly score. Models output their set of inferences as a dictionary that is keyed by the enumerated type InferenceElement. Each entry in an inference dictionary is considered a separate inference element, and is handled independently by the OPF.
The concept of [`InferenceElement`](../api/opf/utils.html#nupic.frameworks.opf.opf_utils.InferenceElement)s is a key part of the OPF. A model's inference may have multiple parts to it. For example, a model may output both a prediction and an anomaly score. Models output their set of inferences as a dictionary that is keyed by the enumerated type [`InferenceElement`](../api/opf/utils.html#nupic.frameworks.opf.opf_utils.InferenceElement). Each entry in an inference dictionary is considered a separate inference element, and is handled independently by the OPF.

Data structures related to inference elements are located in [**opf_utils.py**](https://github.com/numenta/nupic/blob/master/nupic/frameworks/opf/opf_utils.py).
Data structures related to inference elements are located in [*Inference Utilities*](../api/opf/utils.html#inference-utilities).

#### Inference Data Types

@@ -164,9 +161,9 @@ In order to compute metrics and write output, the OPF needs to know which input

> Snippet 1: Mapping inferences to input

In this example, we can see that the “_prediction_” inference element is associated with SensorInput.dataRow, and the “_classification_” inference element is associated with SensorInput.category.
In this example, we can see that the “_prediction_” inference element is associated with [`SensorInput`](../api/opf/utils.html#nupic.frameworks.opf.opf_utils.SensorInput)`.dataRow`, and the “_classification_” inference element is associated with [`SensorInput`](../api/opf/utils.html#nupic.frameworks.opf.opf_utils.SensorInput)`.category`.

This association is used to compute metrics and to determine which parts of the input to write to output. For example, to compute error, the value of “_prediction_” will be compared to the value of SensorInput.dataRow, and the value of “_classification_” will be compared to value of SensorInput.category
This association is used to compute metrics and to determine which parts of the input to write to output. For example, to compute error, the value of “_prediction_” will be compared to the value of [`SensorInput`](../api/opf/utils.html#nupic.frameworks.opf.opf_utils.SensorInput)`.dataRow`, and the value of “_classification_” will be compared to the value of [`SensorInput`](../api/opf/utils.html#nupic.frameworks.opf.opf_utils.SensorInput)`.category`.
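
The association can be pictured as a lookup table from inference element to input attribute. The sketch below is an assumption-laden illustration: the `SensorInput` namedtuple is a cut-down stand-in with only the two attributes discussed here, and `INFERENCE_TO_INPUT` and `getGroundTruth` are hypothetical names.

```python
from collections import namedtuple

# Simplified stand-in holding only the two attributes discussed above.
SensorInput = namedtuple("SensorInput", ["dataRow", "category"])

# Hypothetical mapping: inference element name -> SensorInput attribute.
INFERENCE_TO_INPUT = {
    "prediction": "dataRow",
    "classification": "category",
}


def getGroundTruth(inferenceElement, sensorInput):
    """Look up the input value that an inference element is compared to."""
    return getattr(sensorInput, INFERENCE_TO_INPUT[inferenceElement])


sensorInput = SensorInput(dataRow=[10.0, 3.0], category=1)
inferences = {"prediction": [9.0, 5.0], "classification": 1}

# Per-field absolute error for the prediction, exact match for the class.
predictionErrors = [
    abs(predicted - actual)
    for predicted, actual in zip(inferences["prediction"],
                                 getGroundTruth("prediction", sensorInput))
]
classificationCorrect = (
    inferences["classification"] == getGroundTruth("classification", sensorInput)
)
```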

![Inference elements](../_static/opf-figure3.png)

@@ -211,52 +208,56 @@ Below is an example of how this shifting occurs to compute errors:

> Figure 4: Shifting

## Metrics
You can use the [`InferenceShifter`](../api/opf/results.html#inferenceshifter) to shift inferences:

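```python
from nupic.data.inference_shifter import InferenceShifter

# shift() is an instance method, so create one shifter and reuse it for
# every result; it needs to buffer inferences from earlier records.
shifter = InferenceShifter()
shiftedModelResult = shifter.shift(modelResult)
```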
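
To make the idea of shifting concrete without depending on NuPIC, here is a small stand-alone sketch. `OneStepShifter` is a hypothetical class written for this example. It delays each prediction by a fixed number of records so that it lines up with the input it was predicting, which is a simplification of what a real shifter has to handle.

```python
from collections import deque


class OneStepShifter(object):
    """Simplified illustration: delay each prediction by `steps` records."""

    def __init__(self, steps=1):
        # Pre-fill with None: the first records have no earlier prediction.
        self._buffer = deque([None] * steps)

    def shift(self, actual, prediction):
        self._buffer.append(prediction)
        # The oldest buffered prediction was made for the current record.
        return actual, self._buffer.popleft()


shifter = OneStepShifter()
# Each pair is (actual value at time t, prediction for time t + 1).
stream = [(1, 2), (2, 3), (3, 4)]
aligned = [shifter.shift(actual, prediction) for actual, prediction in stream]
```

After shifting, each pair holds an actual value next to the prediction that was made for it, so an error can be computed directly.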

The 2nd responsibility of the OPF is to compute metrics on a model's output. Typically, this is some form of error metric, but in truth it can be any kind of score computed from the information in the input record and the output inferences. Metric calculations are handled by the PredictionMetricManager, which is instantiated with a series of MetricSpec objects (see below). The MetricsManager also handles shifting all the inferences appropriately before they are fed into their respective metrics modules
## [Metrics](../api/opf/metrics.html)

The second responsibility of the OPF is to compute metrics on a model's output. Typically, this is some form of error metric, but it can be any kind of score computed from the information in the input record and the output inferences. Metric calculations are handled by the [Prediction Metric Manager](../api/opf/metrics.html#module-nupic.frameworks.opf.prediction_metrics_manager), which is instantiated with a series of [`MetricSpec`](../api/opf/metrics.html#nupic.frameworks.opf.metrics.MetricSpec) objects (see below). The [`MetricsManager`](../api/opf/metrics.html#nupic.frameworks.opf.prediction_metrics_manager.MetricsManager) also handles shifting all the inferences appropriately before they are fed into their respective metrics modules.

### Metric Specs

A metric calculation is specified by creating a MetricSpec object. This is a container object that contains 4 fields:
A metric calculation is specified by creating a [`MetricSpec`](../api/opf/metrics.html#nupic.frameworks.opf.metrics.MetricSpec) object. This is a container object that contains 4 fields:

- inferenceElement
- metric
- field (optional)
- params (optional)
- `inferenceElement`
- `metric`
- `field` (optional)
- `params` (optional)

Here is an example MetricSpec:

MetricSpec( inferenceElement=InferenceElement.multiStepBest,
metric="aae",
field="foo",
params = {"window" : 200 } )
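```python
from nupic.frameworks.opf.metrics import MetricSpec
from nupic.frameworks.opf.opf_utils import InferenceElement

metricSpec = MetricSpec(inferenceElement=InferenceElement.multiStepBest,
                        metric="aae",
                        field="foo",
                        params={"window": 200})
```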

This means that we are calculating the average absolute error ("aae") on the multiStepBest inference element, for the entry that corresponds to the field "foo", and with an optional parameter "window" set to 200.
This means that we are calculating the average absolute error ("aae") on the `multiStepBest` inference element, for the entry that corresponds to the field `foo`, and with an optional parameter `window` set to 200.

### MetricLabels
### Metric Labels

Metrics need to be able to be uniquely identified, so that the experiment can indicate which metric should be optimized and which should be written to output. To this end, metric specs can return a "metric label", which is a "human readable" (barely) string that contains all the information to uniquely identify the metric. The metric label for the above metric spec would be:

"multiStepBest:aae:window=200:field=foo"
multiStepBest:aae:window=200:field=foo
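
One way such a label could be assembled is sketched below. `buildMetricLabel` is a hypothetical helper written to reproduce the example string above; the ordering rules it uses (sorted parameters, then the field) are an assumption, not a statement of how `MetricSpec` builds its label.

```python
def buildMetricLabel(inferenceElement, metric, field=None, params=None):
    """Join the parts of a metric spec into one colon-separated label."""
    parts = [inferenceElement, metric]
    # Sort parameters so the same spec always yields the same label.
    for key in sorted(params or {}):
        parts.append("%s=%s" % (key, params[key]))
    if field is not None:
        parts.append("field=%s" % field)
    return ":".join(parts)


label = buildMetricLabel("multiStepBest", "aae",
                         field="foo", params={"window": 200})
```

Because the label is deterministic, two specs with the same contents map to the same string, which is what makes it usable as an identifier.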

### Metrics Calculation Modules (metrics.py)
### Metrics Calculation Modules

The modules that actually calculate metrics are located in metrics.py. They all inherit the abstact base class Metric, and they must define the following methods.
The modules that actually calculate metrics are located in [*Available Metrics*](../api/opf/metrics.html#available-metrics). They all inherit the abstract base class [`MetricsIface`](../api/opf/metrics.html#nupic.frameworks.opf.metrics.MetricsIface), and they must define the following methods.

- **addInstance( prediction, groundTruth, record)**: This is the method where a new inference-groundTruth pair is passed to the metric. Additionally, the raw input record is
- **[`addInstance(prediction, groundTruth, record)`](../api/opf/metrics.html#nupic.frameworks.opf.metrics.MetricsIface.addInstance)**: This is the method where a new inference-groundTruth pair is passed to the metric. Additionally, the raw input record is
also passed to the metric calculator. The module is responsible for calculating the metric
and storing the relevant information here.
- **getMetric()**
- **[`getMetric()`](../api/opf/metrics.html#nupic.frameworks.opf.metrics.MetricsIface.getMetric)**
- Returns a dictionary with the metric value and any auxiliary information. The
metric's value is stored under the key 'value' (confusing, right?)
- Ex. { 'value': 10.3, 'numIterations': 1003}
- Ex. `{ 'value': 10.3, 'numIterations': 1003}`
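
The two methods can be illustrated with a small windowed average-absolute-error metric. `WindowedAAE` is a hypothetical, stand-alone class that follows the method names above. It does not inherit from NuPIC's base class, and the windowing behaviour is an assumption made for the example.

```python
from collections import deque


class WindowedAAE(object):
    """Average absolute error over the last `window` instances."""

    def __init__(self, window=None):
        # deque(maxlen=None) is unbounded; otherwise old errors fall off.
        self._errors = deque(maxlen=window)
        self._numIterations = 0

    def addInstance(self, prediction, groundTruth, record=None):
        # Skip instances where either side is missing.
        if prediction is None or groundTruth is None:
            return
        self._errors.append(abs(prediction - groundTruth))
        self._numIterations += 1

    def getMetric(self):
        if self._errors:
            value = sum(self._errors) / float(len(self._errors))
        else:
            value = None
        return {"value": value, "numIterations": self._numIterations}


metric = WindowedAAE(window=2)
for prediction, groundTruth in [(1.0, 2.0), (2.0, 4.0), (5.0, 8.0)]:
    metric.addInstance(prediction, groundTruth)
result = metric.getMetric()
```

With a window of 2, only the last two errors (2.0 and 3.0) contribute to `value`, while `numIterations` still counts all three instances.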

## Output

Types: Different inference value types are handled differently. The OPF distinguishes between 3 types: lists, dicts, and other. Lists are assumed to be associated with the model's getFieldInfo() output. An individual element is always output as a string, no matter it's actual type. Dicts are the most general, and separate columns are created for each key. Each entry in a dictionary is output as a string, no matter its type.

#### Outputting Inferences
- type inference on inference values
- writing to a file (re: column creation)
- writing to a db
- Types
Types: Different inference value types are handled differently. The OPF distinguishes between 3 types: lists, dicts, and other. Lists are assumed to be associated with the model's [`getFieldInfo`](../api/opf/models.html#nupic.frameworks.opf.model.Model.getFieldInfo) output. An individual element is always output as a string, no matter its actual type. Dicts are the most general, and separate columns are created for each key. Each entry in a dictionary is output as a string, no matter its type.
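
The three cases can be sketched as a single flattening function. `flattenInference` and the `name.key` column naming are hypothetical choices made for this illustration; the OPF's actual column names may differ.

```python
def flattenInference(name, value, fieldNames):
    """Return (columnName, stringValue) pairs for one inference value."""
    if isinstance(value, (list, tuple)):
        # Lists line up with the model's field metadata: one column per field.
        return [("%s.%s" % (name, fieldName), str(item))
                for fieldName, item in zip(fieldNames, value)]
    if isinstance(value, dict):
        # Dicts get one column per key; sort keys for a stable column order.
        return [("%s.%s" % (name, key), str(value[key]))
                for key in sorted(value)]
    # Anything else becomes a single column.
    return [(name, str(value))]


listColumns = flattenInference("prediction", [1.5, 2], ["a", "b"])
dictColumns = flattenInference("anomaly", {"x": 0.1}, [])
otherColumns = flattenInference("score", 0.25, [])
```

Every value ends up as a string regardless of its original type, matching the rule described above.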
6 changes: 3 additions & 3 deletions src/nupic/data/inference_shifter.py
@@ -29,7 +29,7 @@

class InferenceShifter(object):
"""
Shifts time for :class:`~.nupic.frameworks.opf.opfutils.ModelResult` objects.
Shifts time for :class:`~.nupic.frameworks.opf.opf_utils.ModelResult` objects.
This is useful for plotting results with the predictions at the same time step
as the input data.
"""
@@ -45,9 +45,9 @@ def shift(self, modelResult):
iteration was learn-only, then we would not have a T(i) prediction in our
FIFO and would not be able to emit a meaningful input/prediction pair.

:param modelResult: A :class:`~.nupic.frameworks.opf.opfutils.ModelResult`
:param modelResult: A :class:`~.nupic.frameworks.opf.opf_utils.ModelResult`
instance to shift.
:return: A :class:`~.nupic.frameworks.opf.opfutils.ModelResult` instance that
:return: A :class:`~.nupic.frameworks.opf.opf_utils.ModelResult` instance that
has been shifted
"""
inferencesToWrite = {}
2 changes: 1 addition & 1 deletion src/nupic/frameworks/opf/model_factory.py
@@ -59,7 +59,7 @@ def create(modelConfig, logLevel=logging.ERROR):

:param modelConfig: (dict)
A dictionary describing the current model,
`described here <../../docs/quick-start/example-model-params.rst>`_.
`described here <../../quick-start/example-model-params.html>`_.

:param logLevel: (int) The level of logging output that should be generated
