From 1f5dcebe6ec60f135a612ab39a35a1b1ad5bfda0 Mon Sep 17 00:00:00 2001
From: Kalman Meth
Date: Wed, 7 Sep 2022 17:13:49 +0300
Subject: [PATCH 1/2] update README with timebased TopK

---
 README.md | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 59 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index ebe46e124..3cfbff05d 100644
--- a/README.md
+++ b/README.md
@@ -506,13 +506,13 @@ parameters:
             - "dstIP"
             - "srcIP"
           operation: "avg"
-          recordKey: "value"
+          operationKey: "value"
 ```
 
 The output fields of the aggregates stage are:
 - `name`
 - `operation`
-- `record_key`
+- `operation_key`
 - `by`
 - `aggregate`
 - `total_value`: the total aggregate value
@@ -652,6 +652,63 @@ Output fields that set `splitAB: true` (like in `Bytes`) are split into 2 fields
 aggregate values separately based on direction A->B and B->A respectively.
 When `splitAB` is absent, its default value is `false`.
 
+### Timebased TopK
+
+It is sometimes desirable to return only a subset of records, such as those connections that use the most bandwidth.
+This information is often relevant only for recently reported records.
+This stage enables the reporting of records for the top (or bottom) K entries that have recently been processed.
+The specification of the Timebased TopK details is placed in the `extract` stage of the pipeline.
+
+For Example, assuming a set of flow-logs, with a single sample flow-log that looks like:
+```
+{"srcIP": "10.0.0.1",
+"dstIP": "20.0.0.2",
+"srcSubnet": "10.0.0.0/16",
+"bytes": 4096,
+```
+
+It is possible to request the entries indexed by subnet with the top number of bytes.
+There may be multiple records with the same index (e.g. same srcIP or same subnet, as the case may be).
+The time interval over which to select the TopK may be specified.
+It may further be specified what operation to perform on the multiple entries of the same index that fall within the allowed time interval.
+The allowed operations are: `sum`, `min`, `max`, `avg`, `diff`, `last`.
+To obtain the bottom K entries instead of the Top K entries, set `reversed` to `true`.
+
+A sample configuration record looks like this:
+
+```yaml
+pipeline:
+  - name: timebased1
+    follows:
+parameters:
+  - name: timebased1
+    extract:
+      type: timebased
+      timebased:
+        rules:
+          - name: "Top 3 Sum of bytes per source subnet over last 10 seconds"
+            operation: sum
+            operationKey: bytes
+            recordKey: srcSubnet
+            topK: 3
+            reversed: false
+            timeInterval: 10s
+```
+
+The output fields of the Timebased TopK stage are:
+- `name`
+- `operation`
+- `operation_key`
+- `record_key`: the field specified in the rules upon which to perform the operation
+- `key`: the value of the record_key
+- `operation_result`: the computed sum, max, min, etc., as the case may be
+
+In addition there is a field with the
+"$record_key": "$key"
+representing the original map entry in the input flow-log.
+
+These fields are used by the next stage (for example `prom` encoder).
+
 ### Prometheus encoder
 
 The prometheus encoder specifies which metrics to export to prometheus and which labels should be associated with those metrics.

From 8cb1c0cc319bffede0f3185d90c478e1cd164b4d Mon Sep 17 00:00:00 2001
From: Kalman Meth
Date: Thu, 8 Sep 2022 15:26:50 +0300
Subject: [PATCH 2/2] added missing {

---
 README.md | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 3cfbff05d..f304650c8 100644
--- a/README.md
+++ b/README.md
@@ -661,10 +661,12 @@ The specification of the Timebased TopK details is placed in the `extract` stage
 
 For Example, assuming a set of flow-logs, with a single sample flow-log that looks like:
 ```
-{"srcIP": "10.0.0.1",
-"dstIP": "20.0.0.2",
-"srcSubnet": "10.0.0.0/16",
-"bytes": 4096,
+{
+  "srcIP": "10.0.0.1",
+  "dstIP": "20.0.0.2",
+  "srcSubnet": "10.0.0.0/16",
+  "bytes": 4096
+}
 ```
 
 It is possible to request the entries indexed by subnet with the top number of bytes.
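
Note on the Timebased TopK section added by this series: the selection it describes (keep recent records, group them by `recordKey`, reduce each group's `operationKey` values with `operation`, then keep the top or bottom K) can be sketched roughly as below. This is only an illustration in Python, not the pipeline's actual implementation. The function name `timebased_topk`, its signature, and the `(timestamp, entry)` input shape are hypothetical, and the meaning given to `diff` (newest value minus oldest value in the window) is an assumption that the README text does not confirm.

```python
from collections import defaultdict


def timebased_topk(records, now, *, name, record_key, operation_key,
                   operation="sum", top_k=3, reversed_=False, interval=10.0):
    """Illustrative sketch only; not flowlogs-pipeline's implementation.

    `records` is an iterable of (timestamp_seconds, flow_log_dict) pairs,
    assumed to be in chronological order (this matters for `last`/`diff`).
    """
    # Collect the operation_key values of recent records, grouped by index value.
    groups = defaultdict(list)
    for ts, entry in records:
        in_window = now - interval <= ts <= now
        if in_window and record_key in entry and operation_key in entry:
            groups[entry[record_key]].append(entry[operation_key])

    reducers = {
        "sum": sum,
        "min": min,
        "max": max,
        "avg": lambda values: sum(values) / len(values),
        "last": lambda values: values[-1],
        # Assumption: "diff" is the newest value minus the oldest in the window.
        "diff": lambda values: values[-1] - values[0],
    }
    reduce_values = reducers[operation]

    # Build one output record per index value, using the documented field names.
    results = [
        {
            "name": name,
            "operation": operation,
            "operation_key": operation_key,
            "record_key": record_key,
            "key": key,
            "operation_result": reduce_values(values),
            record_key: key,  # the original map entry, e.g. "srcSubnet": "10.0.0.0/16"
        }
        for key, values in groups.items()
    ]

    # Largest results first for Top K; smallest first when `reversed` is true.
    results.sort(key=lambda r: r["operation_result"], reverse=not reversed_)
    return results[:top_k]


if __name__ == "__main__":
    sample = [
        (100.0, {"srcSubnet": "10.0.0.0/16", "bytes": 4096}),
        (102.0, {"srcSubnet": "10.1.0.0/16", "bytes": 6000}),
    ]
    for row in timebased_topk(sample, now=105.0, name="demo",
                              record_key="srcSubnet", operation_key="bytes"):
        print(row)
```

The keyword arguments mirror the rule fields in the sample configuration (`recordKey`, `operationKey`, `operation`, `topK`, `reversed`, `timeInterval`), with the interval expressed in seconds rather than as a duration string such as `10s`.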