
New decode_csv_fields processor #11753

Merged 17 commits into elastic:master on Apr 26, 2019
Conversation

@adriansr (Contributor) commented Apr 10, 2019

This patch introduces a new processor, `decode_csv_fields`, that decodes
rows of CSV-formatted data into a string array, one element per column.

processors:
- decode_csv_fields:
    fields:
      message: csv
    separator: ,
    overwrite_keys: false
    ignore_missing: false
    trim_leading_space: false
    fail_on_error: true

@adriansr adriansr requested review from a team as code owners April 10, 2019 21:57
@adriansr added labels: discuss (Issue needs further discussion.), enhancement, needs_docs, review on Apr 10, 2019
@andrewkroh (Member) left a comment

A while back I hacked together a lookup table processor that could load data from a CSV file. I found it useful to have a setting that let me directly name specific numbered columns, like:

- processors:
    - decode_csv_field:
        field: message
        target: "user"
        columns:
          # Target Field Name -> CSV Column Number
          email: 0 # Write column 0 to user.email.
          name:  2 # Write column 2 to user.name. (Column 1 is ignored.)

When columns is not specified, I'd have it write an array of strings to the target like you have.

@andrewkroh (Member) left a comment

The code LGTM.

Can you add this processor here?

And also add a changelog entry.

libbeat/processors/actions/decode_csv_field.go (outdated review comments, resolved)
libbeat/processors/actions/decode_csv_field_test.go (outdated review comments, resolved)
@adriansr (Contributor, Author) commented

@andrewkroh
I originally devised it using the columns mapping too. The problem is that it won't easily fit my current use case: I have 50+ columns and need to inspect one of the first columns (the "type" column) to decide which mapping to use for the rest.

So I thought it might be better to have this processor decode to an array, and then either add a generic "extract_array" processor or do the extraction inside an ingest pipeline. I'm planning to discuss this in today's sync with the Beats team.

@andrewkroh (Member) commented

This is looking good. I think it just needs a section added to the asciidocs now.

@adriansr changed the title from "New decode_csv_field processor" to "New decode_csv_fields processor" on Apr 16, 2019
@adriansr (Contributor, Author) commented Apr 16, 2019

@andrewkroh I've modified the processor a little bit to help align with the rest. Do you mind reviewing again? Now it has docs.

The main change is the rename to *_fields and the ability to process more than one field at the same time (I copied the config style from your dns processor). This also adds a new flag, fail_on_error, which is common in other processors.

There is currently no mechanism to inject this reference config on
selected Beats.
@andrewkroh (Member) left a comment

LGTM

libbeat/docs/processors-using.asciidoc (outdated review comments, resolved)
Co-Authored-By: adriansr <adrisr83@gmail.com>
@adriansr removed labels: discuss (Issue needs further discussion.), needs_docs on Apr 26, 2019
@adriansr adriansr merged commit e03993f into elastic:master Apr 26, 2019
adriansr added a commit to adriansr/beats that referenced this pull request Apr 26, 2019
adriansr added a commit that referenced this pull request Apr 26, 2019
* Missing changelog entry for #11753

* Update csv processor to support `when` clause

* Docs fixes for csv processor
DStape pushed a commit to DStape/beats that referenced this pull request Aug 20, 2019
This patch introduces a new processor, `decode_csv_fields` that decodes
rows of CSV-formatted data into a string array, one element per column.

processors:
- decode_csv_fields:
    fields:
      message: csv
    separator: ,
    overwrite_keys: false
    ignore_missing: false
    trim_leading_space: false
3 participants