RSA2ELK

Converts Netwitness log parser configuration to Logstash configuration

Disclaimer: neither Vincent Maury nor Elastic can be held responsible for the use of this script! Use it at your own risk.

Introduction (the why)

The purpose of this tool is to convert an existing configuration made for RSA Netwitness Log Parser software (ingestion piece of the RSA SIEM) into a Logstash configuration that can ingest logs to Elasticsearch.

RSA uses one configuration file per device source (product). For example, one file will handle F5 ASM, another one will handle F5 APM, etc.

Please note that RSA released the configuration files for 300 devices on GitHub under the Apache 2.0 license. So if you are not an RSA user, you can still pass any of these configuration files to the rsa2elk tool to generate the corresponding Logstash pipeline.

Getting Started

These instructions will get you a copy of the project up and running on your local machine.

Prerequisites

This script has no prerequisite other than Python 3. It should work on any platform (tested on Windows so far). No additional library is needed.

Running the script

Just clone this repository, move into it, and run the script.

git clone https://github.com/blookot/rsa2elk
cd rsa2elk
python rsa2elk.py -h

The script has several options:

  • -h will display help.
  • -i or --input-file FILE to enter the absolute path to the RSA XML configuration file. The alternative is a URL.
  • -u or --url URL to enter the URL to the RSA XML configuration file. If no file or URL is provided, this program will run on a sample XML file located in the RSA repo.
  • -o or --output-file FILE to enter the absolute path to the Logstash .conf file (default: logstash-[device].conf).
  • -c or --check-config runs a check of the generated configuration with logstash -f (default: false).
  • -l or --logstash-path to enter the absolute path to logstash bin executable (default is my local path!).
  • -n or --no-grok-anchors removes the beginning (^) and end ($) anchors in grok (default: false, i.e. the default is to have them).
  • -a or --add-stop-anchors adds hard stop anchors in grok to ignore in-between chars, see explanation below. Should be set as a series of plain characters, only escaping " and \. Example: \"()[] (default: "").
  • -s or --single-space-match to only match 1 space in the log if there is 1 space in the RSA parser (default: false, i.e. match 1 to N spaces, aka [\s]+).
  • -p or --parse-url adds a pre-defined filter block (see filter-url.conf) to parse URLs into domain, query, etc (default: false).
  • -q or --parse-ua adds a pre-defined filter block (see filter-ua.conf) to parse User Agents (default: false).
  • -r or --remove-parsed-fields removes the event.original and message fields if correctly parsed (default: false).
  • -d or --debug to enable debug mode, more verbose (default: false).
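As an illustration of how these options combine, here is a minimal sketch that assembles a command line from some of the flags listed above. The build_command helper and the paths are hypothetical (they are not part of rsa2elk), and shlex.join needs Python 3.8 or later.

```python
import shlex
import sys


def build_command(xml_path, conf_path, parse_url=False, parse_ua=False, debug=False):
    """Assemble an rsa2elk.py command line using only the documented flags."""
    cmd = [sys.executable, "rsa2elk.py", "-i", xml_path, "-o", conf_path]
    if parse_url:
        cmd.append("-p")  # add the pre-defined URL filter block
    if parse_ua:
        cmd.append("-q")  # add the pre-defined User Agent filter block
    if debug:
        cmd.append("-d")  # more verbose output
    return cmd


if __name__ == "__main__":
    # Hypothetical absolute paths; replace them with your own.
    cmd = build_command("/tmp/rsa/device.xml", "/tmp/rsa/logstash-device.conf", parse_url=True)
    # Print a shell-quoted command; the list could also be passed to subprocess.run.
    print(shlex.join(cmd))
```

The command has to be run from the cloned repository directory, since the script name is given as a relative path.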

Customize input & output

The tool mostly generates the filter part of the Logstash configuration. The input and output sections are copied from the input.conf and output.conf files that you can customize.

Note: the filter-url.conf file adds a section at the end of the Logstash configuration to deal with URLs. The filter-ua.conf file parses user agents. Both files can be customized or partially commented out. In particular, the user-agent parsing can be resource intensive.

Output

You can grab the logstash-[device].conf file (or custom name you defined) generated by this script.

When the check-config flag has been activated, this configuration file is automatically tested by Logstash. The output of Logstash can be checked in the output-logstash-[device]-configtest.txt file, which is created in the same directory as the RSA XML input file.

Understanding the tool (the how)

RSA Netwitness Log Parser is the piece of software that ingests data into the Netwitness platform. It comes with a nice UI (see the user guide). Elastic provides 2 ways to ingest data into Elasticsearch: Logstash, as an ETL, and the Elasticsearch ingest pipelines. This tool focuses on Logstash as a way to ease ingestion (capturing data via syslog, files, etc. and writing to Elasticsearch or other destinations), but the plan is to port it to the Elasticsearch ingest pipeline (leveraging Filebeat as syslog termination).

The syntax

The syntax of the XML configuration file is specific to RSA and mainly falls into 2 parts:

  • headers, describing headers of logs, capturing the first fields that are common to many types of messages. These headers then point (using the messageid field) to the appropriate message parser
  • messages, parsing the whole log line, extracting fields, computing the event time (EVNTTIME function), concatenating strings and fields to generate new ones (STRCAT function), setting additional fields with static or dynamic values, etc

In both, the content attribute describes how the log is parsed. The syntax supports alternatives {a|b}, field extraction <fld1> and static strings.

The transform.py module does the core of the conversion by reading this content line character by character and computing the corresponding grok pattern. The whole idea of the grok pattern is to capture each field with any character but the one that follows the field. For example, <fld1> <fld2> in RSA will result in (?<fld1>[^\s]*)[\s]+(?<fld2>.*) in grok. Note that the [\s]+ in the middle is quite permissive because many products use several spaces to tabularize their logs. The -s flag can be used to change this behavior and strictly match the log according to the exact number of spaces in the RSA configuration. This flag replaces the [\s]+ with a simple \s.
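The capture rule described above can be sketched in a few lines of Python. This is only an illustration, not the code in transform.py: it handles fields and static text but not alternatives, and the names content_to_grok and to_python_regex are made up for this example. Since Python's re module writes named groups as (?P<name>...), the grok pattern is converted before it is tested.

```python
import re

# A field like <fld1>, a run of whitespace, or static text (a stray "<" counts as text).
TOKEN = re.compile(r"<(\w+)>|(\s+)|(<|[^<\s]+)")


def content_to_grok(content, single_space=False):
    """Translate a simplified RSA content string into a grok-style regex."""
    tokens = TOKEN.findall(content)
    parts = []
    for i, (field, blank, literal) in enumerate(tokens):
        if blank:
            # Default: 1 to N spaces. Strict mode: one \s per space in the parser.
            parts.append(r"\s" * len(blank) if single_space else r"[\s]+")
        elif literal:
            parts.append(re.escape(literal))
        else:
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            if nxt is None:
                body = ".*"  # last field: take the rest of the line
            elif nxt[1]:
                body = r"[^\s]*"  # followed by whitespace
            elif nxt[2]:
                body = "[^" + re.escape(nxt[2][0]) + "]*"  # followed by static text
            else:
                raise ValueError("adjacent fields are not handled by this sketch")
            parts.append("(?<" + field + ">" + body + ")")
    return "".join(parts)


def to_python_regex(grok):
    """Rewrite grok named groups (?<x>...) as Python named groups (?P<x>...)."""
    return grok.replace("(?<", "(?P<")


if __name__ == "__main__":
    grok = content_to_grok("<fld1> <fld2>")
    print(grok)
    match = re.fullmatch(to_python_regex(grok), "aaa   bbb ccc")
    print(match.groupdict())
```

Calling content_to_grok with single_space=True mirrors the -s flag by emitting \s instead of [\s]+ for each space in the parser.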

RSA can also handle missing fields when reading specific characters. For example, the RSA parser <fld1> "<fld99>" will match both aaa "zzz" (where fld1='aaa') and aaa bbb "zzz" (where fld1='aaa bbb'). The -a flag lets the user supply specific characters that will serve as anchors, so that when they are found, the grok jumps over the unexpected fields. Using the above example, the grok will look like (?<fld1>[^\s]*)[\s]+(?<anchorfld>[^\"]*)\"(?<fld99>[^\"]+)\". Please note that an anchorfld field is added to capture the possible characters before the anchor, so for aaa bbb "zzz", fld1 will be 'aaa' and the anchorfld field will only have 'bbb', which is what you would expect, I think ;-)
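To see the anchor behavior on both sample lines, the grok above can be tried with Python's re module. This is a rough check under two assumptions: the grok named groups are rewritten to Python's (?P<name>...) syntax, and captured values are stripped, because in Python the raw anchorfld capture keeps the space that precedes the quote. The parse helper is hypothetical.

```python
import re

# Grok pattern from the example above, with " used as the stop anchor.
GROK = r'(?<fld1>[^\s]*)[\s]+(?<anchorfld>[^\"]*)\"(?<fld99>[^\"]+)\"'

# Python's re module needs (?P<name>...) for named groups.
PATTERN = re.compile(GROK.replace("(?<", "(?P<"))


def parse(line):
    """Return the captured fields (whitespace-stripped), or None if the line does not match."""
    match = PATTERN.fullmatch(line)
    if match is None:
        return None
    return {name: value.strip() for name, value in match.groupdict().items()}


if __name__ == "__main__":
    for line in ('aaa "zzz"', 'aaa bbb "zzz"'):
        print(line, "->", parse(line))
```

In both cases fld1 is 'aaa' and fld99 is 'zzz'; the unexpected 'bbb' ends up in anchorfld instead of breaking the match.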

Note: dissect (see documentation) is faster and easier to read but doesn't support alternatives. Generating dissect where possible could be an improvement though.

RSA meta fields to Elastic Common Schema (ECS)

RSA uses specific field names in the configuration files that map to meta keys, as described here. Elastic also defined a set of meta fields called ECS, see documentation. The rsa2ecs.txt file is used to map RSA meta fields to ECS naming (as well as field types).
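The renaming step can be pictured with a small sketch. Everything here is an assumption made for illustration: the entries in RSA_TO_ECS, the type names and the to_ecs helper are hypothetical, and the real rsa2ecs.txt may use a different layout and different mappings.

```python
# Hypothetical excerpt of an RSA-to-ECS map: RSA meta key -> (ECS field, type).
# The actual content and format of rsa2ecs.txt may differ.
RSA_TO_ECS = {
    "saddr": ("source.ip", "ip"),
    "daddr": ("destination.ip", "ip"),
    "sport": ("source.port", "long"),
}


def to_ecs(event):
    """Rename known RSA keys to ECS names and cast numeric types; keep other keys as-is."""
    renamed = {}
    for key, value in event.items():
        name, kind = RSA_TO_ECS.get(key, (key, "keyword"))
        renamed[name] = int(value) if kind == "long" else value
    return renamed


if __name__ == "__main__":
    print(to_ecs({"saddr": "10.0.0.1", "sport": "443", "fld1": "x"}))
```

In the generated Logstash configuration, the same idea would be expressed with filter blocks rather than Python code.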

TODO

There are still a few ideas to improve rsa2elk:

  • for content lines that don't use alternatives, generate a dissect instead of a grok
  • input a custom ecat.ini (RSA customers)
  • input a custom table-map.xml and table-map-custom.xml (RSA customers)
  • support additional custom enrichment with external files (RSA customers)
  • generate the Elasticsearch index mapping (template) based on the ecs map
  • use the DEVICEMESSAGES in the XML file to set the device name and group
  • port this converter to Elasticsearch ingest pipeline (see documentation), especially since Elasticsearch 7.5 added an enrich processor

Authors

  • Vincent Maury - Initial commit - blookot

License

This project is licensed under the Apache 2.0 License - see the LICENSE.md file for details

Acknowledgments

  • First things first, I should thank RSA for sharing such content and helping the community with great resources!
  • Many thanks to my Elastic colleagues for their support, in particular @andsel, @jsvd and @yaauie from the Logstash team, as well as @webmat and @melvynator for the ECS mapping
  • Thanks also to my dear who let me work nights and weekends on this project :-*