
Support VPC Flow Logs v6 fields for TGW #59

Merged
merged 6 commits into obsrvbl-oss:master on Dec 19, 2022

Conversation

@bbayles (Contributor) commented Nov 9, 2022

This PR is an alternative to #58.

I've added support for each of the version 6 fields shown here. This should allow flowlogs_reader to interpret logs containing Transit Gateway records.

```diff
@@ -144,6 +162,24 @@ def __init__(self, event_data, EPOCH_32_MAX=2147483647):
     ('pkt_dst_aws_service', str),
     ('flow_direction', str),
     ('traffic_path', int),
+    ('resource_type', str),
```


The logs we have don't necessarily abide by these AWS TGW headers. Our case is one of a kind, where TGW logs are mixed in with other VPC flow logs. For example, the value for account-id is set to TransitGateway; ideally it would be resource-type that was set to that value.

@bbayles (Author) replied:

The library should support mixed logs.

This PR should allow the user to do:

```python
all_flow_logs = S3FlowLogsReader(...)
no_tgw_logs = (x for x in all_flow_logs if x.account_id != 'TransitGateway')
```

This would filter out the TGW logs.

@BThacker (Contributor) commented Nov 10, 2022

I think this will help as well.

The fields are mismatched across the board.

Here is an example of one of the VPC Flow Logs S3 files, showing how the fields are mismatched when Transit Gateway logs are injected into them. The misaligned fields are an issue for much more than just account_id, and the default field set for TGW is much wider. No idea why AWS allows the logs to contain both.

```
version account-id interface-id srcaddr dstaddr srcport dstport protocol packets bytes start end action log-status
2 999999999999 eni-99999999999999999 10.10.10.10 10.10.10.11 65532 389 17 1 84 9999999999 9999999999 ACCEPT OK
6 TransitGateway 999999999999 tgw-99999999999999999 tgw-attach-99999999999999999 999999999999 999999999999 - vpc-99999999999999999 - subnet-99999999999999999 - eni-99999999999999999 usw2-az4 usw2-az3 tgw-attach-99999999999999999 10.10.10.10 10.10.10.11 9769 123456 6 7 1234 9999999999 8888888888 OK IPv4 0 0 0 0 - us-west-2 ingress - -
```
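The mismatch above is what makes mixed files dangerous: a parser that pairs values with headers positionally will either fail on the extra columns or, worse, silently mislabel fields. A minimal sketch of the problem (not flowlogs-reader's actual parser; the header list and sample rows are abbreviated from the excerpt above):

```python
# Sketch: pairing a row's values with a header row positionally.
# Headers and rows are taken from the log excerpt above.
V2_HEADERS = [
    "version", "account-id", "interface-id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log-status",
]

v2_row = (
    "2 999999999999 eni-99999999999999999 10.10.10.10 10.10.10.11 "
    "65532 389 17 1 84 9999999999 9999999999 ACCEPT OK"
)
# A v6/TGW row has far more columns (only the start is shown here):
v6_row_start = "6 TransitGateway 999999999999 tgw-99999999999999999"

def parse(headers, line):
    values = line.split()
    if len(values) != len(headers):
        # The TGW row trips this check; if the counts happened to
        # match, fields like account-id would instead silently hold
        # the wrong value ('TransitGateway').
        raise ValueError(
            f"expected {len(headers)} columns, got {len(values)}"
        )
    return dict(zip(headers, values))
```

Parsing the v2 row succeeds; parsing a TGW row against the v2 headers raises a ValueError rather than producing a record with shifted fields.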

@bbayles (Author) commented Nov 10, 2022

Maybe there are two different things writing logs to the same location? And they have different formats configured?

I've added a check_column_count keyword to filter out rows with the wrong number of columns.

@BThacker (Contributor) commented:

> Maybe there are two different things writing logs to the same location? And they have different formats configured?
>
> I've added a check_column_count keyword to filter out rows with the wrong number of columns.

That is what seems to happen. If you turn on TGW logging and point it to the same S3 bucket, it will combine the logs and you end up with this result.

```diff
@@ -364,13 +400,15 @@ def __init__(
     include_accounts=None,
     include_regions=None,
     thread_count=0,
+    check_column_count=False,
```
A reviewer (Contributor) commented:

I believe this should default to True, no? That would prevent existing callers from needing to instantiate the class differently to take advantage of the added functionality. I can't think of a scenario in which we wouldn't want to check the column count (though future scenarios may arise, so I like having the toggle); I just think it should default to True. Thoughts?

@bbayles (Author) replied:

I set it to False by default because it means we have to process every row twice, and because it's just a heuristic: as the comment notes, having the correct number of columns isn't the same as having the expected columns.
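Both points can be seen in a rough sketch of what such a pre-check looks like (hypothetical helper, not flowlogs-reader's implementation):

```python
EXPECTED_COLUMNS = 14  # e.g. the default v2 format has 14 fields

def iter_valid_rows(lines, check_column_count=False):
    """Yield rows, optionally dropping those with the wrong column count.

    Hypothetical sketch only. Note that the pre-check tokenizes each
    line once and the caller's real parser will tokenize it again --
    the double-processing cost mentioned above. And a row with the
    right *number* of columns but the wrong *contents* still slips
    through: the count is only a heuristic.
    """
    for line in lines:
        if check_column_count and len(line.split()) != EXPECTED_COLUMNS:
            continue  # wrong shape; skip rather than mis-parse
        yield line
```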

@BThacker (Contributor) commented:

@bbayles after some more discussion internally, we've decided to drop the idea of checking the column count and looping over each record twice. It turns out this is an edge case created by one specific scenario, not something that commonly happens when sending flow logs and Transit Gateway logs to S3. However, we would still like to keep your changes for future handling of TGW records. Can you remove the column-count check and just leave the extra slots? If so, we can approve and merge this.

Our new direction for handling this specific case will be to skip over any record that raises a ValueError and continue parsing the rest of the file. See https://github.com/obsrvbl/flowlogs-reader/tree/try_catch_valueerror
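That skip-on-ValueError approach can be sketched as follows (the `parse_record` helper is a hypothetical stand-in for the library's record constructor, which raises ValueError on rows that don't fit the expected format):

```python
def parse_record(line):
    # Hypothetical stand-in for building a flow-log record; raises
    # ValueError on rows that don't fit the expected format (e.g.
    # injected TGW rows with a different field count).
    fields = line.split()
    if len(fields) != 14:
        raise ValueError("unexpected field count")
    return fields

def iter_records(lines):
    """Skip malformed rows instead of aborting the whole file."""
    for line in lines:
        try:
            yield parse_record(line)
        except ValueError:
            continue  # drop the bad record, keep parsing the rest
```

The difference from the column-count toggle is that the cost is paid only on bad rows: well-formed rows are parsed exactly once, and malformed ones are discarded at the point the parse fails.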

@bbayles (Author) commented Dec 13, 2022

I've reverted the last two commits - thanks.

@mjschultz mjschultz merged commit 704ca58 into obsrvbl-oss:master Dec 19, 2022
@bbayles bbayles deleted the tgw-support branch December 20, 2022 13:50