-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support VPC Flow Logs v6 fields for TGW #59
Conversation
@@ -144,6 +162,24 @@ def __init__(self, event_data, EPOCH_32_MAX=2147483647): | |||
('pkt_dst_aws_service', str), | |||
('flow_direction', str), | |||
('traffic_path', int), | |||
('resource_type', str), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logs we have do not necessarily abide by these aws tgw headers. The case we have is a one of a kind where tgw logs are mixed with other vpc flowlogs. For example: the value for account-id
is set to TransitGateway
. In ideal case it would be the resource-type
that should have been set to that value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The library should support mixed logs.
This PR should allow the user to do:
all_flow_logs = S3FlowLogsReader(...)
no_tgw_logs = (x for x in all_flow_logs if x.account_id != 'TransitGateway')
This would filter out the TGW logs.
I think this will help as well. The fields are mixmatched across the board. Here is an example of one of the VPCFlowLogs s3 files, that shows how they are mixmatched when a Transit gateway logs are injected in them. The fields not aligning is an issue for much more than just account_id and the default fields for TGW seem to be much wider. No idea why AWS seems to allow the logs to contain both.
|
Maybe there are two different things writing logs to the same location? And they have different formats configured? I've added a |
That is what seems to happen. If you turn on TGW logging and point it to the same S3 bucket, it will combine the logs and you end up with this result. |
flowlogs_reader/flowlogs_reader.py
Outdated
@@ -364,13 +400,15 @@ def __init__( | |||
include_accounts=None, | |||
include_regions=None, | |||
thread_count=0, | |||
check_column_count=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this should default to True, no? This would prevent anything currently relying on this from needing to instantiate the class differently to take advantage of the added functionality. I can't think of a scenario in which we wouldn't want to check the column count (although future scenarios may arise so I like the added option of a toggle), just think it should be True. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I set it to False
by default because it means we have to process every row twice, and because it's just a heuristic - as the comment notes, having correct number of columns isn't the same as having the same columns as expected.
@bbayles after some more discussion internally we've decided to drop the desire to check the column count and loop over each record twice. It turns out this is an edge case created specifically for one scenario and not something that can commonly happen by sending flow logs and transit gateway logs to S3. However we would still like to keep your changes for the future handling of TGW records. Can you remove the check column count and just leave the extra slots? If so we can approve and roll this up. Our new direction for handling this specific case will just be to skip over the record if it valueerrors and continuing parsing the rest of the file. See https://github.com/obsrvbl/flowlogs-reader/tree/try_catch_valueerror |
I've reverted the last two commits - thanks. |
This PR is an alternative to #58.
I've added support for each of the version 6 fields shown here. This should cause
flowlogs_reader
to be able to interpret logs with Transit Gateway records.