Document the limitations of Audit Logging for policy rules #6225

Merged

antoninbas
Contributor

Starting with Antrea v1.13, logging is best-effort, which means that logging cannot typically be used for compliance purposes.

For older Antrea versions, traffic for which logging was enabled would be dropped after a certain rate was reached, creating issues for production workloads.

@antoninbas added the kind/documentation label Apr 15, 2024

Contributor

@qiyueyao left a comment

lgtm, just three minor questions.

docs/antrea-network-policy.md

> would drastically restrict the number of possible DNS requests in the cluster,
> which in turn would cause a lot of errors in applications which rely on DNS:
>
> ```yaml

Contributor

Would it be more helpful to add a link to the acnp-with-log-setting example here for comparison? And maybe change that earlier example to one better suited to logging (Drop), as opposed to this example?

Contributor Author

Sorry, I am having a hard time understanding what you are suggesting. What difference are we trying to highlight by comparing with the acnp-with-log-setting example? And which example would you want to update to use the Drop action (and why)?

Contributor

I assumed that the acnp-with-log-setting example, unlike this allow-dns example, would not disrupt application workloads much, so that is the difference I was referring to.

I was wondering whether we need to change acnp-with-log-setting to Drop, given that the text below says "especially when the policy rule uses the Allow action". (Quick question: is it no longer the case that only the first packet of an Allow connection is logged?)

Contributor Author

> I assumed that the acnp-with-log-setting example, unlike this allow-dns example, would not disrupt application workloads much, so that is the difference I was referring to.

This policy applies to traffic from the application frontend to the DB layer. If it is a large-scale application with a high number of connections, it could suffer from the same issue. In practice, for this specific case, the number of connections between the frontend and the DB is likely to stay "small", as the application is likely to use a connection pool instead of creating a new connection for each user session. But that really depends on the type of application.

The difference between these policies is more about the workloads to which they apply. I don't think there is a difference in how they are implemented. So they can both suffer from this issue in the same way.

In practice, users are more likely to experience this issue with CoreDNS because: 1) it seems common to enable logging for DNS requests, 2) even medium clusters usually have a large volume of DNS requests.

So I may still be missing your point.

> I was wondering whether we need to change acnp-with-log-setting to Drop, given that the text below says "especially when the policy rule uses the Allow action". (Quick question: is it no longer the case that only the first packet of an Allow connection is logged?)

With recent versions of Antrea (since v1.13), logging is fine with the Allow action, so why should we change the example? The text says "especially when the policy rule uses the Allow action" because the side effect of enabling logging with older Antrea versions is that flows are dropped once a certain rate limit is reached. If the action is Allow, this is clearly a bigger deal than if the action is Drop anyway.

We only log the first packet of an Allow connection. That has not changed.

Contributor

Thanks for the detailed explanations! Now I understand: there is no need to highlight a difference, and "If the action is Allow, this is clearly a bigger deal than if the action is Drop anyway" makes total sense.
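
As a rough illustration of the behavior discussed in this thread, the toy Python model below contrasts the two cases for an `Allow` rule with logging enabled. It is only a sketch and not how Antrea implements logging: the function name, the `Result` type, and the per-window `log_budget` are all hypothetical, and the numbers are made up. Since only the first packet of an `Allow` connection is logged, each new connection is treated as one log event.

```python
from dataclasses import dataclass


@dataclass
class Result:
    forwarded: int = 0  # connections whose traffic was let through
    logged: int = 0     # connections that produced a log entry
    dropped: int = 0    # connections dropped as a side effect of logging


def handle_new_connections(n_connections: int, log_budget: int,
                           best_effort: bool) -> Result:
    """Toy model of one time window of new Allow connections.

    log_budget is a hypothetical cap on log events per window; it is an
    assumption made for illustration, not a real Antrea setting.
    """
    result = Result()
    for i in range(n_connections):
        if i < log_budget:
            # Under the cap: the connection is both logged and forwarded.
            result.logged += 1
            result.forwarded += 1
        elif best_effort:
            # v1.13+ as described in this PR: traffic passes, the log entry is lost.
            result.forwarded += 1
        else:
            # Older versions as described in this PR: traffic over the rate is dropped.
            result.dropped += 1
    return result


old = handle_new_connections(1000, 100, best_effort=False)
new = handle_new_connections(1000, 100, best_effort=True)
print("older behavior:", old)
print("best-effort   :", new)
```

In this model the older behavior loses traffic, which is why it matters most for `Allow` rules, while the best-effort behavior loses log entries, which is why the logs cannot typically be relied on for compliance.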

> prior to v1.13**, especially when the policy rule uses the `Allow` action.
>
> Note that v1.12 patch versions starting with v1.12.2 also do not suffer from
> this issue, as we backported the fix to the v1.12 release.

Contributor

Just a personal feeling: do we usually include this kind of backport detail in a README?

Contributor Author

I think it is helpful, as users are more likely to find this information here than to look at the changelogs. I would be comfortable removing it after a few releases, but some users keep running a given Antrea minor version for a while, even after we stop maintaining it.

Contributor

Understood, thanks!
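
For readers who want to check a cluster against the version rule discussed in this thread (affected prior to v1.13, except v1.12 patch versions starting with v1.12.2), a minimal Python sketch follows. The function names are hypothetical, and it assumes plain `vX.Y.Z` version strings without pre-release or build suffixes.

```python
def parse_version(version: str) -> tuple:
    """Parse 'vX.Y.Z' (or 'X.Y') into a (major, minor, patch) tuple of ints.

    Assumption: no pre-release or build suffix such as '-rc1' or '+abc'.
    """
    parts = version.lstrip("v").split(".")
    parts = (parts + ["0", "0"])[:3]  # pad missing minor/patch with zeros
    return tuple(int(p) for p in parts)


def logging_may_drop_traffic(version: str) -> bool:
    """Return True if enabling logging may drop traffic in this version,
    according to the rule stated in the documentation change above."""
    major, minor, patch = parse_version(version)
    if (major, minor) >= (1, 13):
        return False      # logging is best-effort starting with v1.13
    if (major, minor) == (1, 12):
        return patch < 2  # the fix was backported in v1.12.2
    return True           # older releases are affected


for v in ["v1.11.4", "v1.12.1", "v1.12.2", "v1.13.0"]:
    print(v, logging_may_drop_traffic(v))
```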

tnqn previously approved these changes Apr 16, 2024
Member

@tnqn left a comment

LGTM, Qiyue's comments also make sense to me.

Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>
@antoninbas force-pushed the better-documentation-for-audit-logging branch from 4bed1c1 to 51364ad April 17, 2024 17:48

Contributor

@qiyueyao left a comment

🎉

@antoninbas
Contributor Author

/skip-all

@antoninbas merged commit f684c2d into antrea-io:main Apr 17, 2024
49 of 52 checks passed
@antoninbas deleted the better-documentation-for-audit-logging branch April 17, 2024 21:53