Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http2: refactor flood checks into a separate class. #13098

Merged
merged 7 commits into from
Sep 18, 2020

Conversation

yanavlasov
Copy link
Contributor

Commit Message:
http2: refactor flood checks into a separate class.

Additional Description:
This change is a prerequisite for addressing flood checks outside of the dispatch context and later extending them to the upstream servers as well. The change should have no functional impact.

Risk Level: Medium (codec refactor)
Testing: Unit, integration tests
Docs Changes: N/A
Release Notes: N/A

Signed-off-by: Yan Avlasov yavlasov@google.com

Signed-off-by: Yan Avlasov <yavlasov@google.com>
Signed-off-by: Yan Avlasov <yavlasov@google.com>
@yanavlasov yanavlasov marked this pull request as ready for review September 15, 2020 00:52
@yanavlasov yanavlasov requested a review from asraa September 15, 2020 02:45
Signed-off-by: Yan Avlasov <yavlasov@google.com>
@antoniovicente
Copy link
Contributor

/assign @antoniovicente

if (outbound_control_frames_ > max_outbound_control_frames_ && dispatching_downstream_data_) {
stats_.outbound_control_flood_.inc();
throw FrameFloodException("Too many control frames in the outbound queue.");
auto result = protocol_constraints_.status();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we skip the constraints status check if dispatching_downstream_data_ is false?

Also, is there a chance that status flips due to upstream data and then the next downstream data update triggers the error condition?

Copy link
Contributor Author

@yanavlasov yanavlasov Sep 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored.
Yes the upstream data can flip status and next downstream data read will trigger connection error. This is how DATA and HEADER flood checks work, see for instance these tests. This is about to change though in the next PR where upstream protocol violations will cause connection to be disconnected by scheduling a callback.

if (outbound_control_frames_ > max_outbound_control_frames_ && dispatching_downstream_data_) {
stats_.outbound_control_flood_.inc();
throw FrameFloodException("Too many control frames in the outbound queue.");
auto result = protocol_constraints_.status();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was expecting a call to protocol_constraints_.checkOutboundQueueLimits(); in here, but that's not what I'm seeing. Would it make sense to call checkOutboundQueueLimits or rename this method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

releaseOutboundFrame();
}

Status ProtocolConstraints::checkOutboundQueueLimits() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dispatching_downstream_data_ is no longer considered in this computation. Is that a problem?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is still checked in the codec class. I wanted the constraints checker to be free of that detail. The dispatching_downstream_data_ is going away soon when flood checks are applied to upstream servers too.


Status ProtocolConstraints::checkOutboundQueueLimits() {
if (outbound_frames_ > max_outbound_frames_) {
stats_.outbound_flood_.inc();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're over counting these stats since you end up calling checkOutboundQueueLimits() from the increment method in cases where you're outputting to upstream instead of downstream.

Possible solution: remove status_ and call checkOutboundQueueLimits() from the downstream connection's checkOutboundQueueLimits() method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check* methods are not called if the status has flipped. I have moved status check into the methods to make more readable. Also tests verify stats counts to make sure there is no overcounting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is related processing in the upstream direction. What prevents outbound flooding in the upstream direction from triggering and incrementing this stat? I think we agree that it is possible to call ProtocolConstraints::checkOutboundQueueLimits in the upstream direction, and the overflow condition can happen in that case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it was a problem for outbound frame checks. Should be fixed now.

Signed-off-by: Yan Avlasov <yavlasov@google.com>
@yanavlasov
Copy link
Contributor Author

#12279

Copy link
Contributor

@antoniovicente antoniovicente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if I'm misunderstanding something. It seems that some of the changes you're doing are leaking into ClientConnectionImpl which handles the upstream direction.

outbound_control_frames_ <= max_outbound_control_frames_);
// Protocol constrain violations should set the nghttp2_callback_status_ error, and return at
// the statement above.
ASSERT(protocol_constraints_.status().ok());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to fail this ASSERT in the upstream direction? Should you be calling ASSERT(checkProtocolConstraintsStatus()); instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not be anymore. Both inbound and outbound frame limits are only checked in the server codec now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not be possible now, since the flood checks that can flip status are only done by the server codec.


Status ProtocolConstraints::checkOutboundQueueLimits() {
if (outbound_frames_ > max_outbound_frames_) {
stats_.outbound_flood_.inc();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is related processing in the upstream direction. What prevents outbound flooding in the upstream direction from triggering and incrementing this stat? I think we agree that it is possible to call ProtocolConstraints::checkOutboundQueueLimits in the upstream direction, and the overflow condition can happen in that case.

break;
}

status_.Update(checkInboundFrameLimits());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We seem to be using status_ for both the inbound and outbound directions. Is that a problem?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is by design. The idea is that only the first violation is recorded and the connection is disconnected at that point. In this case unwinding a deep stack and possibly triggering other errors would not mask the first problem.

RETURN_IF_ERROR(incrementOutboundFrameCount(is_outbound_flood_monitored_control_frame));
auto releasor =
protocol_constraints_.incrementOutboundFrameCount(is_outbound_flood_monitored_control_frame);
RETURN_IF_ERROR(checkProtocolConstraintsStatus());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the call to incrementOutboundFrameCount be done by a ServerConnectionImpl override, and have the ClientConnectionImpl version be a no-op at least until a ClientConnectionImpl version is fully implemented?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it was a premature change on my part.

@yanavlasov
Copy link
Contributor Author

Yes, it does look like I have leaked the check into the client connection as well. I will fix it.

Signed-off-by: Yan Avlasov <yavlasov@google.com>
Signed-off-by: Yan Avlasov <yavlasov@google.com>
Copy link
Contributor

@antoniovicente antoniovicente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good otherwise. It's great to see progress towards generalizing these security checks.

@@ -281,7 +281,7 @@ Network::FilterStatus ConnectionManagerImpl::onData(Buffer::Instance& data, bool
const Status status = codec_->dispatch(data);

ASSERT(!isPrematureResponseError(status));
if (isBufferFloodError(status)) {
if (isBufferFloodError(status) || isInboundFramesWithEmptyPayloadError(status)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be appropriate to ASSERT(status.ok()); on line 292, after the end of this if block?

Question is related to the ASSERT(!isPrematureResponseError(status)); on the previous line, is there a danger of us ignoring some of error status that is added in the future? Granted, this seems like a pre-existing issue.

Copy link
Contributor Author

@yanavlasov yanavlasov Sep 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it is a good suggestion. Fixed

Signed-off-by: Yan Avlasov <yavlasov@google.com>
@yanavlasov yanavlasov merged commit 68967b7 into envoyproxy:master Sep 18, 2020
@yanavlasov yanavlasov deleted the flood-disconnect branch October 3, 2020 00:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants