Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

THREESCALE-8000 + THREESCALE-8007 + THREESCALE-8252 (clear context policy) #1328

Merged
merged 2 commits into from
Mar 15, 2022

Conversation

samugi
Copy link
Contributor

@samugi samugi commented Feb 24, 2022

This PR adds compatibility with this OpenResty change.
It takes care of clearing the local context at the beginning of every separate request.

We use the context to determine routing to Products (when APICAST_PATH_ROUTING is enabled) or to Backends (when using APIaaP). Both those routing scenarios can be affected by the bug when connections are reused (see also KCS1 and KCS2)

@samugi samugi requested a review from a team as a code owner February 24, 2022 15:29
@eloycoto
Copy link
Contributor

umm, maybe I do not understand the full context here, but I do not think that context should be clear at all. Maybe what we should do it drop support of SSL_context on ssl_certificate phase and keep as was before the change.

@samugi
Copy link
Contributor Author

samugi commented Feb 25, 2022

umm, maybe I do not understand the full context here, but I do not think that context should be clear at all. Maybe what we should do it drop support of SSL_context on ssl_certificate phase and keep as was before the change.

The idea is based on the problem being that our code does not expect the context to be shared across different requests but rather for it to be scoped by request (this is true at least for the path routing algorithm and the routing policy).

The option we are exploring here would address this by resetting the context at the beginning of each request (which is what this policy does), that should be equivalent to what happens when the ssl_certificate phase is not executed at all (i.e. there will be nothing in the current context when the first policy executes for a certain request).

Another option (which I believe is what you are suggesting?) would be to avoid interacting with the context in all the ssl_certificate phases across all policies but that seemed more of a breaking change due to this and this for example.

@samugi samugi changed the title [WIP] THREESCALE-8000 + THREESCALE-8007 (clear context policy) [WIP] THREESCALE-8000 + THREESCALE-8007 + THREESCALE-8252 (clear context policy) Feb 26, 2022
@samugi
Copy link
Contributor Author

samugi commented Feb 27, 2022

After some more testing I found another problem with the same root cause: THREESCALE-8252.

@eloycoto I just committed a slightly modified version of the policy that I believe to be a safer option: it resets the current context at the end of every request (in the log phase of the last policy in the chain) rather than at the beginning. This way the reset of the context will happen when no other policy will need to access to it, and so it will be reinitialized correctly in the executor on the next request. If in the future we will want some parts of the context to be shared through different requests it will be easy to tune the policy accordingly to leave those.

This commit also takes care of THREESCALE-8252 because the original_request will be populated as normal here.

I added integration tests to confirm all 3 issues are covered.

@eloycoto
Copy link
Contributor

umm, I do not think that will work @samugi , mainly because requests can be concurrent, and that can be a problem. SSL context should be disabled.

@samugi
Copy link
Contributor Author

samugi commented Feb 27, 2022

umm, I do not think that will work @samugi , mainly because requests can be concurrent, and that can be a problem. SSL context should be disabled.

@eloycoto does that mean concurrent requests could share the same context? I thought that wouldn't be possible especially since we are relying on context.current for things like routing and some objects in particular (e.g. context.current.service) require to be scoped by request but perhaps I'm missing something.

In what scenario would properties of context.current be reused after a policy chain completes its execution for a certain request? Is there any test that you would be expecting to fail?

Also I thought we were relying on the SSL context for a few things now, am I wrong? I thought this to be the case for the TLS policy for example, which requires the service to have already been determined during the ssl_certificate phase. I'll double check though, maybe we are storing the service in the context for other reasons there, I haven't looked into that yet.

Thanks for your feedback

@eloycoto
Copy link
Contributor

SSL certificate only happens once, on ssl_handshake, and from there many more requests to come.

So, no set context on ssl_certificate and initialize on rewrite phase where the service will be in there.

I mean, if you change this line should be ok:

https://github.com/3scale/APIcast/blob/master/gateway/src/apicast/policy/find_service/find_service.lua#L62

@kevprice83
Copy link
Member

@eloycoto I already suggested the same and @samugi gave some reasoning why that wouldn't be a good idea. I forget exactly what that was now but I suspect it is something related to the fact that removing the context in the ssl_certificate_phase could end up breaking the new behaviour introduced in OpenResty that @samugi mentioned at the top of this PR. My thinking here was that we never needed that behaviour previously and therefore we would not need it now and given that we have a very specific use case regarding the context in APIcast probably we can just unset it during that phase and initialise it in the next (rewrite) phase.

@samugi could you remind why else you thought that was a bad idea because I cannot remember now. I think I remember you preferred to keep the behaviour introduced in OpenResty and only improve how we manage the management of the context object right?

I'd say in general that's the best approach and keep compatibility with as much OpenResty features as possible but I also think this was a breaking change in OpenResty that they didn't handle very well and as a result we have to cope with this somehow.

One advantage to Samuele's proposal I'd say is that this behaviour could be configurable and so in future if we need to "enable" that OpenResty "feature" again then skipping or disabling the policy would be easy don't you think @eloycoto ?

@samugi
Copy link
Contributor Author

samugi commented Mar 1, 2022

I discussed this with @kevprice83 this morning, here is a recap of our discussion:

We believe we have 3 2 ways of addressing this problem:
1. The current proposal here (clearing the context in the log phase of the last policy in the chain).

Pros:

Safe because it clears the context at the end of the policy chain when no other policy could need to access it
Modular: can be reverted or altered by removing / modifying the policy
Allows to take advantage of the OpenResty change should we need to share the context from ssl_* phases in the future

Cons:

We would execute some code (the claring context part) even when it's not strictly necessary (e.g. in the event the ssl_certificate phase was never executed)

  1. Clearing the context right after these lines

    Pros:

    • The change would only apply when strictly necessary, during the ssl_certificate phase

    Cons:

    • Potentially in the future we might want to access the context of the ssl_certificate phase in a policy that comes after find_service and that would be problematic because:
      • the context would have been cleared already
      • any change in the context executed after find_service would then be shared with multiple requests (potentially producing the same issue we see now)
    • Does not allow to take advantage of the OpenResty change should we need to share the context from ssl_ phases in the future
  2. Clearing the context in the clear_context policy, from the last position in the policy chain, but in the ssl_certificate phase (rather than the log phase)

    Pros:

    • The change would only apply when strictly necessary, during the ssl_certificate phase
    • Modular: can be reverted or altered by removing / modifying the policy

    Cons:

    • Does not allow to take advantage of the OpenResty change should we need to share the context from ssl_ phases in the future

What would you think of the proposals above @eloycoto ?

EDIT: After another quick sync with Kevin we deleted option 1 (current implementation) after we realized this to actually be problematic for concurrent requests (possibly what Eloy meant in his comment up here). I will wait for further comments before applying either option 2 or 3 (or anything else we will agree on).

Also one more thing just to clarify a point from the previous comments: when you said "SSL context should be disabled." and "So, no set context on ssl_certificate and initialize on rewrite phase where the service will be in there.", did you in fact mean to suggest the context should be cleared during the ssl_certificate phase or do you actually meant that we should completely stop accessing / sharing a context in the ssl_certificate phase of any policy?

@kevprice83
Copy link
Member

Would be good to get some feedback from @sergioifg94 & @eguzki on that last comment maybe?

Copy link
Member

@kevprice83 kevprice83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be useful to run the test requests concurrently to see the outcome. I'd expect one of them to fail in which case that would make it clearer that we then must clear the context at the end of the ssl_certificate_phase in the last policy in the policy chain. Other than that I think doing this as a separate policy is a good approach. Maybe using the concurrent library here?

httpc:close()
}
--- no_error_log
[error]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newline required at end of integration test files I think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done



=== TEST 36: Multiple requests that reuse the same TCP connection.
Routing must work as expected to the right backend
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Routing policy must choose correct backend based on current configuration & request parameters"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

version = 1.1
})
assert(res2, "Request failed"..(err2 or ""))
--check for incorrect routing THREESCALE-8007:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Check if incorrect routing as reported in THREESCALE-8007 is happening:" might be a more explicit comment :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

…es care of concurrent requests scenarios. Adapted some tests to match hosts correctly since now those policy chains (that only include policies that don't implement ssl_certificate) no longer use the context.service defined in find_service.lua/ssl_certificate (that was obtained using the server_name of the SNI during the handshake), so they need to specify correct Host headers like in a real request scenario.
@samugi
Copy link
Contributor Author

samugi commented Mar 4, 2022

I think it would be useful to run the test requests concurrently to see the outcome. I'd expect one of them to fail in which case that would make it clearer that we then must clear the context at the end of the ssl_certificate_phase in the last policy in the policy chain. Other than that I think doing this as a separate policy is a good approach. Maybe using the concurrent library here?

for this one I used request_pipeline. I then verified that in fact the tests fail when clearing the context in the log phase (which wouldn't handle concurrent requests) and pass when the context is cleared at the end of the ssl_certificate phase (as we expected).

@samugi samugi requested a review from kevprice83 March 4, 2022 10:59
Copy link
Member

@kevprice83 kevprice83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on our discussions and previous feedback this looks good to me!

@eloycoto
Copy link
Contributor

eloycoto commented Mar 5, 2022

yeah, make sense that ssl_certificate, because all connections do the ssl_handshake before it. I have another thing in mind here, use with tls_termination policy too, because clear the context is ssl_certificate happens before ssl_handshake on tls_termination.

Please raise this issue to the QE team, @DavidRajnoha is no longer working on this, but I'm sure that @mganisin can make a test plan for this. But overall, it's testing that two services in the same APICAST landed in the right place.

@kevprice83 kevprice83 changed the title [WIP] THREESCALE-8000 + THREESCALE-8007 + THREESCALE-8252 (clear context policy) THREESCALE-8000 + THREESCALE-8007 + THREESCALE-8252 (clear context policy) Mar 8, 2022
@eguzki eguzki merged commit 263b6a6 into 3scale:master Mar 15, 2022
@samugi
Copy link
Contributor Author

samugi commented Apr 19, 2022

there seems to be a problem when the tls policy is in the policy chain and executes this code which exits and does not permit the clearing context code to be executed. See: #1333

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants