-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dynamic reload first implementation #2152
Conversation
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA. It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/assign @aledbf |
CLA signed |
internal/ingress/controller/nginx.go
Outdated
@@ -602,6 +615,31 @@ func (n *NGINXController) OnUpdate(ingressCfg ingress.Configuration) error { | |||
|
|||
cfg.SSLDHParam = sslDHParam | |||
|
|||
for _, server := range ingressCfg.Servers { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do this feature in stages. Please remove this section. For this PR SSL should be handled by NGINX, not LUA.
@valeriano-manassero first, thank you for doing this. That said, the scope of this PR is huge. I propose to do this in stages, where the first one tackles only the upstream part, i.e. do not reload nginx when a new pod is added or removed from a service. For this, your route template you should have only |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: valeriano-manassero Assign the PR to them by writing The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@valeriano-manassero please check the CI error (you need to fix the style of the go code) |
As every request will go through lua code, performance will decrease. LuaJIT can provide better performance. |
@oilbeater using LuaJIT already: |
@aledbf regarding CI errors, I fixed code style. Now I'm getting error during TestStore function. For me it's confusing since I don't get this error while making coverage on my envirnonment. |
Codecov Report
@@ Coverage Diff @@
## master #2152 +/- ##
=========================================
+ Coverage 36.45% 36.75% +0.3%
=========================================
Files 69 69
Lines 4861 4957 +96
=========================================
+ Hits 1772 1822 +50
- Misses 2819 2859 +40
- Partials 270 276 +6
Continue to review full report at Codecov.
|
@aledbf As you suggested, I implemented first stage (no reload only on the upstream part). |
rootfs/etc/nginx/template/nginx.tmpl
Outdated
@@ -871,7 +906,6 @@ stream { | |||
{{ end }} | |||
|
|||
{{/* Add any additional configuration defined */}} | |||
{{ $location.ConfigurationSnippet }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you removing this line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a mistake, I don't how it happened. Fixing in next commit.
] | ||
} | ||
{{ end }} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can serialize the content of $backends in Go and avoid the template part? (with this are sure the JSON is valid)
@valeriano-manassero this looks really good. I just added two comments. |
@valeriano-manassero questions: Can you implement some kind of hash to provide sticky sessions This is just to provide the same features we have now in the upstream section Edit: this is a shorter version of the lua code https://github.com/canhnt/lua-sticky-session/blob/master/lua/sticky_session.lua#L43-L51 |
This would be really great to have so we provide the same features present in the upstream now |
@aledbf I'm on my way to improve code as you suggested. @oilbeater sorry, reading again your suggestion I understand now what you mean. I will set the Dockerfile to generate bytecode of my custom scripts and I will use them in nginx conf. Thank you all. |
Pushed all implementations as suggested. @aledbf I do not write the routes file anymore but I need the template to get a custom json to push in POST on /nginx_update |
@valeriano-manassero apologies for the delay but I had to solve some issues with the nginx image. |
For those interested in helping with this feature the image |
if !n.isForceReload() && n.runningConfig.Equal(&pcfg) { | ||
glog.V(3).Infof("skipping backend reload (no changes detected)") | ||
return nil | ||
if !n.cfg.DynamicReload { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The configuration could change, like an update to the configmap. This should be evaluated and reload NGINX even with the dynamic reload feature enabled.
To test this you can add/change enable-vts: "true"
in the configuration configmap. You will see the diff of the nginx.conf (using the flag --v=2
) but there's no NGINX reload.
end | ||
|
||
assert(b.set_current_peer(selected_endpoint.hostname, selected_endpoint.port)) | ||
if (selected_endpoint.failtimeout ~= 0) then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some logs like ngx.log(ngx.DEBUG, using endpoint <host>:<port>)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
glog.Infof("NGINX dynamic update not ready\n") | ||
} else { | ||
updateOK = true | ||
glog.Infof("NGINX dynamic update OK\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add something like
if glog.V(2) {
glog.Infof("Updating nginx endpoints with\n%v\n",content)
}
so with can see the content used by lua
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
return fmt.Errorf("%v\n%v", err, string(o)) | ||
} | ||
} else { | ||
glog.Infof("NGINX reload not needed, executing live update only\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be reading the code wrong, but why is live updated needed in this case? It seems like the existing configuration is identical to the new one(referring to line 677).
Similarly if we are reloading at 683, why don't we break? What's the point of continuing with live update after reload?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm right, there's the case where config file does not change, but live update needed to include new backends.
rootfs/etc/nginx/lua/update.lua
Outdated
ngx.req.read_body() | ||
local backends_json = ngx.req.get_body_data() | ||
local shared_memory = ngx.shared.shared_memory | ||
shared_memory:set("CFGS", backends_json, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if setting to the dictionary fails? We still return 200 and controller will think everything went well.
@@ -0,0 +1,112 @@ | |||
local json = require "json" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a plain Lua JSON lib? To my knowledge cjson
is the recommended library to use for JSON encoding/decoding for performance reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I know cjson
cannot be used with LuaJIT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have been using it with LuaJIT, also Openresty includes it by default https://openresty.org/en/lua-cjson-library.html
IMHO instead of trying to match what LB algorithms Nginx supports we should focus on more general picture that is building strong foundation for dynamically configurable ingress-nginx. We can eventually get to a point where even new Ingress creation and removal would not require Nginx reload rather than only making endpoints and certificates dynamically reconfigurable. However this also means we will have more and more Lua middleware if we go down this path. Therefore I suggest following:
FWIW I have also been working on a similar feature at Shopify#16. |
ngx.status = 503 | ||
ngx.exit(ngx.status) | ||
end | ||
local cfgs = json.decode(cfgs_json) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might have significant performance penalty, because it is being done for every request
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True but I not found alternatives. Only shared dicts are visible worker wide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here https://github.com/Shopify/ingress/pull/16/files#diff-b00d77a6df9c8c05a483044b08e6bc50R87 is how I did it. I'm curious what do you think about that.
Basically periodically in every worker read the config from shared dictionary, decode it and store it locally.
Then the balancer algorithm will use the parsed/decoded config from local cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very good implementation, congrats!
I understand proposed solution is not perfect but my intent was to add an experimental feature to be improved later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @valeriano-manassero! I made a PR at #2174. If you agree with the ideas there and fine with the implementation I suggest we try to get it merged and continue adding the different LB algorithms you implemented here on top of it.
It already provides a mechanism to easily and cleanly add support for more LB algorithms but there's still room for improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR link was incorrect in my comment above, it is now pointing to the right PR.
Not sure about this. Right now the ingress controller is "just" nginx. You can see the nginx.conf file and it does not requires more knowledge than that. If we go full lua then the users must learn a new thing.
This is not clear. One of the goals of this PR is to see exactly how this would work.
This is what this PR is doing, avoid reloads for endpoint changes.
This is one of the things why I need to see this PR working fully. |
|
||
if cookie_value == nil then | ||
local endpoints_roundrobin = ngx.shared.endpoints_roundrobin | ||
local ep_index = endpoints_roundrobin:get(http_host) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should load balancing states not be scoped to backend/upstream instead? If I read this correctly you are load balancing per http_host rather than upstream. Which for example means given two domains and 5 endpoints, if I send a request to first domain and another request to the second domain they will both be proxied to the same endpoint.
@ElvinEfendi I forgot to mention one important point. Even if the ingress controller switches to lua 100% this does not mean there could not be a disruption in the traffic (websockets), just fewer reloads. If the pod is removed before the new version of the deployment is serving traffic is not possible to provide real blue/green deployments. |
local ep_index = endpoints_roundrobin:get(http_host) | ||
if ep_index == nil then | ||
selected_endpoint = backend.endpoints[1] | ||
endpoints_roundrobin:set(http_host, 1, 600) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you reset the load balancing state after specifically 600s? Should not those be reset only when list of respective endpoints change?
endpoints_roundrobin:set(http_host, 1, 600) | ||
else | ||
selected_endpoint = backend.endpoints[new_index] | ||
endpoints_roundrobin:set(http_host, new_index, 600) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There can be race condition between Nginx workers accessing/writing to this shared dictionary. IMHO the correct way to implement this is to use some kind of locking mechanism.
@pingles are you using EWMA( The default load balancing algorithm in dynamic mode is Round Robin. |
This PR adds a "dynamic-reload" mode.
If
--dynamic-reload=true
is added to the run command, NGINX will use LUA and will not reload if a server name or a backend (virtualhost) is added or deleted.This is a first implementation based on concepts expressed here: #1905