
Support for additional LB options #71

Open
splitice opened this issue Mar 31, 2022 · 12 comments
Comments

@splitice
Contributor

e.g. weight, max_fails, fail_timeout

This does raise the question of how we specify these (equally?) for each server member.
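For illustration, the kind of syntax being asked for might look like the following (the extra parameters on the jdomain line are hypothetical; today only the standard server directive accepts them):

```nginx
upstream backend {
    # weight/max_fails/fail_timeout are standard nginx server-line
    # parameters; accepting them on the jdomain line (and applying them
    # equally to every resolved peer) is the proposal, not current behaviour.
    jdomain example.com max_ips=4 weight=2 max_fails=3 fail_timeout=10s;
}
```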

@holstvoogd

+1

I've been using max_fails=0 to prevent all our applications from going down when one is misbehaving.

Yes, we have a bit of a weird setup on AWS ECS where we use nginx for SSL termination and caching in front of an ALB that routes to multiple applications. When one application becomes unreachable, nginx would mark the upstream server (i.e. the ALB) as down and serve 503s or something for 10s.

Unfortunately, setting proxy_next_upstream off does not prevent this.
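A minimal sketch of the workaround described above, assuming a plain upstream pointing at the ALB (hostnames and ports are placeholders):

```nginx
upstream alb_backend {
    # max_fails=0 disables the failure accounting entirely, so nginx
    # never marks the ALB as unavailable after errors or timeouts.
    server internal-alb.example.com:443 max_fails=0;
}

server {
    location / {
        # Note: as observed above, this alone does not stop a peer from
        # being marked down; it only stops retries against other peers.
        proxy_next_upstream off;
        proxy_pass https://alb_backend;
    }
}
```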

@splitice
Contributor Author

splitice commented May 6, 2022

All the options that are on round robin shouldn't be that hard, as jdomain extends round robin.

@holstvoogd Could you please explain your proxy_next_upstream comment? From my understanding that should behave the same with jdomain as with a round robin (without any weight, max_fails or fail_timeout).

@holstvoogd

@splitice It was just a note/observation that proxy_next_upstream off; does not prevent upstreams from being marked as down when they error/timeout. I missed that the first time I looked into how to prevent that behaviour and had to remind myself when looking into this 'issue'.

@nicholaschiasson
Owner

Agree this would be a good addition. I don't know if it's possible since I haven't looked at the round robin code in a while, but ideally we could leverage the built-in module to parse any arguments not exposed by jdomain directly. Otherwise, I am not opposed to parsing them manually along with the rest of the arguments.

@splitice
Contributor Author

@nicholaschiasson

By the way, do you have any data on this module and how it interacts with other balancers (e.g. least_conn)?

What might be required to make jdomain compatible with additional underlying balancers?

@nicholaschiasson
Owner

@splitice

Not exactly sure what 'data' you'd be interested in; however, I do have some (admittedly weak) tests defined which address the various other upstream-context directives, including least_conn and hash. (See t/004.compatibility_nginx.t)

If my memory serves, the way it's implemented is that jdomain, upon initialization, extends the load balancer algorithm currently specified for that upstream block (round robin by default if none is specified). That basically means that in order to use an underlying LB algorithm other than round robin, you just need to declare it before any jdomain directive inside your upstream block.

@splitice
Contributor Author

splitice commented May 19, 2022

In the situation:

upstream upstream_test {
	server 127.0.0.2;
	least_conn;
	jdomain example.com;
}

Assuming example.com expands to 4 servers, how does least_conn behave?

5 total hosts each balanced for least_conn?

I did some testing with least_conn nearly a year ago and it didn't seem to be working. Perhaps I made a mistake in my testing; I may need to repeat it.


Regarding option parsing: it looks like we can get most of it just by passing through uscf->flags of the underlying LB (OR'ed with anything additional supported by this module) and then building a fake server line from the jdomain line and parsing it with ngx_http_upstream_server (instead of allocating our own server). This would potentially have the advantage of better compatibility with third-party modules.

The disadvantage, of course, is that the string and array manipulation is ugly, and possibly slightly slower than our current method (configuration-time performance only).

How I would do it:

  1. Take the jdomain ngx_conf_t (we are done with it anyway)
  2. Remove the options we handled
  3. Send the line to ngx_http_upstream_server once for each max_ips
  4. Handle any error return
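To illustrate the idea (the extended jdomain syntax here is hypothetical, and the exact rewrite is a sketch of the proposal, not existing module code), steps 2–3 would turn the directive into a synthetic server line that the core parser already understands:

```nginx
# Hypothetical input in the upstream block:
#   jdomain example.com max_ips=4 weight=2 max_fails=3 fail_timeout=10s;
#
# After stripping the jdomain-specific option (max_ips), the module would
# synthesize one server line per IP slot and hand each to nginx's own
# ngx_http_upstream_server() parser:
#   server 0.0.0.0 weight=2 max_fails=3 fail_timeout=10s;   # repeated max_ips times
```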

@nicholaschiasson
Owner

Hi @splitice ,

Regarding testing of least_conn, I took a look at the test case I defined for it in the repo, and I would say it's definitely a weak test. I recall not putting a lot of effort into defining it, since simulating concurrent requests was not 100% straightforward with the test framework and I didn't spend the time to figure it out.

The way the LB is currently being tested in the repo, it does basically the same thing as regular round robin, since the upstream is only getting one connection at a time. In the future it would be great to harden that test case, and perhaps also define a test case for least_time.

@splitice
Contributor Author

BTW, I implemented one of the possible options in #84.

To anyone who needs additional options, that's how you add them. The only way I can see for now is to manually bring each option over.

@kartikrao

+1
This would help us as well; we have a similar setup to @holstvoogd, with NGINX talking to an ALB, where a single timeout causes a flood of 502s.

@TOLIE-OFFICIAL

e.g. weight, max_fails, fail_timeout

This does raise the question of how we specify these (equally?) for each server member

It seems that this module does not support other arguments, e.g. weight, max_fails, fail_timeout. Will it be supported in the future? Thanks.
@nicholaschiasson @splitice

@TOLIE-OFFICIAL

+1 This would help us as well; we have a similar setup to @holstvoogd, with NGINX talking to an ALB, where a single timeout causes a flood of 502s.

Do you have any solutions?
