Limit upstream module for nginx
=OVERVIEW===============================================================
When you use nginx as a proxy, there is a problem you may have to face:
an upstream server can only sustain high QPS under relatively low connection
pressure. For example, MySQL works well at 8000-9000 QPS with 60 parallel
connections, but drops to only 2000 QPS when the number of connections
increases to 500.

I could not find any existing module that solves this problem, because they
all limit the client side, not the upstream side. I tried limit_req, but it
failed: with "limit_req rate=8000/s" there were over 2000 active connections
between nginx and MySQL, and with "limit_req rate=60/s", Jesus, my nginx
really was capped at 60 QPS.

So I gave it a try, and the limit_upstream module is the result. This module
limits the number of connections to an upstream server, identified by its IP
address. With it, I can control the upstream connections. When the configured
threshold is reached, new requests are suspended and wait until an active
upstream request finishes and wakes them up; otherwise, they are aborted when
their timeout expires. Of course, the length of the waiting queue and the
waiting timeout can also be specified, just as with limit_req.

In this module, each worker suspends its own requests when an upstream's
counter reaches the threshold, and resumes or cancels them by itself. The
only things the workers share are the global upstream counters. There is one
special case in which a request is allowed to go to an upstream even though
the threshold has been reached: when no other request for that upstream is
currently being processed by that worker. As a result, the maximum number of
established connections to an upstream server is ('limit' + 'worker_processes').
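A worked example (values are hypothetical): with 'limit=260' and 4 worker
processes, each worker may admit one request past the shared counter, so the
hard ceiling per upstream server is 260 + 4 = 264 established connections:

    worker_processes 4;
    ...
    upstream pool {
        server 10.232.36.98:3111;
        limit_upstream_conn limit=260 zone=test;
    }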
=MANUAL=================================================================
Example:

    http {
        server {
            location = /test {
                proxy_pass http://pool;
            }
        }

        limit_upstream_zone test 10m;

        upstream pool {
            server 10.232.36.98:3111;
            ...
            limit_upstream_conn limit=260 zone=test backlog=10000 timeout=180s;
            limit_upstream_log_level error;
        }
    }
syntax: limit_upstream_zone zone_name size;
default: -
context: http
Defines a shared memory zone for the module.
syntax: limit_upstream_conn zone=zone_name limit=limit [backlog=length] [timeout=timeout | nodelay];
default: -
context: upstream
Sets the zone to use, the maximum number of connections allowed for each
server of the upstream, the backlog length, and the timeout for each
suspended request. The default 'backlog' is 1000, and the default 'timeout'
is 1000 ms. When the backlog for a server is full, a request may try another
server; otherwise, it waits in the queue. A request that times out in the
queue is finalized with a 408 status code.
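For illustration (values are hypothetical), a queue of up to 500 waiting
requests with a 5-second limit on how long each may wait would look like:

    upstream pool {
        server 10.232.36.98:3111;
        # suspend up to 500 requests; return 408 after 5s in the queue
        limit_upstream_conn limit=100 zone=test backlog=500 timeout=5s;
    }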
syntax: limit_upstream_log_level [ error | warn | notice | info ];
default: limit_upstream_log_level notice;
context: http, upstream
Defines the log level for the module's ordinary log messages. Messages about
timed-out or discarded requests, however, are always logged at "warn" level.
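Since the directive is also allowed in the http context, one setting can
cover all upstreams; a hypothetical example:

    http {
        limit_upstream_log_level warn;
        ...
    }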
=INSTALL===============================================================
patch -p2 < nginx.patch (nginx 1.0.X, 1.2.X)
or
patch -p1 < nginx.1.4.4.patch (nginx 1.4.X)
./configure --add-module=/path/to/module
make
make install
=PATCH=================================================================
The nginx.patch is based on nginx 1.0.11 and is also compatible with nginx
1.2.0; however, I have not tested it on other versions :)
The nginx.1.4.4.patch is based on nginx 1.4.4 and has been tested only under
nginx 1.4.4.
=BUGS==================================================================
* bugfixed 2014.6.5
  The local variable 'pc' is used only for printing debug information, so
  there was a compiler warning when building nginx without '--with-debug'.
* bugfixed 2013.11.11
  The worker crashed after processing a request. The problem occurred under
  nginx 1.4.3, because nginx sets u->sockaddr to NULL before this module
  uses it.
* bugfixed 2012.7.9
  There was a critical bug that could leave upstreams stuck: if a request
  uses a server and that server fails, the request tries another server via
  ngx_http_upstream_next, but the counter of the former server was not
  decreased, so after enough such failures the whole system stopped working.
=CHANGES===============================================================
1.2.1 [2013.6.5]   fixed a bug: the variable 'pc' was unused.
1.2.0 [2013.11.11] fixed a bug that occurred under nginx 1.4.4; added a new patch for nginx 1.4.4.
1.1.0 [2012.7.17]  bug fixes and efficiency optimizations, 10% faster.
1.0.0 [2012.5.31] initial version, including some bug fixes.
=USED WITH UPSTREAM KEEPALIVE==========================================
Once the nginx upstream keepalive module returns NGX_DONE, it means an
already-established connection is being reused, so this module lets the
request pass whether or not it exceeds the threshold. There is a problem
with this case: once the limit is reached, nginx tends to distribute new
upstream connections among workers in proportion to the connections each
worker has already established, which is unfair. I think setting keepalive
according to the formula below may resolve the problem:

keepalive = limit / worker_processes

where 'limit' is the value given to limit_upstream_conn.
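Applied to hypothetical numbers: with limit=260 and 4 worker processes,
keepalive = 260 / 4 = 65:

    worker_processes 4;
    ...
    upstream pool {
        server 10.232.36.98:3111;
        limit_upstream_conn limit=260 zone=test;
        keepalive 65;    # limit / worker_processes = 260 / 4
    }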
=MISUSE================================================================
1. Some people use it like this:

    limit_upstream_zone test 10m;

    upstream bb {
        server 127.0.0.1:3111;
        server 127.0.0.1:3112;
        limit_upstream_conn limit=8 zone=test backlog=2000 timeout=1500ms;
    }

    upstream cc {
        server 127.0.0.1:3111;
        server 127.0.0.1:3112;
        limit_upstream_conn limit=16 zone=test backlog=1000 timeout=500ms;
    }

This will absolutely fail to work: both upstreams share one zone while
declaring different limits for the same servers (the module keys its
counters by server address), so the limits conflict. The correct
configuration uses a separate zone for each upstream:
    limit_upstream_zone test_high 10m;
    limit_upstream_zone test_low 10m;

    upstream bb {
        server 127.0.0.1:3111;
        server 127.0.0.1:3112;
        limit_upstream_conn limit=8 zone=test_low backlog=2000 timeout=1500ms;
    }

    upstream cc {
        server 127.0.0.1:3111;
        server 127.0.0.1:3112;
        limit_upstream_conn limit=16 zone=test_high backlog=1000 timeout=500ms;
    }