feat(cache) in-memory and shared dict caching strategy #1688

Merged on Nov 3, 2016 (15 commits)

Member

@subnetmarco subnetmarco commented Sep 27, 2016

This PR implements both a per-worker and a per-process caching strategy, as opposed to the previous per-process only strategy.

  • Previously, data was cached per-process in a shared dictionary in JSON format. On each lookup, the worker checked whether the entity existed in the shared dictionary. If it did not, the worker looked it up in the datastore; otherwise it deserialized the JSON data into a Lua object.
  • With this PR, data is cached both as a Lua object in per-worker memory and, as before, in the shared dictionary for the other workers. On each lookup, the worker first checks its local memory. If the entity exists locally, the worker uses it directly. If not, it checks the shared dictionary, and if the entity is there it deserializes the JSON data into a Lua object. Only if the entity is still missing does the worker look it up in the datastore and store it both locally and in the shared dictionary.

Performance is much better because the JSON serialization/deserialization no longer happens on every lookup.
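
The lookup order described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `new_cache`, `get_or_set`, `fake_shm`, and the toy `encode`/`decode` functions are hypothetical names, the shared dictionary is replaced by a plain table with `get`/`set` methods so the sketch runs under plain Lua, and storing a shared-dictionary hit in local memory is an assumption.

```lua
-- Hypothetical two-level cache; names are illustrative, not Kong's actual API.
local function new_cache(shm, encode, decode)
  local local_cache = {} -- per-worker Lua table (level 1)
  local cache = {}

  function cache.get_or_set(key, loader)
    -- 1. Worker-local memory: the value is already a Lua object.
    local value = local_cache[key]
    if value ~= nil then
      return value, "local"
    end

    -- 2. Shared dictionary: holds a serialized string visible to all workers.
    local serialized = shm:get(key)
    if serialized ~= nil then
      value = decode(serialized)
      local_cache[key] = value -- assumption: keep the decoded object locally
      return value, "shared"
    end

    -- 3. Datastore: slowest path; populate both cache levels.
    value = loader(key)
    if value ~= nil then
      shm:set(key, encode(value))
      local_cache[key] = value
    end
    return value, "datastore"
  end

  return cache
end

-- Stand-ins so the sketch runs outside nginx (all hypothetical).
local fake_shm = { data = {} }
function fake_shm:get(k) return self.data[k] end
function fake_shm:set(k, v) self.data[k] = v; return true end

local function encode(t) return t.name end      -- toy serializer, not JSON
local function decode(s) return { name = s } end

local db_hits = 0
local function loader(key)
  db_hits = db_hits + 1
  return { name = key .. "-entity" }
end

-- Two "workers" share one dictionary but have separate local memory.
local worker_a = new_cache(fake_shm, encode, decode)
local worker_b = new_cache(fake_shm, encode, decode)

local _, src1 = worker_a.get_or_set("api:1", loader)
local _, src2 = worker_a.get_or_set("api:1", loader)
local _, src3 = worker_b.get_or_set("api:1", loader)
print(src1, src2, src3, db_hits) -- datastore, local, shared, 1 (tab-separated)
```

Only the second path pays the deserialization cost, and each worker pays it once per entity rather than on every request.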

Full changelog

  • New per-process and per-worker caching strategy to dramatically improve Kong performance.

Benchmark

Benchmark of the proxy functionality for an API without any plugins, proxying to a local API that returns hello world.

Plain nginx (~524.81 req/s with one worker):

worker_processes 1;
error_log logs/error.log info;
daemon off;

events {
  worker_connections 1024;
}

http {
  server {
    listen 9000;

    location / {
      proxy_pass http://127.0.0.1:3000/;
    }
  }
}

$ wrk -c100 -t20 -d30s http://127.0.0.1:9000/
Running 30s test @ http://127.0.0.1:9000/
  20 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    16.55ms    3.27ms  36.03ms   89.39%
    Req/Sec   268.78     90.23   373.00     89.29%
  15795 requests in 30.10s, 2.39MB read
Requests/sec:    524.81

With this PR (~520.87 req/s with one worker):

$ wrk -c100 -t20 -d30s -H "Host: testapi" http://127.0.0.1:8000/
Kong started
Running 30s test @ http://127.0.0.1:8000/
  20 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   185.25ms  321.63ms   1.16s    82.29%
    Req/Sec   142.87     91.43   290.00     52.99%
  15670 requests in 30.08s, 3.71MB read
Requests/sec:    520.87

Kong 0.9.2 (~281.41 req/s with one worker):

$ wrk -c100 -t20 -d30s -H "Host: api500" http://127.0.0.1:8000/
Kong started
Running 30s test @ http://127.0.0.1:8000/
  20 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   353.57ms   40.76ms 555.74ms   83.13%
    Req/Sec    15.52      9.33    50.00     69.93%
  8471 requests in 30.10s, 1.96MB read
Requests/sec:    281.41
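
For a quick comparison, the three Requests/sec figures above can be turned into ratios. The snippet below only does arithmetic on the numbers already reported by wrk; the variable names are illustrative.

```lua
-- Requests/sec figures copied from the wrk runs above (one worker each).
local nginx_rps   = 524.81
local pr_rps      = 520.87
local kong092_rps = 281.41

local speedup  = pr_rps / kong092_rps     -- this PR relative to Kong 0.9.2
local vs_nginx = pr_rps / nginx_rps * 100 -- this PR as a share of plain nginx

print(string.format("speedup over Kong 0.9.2: %.2fx", speedup))           -- 1.85x
print(string.format("share of plain nginx throughput: %.1f%%", vs_nginx)) -- 99.2%
```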

@Tieske
Member

Tieske commented Sep 27, 2016

Awesome improvement!

One huge caveat 🚫: we used to get a COPY of the data from the cache. Now that it is cached in Lua, we get the ACTUAL cached object. So the code consuming the data must no longer modify it, since any change would also change the data in the cache!

We might want to set some metatables that enforce read-only behaviour on the cache data, maybe behind a debug flag, so that read-only behaviour is only checked when the debug flag is set.
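
A rough sketch of that idea in plain Lua, assuming a hypothetical `make_guard` helper and a boolean debug flag (neither is part of Kong):

```lua
-- Returns a function that wraps cache values; the wrapper is only active
-- when the (hypothetical) debug flag is set.
local function make_guard(debug_enabled)
  if not debug_enabled then
    -- Production path: hand back the cached table untouched, zero overhead.
    return function(t) return t end
  end

  return function(t)
    -- Debug path: an empty proxy that forwards reads and rejects writes.
    return setmetatable({}, {
      __index = t,
      __newindex = function(_, key)
        error("attempt to modify cached value at key '" .. tostring(key) .. "'", 2)
      end,
    })
  end
end

local guard = make_guard(true)
local value = guard({ hello = "world" })

print(value.hello) -- world
local ok, err = pcall(function() value.hello = "changed" end)
print(ok)          -- false
print(err)

-- With the flag off, the very same table is returned.
local raw = { hello = "world" }
print(make_guard(false)(raw) == raw) -- true
```

Note that this guard is shallow: it only protects the top-level table.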

@subnetmarco
Member Author

@Tieske addressed that with f665ae6

Changing the value of a table returned by the cache will return an error.

@subnetmarco
Member Author

subnetmarco commented Sep 27, 2016

@thibaultcha Using pl.tablex.readonly() produces a weird behavior. After making a table read-only, the returned value prints as {}, although its fields can still be accessed individually, i.e.:

local pl_tablex = require "pl.tablex"

local t = {
  hello = "world"
}
print(require("inspect")(t))

t = pl_tablex.readonly(t)
print(require("inspect")(t))

print(t.hello)

returns:

{
  hello = "world"
}
{}
world

While we would expect:

{
  hello = "world"
}
{
  hello = "world"
}
world

This behavior breaks the Admin API when we return a JSON encoded value at https://github.com/Mashape/kong/blob/feat/cache/kong/tools/responses.lua#L120 since it always returns {}.
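
The `{}` output is what one would expect from a proxy-based read-only wrapper: the wrapper itself is an empty table whose metatable forwards reads to the original, so anything that walks the wrapper's own keys finds none. A minimal sketch in plain Lua, using a hand-written proxy rather than Penlight's implementation (`make_proxy` is a hypothetical helper):

```lua
-- Hand-written stand-in for a read-only wrapper (not Penlight's code).
local function make_proxy(t)
  return setmetatable({}, {
    __index = t,
    __newindex = function() error("read-only table", 2) end,
  })
end

local original = { hello = "world" }
local proxy = make_proxy(original)

-- Field access goes through __index and still works.
print(proxy.hello)            -- world

-- Raw traversal only sees the proxy's own (empty) storage.
print(next(proxy))            -- nil
print(rawget(proxy, "hello")) -- nil
```

A serializer that relies on raw traversal of the table would therefore emit an empty object for such a wrapper.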

@subnetmarco added the pr/wip label and removed the pr/status/needs review label on Sep 27, 2016
@subnetmarco
Member Author

I have disabled the read-only cache values because they turn out to be harder to implement correctly than expected. The read-only protection would not apply to child tables (unless we iterate over every value, which does not perform well on large tables), and it hides the content of the table, since it sets a new metatable on top of an empty {} table.

Another solution would be to deep copy the value, but again, on large tables it's not feasible.
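
Both trade-offs can be illustrated with a small plain-Lua sketch. `shallow_readonly` and `deep_copy` are hypothetical helpers written for this example; the copy function does not handle cyclic tables.

```lua
-- Shallow read-only proxy: protects only the top-level table.
local function shallow_readonly(t)
  return setmetatable({}, {
    __index = t,
    __newindex = function() error("read-only table", 2) end,
  })
end

local cached = { name = "testapi", config = { limit = 10 } }
local view = shallow_readonly(cached)

-- Top-level writes are rejected...
print(pcall(function() view.name = "other" end) == false) -- true

-- ...but child tables are handed out as-is, so this silently edits the cache.
view.config.limit = 99
print(cached.config.limit) -- 99

-- Deep copy: safe, but visits every nested value on every call.
local function deep_copy(v)
  if type(v) ~= "table" then
    return v
  end
  local copy = {}
  for k, child in pairs(v) do
    copy[k] = deep_copy(child)
  end
  return copy
end

local copy = deep_copy(cached)
copy.config.limit = 1
print(cached.config.limit) -- 99 (the cached table is unaffected by the copy)
```

The copy costs time proportional to the size of the cached value on each lookup, which is the concern raised above for large tables.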

@subnetmarco
Member Author

Unless you have a better solution this PR is complete.

@subnetmarco added the pr/status/needs review label and removed the pr/wip label on Sep 28, 2016
    wait_max = 0.5, -- max wait time before discarding event
  }
  if not ok then
    ngx.log(ngx.ERR, "failed to start event system: ", err)
Member

This should be a hard error, not just a log entry. If this fails, Kong will not be stable, so it must error out and stop.

Member

Unfortunately this is init_worker, not init. There is no clean way to shut down a worker, and even less to shut down all of the other workers too. We could send a SIGTERM signal to the master process, but we're already out of the CLI at this point, so error handling might not be consistent with the CLI (shutting down other services) out of the box.

Member

good point

Member

Maybe we could still log this at a higher log level to give it more importance though, like ngx.CRIT or even maybe ngx.ALERT.
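
A sketch of what that suggestion could look like. The `report_event_failure` helper and the sample error string are hypothetical, and the small `ngx` stub exists only so the snippet runs outside OpenResty; inside nginx the real `ngx` global would be used.

```lua
-- Stub for running outside OpenResty; levels mirror nginx's ordering,
-- where a lower number means a more severe level.
local ngx = ngx or {
  ALERT = 2, CRIT = 3, ERR = 4,
  log = function(level, ...)
    local parts = {}
    for i = 1, select("#", ...) do
      parts[i] = tostring((select(i, ...)))
    end
    print(level, table.concat(parts))
  end,
}

-- Hypothetical helper: log the failure at CRIT instead of ERR.
local function report_event_failure(ok, err)
  if not ok then
    ngx.log(ngx.CRIT, "failed to start event system: ", err)
    return false
  end
  return true
end

report_event_failure(false, "example error") -- hypothetical error message
```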

@subnetmarco subnetmarco added this to the 0.10 milestone Oct 3, 2016
@subnetmarco subnetmarco merged commit 1999d92 into next Nov 3, 2016
@subnetmarco subnetmarco deleted the feat/cache branch November 3, 2016 23:53
@subnetmarco subnetmarco restored the feat/cache branch November 8, 2016 21:40
@thibaultcha thibaultcha deleted the feat/cache branch November 29, 2016 20:19