Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure we don't schedule reloading configuration when reloading is disabled #1468

Merged

Conversation

coderbydesign
Copy link
Contributor

Details of the issue as well as local reproducers are documented in #1467.

This change ensures that when boot.init_worker checks to see whether or not
the configuration needs to be reloaded, it also checks to confirm that the boot.ttl()
value set from APICAST_CONFIGURATION_CACHE is a positive number, in order to avoid
an immediate and infinite recursive loop of refreshing the configuration.

This should still preserve the initial fix [1] where this regression appears to
have been introduced, as we should never schedule reloading with the configuration settings of:

APICAST_CONFIGURATION_LOADER: boot
APICAST_CONFIGURATION_CACHE: -1

We should also never have a situation where we'd need to account for interval = 0
since we fail out in boot.init in that case [2].

Tested locally as documented in the issue, however based on the comment in the initial
fix PR, I've not added any tests in code. Happy to do so if there's now a good way
to accomplish that!

[1] #1399
[2]

if boot.ttl() == 0 then
ngx.log(ngx.EMERG, 'cache is off, cannot store configuration, exiting')
os.exit(0)
end

…sabled

Details of the issue as well as local reproducers are documented in 3scale#1467.

This change ensures that when `boot.init_worker` checks to see whether or not
the configuration needs to be reloaded, it also checks to confirm that the `boot.ttl()`
value set from `APICAST_CONFIGURATION_CACHE` is a positive number, in order to avoid
an immediate and infinite recursive loop of refreshing the configuration.

This should still preserve the initial fix [1] where this regression appears to
have been introduced, as we should never schedule reloading with the configuration settings of:

```
APICAST_CONFIGURATION_LOADER: boot
APICAST_CONFIGURATION_CACHE: -1
```

We should also never have a situation where we'd need to account for `interval = 0`
since we fail out in `boot.init` in that case [2].

Tested locally as documented in the issue, however based on the comment in the initial
fix PR, I've not added any tests in code. Happy to do so if there's now a good way
to accomplish that!

[1] 3scale#1399
[2] https://github.com/3scale/APIcast/blob/7e7eaf6f1d584c78d999c0d09f5b65203161c402/gateway/src/apicast/configuration_loader.lua#L146-L149
@tkan145
Copy link
Contributor

tkan145 commented Jun 10, 2024

LGTM, @eguzki need your feedback as well

@coderbydesign coderbydesign changed the title Ensure we don't schedule reloading configuration when reloading is dsabled Ensure we don't schedule reloading configuration when reloading is disabled Jun 10, 2024
@eguzki
Copy link
Member

eguzki commented Jun 11, 2024

Thanks @coderbydesign for the fix.

Your fix is indeed right. I overlooked completely the use case for APICAST_CONFIGURATION_CACHE=-1.

I want to propose an alternative fix, more readable IMHO:

diff --git a/gateway/src/apicast/configuration_loader.lua b/gateway/src/apicast/configuration_loader.lua
index f1eff77e..ec128fe9 100644
--- a/gateway/src/apicast/configuration_loader.lua
+++ b/gateway/src/apicast/configuration_loader.lua
@@ -204,22 +204,15 @@ function boot.init_worker(configuration)
     schedule(interval, handler, ...)
   end
 
-  -- Check whether the reserved boot configuration is fresh or stale.
-  -- If it is stale, refresh configuration
-  -- When a worker process is (re-)spawned,
-  -- it will start working with fresh (according the ttl semantics) configuration
-  local boot_reserved_hosts = configuration:find_by_host(boot_reserved_domain, false)
-  if(#boot_reserved_hosts == 0)
-  then
-    -- the boot configuration has expired, load fresh config
-    ngx.log(ngx.INFO, 'boot time configuration has expired')
-    -- ngx.socket.tcp is not available at the init or init_worker phases,
-    -- it needs to be scheduled (with delay = 0)
-    schedule(0, handler, configuration)
-  elseif(interval > 0)
-  then
-    ngx.log(ngx.DEBUG, 'schedule new configuration loading')
-    schedule(interval, handler, configuration)
+  if interval > 0 then
+      -- Check whether the reserved boot configuration is fresh or stale.
+      -- If it is stale, refresh configuration
+      -- When a worker process is (re-)spawned,
+      -- it will start working with fresh (according the ttl semantics) configuration
+      local boot_reserved_hosts = configuration:find_by_host(boot_reserved_domain, false)
+      ngx.log(ngx.DEBUG, 'schedule new configuration loading')
+      local curr_interval = #boot_reserved_hosts == 0 and 0 or interval
+      schedule(curr_interval, handler, configuration)
   else
     ngx.log(ngx.DEBUG, 'no scheduling for configuration loading')
   end

Verification steps

I have tested locally succesfully the following use cases:

Common initial setup for all use cases

make development
make dependencies 
cat <<EOF >config.json
{
  "services": [
    {
      "backend_version": "1",
      "proxy": {
        "hosts": [
          "example.com"
        ],
        "api_backend": "https://echo-api.3scale.net",
        "backend": {
          "endpoint": "http://127.0.0.1:8081",
          "host": "backend"
        },
        "proxy_rules": [
          {
            "http_method": "GET",
            "pattern": "/",
            "metric_system_name": "hits",
            "delta": 1,
            "parameters": [],
            "querystring_parameters": {}
          }
        ],
        "policy_chain": [
          {
            "name": "apicast.policy.apicast"
          }
        ]
      }
    }
  ]
}
EOF
  • APICAST_CONFIGURATION_CACHE=-1
APICAST_LOG_LEVEL=debug APICAST_WORKERS=1 THREESCALE_CONFIG_FILE=config.json APICAST_CONFIGURATION_LOADER=boot APICAST_CONFIGURATION_CACHE=-1  ./bin/apicast

You should see the (debug) log line about no scheduling for configuration loading:

2024/06/11 21:51:42 [debug] 249417#249417: *1 configuration_loader.lua:217: init_worker(): no scheduling for configuration loading

Even if you kill the worker process, the re-spawned worked process does not load config and uses the configuration at boot time

docker exec -ti -u root apicast_build_0-development-1 /bin/bash
bash-4.4# ls /proc | grep '[0-9]' | xargs -I {} sh -c '[ -a /proc/{}/cmdline ] && grep -q -P "nginx: worker process" /proc/{}/cmdline && echo killing pid {} && kill {}'

APICAst logs confirm the re-spawned worked process does not load config

2024/06/11 21:54:26 [notice] 249403#249403: signal 17 (SIGCHLD) received from 249417
2024/06/11 21:54:26 [notice] 249403#249403: worker process 249417 exited with code 0
2024/06/11 21:54:26 [notice] 249403#249403: start worker process 249558
2024/06/11 21:54:26 [notice] 249403#249403: signal 29 (SIGIO) received
2024/06/11 21:54:26 [debug] 249558#249558: *2 executor.lua:26: init_worker(): executor phase: init_worker
2024/06/11 21:54:26 [debug] 249558#249558: *2 policy_chain.lua:214: init_worker(): policy chain execute phase: init_worker, policy: Load Configuration, i: 1
2024/06/11 21:54:26 [debug] 249558#249558: *2 configuration_loader.lua:217: init_worker(): no scheduling for configuration loading
  • APICAST_CONFIGURATION_CACHE=30
APICAST_LOG_LEVEL=debug APICAST_WORKERS=1 THREESCALE_CONFIG_FILE=config.json APICAST_CONFIGURATION_LOADER=boot APICAST_CONFIGURATION_CACHE=30  ./bin/apicast

You should see the (debug) log line about schedule new configuration loading:

2024/06/11 21:55:48 [debug] 249600#249600: *1 configuration_loader.lua:213: init_worker(): schedule new configuration loading

And every 30 secs,

2024/06/11 21:56:48 [info] 249600#249600: *3 configuration_loader.lua:194: auto updating configuration, context: ngx.timer

The functionality implemented in https://github.com/3scale/APIcast/pull/1399/files still applies:

  • killing the worker before the first 30 secs expired, the configuration is not being loaded by the worker process immediately and waits TTL (30secs) for loading it.
  • killing the worker after the first 30 sec expired, the configuration is being loaded by the worker process immediately.

@coderbydesign
Copy link
Contributor Author

Thanks for the eyes, @eguzki! I'm going to apply this patch locally and test for sanity as well, but then am happy to update this PR with the proposed change. I was actually curious if that entire conditional could be reworked, but didn't want to change too much without all the historical context. What you've explained makes total sense.

Simplifies the conditional to reload configuration based on the interval set by
`APICAST_CONFIGURATION_CACHE`.

Also conditionally sets the interval to schedule.
@coderbydesign
Copy link
Contributor Author

Thanks for your patience on this, @eguzki. When you're able, see the latest update based on your suggestions.

@eguzki
Copy link
Member

eguzki commented Jun 26, 2024

changes looks look to me.

Re-running the test as we have some (hateful) flaky tests

@eguzki eguzki merged commit b75d3b8 into 3scale:master Jun 26, 2024
14 checks passed
@eguzki
Copy link
Member

eguzki commented Jun 26, 2024

@tkan145 we might want to backport this fix to 3scale-2.15-stable branch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants