Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Cfg] can not cleanup after unsuccessful start #918

Closed
krizhanovsky opened this issue Feb 18, 2018 · 1 comment
Closed

[Cfg] can not cleanup after unsuccessful start #918

krizhanovsky opened this issue Feb 18, 2018 · 1 comment
Assignees
Labels

Comments

@krizhanovsky
Copy link
Contributor

Conditions

Load too many server groups using the script from #76 to exhaust system memory on Tempesta FW start (it's good to use VM with small memory to reproduce the problem quickly). Need fixed #908 to avoid deadlock on stopping modules.

Description

In the conditions described above sock_srv start terminates with -ENOMEM error code and the system starts to stop all modules, but hangs in tfw_sg_wait_release(). It seems groups unregistering doesn't work properly.

[60281.321763] [tdb] Start Tempesta DB
[60281.923959] [tempesta] Initializing Tempesta FW kernel module...
[60282.036189] [tempesta] Registering new scheduler: hash
[60282.064816] [tempesta] Registering new scheduler: http
[60282.096349] [tempesta] Registering new scheduler: ratio
[60282.123114] [tempesta] Prepearing for the configuration processing.
[60475.161529] [tempesta] Configuration processing is completed.
[60475.985783] [tdb] Opened table /opt/tempesta/db/filter.tdb: size=16777216 rec_size=20 base=ffff93f4b9c00000
[60476.015806] TCP: request_sock_TCP: Possible SYN flooding on port 9090. Sending cookies.  Check SNMP counters.
[60541.289607] [tempesta] ERROR: Unable to start module 'sock_srv': -12
[60541.295307] [tempesta] Stopping all modules...
[60546.793950] [tdb] Close table 'filter0.tdb'
[60546.798267] [tempesta] modules are stopped
[60546.801731] [tempesta] Warning: Configuration parsing has failed. Clean up...
[60554.508073] [tempesta] Warning: pending for server callbacks to complete for 5s, 17 server groups still exist
[60559.512278] [tempesta] Warning: pending for server callbacks to complete for 5s, 17 server groups still exist
[60564.516044] [tempesta] Warning: pending for server callbacks to complete for 5s, 17 server groups still exist
[60569.520133] [tempesta] Warning: pending for server callbacks to complete for 5s, 17 server groups still exist
.......
@krizhanovsky
Copy link
Contributor Author

Somehow the issue is hard to reproduce, even on the old code as of #76 fixes. To emulate the issue I used following patch against current master (25fb836):

diff --git a/tempesta_fw/sock_srv.c b/tempesta_fw/sock_srv.c
index cc4ea72..81223a3 100644
--- a/tempesta_fw/sock_srv.c
+++ b/tempesta_fw/sock_srv.c
@@ -2158,6 +2158,7 @@ tfw_sock_srv_start(void)
        list_for_each_entry(sg_cfg, &sg_cfg_list, list) {
                if ((r = tfw_cfgop_start_sg_cfg(sg_cfg)))
                        return r;
+               return -ENOMEM;
                tfw_srv_loop_sched_rcu();
        }

Config file is

listen 192.168.100.4:80;
server 127.0.0.1:9090;
cache 0;

and there is no listening server on port 9090. Tempesta starts with following log:

[  286.965038] [tdb] Start Tempesta DB
[  286.973375] [tempesta] Initializing Tempesta FW kernel module...
[  286.985851] [tempesta] Registering new scheduler: hash
[  286.997272] [tempesta] Registering new scheduler: http
[  287.008170] [tempesta] Registering new scheduler: ratio
[  287.018002] [tempesta] Prepearing for the configuration processing.
[  287.018877] [tempesta] Configuration processing is completed.
[  287.025225] [tdb] Opened table /opt/tempesta/db/filter.tdb: size=16777216 rec_size=20 base=ffff8ac967c00000
[  287.028339] [tempesta] ERROR: Unable to start module 'sock_srv': -12
[  287.029760] [tempesta] Stopping all modules...
[  287.036662] [tdb] Close table 'filter0.tdb'
[  287.037085] [tempesta] modules are stopped
[  287.037456] [tempesta] Warning: Configuration parsing has failed. Clean up...
[  292.072012] [tempesta] ERROR: Got stuck in releasing of server groups! 1 server groups was not released.
[  292.072982] [tempesta] New configuration is cleaned.
# lsmod |grep tempesta
tempesta_fw          4612096  3 tfw_sched_ratio,tfw_sched_hash,tfw_sched_http
tempesta_db            36864  2 tempesta_fw,tfw_sched_hash
tempesta_tls          458752  1 tempesta_fw

This is the first problem: we don't cleanup the system state after unsuccessful start. Since the state is dirty, I run --stop and get the crash:

[  423.912600] [tempesta] Warning: Trying to stop an inactive system
[  423.915747] [tempesta] Un-registering scheduler: hash
[  423.962794] [tempesta] Un-registering scheduler: http
[  424.006624] [tempesta] Un-registering scheduler: ratio
[  424.048289] [tempesta] exiting...
[  424.048856] kmem_cache_destroy tfw_srv_conn_cache: Slab cache still has objects
[  424.049791] CPU: 3 PID: 1529 Comm: rmmod Tainted: G           O    4.9.35-kdump #145
[  424.051005] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
[  424.052008]  ffffb4a600467e38 ffffffffb2eaa4a7 ffff8ac92f66d900 ffff8ac92f66d900
[  424.052008]  ffffb4a600467ea8 ffffffffb2d48373 ffff8ac92edf7900 00ffffffb2e0878c
[  424.052008]  ffffb4a600467e58 ffffb4a600467e58 ffffb4a600467e68 ffffb4a600467e68
[  424.052008] Call Trace:
[  424.052008]  [<ffffffffb2eaa4a7>] dump_stack+0x4d/0x66
[  424.052008]  [<ffffffffb2d48373>] kmem_cache_destroy+0x233/0x240
[  424.052008]  [<ffffffffc0636bcc>] tfw_sock_srv_exit+0x1c/0x30 [tempesta_fw]
[  424.052008]  [<ffffffffc062eacd>] tfw_exit+0x2d/0x60 [tempesta_fw]
[  424.052008]  [<ffffffffb2cd3dfe>] SyS_delete_module+0x17e/0x240
[  424.052008]  [<ffffffffb2c02068>] ? exit_to_usermode_loop+0x68/0x80
[  424.052008]  [<ffffffffb314ee24>] entry_SYSCALL_64_fastpath+0x17/0x98
[  424.117784] [tdb] Shutdown Tempesta DB
[  424.156120] BUG: unable to handle kernel paging request at ffffffffc06066d0
[  424.157118] IP: [<ffffffffc06066d0>] 0xffffffffc06066d0
[  424.157827] PGD 69608067 [  424.158163] PUD 6960a067
PMD 6b30c067 [  424.158704] PTE 0
[  424.158990] 

Maybe it's invalid action, but we should not crash anyway. This is the second problem.

Next I reboot, try to start the same patched Tempesta FW, but this time us --unload - the same crash.

Anyway, the original problem is exactly:

ERROR: Got stuck in releasing of server groups! 1 server groups was not released.

-- we should not see the message and the configuration just must be cleared properly.

aleksostapenko added a commit that referenced this issue Mar 30, 2018
Fix #918: release connection if its' reference count is zero.
aleksostapenko added a commit that referenced this issue Mar 30, 2018
aleksostapenko added a commit that referenced this issue Mar 30, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants