Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash occurred after reloading the config. #475

Closed
xinqu opened this issue Sep 6, 2015 · 7 comments
Closed

Crash occurred after reloading the config. #475

xinqu opened this issue Sep 6, 2015 · 7 comments
Assignees
Labels
Bug It might be a bug. TransByAI Translated by AI/GPT.
Milestone

Comments

@xinqu
Copy link

xinqu commented Sep 6, 2015

  1. Because some code has been modified, the line numbers in the core file no longer match the latest line numbers. The specific code is:
    srs_app_rtmp_conn.cpp:
    for (int i = 0; i < (int)on_connect->args.size(); i++) {
  2. Corresponding to the latest code, line 1315
    It's not just this one crash point, but they are all related to the args parameter in the authentication module's config. Every time there is a crash, the args vector does not exist anymore.
  3. Steps to reproduce:
    a. Authentication connection continuously times out (timeout)
    b. Attempt to open the video, fail (due to authentication)
    c. Disable authentication configuration (set http_hook to off)
    d. Program crashes
    It is inferred that after the configuration refresh, the config uses a new pointer, and the old pointer points to memory that has been released. At that time, the authentication module was using the pointer for a long time, eventually accessing non-existent data, causing a crash.

It is relatively easy to reproduce when the authentication connection definitely times out. However, it is unclear whether there is a possibility of a crash if the configuration is refreshed when authentication is successful and there is a large-scale connection access.

  1. Core dump:
(gdb)
#0  0x00002ae73a0cd91e in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib64/libstdc++.so.6
#1  0x000000000047cd3f in SrsRtmpConn::http_hooks_on_connect (this=0x817edc0) at src/app/srs_app_rtmp_conn.cpp:1353
#2  0x000000000047d813 in SrsRtmpConn::check_vhost (this=0x817edc0) at src/app/srs_app_rtmp_conn.cpp:592
#3  0x00000000004816fb in SrsRtmpConn::do_cycle (this=0x817edc0) at src/app/srs_app_rtmp_conn.cpp:158
#4  0x000000000047b173 in SrsConnection::cycle (this=0x817ee20) at src/app/srs_app_conn.cpp:64
#5  0x00000000004b766b in SrsThread::thread_cycle (this=0x817eea0) at src/app/srs_app_thread.cpp:187
#6  0x00000000004b74c8 in SrsThread::thread_fun (arg=0x817eea0) at src/app/srs_app_thread.cpp:225
#7  0x00000000005493b7 in _st_thread_main () at sched.c:327
#8  0x0000000000549b08 in st_thread_create (start=0, arg=0x0, joinable=0, stk_size=988114480) at sched.c:591
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb)

TRANS_BY_GPT3

@winlinvip winlinvip added this to the srs 2.0 release milestone Sep 7, 2015
@winlinvip winlinvip added the Bug It might be a bug. label Sep 7, 2015
@winlinvip
Copy link
Member

winlinvip commented Sep 7, 2015

args is copied.
Let me see if I can reproduce what you said.

TRANS_BY_GPT3

@winlinvip
Copy link
Member

winlinvip commented Sep 14, 2015

I put the api-server to sleep for 60 seconds every time it receives a connection before responding:

class RESTClients(object):
    def POST(self):
        time.sleep(60)

Start the API:

python ~/srs/research/api-server/server.py 8085

Server configuration file:

vhost __defaultVhost__ {
    http_hooks {
        enabled         on;
        on_connect      http://127.0.0.1:8085/api/v1/clients;
    }

Play video: http://www.ossrs.net/players/srs_player.html?vhost=__defaultVhost__&app=live&stream=livestream&server=127.0.0.1&port=1935&autostart=true

Server log displays a 30-second timeout.

[2015-09-14 03:48:21.012][trace][24380][112] RTMP client ip=127.0.0.1
[2015-09-14 03:48:21.015][trace][24380][112] complex handshake success


[2015-09-14 03:48:51.017][error][24380][112][101] parse http post response failed. ret=1011(STREAM ioctl timeout)
[2015-09-14 03:48:51.017][error][24380][112][9] http post on_connect uri failed. client_id=112, url=http://127.0.0.1:8085/api/v1/clients, request={"action":"on_connect","client_id":112,"ip":"127.0.0.1","vhost":"__defaultVhost__","app":"live","tcUrl":"rtmp://127.0.0.1:1935/live","pageUrl":"http://www.ossrs.net/players/srs_player.html?vhost=dev&stream=livestream&server=dev&port=1935"}, response=, code=32745, ret=1011(Bad file descriptor)
[2015-09-14 03:48:51.017][error][24380][112][9] hook client on_connect failed. url=http://127.0.0.1:8085/api/v1/clients, ret=1011(Bad file descriptor)
[2015-09-14 03:48:51.017][error][24380][112][9] check vhost failed. ret=1011(Bad file descriptor)
[2015-09-14 03:48:51.017][warn][24380][112][9] client disconnect peer. ret=1004

Then disable http_hooks and reload:

vhost __defaultVhost__ {
    http_hooks {
        enabled         off;
        on_connect      http://127.0.0.1:8085/api/v1/clients;
    }
killall -1 srs

I found that the server did not encounter any problems.

[2015-09-14 03:51:01.858][trace][24380][100] reload config success.


[2015-09-14 03:51:14.870][trace][24380][114] RTMP client ip=127.0.0.1
[2015-09-14 03:51:14.872][trace][24380][114] complex handshake success
[2015-09-14 03:51:14.872][trace][24380][114] connect app, tcUrl=rtmp://127.0.0.1:1935/live, pageUrl=http://www.ossrs.net/players/srs_player.html?vhost=dev&stream=livestream&server=dev&port=1935, swfUrl=http://www.ossrs.net/players/srs_player/release/srs_player.swf?_version=1.26, schema=rtmp, vhost=__defaultVhost__, port=1935, app=live, args=null
[2015-09-14 03:51:14.872][trace][24380][114] out chunk size to 30000
[2015-09-14 03:51:14.883][trace][24380][114] ignored. set buffer length to 200
[2015-09-14 03:51:14.905][trace][24380][114] client identified, type=Play, stream_name=livestream, duration=-1.00
[2015-09-14 03:51:14.905][trace][24380][114] source url=/live/livestream, ip=127.0.0.1, cache=1, is_edge=0, source_id=-1[-1]
[2015-09-14 03:51:14.905][trace][24380][114] dispatch cached gop success. count=0, duration=-1
[2015-09-14 03:51:14.905][trace][24380][114] create consumer, queue_size=1.00, jitter=1
[2015-09-14 03:51:14.917][trace][24380][114] mw changed sleep 350=>350, max_msgs=128, esbuf=218750, sbuf 146988=>109375, realtime=0
[2015-09-14 03:51:14.917][trace][24380][114] start play smi=0.00, mw_sleep=350, mw_enabled=1, realtime=0, tcp_nodelay=0
[2015-09-14 03:51:14.917][trace][24380][115] ignored. set buffer length to 200
[2015-09-14 03:51:14.917][trace][24380][114] -> PLA time=0, msgs=0, okbps=34,0,0, ikbps=28,0,0, mw=350

TRANS_BY_GPT3

@winlinvip
Copy link
Member

winlinvip commented Sep 14, 2015

View the code, according to your inference, it is possible:



int SrsRtmpConn::http_hooks_on_connect()
{
    int ret = ERROR_SUCCESS;


#ifdef SRS_AUTO_HTTP_CALLBACK
    if (_srs_config->get_vhost_http_hooks_enabled(req->vhost)) {
        // HTTP: on_connect 
        SrsConfDirective* on_connect = _srs_config->get_vhost_on_connect(req->vhost);
        if (!on_connect) {
            srs_info("ignore the empty http callback: on_connect");
            return ret;
        }


        for (int i = 0; i < (int)on_connect->args.size(); i++) {
            std::string url = on_connect->args.at(i);

The condition is that if there are multiple hooks and after reloading, it executes the next on_connect args, there will be a problem.

Let me reproduce it again, configure multiple hooks, and the server only time.sleep(20) without timing out. Then reload within 20 seconds to release the config:

    http_hooks {
        enabled         on;
        on_connect      http://127.0.0.1:8085/api/v1/clients http://127.0.0.1:8085/api/v1/clients;
    }

View the logs:

[2015-09-14 03:59:14.034][trace][24530][107] RTMP client ip=127.0.0.1
[2015-09-14 03:59:14.034][trace][24530][107] srand initialized the random.
[2015-09-14 03:59:14.037][trace][24530][107] complex handshake success


[2015-09-14 03:59:26.236][trace][24530][100] reload config success.


[2015-09-14 04:01:09.240][trace][24577][107] http hook on_connect success. client_id=107, url=http://127.0.0.1:8085/api/v1/clients, request={"action":"on_connect","client_id":107,"ip":"127.0.0.1","vhost":"__defaultVhost__","app":"live","tcUrl":"rtmp://127.0.0.1:1935/live...vhost...__defaultVhost__","pageUrl":"http://www.ossrs.net/players/srs_player.html?vhost=__defaultVhost__&app=live&stream=livestream&server=127.0.0.1&port=1935&autostart=true"}, response=0, ret=0
[2015-09-14 04:01:09.240][trace][24577][107] connect app, tcUrl=rtmp://127.0.0.1:1935/live...vhost...__defaultVhost__, pageUrl=http://www.ossrs.net/players/srs_player.html?vhost=__defaultVhost__&app=live&stream=livestream&server=127.0.0.1&port=1935&autostart=true, swfUrl=http://www.ossrs.net/players/srs_player/release/srs_player.swf?_version=1.26, schema=rtmp, vhost=__defaultVhost__, port=1935, app=live, args=null
[2015-09-14 04:01:09.240][trace][24577][107] out chunk size to 30000

It can be seen that the released config is indeed being used.

TRANS_BY_GPT3

@winlinvip
Copy link
Member

winlinvip commented Sep 14, 2015

Therefore, whenever config is used and there is a context switch, the required data should be directly copied. For example:



    // the http hooks will cause context switch,
    // so we must copy all hooks for the on_connect may freed.
    // @see https://github.com/simple-rtmp-server/srs/issues/475
    vector<string> hooks;


    if (true) {
        SrsConfDirective* on_connect = _srs_config->get_vhost_on_connect(req->vhost);


        if (!on_connect) {
            srs_info("ignore the empty http callback: on_connect");
            return ret;
        }


        hooks = on_connect->args;
    }


    for (int i = 0; i < (int)hooks.size(); i++) {
        std::string url = hooks.at(i);
        if ((ret = SrsHttpHooks::on_connect(url, req)) != ERROR_SUCCESS) {
            srs_error("hook client on_connect failed. url=%s, ret=%d", url.c_str(), ret);
            return ret;
        }
    }

TRANS_BY_GPT3

@xinqu
Copy link
Author

xinqu commented Sep 14, 2015

I was thinking that I could call copy when getting this config, make a copy and then return a pointer, and then free it after using it in the function.
Now I think the copy you mentioned is better, I will try to make the changes, thank you.

TRANS_BY_GPT3

@winlinvip
Copy link
Member

winlinvip commented Sep 14, 2015

This impact is extensive, both the DVR and HLS parts, all callbacks have this issue.
Thank you for reporting this bug~

TRANS_BY_GPT3

@winlinvip
Copy link
Member

winlinvip commented Sep 14, 2015

I have already fixed it, you can refer to my submission.

'
Make sure to maintain the markdown structure.

TRANS_BY_GPT3

@winlinvip winlinvip self-assigned this Sep 25, 2021
@winlinvip winlinvip changed the title 在reload config后发生crash Crash occurred after reloading the config. Jul 28, 2023
@winlinvip winlinvip added the TransByAI Translated by AI/GPT. label Jul 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug It might be a bug. TransByAI Translated by AI/GPT.
Projects
None yet
Development

No branches or pull requests

2 participants