Memory problem with GOP cache enabled #29

Closed
tuan3w opened this issue May 4, 2018 · 19 comments

@tuan3w

tuan3w commented May 4, 2018

Hi @winshining,
When I start nginx, it consumes only a little memory. When I simulate concurrent downloads with the wrk tool, memory usage on my machine increases to 4.5G. However, after the benchmark, the memory isn't released at all.
Besides that, I have several questions about the GOP cache. Is the memory used for caching proportional to the number of subscribers/players when the GOP cache is enabled? Is there any way to control the size of the GOP cache?
Thanks.

@winshining
Owner

winshining commented May 4, 2018

Did you use the latest commit? Some people reported the same problem you mentioned, but I think I've fixed it in the code after release v1.2.3. Here is my benchmark:
500 clients (HTTP-FLV) simulated by srs-bench:
[2018-05-04 19:15:57.103] [report] [1789] threads:500 alive:500 duration:2400 tduration:0 nread:127.67 nwrite:0.00 tasks:500 etasks:0 stasks:0 estasks:0
[2018-05-04 19:16:27.106] [report] [1789] threads:500 alive:500 duration:2430 tduration:0 nread:127.66 nwrite:0.00 tasks:500 etasks:0 stasks:0 estasks:0
[2018-05-04 19:16:57.108] [report] [1789] threads:500 alive:500 duration:2460 tduration:0 nread:127.51 nwrite:0.00 tasks:500 etasks:0 stasks:0 estasks:0
Memory consumed by nginx:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1535 nobody 20 0 112m 52m 1100 S 11.7 0.9 4:00.14 nginx
1534 root 20 0 67800 5608 336 S 0.0 0.1 0:00.00 nginx

Answers for GOP cache:
1 No. Only one GOP is cached per publisher; after receiving a complete GOP, the gop_cache module updates the GOP chain.
2 No. You can hack the code and modify it:)
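
To illustrate answer 1, here is a minimal Python sketch of a per-publisher GOP cache. It is not the module's code: `GopCache`, `push`, `snapshot`, and the frame tuples are hypothetical names, and the keyframe interval of 25 is an assumption. The sketch simplifies by dropping the old GOP as soon as a new keyframe arrives, so the cache holds at most one GOP no matter how many players subscribe.

```python
class GopCache:
    """Toy per-publisher cache holding the frames of the most recent GOP."""

    def __init__(self):
        self.frames = []  # list of (is_keyframe, payload) tuples

    def push(self, is_keyframe, payload):
        # A keyframe starts a new GOP, so the previous one is dropped.
        if is_keyframe:
            self.frames = []
        # Frames that arrive before the first keyframe are skipped.
        if self.frames or is_keyframe:
            self.frames.append((is_keyframe, payload))

    def cached_bytes(self):
        return sum(len(payload) for _, payload in self.frames)

    def snapshot(self):
        # A new player gets a shallow copy; the payloads themselves are shared.
        return list(self.frames)


cache = GopCache()
for i in range(100):
    cache.push(i % 25 == 0, b"x" * 1000)  # keyframe every 25 frames (assumed)

players = [cache.snapshot() for _ in range(500)]
print(len(cache.frames), cache.cached_bytes())
```

In this model the cached size depends on the GOP length and bitrate of the published stream, not on the number of players.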

@winshining
Owner

And here is my nginx.conf:

server {
    listen 1935;
    server_name *.xxx.org;

    out_queue   4096;
    out_cork    16;
    max_streams 128;
    wait_video  on;

    chunk_size 4000;

    application myapp {
        live on;
        gop_cache on;
    }
}

@tuan3w
Author

tuan3w commented May 4, 2018

Hi @winshining,
I'm using the latest version. The Nginx version is 1.13.12. Here is my setup:
https://gist.github.com/tuan3w/3230988b23f13cd30ce59f7a91321c44

@winshining
Owner

Hi @tuan3w:
Thank you for your reply.
I used a config similar to the one you provided, and the Nginx version I used is 1.13.10.
The following is my benchmark:
./wrk -d 5m -t10 -c500 http://localhost:8080/live?app=myapp\&stream=mystream
Running 5m test @ http://localhost:8080/live?app=myapp&stream=mystream
10 threads and 500 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 0.00us 0.00us 0.00us -nan%
Req/Sec 0.00 0.00 0.00 -nan%
0 requests in 5.00m, 4.55GB read
Requests/sec: 0.00
Transfer/sec: 15.53MB
And the following are the statistics for nginx:
top - 00:47:11 up 3:12, 5 users, load average: 0.17, 0.53, 0.75
Tasks: 5 total, 0 running, 5 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.0 us, 3.9 sy, 0.0 ni, 93.8 id, 0.3 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem: 3854532 total, 3366588 used, 487944 free, 104024 buffers
KiB Swap: 2147324 total, 0 used, 2147324 free, 1700528 cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
32420 www-data 20 0 62952 15m 1468 S 0.0 0.4 1:00.21 nginx
32423 www-data 20 0 60884 13m 1348 S 0.3 0.4 0:50.72 nginx
32422 www-data 20 0 60480 13m 1548 S 0.3 0.3 0:49.41 nginx
32421 www-data 20 0 59564 11m 1320 S 0.0 0.3 0:43.96 nginx
32419 root 20 0 49692 1284 236 S 0.0 0.0 0:00.00 nginx
So there is no memory leak in the latest version now.

@tuan3w
Author

tuan3w commented May 4, 2018

Hi @winshining,
It looks fine for me when I run the stress test on localhost. Maybe there isn't enough stress on the system. Can you test again with
http://play-with-docker.com
I created five instances to run the stress test. After an hour, the memory is still high.

@winshining
Owner

Sorry for the late reply.
I deployed nginx and srs_http_load on two different hosts, and the following is my benchmark:
1000 clients simulated by srs-bench:
[2018-05-06 00:18:12.193] [report] [6015] threads:1000 alive:1000 duration:30 tduration:0 nread:212.90 nwrite:0.04 tasks:1000 etasks:0 stasks:0 estasks:0
[2018-05-06 00:18:42.195] [report] [6015] threads:1000 alive:1000 duration:60 tduration:0 nread:213.22 nwrite:0.02 tasks:1000 etasks:0 stasks:0 estasks:0
[2018-05-06 00:19:12.205] [report] [6015] threads:1000 alive:1000 duration:90 tduration:0 nread:231.73 nwrite:0.01 tasks:1000 etasks:0 stasks:0 estasks:0
[2018-05-06 00:19:42.203] [report] [6015] threads:1000 alive:1000 duration:120 tduration:0 nread:235.94 nwrite:0.01 tasks:1000 etasks:0 stasks:0 estasks:0
[2018-05-06 00:20:03.012][33][error] read header from server failed. ret=104 errno=104(Connection reset by peer)
[2018-05-06 00:20:03.013][33][error] parse response body failed. ret=104 errno=104(Connection reset by peer)
[2018-05-06 00:20:03.013][33][error] http client parse response failed. ret=104 errno=104(Connection reset by peer)
[2018-05-06 00:20:03.013][33][error] http client get url failed. ret=104 errno=104(Connection reset by peer)
[2018-05-06 00:20:03.015][982][error] read header from server failed. ret=104 errno=104(Connection reset by peer)
[2018-05-06 00:20:03.015][982][error] parse response body failed. ret=104 errno=104(Connection reset by peer)
[2018-05-06 00:20:03.015][982][error] http client parse response failed. ret=104 errno=104(Connection reset by peer)
[2018-05-06 00:20:03.015][982][error] http client get url failed. ret=104 errno=104(Connection reset by peer)
[2018-05-06 00:20:12.203] [report] [6015] threads:1000 alive:1000 duration:150 tduration:0 nread:237.27 nwrite:0.01 tasks:1002 etasks:2 stasks:0 estasks:0
[2018-05-06 00:20:42.203] [report] [6015] threads:1000 alive:1000 duration:180 tduration:0 nread:242.77 nwrite:0.01 tasks:1002 etasks:2 stasks:0 estasks:0
[2018-05-06 00:20:43.723][630][error] read header from server failed. ret=104 errno=104(Connection reset by peer)
[2018-05-06 00:20:43.723][630][error] parse response body failed. ret=104 errno=104(Connection reset by peer)
[2018-05-06 00:20:43.723][630][error] http client parse response failed. ret=104 errno=104(Connection reset by peer)
[2018-05-06 00:20:43.723][630][error] http client get url failed. ret=104 errno=104(Connection reset by peer)
[2018-05-06 00:21:12.204] [report] [6015] threads:1000 alive:1000 duration:210 tduration:0 nread:238.88 nwrite:0.01 tasks:1003 etasks:3 stasks:0 estasks:0
[2018-05-06 00:21:42.204] [report] [6015] threads:1000 alive:1000 duration:240 tduration:0 nread:239.54 nwrite:0.00 tasks:1003 etasks:3 stasks:0 estasks:0
[2018-05-06 00:22:12.204] [report] [6015] threads:1000 alive:1000 duration:270 tduration:0 nread:238.02 nwrite:0.00 tasks:1003 etasks:3 stasks:0 estasks:0
[2018-05-06 00:22:35.838][393][error] read header from server failed. ret=104 errno=104(Connection reset by peer)
[2018-05-06 00:22:35.838][393][error] parse response body failed. ret=104 errno=104(Connection reset by peer)
[2018-05-06 00:22:35.838][393][error] http client parse response failed. ret=104 errno=104(Connection reset by peer)
[2018-05-06 00:22:35.838][393][error] http client get url failed. ret=104 errno=104(Connection reset by peer)
[2018-05-06 00:22:42.232] [report] [6015] threads:1000 alive:1000 duration:300 tduration:0 nread:239.91 nwrite:0.00 tasks:1004 etasks:4 stasks:0 estasks:0
Statistics about nginx:
top - 00:23:59 up 3 days, 8:34, 2 users, load average: 0.21, 0.29, 0.26
Tasks: 2 total, 2 running, 0 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2%us, 1.0%sy, 0.0%ni, 98.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8388608k total, 8388608k used, 0k free, 0k buffers
Swap: 4194304k total, 637056k used, 3557248k free, 2634632k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
18929 nobody 20 0 153m 102m 1212 R 19.0 1.2 2:25.99 nginx
18927 nobody 20 0 133m 82m 1228 R 22.3 1.0 2:26.30 nginx
And the following are the statistics for nginx after I stopped the benchmark:
top - 00:25:25 up 3 days, 8:35, 2 users, load average: 0.39, 0.30, 0.27
Tasks: 2 total, 0 running, 2 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8388608k total, 8388608k used, 0k free, 0k buffers
Swap: 4194304k total, 637056k used, 3557248k free, 2634976k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
18929 nobody 20 0 141m 90m 1212 S 0.0 1.1 2:39.70 nginx
18927 nobody 20 0 133m 82m 1228 S 0.0 1.0 2:40.61 nginx

Some memory was recycled into nginx's connection pool, so not all of it was freed.
I can't reproduce the problem you mentioned :(
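
One way to picture why memory can stay allocated after the benchmark stops is a free-list pool: buffers handed back by closed connections are kept for reuse instead of being returned to the operating system. The Python sketch below only illustrates that idea and is not nginx's allocator; `BufferPool`, `acquire`, and `release` are hypothetical names, and the 4096-byte buffer size is an assumption.

```python
class BufferPool:
    """Toy free-list pool: released buffers are kept for reuse."""

    def __init__(self, buf_size=4096):  # buffer size is an assumed value
        self.buf_size = buf_size
        self.free = []    # buffers waiting to be reused
        self.created = 0  # number of buffers ever allocated

    def acquire(self):
        # Reuse an idle buffer when one exists; allocate only otherwise.
        if self.free:
            return self.free.pop()
        self.created += 1
        return bytearray(self.buf_size)

    def release(self, buf):
        # The buffer goes back on the free list; it is not deallocated.
        self.free.append(buf)

    def held_bytes(self):
        # Memory owned by the pool, whether idle or handed out.
        return self.created * self.buf_size


pool = BufferPool()
bufs = [pool.acquire() for _ in range(1000)]  # 1000 simulated connections
for buf in bufs:
    pool.release(buf)                         # all connections closed

print(pool.held_bytes())  # the pool still owns every buffer it created
second = [pool.acquire() for _ in range(1000)]
print(pool.created)       # no new allocations: old buffers were reused
```

Under this model, resident memory after a load test reflects the peak number of buffers in use, which would be expected to level off rather than keep growing.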

@tuan3w
Author

tuan3w commented May 6, 2018

I updated the instructions for reproducing the output here: https://gist.github.com/tuan3w/3230988b23f13cd30ce59f7a91321c44

I also made a video as proof; you can watch it here: https://www.youtube.com/watch?v=SUkcasp83w8&feature=youtu.be

After 10 minutes, my machine still reports that 3.02G memory is used.

@winshining
Owner

[image: 1]

[image: 2]

@winshining
Owner

BTW, I cannot watch the video you uploaded; the connection was refused.

@tuan3w
Author

tuan3w commented May 7, 2018

Hi @winshining ,

I updated the video's privacy settings. Can you check again? Maybe the video is blocked in your country :)).
I looked at your image. Is the URL only "./wrk -d ... http://..:8000/live" without other parameters?
You should quote the URL so that the "&" symbol doesn't break the command.

[image]

Thanks
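
On the quoting advice above: in a POSIX shell an unquoted "&" ends the command and runs it in the background, so wrk would only receive the URL up to `app=myapp`. A minimal Python sketch of building a safely quoted command line follows. The URL is modeled on the one used in this thread and is only an example; the command is printed, not executed.

```python
import shlex

# Example URL modeled on the one in this thread (assumed, adjust as needed).
url = "http://localhost:8080/live?app=myapp&stream=mystream"

# shlex.quote wraps the URL in single quotes so that the shell passes it
# to wrk as a single argument, "&" and "?" included.
safe = "./wrk -d 5m -t10 -c500 " + shlex.quote(url)
print(safe)

# Round trip: splitting the command the way a POSIX shell tokenizes words
# gives back the original URL as the last argument.
print(shlex.split(safe)[-1] == url)
```

Escaping the ampersand with a backslash, as in `app=myapp\&stream=mystream`, has the same effect for that one character.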

@winshining
Owner

In fact it was run with parameters, but the terminal did not have enough space to display them. When I ran it in the foreground, it showed the complete command.
I still cannot watch the video; the browser told me the connection timed out (a VPN did not help either).

@tuan3w
Author

tuan3w commented May 7, 2018

I uploaded it on google-drive: https://drive.google.com/file/d/1SlKsMq_zDzOzquoqtnStPH4Tc4ySKgCz/view
In the meantime, I will create a docker image for testing.

@winshining
Owner

The video was broken; the player could only play 26 seconds (both VLC and QuickTime).

@tuan3w
Author

tuan3w commented May 7, 2018

Please check this link. Not sure why the old file is broken :)) https://drive.google.com/file/d/1DZokdi_hp_uiAyBjZPeHOZvIPlu229TU/view?usp=sharing

@winshining
Owner

@tuan3w The video is OK. I will check what caused the high memory consumption.

@winshining
Owner

Hi @tuan3w, have you tried the latest code? I rewrote the gop cache module, which may decrease the memory consumption. Looking forward to your reply.

@tuan3w
Author

tuan3w commented May 17, 2018

Hi @winshining,
I still have the same problem. Maybe it's not a software problem. I will investigate and test more when I have time.
Thanks.

@winshining
Owner

winshining commented May 20, 2018

Hi @tuan3w, I found the reason.
The server's default chunk size is 4096, and it announces this value to clients with a Set Chunk Size protocol control message. Until that message is sent, the server uses 128 as the chunk size (according to the RTMP Specification), and afterwards it expects clients to reply with the same chunk size. However, some clients ignore the message. The server then allocates 4096-byte buffers for the data clients send while those clients still use 128. In the worst case a single 128-byte message occupies a 4096-byte buffer, so the wasted space is about 31 times the size of the data.
I changed the default chunk size from 4096 to 128. As a result, if clients ignore Set Chunk Size, the server uses the minimum size; if they send back a chunk size that matches the server configuration (not 128), no memory is wasted either.
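
The "about 31 times" figure can be checked with a little arithmetic. The Python sketch below assumes the worst case described above, one chunk per allocated buffer; `buffer_waste` is a hypothetical helper, not a function from the module.

```python
def buffer_waste(buffer_size, payload_size):
    """Return (wasted_bytes, wasted_bytes / payload_size) for one buffer
    that holds a single chunk of payload_size bytes."""
    wasted = buffer_size - payload_size
    return wasted, wasted / payload_size


# Server allocates 4096-byte buffers, client keeps sending 128-byte chunks.
print(buffer_waste(4096, 128))  # (3968, 31.0)

# After the change: 128-byte buffers for 128-byte chunks.
print(buffer_waste(128, 128))   # (0, 0.0)
```

With 4096-byte buffers, each 128-byte chunk leaves 3968 bytes unused, which is 31 times the payload.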
The following are my benchmarks (two runs each, again with 500 clients, and with the clients and the server on different hosts):
Before modification:
old_1
old_2
After modification:
new_1
new_2
PS: htop is an excellent tool!

@tuan3w
Author

tuan3w commented May 20, 2018

Hi @winshining ,
I tested again and the result is OK. Thanks for investigating the problem.
