Memory leak in coolwsd #176

Open · gerazo opened this issue Apr 19, 2022 · 16 comments

gerazo commented Apr 19, 2022

I am using

  • Nextcloud 23.0.3
  • richdocuments 5.0.3
  • richdocumentscode 21.11.204

This is a server with around 50-100 active users.

Problem:

After around 5 days of continuous running, the coolwsd process has taken about 30 GB of resident memory and does not release it. By this time, Collabora Office is either totally unresponsive (no documents open at all) or a document opens but after 10 seconds it says "connection to server lost" and kicks you back out to the folder view. According to the logs, the "75% of RAM taken" alert had already been raised a day earlier, so the memory consumption of coolwsd rises steadily over time. In the end, the OS OOM killer is triggered, which kills the whole apache process tree as the originator of the coolwsd process. This also takes down all other NC services, but Collabora is unusable well over a day before the actual kill takes place.

The above happens regularly with 204. It did not happen with previous releases. I have just upgraded to 306; we will see how it performs. I suspect there is a memory leak somewhere in coolwsd. It is important to note that it is the coolwsd process that shows the increased memory consumption, not the other apache-related processes, nor the CollaboraOnline... process.

@unclesam87

I seem to have a similar problem:
Debian 11 server
Nextcloud 23.0.3
richdocumentscode 21.11.306
nginx as webserver with php-fpm

It now uses around 37% of the memory, but I can see that it has risen over the last day; it has followed this curve for the last two weeks.

gerazo (author) commented Apr 25, 2022

After a week, I can say that richdocumentscode 21.11.306 is unfortunately affected in the very same way. The service stops responding well before the OOM killer finds it, so this issue definitely causes a service outage. My only workaround is to automatically restart the service every night (see the cron sketch below).
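
For reference, a minimal sketch of such a nightly restart via cron, assuming the built-in CODE server is spawned by apache/php-fpm and that systemd manages both. The unit names and the 04:00 schedule are assumptions; adjust them to your stack:

    # /etc/cron.d/restart-collabora -- nightly restart to reclaim coolwsd memory
    # (sketch: assumes systemd units named "php8.1-fpm" and "apache2")
    0 4 * * * root /usr/bin/systemctl restart php8.1-fpm apache2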

gerazo (author) commented May 17, 2022

Version 21.11.402 is affected the same way.

SethosII commented Aug 8, 2022

Same thing here with Nextcloud 24.0.1, Nextcloud Office 6.1.1, Collabora Online - Built-in CODE Server 22.5.401. The memory usage of coolwsd increases for no apparent reason. I monitored the memory usage of the server and here is the result:

[memory usage graph]

So after only 5 days, 37 GB are used in total. The server's total memory usage drops back to 2 GB after a restart of apache.

@SethosII

I monitored the memory usage further and added a daily restart of apache (the drops in memory usage are the restarts):

[memory usage graph]

Even with the daily restart, memory usage still creeps up, although more slowly. So it seems to be a real memory leak.
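
To correlate such graphs with the process itself, coolwsd's resident memory can be sampled over time with a small script like this (a sketch; the log path and the suggested 15-minute cron interval are assumptions):

    #!/bin/sh
    # Append a timestamped total RSS (KiB) of all coolwsd processes to a log.
    # Run it from cron, e.g. */15 * * * * root /usr/local/bin/coolwsd-rss.sh
    echo "$(date -Is) $(ps -C coolwsd -o rss= | awk '{s+=$1} END {print s}')" \
      >> /var/log/coolwsd-rss.log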

nooblag commented Oct 12, 2022

Reporting similar behaviour with:

richdocuments 6.2.1
richdocumentscode 22.5.502

Running Nextcloud 24.0.6 with 4GB of RAM.

Ubuntu 20.04.5 LTS
nginx/1.18.0 (Ubuntu)
PHP 8.0.14 fpm
psql (PostgreSQL) 12.12

coolwsd memory usage slowly increases over several days until it becomes unresponsive; see the Grafana screenshot below, for example. The spike ends where the PHP service was restarted.

[Grafana screenshot, 2022-10-13]

@kadarpik

Within one month it stops working on a 6 GB virtual machine; a memory leak is definitely there. I am using 22.05.8.2; it was not this bad in previous versions. I have put some restarts into crontab.

kwisatz commented Mar 12, 2023

[Grafana dashboard screenshot]

I don't know what happened here, but it clearly looks like this is not a gradual leak but something that causes it to allocate steadily until, at some point, the OOM killer jumped to the rescue…

@NetBLOKS

Same problem here, but only with the Collabora server from the Nextcloud app store. Servers with an external Collabora server do not face this issue.

rizajur commented Mar 15, 2023

> Servers with an external Collabora server do not face this issue.

Which version of Collabora do you run, @NetBLOKS (Docker?), which NC version, and what was the setup procedure (nginx with php-fpm?)? I always had these issues no matter what, but that was a few months earlier.

@NetBLOKS

> Which version of Collabora do you run, @NetBLOKS (Docker?), which NC version, and what was the setup procedure (nginx with php-fpm?)?

Collabora version: the App Store version (Collabora Online - Built-in CODE Server). NC version: it happens in 24 and 25 (currently the latest, 25.0.4). Setup procedure: manual install, Debian 11, Apache, PHP 7.4-FPM.

@trenshaw

Can confirm this is still an issue on the following stack:

Ubuntu 24.04 LTS
Nginx 1.24
PHP 8.3-FPM
Nextcloud 29.0.0
richdocuments: 8.4.2
richdocumentscode_arm64: 24.4.201

Githopp192 commented Aug 4, 2024

Nextcloud version: 29.0.4.1
Red Hat Enterprise Linux release 8.10 (Ootpa)
MariaDB 10.6.18
Apache/2.4.37
PHP 8.3.10

  • richdocuments: 8.4.4
  • richdocumentscode: 24.4.502

Since upgrading to Nextcloud 29.0.4.1 and from PHP 8.2 to PHP 8.3, the Nextcloud server is close to crashing because php-fpm is consuming all the space in /tmp:

32G /tmp/systemd-private-4c513a85a5cb462b92e805310c385d9e-php-fpm.service-r9PDnv/tmp/coolwsd.LNJU02GnN5/jails/18443-d61991e6
39G /tmp/systemd-private-4c513a85a5cb462b92e805310c385d9e-php-fpm.service-r9PDnv/tmp/coolwsd.LNJU02GnN5

After restarting the PHP-FPM service, the files were removed from the /tmp directory.

Here you can see that coolwsd eats all the space in the server's /tmp directory within a short time:

2024.08.04 03:15:02 - Space ok 18% /dev/mapper/server-root --Mount-- /
2024.08.04 03:30:01 - Space ok 19% /dev/mapper/server-root --Mount-- /
2024.08.04 03:45:01 - Space ok 20% /dev/mapper/server-root --Mount-- /
2024.08.04 04:15:03 - Space ok 22% /dev/mapper/server-root --Mount-- /
2024.08.04 04:30:03 - Space ok 23% /dev/mapper/server-root --Mount-- /
2024.08.04 05:00:01 - Space ok 24% /dev/mapper/server-root --Mount-- /
2024.08.04 05:15:02 - Space ok 25% /dev/mapper/server-root --Mount-- /
2024.08.04 05:30:01 - Space ok 26% /dev/mapper/server-root --Mount-- /
2024.08.04 05:45:02 - Space ok 27% /dev/mapper/server-root --Mount-- /
2024.08.04 06:15:01 - Space ok 28% /dev/mapper/server-root --Mount-- /
2024.08.04 06:30:04 - Space ok 32% /dev/mapper/server-root --Mount-- /
2024.08.04 07:00:48 - Space ok 33% /dev/mapper/server-root --Mount-- /
2024.08.04 07:30:01 - Space ok 34% /dev/mapper/server-root --Mount-- /
2024.08.04 07:45:01 - Space ok 35% /dev/mapper/server-root --Mount-- /
2024.08.04 08:00:01 - Space ok 36% /dev/mapper/server-root --Mount-- /
2024.08.04 08:15:02 - Space ok 37% /dev/mapper/server-root --Mount-- /
2024.08.04 08:30:01 - Space ok 38% /dev/mapper/server-root --Mount-- /
2024.08.04 08:45:01 - Space ok 39% /dev/mapper/server-root --Mount-- /
2024.08.04 09:00:02 - Space ok 40% /dev/mapper/server-root --Mount-- /
2024.08.04 09:15:01 - Space ok 41% /dev/mapper/server-root --Mount-- /
2024.08.04 09:30:01 - Space ok 42% /dev/mapper/server-root --Mount-- /
2024.08.04 09:45:02 - Space ok 43% /dev/mapper/server-root --Mount-- /
2024.08.04 10:00:01 - Space ok 44% /dev/mapper/server-root --Mount-- /
2024.08.04 11:00:03 - Space ok 46% /dev/mapper/server-root --Mount-- /
2024.08.04 11:15:01 - Space ok 47% /dev/mapper/server-root --Mount-- /
2024.08.04 11:30:01 - Space ok 49% /dev/mapper/server-root --Mount-- /
2024.08.04 11:45:01 - Space ok 51% /dev/mapper/server-root --Mount-- /
2024.08.04 12:00:02 - Space ok 54% /dev/mapper/server-root --Mount-- /
2024.08.04 12:15:01 - Space ok 55% /dev/mapper/server-root --Mount-- /
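
To check whether leftover coolwsd jails are what is filling /tmp, something like the following can locate and size them (a sketch; the depth and name pattern match the paths quoted above, and the systemd-private prefix changes on every service start):

    # Sum up coolwsd jail directories under php-fpm's private /tmp
    find /tmp -maxdepth 3 -type d -name 'coolwsd.*' 2>/dev/null \
      -exec du -sh {} +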

The free memory on the server is also decreasing and decreasing:

[memory usage graph]

See also these errors in php-fpm.log:

PHP Fatal error: Uncaught TypeError: implode(): Argument #1 ($array) must be of type array, string given in /var/www/html/nextcloud/apps/richdocumentscode/proxy.php:398
Stack trace:
#0 /var/www/html/nextcloud/apps/richdocumentscode/proxy.php(398): implode()
#1 {main}
thrown in /var/www/html/nextcloud/apps/richdocumentscode/proxy.php on line 398
[04-Aug-2024 12:08:26] richdocumentscode (proxy.php) error exit, PID: 509268, Message: No content in reply from coolwsd. Is SSL enabled in error ?
[04-Aug-2024 12:08:26] PHP Warning: http_response_code(): Cannot set response code - headers already sent (output started at /var/www/html/nextcloud/apps/richdocumentscode/proxy.php:30) in /var/www/html/nextcloud/apps/richdocumentscode/proxy.php on line 34
[04-Aug-2024 12:19:49] PHP Warning: http_response_code(): Cannot set response code - headers already sent (output started at /var/www/html/nextcloud/apps/richdocumentscode/proxy.php:285) in /var/www/html/nextcloud/apps/richdocumentscode/proxy.php on line 292

Workaround:

  • [ 1 ] Restarted apache, redis and php-fpm; memory and space consumption are now normal again (commands sketched below).
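
As a one-shot version of that workaround (a sketch; the unit names assume a RHEL-style setup and differ per distribution):

    # Restart the services that spawn coolwsd; this freed /tmp and memory here
    systemctl restart httpd redis php-fpm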

@Githopp192

Restarting apache only solves the issue temporarily. After a short while, coolwsd again writes about 5-10 GB per hour, i.e. 100-200 GB per day!

This is a big issue which affects server stability.

joshtrichards (Contributor)

@Githopp192 This issue is about a memory leak, not disk space in /tmp.

Githopp192 commented Aug 21, 2024

If you read my comments and check the graph, you'll see I am affected by a memory leak too:

> The free memory on the server is also decreasing and decreasing:

See the image above.

[memory usage graph]
