
SEGFAULT after the inclusion of FFI::shutdown() to release memory consumption #170

Closed

craig410 opened this issue Nov 11, 2022 · 22 comments

@craig410

Running libvips 8.13.3 on Debian bullseye with PHP 8.1 under Apache using jcupitt/vips:2.10

Calling FFI::shutdown() at the end of a script, after an Image::thumbnail() operation, results in the process segfaulting.

The call to shutdown() is required because, without it, memory consumption climbs and is never released back to the system (at least in our experiments).

There are more details and a full reproduction case, including a Dockerfile matching the environment: https://github.com/ingenerator/libvips-segfault-repro

Thanks to @acoulton: failing tests demonstrating both cases here: https://github.com/ingenerator/libvips-segfault-repro/actions/runs/3444654670/jobs/5747485403

And a further attempt to debug the source of the SEGFAULT here: ingenerator/libvips-segfault-repro#3

We have followed as much of the documentation as we can find around using php-vips appropriately; however, it may be that it is not suited to running in this environment in this way (i.e. mpm-prefork / multi-user). We're happy to spend a little more time debugging if you have any suggestions of where to go next. It would also be good to know whether it is at least intended to work this way.

@jcupitt
Member

jcupitt commented Nov 11, 2022

Hi @craig410,

I've just made a PR that (I think!) works around what is (again, I think) a bug in php-ffi. Would you be able to test it?

#171

You shouldn't need to call shutdown, and hopefully it won't leak.

@craig410
Author

Hi @jcupitt,

Thanks for your speedy response. We're still seeing huge memory accumulation without calling shutdown() - roughly a 250% increase over 10 requests. https://github.com/ingenerator/libvips-segfault-repro/actions/runs/3446792614/jobs/5752119000

@jcupitt
Member

jcupitt commented Nov 11, 2022

Could that be the operation cache? libvips will try to cache recent operations, and has a 100 MB cache limit by default.

You can change the default cache sizing here:

https://libvips.github.io/php-vips/classes/Jcupitt-Vips-Config.html#method_cacheSetMaxMem

You can also limit the cache by number of operations and number of open files.
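For example, a minimal sketch of turning the cache right down via the Config methods linked above (the zero values are illustrative, not a recommendation):

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use Jcupitt\Vips\Config;

// Illustrative values: disable libvips operation caching entirely.
Config::cacheSetMax(0);       // max number of cached operations
Config::cacheSetMaxMem(0);    // max bytes of pixel data to cache
Config::cacheSetMaxFiles(0);  // max number of cached open files
```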

I'd also beware of memory fragmentation. It'll take quite a few iterations (up to several hundred?) before memory use stabilises.

@jcupitt
Member

jcupitt commented Nov 11, 2022

Oh sigh, this still leaks:

#!/usr/bin/env php
<?php

require __DIR__ . '/vendor/autoload.php';
use Jcupitt\Vips;

#Vips\Config::setLogger(new Vips\DebugLogger());

Vips\Config::cacheSetMax(0);

if (count($argv) != 4) {
    echo("usage: ./watermark-text.php input output \"some text\"\n");
    exit(1);
}

$input_filename = $argv[1];
$output_filename = $argv[2];
$message = $argv[3];

function watermark($image, $message)
{
    $text = Vips\Image::text($message, [
      'width' => $image->width,
      'dpi' => 600,
      'rgba' => true
    ]);

    // scale the alpha down to make it semi-transparent, and rotate by 45
    // degrees
    $text = $text
        ->multiply([1, 1, 1, 0.3])
        ->rotate(45);

    // replicate and crop to match the size of the image
    $text = $text
        ->replicate(1 + $image->width / $text->width,
            1 + $image->height / $text->height)
        ->crop(0, 0, $image->width, $image->height);

    // and overlay on the image
    return $image->composite($text, "over");
}

echo "iteration, growth (bytes)\n";
$prev = 0;
for ($i = 0; $i < 2000; $i++) {
    $image = Vips\Image::newFromFile($input_filename, [
      'access' => 'sequential',
    ]);

    // no leak
    //$image = $image->invert()

    // leaks 192 bytes per loop
    $image = $image->add(1);

    // leaks 288 bytes per loop
    //$image = watermark($image, $message);

    $image->writeToFile($output_filename);

    if ($i % 10 == 0) {
        gc_collect_cycles();
        $now = memory_get_usage();
        $use = $now - $prev;
        $prev = $now;
        echo "$i, $use\n";
    }
}

Looks like a problem with constant handling. I'll have a further look.

@jcupitt
Member

jcupitt commented Nov 11, 2022

... let's close this issue and discuss and test in the PR.

@jcupitt jcupitt closed this as completed Nov 11, 2022
@craig410
Author

Could that be the operation cache? libvips will try to cache recent operations, and has a 100 MB cache limit by default.

You can change the default cache sizing here:

https://libvips.github.io/php-vips/classes/Jcupitt-Vips-Config.html#method_cacheSetMaxMem

You can also limit the cache by number of operations and number of open files.

We've already tried using a tiny cache as well as disabling it completely and noticed no appreciable difference whatsoever. I kept it out of the reproduction to keep it simple.

I'd also beware of memory fragmentation. It'll take quite a few iterations (up to several hundred?) before memory use stabilises.

I've run 400 iterations of the same operation locally with the default options; memory peaked at 200 MiB and settled at 198 MiB.

With the options

Config::cacheSetMax(0);
Config::cacheSetMaxMem(1_000_000);
Config::cacheSetMaxFiles(0);

memory peaked at 156 MiB after ~92 iterations and remained fairly constant.

We can pick up on PR #171, assuming we're fighting the same issue.

@jcupitt
Member

jcupitt commented Nov 12, 2022

Actually, I think this is something different. That PR is about small leaks in php-ffi (just a few hundred bytes per iteration), but it sounds like you're seeing something more serious.

@jcupitt jcupitt reopened this Nov 12, 2022
@jcupitt
Member

jcupitt commented Nov 12, 2022

I made this test program based on your example:

#!/usr/bin/env php
<?php

require __DIR__ . '/vendor/autoload.php';
use Jcupitt\Vips;

if (count($argv) != 2) {
    echo("usage: ./thumbnail-loop.php input\n");
    exit(1);
}

function thumbnail($filename)
{   
    $image = Vips\Image::thumbnail($filename, 600, [
        'height' => 10_000_000, 
        'export-profile' => 'srgb'
    ]);

    $buf = $image->writeToBuffer('.jpg', [
        'Q' => 75, 
        'strip' => TRUE, 
        'optimize_coding' => TRUE, 
        'profile'=>'srgb'
    ]); 

    return $buf;
}

echo "iteration, growth (bytes)\n";
$prev = 0;
for ($i = 0; $i < 1000; $i++) {
    $buf = thumbnail($argv[1]);

    if ($i % 10 == 0) {
        gc_collect_cycles();
        $now = memory_get_usage();
        $use = $now - $prev;
        $prev = $now;
        echo "$i, $use\n";
    }
}

Running against the php-vips in that PR I see:

$ ./thumbnail-loop.php ~/pics/k2.jpg
iteration, growth (bytes)
0, 1138920
10, 0
20, 0
...
980, 0
990, 0

So no leaks in 1000 iterations, according to memory_get_usage().

I'll try a version that watches RSS in ps instead.

@jcupitt
Member

jcupitt commented Nov 12, 2022

This seems to work OK:

echo "iteration, now (kb), growth (kb)\n";
$prev = 0;
for ($i = 0; $i < 10000; $i++) {
    $buf = thumbnail($argv[1]);

    if ($i % 10 == 0) {
        gc_collect_cycles();
        $pid = getmypid();
        $now = intval(`ps --pid $pid --no-headers -orss`);
        $use = $now - $prev;
        $prev = $now;
        echo "$i, $now, $use\n";
    }
}

I tried graphing it:

[graph: RSS growth over 10,000 iterations]

So 160 MB after 10,000 iterations. I think the slight upward trend is probably fragmentation. I'll try running it for 100,000 iterations.

@jcupitt
Member

jcupitt commented Nov 12, 2022

Here's 55,000 iterations:

[graph: RSS growth over 55,000 iterations]

So I think it's fragmentation and there's no leak.

@acoulton

Hi @jcupitt, I work with @craig410. Thank you for continuing to look at this, and for your comments over the weekend.

Unfortunately I think your test scripts don't accurately represent our runtime environment, so I wanted to give a bit more background to what we're seeing.

It looks like you are testing multiple iterations within the lifetime of a single start-to-finish PHP script (e.g. run as a shell command)?

We're running PHP within apache / mod_php to respond to incoming HTTP requests. There are multiple apache worker threads, each of which contains an embedded PHP runtime (there are no separately identifiable PHP processes). Those threads handle a succession of HTTP requests. The process does not exit between requests; it just resets state, clearing all variables and releasing all memory (apart from anything explicitly placed in shared memory, e.g. caches).

Based on the normal behaviour of other PHP extensions, we would expect php-vips only to consume noticeable memory during requests that actually process images.

In production, image requests are a very small minority of our overall traffic, just occasionally thumbnailing relatively small files.

Historically we were using imagemagick through ext-imagick. The memory consumption of an apache worker process would peak a bit during an image request, then drop once the response had been served. Therefore the overall average memory usage across all apache processes was fairly low & stable.

As soon as we deployed our switch to libvips, we observed that apache memory usage would start to climb linearly with each new image processing request, and once memory was taken it would never be released. Memory growth seems to flatten over time, but it never actually drops until/unless the apache processes are terminated either during a deployment or due to the container hitting memory limits.

You can see that quite clearly in our production monitoring (mean of non-evictable memory per application container over time) - we switched to libvips on the 10th October. Metrics before the 22nd September have been archived, but the flat line at around 160 - 180 MiB at the left of the chart is pretty typical over the long term.

[chart: mean non-evictable memory per application container over time]

It was this that prompted @craig410 to start looking at options to control the libvips consumption:

  • Disabling all the caching - which didn't seem to have a noticeable effect
  • Then starting to explore FFI::shutdown() - which got memory consumption back to normal but triggered segfaults on subsequent requests

I know apache / mod_php could be considered a slightly dated setup, so I have tried today to run our reproduction example in php-fpm. It looks to me like the behaviour there is the same, with memory growing over time and never returning to the baseline.

I'm not particularly familiar with low-level memory allocation / fragmentation, but I wouldn't expect fragmentation to cause long-term issues in a mod_php / php_fpm runtime. That implies libvips is holding empty / fragmented runtime memory blocks between requests? My understanding is that normally a PHP extension would release these when the userland references are garbage collected / at the end of each request. FFI::shutdown() does reduce the measured memory, so even if the problem is fragmentation that implies it is within blocks that libvips is managing?

To summarise:

  • In a long-lived process (mod_php / php_fpm), the process hosting libvips appears to hold a large amount of memory indefinitely - long after any userland variables have been garbage collected, and even if subsequent requests are not performing image operations at all. This is not behaviour we've seen with other php extensions in the past.
  • We agree that running php-vips from a command line script appears to work as expected (regardless of number of iterations within the script), memory is released once the process exits back to the shell.
  • Calling FFI::shutdown() after an image operation and before sending the response solves the memory consumption issue, but causes a segfault - apparently within glib - on subsequent requests.

Part of the motivation for migrating to libvips was the advertised memory consumption performance, but so far what we're seeing is that it's better than imagemagick while processing a specific image operation, but significantly worse in terms of requirements for long-term resource allocation.

As @craig410 said originally, we're partly trying to understand whether libvips / php-vips should actually work in a web hosting environment with parallel / long-running processes. It may be the architecture of the library & php bindings is only really designed for use in command-line scripting. If so, we can look at shelling out to a new process to perform an image operation and allow the operating system to clean that up.

Any further advice would be very welcome.

@jcupitt
Member

jcupitt commented Nov 14, 2022

Hello @acoulton,

Ooof, I'm a bit vague on the details, but my understanding is that PHP does two types of memory allocation: per-request (via emalloc) and persistent (via the system malloc).

Per-request memory allocated via emalloc is automatically released at the end of a request when the request thread exits, and because it's allocated and freed in one huge chunk, it can't get fragmented. Persistent objects, on the other hand, exist for the lifetime of the PHP process and are shared between threads. You can get fragmentation here, since the memory isn't pooled.

The PHP side of php-vips will mostly use emalloc, so that's all fine, but the native libvips binary will of course be using the system malloc, so it'll behave like PHP's persistent memory. If you call shutDown it'll free all the libvips memory for the whole process, so yes, it'll trigger segvs.

Looking at your graph, I think it's operating as expected. You're seeing an extra 100 MB, typically, caused by having a native library doing significant amounts of malloc() in your webserver process. I don't think you're seeing a leak (i.e. libvips is not losing memory).

As to what you should do I suppose the choices are:

  1. Leave it like this. Is the extra 100 MB an issue? If not, I'd be inclined to leave it. libvips ought to be significantly faster than ext-imagick (maybe a factor of 6 on this benchmark), so you could possibly use a lower-powered server and see a saving that way.
  2. Move image processing to a separate process. If there's significant image processing, it's probably not a good idea to do it in your main server process. I've put image handling into a separate worker process which takes jobs from a jobs table in the DB and works asynchronously. Keeping heavy CPU load tasks off your main server can be a good idea.
  3. You could fork vipsthumbnail. This works fine for resizing and will give pretty good performance.
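Option 3 can be as simple as forking the bundled CLI per image (a sketch; the filenames and save options are illustrative):

```shell
# Resize input.jpg to fit within 600 px across, writing tn_<basename>.jpg
# with illustrative JPEG save options appended in brackets.
vipsthumbnail input.jpg --size 600x -o tn_%s.jpg[Q=75,strip,optimize_coding]
```

Because each invocation is a short-lived process, all memory is returned to the OS when it exits.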

@kleisauke could I ping you on this (sorry)? You know a lot more about this subject than me!

@jcupitt
Member

jcupitt commented Nov 14, 2022

Oh, as another option, libvips is fast enough that you could also resize images on demand rather than resizing on upload.

You might see a useful drop in storage costs, and users would have images sized exactly for their device.

@kleisauke
Member

I think this is a memory fragmentation issue. You might consider using a different memory allocator such as jemalloc to prevent memory fragmentation in long-running, multi-threaded, glibc-based Linux processes. See lovell/sharp#955 to learn more about memory allocation and why the choice of a memory allocator is important.
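A sketch of the jemalloc suggestion on Debian bullseye (the package name and library path are assumptions for bullseye/amd64; verify them for your distro and architecture):

```shell
# Install jemalloc and preload it so the native libvips allocations
# go through it instead of the glibc allocator.
apt-get install -y libjemalloc2
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2
# Then start the PHP host process with that environment, e.g.:
apachectl -D FOREGROUND
```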

You could also consider switching to an Alpine-based Docker image, as the memory allocator in musl, upon which Alpine Linux is based, is generally considered very good in terms of lack of fragmentation and returning freed memory.

For example, with these changes:
ingenerator/libvips-segfault-repro@test-php-vips-pull-171...kleisauke:alpine-based

I see:

Result summary:

  - # 1 HTTP:200 Memory: 10.19MB / 33.6GB
  - # 2 HTTP:200 Memory: 15.7MB / 33.6GB
  - # 3 HTTP:200 Memory: 15.84MB / 33.6GB
  - # 4 HTTP:200 Memory: 15.96MB / 33.6GB
  - # 5 HTTP:200 Memory: 15.99MB / 33.6GB
  - # 6 HTTP:200 Memory: 16.02MB / 33.6GB
  - # 7 HTTP:200 Memory: 16.02MB / 33.6GB
  - # 8 HTTP:200 Memory: 16.05MB / 33.6GB
  - # 9 HTTP:200 Memory: 16.07MB / 33.6GB
  - # 10 HTTP:200 Memory: 16.08MB / 33.6GB
  Final Memory: 15.54MB / 33.6GB

So, much less memory usage than glibc.
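The Alpine switch can be sketched like this (a sketch only: the base image tag and package names are assumptions; the `vips` package lives in the Alpine community repository):

```dockerfile
# Sketch: musl-based PHP image with libvips and the FFI extension.
FROM php:8.1-fpm-alpine
RUN apk add --no-cache vips libffi-dev $PHPIZE_DEPS \
 && docker-php-ext-install ffi
```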

@kleisauke
Member

FWIW, for the same reason Redis has shipped with jemalloc by default since version 2.4.

Since we introduced the specially encoded data types Redis started suffering from fragmentation. We tried different things to fix the problem, but basically the Linux default allocator in glibc sucks really, really hard. […] Every single case of fragmentation in real world systems was fixed by this change, and also the amount of memory used dropped a bit.

I'm also aware that sharp reduces the default concurrency to 1 when glibc's memory allocator is used; see commit: lovell/sharp@5a9cc83

@acoulton

@jcupitt @kleisauke thank you both very much for your comments, that's extremely helpful.

I'm not hugely familiar with the details of low-level memory allocation but the malloc / emalloc explanation makes sense to me.

I think in the short term we'll explore jemalloc (@craig410 is much hotter on things like that than me so can take a look when he's back at work). And/or we may just perform image operations in a shell process as we work towards a more permanent solution.

Just to respond to a couple of your comments:

If you call shutDown it'll free all the libvips memory for the whole process, so yes, it'll trigger segvs.

Is there any use case where FFI::shutDown() is expected/valid to be called? It might be useful to others in future if there were a bit more information / a warning in the phpdoc for that method to clarify the expected behaviour. I could put together a PR with that and a couple of other possible documentation additions based on what we've learned from this process.

  1. Leave it like this. Is the extra 100mb an issue?

Unfortunately, yes. We are running multiple container-based projects & workloads under kubernetes, and have multiple instances of every service to allow for rolling upgrades / geo-redundancy etc. That has a lot of benefits to us over traditional vhost-based shared servers, but in this specific instance is probably worse because each separate set of apache/mod_php processes will have its own malloc-blocks.

We obviously have memory headroom for spikes, but permanently increasing the baseline memory for every pod to allow for unused but fragmented space would require more / bigger nodes and have a noticeable cost implication on our cluster.

  1. Move image processing to a separate process.

Medium-term our plan is to do just that, but there are some architectural complications with our existing codebases that make that a reasonable job in its own right. And image processing is a relatively small component of the overall workload, which hasn't traditionally taxed resources.

That said, we'd be looking to move processing either to a standalone k8s service or something like Cloud Run, so again there are cost implications to the size of permanent background memory allocation sitting idle.

libvips is fast enough that you could also resize images on demand rather than resizing on upload.

The project we've migrated so far accepts files - mostly photos - for a single internal use. We resize on upload to the specific (quite low-res) version the system needs and only store that, discarding the much larger original images that users tend to upload.

The next project we plan to upgrade does already resize on demand with imagick, albeit behind a CDN & caching filesystem. However we'll probably keep the cache - in our architecture the source files are stored in cloud storage, as appservers are clustered & transient. So there's a cost & latency implication for pulling the originals down before any image operations even begin.

In any case I think resizing on demand would produce more image operations rather than fewer, so would make the memory fragmentation issue worse.

consider switching to an Alpine-based Docker image

This, and migrating to php-fpm, is definitely already on our list to explore and it's really helpful to know that might also help with the fragmentation.

@jcupitt
Member

jcupitt commented Nov 14, 2022

Is there any usecase where FFI::shutDown() is expected/valid to be called?

It also does things like checking for leaks and writing internal profiling information. It's useful if you don't plan to use libvips again in that process, or if you are trying to track down leaks or unexpected memory use (ironically).

@kleisauke
Member

In any case I think resizing on demand would produce more image operations rather than fewer, so would make the memory fragmentation issue worse.

FWIW, the weserv/images project uses libvips extensively and we never had problems with memory fragmentation after we migrated our servers to use jemalloc as the default memory allocator.

We were using php-vips 1.x in production around 2017 and decided to migrate to the OpenResty / LuaJIT / lua-vips combo because PHP didn't have a JIT compiler or integrated FFI functionality at that time. Nowadays, the codebase includes an nginx module around libvips, written in C++.

@jcupitt jcupitt closed this as completed Jun 29, 2023
@DiamondMax

So, no solution to growing memory consumption over multiple requests?

My PHP-FPM workers grow past 2 GB after fewer than 200 requests. I'm using php-vips 1.0.10 to shrink JPEG-XL images (from about 4000×3000 down to 2400 px on the longer side), overlay them with a PNG image via composite2, and encode them to legacy JPEG for output to the browser.

vips_cache_set_max(1) and vips_cache_set_max_files(1) do not help.

I had to set php-fpm pm.max_requests to 200 to respawn worker processes, otherwise PHP-FPM gets OOM-killed.

PHP 8.2.13 with php-vips binary extension
libvips 8.15 (built from sources)
libjxl 0.8.2 (built from sources)
libspng 0.7.3
Debian 12

@jcupitt
Member

jcupitt commented Dec 14, 2023

Hi @DiamondMax,

Using php-vips 1.0.10

I would use php-vips 2.3 (current stable).

@Epskampie

Epskampie commented Dec 18, 2023

I'm seeing the same issue. Using php-vips v2.3.0 on Ubuntu 23.10 (which has libvips 8.14.3-1) with apache2 / mod_php, all running inside Docker.

After around 100 requests with resize operations, every apache thread takes around 1.2GB of virtual memory (checked using htop). I also checked using docker stats, which shows the container taking around 1.8GB:

CONTAINER ID   NAME       CPU %     MEM USAGE / LIMIT    MEM %     NET I/O           BLOCK I/O         PIDS
845bf1a5dfdf   v4-dev-1   2.00%     1.769GiB / 23.4GiB   7.56%     2.62MB / 31.2MB   35.5MB / 1.19GB   14

Perhaps this is all expected, or I'm doing something wrong, but it's quite surprising to see it use that much memory in its default configuration when used from PHP in a web-server environment.

If it is helpful I could make a reproduction using docker, let me know if that would be of use.

@acoulton

FWIW, we never resolved this under apache & mod_php - switching the default system allocator to jemalloc made things worse (presumably due to other aspects of apache / php memory management) and we didn't find a way to use jemalloc just for libvips in this scenario.

Instead, our PHP code in the web process now uses symfony/process to shell out to a standalone PHP script that performs a single set of image transformations and then exits. This does create a small overhead from starting the processes, and from having to write & read images to local disk instead of working with them in memory. But overall memory management works "out of the box", and for our use case, where image requests make up a small % of the total and latency is a secondary concern to ongoing memory allocation, it works well for now. It also saves having to enable PHP FFI in the apache/web process, which is obviously also more secure.
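A rough sketch of that shell-out approach (not our actual code: resize-image.php is a hypothetical standalone worker script, and the paths and arguments are illustrative):

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\Process\Process;

// Illustrative paths; in practice these come from the request.
$inputPath = '/tmp/upload.jpg';
$outputPath = '/tmp/thumb.jpg';

// resize-image.php is a hypothetical script that performs one set of
// php-vips transformations and then exits, so the OS reclaims all of
// its memory when the process terminates.
$process = new Process([
    'php', __DIR__ . '/resize-image.php', $inputPath, $outputPath, '600',
]);
$process->setTimeout(30);
$process->mustRun(); // throws ProcessFailedException on a non-zero exit

$thumbnail = file_get_contents($outputPath);
```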

We are also gradually moving image operations off to Cloud Run, but at the moment still using the same approach so that we can reuse our existing apache-based base image. In due course we'll probably revisit that to look at using VIPS through php-fpm, or possibly even a different language binding.
