Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leak on choice operation of 'get' and 'sleep' #109

Closed
civodul opened this issue Sep 8, 2024 · 3 comments · Fixed by #110
Closed

Memory leak on choice operation of 'get' and 'sleep' #109

civodul opened this issue Sep 8, 2024 · 3 comments · Fixed by #110
Assignees
Labels

Comments

@civodul
Copy link
Collaborator

civodul commented Sep 8, 2024

The code below exhibits a memory leak with Fibers 1.3.1:

(use-modules (fibers)
             (fibers channels)
             (fibers operations)
             (fibers timers)
             (ice-9 textual-ports)
             (ice-9 match)
             (rnrs bytevectors))

(include "heap-profiler.scm")
(sigaction SIGINT
  (lambda _
    (profile-heap)
    (primitive-exit 0)))

(setvbuf (current-output-port) 'none)
(run-fibers
 (lambda ()
   (define channel
     (make-channel))

   (spawn-fiber
    (lambda ()
      (let loop ()
        (put-message channel 'hello)
        (sleep 0.0001)
        (loop))))

   (spawn-fiber
    (lambda ()
      (let loop ()
        (format #t "~%heap: ~a MiB~%"
                (round
                 (/ (assoc-ref (gc-stats) 'heap-size) (expt 2. 20))))
        (sleep 1)
        (loop))))

   (let loop ()
     (perform-operation
      (choice-operation (wrap-operation (sleep-operation 100)
                                        (const 'timeout))
                        (get-operation channel)))
     (loop)))
 #:hz 0
 #:parallelism 1)

(The heap-profiler.scm file comes from here.)

A typical run looks like this:

$ guile fibers-leak.scm 
WARNING: (guile-user): imported module (fibers) overrides core binding `sleep'

heap: 4.0 MiB

heap: 7.0 MiB

heap: 9.0 MiB

heap: 12.0 MiB

heap: 16.0 MiB

heap: 16.0 MiB

heap: 22.0 MiB

heap: 21.0 MiB

heap: 22.0 MiB

heap: 29.0 MiB

heap: 29.0 MiB

heap: 29.0 MiB

heap: 37.0 MiB

heap: 37.0 MiB

heap: 37.0 MiB
  C-c C-c
;;; (#<procedure heap-sections ()>)

;;; (sampling 100000)
  %   type                               self    avg obj size
 54.4 program                             1,565,456    32.1
 19.8 pair                                  571,072    16.0
  8.0 unknown                               229,216  1776.9
  7.0 struct                                201,056    51.6
  2.5 partial-continuation                   70,816    32.0
  2.0 pointer                                56,560    16.0
  1.8 vector                                 52,363    50.8
  0.8 stringbuf                              23,744    68.2
  0.8 vm-continuation                        23,152    17.3
  0.7 symbol                                 21,408    32.0
  0.7 atomic-box                             20,640    17.2
  0.4 bytevector                             11,120   358.7
  0.2 variable                                6,608    20.5
  0.2 string                                  6,080    32.0
  0.2 smob                                    5,344    32.0
  0.2 heap-number                             4,496    30.2
  0.1 hash-table                              4,256    32.0
  0.1 weak-table                              3,600    28.8
  0.0 primitive                                 768    16.0
  0.0 weak-vector                               384    19.2
  0.0 bitvector                                 224    37.3
  0.0 keyword                                   192    16.0
  0.0 dynamic-state                             176    16.0
  0.0 primitive-generic                          96    32.0
  0.0 fluid                                      48    16.0
  0.0 port                                       32    32.0
sampled heap: 2.74554 MiB (heap size: 36.64453 MiB)

This suggests delimited continuations are being accumuated somewhere. It's all fine when commenting out the sleep operation.

@civodul
Copy link
Collaborator Author

civodul commented Sep 8, 2024

This is because the "block" function of timer-operation calls schedule-task-at-time, adding a thunk to the scheduler's queue, and there's nothing that undoes this when the operation "loses" in a choice operation.

Eventually that thunk gets scheduled and removed from the run queue, but that might take too long and cause the memory leak we're seeing.

In the example above, changing from (sleep-operation 100) to (sleep-operation 1) makes a big difference: heap size quickly stabilizes because those timers are removed every second, which is fast enough.

It seems to me that there needs to be a way to evacuate those sleep-operation timers explicitly when a sleep operation is "canceled", to avoid that problem, but operations don't have a "cancel" function. Thoughts? Cc: @wingo

@civodul civodul added the bug label Sep 8, 2024
civodul added a commit to civodul/fibers that referenced this issue Oct 1, 2024
Fixes wingo#109.

Previously, an operation like:

  (choice-operation (sleep-operation 1234) (get-operation channel))

would accumulate timer wheel entries every time the ‘get’ operation wins
over the ‘sleep’ operation, potentially leading to unbounded memory
usage (each ‘sleep’ timer and its associated continuation would remain
on the wheel for 1234 seconds in this case).

This commit fixes it by removing the timer wheel entry as soon as the
timer operation is canceled.

* fibers/timers.scm (timer-operation)[wheel-entry]: New variable.
Set it in block function.  Add cancel function.
* fibers/scheduler.scm (scheduler-timers): Export.
* tests/cancel-timer.scm: New file.
* Makefile.am (TESTS): Add it.
@emixa-d
Copy link
Collaborator

emixa-d commented Oct 30, 2024

(Assigning the ... assignment based on what's happening)

@LiberalArtist
Copy link

This is because the "block" function of timer-operation calls schedule-task-at-time, adding a thunk to the scheduler's queue, and there's nothing that undoes this when the operation "loses" in a choice operation.

It seems to me that there needs to be a way to evacuate those sleep-operation timers explicitly when a sleep operation is "canceled", to avoid that problem, but operations don't have a "cancel" function. Thoughts? Cc: @wingo

This is what the unique-to-Concurrent ML basic combinator withNack (spelled nack-guard-evt in Racket) is supposed to provide!

See #97 and Reppy, Concurrent Programming in ML § 6.5.

civodul added a commit to civodul/fibers that referenced this issue Nov 6, 2024
Fixes wingo#109.

Previously, an operation like:

  (choice-operation (sleep-operation 1234) (get-operation channel))

would accumulate timer wheel entries every time the ‘get’ operation wins
over the ‘sleep’ operation, potentially leading to unbounded memory
usage (each ‘sleep’ timer and its associated continuation would remain
on the wheel for 1234 seconds in this case).

This commit fixes it by removing the timer wheel entry as soon as the
timer operation is canceled.

* fibers/timers.scm (timer-operation)[wheel-entry]: New variable.
Set it in block function.  Use ‘make-timer-operation/internal’ and add
cancel function.
* fibers/scheduler.scm (scheduler-timers): Export.
* tests/cancel-timer.scm: New file.
* Makefile.am (TESTS): Add it.
civodul added a commit to civodul/fibers that referenced this issue Nov 10, 2024
Fixes wingo#109.

Previously, an operation like:

  (choice-operation (sleep-operation 1234) (get-operation channel))

would accumulate timer wheel entries every time the ‘get’ operation wins
over the ‘sleep’ operation, potentially leading to unbounded memory
usage (each ‘sleep’ timer and its associated continuation would remain
on the wheel for 1234 seconds in this case).

This commit fixes it by removing the timer wheel entry as soon as the
timer operation is canceled.

* fibers/timers.scm (timer-operation)[wheel-entry]: New variable.
Set it in block function.  Use ‘make-timer-operation/internal’ and add
cancel function.
* fibers/scheduler.scm (scheduler-timers): Export.
* tests/cancel-timer.scm: New file.
* Makefile.am (TESTS): Add it.
civodul added a commit that referenced this issue Nov 10, 2024
Fixes #109.

Previously, an operation like:

  (choice-operation (sleep-operation 1234) (get-operation channel))

would accumulate timer wheel entries every time the ‘get’ operation wins
over the ‘sleep’ operation, potentially leading to unbounded memory
usage (each ‘sleep’ timer and its associated continuation would remain
on the wheel for 1234 seconds in this case).

This commit fixes it by removing the timer wheel entry as soon as the
timer operation is canceled.

* fibers/timers.scm (timer-operation)[wheel-entry]: New variable.
Set it in block function.  Use ‘make-timer-operation/internal’ and add
cancel function.
* fibers/scheduler.scm (scheduler-timers): Export.
* tests/cancel-timer.scm: New file.
* Makefile.am (TESTS): Add it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants