
R process not releasing memory; Want aggressive garbage collection #927

Closed
schloerke opened this issue Nov 17, 2023 · 1 comment


@schloerke
Collaborator

Example application or steps to reproduce the problem

Router:

library(plumber)
library(readr)

#* @apiTitle Plumber Test Memory Leak

#* Download data
#* @get /download_data
#* @serializer csv
function() {
	data <- data.frame(
		COLUMN_1 = rep("COLUMN_1",1e+8),
		COLUMN_2 = rep("COLUMN_2",1e+8),
		COLUMN_3 = rep("COLUMN_3",1e+8),
		COLUMN_4 = rep("COLUMN_4",1e+8),
		COLUMN_5 = rep("COLUMN_5",1e+8)
	)
	return(data)
}

#* Release Memory
#* @get /release_memory
function() {
	gc()
	return("ok")
}

Describe the problem in detail

Using the approach from #496 (comment), where LD_PRELOAD is set to a better malloc library, we see a smaller final footprint after gc() is called. However, gc() still has to be triggered manually before the R process footprint shrinks.

How can we call gc() and not slow down our routes?

@schloerke
Collaborator Author

It is possible to add a postserialize hook that will run after the serialization has occurred.

Ex:

library(plumber)
pr() %>%
  pr_hook("postserialize", function(req){
    message("Routing a request for ", req$PATH_INFO)
    # Only run this hook if the request is for the root path
    if (req$PATH_INFO == "/") {
      message("in postserialize")
      later::later(function() {
        message("in postserialize later")
        message("calling gc()!")
        gc()
      }, delay = 0)
      message("exiting postserialize")
    }
  }) %>%
  pr_handle("GET", "/", function(){
    message("in route")
    123
  }) %>%
  pr_run()

Running a request against / gives these messages in the console:

in route
Routing a request for /
in postserialize
exiting postserialize
in postserialize later
calling gc()!

This shows that the gc() call happens after the postserialize hook has exited, and also after the response has been sent (not shown in the print statements).

The logic could be updated to run for every route, but that is a little too aggressive. Try to limit your calls to gc(), as it takes tangible time to run. It is recommended to do this only for routes that are believed or known to need a lot of memory cleanup.
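As a rough sketch of limiting the cleanup to known heavy routes, the hook can be built from a list of paths. The helper name `make_gc_hook`, its `heavy_paths` and `schedule` arguments, and the `plumber.R` file name in the usage comment are hypothetical and only for illustration; `schedule` defaults to `later::later`, as in the example above.

```r
# Hypothetical helper (not part of plumber): build a postserialize hook
# that schedules gc() only for the paths listed in `heavy_paths`.
# `schedule` defaults to later::later; the default is only evaluated when used.
make_gc_hook <- function(heavy_paths, schedule = later::later) {
  function(req) {
    if (req$PATH_INFO %in% heavy_paths) {
      # Defer gc() so that it runs after the hook has returned
      schedule(function() gc(), 0)
    }
    invisible(NULL)
  }
}

# Usage (assumes the plumber and later packages are installed and that the
# router from the issue is saved as "plumber.R"):
# library(plumber)
# pr("plumber.R") %>%
#   pr_hook("postserialize", make_gc_hook("/download_data")) %>%
#   pr_run()
```

Routes that are not listed skip the gc() cost entirely, so only the memory-heavy endpoints trigger cleanup.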

Why wouldn't you add it earlier in the execution of the route?

Your route may use promises or future to keep the main R worker free to execute other requests. We should only run gc() when we need to reduce a large footprint. If gc() is added earlier and a promise-like route executes, gc() would run before the promise-like route has resolved, which would leave the larger footprint from the route in place until a follow-up gc() is called.

Ideally, a route builds up its larger footprint naturally, and once everything for the route has completed, we call gc() to reduce the memory footprint.
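To judge whether a route is heavy enough to deserve a deferred gc(), one rough check is to compare R's own heap accounting before and after a large allocation. This is a minimal sketch using only base R; the helper name `vcells_mb` is hypothetical. It reports the R vector heap as seen by gc(), not the operating system's view of the process footprint, which also depends on the malloc library in use.

```r
# Hypothetical helper: Mb currently used by R vectors (Vcells).
# Note: calling gc() to read the numbers also performs a collection.
vcells_mb <- function() {
  gc()["Vcells", 2]
}

before <- vcells_mb()
x <- numeric(5e6)        # allocate roughly 38 Mb of doubles
during <- vcells_mb()
rm(x)                    # drop the only reference to the vector
after <- vcells_mb()     # the collection can now reclaim the vector

message("allocated: ", round(during - before, 1), " Mb")
message("released:  ", round(during - after, 1), " Mb")
```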
