Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Forwarding to ephemeral condition variables #24

Closed
wlandau opened this issue Nov 29, 2023 · 15 comments
Closed

Forwarding to ephemeral condition variables #24

wlandau opened this issue Nov 29, 2023 · 15 comments

Comments

@wlandau
Copy link

wlandau commented Nov 29, 2023

For controller groups in crew, I would like to be able to wait on an ephemeral condition variable which gets signals from %~>% and whose lifespan is a call to a function. When I simulate many such calls, my R session becomes unresponsive.

library(nanonext)
cv1 <- cv()
f <- function(cv1) {
  cv2 <- cv()
  cv1 %~>% cv2
  until(cv2, 50)
}
for (i in 1:100) {
  print(i)
  f(cv1)
}
1 + 1
@shikokuchuo
Copy link
Owner

May be a bit clunky, but this at the moment is how it works:

library(nanonext)
cv1 <- cv()
f <- function(cv1) {
  cv2 <- cv()
  cv1 %~>% cv2
  until(cv2, 50)
  attr(cv1, "cv") <- NULL
  gc(verbose = FALSE)
}
for (i in 1:100) {
  print(i)
  f(cv1)
}

@wlandau
Copy link
Author

wlandau commented Nov 29, 2023

Thanks for explaining, that definitely avoids problems with my R session hanging. My only concern is the need to call gc(). In targets, that could add up to a gc() call every half second, which could be a burden on the CPU.

@wlandau
Copy link
Author

wlandau commented Nov 29, 2023

Also, in my case, cv1 (which is mirai::nextget("cv")) is forwarding to 2 different CVs: a persistent CV for the inner crew controller, and an ephemeral CV for the call to wait() for the overarching crew controller group. I am only just realizing that this might cause problems.

@wlandau
Copy link
Author

wlandau commented Nov 29, 2023

On the other hand, maybe I won't need ephemeral CVs if I go with my original plan in wlandau/crew#108 (comment). That might be more robust anyway, if I have nextget("cv") forward to controller-specific CVs and then have those CVs forward to a controller group.

@shikokuchuo
Copy link
Owner

Avoiding the gc call should be trivial, but forwarding to more than one cv might involve more complexity than it's worth... what you describe above sounds more robust.

@shikokuchuo
Copy link
Owner

I'm posting a commit which fixes your original issue so that your initial code block runs. If you want to pursue forwarding to multiple cvs pls do open another issue. Probably not impossible, but the forwarding mechanism wasn't designed with this intention at the outset.

shikokuchuo added a commit that referenced this issue Nov 29, 2023
@wlandau
Copy link
Author

wlandau commented Nov 30, 2023

Thanks, @shikokuchuo! I think I figured out a relaying pattern which does not require forwarding to multiple CVs. I will let you know how its goes.

@wlandau
Copy link
Author

wlandau commented Nov 30, 2023

Happy to report that I have completed wlandau/crew#108 and the new tests are passing.

I initially struggled with intermittently hanging until() loops, but those resolved when I stopped paying attention to the cv_value()s of forwarded condition variables. (With the exception of the upstream mirai::nextget("cv"), where the count is needed, is enough for crew to just respond the signals.)

@wlandau
Copy link
Author

wlandau commented Nov 30, 2023

Currently having issues with my other throughput tests though (Ubuntu, https://github.com/wlandau/crew/blob/main/tests/throughput/test-backlog-seconds_idle.R, https://github.com/wlandau/crew/blob/main/tests/throughput/test-backlog-tasks_max.R). I am looking into it to see if it is something to do with crew itself.

@wlandau
Copy link
Author

wlandau commented Nov 30, 2023

@shikokuchuo, here is a mirai-only reprex. On my local Ubuntu machine, the loop hangs towards the end because it is waiting for 6 tasks which are stuck in an unresolved state. I am using mirai 0.11.2.9017 and nanonext 0.10.4.9024. I will test with the CRAN versions to see if it is something to do with the most recent development versions.

library(mirai)
library(nanonext)
library(purrr)

n <- 20L
daemons(n = n, url = "ws://127.0.0.1:5700")
tasks <- replicate(60000, mirai(TRUE))

while(remaining <- sum(map_lgl(tasks, ~.unresolved(.x)))) {
  message("Waiting for ", remaining, " tasks.")
  stats <- status()$daemons
  for (index in seq_len(n)) {
    if (!stats[index, "online"]) {
      launch_local(
        url = rownames(stats)[index],
        maxtasks = 100
      )
    }
  }
  stats <- status()$daemons
  Sys.sleep(2)
}

@wlandau
Copy link
Author

wlandau commented Nov 30, 2023

Restarting my R session in RStudio after running the test above resulted in a crash.

@wlandau
Copy link
Author

wlandau commented Nov 30, 2023

Odd, when I downgraded to mirai 0.11.2 and nanonext 0.10.4, the same problem happens. I wonder what could be happening on my Ubuntu machine. Mac OS seems to be working fine.

@shikokuchuo
Copy link
Owner

I can't reproduce your hang and I develop on Ubuntu as you know. Recent Rstudio upgrades without clearing old config files have caused issues for me - you could try that. Or simply run at an R prompt not using rstudio.

@shikokuchuo
Copy link
Owner

Not only does your mirai-only example run fine (every time), but I just tested all your updated crew tests (build b336368), including the 2 throughput (backlog) ones you mention above, and they also complete fine, on Ubuntu. I'm going to assume you just ran into some configuration issues, unless you can provide more specifics.

@shikokuchuo
Copy link
Owner

The throughput tests issue was solved via shikokuchuo/mirai@d7b4f59, as confirmed here: wlandau/crew#138 (reply in thread).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants