Multi-threaded dump files #39

Closed
avsdev-cw opened this issue Feb 27, 2019 · 4 comments

@avsdev-cw

When running in a multi-threaded mode (e.g. using the future package), error dump files get overwritten if more than one thread throws an error at the same time, because the file names are not unique.

Consider adding thread handling or adding the process ID to the file name (or appending milliseconds and looping while the file already exists, which by its nature shifts the file name by a few milliseconds and thus makes it unique).
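A minimal sketch of the "append milliseconds and loop while the file exists" idea, in base R only. The helper name `next_free_dump_name` and its arguments are hypothetical and not part of tryCatchLog. Note that checking `file.exists()` and then writing the file is not atomic, so two processes can still pick the same name in the same millisecond; that is one reason to also include the process ID.

```r
# Hypothetical helper, NOT part of tryCatchLog: return a dump file name
# (with a millisecond time stamp) that does not exist yet in `dir`.
next_free_dump_name <- function(dir = ".", max_tries = 1000L) {
  for (i in seq_len(max_tries)) {
    # %OS3 prints the seconds with three decimal places (milliseconds)
    stamp <- format(Sys.time(), "%Y%m%d_%H%M%OS3")
    candidate <- file.path(dir, paste0("dump_", stamp, ".rda"))
    if (!file.exists(candidate)) {
      return(candidate)
    }
    # Name already taken: wait a millisecond so the next stamp differs
    Sys.sleep(0.001)
  }
  stop("No free dump file name found after ", max_tries, " tries")
}
```

Usage: `save(last.dump, file = next_free_dump_name())` would write to a name that was free at the moment of the check.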

@aryoda
Owner

aryoda commented Feb 27, 2019

Thanks for reporting this issue (excellent observation; I had never thought about name clashes caused by multi-threading).

It would be great if you could provide a minimal reproducible example so that I can examine different solution options.

I am also thinking about another R option to support configuring the file naming pattern for the dump file (which would support a simple work-around to change the name prefix for each thread/sub process without making tryCatchLog aware of running in a multi-threaded context).

It seems difficult to find out the thread [Edit: Removed "and process"] ID of the R session with base R (without using a specialized package), and I want to minimize the package dependencies.

@aryoda
Owner

aryoda commented Feb 27, 2019

Internal implementation notes (will be updated incrementally):

@avsdev-cw
Author

avsdev-cw commented Feb 28, 2019

Slightly over-engineered reproducible example (suggest saving to file in empty directory & running):

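library(future)
library(tryCatchLog)

# Run futures in parallel worker processes
# (later releases of future deprecate "multiprocess" in favour of "multisession")
plan(multiprocess)
# Make tryCatchLog write a dump file for every logged error
options(tryCatchLog.write.error.dump.file = TRUE)

for (ii in 1:10) {
    # Two futures that both fail, so both try to write a dump file
    f1 <- future({ tryCatchLog({ stop("Error 1") }) }, lazy = TRUE)
    f2 <- future({ tryCatchLog({ stop("Error 2") }) }, lazy = TRUE)

    resolve(list(f1, f2))

    # Two dump files are expected; fewer means one was overwritten
    dumpFiles <- list.files("./", pattern = "dump_.*\\.rda")
    cat("Found", length(dumpFiles), "dump files\n")

    # Show which error each dump file actually contains
    sapply(dumpFiles, function(dumpFile) {
        e <- new.env(parent = emptyenv())
        load(dumpFile, envir = e)
        cat("\t[", dumpFile, "] Error in file is:", e$last.dump[[length(e$last.dump)]]$log.message, "\n")
    })

    # Clean up before the next iteration
    sapply(dumpFiles, file.remove)
}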

Note: I used the HenrikBengtsson/future package (available on CRAN) to produce the threads.

@aryoda aryoda closed this as completed in b713974 Mar 13, 2019
@aryoda
Owner

aryoda commented Mar 13, 2019

Bug fixed.

Diagnostics:

Dump files may be overwritten when multiple errors occur at the same second in the same or parallel processes.

Solution:

Create a (hopefully) unique dump file name that includes milliseconds and the process ID, e.g. dump_2019-03-13_at_15-39-33.086_PID_15270.rda
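A name in this format can be built with base R alone, since `Sys.getpid()` returns the process ID of the current R session. The following is only a sketch of the naming scheme, not the code from the fix commit; the helper name `dump_file_name` and its arguments are hypothetical.

```r
# Hypothetical helper, NOT the actual tryCatchLog implementation:
# build a dump file name from a time stamp with milliseconds and a process ID.
dump_file_name <- function(time = Sys.time(), pid = Sys.getpid()) {
  # %OS3 prints the seconds with three decimal places (milliseconds)
  stamp <- format(time, "%Y-%m-%d_at_%H-%M-%OS3")
  paste0("dump_", stamp, "_PID_", pid, ".rda")
}

# Each parallel worker process has its own PID, so names created by
# different workers differ even when the time stamps are identical.
print(dump_file_name())
```

The name is only "hopefully" unique: two errors raised within the same millisecond in the same process would still map to the same file.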
