parallel processing hangs up on run_swat2012 #38
Comments
Hi @seyounger, this error is new to me. A few things I could think of:
You could check the first two points; I could try the last one myself and see if I can reproduce your issue. Otherwise it will be difficult for me to find a solution.
Thanks, @chrisschuerz, this is helpful. Can you tell me which package versions SWATplusR is built on? I tried to find that information but didn't know where to look.
In the R package DESCRIPTION file I usually list the versions that I built and tested the R package with. This is to keep users from using much older versions of the dependencies. But of course newer versions can also cause problems. I will update my packages and see if I run into trouble as well.
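A minimal base-R sketch of comparing installed versions against such minimums, assuming you copy the versions from the DESCRIPTION file into a named vector by hand. `check_min_versions()` is a hypothetical helper for illustration, not part of SWATplusR.

```r
# Hypothetical helper: compare installed package versions against minimum
# versions, e.g. those listed in a package's DESCRIPTION file.
check_min_versions <- function(required) {
  pkgs <- names(required)
  # Installed version as a string, or NA if the package is not installed
  installed <- vapply(pkgs, function(pkg) {
    if (requireNamespace(pkg, quietly = TRUE)) {
      as.character(utils::packageVersion(pkg))
    } else {
      NA_character_
    }
  }, character(1), USE.NAMES = FALSE)
  # TRUE when the installed version is at least the required one
  ok <- vapply(seq_along(pkgs), function(i) {
    !is.na(installed[i]) &&
      utils::compareVersion(installed[i], required[[i]]) >= 0
  }, logical(1))
  data.frame(package = pkgs, required = unname(required),
             installed = installed, ok = ok, stringsAsFactors = FALSE)
}

# Hypothetical minimum versions, for illustration only
check_min_versions(c(readr = "1.3.1", processx = "3.4.5"))
```

To see the minimums SWATplusR itself declares, `utils::packageDescription("SWATplusR")$Imports` prints the `Imports` field of the installed package's DESCRIPTION file.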
@seyounger A quick update: I just updated all my R package dependencies and ran the demo model with 8 parameter combinations on 4 cores. I did encounter an issue with the updated R packages, but unfortunately a different one from what you described above. So I am not sure whether the fix I just made for that issue is of any help to you.
Thanks for checking. I installed the updated version, but that didn't help. Then I set all my packages to the versions in the DESCRIPTION file, or as close to them as possible, but I'm still experiencing hang-ups with no message at all; it just stops. I'm now having the same issue with single-core optimization runs as well. They go for several hundred runs but hang before completion. Let me know if there is anything else I can do to help find the problem. Here is my current session info.
Hello @seyounger, I am having the same issue. I believe the problem is one of the packages that handle parallelization: mine stopped working after a few updates, and even downgrading the package did not solve it. As a temporary fix I switched to Microsoft R Open 4.0.2, and so far so good. I cannot explain the technical details, but I hope you can also find a solution for your problem. R version 4.0.2 (2020-06-22)
Thanks @EdbertoLima, I tried your method but was unable to install SWATplusR under that version of MRO due to a dependency incompatibility: processx 3.4.5 is required, but only 3.4.3 is available.
Okay, I got processx 3.4.5 installed by downloading the tar.gz from the archive, and I am now running under MRO 4.0.2. I'll report back whether this works for me.
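A minimal sketch of installing an archived source release, assuming the tarball comes from CRAN's standard `src/contrib/Archive/<pkg>/` layout. `cran_archive_url()` is a hypothetical helper; the install call is left commented because it needs network access, and since processx contains compiled code, building it from source on Windows should require Rtools.

```r
# Hypothetical helper: build the CRAN archive URL for an older
# source release of a package.
cran_archive_url <- function(pkg, version,
                             repo = "https://cran.r-project.org") {
  sprintf("%s/src/contrib/Archive/%s/%s_%s.tar.gz", repo, pkg, pkg, version)
}

url <- cran_archive_url("processx", "3.4.5")
print(url)

# Install from the source tarball (requires network access and, on
# Windows, Rtools). Dependencies are not resolved with repos = NULL.
# install.packages(url, repos = NULL, type = "source")
```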
The workaround from @EdbertoLima of using MRO 4.0.2 seems to work; I've tested it successfully on three Windows machines. Now that I have a working library, I have archived it in case anything changes, and I would recommend that anyone with a working library back it up, because it may not stay that way. I'm leaving this issue open because this is a workaround, not a fix. I wish I understood the code structure better so I could help troubleshoot.
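One way to document a working library, as a minimal base-R sketch: record every installed package and its version to a CSV file so a known-good setup can be compared or rebuilt later. `snapshot_library()` is a hypothetical helper; it records versions only and does not copy package files. Copying the library folders listed by `.libPaths()`, or a tool such as `renv::snapshot()`, are alternatives.

```r
# Hypothetical helper: write name, version, and library path of every
# installed package, plus the R version, to a CSV file.
snapshot_library <- function(file, lib = .libPaths()) {
  ip <- utils::installed.packages(lib.loc = lib)
  snap <- as.data.frame(ip[, c("Package", "Version", "LibPath"), drop = FALSE],
                        stringsAsFactors = FALSE)
  snap$R_version <- paste(R.version$major, R.version$minor, sep = ".")
  utils::write.csv(snap, file, row.names = FALSE)
  invisible(snap)
}

snap <- snapshot_library(file.path(tempdir(), "package_snapshot.csv"))
head(snap)
```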
@seyounger and @EdbertoLima, sorry that it took me so long to figure out what is going on with parallel processing. It already took me a while to trigger the issue reliably enough to work on it. But as I now updated Edit: New
Issue #38 was introduced by the update of `readr` to version 2.0. `readr` 2.0 introduced lazy reading, which locked input and output files on Windows. This caused the simulations to hang. `lazy = FALSE` was set in `readr::read_lines()` and `readr::read_fwf()`, and `readr::read_lines()` was replaced with `base::readLines()`.
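The idea behind the fix is to read files fully into memory so that nothing keeps them open when the next simulation overwrites them. Lazy reading in `readr` 2.0 is built on vroom, which memory-maps files; that would be consistent with the `user-mapped section` error reported below, though this is an interpretation, not something stated in the thread. A minimal sketch, assuming plain text files; `read_eager()` is a hypothetical helper, not SWATplusR's actual code.

```r
# Hypothetical helper: read a text file completely into memory so that
# no connection or memory map stays open on it afterwards.
read_eager <- function(path) {
  if (requireNamespace("readr", quietly = TRUE) &&
      utils::packageVersion("readr") >= "2.0.0") {
    # readr 2.0 reads lazily by default; lazy = FALSE forces an eager read
    readr::read_lines(path, lazy = FALSE)
  } else {
    # base::readLines() reads eagerly and closes the file when done
    readLines(path, warn = FALSE)
  }
}

path <- tempfile(fileext = ".txt")
writeLines(c("line 1", "line 2"), path)
x <- read_eager(path)
# With the content in memory, the file can be overwritten right away,
# e.g. when the next simulation rewrites it.
writeLines(c(x, "line 3"), path)
print(read_eager(path))
```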
I was previously running many SWAT2012 iterations on multiple cores without issue, but now my workers are getting hung up. This happens with the demo project as well as with my own projects. Sometimes I get errors like "forrtl: The operation cannot be performed on a file with a user-mapped section open." Other times there is no error; it just gets stuck in an infinite loop, and the workers keep running even after closing R.
For example, the run below hangs in about 30 percent of the times I run it. It usually says "performing simulation 308 of 400" but never advances; sometimes it hangs on the last run. Hang-ups seem more frequent with more complicated parameter sets.
```
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] sensitivity_1.26.0 lubridate_1.7.10 fast_0.64 sf_1.0-1
[5] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4
[9] readr_2.0.0 tidyr_1.1.3 tibble_3.1.3 ggplot2_3.3.5
[13] tidyverse_1.3.1 hydroGOF_0.4-0 zoo_1.8-9 SWATplusR_0.3.5.1
loaded via a namespace (and not attached):
[1] fs_1.5.0 xts_0.12.1 bit64_4.0.5 httr_1.4.2
[5] tools_4.1.0 backports_1.2.1 utf8_1.2.2 R6_2.5.0
[9] KernSmooth_2.23-20 DBI_1.1.1 colorspace_2.0-2 withr_2.4.2
[13] sp_1.4-5 tidyselect_1.1.1 processx_3.5.2 hydroTSM_0.6-0
[17] bit_4.0.4 compiler_4.1.0 automap_1.0-14 cli_3.0.1
[21] rvest_1.0.1 gstat_2.0-7 xml2_1.3.2 scales_1.1.1
[25] classInt_0.4-3 proxy_0.4-26 digest_0.6.27 foreign_0.8-81
[29] rmarkdown_2.9 pkgconfig_2.0.3 htmltools_0.5.1.1 dbplyr_2.1.1
[33] fastmap_1.1.0 rlang_0.4.11 readxl_1.3.1 numbers_0.8-2
[37] rstudioapi_0.13 RSQLite_2.2.7 FNN_1.1.3 generics_0.1.0
[41] jsonlite_1.7.2 magrittr_2.0.1 Rcpp_1.0.7 munsell_0.5.0
[45] fansi_0.5.0 lifecycle_1.0.0 stringi_1.7.3 yaml_2.2.1
[49] plyr_1.8.6 grid_4.1.0 maptools_1.1-1 blob_1.2.2
[53] parallel_4.1.0 crayon_1.4.1 doSNOW_1.0.19 lattice_0.20-44
[57] haven_2.4.1 hms_1.1.0 knitr_1.33 ps_1.6.0
[61] pillar_1.6.1 boot_1.3-28 spacetime_1.2-5 codetools_0.2-18
[65] reprex_2.0.0 glue_1.4.2 evaluate_0.14 modelr_0.1.8
[69] vctrs_0.3.8 tzdb_0.1.2 foreach_1.5.1 cellranger_1.1.0
[73] gtable_0.3.0 reshape_0.8.8 assertthat_0.2.1 cachem_1.0.5
[77] xfun_0.24 broom_0.7.9 e1071_1.7-7 class_7.3-19
[81] snow_0.4-3 intervals_0.15.2 iterators_1.0.13 memoise_2.0.0
[85] units_0.7-2 ellipsis_0.3.2
```