-
-
Notifications
You must be signed in to change notification settings - Fork 307
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
r.univar: Add parallel support #1634
Conversation
6e6581b
to
d9fc79b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some initial things:
Fails on Ubuntu 18.04, but not 20.04 in CI, too old version of OpenMP?
2021-06-12T13:19:51.6016965Z r.univar_main.c: In function ‘process_raster_threaded’:
2021-06-12T13:19:51.6019743Z r.univar_main.c:500:11: error: ‘value_sz’ is predetermined ‘shared’ for ‘shared’
2021-06-12T13:19:51.6020739Z shared(stats, fd, fdz, raster_row, zoneraster_row, n, sum, sumsq, sum_abs, min, max, size, region, \
2021-06-12T13:19:51.6021406Z ^
2021-06-12T13:19:51.6022224Z r.univar_main.c:500:11: error: ‘map_type’ is predetermined ‘shared’ for ‘shared’
2021-06-12T13:19:51.6025019Z r.univar_main.c:500:11: error: ‘n_zones’ is predetermined ‘shared’ for ‘shared’
2021-06-12T13:19:51.6027487Z r.univar_main.c:500:11: error: ‘cols’ is predetermined ‘shared’ for ‘shared’
2021-06-12T13:19:51.6028607Z r.univar_main.c:500:11: error: ‘rows’ is predetermined ‘shared’ for ‘shared’
@@ -0,0 +1,59 @@ | |||
"""Benchmarking of r.univar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_r_univar.py should not be deleted. This benchmark_r_univar.py file is extra.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fails on my Linux machine too with the same message. What is the min version of GCC that this parallel code supports?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, gcc 5.5.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, all those variables are const
and removing them from shared
worked. I think const
s are shared by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I must've accidentally deleted the file when separating the scripts to respective directories, will fix it.
raster/r.univar/r.univar_main.c
Outdated
@@ -52,6 +57,14 @@ void set_params() | |||
_("Percentile to calculate (requires extended statistics flag)"); | |||
param.percentile->guisection = _("Extended"); | |||
|
|||
param.threads = G_define_option(); | |||
param.threads->key = "nprocs"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a "nproc" standard option for the parser (like e.g G_OPT_M_NPROCS)?
nprocs
is used in several Python scripts as well. So having a standard option could secure a harmonized way to handle such an option...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ninsbl That's a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check #1644.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I'll modify the code using the standard option.
33021ac
to
75967b4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My OpenMP knowledge is too weak to judge if this is the right approach :-(
I have i5-7300U with 2 cores and 4 threads on my laptop. What does this module do with
|
I don't think that needs a warning. You need to explicitly say you want n threads and likely you know the number cores/threads on the machine you are using or a look into process manager or specs can tell you. So, warning is really no necessary since likely that's what you meant. Additionally, what will be the result (improvement or degradation) depends on the specific setup, so the warning anyway cannot claim it will be worse. Well, now, automatically detecting number of cores with nprocs=auto or by default that's a different story! |
I know I have 4 threads, but I don't know what it's doing with 12 threads when I only have 4. What does it even mean? It needs to be explained at least. |
|
@aaronsms Please check this first module. That jump at 14 processes (half of the 28 cores) is interesting. @petrasovaa What is the name of the CPU? Does it have 28 cores with 56 threads or 14 cores with 28 threads (usually 2 threads per core)? |
OK, in the weekly meeting with @aaronsms, @petrasovaa confirmed that it has 14 physical cores and 28 threads. Maybe, it's related to this fact. The only thing is why it jumps at 14, not at 14+1=15 where it starts to fully occupy one core for the first time (?). Not sure about this actually, is it
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could not run the benchmark as it ended with OOM in r.surf.fractal with 100000000 cells in a region. Is there something that can be done to adapt to the RAM size of machine instead of failing with OOM and thus losing all results?
Unlike for the test, which should just run everywhere (although skipped in some cases), for benchmarks, we don't have a any portability conceptualized. For example, the grass.benchmark package helps you write a benchmark, but does not tell you how to write it (i.e., you can write a benchmark without using grass.benchmark and that's perfectly fine). So far, we were adapting the benchmark scripts to the test we wanted to do. Suggestions welcome. As for this particular case (OOM), do you envision the Python code to do some heuristics for memory requirements of r.surf.fractal versus your size of your RAM and do benchmarks according to that? |
Heuristics would be good, but might need too much work. Probably the easiest solution would be to add try: except: around r.surf.fractal calls to not fail if OOM situation is encountered. |
17db69d
to
9bca1bb
Compare
9bca1bb
to
5fbe1a7
Compare
5fbe1a7
to
e0f335a
Compare
@petrasovaa yea I suspect the issues previously was due to memory bandwidth issues or false sharing due to cache inefficiencies. So I made some effort to refactor such that threads now are less likely to share the same cache for accessing variables, to achieve higher cache hits. So I won't no longer include the issues regarding worsening performance with threads overloading. I believe we need to check this for other modules as well. |
Fix flake8: bare exceptions
75b0375
to
d441024
Compare
With extended statistics (with nprocs=1) valgrind is getting mad:
|
The valgrind problem was fixed. When extended stats are requested, only single thread is used. |
Co-authored-by: Anna Petrasova <kratochanna@gmail.com>
Co-authored-by: Anna Petrasova <kratochanna@gmail.com>
Co-authored-by: Anna Petrasova <kratochanna@gmail.com>
This PR implements parallelization for all r.univar options except for when "extended (stats)" flag is set to true. This is because there are dynamic allocation and sorting involved, which is trickier to implement, but this could be a work for the future.
Checklists before merging: