
r.univar: Add parallel support #1634

Merged 23 commits into main on Aug 30, 2022

Conversation

@aaronsms (Contributor) commented Jun 11, 2021

This PR implements parallelization for all r.univar options except for when the "extended statistics" flag is set. That path involves dynamic allocation and sorting, which is trickier to parallelize, and is left as future work.
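For readers unfamiliar with the approach, here is a minimal sketch of the kind of OpenMP reduction being described; it is not the actual r.univar code. The flat cells array, the function name, and the parameters are stand-ins, and the real module additionally reads raster rows, skips NULL cells, and handles zones, all of which are omitted here.

/* Sketch only: per-cell statistics accumulated with OpenMP reductions. */
#include <stddef.h>
#include <float.h>

void accumulate(const double *cells, size_t n, int nprocs,
                double *sum, double *sumsq, double *min, double *max)
{
    double s = 0.0, sq = 0.0, mn = DBL_MAX, mx = -DBL_MAX;
    size_t i;

#pragma omp parallel for num_threads(nprocs) \
    reduction(+ : s, sq) reduction(min : mn) reduction(max : mx)
    for (i = 0; i < n; i++) {
        double v = cells[i];

        s += v;
        sq += v * v;
        if (v < mn)
            mn = v;
        if (v > mx)
            mx = v;
    }

    *sum = s;
    *sumsq = sq;
    *min = mn;
    *max = mx;
}

The min and max reduction operators shown here require OpenMP 3.1 or newer (GCC 4.7+).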

Checklists before merging:

  • code review
  • CI passes
  • performance section in documentation
  • confirm values in test are from the old version (run new tests with old code)
  • run tests without OpenMP (runs in CI)
  • visual check of results with custom data ("looks good" with non NC SPM dataset)
  • check that it works with really large data (16B cells)
  • run multi-core benchmark (no degraded performance with many threads)
  • run one-core benchmark on many resolutions or many cell counts
  • rebase to main
  • run valgrind

@wenzeslaus added the gsoc label (Reserved for Google Summer of Code student(s)) on Jun 12, 2021
@wenzeslaus (Member) left a comment:

Just some initial things:

Fails on Ubuntu 18.04 in CI, but not on 20.04. Too old a version of OpenMP?

2021-06-12T13:19:51.6016965Z r.univar_main.c: In function ‘process_raster_threaded’:
2021-06-12T13:19:51.6019743Z r.univar_main.c:500:11: error: ‘value_sz’ is predetermined ‘shared’ for ‘shared’
2021-06-12T13:19:51.6020739Z      shared(stats, fd, fdz, raster_row, zoneraster_row, n, sum, sumsq, sum_abs, min, max, size, region, \
2021-06-12T13:19:51.6021406Z            ^
2021-06-12T13:19:51.6022224Z r.univar_main.c:500:11: error: ‘map_type’ is predetermined ‘shared’ for ‘shared’
2021-06-12T13:19:51.6025019Z r.univar_main.c:500:11: error: ‘n_zones’ is predetermined ‘shared’ for ‘shared’
2021-06-12T13:19:51.6027487Z r.univar_main.c:500:11: error: ‘cols’ is predetermined ‘shared’ for ‘shared’
2021-06-12T13:19:51.6028607Z r.univar_main.c:500:11: error: ‘rows’ is predetermined ‘shared’ for ‘shared’

@@ -0,0 +1,59 @@
"""Benchmarking of r.univar
Member:

test_r_univar.py should not be deleted. This benchmark_r_univar.py file is extra.

Member:

Fails on my Linux machine too with the same message. What is the min version of GCC that this parallel code supports?


Member:

FYI, gcc 5.5.0

Member:

OK, all those variables are const and removing them from shared worked. I think consts are shared by default.
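For reference, a small standalone example of the failure mode, assuming GCC 5.x semantics; it is not taken from the r.univar sources. Older GCC follows the OpenMP rule that a const-qualified variable is predetermined shared and therefore may not be listed in an explicit shared() clause, which is exactly the "predetermined 'shared' for 'shared'" error in the CI log. Omitting such variables from the clause compiles on both old and new compilers, since they are shared either way.

#include <stdio.h>

int main(void)
{
    const int rows = 100;   /* const, so predetermined shared on older GCC */
    long total = 0;
    int i;

    /* GCC 5.x rejects listing "rows" explicitly:
     *   #pragma omp parallel for shared(rows) reduction(+ : total)
     * gives "'rows' is predetermined 'shared' for 'shared'".
     * Leaving it out of the clause works everywhere: */
#pragma omp parallel for reduction(+ : total)
    for (i = 0; i < rows; i++)
        total += i;

    printf("%ld\n", total);

    return 0;
}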

Contributor Author:

I think I must've accidentally deleted the file when separating the scripts into their respective directories; will fix it.

@@ -52,6 +57,14 @@ void set_params()
_("Percentile to calculate (requires extended statistics flag)");
param.percentile->guisection = _("Extended");

param.threads = G_define_option();
param.threads->key = "nprocs";
Member:

Should we add an "nprocs" standard option to the parser (like, e.g., G_OPT_M_NPROCS)?
nprocs is used in several Python scripts as well, so a standard option would give us a harmonized way to handle it...
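A sketch of how the suggested standard option could be wired in, assuming the G_OPT_M_NPROCS option proposed here and added in #1644; the function and variable names are placeholders, not the final r.univar code. Converting the answer with atoi() and falling back to one thread when GRASS is built without OpenMP keeps the module usable on non-OpenMP builds.

#include <stdlib.h>
#include <grass/gis.h>
#include <grass/glocale.h>
#if defined(_OPENMP)
#include <omp.h>
#endif

static struct Option *nprocs_opt;

void set_params(void)
{
    /* replaces the hand-rolled G_define_option() block in the diff above */
    nprocs_opt = G_define_standard_option(G_OPT_M_NPROCS);
}

int set_threads(void)
{
    int threads = atoi(nprocs_opt->answer);

#if defined(_OPENMP)
    omp_set_num_threads(threads);
#else
    if (threads > 1)
        G_warning(_("GRASS GIS was compiled without OpenMP support; "
                    "ignoring nprocs > 1"));
    threads = 1;
#endif

    return threads;
}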

Member:

@ninsbl That's a good idea.

Member:

Please check #1644.

Contributor Author:

Right, I'll modify the code to use the standard option.

@marisn (Contributor) left a comment:

My OpenMP knowledge is too weak to judge if this is the right approach :-(

(Resolved review comments on raster/r.univar/r.univar_main.c, now outdated.)
@HuidaeCho (Member) commented Jun 19, 2021

I have an i5-7300U with 2 cores and 4 threads on my laptop. What does this module do with nprocs > 4? The benchmark script runs fine with up to 12 threads, but I thought my CPU could only do up to 4 threads; I might be wrong. Interestingly, more threads didn't always mean faster. See my results below. Is it possible to determine the maximum number of threads in the code and print a warning if nprocs is greater than that? Also, please consider implementing a fallback terminal size of 80 for redirected benchmarking output; os.get_terminal_size() raises OSError: [Errno 25] Inappropriate ioctl for device on python3 benchmark_r_univar.py >& benchmark_r_univar.log.

r.univar map=elevation,elevation,elevation,elevation,elevation,elevation,elevation,elevation,elevation,elevation percentile=90.0 nprocs=1 separator=pipe -g

 1 thread(s): 0.1885438919067383 s
 2 thread(s): 0.1510328769683838 s
 3 thread(s): 0.18701410293579102 s
 4 thread(s): 0.2988687038421631 s
 5 thread(s): 0.14902639389038086 s
 6 thread(s): 0.16875758171081542 s
 7 thread(s): 0.16923255920410157 s
 8 thread(s): 0.20600790977478028 s
 9 thread(s): 0.1692127227783203 s
10 thread(s): 0.13888721466064452 s
11 thread(s): 0.16875271797180175 s
12 thread(s): 0.17763447761535645 s

r.univar map=elevation,elevation,elevation,elevation,elevation,elevation,elevation,elevation,elevation,elevation zones=basin_50K percentile=90.0 nprocs=1 separator=pipe -g

 1 thread(s): 0.27266292572021483 s
 2 thread(s): 0.17659420967102052 s
 3 thread(s): 0.2019331455230713 s
 4 thread(s): 0.42590484619140623 s
 5 thread(s): 0.2039196491241455 s
 6 thread(s): 0.1847921848297119 s
 7 thread(s): 0.1929023265838623 s
 8 thread(s): 0.22778925895690919 s
 9 thread(s): 0.3162210464477539 s
10 thread(s): 0.2332921504974365 s
11 thread(s): 0.2835871696472168 s
12 thread(s): 0.24240994453430176 s

@wenzeslaus (Member):
I tested on a 4-core/8-thread processor and made some additions to the benchmark script. Here is a test up to nprocs=16 with 10 runs and two additional scenarios. The data is still too small, I think, so that's still a todo.

[screenshot: benchmark plot, 2021-06-19 15-57-35]

@wenzeslaus (Member):
What does this module do with nprocs > 4? ... Is it possible to determine the maximum number of threads in the code and print a warning if nprocs is greater than that?

I don't think that needs a warning. You have to explicitly ask for n threads, and you likely know the number of cores/threads on the machine you are using, or a look into the process manager or the specs can tell you. So a warning is really not necessary, since what you asked for is most likely what you meant. Additionally, whether the result is an improvement or a degradation depends on the specific setup, so the warning could not claim it will be worse anyway.

Well, now, automatically detecting the number of cores with nprocs=auto or by default, that's a different story!

@HuidaeCho (Member) commented Jun 19, 2021

I don't think that needs a warning. You have to explicitly ask for n threads, and you likely know the number of cores/threads on the machine

I know I have 4 threads, but I don't know what it's doing with 12 threads when I only have 4. What does it even mean? It needs to be explained at least.

https://forum.openmp.org/viewtopic.php?t=209

@HuidaeCho (Member):

I tested on a 4-core/8-thread processor and made some additions to the benchmark script. Here is a test up to nprocs=16 with 10 runs and two additional scenarios. The data is still too small, I think, so that's still a todo.

[screenshot: benchmark plot, 2021-06-19 15-57-35]

That looks nice. Interesting that it's not happening on my machine.

@HuidaeCho (Member):

omp_get_num_threads()?
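A hypothetical sketch of the check being discussed, not code from this PR. Note that omp_get_num_threads() only reports the size of the current team (1 outside a parallel region), so omp_get_num_procs() is the call that actually reports the available logical processors; the function name check_thread_count and the plain fprintf warning are stand-ins for whatever the module would use.

#include <stdio.h>
#include <omp.h>

void check_thread_count(int requested)
{
    /* logical processors visible to the program */
    int available = omp_get_num_procs();

    if (requested > available)
        fprintf(stderr,
                "WARNING: %d threads requested, but only %d logical "
                "processors are available; extra threads will just "
                "time-share the cores\n", requested, available);

    omp_set_num_threads(requested);
}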

@petrasovaa (Contributor):

r.univar on a desktop with 28 cores, slope map, 16832104560 cells:
[benchmark plot]
Any idea what is going on?

@HuidaeCho (Member):

@aaronsms Please check this first module. That jump at 14 processes (half of the 28 cores) is interesting. @petrasovaa What is the name of the CPU? Does it have 28 cores with 56 threads, or 14 cores with 28 threads (usually 2 threads per core)?

@HuidaeCho (Member) commented Jul 30, 2021

OK, in the weekly meeting with @aaronsms, @petrasovaa confirmed that it has 14 physical cores and 28 threads. Maybe it's related to this fact. The only question is why it jumps at 14, not at 14+1=15, where it starts to fully occupy one core for the first time (?).

Not sure about this actually. Is it

  • core 1 thread 1 => core 1 thread 2 => core 2 thread 1 => core 2 thread 2 => ... (depth first?), or
  • core 1 thread 1 => core 2 thread 1 => core 3 thread 1 => ... => core 1 thread 2 => core 2 thread 2 => ... (breadth first?)

(A small OpenMP placement probe is sketched below.)
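Which of those two orders you actually get is decided by the OpenMP runtime and by the OMP_PLACES / OMP_PROC_BIND environment variables (a "close" binding tends to pack threads onto neighbouring hardware threads, "spread" distributes them across cores). The following stand-alone probe, which is not part of this PR, can show the mapping on a given machine; it needs OpenMP 4.5 (GCC 6 or newer), and with no binding policy set omp_get_place_num() simply returns -1 and the OS scheduler decides.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    printf("places: %d\n", omp_get_num_places());

#pragma omp parallel
    {
        /* each thread reports which place (e.g. core) it landed on */
        printf("thread %d -> place %d\n",
               omp_get_thread_num(), omp_get_place_num());
    }

    return 0;
}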

@marisn (Contributor) previously requested changes, Aug 1, 2021:

I could not run the benchmark, as it ended with an OOM in r.surf.fractal with 100000000 cells in the region. Is there something that can be done to adapt to the RAM size of the machine instead of failing with OOM and thus losing all results?

(Resolved review comments on raster/r.univar/r.univar_main.c, now outdated.)
@HuidaeCho added the raster (Related to raster data processing) and enhancement (New feature or request) labels on Aug 1, 2021
@wenzeslaus (Member):

Is there something that can be done to adapt to the RAM size of the machine instead of failing with OOM and thus losing all results?

Unlike the tests, which should just run everywhere (although skipped in some cases), for benchmarks we don't have any notion of portability conceptualized. For example, the grass.benchmark package helps you write a benchmark, but it does not tell you how to write it (i.e., you can write a benchmark without using grass.benchmark and that's perfectly fine). So far, we have been adapting the benchmark scripts to the test we wanted to run. Suggestions welcome.

As for this particular case (OOM), do you envision the Python code doing some heuristics on the memory requirements of r.surf.fractal versus the size of your RAM and running the benchmarks accordingly?

@marisn (Contributor) commented Aug 2, 2021

As for this particular case (OOM), do you envision the Python code doing some heuristics on the memory requirements of r.surf.fractal versus the size of your RAM and running the benchmarks accordingly?

Heuristics would be good but might need too much work. Probably the easiest solution would be to add a try/except around the r.surf.fractal calls so the benchmark does not fail if an OOM situation is encountered.

@petrasovaa (Contributor):

r.univar on a desktop with 28 cores, slope map, 16832104560 cells:
[benchmark plot]
Any idea what is going on?

Repeated my benchmark with the latest code; this looks much better!
[benchmark plot]

@aaronsms (Contributor, Author) commented Aug 18, 2021

@petrasovaa Yes, I suspect the earlier problem was due to memory bandwidth limits or false sharing caused by cache inefficiencies. I made an effort to refactor so that threads are now less likely to share the same cache lines when accessing variables, which should give a higher cache hit rate. So I will no longer include the issue of performance degrading when the threads are overloaded. I believe we need to check this for other modules as well.
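To illustrate the false-sharing point in general terms (this is a generic sketch, not the refactored r.univar code; the names and sizes are made up): when per-thread accumulators sit next to each other in memory they share cache lines, and every update by one thread invalidates that line for the others. Padding each slot to a full cache line, or better, accumulating into a thread-local variable and writing shared memory only once at the end, avoids the ping-pong.

#include <omp.h>

#define MAX_THREADS 64          /* assumes at most 64 threads */
#define CACHE_LINE  64

/* Prone to false sharing: neighbouring slots share a cache line. */
double sums_packed[MAX_THREADS];

/* One mitigation: pad each per-thread slot to its own cache line. */
struct padded {
    double value;
    char pad[CACHE_LINE - sizeof(double)];
};
static struct padded sums_padded[MAX_THREADS];

void accumulate(const double *cells, long n)
{
#pragma omp parallel
    {
        int t = omp_get_thread_num();
        double local = 0.0;     /* hot counter stays in a register */
        long i;

#pragma omp for
        for (i = 0; i < n; i++)
            local += cells[i];

        sums_padded[t].value = local;   /* touch shared memory once */
    }
}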

@aaronsms marked this pull request as ready for review on August 18, 2021 07:38
@wenzeslaus added this to the 8.2.0 milestone on Aug 24, 2021
@petrasovaa (Contributor):

With extended statistics (with nprocs=1) valgrind is getting mad:

==90673== Invalid write of size 2
==90673==    at 0x4842B33: memmove (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==90673==    by 0x10C225: process_raster._omp_fn.0 (r.univar_main.c:441)
==90673==    by 0x4A988E5: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==90673==    by 0x10B6F8: process_raster (r.univar_main.c:342)
==90673==    by 0x10B1FC: main (r.univar_main.c:240)
==90673==  Address 0xee42980 is 0 bytes after a block of size 36,000 alloc'd
==90673==    at 0x483DFAF: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==90673==    by 0x4899F82: G__realloc (alloc.c:126)
==90673==    by 0x10C344: process_raster._omp_fn.0 (r.univar_main.c:431)
==90673==    by 0x4A988E5: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==90673==    by 0x10B6F8: process_raster (r.univar_main.c:342)
==90673==    by 0x10B1FC: main (r.univar_main.c:240)
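For readers decoding the valgrind output: "Invalid write of size 2 ... 0 bytes after a block" means a 2-byte store landed immediately past the end of a buffer that had just been grown with realloc. The following is a generic illustration of that failure pattern, not the actual r.univar code or its fix; the function, types, and names are invented.

#include <stdlib.h>
#include <string.h>

/* grow "buf" by "len" shorts and append "row" */
short *grow_and_append(short *buf, size_t *n, const short *row, size_t len)
{
    buf = realloc(buf, (*n + len) * sizeof(short));
    if (!buf)
        exit(EXIT_FAILURE);

    /* Overrun: copies one element more than was just allocated, so the
     * last 2-byte write lands right after the block, matching valgrind's
     * "Invalid write of size 2 ... 0 bytes after a block" report. */
    memmove(buf + *n, row, (len + 1) * sizeof(short));

    /* Correct version: memmove(buf + *n, row, len * sizeof(short)); */

    *n += len;
    return buf;
}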

@petrasovaa (Contributor):

The valgrind problem was fixed. When extended stats are requested, only a single thread is used.
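A hypothetical guard showing how such a fallback can look in a GRASS module; the names (nprocs, extended_requested) are stand-ins, and this is not necessarily the code that was merged.

#include <grass/gis.h>
#include <grass/glocale.h>

/* Return the thread count the module should actually use. */
static int effective_threads(int nprocs, int extended_requested)
{
    if (extended_requested && nprocs > 1) {
        G_verbose_message(_("Extended statistics are not parallelized; "
                            "falling back to a single thread"));
        return 1;
    }
    return nprocs;
}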

@petrasovaa merged commit 7a51911 into OSGeo:main on Aug 30, 2022
ninsbl pushed a commit to ninsbl/grass that referenced this pull request Oct 26, 2022
Co-authored-by: Anna Petrasova <kratochanna@gmail.com>
ninsbl pushed a commit to ninsbl/grass that referenced this pull request Feb 17, 2023
Co-authored-by: Anna Petrasova <kratochanna@gmail.com>
neteler pushed a commit to nilason/grass that referenced this pull request Nov 7, 2023
Co-authored-by: Anna Petrasova <kratochanna@gmail.com>