-
-
Notifications
You must be signed in to change notification settings - Fork 307
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
r.texture: support parallel computing by OpenMP #3857
Conversation
I checked the specs of your CPU. It is a 4 cores/8 threads CPU using Intel Hyper-Threading Technology. I believe it is just a pipelining technique to reduce idle instructions. We could speed up daily-use instructions by using two threads per core. However, for heavy computing programs, one thread can exhaust all computing power in a core, so using threads more than the number of cores may not further reduce runtime. The v.surf. rst benchmark from your CPU has the same behavior. The runtime does not reduce after 4 threads. (https://github.com/cyliang368/grass/blob/c7cbf386c97a2a60f0b7319addc54d811e5b5a16/vector/v.surf.rst/vsurfrst_benchmark.png) The easiest way to check may be to run the tool with 4 threads and check if the CPU usage is 100% on the performance monitor. |
I will try to run it on my desktop and see if I get different behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please convert the images to pngs? Also the naming of the images should start with r_texture_... and they should be at the same level as the html file (not in figure directory).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thank you!
threads = atoi(parm.nproc->answer); | ||
#if defined(_OPENMP) | ||
/* Set the number of threads */ | ||
omp_set_num_threads(threads); | ||
if (threads > 1) | ||
G_message(_("Using %d threads for parallel computing."), threads); | ||
#else | ||
if (threads > 1) { | ||
G_warning(_("GRASS GIS is not compiled with OpenMP support, parallel " | ||
"computation is disabled.")); | ||
threads = 1; | ||
} | ||
#endif | ||
execute_texture(data, &dim, measure_menu, measure_idx, &out_set, threads); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cyliang368 Since the PR was merged, and later on a coverity scan was made, it created a new defect here.
Do you see if you're able to fix it in a new small PR? I don't completely see yet how it can divide by zero through a modulo.
** CID 1548584: Integer handling issues (DIVIDE_BY_ZERO)
________________________________________________________________________________________________________
*** CID 1548584: Integer handling issues (DIVIDE_BY_ZERO)
/raster/r.texture/main.c: 328 in main()
322 if (threads > 1) {
323 G_warning(_("GRASS GIS is not compiled with OpenMP support, parallel "
324 "computation is disabled."));
325 threads = 1;
326 }
327 #endif
CID 1548584: Integer handling issues (DIVIDE_BY_ZERO)
In function call "execute_texture", modulo by expression "threads" which may be zero has undefined behavior.
328 execute_texture(data, &dim, measure_menu, measure_idx, &out_set, threads);
329
330 for (i = 0; i < dim.n_outputs; i++) {
331 Rast_close(outfd[i]);
332 Rast_short_history(mapname[i], "raster", &history);
333 Rast_command_history(&history);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know when DIVIDE_BY_ZERO will happen, but I am sure the number of threads shouldn't be 0 or negative. I just submitted a new PR (#3917) to fix it.
@cyliang368 Just curious - do you have any insights about RAM usage increase through this parallelization, esp. with large raster maps? |
@neteler Through #3785 and #3857, the memory used for two local variables will increase with multithreading:
For a large raster with many columns, the first one makes the most increase in RAM usage. Let's say a raster has 100M cells (10k x 10k), and we want to get all texture measurements (13). I assume |
@cyliang368 From what you explained, different threads never read in the same rows? |
@HuidaeCho Different threads never write to the same row, but the values they read should have some overlaps. Here are the brief steps of what this module does:
|
This would mean the module being unable to process rasters larger tan RAM, am I correct? |
You are correct! I am curious about the computational time used by ROWIO. ROWIO fetches data from disk in every iteration, and its latency (thousands of cycles or more) could be hundreds of times longer than fetching from RAM (200~400 cycles). If we fetch all data from disk to RAM, this latency only happens once. If we do ROWIO, this latency happens in every iteration. I remember that some modules set a maximum RAM usage size. ROWIO will be used when the data size is larger than that. Otherwise, we just store all the data at once. Dividing a raster into multiple tiles could also be a solution, but I don't know how this works in GRASS. Has anyone compared the performance between ROWIO and tiling? I am not sure whether my understanding above is correct. Overall, this module cannot handle large-size data well now, but I won't improve it immediately. Instead, I will probably post this enhancement as an issue, then do something else and come back later. |
This is the reason why I mentioned that improvements in ROWIO or SEGMENT libraries beneficial for working with large datasets / in parallel should be made.
Yes, this is an approach I took (keep all data or at least keep all rows hit by a sliding window in RAM). I only added a wrapper in between that hides internals (does row comes from RAM or is per-fetched from a disk does not impact the outcome of computation, just its speed).
Disk layout of raster data is row based thus ROWIO is the closest to the native layout. If a sliding window uses size > 1, then there is also a problem on vertical edges of tiles (with ROWIO only with horizontal ones). Thus a generic / best practice example would be nice. But this is not a discussion for this PR, it is a more general issue.
No need to. This is not the only module that can not handle datasets that do not fit into RAM. The only thing you must do is to create a new PR with a note in the module documentation explaining this limitation. |
Just to clarify, @cyliang368 did not change this behavior, so he is not responsible for fixing documentation, anybody can do this. |
* refactor to use local vars instead of global vars * refactor to seperate execute part for parallelization * remove unused variable * divide h_mearsure.h into paired header files * refactor parameters and flags in main.c * remove unused variable 'threads' in main.c * add execute_parallel.c * parallelize execute, add tests and benchmarks * rename test and benchmark files, integrate parallel part in execute.c * Run benchmark on all methods * get thread id only if OpenMP is defined * replace omp firstprivate with omp private * remove tid, and only use trow to aviod no OpenMP support situation * add keyword "parallel" in main.c * replace the list of functions with the -a flag in benchmark * add benchmark for window sizes, plot speedup and efficiency * remove basename list in benchmark, add benchmark results to html * replace 'x' with '&OSGeo#215' for window sizes in html * breakdown long lines in html * remove known issue in r.texture.html * modify the formats and paths of figures
This PR parallelizes
execute.c
inr.texture
module by OpenMP and creates parallelization benchmarks. It was tested on my fork repository (cyliang368#9).