Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

r.patch: implement parallelization with OpenMP #1782

Merged
merged 9 commits into from
Mar 4, 2022

Conversation

aaronsms
Copy link
Contributor

@aaronsms aaronsms commented Aug 10, 2021

Checklists before merging:

  • code review
  • CI passes
  • performance section in documentation
  • confirm values in test are from the old version (run new tests with old code)
  • run tests without openmp
  • visual check of results with custom data ("looks good" with non NC SPM dataset)
  • check that it works with really large data (16B cells)
  • run multi-core benchmark (no degraded performance with many threads)
  • run one core benchmark on many resolutions or many cell
  • rebase to main


int computed = 0;
int written = 0;

Copy link
Member

@HuidaeCho HuidaeCho Aug 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

row = 0; to avoid

main.c:250:5: warning: ‘row’ may be used uninitialized in this function [-Wmaybe-uninitialized]
     G_percent(row, nrows, 2);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeap fixed it! Supposed to be G_percent(nrows, nrows, 2).

@HuidaeCho
Copy link
Member

Please add a test script.

@wenzeslaus wenzeslaus added enhancement New feature or request gsoc Reserved for Google Summer of Code student(s) raster Related to raster data processing labels Aug 24, 2021
@wenzeslaus wenzeslaus added this to the 8.2.0 milestone Aug 24, 2021
@wenzeslaus
Copy link
Member

Compiling with GCC gives:

main.c: In function ‘main’:
main.c:250:5: warning: ‘row’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  250 |     G_percent(row, nrows, 2);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~

@wenzeslaus
Copy link
Member

This will also need a update/rebase/merge with the main branch because there is a new flake8 job expected which needs to be completed before the merge to main can happen.

@aaronsms
Copy link
Contributor Author

aaronsms commented Dec 5, 2021

I have fixed the compilation issues as well as rebased to the main branch.

@petrasovaa
Copy link
Contributor

Results from the benchmark:
rpatch_benchmark

@petrasovaa
Copy link
Contributor

petrasovaa commented Jan 20, 2022

Here is a benchmark patching multiple tiles in 16 bil raster:

rpatch_benchmark_large_1

@aaronsms Any idea why the memory has no effect?

@aaronsms
Copy link
Contributor Author

aaronsms commented Jan 24, 2022

@aaronsms Any idea why the memory has no effect?

The result is interesting, I should expect the 4000MB should not be worse than 300MB at the very least (is it possible to run in between 300MB and 4000MB?). The memory option allows user to choose the amount of memory to allocate to enable parallelism, but it does not always improve the performance, as long as it is not below some level of memory, where the chunk size gets too small and the threads start waiting for each other.

This begs the question: is the memory option then needed, if it turns out that even at very large files (16B), this optimal threshold of memory does not grow very large, then since the user should expect to use 200/300 MB anyway, perhaps that can be hidden from the user?

Initially, the memory option is added because the initial implementation use close to nothing memory, but to enable parallelism, we need to start using the memory, and currently we are allowing the users to choose that amount.

@petrasovaa
Copy link
Contributor

Here is a benchmark on a much smaller area, looks like with memory=0 there seems to be some slow down with a lot threads.

rpatch_benchmark_large

@aaronsms
Copy link
Contributor Author

For memory=0, it auto sets to the bare minimum required for it to work, which is the number of threads * 1 row.

@petrasovaa petrasovaa self-requested a review February 27, 2022 05:08
@petrasovaa petrasovaa merged commit cb68b54 into OSGeo:main Mar 4, 2022
ninsbl pushed a commit to ninsbl/grass that referenced this pull request Oct 26, 2022
ninsbl pushed a commit to ninsbl/grass that referenced this pull request Feb 17, 2023
neteler pushed a commit to nilason/grass that referenced this pull request Nov 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request gsoc Reserved for Google Summer of Code student(s) raster Related to raster data processing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants