Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

r.watershed: Handle large datasets greater than INT32_MAX in cells #2884

Merged
merged 6 commits into from
Mar 22, 2024

Conversation

HuidaeCho
Copy link
Member

@HuidaeCho HuidaeCho commented Mar 11, 2023

This PR fixes an unintended integer overflow by the int size_array(..., int, int) function. The new return type of this function is size_t, and takes size_t nrows and ncols. Before this PR, r.watershed was not able to allocate 4 * 43994 * 49529 (about 8.1 GB) because of this integer overflow.

Failed line:

alt = (CELL *)G_malloc(sizeof(CELL) * size_array(&alt_seg, nrows, ncols));

Error:

SECTION 1a (of 4): Initiating Memory.
Current region rows: 43994, cols: 49529
ERROR: G_malloc: unable to allocate 18446744065248018020 bytes of memory at
       raster/r.watershed/ram/init_vars.c:149
WARNING: Subprocess failed with exit code 1

Test code:

#include <stdio.h>

int main(){
    int nrows = 43994;
    int ncols = 49529;
    /* expected size */
    size_t size_expected = sizeof(int) * nrows * ncols;
     /* before this PR with size_array() */
    size_t size_overflow_1 = sizeof(int) * (int)(nrows * ncols);
     /* int size_array() => size_t */
    size_t size_overflow_2 = sizeof(int) * (size_t)(nrows * ncols);
     /* int nrows & ncols for size_array() => size_t */
    size_t size_overflow_3 = sizeof(int) * (int)((size_t)nrows * (size_t)ncols);
     /* this PR */
    size_t size_fixed = sizeof(int) * (size_t)((size_t)nrows * (size_t)ncols);

    printf("size_expected = %zu\n", size_expected);
    printf("size_overflow_1 = %zu\n", size_overflow_1);
    printf("size_overflow_2 = %zu\n", size_overflow_2);
    printf("size_overflow_3 = %zu\n", size_overflow_3);
    printf("size_fixed = %zu\n", size_fixed);
}

Output:

size_expected = 8715915304
size_overflow_1 = 18446744065245597736
size_overflow_2 = 18446744065245597736
size_overflow_3 = 18446744065245597736
size_fixed = 8715915304

@HuidaeCho HuidaeCho added bug Something isn't working raster Related to raster data processing labels Mar 11, 2023
Copy link
Member

@wenzeslaus wenzeslaus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whenever we multiply rows and columns, they should be size_t before multiplication, so the reasoning here fits that.

Can you provide documentation for size_array while at it?

@HuidaeCho
Copy link
Member Author

HuidaeCho commented Mar 11, 2023

Whenever we multiply rows and columns, they should be size_t before multiplication, so the reasoning here fits that.

Can you provide documentation for size_array while at it?

Unfortunately, it still cannot handle larger inputs. Some arrays and variables need to be converted to size_t as well. I'm working on it..., but that can mean more memory consumption for smaller datasets depending on the size of size_t (8 vs. 4 for int in my 64-bit Linux).

@HuidaeCho HuidaeCho marked this pull request as draft March 11, 2023 19:13
@HuidaeCho HuidaeCho changed the title r.watershed: Fix G_malloc size r.watershed: Handle large datasets greater than INT32_MAX in cells Mar 11, 2023
@HuidaeCho HuidaeCho added enhancement New feature or request and removed bug Something isn't working labels Jun 5, 2023
@HuidaeCho HuidaeCho marked this pull request as ready for review June 5, 2023 06:32
@HuidaeCho
Copy link
Member Author

HuidaeCho commented Jun 6, 2023

Verified identical outputs before and after this PR.

For our records, we still have these issues with the -m flag: #2222 and #2482, but they are not related to this PR.

@neteler neteler added this to the 8.4.0 milestone Aug 16, 2023
@landam
Copy link
Member

landam commented Nov 20, 2023

@HuidaeCho Are you planning to merge this PR?

@HuidaeCho
Copy link
Member Author

@HuidaeCho Are you planning to merge this PR?

Do you know how to re-run expired failed tests without making a small commit just to revive them?

@nilason
Copy link
Contributor

nilason commented Nov 21, 2023

@HuidaeCho Are you planning to merge this PR?

Do you know how to re-run expired failed tests without making a small commit just to revive them?

You can always rebase.

@nilason
Copy link
Contributor

nilason commented Nov 21, 2023

For more recently run jobs you can re-run, by going to "Details" and in the upper right side you have:

screen

@github-actions github-actions bot added C Related code is in C module labels Jan 9, 2024
@echoix
Copy link
Member

echoix commented Mar 21, 2024

@HuidaeCho would you like to do a rebase for a fresh CI (101 PRs behind), then merge? If I do it I would appear as co-author. This PR looks ready to go

@HuidaeCho
Copy link
Member Author

@HuidaeCho would you like to do a rebase for a fresh CI (101 PRs behind), then merge? If I do it I would appear as co-author. This PR looks ready to go

@echoix Just rebased it.

@echoix echoix merged commit 259b768 into OSGeo:main Mar 22, 2024
25 checks passed
@wenzeslaus
Copy link
Member

...If I do it I would appear as co-author...

But the co-author is determined (only) by the text in the commit message, no? I think in these cases, the auto-generated co-author should be deleted by whoever is merging it.

@echoix
Copy link
Member

echoix commented Mar 22, 2024

...If I do it I would appear as co-author...

But the co-author is determined (only) by the text in the commit message, no? I think in these cases, the auto-generated co-author should be deleted by whoever is merging it.

Ya, that's what I would have done if it wasn't him, that was still active. The PR already had some rebases done, so I assumed the cleanliness of the commits was important for the author. And it also indirectly asked for a little review to know if everything was still alright with the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C Related code is in C enhancement New feature or request module raster Related to raster data processing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants