Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

r.surf.random: double min max for DCELL and r.mapcalc-style random numbers #322

Merged
merged 3 commits into from
Dec 2, 2021

Conversation

wenzeslaus
Copy link
Member

The original code did not allow for floating point numbers
as min and max parameters. The decimal part, if provided, was ignored without
any warning. Now the code accepts doubles for DCELL output and ints for
CELL output. Numbers not fully parseable as ints are now rejected by an
explicit check. (The min and max values travel through the code as doubles
in both cases.)

With the original implementation, the max was could have been exceeded for CELL
maps due to the equation used. On the other hand, the min might not have been reached
with the similar probability as max could have been exceeded for ranges
which contained zero.

For example, in 7.6.1 using 10,000,0000 cells and
-i (CELL output), parameters min=2 max=15 produce histogram (r.report u=p,c)
with bins 2-15 approximatelly 7.14-15% and bin 16 with 2-4 cells.
Same setup with min=-2 max=13 produces similar bin 14 with 0-5 cells,
bin -2 is 0 or 1 cell, bins -1,1-15 6.25% and bin 0 12.50%.

Now using the method r.mapcalc is using for rand()
implemented in f_rand() in xrand.c which does not suffer from the same issues.
For consistency, using the same implementation also for DCELL,
so now both CELL and DCELL options are now using functions from gis.h:
G_mrand48() and G_drand48(). Consequently, the gmath.h dependency
was removed (header file includes and in the Makefile) together with unused math.h.

Additionally to enforcing the int, a parameter check also enforces that min <= max
since min > max seems like a input error. (This is different from r.mapcalc which
flips the values, but there we assume less control over the input and it is too
late to report a message, so it is appropriate and more practical to be more
permissive.) min == max is allowed and not treated in a special way.

Raster history is now being written as well as a map title which contains
the range with interval written differently for CELL and DCELL as
the CELL contains max while DCELL does not (based on [0,1) from G_drand48()).

Test is provided for the min-max comparison and int-double check.
A larger test provided for the generated values as well, although
even with a larger number of cells the desired result might not achieved.
A fixed GRASS_RANDOM_SEED is provided for stability (but it may not be the
one making a future problem to show, although it does for 7.6.2).
Also the exceeded max is much easier to catch than the not reached min,
but both range and closeness are checked for both CELL and DCELL.

@wenzeslaus wenzeslaus added bug Something isn't working enhancement New feature or request labels Feb 1, 2020
@wenzeslaus wenzeslaus self-assigned this Feb 1, 2020
@wenzeslaus wenzeslaus added this to the 8.0.0 milestone Jul 28, 2021
@neteler
Copy link
Member

neteler commented Nov 24, 2021

(flake8 refresh)

@neteler neteler closed this Nov 24, 2021
@neteler neteler reopened this Nov 24, 2021
…mbers

The original code did not allow for floating point numbers
as min and max parameters. The decimal part, if provided, was ignored without
any warning. Now the code accepts doubles for DCELL output and ints for
CELL output. Numbers not fully parseable as ints are now rejected by an
explicit check. (The min and max values travel through the code as doubles
in both cases.)

With the original implementation, the max was could have been exceeded for CELL
maps due to the equation used. On the other hand, the min might not have been reached
with the similar probability as max could have been exceeded for ranges
which contained zero.

For example, in 7.6.1 using 10,000,0000 cells and
-i (CELL output), parameters min=2 max=15 produce histogram (r.report u=p,c)
with bins 2-15 approximatelly 7.14-15% and bin 16 with 2-4 cells.
Same setup with min=-2 max=13 produces similar bin 14 with 0-5 cells,
bin -2 is 0 or 1 cell, bins -1,1-15 6.25% and bin 0 12.50%.

Now using the method r.mapcalc is using for rand()
implemented in f_rand() in xrand.c which does not suffer from the same issues.
For consistency, using the same implementation also for DCELL,
so now both CELL and DCELL options are now using functions from gis.h:
G_mrand48() and G_drand48(). Cconsequently, the gmath.h dependency
was removed (header file includes and in the Makefile) together with unused math.h.

Additionally to enforcing the int, a parameter check also enforces that min <= max
since min > max seems like a input error. (This is different from r.mapcalc which
flips the values, but there we assume less control over the input and it is too
late to report a message, so it is appropriate and more practical to be more
permissive.) min == max is allowed and not treated in a special way.

Raster history is now being written as well as a map title which contains
the range with interval written differently for CELL and DCELL as
the CELL contains max while DCELL does not (based on [0,1) from G_drand48()).

Test is provided for the min-max comparison and int-double check.
A larger test provided for the generated values as well, although
even with a larger number of cells the desired result might not achieved.
A fixed GRASS_RANDOM_SEED is provided for stability (but it may not be the
one making a future problem to show, although it does for 7.6.2).
Also the exceeded max is much easier to catch than the not reached min,
but both range and closeness are checked for both CELL and DCELL.
@wenzeslaus wenzeslaus merged commit 866c7f0 into OSGeo:main Dec 2, 2021
@wenzeslaus wenzeslaus deleted the r-surf-random-double-range branch December 2, 2021 02:01
neteler pushed a commit that referenced this pull request Dec 2, 2021
…mbers (#322)

The original code did not allow for floating point numbers
as min and max parameters. The decimal part, if provided, was ignored without
any warning. Now the code accepts doubles for DCELL output and ints for
CELL output. Numbers not fully parseable as ints are now rejected by an
explicit check. (The min and max values travel through the code as doubles
in both cases.)

With the original implementation, the max was could have been exceeded for CELL
maps due to the equation used. On the other hand, the min might not have been reached
with the similar probability as max could have been exceeded for ranges
which contained zero.

For example, in 7.6.1 using 10,000,0000 cells and
-i (CELL output), parameters min=2 max=15 produce histogram (r.report u=p,c)
with bins 2-15 approximatelly 7.14-15% and bin 16 with 2-4 cells.
Same setup with min=-2 max=13 produces similar bin 14 with 0-5 cells,
bin -2 is 0 or 1 cell, bins -1,1-15 6.25% and bin 0 12.50%.

Now using the method r.mapcalc is using for rand()
implemented in f_rand() in xrand.c which does not suffer from the same issues.
For consistency, using the same implementation also for DCELL,
so now both CELL and DCELL options are now using functions from gis.h:
G_mrand48() and G_drand48(). Consequently, the gmath.h dependency
was removed (header file includes and in the Makefile) together with unused math.h.

Additionally to enforcing the int, a parameter check also enforces that min <= max
since min > max seems like a input error. (This is different from r.mapcalc which
flips the values, but there we assume less control over the input and it is too
late to report a message, so it is appropriate and more practical to be more
permissive.) min == max is allowed and not treated in a special way.

Raster history is now being written as well as a map title which contains
the range with interval written differently for CELL and DCELL as
the CELL contains max while DCELL does not (based on [0,1) from G_drand48()).

Test is provided for the min-max comparison and int-double check.
A larger test provided for the generated values as well, although
even with a larger number of cells the desired result might not achieved.
A fixed GRASS_RANDOM_SEED is provided for stability (but it may not be the
one making a future problem to show, although it does for 7.6.2).
Also the exceeded max is much easier to catch than the not reached min,
but both range and closeness are checked for both CELL and DCELL.
ninsbl pushed a commit to ninsbl/grass that referenced this pull request Oct 26, 2022
…mbers (OSGeo#322)

The original code did not allow for floating point numbers
as min and max parameters. The decimal part, if provided, was ignored without
any warning. Now the code accepts doubles for DCELL output and ints for
CELL output. Numbers not fully parseable as ints are now rejected by an
explicit check. (The min and max values travel through the code as doubles
in both cases.)

With the original implementation, the max was could have been exceeded for CELL
maps due to the equation used. On the other hand, the min might not have been reached
with the similar probability as max could have been exceeded for ranges
which contained zero.

For example, in 7.6.1 using 10,000,0000 cells and
-i (CELL output), parameters min=2 max=15 produce histogram (r.report u=p,c)
with bins 2-15 approximatelly 7.14-15% and bin 16 with 2-4 cells.
Same setup with min=-2 max=13 produces similar bin 14 with 0-5 cells,
bin -2 is 0 or 1 cell, bins -1,1-15 6.25% and bin 0 12.50%.

Now using the method r.mapcalc is using for rand()
implemented in f_rand() in xrand.c which does not suffer from the same issues.
For consistency, using the same implementation also for DCELL,
so now both CELL and DCELL options are now using functions from gis.h:
G_mrand48() and G_drand48(). Consequently, the gmath.h dependency
was removed (header file includes and in the Makefile) together with unused math.h.

Additionally to enforcing the int, a parameter check also enforces that min <= max
since min > max seems like a input error. (This is different from r.mapcalc which
flips the values, but there we assume less control over the input and it is too
late to report a message, so it is appropriate and more practical to be more
permissive.) min == max is allowed and not treated in a special way.

Raster history is now being written as well as a map title which contains
the range with interval written differently for CELL and DCELL as
the CELL contains max while DCELL does not (based on [0,1) from G_drand48()).

Test is provided for the min-max comparison and int-double check.
A larger test provided for the generated values as well, although
even with a larger number of cells the desired result might not achieved.
A fixed GRASS_RANDOM_SEED is provided for stability (but it may not be the
one making a future problem to show, although it does for 7.6.2).
Also the exceeded max is much easier to catch than the not reached min,
but both range and closeness are checked for both CELL and DCELL.
ninsbl pushed a commit to ninsbl/grass that referenced this pull request Feb 17, 2023
…mbers (OSGeo#322)

The original code did not allow for floating point numbers
as min and max parameters. The decimal part, if provided, was ignored without
any warning. Now the code accepts doubles for DCELL output and ints for
CELL output. Numbers not fully parseable as ints are now rejected by an
explicit check. (The min and max values travel through the code as doubles
in both cases.)

With the original implementation, the max was could have been exceeded for CELL
maps due to the equation used. On the other hand, the min might not have been reached
with the similar probability as max could have been exceeded for ranges
which contained zero.

For example, in 7.6.1 using 10,000,0000 cells and
-i (CELL output), parameters min=2 max=15 produce histogram (r.report u=p,c)
with bins 2-15 approximatelly 7.14-15% and bin 16 with 2-4 cells.
Same setup with min=-2 max=13 produces similar bin 14 with 0-5 cells,
bin -2 is 0 or 1 cell, bins -1,1-15 6.25% and bin 0 12.50%.

Now using the method r.mapcalc is using for rand()
implemented in f_rand() in xrand.c which does not suffer from the same issues.
For consistency, using the same implementation also for DCELL,
so now both CELL and DCELL options are now using functions from gis.h:
G_mrand48() and G_drand48(). Consequently, the gmath.h dependency
was removed (header file includes and in the Makefile) together with unused math.h.

Additionally to enforcing the int, a parameter check also enforces that min <= max
since min > max seems like a input error. (This is different from r.mapcalc which
flips the values, but there we assume less control over the input and it is too
late to report a message, so it is appropriate and more practical to be more
permissive.) min == max is allowed and not treated in a special way.

Raster history is now being written as well as a map title which contains
the range with interval written differently for CELL and DCELL as
the CELL contains max while DCELL does not (based on [0,1) from G_drand48()).

Test is provided for the min-max comparison and int-double check.
A larger test provided for the generated values as well, although
even with a larger number of cells the desired result might not achieved.
A fixed GRASS_RANDOM_SEED is provided for stability (but it may not be the
one making a future problem to show, although it does for 7.6.2).
Also the exceeded max is much easier to catch than the not reached min,
but both range and closeness are checked for both CELL and DCELL.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants