-
Notifications
You must be signed in to change notification settings - Fork 170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Addition of a mask to stdlib_experimental_stats function mean
#128
Comments
That looks good to me. It would appear as "optional" due to the interface to the users, so I think that's what matters, and the above is just an implementation detail how to actually implement it. |
A larger philosophical question is whether we have to implement |
Why would you not implement it on everey function? Also, from rank 3 to rank 7 or 15, it would auto-generated. |
The only downside that I can see is that it will be a lot of functions. But I think we want to support the "mask" argument. The other approach would be that somehow the user would run the mask on the array before passing it to |
I think it would be good to have the same API as |
I agree. |
Excellent proposal. I only have a problem with the following statement:
Is this not dangerous? The user might think that zero is a legitimate mean value. Can't we return (a IEEE?) |
Good point. The intrinsic
So, for the real(dp)::res
if (.not.optval(mask, .true.) then
res = res /0._dp
return
end if Note that GFortran does not allow expressions like |
I would be careful with the nans. I usually turn them off in my production codes. Is there any current intrinsic function in Fortran that can return nan? I think returning zero would work. Stopping the program with an error would also work, but the issue is that then |
I think this resonates in some way with the discussion had about Still, I think mathematically stands that although the sum of no values is zero, the mean of no values is undefined. |
Having had to deal with Nan in a lot of programming environments, the most reliable way I found may seem surprising, but is function nan64(value)
use,intrinsic :: iso_fortran_env, only: real64
implicit none
character(len=*),parameter::ident="@(#)M_units:: nan64(3fp): Returns a NAN (Not a number) of type real64"
character(len=3),save :: STRING='NaN'
real(kind=real64) :: nan64,value
read(STRING,*)nan64
end function nan64 which is one part of a generic nan(value) where the returned value is the type of value that could be auto-generated. Note the string cannot be a parameter, which seems non-intuitive; but internal reads cannot read from a constant. The rest of the generic definition is quite repetitive but (I think) obvious. It's a long story, but a lot of "standard" ways of defining a NaN are subject to whether the compiler supports the IEEE standard, whether optimizations change the definition or test (I have an is_nan function too), and so on. So far, reading the string "NaN" has proved portable and is supported by the Fortran standard as far back as I could find (if anyone knows when the earliest support by the standard was, feel free to add it here). |
@urbanjost interesting. A few questions: what's the purpose of the |
I totally agree with @epagone statement, and I should have thought about it sooner: the Regarding our implementation of module function mean_2_mask_sp_sp(x, dim, mask) result(res)
real(sp), intent(in) :: x(:,:)
logical, intent(in) :: mask(:,:)
integer, intent(in) :: dim
real(sp) :: res(merge(size(x, 1), size(x, 2), mask = 1 < dim ))
select case(dim)
case(1)
res = sum(x, 1, mask) / real(count(mask, 1), sp)
case(2)
res = sum(x, 2, mask) / real(count(mask, 2), sp)
case default
call error_stop("ERROR (mean): wrong dimension")
end select
end function mean_2_mask_sp_sp This implementation may return an array If we aim to return module function mean_2_all_sp_sp(x, mask) result(res)
real(sp), intent(in) :: x(:,:)
logical, intent(in), optional :: mask
real(sp) :: res
if (.not.optval(mask, .true.)) then
res = ??? 0._sp / value / @urbanjost NaN????
return
end if
res = sum(x) / real(size(x, kind = int64), sp)
end function mean_2_all_sp_sp then, for consistency across all Probably, IMHO, the best way to be consistent with module function mean_2_all_sp_sp(x, mask) result(res)
real(sp), intent(in) :: x(:,:)
logical, intent(in), optional :: mask
real(sp) :: res
if (.not.optval(mask, .true.)) then
res = sum(x, .false.) / count(.false.)
return
end if
res = sum(x) / real(size(x, kind = int64), sp)
end function mean_2_all_sp_sp With such an implementation, |
The intrinsic function
I suggest to avoid to return 0 (or any other value, or @urbanjost NaN), at least for consistenty with other functions of |
FYI: Regarding questions on nan64(3f) function: I extracted the function to generate a NaN from a generic function, where VALUE was used merely to select the kind of the NaN. The bit pattern of NaN is dependent on the KIND, the unused value was just used to determine kind. The IDENT variable is a whole other discussion, not really pertinent to the Nan function per-se, The IDENT is used like a C #ident directive. I (still) use metadata like that to automatically build indexes of .code. This particular syntax is often optimized away as an unused variable but is still useful in the source. I use a preprocessor that handles some of the issues with such data to keep it in the binaries. Didn't mean to cause confusion; the basic idea is the most reliable way to generate a NaN that I have found is to do an internal read of the string 'NaN'. The IDENT stuff is very useful for QA and configuration control and automated documentation but is not well supported by Fortran. Many systems even include a place for metadata in executables but it does not seem to be used a lot anymore; but I need it for a variety of reasons. I would ask for support of metadata strings as an extension to Fortran but (surprisingly to me) even #ident is not well supported in C. Lets me automatically build an index like
I have my own what(1) command. It used to be nearly ubiquitous on Unix systems but came with SCCS, which has largely been superseded by mg, git, ... Anyway, ignore the IDENT string, with the exception that maybe there should be some kind of discussion of metadata and Fortran. The nan(3f) and is_nan(3f) functions are in M_units.f90 in my GPF github collection |
@urbanjost I see, thanks for the background. @jvdp1 I think it's probably fine to return NaN, as I make them raise a runtime exception if a NaN is used in my production codes. |
In my view, the only reasonable value to return in this case is a nan. It's also behavior consistent with numpy. If intrinsic modules like |
@milancurcic With use ieee_arithmetic, only: ieee_value, ieee_quiet_nan
implicit none
real::res
res = ieee_value(res, ieee_quiet_nan) This implementation is also not kind-dependent. The only issue with IEEE is that it may not supported by all compilers since it is Fortran 2003. However, this is not a problem with preprocessing (there is already a flag for Fortran 90 in the So, would be a solution like the following one acceptable for #:if VERSION90
res = 0._sp
res = res / res
#:else
res = ieee_value(res, ieee_quiet_nan)
#:endif The IEEE NaN would be returned if the compiler support |
I've used the transfer method for defining NaN for while now. Is there any reason why this is a bad idea? real(sp) :: NaN = transfer(Z'7FF80000', 1.0)
real(dp) :: NaN64 = transfer(Z'7FF0000000000001', 1._dp) |
It is probably a way to implement that in a module From Intel REAL(4), PARAMETER :: FOR_S_SNAN = TRANSFER((/ Z'7FA00000' /),1.0_4)
REAL(4), PARAMETER :: FOR_S_QNAN = TRANSFER((/ Z'7FC00000' /),1.0_4) |
I see nothing wrong with using real, parameter :: nan = b'11111111110000000000000000000000'
real, parameter :: neginf = b'11111111100000000000000000000000'
real, parameter :: posinf = b'01111111100000000000000000000000' However, if |
So, it seems that everybody agrees that Currently, and until (if) res = ieee_value(res, ieee_quiet_nan) Both |
I think it's reasonable to expect F2003 features to work. IEEE arithmetic features seem to be supported by most compilers, their recent versions that is, see table on page 12: https://www.fortranplus.co.uk/app/download/30202489/fortran_2003_2008_2018_compiler_support.pdf |
So are we just going to assume all real kinds on all target compilers and platforms are IEEE compliant? Or at least have a concept of "nan" that resembles IEEE's? This is relevant to #118 as well. |
I would simply test it out (see #15). And if it works everywhere, then use it. Note that "inf" are disabled if "-ffast-math" is used, which is the default in my production codes. But otherwise stdlib can use them I think, just like sqrt(-1.) returns a NaN. I tested this simple code: real :: a
a = -3
print *, sqrt(a)
end with:
and it prints:
So I think the NaN might work even with
and then the code above gives an exception:
That way any such invalid operation such as sqrt of negative numbers is caught right when it happens with a nice stacktrace. So I suggest we use NaN in the same way as a way to trigger an exception, assuming it could be done. |
So this code: program test_none
use ieee_arithmetic, only: ieee_quiet_nan, ieee_value
implicit none
real :: nan
real :: a
nan = ieee_value(nan, ieee_quiet_nan)
a = -3
print *, a, nan
a = nan
print *, a, nan
end triggers the exception at the
So we would not be able to actually assign it to a |
I then tried the following code: program test_none
implicit none
real, parameter :: NaN = transfer(Z'7FF80000', 1.0)
real :: a
a = -3
print *, a, NaN
a = NaN
print *, a, NaN
a = a + 5
print *, a, NaN
end and that unfortunately will not trigger any exception:
I don't know how that works, but somehow |
Conclusion: from the above analysis, I would recommend the |
I had a look in GCC res = -1
res = srqt(res) So I guess it could be also a way to generate ....
elemental real(kind=4) function IEEE_VALUE_4(X, CLASS) result(res)
real(kind=4), intent(in) :: X
type(IEEE_CLASS_TYPE), intent(in) :: CLASS
logical flag
select case (CLASS%hidden)
case (1) ! IEEE_SIGNALING_NAN
if (ieee_support_halting(ieee_invalid)) then
call ieee_get_halting_mode(ieee_invalid, flag)
call ieee_set_halting_mode(ieee_invalid, .false.)
end if
res = -1
res = sqrt(res)
if (ieee_support_halting(ieee_invalid)) then
call ieee_set_halting_mode(ieee_invalid, flag)
end if
case (2) ! IEEE_QUIET_NAN
if (ieee_support_halting(ieee_invalid)) then
call ieee_get_halting_mode(ieee_invalid, flag)
call ieee_set_halting_mode(ieee_invalid, .false.)
end if
res = -1
res = sqrt(res)
if (ieee_support_halting(ieee_invalid)) then
call ieee_set_halting_mode(ieee_invalid, flag)
end if
... |
Regarding the discussion on the return value for zero-size arrays or masked arrays with no true entries: the behavior for maxval and minval is to return -huge(x). I think this is preferable to NaN because it can be tested for with non-IEEE floating point types. For instance, I know IBM's real(16) is non-IEEE. |
When I decide to go down this route, in my codes I usually introduce a |
I don't agree it is preferable to return -huge(x) (or something different than To avoid the dependency on IEEE res = ieee_value(res, ieee_quiet_nan) by res = -1
res = sqrt(res) (as implemented in or res = 0._dp
res = sum(x, .false.) / res ? Using |
I would suggest we provide a function |
Current
(Related to #113)
The current spec for
stdlib
mean
is:mean
- mean of array elementsDescription
Returns the mean of all the elements of
array
, or of the elements ofarray
along dimensiondim
if provided.Syntax
result = mean(array)
result = mean(array, dim)
Arguments
array
: Shall be an array of typeinteger
, orreal
.dim
: Shall be a scalar of typeinteger
with a value in the range from 1 to n, where n is the rank ofarray
.Return value
If
array
is of typereal
, the result is of the same type asarray
.If
array
is of typeinteger
, the result is of typedouble precision
.If
dim
is absent, a scalar with the mean of all elements inarray
is returned. Otherwise, an array of rank n-1, where n equals the rank ofarray
, and a shape similar to that ofarray
with dimensiondim
dropped is returned.Proposal
I would like to propose to add
mask
into the API ofmean
as follows (similar to the intrinsicsum
):mean
- mean of array elementsDescription
Returns the mean of all the elements of
array
, or of the elements ofarray
along dimensiondim
if provided, and if the corresponding element inmask
istrue
.Syntax
result = mean(array [, mask])
result = mean(array, dim [, mask])
Arguments
array
: Shall be an array of typeinteger
, orreal
.dim
: Shall be a scalar of typeinteger
with a value in the range from 1 to n, where n is the rank ofarray
.mask
(optional): Shall be of typelogical
and either be a scalar or an array of the same shape asarray
.Return value
If
array
is of typereal
, the result is of the same type asarray
.If
array
is of typeinteger
, the result is of typedouble precision
.If
dim
is absent, a scalar with the mean of all elements inarray
is returned. Otherwise, an array of rank n-1, where n equals the rank ofarray
, and a shape similar to that ofarray
with dimensiondim
dropped is returned.If
mask
is specified, the result is the mean of all elements ofarray
corresponding totrue
elements ofmask
. If every element ofmask
isfalse
, the result is zero.This definition of
mask
formean
agrees with the definition ofmask
of the intrinsicsum
in the Fortran Standard draft:p. 429
:and
p. 9
:Implementation
Each function for
mean
, e.g.,should then become:
I kown there was some discussions about
dim
and if it should be considered as optional or not in the spec. I guess the same discussion could appear here sincemask
is notoptional
in one of the 2 functions. However, it will behave likeoptional
for the users through the interfacemean
.The text was updated successfully, but these errors were encountered: