Rework QueryConstants to add missing values and to remove improper values #1386

chipkent · 2021-10-04T21:25:49Z

QueryConstants contains values that should not exist, such as NEG_INF_BYTE and is missing values such as NaN. These inconsistencies are resolved in this PR.

Resolves #1218 .

rcaudy

I think we should restrict ourselves to just defining nulls. I don't think the code as written makes a great case for defining mins/maxes/infinities.

Util/src/main/java/io/deephaven/util/QueryConstants.java

rcaudy · 2021-10-06T00:55:15Z

DB/src/main/java/io/deephaven/libs/primitives/ShortNumericPrimitives.java

@@ -2229,7 +2229,7 @@ public static int indexOfMax(DbShortArray values) {
            return NULL_INT;
        }

-        short val = NEG_INF_SHORT;
+        short val = MIN_SHORT;


In all cases, the constant you're using could easily be replaced by the built-in one (Short.MIN_VALUE in this case). It being the NULL_SHORT is irrelevant, since all correct routines separately filter nulls before comparing.

This actually is not true. Java screwed up their definitions of MIN_VALUE for floating point types. For floating-point types, MIN_VALUE is defined as the smallest positive value. For integer types, MIN_VALUE is the smallest number that can be held in the type -- a negative number. If this change is made, then the autogeneration for the old, nasty primitives library breaks.

See:
https://docs.oracle.com/javase/8/docs/api/java/lang/Integer.html#MIN_VALUE
https://docs.oracle.com/javase/8/docs/api/java/lang/Double.html#MIN_VALUE

Hi, I'm parachuting into your conversation, hope you don't mind.

First, a rebuttal. Java didn't (solely) screw up... The same convention is followed in C, C++, Java, JavaScript, and Python.
(Rust and Go do something saner).

In other words, these constants are identical in the following languages

C C++ Java JavaScript Python

DBL_MIN std::numeric_limits<double>::min() Double.MIN_VALUE Number.MIN_VALUE PyFloat_GetMin()

I'm very much in favor of uniformity, but these languages have licked the cookie, the horse has left the barn, the ship has sailed, etc. I don't think you want to define a constant that sort of reminds people of "double" and "min" and then change established convention, even though we might agree that established convention sucks. Given that I would make three suggestions:

Introduce entirely new terminology (LOWEST, HIGHEST) and use it everywhere

LOWEST_BYTE, LOWEST_DOUBLE, ... HIGHEST_BYTE, HIGHEST_DOUBLE, ...

(more radical) For maximum completeness you should populate every plausible entry, including MIN_BOOLEAN and MAX_BOOLEAN, MIN_CHAR and MAX_CHAR.

(most radical) The names are getting a little unwieldy anyway. If it were me I might bite the bullet and reorganize the constants into groups like this:

class DHConstants { public static class NullValues { public static final Boolean BOOLEAN = null; // ugh. -1? really? sad public static final char CHAR = Character.MAX_VALUE - 1; public static final short SHORT = Short.MIN_VALUE; // ... etc public static final double DOUBLE = -Double.MAX_VALUE; } public static class Lowest { public static final boolean BOOLEAN = false; public static final char CHAR = Character.MIN_VALUE; public static final short SHORT = Short.MIN_VALUE; // ... etc public static final double DOUBLE = Math.nextAfter(-Double.MAX_VALUE, 1.0); } public static class Highest { public static final boolean BOOLEAN = true; // -2 I guess? this is also sad public static final char CHAR = Character.MAX_VALUE - 2; public static final short SHORT = Short.MAX_VALUE; // ... etc public static final double DOUBLE = Double.MAX_VALUE; } }

Then you would express names like DHConstants.Lowest.DOUBLE , DHConstants.NullValues.CHAR, and you can use import statements to make them a little more brief.

I'm all for (1) and (2). They sound very reasonable.

I'll push back on (3). As proposed this is more typing for an interactive user without any clear benefit.

I'm fine with (1) if you guys like it better. I'm not offended by the current naming, though. Whatever seems cleanest.

I don't like (2) very much. I suppose it makes some sense to define for char, although we haven't needed them thus far. It feels silly to define for Boolean; I guess false is MIN and true is MAX? I don't suppose it hurts anyone if you guys want to add these, but the only place I could see using them is in an implementation of min or max over an array or collection.

I'm not a big fan of (3), for the reasons @chipkent said more gently. Brevity is a virtue here, as long as it doesn't detract from consistency and clarity.

Per the slack discussion, I am not including the MIN/MAX values for boolean or char.

when I started implementing the fix, I really didn't like the LOWEST/HIGHEST naming. If there is pushback, I'll make the change.

I also discovered that the C++ constants (1) included a MIN/MAX for char and (2) didn't include a MIN/MAX for floating points. This is resolved.

chipkent · 2021-10-26T15:14:00Z

There have been a number of sidebar conversations trying to figure out what the most sane way to approach these constants is. These discussions have raised questions about bad historical decisions (e.g. MIN_VALUE in Java and similar languages actually being the min positive value). They have also raised questions about what more modern languages are doing.

Modern Languages

Julia

Julia's strategy is to approach things from a logical, mathematical perspective. The min/max value of a floating point type is actually infinite. Other constants are defined via functions with type arguments -- essentially equivalent to constants in DH.

https://docs.julialang.org/en/v1/manual/integers-and-floating-point-numbers/#Special-floating-point-values

Defined constants: Inf32 (float), Inf (double), NaN32 (float), NaN(double)
(typemin(Float32), typemax(float32)) = (-Inf32, Inf32)
(typemin(Float64), typemax(float64)) = (-Inf, Inf)
eps(Type)
floatmin(Type) - smallest positive normal number
floatmax(Type) - largest finite number

Go

Go defines constants for min/max values of integers. It makes explicit the max floating point value and the smallest nonzero floating point value. There is no defined min float value; they rely on symmetry in IEE754.

MaxFloat32 / MaxFloat64 - largest finite floats
SmallestNonzeroFloat32 / SmallestNonzeroFloat64 - smallest non zero floats
math.Inf(sign) - 64b inf
math.NaN() - 64b nan.

https://cs.opensource.google/go/go/+/refs/tags/go1.17.2:src/math/const.go

Rust

Rust explicitly defines each constant.

std::f64::MAX - largest finite value
std::f64::MIN - smallest finite value
std::f64:MIN_POSITIVE - smallest positive value
std::f64::INFINITY - positive infinity
std::f64::NEG_INFINITY - negative infinity
std::f64::NAN - nan

https://doc.rust-lang.org/std/f64/constant.MIN_POSITIVE.html

Proposal

Each of the modern languages is taking a somewhat different approach. Julia and Rust are taking approaches that are most consistent with what a numerical developer would do. Julia is clearly indicating that the max values of the float types are infinite. Rust is explicitly indicating the finite min/max values as well as the min positive value. Feedback from the DevRel group indicated that MIN_<TYPE> and MAX_<TYPE> should define the range of the type, as it does in Julia.

This proposal is making a few design decisions:

Constants are defined for non-normal values.
MIN_<TYPE> and MAX_<TYPE> define the range of possible values for the type.
MIN_FINITE_<TYPE> and MAX_FINITE_<TYPE> define the range of possible finite values for the type.
An explicit constant is used for the smallest positive value.

For integer types, these constants will be defined:

NULL_<TYPE> - null value
MIN_<TYPE> - min value
MAX_<TYPE> - max value

For floating point types, these constants will be defined:

NULL_<TYPE> - null value
NAN_<TYPE> - nan value
POS_INFINITY_<TYPE> - positive infinity
NEG_INFINITY_<TYPE> - negative infinity
MIN_<TYPE> = NEG_INFINITY_<TYPE> - min value is negative infinity
MAX_<TYPE> = POS_INFINITY_<TYPE> - max value is positive infinity
MIN_FINITE_<TYPE> - minimum finite value
MAX_FINITE_<TYPE> - maximum finite value
MIN_POSITIVE_<TYPE> - minimum positive value

Questions

Should the name be POS_INFINITY_<TYPE> or INFINITY_<TYPE>, as Rust does it?

chipkent · 2021-10-26T20:48:51Z

As a note, C++11 has tried to resolve the mess with these constants.
https://en.cppreference.com/w/cpp/types/numeric_limits/lowest

The C++ reference has a warning about unexpected behavior for min.
https://en.cppreference.com/w/cpp/types/numeric_limits/min

chipkent added 5 commits October 1, 2021 15:35

Updated query constants

8985633

Updated query constant dependencies

b283204

Document QueryConstants

dcd19d0

Reorder QueryConstants by type

79641c6

Fixed buggy code translations

003b9c2

chipkent added DocumentationNeeded breaking labels Oct 4, 2021

chipkent added this to the Oct 2021 milestone Oct 4, 2021

chipkent requested a review from rcaudy October 4, 2021 21:25

chipkent self-assigned this Oct 4, 2021

chipkent and others added 2 commits October 4, 2021 15:29

Suppress unused warnings

5cd636d

Formatted with spotless.

1131a9d

rcaudy reviewed Oct 6, 2021

View reviewed changes

chipkent added 2 commits October 22, 2021 11:43

Use Math.nextUp for the min floating point values

7706e62

Made the C++ constants consistent with the java constants

42756e6

chipkent requested review from kosak and rcaudy October 22, 2021 21:18

Set constants based upon the latest code review discussions

dd9e015

rcaudy approved these changes Oct 26, 2021

View reviewed changes

kosak approved these changes Oct 26, 2021

View reviewed changes

chipkent merged commit 6bc029b into deephaven:main Oct 26, 2021

github-actions bot locked and limited conversation to collaborators Oct 26, 2021

chipkent deleted the 1218-delete_or_rename_constants branch December 7, 2021 21:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rework QueryConstants to add missing values and to remove improper values #1386

Rework QueryConstants to add missing values and to remove improper values #1386

chipkent commented Oct 4, 2021

rcaudy left a comment

rcaudy Oct 6, 2021

chipkent Oct 7, 2021

chipkent Oct 7, 2021

kosak Oct 7, 2021

chipkent Oct 12, 2021

rcaudy Oct 14, 2021 •

edited

Loading

chipkent Oct 22, 2021

chipkent commented Oct 26, 2021

chipkent commented Oct 26, 2021

Rework QueryConstants to add missing values and to remove improper values #1386

Rework QueryConstants to add missing values and to remove improper values #1386

Conversation

chipkent commented Oct 4, 2021

rcaudy left a comment

Choose a reason for hiding this comment

rcaudy Oct 6, 2021

Choose a reason for hiding this comment

chipkent Oct 7, 2021

Choose a reason for hiding this comment

chipkent Oct 7, 2021

Choose a reason for hiding this comment

kosak Oct 7, 2021

Choose a reason for hiding this comment

chipkent Oct 12, 2021

Choose a reason for hiding this comment

rcaudy Oct 14, 2021 • edited Loading

Choose a reason for hiding this comment

chipkent Oct 22, 2021

Choose a reason for hiding this comment

chipkent commented Oct 26, 2021

Modern Languages

Julia

Go

Rust

Proposal

Questions

chipkent commented Oct 26, 2021

rcaudy Oct 14, 2021 •

edited

Loading