Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework QueryConstants to add missing values and to remove improper values #1386

Merged
merged 10 commits into from
Oct 26, 2021

Conversation

chipkent
Copy link
Member

@chipkent chipkent commented Oct 4, 2021

QueryConstants contains values that should not exist, such as NEG_INF_BYTE and is missing values such as NaN. These inconsistencies are resolved in this PR.

Resolves #1218 .

@chipkent chipkent added this to the Oct 2021 milestone Oct 4, 2021
@chipkent chipkent requested a review from rcaudy October 4, 2021 21:25
@chipkent chipkent self-assigned this Oct 4, 2021
Copy link
Member

@rcaudy rcaudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should restrict ourselves to just defining nulls. I don't think the code as written makes a great case for defining mins/maxes/infinities.

Util/src/main/java/io/deephaven/util/QueryConstants.java Outdated Show resolved Hide resolved
Util/src/main/java/io/deephaven/util/QueryConstants.java Outdated Show resolved Hide resolved
@@ -2229,7 +2229,7 @@ public static int indexOfMax(DbShortArray values) {
return NULL_INT;
}

short val = NEG_INF_SHORT;
short val = MIN_SHORT;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In all cases, the constant you're using could easily be replaced by the built-in one (Short.MIN_VALUE in this case). It being the NULL_SHORT is irrelevant, since all correct routines separately filter nulls before comparing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually is not true. Java screwed up their definitions of MIN_VALUE for floating point types. For floating-point types, MIN_VALUE is defined as the smallest positive value. For integer types, MIN_VALUE is the smallest number that can be held in the type -- a negative number. If this change is made, then the autogeneration for the old, nasty primitives library breaks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I'm parachuting into your conversation, hope you don't mind.

First, a rebuttal. Java didn't (solely) screw up... The same convention is followed in C, C++, Java, JavaScript, and Python.
(Rust and Go do something saner).

In other words, these constants are identical in the following languages

C C++ Java JavaScript Python
DBL_MIN std::numeric_limits<double>::min() Double.MIN_VALUE Number.MIN_VALUE PyFloat_GetMin()

I'm very much in favor of uniformity, but these languages have licked the cookie, the horse has left the barn, the ship has sailed, etc. I don't think you want to define a constant that sort of reminds people of "double" and "min" and then change established convention, even though we might agree that established convention sucks. Given that I would make three suggestions:

  1. Introduce entirely new terminology (LOWEST, HIGHEST) and use it everywhere
LOWEST_BYTE, LOWEST_DOUBLE, ...
HIGHEST_BYTE, HIGHEST_DOUBLE, ...
  1. (more radical) For maximum completeness you should populate every plausible entry, including MIN_BOOLEAN and MAX_BOOLEAN, MIN_CHAR and MAX_CHAR.

  2. (most radical) The names are getting a little unwieldy anyway. If it were me I might bite the bullet and reorganize the constants into groups like this:

class DHConstants {
	public static class NullValues {
		public static final Boolean BOOLEAN = null;
                // ugh. -1?  really? sad
		public static final char CHAR = Character.MAX_VALUE - 1;
		public static final short SHORT = Short.MIN_VALUE;
		// ... etc
		public static final double DOUBLE = -Double.MAX_VALUE;
	}
	public static class Lowest {
		public static final boolean BOOLEAN = false;
		public static final char CHAR = Character.MIN_VALUE;
		public static final short SHORT = Short.MIN_VALUE;
		// ... etc
		public static final double DOUBLE = Math.nextAfter(-Double.MAX_VALUE, 1.0);
	}
        public static class Highest {
		public static final boolean BOOLEAN = true;
                // -2 I guess? this is also sad
		public static final char CHAR = Character.MAX_VALUE - 2; 
		public static final short SHORT = Short.MAX_VALUE;
		// ... etc
		public static final double DOUBLE = Double.MAX_VALUE;
	}
}

Then you would express names like DHConstants.Lowest.DOUBLE , DHConstants.NullValues.CHAR, and you can use import statements to make them a little more brief.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm all for (1) and (2). They sound very reasonable.

I'll push back on (3). As proposed this is more typing for an interactive user without any clear benefit.

Copy link
Member

@rcaudy rcaudy Oct 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with (1) if you guys like it better. I'm not offended by the current naming, though. Whatever seems cleanest.

I don't like (2) very much. I suppose it makes some sense to define for char, although we haven't needed them thus far. It feels silly to define for Boolean; I guess false is MIN and true is MAX? I don't suppose it hurts anyone if you guys want to add these, but the only place I could see using them is in an implementation of min or max over an array or collection.

I'm not a big fan of (3), for the reasons @chipkent said more gently. Brevity is a virtue here, as long as it doesn't detract from consistency and clarity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the slack discussion, I am not including the MIN/MAX values for boolean or char.

when I started implementing the fix, I really didn't like the LOWEST/HIGHEST naming. If there is pushback, I'll make the change.

I also discovered that the C++ constants (1) included a MIN/MAX for char and (2) didn't include a MIN/MAX for floating points. This is resolved.

@chipkent chipkent requested review from kosak and rcaudy October 22, 2021 21:18
@chipkent
Copy link
Member Author

There have been a number of sidebar conversations trying to figure out what the most sane way to approach these constants is. These discussions have raised questions about bad historical decisions (e.g. MIN_VALUE in Java and similar languages actually being the min positive value). They have also raised questions about what more modern languages are doing.

Modern Languages

Julia

Julia's strategy is to approach things from a logical, mathematical perspective. The min/max value of a floating point type is actually infinite. Other constants are defined via functions with type arguments -- essentially equivalent to constants in DH.

https://docs.julialang.org/en/v1/manual/integers-and-floating-point-numbers/#Special-floating-point-values

  • Defined constants: Inf32 (float), Inf (double), NaN32 (float), NaN(double)
  • (typemin(Float32), typemax(float32)) = (-Inf32, Inf32)
  • (typemin(Float64), typemax(float64)) = (-Inf, Inf)
  • eps(Type)
  • floatmin(Type) - smallest positive normal number
  • floatmax(Type) - largest finite number

Go

Go defines constants for min/max values of integers. It makes explicit the max floating point value and the smallest nonzero floating point value. There is no defined min float value; they rely on symmetry in IEE754.

  • MaxFloat32 / MaxFloat64 - largest finite floats
  • SmallestNonzeroFloat32 / SmallestNonzeroFloat64 - smallest non zero floats
  • math.Inf(sign) - 64b inf
  • math.NaN() - 64b nan.

https://cs.opensource.google/go/go/+/refs/tags/go1.17.2:src/math/const.go

Rust

Rust explicitly defines each constant.

  • std::f64::MAX - largest finite value
  • std::f64::MIN - smallest finite value
  • std::f64:MIN_POSITIVE - smallest positive value
  • std::f64::INFINITY - positive infinity
  • std::f64::NEG_INFINITY - negative infinity
  • std::f64::NAN - nan

https://doc.rust-lang.org/std/f64/constant.MIN_POSITIVE.html

Proposal

Each of the modern languages is taking a somewhat different approach. Julia and Rust are taking approaches that are most consistent with what a numerical developer would do. Julia is clearly indicating that the max values of the float types are infinite. Rust is explicitly indicating the finite min/max values as well as the min positive value. Feedback from the DevRel group indicated that MIN_<TYPE> and MAX_<TYPE> should define the range of the type, as it does in Julia.

This proposal is making a few design decisions:

  1. Constants are defined for non-normal values.
  2. MIN_<TYPE> and MAX_<TYPE> define the range of possible values for the type.
  3. MIN_FINITE_<TYPE> and MAX_FINITE_<TYPE> define the range of possible finite values for the type.
  4. An explicit constant is used for the smallest positive value.

For integer types, these constants will be defined:

  • NULL_<TYPE> - null value
  • MIN_<TYPE> - min value
  • MAX_<TYPE> - max value

For floating point types, these constants will be defined:

  • NULL_<TYPE> - null value
  • NAN_<TYPE> - nan value
  • POS_INFINITY_<TYPE> - positive infinity
  • NEG_INFINITY_<TYPE> - negative infinity
  • MIN_<TYPE> = NEG_INFINITY_<TYPE> - min value is negative infinity
  • MAX_<TYPE> = POS_INFINITY_<TYPE> - max value is positive infinity
  • MIN_FINITE_<TYPE> - minimum finite value
  • MAX_FINITE_<TYPE> - maximum finite value
  • MIN_POSITIVE_<TYPE> - minimum positive value

Questions

  1. Should the name be POS_INFINITY_<TYPE> or INFINITY_<TYPE>, as Rust does it?

@chipkent
Copy link
Member Author

As a note, C++11 has tried to resolve the mess with these constants.
https://en.cppreference.com/w/cpp/types/numeric_limits/lowest

The C++ reference has a warning about unexpected behavior for min.
https://en.cppreference.com/w/cpp/types/numeric_limits/min

@chipkent chipkent merged commit 6bc029b into deephaven:main Oct 26, 2021
@github-actions github-actions bot locked and limited conversation to collaborators Oct 26, 2021
@chipkent chipkent deleted the 1218-delete_or_rename_constants branch December 7, 2021 21:27
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

QueryConstants: Delete or rename spurious constants
3 participants