Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #1365: add integral number codecs, tests #1367

Merged
merged 25 commits into from
Sep 3, 2024

Conversation

tatu-at-datastax
Copy link
Contributor

@tatu-at-datastax tatu-at-datastax commented Aug 27, 2024

What this PR does:

Changes table-shredding to convert JSON values (filter, insert) to use 3 types -- Long, BigInteger, BigDecimal -- instead of just BigDecimal, for less wasteful conversions (BigDecimal use for integral numbers is significant overhead).

Adds more CQL codecs to convert to/from CQL integral types; tests (Unit and Integration).

Which issue(s) this PR fixes:
Fixes #1365

Checklist

  • Changes manually tested
  • Automated Tests added/updated
  • Documentation added/updated
  • CLA Signed: DataStax CLA

@tatu-at-datastax tatu-at-datastax self-assigned this Aug 27, 2024
@tatu-at-datastax tatu-at-datastax changed the title Fixes #1365: add integral number codecs, tests (WIP) Fixes #1365: add integral number codecs, tests Aug 27, 2024
/** Filter to use any JSON number against a table column. */
public class NumberTableFilter extends NativeTypeTableFilter<BigDecimal> {
public class NumberTableFilter extends NativeTypeTableFilter<Number> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to change now that we can get values in 3 distinct types (Long, BigInteger, BigDecimal).

@tatu-at-datastax
Copy link
Contributor Author

Getting there:

  • Codecs added for all 5 integral CQL types, from all 3 JSON number types (Long, BigInteger, BigDecimal
  • Range checks working EXCEPT for Long case (no ready-made method to use, need to implement)
  • No dedicated Integration Tests yet, minor adds to "findOne" for tables but not sufficient.

@tatu-at-datastax tatu-at-datastax marked this pull request as ready for review August 29, 2024 18:25
@tatu-at-datastax tatu-at-datastax requested a review from a team as a code owner August 29, 2024 18:25
@tatu-at-datastax tatu-at-datastax changed the title (WIP) Fixes #1365: add integral number codecs, tests Fixes #1365: add integral number codecs, tests Aug 29, 2024
@tatu-at-datastax
Copy link
Contributor Author

Bounds-/range-check added for smallint/tinyint/int conversion from Long; only missing insertOne integration tests. Ready to start review.

* Helper class used for constructing and sending commands to the Data API for testing purposes
* (usually integration tests).
*/
public abstract class DataApiCommandSenderBase<T extends DataApiCommandSenderBase> {
Copy link
Contributor Author

@tatu-at-datastax tatu-at-datastax Aug 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far 2 implementations (namespace/keyspace commands, table/collection commans); not sure if Table/Collection should be separated, probably not. Self-recursive type to allow type-safe "this" return for chaining.
Share most of configuration.

@tatu-at-datastax
Copy link
Contributor Author

And now with more-compact-than-before abstractions for more efficient (wrt reading, writing) IT writing. Ready for constructive feedback.

}

static void throwOverflow(DataType targetCQLType, Number value) {
throw new ArithmeticException(String.format("Overflow, value too big for %s", targetCQLType));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we include the value in the message (and for the underflow)

Also nice attention to detail on overflow and underflow - I checked Math.toIntExact() and it says any loss of information is an overflow.

Copy link
Contributor Author

@tatu-at-datastax tatu-at-datastax Aug 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point -- I was doing that, but realized that safeNumber() that catches ArithmeticException and re-wraps as ToCQLCodecException adds the actual value.
This is why I chose to avoid adding it in ArithmeticException itself (since message is included as-is and value then duplicated).

On underflow: yeah I must admit I googled for definitions since I recall "overflow" sometimes being used to mean both, too. :)

JSONCodec.ToCQL.safeNumber(JSONCodec.ToCQL::safeLongToInt),
JSONCodec.ToJSON.unsafeNodeFactory(JsonNodeFactory.instance::numberNode));

public static final JSONCodec<BigDecimal, Short> SMALLINT_FROM_BIG_DECIMAL =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this correct that we have to smallint from BigDecimal - i thought all integral would come from BigInteger ?

Copy link
Contributor Author

@tatu-at-datastax tatu-at-datastax Aug 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed in cases where JSON has, say, 1.0 and I think we want to allow use like that. Jackson bases type of textual representation of JSON Number so if there's decimal point, it's FP, even if fractional part is all zeroes.

I am assuming we want to allow this coercion (existing code, when only using BigDecimal, did allow this): if not, could drop this converter.

// BigDecimal JsonNode -- but converting CQL integer values into long-valued JsonNode.
// I think our internal handling can deal with Integer and Long valued JsonNodes and this avoids
// some of BigDecimal overhead (avoids conversion overhead, serialization is faster).
private static final JSONCodec<BigInteger, Long> BIGINT_FROM_BIG_INTEGER =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get what we are doing with integral values from Jackson - question is what need to support both BigInteger and Long source values ? if it is needed can we add a comment at top to explain strategy .

And that the type mappings here must sync with how we pull values from Jackson.

Copy link
Contributor Author

@tatu-at-datastax tatu-at-datastax Aug 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, with FPs, cost of processing BigDecimal seems to be roughly similar to Double.
But cost of processing BigInteger is significantly higher than that of boxed primitive integral types. So I figured it is a compromise to try to avoid huge number of converters by slight addition -- I think minimally we could go with BigDecimal and BigInteger but prefer adding Long in between (but nothing else, things do get unwieldy).

I'll add a comment here.

100% on mentioning that handling with RowShredder must be kept in sync.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some class Javadoc comments in RowShredder and JSONCodec, hope that suffices.

@@ -104,7 +103,7 @@ private static List<DBFilterBase> findDynamic(CaptureExpression captureExpressio
captureExpression.path(),
NativeTypeTableFilter.Operator.from(
(ValueComparisonOperator) filterOperation.operator()),
(BigDecimal) rhsValue));
(Number) rhsValue));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note - the changes yuqi and I are doing will revert how the resolver functions here are called back to the original approach where it gets called once with everything that matched. Currently it is getting called for every match.

Nothing to change, just a note that when we do that we should look at how to reduce the use of instanceof and casting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you: that sounds like a good improvement.

case STRING -> value.textValue();
case BOOLEAN -> value.booleanValue();
case NULL -> null;
default -> throw new RuntimeException("Unsupported type");
};
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this class is not finished / high quality - can you add a quick comment to say the different types for numbers need to match the codecs in JSONCodecRegistry

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added something in class javadoc, but will also add it right here too.

@@ -16,7 +17,7 @@

public class JSONCodecRegistryTest {

private final JSONCodecRegistryTestData TEST_DATA = new JSONCodecRegistryTestData();
private static final JSONCodecRegistryTestData TEST_DATA = new JSONCodecRegistryTestData();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as a pattern, i had been keeping the "TestData" class as instance but I guess given how tests work static makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is unfortunate, but it looks like ParameterizedTest requires source method (I tried and non-static ones do not seem to work? LMK if you know of a way) to be static.
I'd prefer non-static as well so if there's a way, that'd be better.

Arguments.of(DataTypes.INT, BigDecimal.valueOf(100), 100) // second 100 is an int
);
// Integer types:
Arguments.of(DataTypes.BIGINT, -456L, -456L),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: consider rename to something like validCodecToCQL.... now you have out of range below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent point, changing

Arguments.of(DataTypes.VARINT, -39999L, BigInteger.valueOf(-39999)),
Arguments.of(DataTypes.VARINT, BigInteger.valueOf(1), BigInteger.valueOf(1)),
Arguments.of(
DataTypes.VARINT, BigDecimal.valueOf(123456789.0), BigInteger.valueOf(123456789)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've not checked that the list here and below is exhaustive, let me know if you want it checked.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it's not exhaustive in many ways (i.e. does not try different ranges, values), but does try to cover permutations of 3 input "plain" Java types (BigDecimal, BigInteger, Long) x CQL integer types.

So if you can sanity check, that'd be great.

Copy link
Contributor

@amorton amorton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally like where it is going, just small comments on nit picks and making some things clearer.

return new DataApiNamespaceCommandSender(namespace);
}

public static DataApiTableCommandSender tableCommand(String namespace, String tableName) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idea - if we name these assertTableCommand, then do static imports in tests, and make the other name change below it could look like this in a test:

assertTableCommand(namespace, table)
    .findOne({})

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea, will change method name.

Could change class name too, to DataApiCommandAssertions too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only downside is that these look odd as set up components (creating table, inserting rows, that are not part of test).
Will change, we can re-think if need be.

Copy link
Contributor

@amorton amorton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved

@tatu-at-datastax tatu-at-datastax merged commit 869d6af into main Sep 3, 2024
3 checks passed
@tatu-at-datastax tatu-at-datastax deleted the tatu/1365-codec-number-int branch September 3, 2024 17:58
Copy link
Collaborator

@vkarpov15 vkarpov15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Data API, JsonCodec testing: integral numeric values (bigint, smallint, int, tinyint, varint)
3 participants