Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add bitwise shift operations #12888

Merged
merged 1 commit into from
Jun 13, 2019
Merged

Add bitwise shift operations #12888

merged 1 commit into from
Jun 13, 2019

Conversation

gastonar
Copy link
Contributor

@gastonar gastonar commented May 31, 2019

Allows bitwise shift operations including arithmetic shift right, logical shift right, and shift left.

@jessesleeping
Copy link
Contributor

This is a duplicate PR (original one: #12886). I suggest we close this one and use the original one as we already had review comments there.

@tdcmeehan
Copy link
Contributor

This is a duplicate PR (original one: #12886). I suggest we close this one and use the original one as we already had review comments there.

I recommended that we do that because there was a tricky rebase issue. The other one is closed and we can address that comment here.

@tdcmeehan
Copy link
Contributor

Other functions in MathFunctions also implement other overloads for numeric types with smaller ranges. This is because the engine will automatically widen smaller types into larger ones. For example, without an overload, the engine will happily expand the function into BIGINT if the underlying data is TINYINT. I think we can add some quick overloads for smaller numeric types so this doesn't happen.

Now, all of these functions will take in Java type of long, which means after shifting it's possible that the value exceeds the range of the type. But luckily we already specify the bits we want shifted, so we could adjust the max range to match the type.

@gastonar gastonar changed the title Added bitwise shift operations Add bitwise shift operations Jun 4, 2019
Copy link
Contributor

@tdcmeehan tdcmeehan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM besides directly using the hex format for readability. @jessesleeping @rongrong thoughts?

Copy link
Contributor

@jessesleeping jessesleeping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments. Otherwise looks good.

}

if (bits <= 1 || bits > 64) {
throw new PrestoException(INVALID_FUNCTION_ARGUMENT, "Bits specified in bit_count must be between 2 and 64, got " + bits);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message use wrong function name (bit_count). Please fix it.

One question, why do we want to block the shifting call when bits == 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In previous bitwise functions, specifically bit_count, the input wasn't allowed to have a single bit, so I kept it like that for consistency. I can change it to simply being bits < 1 though.

Copy link
Contributor

@rongrong rongrong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wikipedia seems to suggest that logical shift is the one that doesn't care about sign, and arithmetic shift is the one that does:
https://en.wikipedia.org/wiki/Logical_shift
https://en.wikipedia.org/wiki/Arithmetic_shift

Is there an open issue for this? If not, should we open one to discuss about the name and parameters?

@@ -28,4 +28,27 @@ Bitwise Functions

Returns the bitwise XOR of ``x`` and ``y`` in 2's complement representation.

.. function:: bitwise_shift_left_logical(x, shift, bits) -> bigint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally found the bits parameter really confusing. If x is a smallint, am I supposed to know how many bits a smallint should have and provide the right value for bits to get a correct result?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bits convention is already a established with bit_count: https://prestodb.github.io/docs/current/functions/bitwise.html

I think, while confusing, it should be consistent; or, we deprecate the bits convention. But for now I'd stick with consistency.

@rongrong
Copy link
Contributor

rongrong commented Jun 6, 2019

Personally I'd like the names to be logical_shift_left, logical_shift_right, arithmetic_shift_left, arithmetic_shift_right. "bitwise" should be implicit since it's a shift operation. I don't think it's absolutely necessary, but we can add it to the beginning if people feel that's better.

Shift left logical operation on ``x`` (treated as ``bits``-bit signed integer)
shifted by ``shift`` in 2's complement representation.

SELECT bitwise_shift_left_logical(7, 2, 4); -- 12
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

12 is a 5-bit signed number. This doesn't make sense to me. if 12 is a 4-bit signed number, then we cannot represent a 64-bit signed number with bigint.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not necessarily that operation doesn't make sense, maybe just the description doesn't make sense).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, for logical left shift, you probably will always fill 0, so bits is how many bits to keep after shift (where to truncate). So describing the number as n-bit signed just adds confusion.

For logical right shift, what you want to describe is what to fill with (0 or 1), so maybe defining the sign-bit makes sense. Though I think it makes more sense to say how many bits I want to operate on, and whether I'd fill it with 0 or 1 (logical_right_shift(number, shift, bits, 0/1)) Describing logical shift with signs is just very confusing to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If right shift is also going to fill with 0, then we should not describe the function using "signed number" at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By definition, logical right shift will always fill in 0s, so it does not preserve the sign of the input. Meanwhile, arithmetic right shift will fill with the MSB of the input. This operation will preserve the sign of the input. Therefore, I think we should just stick to these default conventions. I feel like adding the 0/1 field as a 4th input would just make it more confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did remove the "signed number" part in the description though, I can see how that can be confusing.

@rongrong
Copy link
Contributor

rongrong commented Jun 6, 2019

Related to #4028

@tdcmeehan
Copy link
Contributor

tdcmeehan commented Jun 6, 2019

Personally I'd like the names to be logical_shift_left, logical_shift_right, arithmetic_shift_left, arithmetic_shift_right. "bitwise" should be implicit since it's a shift operation. I don't think it's absolutely necessary, but we can add it to the beginning if people feel that's better.

I think they should include bitwise in the front to be consistent with the other bitwise functions.

https://prestodb.github.io/docs/current/functions/bitwise.html

There would be no difference between arithmetic and logic left shift. So we're left with:

  • bitwise_shift_left
  • bitwise_arithmetic_shift_right
  • bitwise_logical_shift_right

i.e. logical/arithmetic is now in front rather than back of the name

Allows bitwise shift operations including arithmetic shift right,
logical shift right, and shift left.
@rongrong
Copy link
Contributor

rongrong commented Jun 6, 2019

I'm wondering, why are we going with functions rather than operator? Ideally, there should be >> and << and depending on the type of the input (signed or unsigned), it would be logical or arithmetic. Though it doesn't seem that Presto support unsigned type, which makes it a bit inconvenient. But arguably, in a system that does not support unsigned numbers, it's strange to support logical shift.

@highker
Copy link
Contributor

highker commented Jun 6, 2019

I guess it's because SQL syntax only has basic numeric operators specified (+, -, ...). For those not in SQL spec, it could be safer to treat them as functions for possible future compliance.

valueExpression
    : primaryExpression                                                                 #valueExpressionDefault
    | valueExpression AT timeZoneSpecifier                                              #atTimeZone
    | operator=(MINUS | PLUS) valueExpression                                           #arithmeticUnary
    | left=valueExpression operator=(ASTERISK | SLASH | PERCENT) right=valueExpression  #arithmeticBinary
    | left=valueExpression operator=(PLUS | MINUS) right=valueExpression                #arithmeticBinary
    | left=valueExpression CONCAT right=valueExpression                                 #concatenation
    ;

@rongrong
Copy link
Contributor

Dug around a little bit and Postgres support >> and << and Oracle actually has >>, << and >>>. So I'm betting on SQL Spec not assigning conflicting meanings to these in the foreseeable future. That can be a separate issue though. With that we can potentially still have a use of these functions, since they allow arbitrary bits rather than just valid data type boundaries. Operator, on the other hand, would return you the correct type, rather than always make it a BIGINT and require down-casting. In summary, I think it's worth exploring the operator route, but I won't block this PR from being merged.

@tdcmeehan tdcmeehan merged commit f9c77c5 into prestodb:master Jun 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants