Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Spark CAST(string as boolean) #10382

Closed
wants to merge 1 commit into from

Conversation

rui-mo
Copy link
Collaborator

@rui-mo rui-mo commented Jul 3, 2024

Spark only allows the strings t, f, y, n, 1, 0, yes, no, true, false and
their uppercase equivalents. Folly allows more strings than spark does, e.g.
'on' and 'off'. This PR restricts the strings that can be converted to boolean
to these strings. Strings apart from these will throw.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jul 3, 2024
Copy link

netlify bot commented Jul 3, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 439279e
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6684df321948870008e44c98

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rui-mo Thank you for the fix.

velox/type/Conversions.h Outdated Show resolved Hide resolved
velox/type/Conversions.h Outdated Show resolved Hide resolved
velox/type/Conversions.h Outdated Show resolved Hide resolved
/// throws. By default, Presto compatible cast is conducted. The allowed strings
/// are `t, f, 1, 0, true, false` and their upper case equivalents.
/// @tparam TPolicy The specified cast policy. When set as SparkCastPolicy, the
/// allowed strings are `t, f, 1, 0, y, n, true, false, yes, no` and their upper
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be more readable if the comment describe the differences between Presto and Spark policies.

PrestoCastPolicy allows `t, f, 1, 0, true, false` strings and ...

SparkCastPolicy allows all strings that PrestoCastPolicy allows plus `y, n, yes, no` and their ...

Copy link
Collaborator Author

@rui-mo rui-mo Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. I find Presto cast could also use 'LegacyCastPolicy' but its behavior should be the same with 'PrestoCastPolicy' for this case as Presto's implementation in VarcharOperators is not controlled by a config (please kindly correct me if I'm wrong). Changed the comment as below.

The specified cast policy. PrestoCastPolicy and
LegacyCastPolicy allow t, f, 1, 0, true, false and their upper case
equivalents. SparkCastPolicy allows all strings that they allow plus y, n, yes, no and their upper case equivalents.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe LegacyCastPolicy is used by a streaming system at Meta. It uses Presto functions, but requires slightly different behavior in some cases. CC: @bikramSingh91 @kgpai @kagamiori

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explain. In Velox, Presto cast uses 'LegacyCastPolicy' if the query config 'kLegacyCast' is enabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rui-mo Should we document the behavior of this API when TPolicy is LegacyCastPolicy ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is documented as below, having the same behavior with PrestoCastPolicy. Thanks.

/// @tparam TPolicy PrestoCastPolicy and LegacyCastPolicy allow `t, f, 1, 0,
/// true, false` and their upper case equivalents. SparkCastPolicy additionally
/// allows 'y, n, yes, no' and their upper case equivalents.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect. Thanks.

velox/type/Conversions.h Show resolved Hide resolved
@mbasmanova mbasmanova added the ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall label Jul 3, 2024
Expected<bool> castToBoolean(const char* data, size_t len);
/// @tparam TPolicy The specified cast policy. PrestoCastPolicy and
/// LegacyCastPolicy allow `t, f, 1, 0, true, false` and their upper case
/// equivalents. SparkCastPolicy allows all strings that they allow plus `y, n,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: may drop 'The specified cast policy.'
nit: SparkCastPolicy additionally allows 'y, n, yes, no' and their upper case equivalents.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. Updated.

/// Presto compatible cast of strings to boolean. There is a set of strings
/// allowed to be casted to boolean. These strings are `t, f, 1, 0, true, false`
/// and their upper case equivalents. Casting from other strings to boolean
/// This function casts a string to a boolean value. There is a set of strings
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function casts a string to a boolean value.

Casts a string to a boolean. Allows a fixed set of strings.

@facebook-github-bot
Copy link
Contributor

@pedroerp has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@pedroerp merged this pull request in 5926750.

Copy link

Conbench analyzed the 1 benchmark run on commit 59267504.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants