Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix casting "Generic" Vectors to FlatVector<UnknownValue> #11170

Closed

Conversation

kevinwilfong
Copy link
Contributor

Summary:
In simple functions and simple aggregations, there's a bunch of places where we use
TypeToFlatVector to decide how to cast a VectorPtr to its concrete type. Notably in
the VectorWriters for complex types which we use in simple functions, we use this to
cast the child Vectors of the complex Vectors we'll write to.

Today, if the type is Generic, we cast to a FlatVector. This is wrong
as the Vector could be of any flat-like Vector (FlatVector, ArrayVector, MapVector,
etc.). Ultimately this just sort of works AFAICT because the GenericWriter ignores
whatever type it was cast to and treats it as a generic BaseVector. It's generally
unsafe though, and UBSan is catching this in our Fuzzer tests.

To fix this, I updated the TypeKind of a Generic in SimpleTypeTrait to be INVALID
rather than UNKNOWN. UNKNOWN is intended for Vectors of nulls where
the type can't be determined. In this case the type can be determined in the code
designed to handle Generics, and if you're trying to use the TypeKind of a Generic
in SimpleTypeTrait you're probably doing something wrong, so INVALID seems like a
better fit. The TypeTraits for INVALID are also not primitive and not fixed width unlike
those for UNKNOWN which matches the other fields in SimpleTypeTrait for Generic.

I also updated TypeToFlatVector to add a template specialization that sets type to
BaseVector, rather than using KindToFlatVector. This is consistent with where these
Vectors are used (e.g. GenericWriter) and makes the casts safe.

Differential Revision: D63905959

Summary:
In simple functions and simple aggregations, there's a bunch of places where we use 
TypeToFlatVector to decide how to cast a VectorPtr to its concrete type.  Notably in
the VectorWriters for complex types which we use in simple functions, we use this to
cast the child Vectors of the complex Vectors we'll write to.

Today, if the type is Generic, we cast to a FlatVector<UnknownValue>.  This is wrong
as the Vector could be of any flat-like Vector (FlatVector, ArrayVector, MapVector,
etc.).  Ultimately this just sort of works AFAICT because the GenericWriter ignores
whatever type it was cast to and treats it as a generic BaseVector. It's generally
unsafe though, and UBSan is catching this in our Fuzzer tests.

To fix this, I updated the TypeKind of a Generic in SimpleTypeTrait to be INVALID
rather than UNKNOWN.  UNKNOWN is intended for Vectors of nulls where
the type can't be determined.  In this case the type can be determined in the code
designed to handle Generics, and if you're trying to use the TypeKind of a Generic
in SimpleTypeTrait you're probably doing something wrong, so INVALID seems like a
better fit.  The TypeTraits for INVALID are also not primitive and not fixed width unlike
those for UNKNOWN which matches the other fields in SimpleTypeTrait for Generic.

I also updated TypeToFlatVector to add a template specialization that sets type to
BaseVector, rather than using KindToFlatVector. This is consistent with where these
Vectors are used (e.g. GenericWriter) and makes the casts safe.

Differential Revision: D63905959
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 4, 2024
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D63905959

Copy link

netlify bot commented Oct 4, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 37981fb
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/67002ffd1881ed00081404ee

@@ -84,5 +85,9 @@ struct TypeToFlatVector {
using type = typename KindToFlatVector<SimpleTypeTrait<T>::typeKind>::type;
};

template <typename T, bool comparable, bool orderable>
struct TypeToFlatVector<Generic<T, comparable, orderable>> {
using type = BaseVector;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit confusing as the expectation here is that it should return a flat vector, however, like you pointed out earlier, there is no better alternative. Since Generic is a special case that warrants these kind of mapping which look like hacks, can you add comments here to explain these decisions? They can get verbose, but at the same time would be great to have them here as the complexity of this choice is not obvious to a new reader.

What do you think?

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in 2b25d86.

Copy link

Conbench analyzed the 1 benchmark run on commit 2b25d863.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

kevinwilfong pushed a commit to kevinwilfong/velox that referenced this pull request Oct 7, 2024
Summary:
This was a request in a comment made on 
facebookincubator#11170 I missed it before I landed the
change, so this is to address it.

Differential Revision: D63987591
facebook-github-bot pushed a commit that referenced this pull request Oct 7, 2024
Summary:
Pull Request resolved: #11187

This was a request in a comment made on
#11170 I missed it before I landed the
change, so this is to address it.

Reviewed By: amitkdutta, bikramSingh91

Differential Revision: D63987591

fbshipit-source-id: 08c403a394d67402c393d40cea58bce5afafaf2f
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants