-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix casting "Generic" Vectors to FlatVector<UnknownValue> #11170
Conversation
Summary: In simple functions and simple aggregations, there's a bunch of places where we use TypeToFlatVector to decide how to cast a VectorPtr to its concrete type. Notably in the VectorWriters for complex types which we use in simple functions, we use this to cast the child Vectors of the complex Vectors we'll write to. Today, if the type is Generic, we cast to a FlatVector<UnknownValue>. This is wrong as the Vector could be of any flat-like Vector (FlatVector, ArrayVector, MapVector, etc.). Ultimately this just sort of works AFAICT because the GenericWriter ignores whatever type it was cast to and treats it as a generic BaseVector. It's generally unsafe though, and UBSan is catching this in our Fuzzer tests. To fix this, I updated the TypeKind of a Generic in SimpleTypeTrait to be INVALID rather than UNKNOWN. UNKNOWN is intended for Vectors of nulls where the type can't be determined. In this case the type can be determined in the code designed to handle Generics, and if you're trying to use the TypeKind of a Generic in SimpleTypeTrait you're probably doing something wrong, so INVALID seems like a better fit. The TypeTraits for INVALID are also not primitive and not fixed width unlike those for UNKNOWN which matches the other fields in SimpleTypeTrait for Generic. I also updated TypeToFlatVector to add a template specialization that sets type to BaseVector, rather than using KindToFlatVector. This is consistent with where these Vectors are used (e.g. GenericWriter) and makes the casts safe. Differential Revision: D63905959
This pull request was exported from Phabricator. Differential Revision: D63905959 |
✅ Deploy Preview for meta-velox canceled.
|
@@ -84,5 +85,9 @@ struct TypeToFlatVector { | |||
using type = typename KindToFlatVector<SimpleTypeTrait<T>::typeKind>::type; | |||
}; | |||
|
|||
template <typename T, bool comparable, bool orderable> | |||
struct TypeToFlatVector<Generic<T, comparable, orderable>> { | |||
using type = BaseVector; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a bit confusing as the expectation here is that it should return a flat vector, however, like you pointed out earlier, there is no better alternative. Since Generic is a special case that warrants these kind of mapping which look like hacks, can you add comments here to explain these decisions? They can get verbose, but at the same time would be great to have them here as the complexity of this choice is not obvious to a new reader.
What do you think?
This pull request has been merged in 2b25d86. |
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
Summary: This was a request in a comment made on facebookincubator#11170 I missed it before I landed the change, so this is to address it. Differential Revision: D63987591
Summary:
In simple functions and simple aggregations, there's a bunch of places where we use
TypeToFlatVector to decide how to cast a VectorPtr to its concrete type. Notably in
the VectorWriters for complex types which we use in simple functions, we use this to
cast the child Vectors of the complex Vectors we'll write to.
Today, if the type is Generic, we cast to a FlatVector. This is wrong
as the Vector could be of any flat-like Vector (FlatVector, ArrayVector, MapVector,
etc.). Ultimately this just sort of works AFAICT because the GenericWriter ignores
whatever type it was cast to and treats it as a generic BaseVector. It's generally
unsafe though, and UBSan is catching this in our Fuzzer tests.
To fix this, I updated the TypeKind of a Generic in SimpleTypeTrait to be INVALID
rather than UNKNOWN. UNKNOWN is intended for Vectors of nulls where
the type can't be determined. In this case the type can be determined in the code
designed to handle Generics, and if you're trying to use the TypeKind of a Generic
in SimpleTypeTrait you're probably doing something wrong, so INVALID seems like a
better fit. The TypeTraits for INVALID are also not primitive and not fixed width unlike
those for UNKNOWN which matches the other fields in SimpleTypeTrait for Generic.
I also updated TypeToFlatVector to add a template specialization that sets type to
BaseVector, rather than using KindToFlatVector. This is consistent with where these
Vectors are used (e.g. GenericWriter) and makes the casts safe.
Differential Revision: D63905959