[C++] Simplify type_traits.h #38204

bkietz · 2023-10-11T13:42:10Z

Describe the enhancement requested

We have a large number of type traits in type_traits.h which are used inconsistently and whose relationships and specific semantics are not really documented. These have been added ad-hoc as they were needed rather than systematically laid out. I think instead of trying to construct them in reflection the class hierarchy of DataType is only indirectly useful; what we actually use these to check for is

can I visit the slots of an array of this DataType?
can those slots be represented as a std::string_view or are they numeric?
does an array of this type have an offsets buffer, and if so what is its type?

So I think it'd be more valuable to replace ambiguous traits like is_binary_like and is_string_like altogether, perhaps with is_string_view_visitable and is_utf8.

Nice to haves:

At the same time, traits like is_floating_type could be upgraded to c++17 variable templates. These are partially specializable as with class templates, so we wouldn't lose any flexibility by rewriting to:

namespace traits {
template <typename T>
constexpr bool is_floating_point = std::is_base_of_v<FloatingPointType, T>;
} // namespace traits

It'd also be nice to unify the template traits with the function traits (which taking a Type::type or a const DataType& and return bool), so that where one is available the other is definitely present (similar to the way enable_if specializations are currently always present).

The enable_if specializations are handy but could be replaced by if constexpr in many places.

Component(s)

C++

The text was updated successfully, but these errors were encountered:

felipecrv · 2023-10-19T14:11:22Z

used inconsistently and whose relationships and specific semantics are not really documented.

We can document them (!) with a publishable version of this:

These have been added ad-hoc as they were needed rather than systematically laid out

After I laid them out in the graph above there are only a few things I would change because I think the predicates actually follow a neat hierarchy:

Drop all the predicates that use std::is_base_of and define them either via other predicates or explicit enumeration of the types
Remove the two DEPRECATED predicates/traits:is_base_list_type and enable_if_base_list
Drop the substring "size" from some list-related traits and use only length

I think instead of trying to construct them in reflection the class hierarchy of DataType is only indirectly useful; what we actually use these to check for is ...

Sometimes we want to test whether a type belongs to an explicit definition of a set and sometimes we want to test for a property. There are uses for both. This is a classic debate in Logic [1].

One problem with predicates that test for a property (extensional equality) is that the set of types keeps growing and we can't know for sure existing kernel implementations outside of our control are not assuming other properties from the predicates we currently provide. If we had a "has offsets" predicate, we would be tempted to include list-views on that, but almost surely that would make code written based on that property break because that code probably also assumes that the offsets would be sorted. We would end-up with a confusing "has offsets" in the codebase with a big warning telling people that it can't be extended to list-views.

I think a good contract to honor in type_traits.h is to rarely include predicates that can be expanded: all the predicates define sets that we can enumerate easily. When I added the new list-view related predicates, I only expanded the is_nested predicate which I think is generic enough that when people use it, they don't assume much (🤞).

[1] https://en.wikipedia.org/wiki/Extensionality

bkietz · 2023-10-19T14:50:10Z

We can document them

We could, but I disagree that this is the best direction to take type_traits.h.

Sometimes we want to test whether a type belongs to an explicit definition of a set and sometimes we want to test for a property.

Could you give an example? I'm not convinced this is a useful distinction. If we explicitly define a set, when would we do so other than to enumerate the elements which have a property?

we can't know for sure existing kernel implementations outside of our control

This is a fair point, but I'd say that it would be sufficient to deprecate the existing traits to give users an opportunity to update.

felipecrv · 2023-10-19T16:59:03Z

Could you give an example? I'm not convinced this is a useful distinction. If we explicitly define a set, when would we do so other than to enumerate the elements which have a property?

Most of predicates in type_traits.h are intensional: they are referring to the types by definitions, not by the properties that are valid about them. We leave it to kernel code to derive interesting properties from the spec.

Example of properties:

Are offsets sorted?
Should it have length or length + 1 offsets?
Does it have a null bitmap that can be trusted?
Is the memory for the array elements allocated contiguously? If I access each element by string_view, can I access a sub-range simply by having a larger string_view?
Can I write to the child array in a different order I write the elements to the array? (this is true for fixed-size lists and list-views, but not for lists)

This is why I would prefer to keep type_traits.h (the public interface) mostly defining sets of types by name and keep the extensional definitions that talk about the properties private and closer to the kernels that leverage them.

felipecrv · 2023-10-19T17:02:35Z

This is a fair point, but I'd say that it would be sufficient to deprecate the existing traits to give users an opportunity to update.

Proposal: start an experimental type_traits.h alternative (private in the beginning), adopt it in kernels, evaluate the result, and compare it with the current situation and only then decide to deprecate type_traits.h.

WillAyd · 2023-10-26T22:01:20Z

@felipecrv is that visual available in the source or documentation today? It is a fantastic reference

felipecrv · 2023-10-27T13:54:24Z

@felipecrv is that visual available in the source or documentation today? It is a fantastic reference

@WillAyd not yet because I haven't figured out an effortless way of keeping it up to date. I will update this gist [1] when I update it again.

[1] https://gist.github.com/felipecrv/3c02f3784221d946dec1b031c6d400db

bkietz added the Type: enhancement label Oct 11, 2023

bkietz self-assigned this Oct 11, 2023

github-actions bot added the Component: C++ label Oct 11, 2023

bkietz mentioned this issue Oct 11, 2023

GH-37710: [C++][Integration] Add C++ Utf8View implementation #37792

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[C++] Simplify type_traits.h #38204

[C++] Simplify type_traits.h #38204

bkietz commented Oct 11, 2023

felipecrv commented Oct 19, 2023

bkietz commented Oct 19, 2023

felipecrv commented Oct 19, 2023

felipecrv commented Oct 19, 2023 •

edited

Loading

WillAyd commented Oct 26, 2023

felipecrv commented Oct 27, 2023 •

edited

Loading

[C++] Simplify type_traits.h #38204

[C++] Simplify type_traits.h #38204

Comments

bkietz commented Oct 11, 2023

Describe the enhancement requested

Nice to haves:

Component(s)

felipecrv commented Oct 19, 2023

bkietz commented Oct 19, 2023

felipecrv commented Oct 19, 2023

felipecrv commented Oct 19, 2023 • edited Loading

WillAyd commented Oct 26, 2023

felipecrv commented Oct 27, 2023 • edited Loading

felipecrv commented Oct 19, 2023 •

edited

Loading

felipecrv commented Oct 27, 2023 •

edited

Loading