Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New extents implementation #212

Merged
merged 33 commits into from
Apr 4, 2023
Merged

New extents implementation #212

merged 33 commits into from
Apr 4, 2023

Conversation

crtrott
Copy link
Member

@crtrott crtrott commented Nov 22, 2022

This replaces the extents implementation with something significanlty simpler. Need to do some more performance testing, but this is ready for review.

@crtrott crtrott mentioned this pull request Nov 22, 2022
@crtrott crtrott force-pushed the new-extents-impl branch 3 times, most recently from f206c73 to 3fa2214 Compare February 1, 2023 22:52
@crtrott crtrott marked this pull request as ready for review February 1, 2023 23:00
@crtrott
Copy link
Member Author

crtrott commented Feb 1, 2023

You may just want to look at the new extents file not in the diff. I did just replace the file after all.

include/experimental/__p0009_bits/extents.hpp Outdated Show resolved Hide resolved
include/experimental/__p0009_bits/extents.hpp Show resolved Hide resolved
include/experimental/__p0009_bits/extents.hpp Outdated Show resolved Hide resolved
include/experimental/__p0009_bits/extents.hpp Outdated Show resolved Hide resolved
include/experimental/__p0009_bits/extents.hpp Outdated Show resolved Hide resolved
include/experimental/__p0009_bits/extents.hpp Outdated Show resolved Hide resolved
include/experimental/__p0009_bits/extents.hpp Outdated Show resolved Hide resolved
include/experimental/__p0009_bits/extents.hpp Outdated Show resolved Hide resolved
include/experimental/__p0009_bits/extents.hpp Outdated Show resolved Hide resolved
include/experimental/__p0009_bits/extents.hpp Outdated Show resolved Hide resolved
@crtrott
Copy link
Member Author

crtrott commented Feb 4, 2023

Some perf tests:

template <class Extents, class ... Args>
void extents_construction(benchmark::State& state, Extents, int R, Args ... args) {
  for (auto _ : state) {
    double val =0;
    for(int r=0; r<R; r++) {
       Extents ext(args...);
       val += ext.extent(0);
    }
    if(val<1.0*R*10) std::terminate();
  }
}

BENCHMARK_CAPTURE(
  extents_construction, int_int_int, stdex::dextents<int,3>(), 100000000, 100, 100, 100
);
BENCHMARK_CAPTURE(
  extents_construction, array_int_3, stdex::dextents<int,3>(), 100000000, std::array<int,3>{100, 100, 100}
);

In -O0 the new extents are faster. In O1 and above they are the same.

O0:

# stable
extents_construction/int_int_int 3966498599 ns   3957175517 ns            1
extents_construction/array_int_3 4942861061 ns   4931248140 ns            1
# new-extents-impl
extents_construction/int_int_int 1841957323 ns   1837625666 ns            1
extents_construction/array_int_3 3745596231 ns   3736797353 ns            1

O3:

#stable
extents_construction/int_int_int  108691256 ns    108436294 ns            6
extents_construction/array_int_3  108762740 ns    108507276 ns            6
#new-extents-impl
extents_construction/int_int_int  108782075 ns    108526909 ns            6
extents_construction/array_int_3  108849447 ns    108594120 ns            6

@crtrott
Copy link
Member Author

crtrott commented Feb 4, 2023

Compile time test construction:

#include<experimental/mdspan>

constexpr size_t dyn = std::experimental::dynamic_extent;
template<class IndexT, size_t R, size_t ... Args>
struct ExtentsConstructor {
  using extents_t = std::experimental::extents<IndexT, R, Args...>;
  template<class ... DynArgs>
  static constexpr size_t value (DynArgs ... args) {
    extents_t ext(R,args...);
    return ext.extent(0) + ExtentsConstructor<IndexT, R-1, Args...>::value(args...);
  }
};

template<class IndexT, size_t ... Args>
struct ExtentsConstructor<IndexT, 0, Args...> {
  template<class ... DynArgs>
  static constexpr size_t value (DynArgs ... args) { return 0; }
};

size_t foo(int val1) {
  size_t value = 0;
  value += ExtentsConstructor<int32_t, 120, 3, 4, 5>::value(val1,4,5);
  value += ExtentsConstructor<uint32_t, 120, 3, 4, 5>::value(val1,4,5);
  value += ExtentsConstructor<int64_t, 120, 3, 4, 5>::value(val1,4,5);
  value += ExtentsConstructor<uint64_t, 120, 3, 4, 5>::value(val1,4,5);
  value += ExtentsConstructor<int32_t, 120, dyn,dyn,dyn>::value(val1,val1,val1);
  value += ExtentsConstructor<uint32_t, 120, dyn,dyn,dyn>::value(val1,val1,val1);
  value += ExtentsConstructor<int64_t, 120, dyn,dyn,dyn>::value(val1,val1,val1);
  value += ExtentsConstructor<uint64_t, 120, dyn,dyn,dyn>::value(val1,val1,val1);
  return value;
}

Simply compile as object file:
time g++ -O3 -c test.cpp -I${HOME}/Kokkos/mdspan/include

GCC 11.1 Old: 5.4s New 4.8s
Clang 15 Old: 4.0s New 3.9s

So looks like compile time for construction went down.

@crtrott
Copy link
Member Author

crtrott commented Feb 4, 2023

Compile time access benchmark:

#include<experimental/mdspan>

constexpr size_t dyn = std::experimental::dynamic_extent;
template<class Extents, size_t R>
struct ExtentsConstructor {
  static constexpr size_t value (const Extents& ext) {
    return ext.extent(static_cast<int32_t>(R%Extents::rank())) +
           ext.extent(static_cast<int64_t>(R%Extents::rank())) +
           ext.extent(static_cast<uint32_t>(R%Extents::rank())) +
           ext.extent(static_cast<uint64_t>(R%Extents::rank())) +
            + ExtentsConstructor<Extents, R-1>::value(ext);
  }
};

template<class Extents>
struct ExtentsConstructor<Extents, 0> {
  static constexpr size_t value (const Extents&) { return 0; }
};

size_t foo(int val1) {
  size_t value = 0;
  {
    using ext_t = std::experimental::extents<int32_t, 3, 4, 5>;
    value += ExtentsConstructor<ext_t, 120>::value(ext_t(val1,4,5));
  }
  {
    using ext_t = std::experimental::extents<int64_t, 3, 4, 5>;
    value += ExtentsConstructor<ext_t, 120>::value(ext_t(val1,4,5));
  }
  {
    using ext_t = std::experimental::extents<uint32_t, 3, 4, 5>;
    value += ExtentsConstructor<ext_t, 120>::value(ext_t(val1,4,5));
  }
  {
    using ext_t = std::experimental::extents<uint64_t, 3, 4, 5>;
    value += ExtentsConstructor<ext_t, 120>::value(ext_t(val1,4,5));
  }
  {
    using ext_t = std::experimental::extents<int32_t, dyn, dyn, dyn>;
    value += ExtentsConstructor<ext_t, 120>::value(ext_t(val1,4,5));
  }
  {
    using ext_t = std::experimental::extents<int64_t, dyn, dyn, dyn>;
    value += ExtentsConstructor<ext_t, 120>::value(ext_t(val1,4,5));
  }
  {
    using ext_t = std::experimental::extents<uint32_t, dyn, dyn, dyn>;
    value += ExtentsConstructor<ext_t, 120>::value(ext_t(val1,4,5));
  }
  {
    using ext_t = std::experimental::extents<uint64_t, dyn, dyn, dyn>;
    value += ExtentsConstructor<ext_t, 120>::value(ext_t(val1,4,5));
  }
  return value;
}

GCC 11.1 Old: 0.95 New 0.89
Clang 15 Old: 0.72 New 0.66

@crtrott
Copy link
Member Author

crtrott commented Feb 4, 2023

I also ran the existing benchmarks and didn't see anything bad. So in general I think this is good to go.

Copy link
Contributor

@mhoemmen mhoemmen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lovely improvement -- it's much easier to read! Thank you for doing this!

My only concern is minor, that functions like get sometimes take int and sometimes take size_t. This could lead to conversion warnings.

include/experimental/__p0009_bits/extents.hpp Outdated Show resolved Hide resolved
include/experimental/__p0009_bits/extents.hpp Outdated Show resolved Hide resolved
include/experimental/__p0009_bits/extents.hpp Outdated Show resolved Hide resolved
include/experimental/__p0009_bits/extents.hpp Outdated Show resolved Hide resolved
include/experimental/__p0009_bits/extents.hpp Outdated Show resolved Hide resolved
include/experimental/__p0009_bits/extents.hpp Show resolved Hide resolved
include/experimental/__p0009_bits/extents.hpp Show resolved Hide resolved
include/experimental/__p0009_bits/extents.hpp Show resolved Hide resolved
include/experimental/__p0009_bits/extents.hpp Show resolved Hide resolved
include/experimental/__p0009_bits/extents.hpp Outdated Show resolved Hide resolved
crtrott and others added 2 commits March 15, 2023 13:30
Co-Authored-By: Wesley Maxey <71408887+wmaxey@users.noreply.github.com>
Copy link
Member

@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love it but it is mostly fine

@@ -272,7 +239,11 @@ struct layout_stride {
#else
: __base_t(__base_t{__member_pair_t(
#endif
#ifndef _MDSPAN_NEW_EXTENT2S
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this macro?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover from when you could switch in the PR between the old and new implementation, gonna remove.

else
return 0;
}
template <size_t r> MDSPAN_INLINE_FUNCTION constexpr static size_t get() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This overload is never used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still kinda would go for consistency here across the different things which looks similar?

CMakeLists.txt Outdated Show resolved Hide resolved
Comment on lines +101 to +103
constexpr static T get(size_t) { return T(); }
template <size_t> MDSPAN_INLINE_FUNCTION constexpr static T get() {
return T();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling get on an empty array is definitely a precondition violation. Thus, would you consider using __builtin_unreachable() with GCC, or std::unreachable with C++23, with functions like this? This works fine in practice with constexpr functions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't want to add ifdefs and builtins inside this implementation only code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should at least assert since it will be called on out of bounds access and static_assert where possible

: __storage_{
#else
: __base_t(__base_t{typename __base_t::__stored_type{
constexpr maybe_static_array(const std::span<T, N> &vals) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider passing span by value instead of by const reference?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure.

Copy link
Contributor

@nmm0 nmm0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my biggest issue is returning values if the size of maybe_static_array is 0. I would rather specialize and maybe even assert if the get functions are accessed

CMakeLists.txt Outdated Show resolved Hide resolved
include/experimental/__p0009_bits/extents.hpp Outdated Show resolved Hide resolved
include/experimental/__p0009_bits/extents.hpp Outdated Show resolved Hide resolved
Comment on lines +233 to +235
constexpr static size_t m_size = sizeof...(Values);
constexpr static size_t m_size_dynamic =
_MDSPAN_FOLD_PLUS_RIGHT((Values == dyn_tag), 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These can be moved into the respective functions

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually m_size_dynamic is used right below here, I guess if I moved that member to the end so the constexpr accessor functions are already defined? But I don't see a big benefit since they are anyway private.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just less code. I'm not dead set on this

Comment on lines +279 to +282
MDSPAN_TEMPLATE_REQUIRES(class T, size_t N,
/* requires */ (N == m_size_dynamic && N == 0))
MDSPAN_INLINE_FUNCTION
explicit constexpr extents(Integral... exts) noexcept
#if defined(_MDSPAN_USE_ATTRIBUTE_NO_UNIQUE_ADDRESS)
: __storage_{
#else
: __base_t(__base_t{typename __base_t::__stored_type{
constexpr maybe_static_array(const std::array<T, N> &) : m_dyn_vals{} {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
MDSPAN_TEMPLATE_REQUIRES(class T, size_t N,
/* requires */ (N == m_size_dynamic && N == 0))
MDSPAN_INLINE_FUNCTION
explicit constexpr extents(Integral... exts) noexcept
#if defined(_MDSPAN_USE_ATTRIBUTE_NO_UNIQUE_ADDRESS)
: __storage_{
#else
: __base_t(__base_t{typename __base_t::__stored_type{
constexpr maybe_static_array(const std::array<T, N> &) : m_dyn_vals{} {}
MDSPAN_TEMPLATE_REQUIRES(class T,
/* requires */ (m_size_dynamic == 0))
MDSPAN_INLINE_FUNCTION
constexpr maybe_static_array(const std::array<T, 0> &) : m_dyn_vals{} {}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't work because of std::array not working on device.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh weird, my diff got messed up idk why it grabbed some of the old code. The only change here was to simplify

MDSPAN_TEMPLATE_REQUIRES(class T, size_t N,
                           /* requires */ (N == m_size_dynamic && N == 0))
MDSPAN_INLINE_FUNCTION
constexpr maybe_static_array(const std::array<T, N> &) : m_dyn_vals{} {}

to

MDSPAN_TEMPLATE_REQUIRES(class T,
                           /* requires */ (m_size_dynamic == 0))
MDSPAN_INLINE_FUNCTION
constexpr maybe_static_array(const std::array<T, 0> &) : m_dyn_vals{} {}

-- you are already using array here so this isn't introducing it in a new place

include/experimental/__p0009_bits/extents.hpp Outdated Show resolved Hide resolved
include/experimental/__p0009_bits/extents.hpp Show resolved Hide resolved
include/experimental/__p0009_bits/extents.hpp Outdated Show resolved Hide resolved
include/experimental/__p0009_bits/extents.hpp Show resolved Hide resolved
include/experimental/__p0009_bits/extents.hpp Show resolved Hide resolved
template<class OtherIndexType, size_t... RHS>
template <class IndexType, size_t... Extents> class extents {
public:
// typedefs for integral types used
Copy link
Contributor

@mhoemmen mhoemmen Apr 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider checking the Mandate that IndexType is a signed or unsigned integral type? That would effectively make the code self-documenting, so that the comment would become superfluous.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I can do that.

Copy link
Contributor

@mhoemmen mhoemmen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Christian! This is much more concise; I like it! I just made a few optional suggestions.

Copy link
Contributor

@nmm0 nmm0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think overall I would want the out of bounds checks to be asserted/static_asserted when able but otherwise looks good!

include/experimental/__p0009_bits/extents.hpp Show resolved Hide resolved
Comment on lines +101 to +103
constexpr static T get(size_t) { return T(); }
template <size_t> MDSPAN_INLINE_FUNCTION constexpr static T get() {
return T();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should at least assert since it will be called on out of bounds access and static_assert where possible

include/experimental/__p0009_bits/extents.hpp Show resolved Hide resolved
include/experimental/__p0009_bits/extents.hpp Show resolved Hide resolved
Comment on lines +233 to +235
constexpr static size_t m_size = sizeof...(Values);
constexpr static size_t m_size_dynamic =
_MDSPAN_FOLD_PLUS_RIGHT((Values == dyn_tag), 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just less code. I'm not dead set on this

Comment on lines +279 to +282
MDSPAN_TEMPLATE_REQUIRES(class T, size_t N,
/* requires */ (N == m_size_dynamic && N == 0))
MDSPAN_INLINE_FUNCTION
explicit constexpr extents(Integral... exts) noexcept
#if defined(_MDSPAN_USE_ATTRIBUTE_NO_UNIQUE_ADDRESS)
: __storage_{
#else
: __base_t(__base_t{typename __base_t::__stored_type{
constexpr maybe_static_array(const std::array<T, N> &) : m_dyn_vals{} {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh weird, my diff got messed up idk why it grabbed some of the old code. The only change here was to simplify

MDSPAN_TEMPLATE_REQUIRES(class T, size_t N,
                           /* requires */ (N == m_size_dynamic && N == 0))
MDSPAN_INLINE_FUNCTION
constexpr maybe_static_array(const std::array<T, N> &) : m_dyn_vals{} {}

to

MDSPAN_TEMPLATE_REQUIRES(class T,
                           /* requires */ (m_size_dynamic == 0))
MDSPAN_INLINE_FUNCTION
constexpr maybe_static_array(const std::array<T, 0> &) : m_dyn_vals{} {}

-- you are already using array here so this isn't introducing it in a new place

Comment on lines +111 to 113
public static_array_impl<0, T, Values...> {

template <class ThisIndexType, size_t... Extents>
class extents
#if !defined(_MDSPAN_USE_ATTRIBUTE_NO_UNIQUE_ADDRESS)
: private detail::__no_unique_address_emulation<
detail::__partially_static_sizes_tagged<detail::__extents_tag, ThisIndexType , size_t, Extents...>>
#endif
{
public:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor point, but both publics here are redundant

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above thing is not the same (where you say m_size_dynamic==0). Think N =1 and m_size_dynamic = 0, your case would take it, whats there right now doesn.t

@mhoemmen
Copy link
Contributor

mhoemmen commented Apr 4, 2023

@nmm0 wrote:

I think we should at least assert since it will be called on out of bounds access and static_assert where possible

regarding

 template <size_t> MDSPAN_INLINE_FUNCTION constexpr static T get() {
    return T();
}

another approach would be to use the C++23 feature std::unreachable(), or one of the compiler-specific equivalents, like GCC's __builtin_unreachable().

Co-authored-by: Mark Hoemmen <mhoemmen@users.noreply.github.com>
@crtrott crtrott merged commit a9489ae into kokkos:stable Apr 4, 2023
@crtrott crtrott deleted the new-extents-impl branch April 4, 2023 17:25
mhoemmen added a commit to mhoemmen/mdspan that referenced this pull request Jul 26, 2023
…product_kokkos_impl

[ready] kokkos impl `triangular_matrix_{left,right}_product`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants