
[BUG] Tensors are not "modifiable lvalue" in extended lambdas. #457

Closed
sksg opened this issue Jul 28, 2023 · 4 comments

Comments


sksg commented Jul 28, 2023

Describe the bug
When using tensors inside extended lambdas, the tensor elements cannot be assigned to.

To Reproduce
This minimal example fails:

#include <matx.h>

template<typename FUNC>
__global__ void kernel_launch(FUNC kernel) {
    kernel();
}

int main() {
    auto a = matx::make_tensor<float>({10});
    kernel_launch<<<1, 10>>>([=] __device__() { a(threadIdx.x) = threadIdx.x; });  // <-- error line
}

The build fails with error: expression must be a modifiable lvalue

Expected behavior
The above expression should not fail, and a should be filled with numbers from 0..9.

System details (please complete the following information):

  • OS: Ubuntu 22.04.1 LTS
  • CUDA version: 11.8
  • g++ version: 11.3.0
@cliffburdick (Collaborator)

Hi @sksg, when you're passing a tensor to a device/global function, the tensor must first be converted to tensor_impl_t. The reason is that tensor_t maintains some state, like a reference count, that is neither usable nor wanted in device code. tensor_impl_t strips a tensor down to the bare minimum needed in device code. To convert, you use the base_type trait:

typename base_type<Op>::type op_;

So your code may be something like

int main() {
    auto a = matx::make_tensor<float>({10});
    typename base_type<decltype(a)>::type a_base(a);  // strip tensor_t down to its device-usable base
    kernel_launch<<<1, 10>>>([=] __device__() { a_base(threadIdx.x) = threadIdx.x; });
}

I have not tried this yet to see if it works, but I will test and get back to you today.

@cliffburdick (Collaborator)

Hi @sksg, the issue is that there are two definitions of operator(), a const and a non-const version. Since a non-mutable lambda's operator() is const, its by-value captures are treated as const inside the body, so overload resolution picks the const version, whose result is not a modifiable lvalue. Instead you can pass the tensor as a function parameter:

using namespace matx;
template<typename T, typename FUNC>
__global__ void kernel_launch(T t, FUNC kernel) {
    kernel(t);
}

    kernel_launch<<<1, 10>>>(a_base, [=] __device__(auto ab) {
      ab(threadIdx.x) = threadIdx.x;
    });

And this works as well. Let me know if that suffices.


sksg commented Jul 31, 2023

Hi @cliffburdick

Thanks for the speedy feedback.

I did not know that a lambda's call operator is const by default. That makes perfect sense. I did try it with a mutable lambda instead:

#include <matx.h>

template<typename FUNC>
__global__ void kernel_launch(FUNC kernel) {
    kernel();
}

int main() {
    auto a = matx::make_tensor<float>({10});
    kernel_launch<<<1, 10>>>([=] __device__() mutable { a(threadIdx.x) = threadIdx.x; });  // <-- No errors!
}

Technically, I can verify that the device array a ends up with the correct values. But I see your point with respect to matx::base_type: I do not know whether any of the host state gets corrupted by using a directly rather than a_base. The above syntax is very concise, though.

Thanks for helping out. This is perfectly usable for me.

@cliffburdick (Collaborator)

Thanks for the report!
