-
Hi again @nuwoo, asking here is perfect! I want to make sure I understand your script.

Because of the follow-up sums, here's what I would do:

```python
import heat

N = 2**10
a = heat.arange(N, dtype=heat.float32, device='gpu', split=0)
processes = a.comm.size

# reshape a into a 2D array, 1 row per process
a = heat.reshape(a, (processes, -1))
# a is still distributed along axis 0
print("global shape, local shape, split axis = ", a.shape, a.lshape, a.split)

# set first row to sum of rows (equivalent to your a[a_0] = a[a_1] + a[a_0])
a[0] = heat.sum(a, axis=0)  # heat.sum calls MPI Allreduce

# set second row to the difference of rows: negate it, then sum again
a[1] = -a[1]
a[1] = heat.sum(a, axis=0)

# if you need `a` to be 1D, reshape again
a = heat.reshape(a, (N,))
```

Does this help? Let us know how it goes. Thanks to you I found a bug in the …
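Since the comment above notes that `heat.sum` on the split axis maps to an MPI Allreduce, here is a rough standalone analogue in plain mpi4py, for intuition only — it is not Heat's actual implementation, and the row length of 4 is arbitrary:

```python
# Rough mpi4py analogue of heat.sum(a, axis=0) on a split-0 array:
# each rank holds one local row; Allreduce hands every rank the element-wise sum.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
local_row = np.arange(4, dtype=np.float32) + 4 * comm.rank  # this rank's row
row_sum = np.empty_like(local_row)
comm.Allreduce(local_row, row_sum, op=MPI.SUM)
print(f"rank {comm.rank}: local {local_row} -> sum {row_sum}")
```

Both this sketch and the Heat script above are launched the same way, e.g. `mpirun -np 4 python script.py` (file name hypothetical).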
-
I followed the distributed tutorials and wrote code like the attached heat_test.py, which I used to check the MPI and Heat installation. With a single GPU it works fine, but with multiple GPUs it reports an error:

[screenshot: distributed-error]

So I want to know whether I am operating it the wrong way, or whether there is a manual I can reference.
P.S.: Sorry to ask the question here; I can't use the pyheat tag on Stack Overflow.
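In case it helps with the multi-GPU error: a common setup issue in MPI + GPU runs is that every rank grabs the same device. Below is a minimal per-rank sanity check — a sketch assuming CUDA devices and that each rank should be pinned to its own GPU; `ht.MPI_WORLD` and `torch.cuda.set_device` are used under that assumption, not as a confirmed fix:

```python
# Minimal multi-GPU sanity check: pin each MPI rank to one GPU (assumption:
# one rank per GPU is intended), then inspect what each rank actually sees.
import torch
import heat as ht

rank = ht.MPI_WORLD.rank
if torch.cuda.is_available():
    torch.cuda.set_device(rank % torch.cuda.device_count())

x = ht.arange(16, dtype=ht.float32, split=0, device="gpu")
print(f"rank {rank}/{x.comm.size}: lshape={x.lshape}, device={x.device}")
```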