Implement distributed unfold operation #1419

Conversation

@FOsterfeld (Member) commented Apr 2, 2024

Due Diligence

  • General:
  • Implementation:
    • unit tests: all split configurations tested
    • unit tests: multiple dtypes tested
    • documentation updated where needed

Description

Add the function unfold to the available manipulations. For a DNDarray a, unfold(a, dimension, size, step) behaves like torch.Tensor.unfold.

Example:

>>> x = ht.arange(1., 8)
>>> x
DNDarray([1., 2., 3., 4., 5., 6., 7.], dtype=ht.float32, device=cpu:0, split=None)
>>> ht.unfold(x, 0, 2, 1)
DNDarray([[1., 2.],
          [2., 3.],
          [3., 4.],
          [4., 5.],
          [5., 6.],
          [6., 7.]], dtype=ht.float32, device=cpu:0, split=None)
>>> ht.unfold(x, 0, 2, 2)
DNDarray([[1., 2.],
          [3., 4.],
          [5., 6.]], dtype=ht.float32, device=cpu:0, split=None)
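
For the distributed case, a minimal usage sketch. This is a hypothetical example: the split=0 input and the resulting global shape are assumptions for illustration, and the split of the returned array depends on the implementation.

# Hypothetical sketch: unfold on a DNDarray distributed along dimension 0.
# Run with several MPI processes, e.g. mpirun -n 2 python unfold_example.py
import heat as ht

x = ht.arange(1.0, 8, split=0)   # same values as above, but chunked across processes
y = ht.unfold(x, 0, 2, 1)        # sliding windows of size 2 with step 1
print(y.shape)                   # expected global shape: (6, 2), matching the example above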

Issue/s resolved: #1400

Changes proposed:

Type of change

  • New feature (non-breaking change which adds functionality)

Memory requirements

Performance

Does this change modify the behaviour of other functions? If so, which?

no

github-actions bot commented Apr 2, 2024

Thank you for the PR!
codecov bot commented Apr 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.07%. Comparing base (ef97474) to head (c03db4c).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1419      +/-   ##
==========================================
+ Coverage   92.04%   92.07%   +0.02%     
==========================================
  Files          83       83              
  Lines       12113    12144      +31     
==========================================
+ Hits        11150    11181      +31     
  Misses        963      963              
Flag Coverage Δ
unit 92.07% <100.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown.

@mrfh92 (Collaborator) commented Apr 3, 2024

The tests on the CUDA runner seem to hang at test_manipulations.py with 5 MPI processes.
This also happens locally on my machine, so there seems to be an error in unfold that results in hanging (most likely an MPI deadlock?).
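
For illustration only: a hang like this typically comes from a blocking exchange that only some ranks reach. A minimal, self-contained sketch of that pattern, not the code from this PR:

# Hypothetical sketch of a classic MPI deadlock pattern, not code from this PR:
# a blocking recv whose matching send is skipped on some ranks hangs the whole job.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
nprocs = comm.Get_size()

send_needed = rank % 2 == 0   # stand-in for a data-dependent condition

if rank < nprocs - 1:
    halo = comm.recv(source=rank + 1, tag=0)   # every rank expects a halo from the right ...
if rank > 0 and send_needed:                   # ... but the send is guarded locally,
    comm.send([rank], dest=rank - 1, tag=0)    # so the unmatched recv blocks forever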


@mrfh92 (Collaborator) commented Apr 15, 2024

On the Terrabyte cluster, using 8 processes on 2 nodes with 4 GPUs each, I get the following error:

ERROR: test_unfold (heat.core.tests.test_manipulations.TestManipulations)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/dss/dsshome1/03/di93zek/heat/heat/core/tests/test_manipulations.py", line 3775, in test_unfold
    ht.unfold(x, 0, min_chunk_size, min_chunk_size + 1)  # no fully local unfolds on some nodes
  File "/dss/dsshome1/03/di93zek/heat/heat/core/manipulations.py", line 4272, in unfold
    ret_larray = torch.cat((unfold_loc, unfold_halo), dimension)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument tensors in method wrapper_CUDA_cat)

----------------------------------------------------------------------
Ran 32 tests in 26.574s

On CPU, everything seems to work (at least in test_manipulations.py).
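
For reference, the usual remedy for this kind of mismatch is to move the received halo onto the device of the local chunk before concatenating. A minimal sketch using the variable names from the traceback; the actual fix in the PR may differ:

import torch

def cat_with_halo(unfold_loc: torch.Tensor, unfold_halo: torch.Tensor, dimension: int) -> torch.Tensor:
    # Halo data received over MPI may arrive as a CPU tensor even when the local
    # chunk lives on the GPU; align devices before concatenating.
    unfold_halo = unfold_halo.to(unfold_loc.device)
    return torch.cat((unfold_loc, unfold_halo), dimension)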

@FOsterfeld (Member, Author) commented

@mrfh92 I have now added the error for the case that size=1. I could also verify that the synchronization errors that caused data corruption no longer occur, so this PR should be ready for merging.
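
As an illustration of the size=1 guard mentioned above, a hypothetical sketch; the exact condition and error message in the merged code may differ:

# Hypothetical sketch of the argument check described above, not the exact code from the PR.
def _check_unfold_args(size: int, step: int) -> None:
    if size <= 1:
        raise ValueError(f"size must be greater than 1, got {size}")
    if step < 1:
        raise ValueError(f"step must be at least 1, got {step}")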

Undid my stupid change before that belongs to another issue

@mrfh92 (Collaborator) commented Aug 13, 2024

@FOsterfeld from my point of view this now looks fine.
@ClaudiaComito do you agree?

mrfh92 previously approved these changes Aug 13, 2024

@mrfh92 (Collaborator) left a comment

Looks fine from my point of view. @FOsterfeld Thanks 👍

heat/core/dndarray.py (review comments resolved, outdated)
@ClaudiaComito (Contributor) left a comment

@FOsterfeld @mrfh92 this looks great, I only found some (presumably) dead code that can be removed, otherwise I think it can be merged. Thanks a lot!

heat/core/tests/test_manipulations.py (review comments resolved, outdated)
heat/core/dndarray.py (review comments resolved, outdated)

@ClaudiaComito (Contributor) left a comment

Great job @FOsterfeld!

@ClaudiaComito added the labels merge queue and enhancement (New feature or request) on Aug 19, 2024

@ClaudiaComito changed the title from "Features/1400 implement unfold operation similar to torch tensor unfold" to "Implement distributed unfold operation" on Aug 19, 2024

@mtar merged commit 2ecf597 into main on Aug 19, 2024 (9 checks passed).
@mtar deleted the features/1400-Implement_unfold-operation_similar_to_torch_Tensor_unfold branch on August 19, 2024 at 09:50.
Labels: merge queue, enhancement (New feature or request)
Projects: None yet

Successfully merging this pull request may close these issues:
Implement unfold-operation similar to torch.Tensor.unfold (#1400)

4 participants