New and Improved MapFusion #1629

philip-paul-mueller · 2024-08-22T13:54:43Z

This PR introduces a new and improved version of MapFusion.
A summary of the changes can also be found here; it compares the SDFGs generated by the old and the new transformation for some unit tests.

Fixed Bugs and removed Limitations

The subsets (not the .subset member of the Memlet; I mean the concept) of the new intermediate data descriptor were not computed correctly in some cases, especially in the presence of offsets. See the test_offset_correction_range_read(), test_offset_correction_scalar_read() and test_offset_correction_empty() tests.
The propagation of the subsets, which is needed because of the changed intermediate, was not handled properly. Essentially, the transformation only updated .subset and ignored .other_subset, which is correct in most cases but not always. See test_fusion_intrinsic_memlet_direction() for more.
During the check whether two maps could be fused, the .dynamic property of the Memlets was fully ignored, leading to wrong code.
The read-write conflict checks were refined. Before, all arrays needed to be accessed the same way, i.e. a fusion was rejected when one map accessed A[i, j] and the other map accessed B[i + 1, j]. Now this is possible as long as every access is pointwise. See the test_fusion_different_global_accesses() test for an example.
The shape of the reduced intermediate is cleaned, i.e. unnecessary dimensions of size 1 are removed, unless they were present in the original shape. As an example, if the intermediate array T had shape (10, 1, 20) and was accessed inside the map as T[__i, 0, __j], then the old transformation would have created a reduced intermediate of shape (1, 1, 1); now its shape is (1). Note that if the intermediate had shape (10, 20) instead and was accessed as T[__i, __j], then a Scalar would have been created. See also the strict_dataflow flag below.
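The shape cleaning described in the last item can be illustrated with a small sketch. This is not the code from the PR; `clean_reduced_shape` is a hypothetical helper that works on plain tuples instead of DaCe data descriptors, and it assumes that the uncleaned reduced shape is already known.

```python
from typing import Sequence, Tuple


def clean_reduced_shape(original_shape: Sequence[int],
                        reduced_shape: Sequence[int]) -> Tuple[int, ...]:
    """Hypothetical helper: drop the size-1 dimensions of the reduced
    intermediate that were not already of size 1 in the original array."""
    if len(original_shape) != len(reduced_shape):
        raise ValueError("shapes must have the same number of dimensions")
    return tuple(
        red for orig, red in zip(original_shape, reduced_shape)
        if not (red == 1 and orig != 1)
    )


# T has shape (10, 1, 20) and is accessed as T[__i, 0, __j] inside the map,
# so the uncleaned reduced intermediate has shape (1, 1, 1).
cleaned = clean_reduced_shape((10, 1, 20), (1, 1, 1))

# T has shape (10, 20) and is accessed as T[__i, __j]; nothing is left,
# which in this sketch stands for "create a Scalar".
scalar_case = clean_reduced_shape((10, 20), (1, 1))
```

In this sketch an empty tuple represents the Scalar case; how the transformation represents it internally is not covered here.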

New Flags

only_toplevel_maps: If True the transformation will only fuse maps that are located at the top level, i.e. maps inside maps will not be merged.
only_inner_maps: If True then the transformation will only fuse maps that are inside other maps.
assume_always_shared: If True then the transformation will assume that every intermediate is shared, i.e. the referenced data is used somewhere else in the SDFG and has to become an output of the fused maps. This will create dead data flow, but avoids a scan of the full SDFG.
strict_dataflow: This flag is enabled by default. It has two effects. First, it disables the cleaning of the reduced intermediate storage. The second effect is more important: it preserves a much stricter data flow. Most importantly, if the intermediate array is used downstream (this is not limited to the case that the array is the output of the second map), then the maps will not be fused together. This is mostly to work around some other bugs in DaCe, where other transformations failed to pick up the dependency. Note that the fused map would be correct; the problem lies in the other transformations.
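The effect of the two nesting flags can be sketched as a simple filter. The function below is purely illustrative: `passes_nesting_filter` and its `nesting_depth` argument are hypothetical names, with depth 0 assumed to mean that the map is not located inside another map.

```python
def passes_nesting_filter(nesting_depth: int,
                          only_toplevel_maps: bool = False,
                          only_inner_maps: bool = False) -> bool:
    """Hypothetical sketch of how the two flags restrict fusion candidates.

    nesting_depth == 0 is assumed to mean "top-level map".
    """
    if nesting_depth < 0:
        raise ValueError("nesting_depth must be non-negative")
    is_toplevel = nesting_depth == 0
    if only_toplevel_maps and not is_toplevel:
        return False  # maps inside maps are not merged
    if only_inner_maps and is_toplevel:
        return False  # only maps inside other maps are merged
    return True
```

In this sketch, setting both flags rejects every candidate; whether the real transformation allows that combination is not stated in this description.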

`FullMapFusion`

This PR also introduced the FullMapFusion pass, which makes use of the FindSingleUseData pass that was introduced in PR#1906.
The FullMapFusion pass applies MapFusion as long as possible, i.e. it fuses all maps that can be fused.
Instead of scanning the SDFG every time an intermediate node has to be classified (can it be deleted or not), the classification is computed once and then reused. This speeds up the fusion process because the full SDFG no longer has to be traversed many times.
This new pass also replaced the direct application of MapFusion in auto_optimizer.
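The "classify once, then reuse" idea can be sketched with a toy model. This is not the FindSingleUseData implementation: the SDFG is modeled as a plain dictionary from state names to the data names of their access nodes, and "single use" is assumed to mean that a data name appears in exactly one access node.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def find_single_use_data(states: Dict[str, Iterable[str]]) -> Set[str]:
    """Toy stand-in for a single-use analysis: return the data names that
    are referenced by exactly one access node in the whole (toy) SDFG."""
    counts = Counter(name for accesses in states.values() for name in accesses)
    return {name for name, n in counts.items() if n == 1}


# Hypothetical SDFG: state name -> data names of its access nodes.
toy_sdfg = {
    "state0": ["A", "T", "B"],
    "state1": ["B", "C"],
}

# The scan over the full SDFG happens once ...
single_use = find_single_use_data(toy_sdfg)


def can_remove_intermediate(name: str) -> bool:
    """... and every later classification is a cheap set lookup."""
    return name in single_use
```

Here `T` could be removed after fusion, while `B` is referenced a second time and would have to stay.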

References

Collection of known issues in other transformations:

Before it was going to look for the memlet of the consumer or producer. However, one should actually only look at the memlets that are adjacent to the scope node. At least this is how the original worked. I noticed this because of the `buffer_tiling_test.py::test_basic()` test. I was not yet focused on maps that were nested and not multidimensional. It seems that the transformation has some problems there.

When it now checks for covering (i.e. if the information to exchange is enough), it no longer descends into the maps, but only inspects the first outgoing/incoming edges of the map entry and exit. I noticed that the other way was too restrictive, especially for map tiling.

Before, it was replacing the eliminated variables by zero, which actually worked pretty well, but I have now changed that such that `offset()` is used. I am not sure why I used `replace` in the first place, but I think that there was an issue. However, I am not sure.

Now I get an error in `tests/transformations/mapfusion_test.py::test_interstate_fusion`. What is interesting is that this particular test worked for the non-optimized versions, see link below. It could be that I have now found the issue. Let's try it again. See: https://github.com/spcl/dace/actions/runs/13288839449/job/37104159945?pr=1629

But it only affects `3.13.2`, all `3.9` tests pass (earlier it was different), okay I am not sure, but as far as I can remember only the `3.9` were reported. Now we keep all free tests and try again. https://github.com/spcl/dace/actions/runs/13304805373/job/37153254012?pr=1629

Before `662817eff19dde65e15a370d64045cdbe2650fde` the tests failed, and then we removed the following tests:
- autooptimize
- blas
- fortran

We will now add `autooptimize` back, let's see what is happening. Alternatively, we should also remove some `set` in the map_fusion implementation to make it more deterministic; it helped for 3.9.

Last year, the then state of MapFusion from [DaCe PR#1629](spcl/dace#1629) was added to GT4Py as a temporary fix until the PR in DaCe is merged and parallel map fusion has become available there. However, during that time the transformation in the PR has evolved and improved, and some of the bugs that were fixed there are now appearing in GT4Py, for example [PR#1850](#1850) and [PR#1856](#1856).

Thus this PR updates the MapFusion transformation that is currently inside GT4Py and replaces it with the newest development version from DaCe. Because we need it, and because it was designed from the start to be that way, it also adds parallel map fusion to the transformation. As before, this transformation, currently fully located in `map_fusion_dace.py`, is only kept inside the repo until DaCe has caught up to it.

The PR also introduces some additional memory layer that encapsulates the DaCe transformation. This is something that we have to deal with in the long run, and we currently do so because other parts of the toolchain require it.

---------

Co-authored-by: edopao <edoardo16@gmail.com>

So now we add some of the stray Python tests back:

add_edge_pair_test.py add_state_api_test.py argmax_test.py array_interface_test.py blockreduce_cudatest.py buffer_tiling_test.py callback_test.py call_sdfg_test.py chained_nested_tasklet_test.py chained_tasklet_test.py compile_sdfg_test.py config_test.py confres_test.py conftest.py consolidate_edges_test.py const_access_test.py constant_array_test.py consume_chunk_cond_test.py consume_test.py control_flow_test.py copynd_test.py cpp_tasklet_test.py cppunparse_test.py cr_complex_test.py cuda_block_test.py cuda_grid2d_test.py cuda_grid_test.py cuda_highdim_kernel_test.py cuda_smem2d_test.py cuda_smem_test.py custom_build_folder_test.py custom_reduce_test.py datadesc_test.py default_storage_test.py different_stride_test.py duplicate_arg_test.py duplicate_naming_test.py dynamic_sdfg_functions_test.py dynamic_tb_map_cudatest.py enumerator_test.py external_module.py external_module_test.py global_resolver_test.py graph_test.py half_cudatest.py halfvec_cudatest.py host_map_host_data_test.py ifchain_test.py implicit_sdfg_test.py indirection_test.py

The tests that are **not yet** added back again are:

inline_chain_test.py inline_external_edges_test.py inline_noinput_test.py inline_noncontig_dim_test.py inline_nonsink_access_test.py inline_symbol_test.py inlining_test.py instrumentation_test.py intarg_test.py interstate_assignment_test.py kernel_fusion_cudatest.py lib_reuse_test.py local_inline_test.py map_dim_shuffle_test.py map_indirect_array_test.py mapreduce_test.py memlet_lifetime_validation_test.py memlet_propagation_decreasing_test.py memlet_propagation_squeezing_test.py memlet_propagation_test.py memlet_propagation_volume_test.py mlir_tasklet_test.py multi_inline_test.py multi_output_scope_test.py multiple_cr_test.py multiple_tasklet_test.py multiprogram_cudatest.py multistate_init_test.py multistream_copy_cudatest.py multistream_custom_cudatest.py multistream_kernel_cudatest.py ndloop_test.py nested_control_flow_test.py nested_cr_test.py nested_loop_test.py nested_reduce_test.py nested_sdfg_python_test.py nested_sdfg_scalar_test.py nested_sdfg_test.py nested_stream_test.py nested_strides_test.py nested_symbol_partial_test.py nested_symbol_test.py nested_vector_type_test.py nest_subgraph_test.py numpy_bool_input_test.py offset_stride_test.py openmp_test.py parallel_sections_test.py


Started with a first version of the map fusion stuff.

aa433fe

philip-paul-mueller changed the title ~~Started with a first version of the map fusion stuff.~~ New and Improved MapFusion Aug 22, 2024

philip-paul-mueller marked this pull request as draft August 22, 2024 13:55

philip-paul-mueller added 27 commits August 23, 2024 08:32

Made some stylistic modifications to the code.

71a88a1

Now using the 3.9 type hints.

Added a function for estimating if something is pointwise.

bc87ddb

But it is too restrictive.

Now there is an error in the actual rewiring stuff.

497a2d6

Fixed a bug in the map fusion.

9e36447

When the function was fixing the interior of the second map, it did not remove the reading.

Made some formatting changes.

7a48e0d

Updated the tests of the map fusion.

d609045

It almost passes all functions. However, the ones that need renaming are not yet done.

WIP: Started with a renamer function.

52c4542

Continued with the parallel fusion stuff.

3b758bf

The fusion transformation now also checks if there is a write conflict in the input and output set. However, it is very simple.

377b428

Updated some tests.

db4864b

Fixed an error. I should refactor that damn loop.

f395acd

Some improvements to the tests.

b1ab95e

Removed some debugging stuff.

945ca8f

Fixed some typing stuff.

940b9b6

Started with a better implementation for the data dependency test.

ecae361

First version of the pointwise checker in the map fusion.

64d07fd

Updated some test cases.

33a0edf

The shared data cache can not be dumped.

ff018f4

Otherwise we can end up in recursion.

Buffer tiling now finally works.

9267ea9

The Mapreduce now also works.

fc2db8a

Added a test to the map fusion stuff that ensures that the shared block is taken.

4d9f11d

Added a test for the indirect accesses case.

2b91465

Updated the heat 3d test. It now ensures that the fusion is now done.

73f4415

Fixed an error in the parallel map fusion.

94ecd19

philip-paul-mueller added 2 commits February 12, 2025 10:59

Enable more tests, let's see what is happening.

00823fd

More fun with indexing.

c94163e

philip-paul-mueller mentioned this pull request Feb 12, 2025

fix[dace][next]: Update MapFusion GridTools/gt4py#1857

Merged

philip-paul-mueller added 13 commits February 12, 2025 15:46

Next try.

3f6fd49

These tests fail for other reasons that I do not understand, something about importing, I will now remove them again.

f929dab

Made the processing order of the map renaming more deterministic.

b38ba0b

However, it is still not perfect.

Let's hope that the CI fails soon.

6441c2a

Why does the CI not start?

57a8a13

Added everything back, just for merge.

3cdc750

Merge remote-tracking branch 'spcl/main' into new-map-fusion

4bf37d1

Disabled again some tests.

b45cf5b

Let's put everything back.

abe7051

Now the bug is back, but only for 3.13, 3.9 passes consistently.

662817e

I have now disabled more stuff, let's see if the bug is still there. I think I should also reenable list processing, I am wondering if this is the issue.

philip-paul-mueller added 13 commits February 14, 2025 09:11

Even closer.

47babf6

In [`ed58523d9b52c21e2fb6f6c4012e8d5f5a096021`](GridTools/gt4py@937e894) we reenabled `autooptimize` and the test passed. Now we are adding the `blas` tests back, then only the FORTRAN stuff is left.

Now we add the fortran tests.

fa1331a

These are the last tests that we can add. Remember the [one](spcl@47babf6) without it passed. And the [one](spcl@abe7051) that had all failed.

Okay, by adding the fortran tests it fails, see spcl@fa1331a.

2ba2bff

Now we will only have the numpy test. I want a clear tripping point, so let's run it again.

Next step.

2ffeaaf

After removing everything, except the numpy test, it passes, see spcl@2ba2bff. So we will now add the fortran test again and it should fail.

No luck.

37265ba

It is not inside the fortran tests. So let's add the many other single files. This is quite strange.

Still no error.

2563f72

Let's add the `blas` and `autooptimize` they were also triggering it last time.

What else let's try this.

c95a966

Adding these files makes it work.

0a01a22

Why was it working before, i.e. when all are present.

It has failed but in some random test `tests/library/fft_test.py::test_fft_r2c`, let's try it again.

f1b920b

After running it again, it passes.

bb77515

So let's add some stuff to see what happens.

After adding all stray *.py files it fails, see https://github.com/spcl/dace/actions/runs/13368029115/job/37330125396?pr=1629 So we are now removing some of them, let's see what is happening.

0aadd02

The last pipeline passed, spcl@2699fdb

7ca2328

I have now swapped the top level tests, i.e. all that were activated before are now disabled and all that were disabled before are now activated. Let's hope that it fails.
