New and Improved MapFusion #1629
Open
philip-paul-mueller wants to merge 217 commits into spcl:main from philip-paul-mueller:new-map-fusion
Now using the 3.9 type hints.
But it is too restrictive.
When the function was fixing the interior of the second map, it did not remove the reading.
It almost passes all functions. However, the ones that need renaming are not yet done.
…t in the input and output set. However, it is very simple.
Before, it looked at the memlets of the consumer or producer. However, one should actually only look at the memlets that are adjacent to the scope node; at least this is how the original worked. I noticed this because of the `buffer_tiling_test.py::test_basic()` test. I had not yet focused on maps that were nested and not multidimensional; it seems that the transformation has some problems there.
When it now checks for covering (i.e. whether the information to exchange is enough), it no longer descends into the maps, but only inspects the first outgoing/incoming edges of the map entry and exit. I noticed that the other way was too restrictive, especially for map tiling.
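The covering check described above can be sketched as a per-dimension containment test that looks only at the memlets adjacent to the scope nodes. This is a minimal illustration under stated assumptions, not DaCe code: subsets are modeled as lists of inclusive `(start, stop)` pairs, and the helpers `covers()` and `exchange_is_covered()` are hypothetical.

```python
# Hypothetical sketch, not DaCe code: a subset is a list of inclusive
# (start, stop) bounds, one pair per dimension.
Subset = list[tuple[int, int]]


def covers(produced: Subset, consumed: Subset) -> bool:
    """Return True if `produced` contains `consumed` in every dimension."""
    if len(produced) != len(consumed):
        return False
    return all(
        p_lo <= c_lo and c_hi <= p_hi
        for (p_lo, p_hi), (c_lo, c_hi) in zip(produced, consumed)
    )


def exchange_is_covered(exit_out: dict[str, Subset], entry_in: dict[str, Subset]) -> bool:
    """Compare only the memlets adjacent to the scope nodes.

    `exit_out` maps each data name to the subset on the first map's outgoing
    exit edge; `entry_in` does the same for the second map's incoming entry
    edges. Nothing inside either map is inspected.
    """
    for name, consumed in entry_in.items():
        produced = exit_out.get(name)
        if produced is None:
            continue  # not written by the first map, so not an intermediate
        if not covers(produced, consumed):
            return False
    return True


# First map writes tmp[0:10]; second map reads tmp[0:10] and an unrelated A.
print(exchange_is_covered({"tmp": [(0, 9)]}, {"tmp": [(0, 9)], "A": [(0, 9)]}))  # True
# Second map reads one element more than the first map produced.
print(exchange_is_covered({"tmp": [(0, 9)]}, {"tmp": [(0, 10)]}))  # False
```

Because only the outer (propagated) memlets are compared, the check never has to descend into nested scopes.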
Otherwise we can end up in recursion.
Before, it was replacing the eliminated variables by zero, which actually worked pretty well, but I have now changed that such that `offset()` is used. I am not sure why I used `replace` in the first place; I think that there was an issue, but I am not sure.
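A hedged illustration of why substituting zero for an eliminated map parameter can be fragile when the accesses carry an offset, while subtracting the producer's per-iteration write position gives the local index directly. This assumes one-dimensional affine indices of the form `coef * i + const`; the `Affine` class and both helper functions are hypothetical and are not the PR's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Affine:
    """Index expression `coef * i + const` in the map parameter i (hypothetical helper)."""
    coef: int
    const: int

    def at(self, i: int) -> int:
        return self.coef * i + self.const

    def __sub__(self, other: "Affine") -> "Affine":
        return Affine(self.coef - other.coef, self.const - other.const)


def local_index_by_zero(read: Affine) -> int:
    # Substitute 0 for the eliminated parameter; ignores where the producer wrote.
    return read.at(0)


def local_index_by_offset(read: Affine, write: Affine) -> int:
    # Shift the read by the producer's per-iteration write position.
    delta = read - write
    if delta.coef != 0:
        raise ValueError("read is not pointwise with respect to the write")
    return delta.const


write = Affine(1, 3)  # producer writes T[i + 3]
read = Affine(1, 3)   # consumer reads  T[i + 3]
print(local_index_by_zero(read))           # 3, out of bounds for a shape-(1,) intermediate
print(local_index_by_offset(read, write))  # 0
```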
…g about importing, I will now remove them again.
Now I get an error in `tests/transformations/mapfusion_test.py::test_interstate_fusion`. But what is interesting is that that particular test worked for the non-optimized versions, see the link below. It could be that I have now found the issue. Let's try it again. See: https://github.com/spcl/dace/actions/runs/13288839449/job/37104159945?pr=1629
However, it is still not perfect.
But it only affects `3.13.2`; all `3.9` tests pass (earlier it was different). Okay, I am not sure, but as far as I can remember only the `3.9` ones were reported. Now we keep all free tests and try again. https://github.com/spcl/dace/actions/runs/13304805373/job/37153254012?pr=1629
I have now disabled more stuff, let's see if the bug is still there. I think I should also reenable list processing, I am wondering if this is the issue.
Before `662817eff19dde65e15a370d64045cdbe2650fde` the tests failed, and then we removed the following tests:
- autooptimize
- blas
- fortran

We will now add `autooptimize` back; let's see what is happening. Alternatively, we should also remove some `set`s in the map_fusion implementation to make it more deterministic; that helped for 3.9.
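A small sketch of the determinism issue mentioned above, assuming the nondeterminism comes from iterating over a `set` of strings: string hashing is randomized per interpreter run (see `PYTHONHASHSEED`), so the iteration order of such a set can differ between runs. Deduplicating with `dict.fromkeys()` keeps first-seen order instead; `unique_in_order()` is a hypothetical helper, not a function from the map_fusion module.

```python
def unique_in_order(items):
    """Deduplicate while keeping first-seen order (dicts preserve insertion order)."""
    return list(dict.fromkeys(items))


names = ["tmp_b", "tmp_a", "tmp_b", "tmp_c"]
# Iterating set(names) may yield a different order in each interpreter run
# because str hashes are randomized; the result below is always the same.
print(unique_in_order(names))  # ['tmp_b', 'tmp_a', 'tmp_c']
```

Where a canonical order is acceptable, `sorted(set(items))` is an equally deterministic alternative.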
philip-paul-mueller added a commit to GridTools/gt4py that referenced this pull request on Feb 14, 2025
Last year, the then-current state of MapFusion from [DaCe PR#1629](spcl/dace#1629) was added to GT4Py as a temporary fix until the PR in DaCe is merged and parallel map fusion has become available there. However, during that time the transformation in the PR has evolved and improved, and some of the bugs that were fixed there are now appearing in GT4Py, for example [PR#1850](#1850) and [PR#1856](#1856). Thus this PR updates the MapFusion transformation that is currently inside GT4Py and replaces it with the newest development version from DaCe. Because we need it, and because the transformation was designed that way from the start, this PR also adds parallel map fusion to it. As before, this transformation, currently located entirely in `map_fusion_dace.py`, is only kept inside the repo until DaCe has caught up. The PR also introduces an additional layer that encapsulates the DaCe transformation; this is something we will have to deal with in the long run, and we currently need it because other parts of the toolchain require it. --------- Co-authored-by: edopao <edoardo16@gmail.com>
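To make the term concrete: assuming "parallel map fusion" here means merging two maps over the same iteration space that exchange no data (as opposed to serial fusion, where the second map consumes an intermediate produced by the first), the effect can be sketched with plain Python loops. The functions below are hypothetical and only model the loop structure; they do not operate on SDFGs.

```python
def unfused(a):
    n = len(a)
    b = [0] * n
    c = [0] * n
    for i in range(n):  # first map
        b[i] = a[i] + 1
    for i in range(n):  # second map: same range, reads no output of the first
        c[i] = a[i] * 2
    return b, c


def fused(a):
    n = len(a)
    b = [0] * n
    c = [0] * n
    for i in range(n):  # a single map executes both bodies
        b[i] = a[i] + 1
        c[i] = a[i] * 2
    return b, c


print(fused([1, 2, 3]))  # ([2, 3, 4], [2, 4, 6])
```

Since neither body reads what the other writes, merging them does not change the result; it only removes one pass over the iteration space.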
In [`ed58523d9b52c21e2fb6f6c4012e8d5f5a096021`](GridTools/gt4py@937e894) we reenabled `autooptimize` and the test passed. Now we are adding the `blas` tests back, then only the FORTRAN stuff is left.
This is the last test that we can add. Remember, the [one](spcl@47babf6) without it passed, and the [one](spcl@abe7051) that had all of them failed.
Now we will only have the numpy test. I want a clear tripping point, so let's run it again.
After removing everything, except the numpy test, it passes, see spcl@2ba2bff. So we will now add the fortran test again and it should fail.
Let's add `blas` and `autooptimize`; they were also triggering it last time.
Why was it working before, i.e. when all of them were present?
…t_fft_r2c`, let's try it again.
So let's add some stuff to see what happens.
…spcl/dace/actions/runs/13368029115/job/37330125396?pr=1629 So we are now removing some of them; let's see what is happening.
So now we add back some of the stray Python tests: add_edge_pair_test.py add_state_api_test.py argmax_test.py array_interface_test.py blockreduce_cudatest.py buffer_tiling_test.py callback_test.py call_sdfg_test.py chained_nested_tasklet_test.py chained_tasklet_test.py compile_sdfg_test.py config_test.py confres_test.py conftest.py consolidate_edges_test.py const_access_test.py constant_array_test.py consume_chunk_cond_test.py consume_test.py control_flow_test.py copynd_test.py cpp_tasklet_test.py cppunparse_test.py cr_complex_test.py cuda_block_test.py cuda_grid2d_test.py cuda_grid_test.py cuda_highdim_kernel_test.py cuda_smem2d_test.py cuda_smem_test.py custom_build_folder_test.py custom_reduce_test.py datadesc_test.py default_storage_test.py different_stride_test.py duplicate_arg_test.py duplicate_naming_test.py dynamic_sdfg_functions_test.py dynamic_tb_map_cudatest.py enumerator_test.py external_module.py external_module_test.py global_resolver_test.py graph_test.py half_cudatest.py halfvec_cudatest.py host_map_host_data_test.py ifchain_test.py implicit_sdfg_test.py indirection_test.py. The tests that are **not yet** added back are: inline_chain_test.py inline_external_edges_test.py inline_noinput_test.py inline_noncontig_dim_test.py inline_nonsink_access_test.py inline_symbol_test.py inlining_test.py instrumentation_test.py intarg_test.py interstate_assignment_test.py kernel_fusion_cudatest.py lib_reuse_test.py local_inline_test.py map_dim_shuffle_test.py map_indirect_array_test.py mapreduce_test.py memlet_lifetime_validation_test.py memlet_propagation_decreasing_test.py memlet_propagation_squeezing_test.py memlet_propagation_test.py memlet_propagation_volume_test.py mlir_tasklet_test.py multi_inline_test.py multi_output_scope_test.py multiple_cr_test.py multiple_tasklet_test.py multiprogram_cudatest.py multistate_init_test.py multistream_copy_cudatest.py multistream_custom_cudatest.py multistream_kernel_cudatest.py ndloop_test.py 
nested_control_flow_test.py nested_cr_test.py nested_loop_test.py nested_reduce_test.py nested_sdfg_python_test.py nested_sdfg_scalar_test.py nested_sdfg_test.py nested_stream_test.py nested_strides_test.py nested_symbol_partial_test.py nested_symbol_test.py nested_vector_type_test.py nest_subgraph_test.py numpy_bool_input_test.py offset_stride_test.py openmp_test.py parallel_sections_test.py
I have now swapped the top-level tests, i.e. all that were activated before are now disabled and all that were disabled before are now activated. Let's hope that it fails.
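The commits above search by hand for the combination of test files that triggers the failure. A systematic alternative, named plainly as a swapped-in technique and not what was done here, is delta debugging (Zeller's `ddmin`), which also finds failures that only appear when several files run together. In this sketch `fails` is a hypothetical predicate (for example, one that runs pytest on the given files) and the file names are placeholders.

```python
from typing import Callable


def split(items: list[str], n: int) -> list[list[str]]:
    """Split `items` into `n` contiguous, nearly equal, non-empty chunks (n <= len(items))."""
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for idx in range(n):
        end = start + size + (1 if idx < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def ddmin(tests: list[str], fails: Callable[[list[str]], bool]) -> list[str]:
    """Shrink `tests` to a 1-minimal list for which `fails` still returns True."""
    if not fails(tests):
        raise ValueError("the full test list must reproduce the failure")
    n = 2
    while len(tests) >= 2:
        chunks = split(tests, n)
        reduced = False
        for chunk in chunks:  # 1) does a single chunk still fail on its own?
            if fails(chunk):
                tests, n, reduced = chunk, 2, True
                break
        if not reduced:
            for i in range(len(chunks)):  # 2) does the failure survive dropping one chunk?
                complement = [t for j, c in enumerate(chunks) if j != i for t in c]
                if fails(complement):
                    tests, n, reduced = complement, max(n - 1, 2), True
                    break
        if not reduced:
            if n >= len(tests):
                break  # removing any single test makes it pass: 1-minimal
            n = min(2 * n, len(tests))  # 3) retry with finer granularity
    return tests


def fails(files: list[str]) -> bool:
    # Placeholder: pretend the bug only shows up when these two files run together.
    return "b_test.py" in files and "d_test.py" in files


print(ddmin(["a_test.py", "b_test.py", "c_test.py", "d_test.py"], fails))
# ['b_test.py', 'd_test.py']
```

Each call to `fails` corresponds to one CI run, so the number of runs grows roughly with the number of culprit files times the logarithm of the suite size in typical cases.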
This PR introduces a new and improved version of `MapFusion`. A summary of the changes can also be found here; it compares the SDFGs generated by the old and the new transformation for some unit tests.
Fixed Bugs and removed Limitations
- …`.subset` member of the Memlet; I mean the concept) of the new intermediate data descriptor were not computed correctly in some cases, especially in the presence of offsets. See the `test_offset_correction_range_read()`, `test_offset_correction_scalar_read()` and `test_offset_correction_empty()` tests.
- …`.subset` and ignored `.other_subset`, which is correct in most cases but not always. See the `test_fusion_intrinsic_memlet_direction()` test for more.
- …`.dynamic` property of the Memlets was fully ignored, leading to wrong code.
- …`A[i, j]` and the other map was accessing `B[i + 1, j]`. Now this is possible as long as every access is pointwise. See the `test_fusion_different_global_accesses()` test for an example.
- …`T`, had shape `(10, 1, 20)` and was accessed inside the map as `T[__i, 0, __j]`, then the old transformation would have created a reduced intermediate of shape `(1, 1, 1)`; now its shape is `(1)`. Note that if the intermediate had shape `(10, 20)` instead and were accessed as `T[__i, __j]`, then a `Scalar` would have been created. See also the `strict_dataflow` flag below.

New Flags
- `only_toplevel_maps`: If `True`, the transformation will only fuse maps that are located at the top level, i.e. maps inside maps will not be merged.
- `only_inner_maps`: If `True`, the transformation will only fuse maps that are inside other maps.
- …: If `True`, the transformation will assume that every intermediate is shared, i.e. the referenced data is used somewhere else in the SDFG and has to become an output of the fused maps. This creates dead data flow, but avoids a scan of the full SDFG.
- `strict_dataflow`: This flag is enabled by default and has two effects. First, it disables the cleaning of reduced intermediate storage. The second, more important effect is that it preserves a much stricter data flow: most importantly, if the intermediate array is used downstream (this is not limited to the case that the array is the output of the second map), then the maps will not be fused together. This is mostly a workaround for bugs elsewhere in DaCe, where other transformations failed to pick up the dependency. Note that the fused map itself would be correct; the problem lies in the other transformations.

FullMapFusion
This PR also introduced the `FullMapFusion` pass, which makes use of the `FindSingleUseData` pass that was introduced in PR#1906. `FullMapFusion` applies MapFusion as long as possible, i.e. it fuses all maps that can be fused. However, instead of scanning the SDFG every time an intermediate node has to be classified (i.e. whether it can be deleted or not), the classification is computed once and then reused. This speeds up the fusion process, as it removes the need to traverse the full SDFG many times.
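The idea of classifying intermediates once and then fusing until a fixed point is reached can be sketched on a toy model. This is an assumption-laden illustration, not the `FullMapFusion` or `FindSingleUseData` implementation: maps are a linear chain described only by the data they read and write, and `find_single_use_data()` and `fuse_all()` are hypothetical helpers.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Map:
    """Toy stand-in for a map scope: which data it reads and writes."""
    name: str
    reads: frozenset
    writes: frozenset


def find_single_use_data(maps: list[Map]) -> set[str]:
    """Hypothetical analogue of a single-use analysis: data written by exactly
    one map and read by exactly one map anywhere in the program."""
    writers = Counter(d for m in maps for d in m.writes)
    readers = Counter(d for m in maps for d in m.reads)
    return {d for d in writers if writers[d] == 1 and readers[d] == 1}


def fuse_all(maps: list[Map]) -> list[Map]:
    """Fuse adjacent producer/consumer maps until nothing changes."""
    single_use = find_single_use_data(maps)  # classified once, then reused
    changed = True
    while changed:
        changed = False
        for k in range(len(maps) - 1):
            first, second = maps[k], maps[k + 1]
            shared = first.writes & second.reads
            if not shared:
                continue
            # Single-use intermediates disappear; shared ones stay as outputs.
            writes = (first.writes - (shared & single_use)) | second.writes
            reads = first.reads | (second.reads - first.writes)
            maps[k:k + 2] = [Map(f"{first.name}+{second.name}", reads, writes)]
            changed = True
            break
    return maps


chain = [
    Map("m1", frozenset({"A"}), frozenset({"T"})),
    Map("m2", frozenset({"T"}), frozenset({"U"})),
    Map("m3", frozenset({"U"}), frozenset({"B"})),
]
fused = fuse_all(chain)
print(fused[0].name, sorted(fused[0].reads), sorted(fused[0].writes))
# m1+m2+m3 ['A'] ['B']
```

The single classification pass stands in for what would otherwise be a full-SDFG scan per candidate; if `T` were also read by another map, it would not be single-use and would remain an output of the fused map.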
This new pass also replaced the direct application of MapFusion in `auto_optimizer`.

References
Collection of known issues in other transformations:
- `RefineNestedAccess`
and `SDFGState._read_and_write_sets()