Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Moore] Clean up struct ops and add missing tests #7392

Merged
merged 1 commit into from
Jul 30, 2024

Conversation

fabianschuiki
Copy link
Contributor

Rework the Moore dialect operations that manipulate struct values. These are intended to operate on StructType and UnpackedStructType values directly, but were defined in ODS as operating on references to structs. This was likely a hold-over from early development where we were still figuring out the distinction between ref types and value types in SV.

This commit adjusts the struct ops such that they actually operate on struct values instead of references to structs. It also moves more operand constraints into ODS and simplifies the op verifiers by factoring out some common code into helper functions.

Enhance the struct_inject canonicalizer to also consider struct_create operations as part of the inject chain. This allows an initial struct_create that is modified by a subsequent inject to be canonicalized into a new struct_create with updated values.

Add missing basic and error tests for the struct-related ops, and simplify the variable destructuring test by removing unrelated operations.

Also fixes an issue in variable op destructuring where a variable with initial value would have its initial value discarded during destructuring. Initial values now prevent destructuring. Alternatively, we may choose to insert appropriate struct_extract ops to destructure the initial value in the future.

@mingzheTerapines
Copy link
Contributor

mingzheTerapines commented Jul 29, 2024

Awesome, I learned a lot from these codes.

return {};
fields.push_back(NamedAttribute(member.name, field));
}
return DictionaryAttr::get(getContext(), fields);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really elegent

@mingzheTerapines
Copy link
Contributor

mingzheTerapines commented Jul 29, 2024

@fabianschuiki So structExtractOp could be like this? But it is has trait Pure which can not trigger SROA.

func.func @SplitStructs(%arg0: !moore.i42, %arg1: !moore.i1337) {
  // Named variables

  // CHECK-DAG: %x.a = moore.variable : <i42>
  // CHECK-DAG: %x.b = moore.variable : <i1337>
  // CHECK-NOT: moore.struct_extract
  %x = moore.variable : <struct<{a: i42, b: i1337}>>
  %0 = moore.read %x : struct<{a: i42, b: i1337}>
  // CHECK-DAG: [[A:%.+]] = moore.read %x.a : <i42>
  // CHECK-DAG: [[B:%.+]] = moore.read %x.b : <i1337>
  %1 = moore.struct_extract %x, "a" : struct<{a: i42, b: i1337}> -> i42
  %2 = moore.struct_extract %x, "b" : struct<{a: i42, b: i1337}> -> i1337
  // CHECK: moore.blocking_assign %arg0, [[A]]
  // CHECK: moore.blocking_assign %arg1, [[B]]
  moore.blocking_assign %arg0, %1 : !moore.i42
  moore.blocking_assign %arg1, %2 : !moore.i1337
  return
}

Comment on lines 313 to 314
if (isa<SVModuleOp>(getOperation()->getParentOp()))
return {};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we will tag global variables with struct type as destructurable. For example, struct values exist in func.func.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we may want to also allow for variables in SVModuleOp to be destructured by SROA. Would that be a problem?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we destructure in Moore dialect, mem2reg pass may eliminate part of struct variable, such as %x.a will be eliminated if it is unused. Some struct variable will be incomplete which means struct information will lose in this IR. Some optimizations for nested type may be disabled in lower IR.
What's more, most cases of struct variables operation will be transformed to non-struct variables operation unless this struct is used not by fielding. But in hw dialect, it still has many struct Ops. We do not need to destructure them in so higher IR.
I was concerning about this, wo I disabled SROA for global variables.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be no problem. As the follow you mentioned, when moore.read exists, SROA has nothing to do, and vice versa. So I think maybe we can allow for variables in SVModuleOp to be destructured by SROA. Those variables with struct type don't expand by SROA, we can turn them into moore.struct_create(I notice canonicalize method can do this), then lower moore.struct_create directly into hw.struct_create. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that sounds great. We can't directly go from moore.variable to moore.struct_create, because the variable has ref<struct> type but moore.struct_create creates a struct directly. But in case Mem2Reg replaces the variable with SSA values, we can definitely materialize moore.struct_create. That would be especially nice also with moore.assigned_variable 😃

Comment on lines +7 to +16
// CHECK-DAG: %x.a = moore.variable : <i42>
// CHECK-DAG: %x.b = moore.variable : <i1337>
// CHECK-NOT: moore.struct_extract_ref
%x = moore.variable : <struct<{a: i42, b: i1337}>>
%0 = moore.struct_extract_ref %x, "a" : <struct<{a: i42, b: i1337}>> -> <i42>
%1 = moore.struct_extract_ref %x, "b" : <struct<{a: i42, b: i1337}>> -> <i1337>
// CHECK: moore.blocking_assign %x.a, %arg0
// CHECK: moore.blocking_assign %x.b, %arg1
moore.blocking_assign %0, %arg0 : !moore.i42
moore.blocking_assign %1, %arg1 : !moore.i1337
Copy link
Member

@hailongSun2000 hailongSun2000 Jul 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious how to handle the moore.extract like 🤔

%0 = moore.read %x;
%1 = moore.extract %0, "a"

when we run --sroa, %x will be earsed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a read %x blocks SROA for %x. For example, the following code without read causes SROA to expand the variable %x:

func.func @Case1() {
  %x = moore.variable : <struct<{a: i42, b: i1337}>>
  moore.struct_extract_ref %x, "a" : <struct<{a: i42, b: i1337}>> -> <i42>
  moore.struct_extract_ref %x, "b" : <struct<{a: i42, b: i1337}>> -> <i1337>
  return
}
// after circt-opt --sroa
func.func @Case1() {
  %x.b = moore.variable : <i1337>
  %x.a = moore.variable : <i42>
  return
}

But if I add a read %x, SROA does not expand the variable:

func.func @Case2() {
  %x = moore.variable : <struct<{a: i42, b: i1337}>>
  moore.struct_extract_ref %x, "a" : <struct<{a: i42, b: i1337}>> -> <i42>
  moore.struct_extract_ref %x, "b" : <struct<{a: i42, b: i1337}>> -> <i1337>
  moore.read %x : <struct<{a: i42, b: i1337}>>
  return
}
// after circt-opt --sroa
func.func @Case2() {
  %x = moore.variable : <struct<{a: i42, b: i1337}>>
  moore.struct_extract_ref %x, "a" : <struct<{a: i42, b: i1337}>> -> <i42>
  moore.struct_extract_ref %x, "b" : <struct<{a: i42, b: i1337}>> -> <i1337>
  moore.read %x : <struct<{a: i42, b: i1337}>>
  return
}

So it looks like SROA is doing the correct thing in case there is a read which it cannot properly expand.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, SROA will be enabled only if the destructurable variable is unused after rewiring.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, if we have a local variable with struct type declared in the procedure body. We cannot eliminate it using mem2reg. But it's not a problem, we have decided to lower moore.procedure to llhd.process 😃 .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! Maybe we'll find more optimizations later on to help eliminate more of these variables -- in case that's even needed. But as long as we can lower it to LLHD, things should work 🙂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fabianschuiki what order should SROA and Canonicalize be?
If this PR #7341 is applied, no more variableOp for struct will exist after the Canonicalize Pass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can canonicalize variables away everywhere. There are a lot of use cases that involve multiple assignments to variables, potentially from different processes. So I'm pretty sure that canonicalization will not really remove any variables. Things like a mem2reg conversion is fairly involved and requires quite a bit of analysis. I'm pretty sure that can't be done in a canonicalization pattern alone, but requires a dedicated mem2reg pass. Does the one that MLIR already has work for this?

if (auto index = getStructFieldIndex(type, name))
return getStructMembers(type)[*index].type;
return {};
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonderful interface for structLikeType. Will it be better if let be Interfaces for StructLikeType which means moving these to mooreTypes.cpp?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm currently experimenting with creating a common base class for StructType and UnpackedStructType, such that you can do a dyn_cast<AnyStructType>(type) to go from StructType and UnpackedStructType to a common AnyStructType base class. Hopefully I can move this code into that base class in MooreTypes.cpp as you suggest 😃

Rework the Moore dialect operations that manipulate struct values. These
are intended to operate on `StructType` and `UnpackedStructType` values
directly, but were defined in ODS as operating on references to structs.
This was likely a hold-over from early development where we were still
figuring out the distinction between ref types and value types in SV.

This commit adjusts the struct ops such that they actually operate on
struct values instead of references to structs. It also moves more
operand constraints into ODS and simplifies the op verifiers by
factoring out some common code into helper functions.

Enhance the `struct_inject` canonicalizer to also consider
`struct_create` operations as part of the inject chain. This allows an
initial `struct_create` that is modified by a subsequent inject to be
canonicalized into a new `struct_create` with updated values.

Add missing basic and error tests for the struct-related ops, and
simplify the variable destructuring test by removing unrelated
operations.

Also fixes an issue in variable op destructuring where a variable with
initial value would have its initial value discarded during
destructuring. Initial values now prevent destructuring. Alternatively,
we may choose to insert appropriate `struct_extract` ops to destructure
the initial value in the future.
@fabianschuiki
Copy link
Contributor Author

@mingzheTerapines in your example

%x = moore.variable : <struct<{a: i42, b: i1337}>>
%0 = moore.read %x : <struct<{a: i42, b: i1337}>>
%1 = moore.struct_extract %0, "a" : struct<{a: i42, b: i1337}> -> i42
%2 = moore.struct_extract %0, "b" : struct<{a: i42, b: i1337}> -> i1337

the SROA pass would not expand %x because there is a read operation that reads the entire variable %x. However, we could add a canonicalization that tries to pull struct_exracts up an in front of read operations. In your example that would result in:

%x = moore.variable : <struct<{a: i42, b: i1337}>>
%0 = moore.struct_extract_ref %x, "a" : <struct<{a: i42, b: i1337}>> -> <i42>
%1 = moore.struct_extract_ref %x, "b" : <struct<{a: i42, b: i1337}>> -> <i1337>
%2 = moore.read %0 : <i42>
%3 = moore.read %1 : <i1337>

In this form, SROA could operate again. This is a difficult canonicalization to get right though: splitting a single read of %x up into separate reads of the fields means that we now have to be careful that there are no writes to %x in between the individual reads. The reads have to be inserted at the exact same location as the old read, which means that the struct_extracts have to be pulled up in front of the reads and converted into a struct_extract_ref.

I don't know how LLVM handles this. Splitting a struct read up into multiple field reads doesn't feel like it's always the right thing to do: if the struct has 20 small i1 fields, doing a single read of the 20-bit struct is going to be much more efficient than 20 individual i1 reads. I'm wondering if optimizations like these use a cost modle in LLVM, or if there is a heuristic that is used to pick whether field reads or struct reads should be preferred. (For example, if >50% of a struct's bits are read it might be more efficient to read the entire struct once and then use struct_extract, instead of reading every individual field through struct_extract_ref.)

@mingzheTerapines
Copy link
Contributor

mingzheTerapines commented Jul 30, 2024

@fabianschuiki I partly agree with you.
" For splitting a single read of %x up into separate reads of the fields means that we now have to be careful that there are no writes to %x in between the individual reads":
we don't need to split old readOp, we can directly generate moore.struct_extract_refOp and new readOp by canonicalization of structExtractOp, which will "getinputmutable " of the old readOp. Then canonicalization will remove unused variable such as old readOp.
"For it might be more efficient to read the entire ":
It depends.
If it is for parallel case for reading and writing, reading field a and writing field b could be done parallely which means more efficient, but

  1. read entire struct
  2. readfield a from 1
  3. writefiled b to struct
    need to be atomically in order of codes.
    But it could be all right if we
    a. read entire struct
    b. readfield a from a
    c. writefiled b to a.
    d. write a to struct
    Read the value and do reading and writing on this value. Finally write the value back after all reading and writing. This may cause a long time occupying register. I am not sure this is a kind of side effect.

@fabianschuiki fabianschuiki merged commit b7b82fd into main Jul 30, 2024
4 checks passed
@fabianschuiki fabianschuiki deleted the fschuiki/moore-struct-fixes branch July 30, 2024 19:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants