-
Notifications
You must be signed in to change notification settings - Fork 292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Moore] Improve WaitEventOp, lower to LLHD #7518
Conversation
e33b25b
to
c540888
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the first part of my review, I'll finish it later 😅
@@ -51,6 +51,32 @@ def ProcessOp : LLHDOp<"process", [ | |||
let assemblyFormat = "attr-dict-with-keyword $body"; | |||
} | |||
|
|||
def FinalOp : LLHDOp<"final", [ | |||
NoRegionArguments, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it'd make sense to add the RecursiveMemoryEffects
trait?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah good idea!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ProcessOp probably also wants this, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good point!
def EventOp : MooreOp<"wait_event", [ | ||
HasParent<"ProcedureOp"> | ||
def WaitEventOp : MooreOp<"wait_event", [ | ||
RecursiveMemoryEffects, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this have a HasParent<"ProcedureOp">
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, because you can write something like this:
task foo(ref logic x);
@(posedge x);
endtask
module Bar;
logic y;
always foo(y);
endmodule
which would probably lower to this:
func.func @foo(%x: !moore.ref<l1>) {
moore.wait_event {
%0 = moore.read %x : <l1>
moore.detect_event posedge %0 : l1
}
}
hw.module @Bar() {
%y = moore.variable : <l1>
moore.procedure always {
func.call @foo(%y) : (!moore.ref<l1>) -> ()
}
}
I'm not entirely sure how we'd deal with this… probably some lowering pass on functions to make sure we can suspend in the middle of a function call 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh forgot about tasks 😞
That's a good question indeed.
// TODO: Enable the following once it not longer interferes with @(...) | ||
// event control checks. The introduced dummy variables make the event | ||
// control observe a static local variable that never changes, instead of | ||
// observing a module-wide signal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this due to mem2reg and not having something like the sample
op we talked about?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes exactly. The current implementation of SimplifyProcedures
introduces shadow variables for any global variable that it accesses. Since we now have regular Control Flow ops and basic blocks, and mem2reg is becoming more solid, we'll have to revisit this and figure out what exactly we want to do in this pass. Some form of mapping assignments of global variables to process outputs would be great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we originally planned to inline the body of always_comb
into the top level/module(hw.module
). Since we need to ensure the variable has a correct value(After running SimplifyProcedures
, we will run mem2reg
to eliminate the shadow variables). However, we would like to lower moore.procedure
into llhd.process
for the time being, I think maybe we needn't SimplifyProcedures
, WDYT 😁?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that sounds fantastic! 😀 I'm pretty sure we can add SimplifyProcedures again in the future to optimize some procedures early on, but I totally agree with going to LLHD processes for now and doing optimizations there. 🥳
}]; | ||
let arguments = (ins | ||
EdgeAttr:$edge, | ||
UnpackedType:$input, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be restricted to a one-bit type or should we add a test for multi-bit operands?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can have multi-bit operands, and the edge detection truth table will apply per bit. If the result is non-zero you get an event. The current MooreToCore conversion might need a few extensions to work reliably with multi-bit operands. Maybe a follow-up PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Erroring out in MooreToCore for now sounds good to me.
// the interesting signals. | ||
rewriter.setInsertionPointToEnd(checkBlock); | ||
if (!triggers.empty()) { | ||
auto triggered = rewriter.createOrFold<comb::OrOp>(loc, triggers, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, no short-circuiting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is short-circuiting between the events in SystemVerilog. Having it be an expression may make lowering a bit easier later 😃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good to know that there's no short-circuiting in SV. 👍
It is excited to see its taking shape 👍. |
for (auto returnOp : | ||
llvm::make_early_inc_range(body.getOps<ReturnOp>())) { | ||
rewriter.setInsertionPoint(returnOp); | ||
rewriter.replaceOpWithNewOp<llhd::HaltOp>(returnOp); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I remember correctly, the return statement cannot be used in a(n) initial/final. Slang also detects it(return statement is only valid inside task and function blocks
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you're right! The Moore dialect has its own moore.return
operation, which is used to terminated moore.procedure
ops. We should probably give it a better name… A return
in Verilog is mapped to a func.return
in MLIR, which is different from the moore.return
inserted implicitly as a terminator for moore.procedure
s 😬
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh! I mistook body.getOps<ReturnOp>
as the return of function. Thanks for correcting me!
8ffa907
to
3c8775a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the rest of my review. Thanks for all these great improvements! 🎉
SmallDenseSet<Value> alreadyObserved; | ||
clonedOp.getBody().walk([&](Operation *operation) { | ||
for (auto value : operation->getOperands()) { | ||
if (value.getParentRegion() == &clonedOp.getBody()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are nested regions within the wait_event op region, we'd have to check if clonedOp.getBody()
is an ancestor region of the value defining region.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch! I've changed this and added a test for it.
auto triggered = rewriter.createOrFold<comb::OrOp>(loc, triggers, true); | ||
rewriter.create<cf::CondBranchOp>(loc, triggered, resumeBlock, waitBlock); | ||
} else { | ||
rewriter.create<cf::BranchOp>(loc, waitBlock); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, the resume block would have no predecessors. Do you think it's save to leave this to the conversion framework? We don't have a test for this case to be sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I have added the pattern I've seen failing before to the test with an empty wait_event
. It looks like the dialect conversion uses the dominance information collected before the conversion to iterate through ops -- looks like making a block unreachable doesn't exclude it from the conversion.
Resolves #7482 |
Rework the `moore.wait_event` op to be able to accurately model the semantics of SystemVerilog's `@...` event control statements. The op now has a body region which is executed to detect a relevant change in one or more interesting values. A new `moore.detect_event` op serves as the mechanism to encode whether a posedge, negedge, both, or any change at all on a value should be detected as an event. Based on this new `moore.wait_event` op we can properly convert most of the event control statements in `ImportVerilog` to a corresponding MLIR op. Delay control like `#1ns` is not handled yet. In the MooreToCore conversion this new op allows us to properly generate `llhd.wait` operations at the right place that suspend process execution until an interesting event has occurred. This now also allows us to support almost all SystemVerilog processes in the lowering. The only missing ones are `always_comb` and `always_latch` which require an implicit `llhd.wait` to be inserted. @maerhart has a version of that lowering almost ready though. This commit also adds an `llhd.final` op in order to be able to lower `final` procedures. Fixes #7482. Co-authored-by: Martin Erhart <maerhart@outlook.com>
3c8775a
to
e4d0efc
Compare
Rework the
moore.wait_event
op to be able to accurately model the semantics of SystemVerilog's@...
event control statements. The op now has a body region which is executed to detect a relevant change in one or more interesting values. A newmoore.detect_event
op serves as the mechanism to encode whether a posedge, negedge, both, or any change at all on a value should be detected as an event.Based on this new
moore.wait_event
op we can properly convert most of the event control statements inImportVerilog
to a corresponding MLIR op. Delay control like#1ns
is not handled yet.In the MooreToCore conversion this new op allows us to properly generate
llhd.wait
operations at the right place that suspend process execution until an interesting event has occurred. This now also allows us to support almost all SystemVerilog processes in the lowering. The only missing ones arealways_comb
andalways_latch
which require an implicitllhd.wait
to be inserted. @maerhart has a version of that lowering almost ready though.This commit also adds an
llhd.final
op in order to be able to lowerfinal
procedures.