synthesis: minimal synthesis to drive MAX_UNGROUP_SIZE policy #1758

Conversation

oharboe
Collaborator

@oharboe oharboe commented Jan 16, 2024

@gadfort @maliberty Thoughts?

This roughly cuts hierarchical synthesis times in half.

@gadfort
Contributor

gadfort commented Jan 16, 2024

@oharboe what are you attempting to accomplish with this?

@oharboe
Collaborator Author

oharboe commented Jan 16, 2024

> @oharboe what are you attempting to accomplish with this?

The first stage of synthesis takes seconds instead of hours this way...

@gadfort
Contributor

gadfort commented Jan 16, 2024

Instead of relying on an area estimate, I've been using an instance count, which doesn't require a techmap to occur. Running two sets of synthesis is wasteful (I don't think the current setup in ORFS is doing us any favors, and this just pushes off fixing it properly).

@oharboe
Collaborator Author

oharboe commented Jan 16, 2024

> Instead of relying on an area estimate, I've been using an instance count, which doesn't require a techmap to occur. Running two sets of synthesis is wasteful (I don't think the current setup in ORFS is doing us any favors, and this just pushes off fixing it properly).

I'm counting cells instead of area in this PR.

@gadfort
Contributor

gadfort commented Jan 16, 2024

I see. Still, without running a techmap it's not clear that you are getting a proper count, since $lcu and other complex structures would be left behind. If your method works properly, I don't see the need for the python script, since you can then have a single synthesis run without invoking yosys twice.

@oharboe
Collaborator Author

oharboe commented Jan 16, 2024

> I see. Still, without running a techmap it's not clear that you are getting a proper count, since $lcu and other complex structures would be left behind. If your method works properly, I don't see the need for the python script, since you can then have a single synthesis run without invoking yosys twice.

Good point, that would be much nicer.

How do you "use instance count"? Is that in the SiliconCompiler?

@gadfort
Contributor

gadfort commented Jan 16, 2024

you can see https://github.com/siliconcompiler/siliconcompiler/blob/1122fadb92b4ca5803b7b66f0b071f3185310ffb/siliconcompiler/tools/yosys/syn_asic.tcl#L185
Like @maliberty said this has not been evaluated to make sure it works in all cases, but I suspect it would work for you, except it will take longer than a few seconds since it's doing some extra work along the way.

@oharboe
Collaborator Author

oharboe commented Jan 16, 2024

> you can see https://github.com/siliconcompiler/siliconcompiler/blob/1122fadb92b4ca5803b7b66f0b071f3185310ffb/siliconcompiler/tools/yosys/syn_asic.tcl#L185
> Like @maliberty said this has not been evaluated to make sure it works in all cases, but I suspect it would work for you, except it will take longer than a few seconds since it's doing some extra work along the way.

Interesting. How is this .tcl file run? It looks like you are running it from tclsh, importing yosys as a module, and restarting synthesis a number of times, so there's some global state kept in the yosys module across "yosys foo" calls.

@gadfort
Contributor

gadfort commented Jan 16, 2024

It's run via SC. We don't import yosys commands, so the tcl script is purely tcl and yosys calls go through "yosys foo" (which is how yosys set it up). It runs several trial flattenings to see when the design no longer "benefits" from flattening. Long term, yosys needs a better system that is compiled in, since that would give better access to the FFs etc., which is really what determines the level of flattening that is possible.

@oharboe
Collaborator Author

oharboe commented Jan 16, 2024

> It's run via SC. We don't import yosys commands, so the tcl script is purely tcl and yosys calls go through "yosys foo" (which is how yosys set it up). It runs several trial flattenings to see when the design no longer "benefits" from flattening. Long term, yosys needs a better system that is compiled in, since that would give better access to the FFs etc., which is really what determines the level of flattening that is possible.

Silly question: how is state persisted between invocations of yosys in your .tcl script? In ram? on disk?

@gadfort
Contributor

gadfort commented Jan 16, 2024

in ram, the call https://github.com/siliconcompiler/siliconcompiler/blob/1122fadb92b4ca5803b7b66f0b071f3185310ffb/siliconcompiler/tools/yosys/syn_asic.tcl#L41
saves the state (that way I can make some speculative changes and see if they are any good before committing to them)
and https://github.com/siliconcompiler/siliconcompiler/blob/1122fadb92b4ca5803b7b66f0b071f3185310ffb/siliconcompiler/tools/yosys/syn_asic.tcl#L56
loads it back into the working memory.

@oharboe oharboe force-pushed the hierarchical-minimum-synthesis-for-ppolicy branch from 6642a8a to b3a08aa Compare January 17, 2024 07:13
@oharboe
Collaborator Author

oharboe commented Jan 17, 2024

@gadfort @maliberty Single pass hierarchical synthesis. The first pass is very fast. Use number of cells as driver for the policy on whether to flatten modules or not.

There is a failure, but I wouldn't expect identical results.

I tried it on MegaBoom and it ran through rtlmp, which fails for a flattened design.

This implementation is a mix between tcl & python. The tcl interface in Yosys leaves something to be desired, so there's something to be said for exporting a .json file of everything and using Python or Tcl to examine the model, rather than try to use the yosys interface...
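As a rough illustration of the "export a .json file and examine it in Python" idea, here is a minimal sketch. It assumes the netlist was written with Yosys' `write_json`, whose output has a top-level `modules` object where each module carries a `cells` object. The function names, the `top` argument, and the rule "flatten a module when its direct cell count is at or below the limit" are assumptions for this example, not the code in this PR, and the JSON below is a hand-written stand-in rather than real design data.

```python
import json


def count_cells(netlist):
    """Return {module name: number of cells instantiated directly in it}."""
    return {
        name: len(module.get("cells", {}))
        for name, module in netlist.get("modules", {}).items()
    }


def modules_to_flatten(cell_counts, max_ungroup_size, top):
    """Hypothetical policy: flatten every non-top module at or below the limit."""
    return sorted(
        name
        for name, count in cell_counts.items()
        if name != top and count <= max_ungroup_size
    )


# Hand-written stand-in for a netlist exported by Yosys; not real design data.
EXAMPLE_JSON = """
{
  "modules": {
    "top":  {"cells": {"u_alu": {"type": "alu"}, "u_glue": {"type": "glue"}}},
    "alu":  {"cells": {"g0": {"type": "$add"}, "g1": {"type": "$mux"},
                       "g2": {"type": "$dff"}, "g3": {"type": "$and"}}},
    "glue": {"cells": {"g0": {"type": "$not"}}}
  }
}
"""

counts = count_cells(json.loads(EXAMPLE_JSON))
print(counts)
print(modules_to_flatten(counts, max_ungroup_size=2, top="top"))
```

Note that this counts only the cells directly inside each module, so an instance of a large submodule counts as a single cell.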

@oharboe
Collaborator Author

oharboe commented Jan 18, 2024

@maliberty Trying to graph the new policy vs. proper cell count after full synthesis when using cells for MAX_UNGROUP_SIZE.

Note! This only shows whether the policy is similar, it doesn't show anything about the difference in quality of results!

make DESIGN_CONFIG=designs/nangate45/bp_multi_top/config.mk synth

NOTE!!! I fixed the label: the red stars are "flattened in neither case", i.e. the policy is the same.
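For readers without the plots, the comparison can be sketched as a small classification: for each module, check whether it would be flattened under the fast estimate and under the full-synthesis count. This is a sketch under assumptions: `compare_policies`, the bucket names, and the "at or below threshold means flatten" rule are made up for this example, and the sizes are invented numbers, not measurements from bp_multi_top.

```python
def compare_policies(fast_size, full_size, threshold):
    """Bucket modules by whether each size metric would flatten them."""
    buckets = {"both": [], "neither": [], "fast_only": [], "full_only": []}
    # Only modules present in both reports can be compared.
    for name in sorted(fast_size.keys() & full_size.keys()):
        fast = fast_size[name] <= threshold
        full = full_size[name] <= threshold
        if fast and full:
            key = "both"
        elif fast:
            key = "fast_only"
        elif full:
            key = "full_only"
        else:
            key = "neither"
        buckets[key].append(name)
    return buckets


# Invented sizes, for illustration only.
fast = {"a": 10, "b": 500, "c": 90, "d": 120}
full = {"a": 12, "b": 450, "c": 130, "d": 80}
result = compare_policies(fast, full, threshold=100)
print(result)
```

Modules in `fast_only` or `full_only` are the ones where the two policies disagree.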

image

Adding to fast build:

techmap
opt -fast -full -purge

image

Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
@oharboe oharboe force-pushed the hierarchical-minimum-synthesis-for-ppolicy branch from 7f7da8c to ed40cd0 Compare January 18, 2024 08:03
Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
@oharboe oharboe force-pushed the hierarchical-minimum-synthesis-for-ppolicy branch from ed40cd0 to a465668 Compare January 18, 2024 08:05
Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
@oharboe
Collaborator Author

oharboe commented Jan 18, 2024

@maliberty Cell count and area are not correlated in an obvious way... my thinking is that the hierarchy is not taken into account. A module that has almost no cells could contain two modules that have lots of cells... In that case, which modules do you keep, and which do you flatten?
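To make the hierarchy problem concrete, here is a minimal sketch that compares the direct cell count of a module with its count after recursively expanding submodule instances. The input shape (module name mapped to a list of cell types) and the function name are assumptions for this example; any cell whose type is itself a module name is treated as a submodule instance, and the hierarchy is assumed to be acyclic.

```python
def flattened_cell_counts(modules):
    """modules: {module name: [cell type, ...]}.

    Returns {module name: cell count after expanding all submodule instances}.
    """
    cache = {}

    def total(name):
        if name not in cache:
            cache[name] = sum(
                total(cell_type) if cell_type in modules else 1
                for cell_type in modules[name]
            )
        return cache[name]

    return {name: total(name) for name in modules}


# Invented hierarchy: a thin wrapper around two instances of a large module.
design = {
    "wrapper": ["big", "big"],
    "big": ["$and"] * 1000,
}
direct = {name: len(cells) for name, cells in design.items()}
flat = flattened_cell_counts(design)
print(direct["wrapper"], flat["wrapper"])
```

With a direct count the wrapper looks tiny, while the expanded count shows it is twice the size of `big`, which is the ambiguity described above.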

I'm puzzled by the current policy. Is there a reason it was done this way or was this just the way it was implemented initially and as it turned out it gave better numbers than the triton partitioner?

image

Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
@oharboe
Collaborator Author

oharboe commented Feb 8, 2024

Using stat -tech cmos we get an estimate of the number of transistors, which correlates strongly with area after very coarse synthesis.

image

@oharboe
Collaborator Author

oharboe commented Feb 8, 2024

The policy is very nearly identical...

image

Signed-off-by: Øyvind Harboe <oyvind.harboe@zylin.com>
with open('reports/nangate45/bp_multi/base/synth_cmos.txt', 'r') as file:
    cmos = json.load(file)
transistors = {name.replace('\\', ''):
               int(module['estimated_num_transistors'].replace('+', ''))
Member

Note that the '+' means there were gates that couldn't be counted and were assigned zero.
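One way to act on this note is to keep the '+' as a flag instead of discarding it, so the caller knows the number is only a lower bound. A minimal sketch, assuming the field is a string such as "1234+" as in the snippet under review; the helper name `parse_transistor_estimate` is hypothetical.

```python
def parse_transistor_estimate(raw):
    """Return (count, is_lower_bound) for a transistor estimate string.

    A trailing '+' marks an estimate where some gates could not be counted
    and contributed zero, so the true number may be higher.
    """
    text = str(raw).strip()
    is_lower_bound = text.endswith("+")
    return int(text.rstrip("+")), is_lower_bound


print(parse_transistor_estimate("1234+"))
print(parse_transistor_estimate("56"))
```

A policy script could then warn about, or treat separately, any module whose estimate is flagged as a lower bound.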

@oharboe oharboe closed this Feb 8, 2024
@oharboe oharboe deleted the hierarchical-minimum-synthesis-for-ppolicy branch August 22, 2024 12:01
3 participants