
How is core_region's clock controlled and gated? #11

Open
retrhelo opened this issue Apr 18, 2022 · 2 comments

Comments

@retrhelo

I've recently been reading your paper "A RISC-V in-network accelerator for flexible high-performance low-power packet processing", along with the source code, and I found some mismatches between the paper and the code that are quite confusing to me.

I'm reading the source code at tag v0.6.1, with no changes to the source files. According to git diff against branch master, there are no significant changes to the hardware design in hw/, so I think it's fair to consider v0.6.1 up to date.

There are connections in hw/deps/pulp_cluster/rtl/pulp_cluster.sv, that I believe play the role of clock-gating the core_region.

// line 1031
cluster_peripherals #(
...
) cluster_peripherals_i (
...
  .core_busy_i(core_busy),
  .core_clk_en_o(clk_core_en),
...
);

// line 1155
core_region #(
...
) core_region_i (
...
  .clock_en_i(clk_core_en[i]),
...
  .core_busy_o(core_busy[i]),
...
);

It looks like this cluster_peripherals_i instance is the one controlling/clock-gating the RISC-V cores. However, the paper says:

If the HPU driver has no task/handler to execute, it stops the HPUs by clock-gating it.

But I couldn't find any connection between the HPU driver and cluster_peripherals_i in the source code, and the paper doesn't say much about this instance either. So here are my questions:

  1. In the current implementation, which module controls/clock-gates the cores, and how does that module decide when to gate them?
  2. What role does cluster_peripherals_i play in the design? I noticed that it manages "events" from the timer, DMA, etc., but how do these events and their sources fit into the design?
@SalvatoreDiGirolamo
Collaborator

SalvatoreDiGirolamo commented Apr 19, 2022

Hi!

In the current implementation, which module controls/clock-gates the cores, and how does that module decide when to gate them?

You're right. That change didn't make it to the published code. To re-introduce it, it should be enough to drive the core clock from the HPU driver (https://github.com/spcl/pspin/blob/master/hw/src/pkt_scheduler/hpu_driver.sv): i.e., add a clk_o to the HPU driver and use that as clk_i of the core (https://github.com/spcl/pspin/blob/master/hw/deps/pulp_cluster/rtl/pulp_cluster.sv#L1167). The HPU driver can use its clk_i and the condition state_q==Idle (https://github.com/spcl/pspin/blob/master/hw/src/pkt_scheduler/hpu_driver.sv#L383) to gate the clock towards the core (i.e., the newly introduced clk_o).
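A minimal sketch of this suggestion, assuming the state_q/Idle names from the linked hpu_driver.sv; the port name clk_o, the signal core_clk_en, and the use of the PULP tc_clk_gating cell are illustrative assumptions, not the actual implementation:

```systemverilog
// Hypothetical addition inside hpu_driver.sv: derive a gated clock for the core.
// state_q and Idle come from the existing HPU driver FSM; everything else here
// (clk_o, core_clk_en, the gating cell instance) is an assumed sketch.
logic core_clk_en;

// Enable the core clock whenever the driver has work (i.e., is not idle).
assign core_clk_en = (state_q != Idle);

// Use a glitch-free latch-based clock gate rather than a plain AND gate;
// tc_clk_gating is the cell commonly used for this in PULP-based designs.
tc_clk_gating i_core_clk_gate (
  .clk_i     ( clk_i       ),
  .en_i      ( core_clk_en ),
  .test_en_i ( 1'b0        ),
  .clk_o     ( clk_o       )  // drive core_region's clk_i from this new port
);
```

The latch-based cell matters here: ANDing an enable with the clock directly can produce glitches when the enable changes while the clock is high.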

Alternatively, the HPU driver could have a core_clock_en_o signal that is combined with clk_i by the core itself (maybe this is cleaner). Or just reuse clk_core_en from cluster_peripherals (the one you mentioned).
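A sketch of this alternative, under the same assumptions (core_clock_en_o and the wiring name hpu_core_clk_en are hypothetical; state_q/Idle are from the existing hpu_driver.sv FSM):

```systemverilog
// Hypothetical sketch in hpu_driver.sv: export a level-sensitive enable
// instead of a gated clock. The core's existing clock_en_i port then does
// the actual gating internally.
assign core_clock_en_o = (state_q != Idle);

// In pulp_cluster.sv, at the core_region instantiation, this would replace
// the clk_core_en[i] currently driven by cluster_peripherals_i:
//   .clock_en_i ( hpu_core_clk_en[i] ),  // now driven by the HPU driver
```

This variant avoids adding a clock-gating cell to the HPU driver and keeps all clock manipulation inside core_region, which already handles an enable input.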

I'll be happy to review a PR in case you implement this!

What role is cluster_peripherals_i playing in the design? I noticed that it manages "events" from timer, DMA and etc., but how do these events and their sources work as a part of the design?

I'm pretty sure that this unit is not used in the current design version. In the first iteration, the DMA engine communicated with the cores via events (e.g., to signal DMA completion), and thus via cluster_peripherals_i. Now the DMA engine provides a per-core interface, if I recall correctly. I think it could be safely removed, but I'd need to double-check.

@retrhelo
Author

Alternatively, HPU driver could just have a core_clock_en_o signal that is combined with clk_i by the core itself (maybe this is cleaner).

As the core_region module already has a clock_en_i input port, this solution may involve fewer changes to the current design. I'll try to implement it.

I'm pretty sure that this unit is not used in the current design version.

That's great. I'll try to remove it from the cluster for a slimmer design. I understand the DMAC is important to the cluster for tasks like moving data from L2 memory into the L1 TCDM, so I'll be careful dealing with it.

Now the DMA engine provides a per-core interface, if I recall correctly.

I couldn't find this per-core interface in the code. Where can I have a look at it? Maybe I can modify the core_demux used by core_region to add such an interface, so the cores can access the DMAC directly.

I really appreciate your reply; it helps a lot!
