[rtl] instruction prefetch buffer (IPB) improvements (#455)
stnolting authored Dec 14, 2022
2 parents 0c27b24 + 64dceda commit ba2fc97
Showing 16 changed files with 75 additions and 44 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -32,6 +32,7 @@ mimpid = 0x01040312 => Version 01.04.03.12 => v1.4.3.12

| Date (*dd.mm.yyyy*) | Version | Comment |
|:-------------------:|:-------:|:--------|
| 13.12.2022 | 1.7.8.5 | code cleanup of FIFO module; improved **instruction prefetch buffer (IPB)** - IPB depth can be as small as "1" and will be adjusted automatically when enabling the `C` ISA extension; update hardware implementation results; [#455](https://github.com/stnolting/neorv32/pull/455) |
| 09.12.2022 | 1.7.8.4 | :sparkles: new option to add custom **R5-type** (4 source registers, 1 destination register) instructions to **Custom Functions Unit (CFU)**; [#452](https://github.com/stnolting/neorv32/pull/452) |
| 08.12.2022 | 1.7.8.3 | :bug: fix interrupt behavior when in user-mode; minor core rtl fixes; do not check registers specifiers in CFU instructions (i.e. using registers above `x15` when `E` ISA extension is enabled); [#450](https://github.com/stnolting/neorv32/pull/450) |
| 03.12.2022 | 1.7.8.2 | :sparkles: new option to add custom **R4-type** RISC-V instructions to **Custom Functions Unit (CFU)**; rework CFU hardware module, intrinsic library and example program; [#449](https://github.com/stnolting/neorv32/pull/449) |
6 changes: 3 additions & 3 deletions README.md
@@ -200,10 +200,10 @@ for custom tightly-coupled co-processors, accelerators or interfaces
Implementation results for **exemplary CPU configurations** generated for an Intel Cyclone IV `EP4CE22F17C6` FPGA
using Intel Quartus Prime Lite 21.1 (no timing constraints, _balanced optimization_, f_max from _Slow 1200mV 0C Model_).

| CPU Configuration (version [1.7.7.8](https://github.com/stnolting/neorv32/blob/main/CHANGELOG.md)) | LEs | FFs | Memory bits | DSPs | f_max |
| CPU Configuration (version [1.7.8.5](https://github.com/stnolting/neorv32/blob/main/CHANGELOG.md)) | LEs | FFs | Memory bits | DSPs | f_max |
|:-----------------------|:----:|:----:|:----:|:-:|:-------:|
| `rv32i_Zicsr` | 1328 | 678 | 1024 | 0 | 130 MHz |
| `rv32i_Zicsr_Zicntr` | 1614 | 808 | 1024 | 0 | 130 MHz |
| `rv32i_Zicsr` | 1223 | 607 | 1024 | 0 | 130 MHz |
| `rv32i_Zicsr_Zicntr` | 1578 | 773 | 1024 | 0 | 130 MHz |
| `rv32imc_Zicsr_Zicntr` | 2338 | 992 | 1024 | 0 | 130 MHz |

Implementation results for an **exemplary SoC/Processor configurations** generated for a Xilinx Artix-7 `xc7a35ticsg324-1L` FPGA
10 changes: 5 additions & 5 deletions docs/datasheet/overview.adoc
@@ -250,7 +250,7 @@ just _exemplary_. If not otherwise mentioned all implementations use the default
[cols="<2,<8"]
[grid="topbot"]
|=======================
| HW version: | `1.7.7.8`
| HW version: | `1.7.8.5`
| Top entity: | `rtl/core/neorv32_cpu.vhd`
| FPGA: | Intel Cyclone IV E `EP4CE22F17C6`
| Toolchain: | Quartus Prime Lite 21.1
@@ -261,10 +261,10 @@ just _exemplary_. If not otherwise mentioned all implementations use the default
[options="header",grid="rows"]
|=======================
| CPU ISA Configuration | LEs | FFs | MEM bits | DSPs | _f~max~_
| `rv32e` | 830 | 400 | 512 | 0 | 130 MHz
| `rv32i` | 834 | 400 | 1024 | 0 | 130 MHz
| `rv32i_Zicsr` | 1328 | 678 | 1024 | 0 | 130 MHz
| `rv32i_Zicsr_Zicntr` | 1614 | 808 | 1024 | 0 | 130 MHz
| `rv32e` | 720 | 360 | 512 | 0 | 130 MHz
| `rv32i` | 724 | 364 | 1024 | 0 | 130 MHz
| `rv32i_Zicsr` | 1223 | 607 | 1024 | 0 | 130 MHz
| `rv32i_Zicsr_Zicntr` | 1578 | 773 | 1024 | 0 | 130 MHz
| `rv32im_Zicsr_Zicntr` | 2087 | 983 | 1024 | 0 | 130 MHz
| `rv32imc_Zicsr_Zicntr` | 2338 | 992 | 1024 | 0 | 130 MHz
| `rv32imcb_Zicsr_Zicntr` | 3175 | 1247 | 1024 | 0 | 130 MHz
12 changes: 9 additions & 3 deletions docs/datasheet/soc.adoc
@@ -454,12 +454,18 @@ The state of this generic can be retrieved by software via the <<_mxisa>> CSR.
[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_IPB_ENTRIES** | _natural_ | 2
| **CPU_IPB_ENTRIES** | _natural_ | 1
3+| This generic configures the number of entries in the CPU's instruction prefetch buffer.
The value has to be a power of two and has to be greater than or equal to two (>= 2).
Long linear sequences of code can benefit from an increased IPB size.
The value has to be a power of two and has to be greater than or equal to one (>= 1). The
IPB can help improve memory access latency. Furthermore, long linear code sequences will
benefit from an increased IPB size.
|======

[WARNING]
If the compressed ISA extension `_CPU_EXTENSION_RISCV_C_` (<<_cpu_extension_riscv_c>>) is enabled and the IPB depth
is set to 1, this configuration is internally overridden and the IPB will be implemented with **2** entries. This is required
for handling unaligned 32-bit instructions.


// ####################################################################################################################
:sectnums:
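
The override rule from the warning above can be sketched as a small self-checking testbench. This is a minimal sketch under stated assumptions, not the project's code: the helper `effective_ipb_depth_f` and the testbench name `ipb_depth_sketch_tb` are hypothetical and exist only for illustration. The commit itself derives the depth as the constant `ipb_depth_c` in `rtl/core/neorv32_cpu.vhd`.

```vhdl
-- Self-checking sketch of the IPB depth rule; all names here are hypothetical.
entity ipb_depth_sketch_tb is
end entity ipb_depth_sketch_tb;

architecture sim of ipb_depth_sketch_tb is

  -- Hypothetical helper (not part of neorv32_package): returns the number of
  -- IPB entries that would actually be implemented for a given configuration.
  function effective_ipb_depth_f(c_ext_en : boolean; ipb_entries : natural) return natural is
  begin
    if c_ext_en and (ipb_entries < 2) then
      return 2; -- C extension enabled: at least two entries are required
    else
      return ipb_entries; -- configured value is used unchanged
    end if;
  end function effective_ipb_depth_f;

begin

  check: process
  begin
    assert effective_ipb_depth_f(false, 1) = 1 report "no C, depth 1: expected 1" severity failure;
    assert effective_ipb_depth_f(true,  1) = 2 report "C, depth 1: expected 2"    severity failure;
    assert effective_ipb_depth_f(true,  4) = 4 report "C, depth 4: expected 4"    severity failure;
    report "IPB depth sketch: all checks passed" severity note;
    wait; -- suspend the process forever
  end process check;

end architecture sim;
```

Only the combination "`C` enabled and depth below 2" changes the configured value; every other setting passes through as given.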
5 changes: 2 additions & 3 deletions docs/userguide/application_specific_configuration.adoc
@@ -24,8 +24,7 @@ multiplications, `FAST_SHIFT_EN => true` use a fast barrel shifter for shift ope
* Use as much _internal_ memory as possible to reduce memory access latency: `MEM_INT_IMEM_EN => true` and
`MEM_INT_DMEM_EN => true`, maximize `MEM_INT_IMEM_SIZE` and `MEM_INT_DMEM_SIZE`
* Increase the CPU's instruction prefetch buffer size: if **no** instruction cache is implemented `CPU_IPB_ENTRIES` should be
quite large (recommended value is >= 8); if the instruction cache is implemented `CPU_IPB_ENTRIES` values above 4 are
rather inefficient
quite large
* _To be continued..._


@@ -55,7 +54,7 @@ also reduces program code size by approximately 30%.
* If not explicitly used/required, exclude the CPU standard counters `[m]instret[h]`
(number of instruction) and `[m]cycle[h]` (number of cycles) from synthesis by disabling the `Zicntr` ISA extension
(note, this is not RISC-V compliant).
* Reduce the CPU's prefetch buffer size (`CPU_IPB_ENTRIES`).
* Reduce the CPU's prefetch buffer size (`CPU_IPB_ENTRIES`) to its minimum (=1).
* Map CPU shift operations to a small and iterative shifter unit (`FAST_SHIFT_EN => false`).
* If you have unused DSP block available, you can map multiplication operations to those slices instead of
using LUTs to implement the multiplier (`FAST_MUL_EN => true`).
14 changes: 9 additions & 5 deletions rtl/core/neorv32_cpu.vhd
@@ -67,7 +67,7 @@ entity neorv32_cpu is
-- Extension Options --
FAST_MUL_EN : boolean; -- use DSPs for M extension's multiplier
FAST_SHIFT_EN : boolean; -- use barrel shifter for shift operations
CPU_IPB_ENTRIES : natural; -- entries in instruction prefetch buffer, has to be a power of 2, min 2
CPU_IPB_ENTRIES : natural; -- entries in instruction prefetch buffer, has to be a power of 2, min 1
-- Physical Memory Protection (PMP) --
PMP_NUM_REGIONS : natural; -- number of regions (0..16)
PMP_MIN_GRANULARITY : natural; -- minimal region granularity in bytes, has to be a power of 2, min 4 bytes
@@ -120,10 +120,14 @@ architecture neorv32_cpu_rtl of neorv32_cpu is
constant XLEN : natural := 32; -- data path width
-- ----------------------------------------------------------------------------------------------

-- local constants --
-- local constants: additional register file read ports --
constant regfile_rs3_en_c : boolean := CPU_EXTENSION_RISCV_Zxcfu or CPU_EXTENSION_RISCV_Zfinx; -- 3rd register file read port (rs3)
constant regfile_rs4_en_c : boolean := CPU_EXTENSION_RISCV_Zxcfu; -- 4th register file read port (rs4)

-- local constant: instruction prefetch buffer depth --
constant ipb_override_c : boolean := (CPU_EXTENSION_RISCV_C = true) and (CPU_IPB_ENTRIES < 2); -- override IPB size: set to 2?
constant ipb_depth_c : natural := cond_sel_natural_f(ipb_override_c, 2, CPU_IPB_ENTRIES);

-- local signals --
signal ctrl : std_ulogic_vector(ctrl_width_c-1 downto 0); -- main control bus
signal imm : std_ulogic_vector(XLEN-1 downto 0); -- immediate
@@ -206,8 +210,8 @@ begin
-- Instruction prefetch buffer --
assert not (is_power_of_two_f(CPU_IPB_ENTRIES) = false) report
"NEORV32 CPU CONFIG ERROR! Number of entries in instruction prefetch buffer <CPU_IPB_ENTRIES> has to be a power of two." severity error;
assert not (CPU_IPB_ENTRIES < 2) report
"NEORV32 CPU CONFIG ERROR! Number of entries in instruction prefetch buffer <CPU_IPB_ENTRIES> has to be >= 2." severity error;
assert not (ipb_override_c = true) report
"NEORV32 CPU CONFIG WARNING! Overriding <CPU_IPB_ENTRIES> configuration (setting =2) because C ISA extension is enabled." severity warning;

-- PMP --
assert not (PMP_NUM_REGIONS > 0) report
@@ -276,7 +280,7 @@ begin
-- Tuning Options --
FAST_MUL_EN => FAST_MUL_EN, -- use DSPs for M extension's multiplier
FAST_SHIFT_EN => FAST_SHIFT_EN, -- use barrel shifter for shift operations
CPU_IPB_ENTRIES => CPU_IPB_ENTRIES, -- entries is instruction prefetch buffer, has to be a power of 2, min 2
CPU_IPB_ENTRIES => ipb_depth_c, -- entries is instruction prefetch buffer, has to be a power of 2, min 1
-- Physical memory protection (PMP) --
PMP_NUM_REGIONS => PMP_NUM_REGIONS, -- number of regions (0..16)
PMP_MIN_GRANULARITY => PMP_MIN_GRANULARITY, -- minimal region granularity in bytes, has to be a power of 2, min 4 bytes
2 changes: 1 addition & 1 deletion rtl/core/neorv32_cpu_control.vhd
@@ -73,7 +73,7 @@ entity neorv32_cpu_control is
-- Tuning Options --
FAST_MUL_EN : boolean; -- use DSPs for M extension's multiplier
FAST_SHIFT_EN : boolean; -- use barrel shifter for shift operations
CPU_IPB_ENTRIES : natural; -- entries in instruction prefetch buffer, has to be a power of 2, min 2
CPU_IPB_ENTRIES : natural; -- entries in instruction prefetch buffer, has to be a power of 2, min 1
-- Physical memory protection (PMP) --
PMP_NUM_REGIONS : natural; -- number of regions (0..16)
PMP_MIN_GRANULARITY : natural; -- minimal region granularity in bytes, has to be a power of 2, min 4 bytes
46 changes: 32 additions & 14 deletions rtl/core/neorv32_fifo.vhd
@@ -140,30 +140,48 @@ begin
fifo_half_level_simple:
if (FIFO_DEPTH = 1) generate
half_o <= fifo.full;
end generate;
end generate; -- /fifo_half_level_simple

fifo_half_level_complex:
if (FIFO_DEPTH > 1) generate
level_diff <= std_ulogic_vector(unsigned(fifo.w_pnt) - unsigned(fifo.r_pnt));
half_o <= level_diff(level_diff'left-1) or fifo.full;
end generate;
end generate; -- /fifo_half_level_complex


-- FIFO Memory ----------------------------------------------------------------------------
-- FIFO Memory - Write --------------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
fifo_write: process(clk_i)
begin
if rising_edge(clk_i) then
if (fifo.we = '1') then
if (FIFO_DEPTH = 1) then
fifo.buf <= wdata_i;
else
-- "real" FIFO memory (several entries) --
fifo_memory:
if (FIFO_DEPTH > 1) generate
fifo_write: process(clk_i)
begin
if rising_edge(clk_i) then
if (fifo.we = '1') then
fifo.data(to_integer(unsigned(fifo.w_pnt(fifo.w_pnt'left-1 downto 0)))) <= wdata_i;
end if;
end if;
end if;
end process fifo_write;
end process fifo_write;
fifo.buf <= (others => '0'); -- unused
end generate; -- /fifo_memory

-- simple register/buffer (single entry) --
fifo_buffer:
if (FIFO_DEPTH = 1) generate
fifo_write: process(clk_i)
begin
if rising_edge(clk_i) then
if (fifo.we = '1') then
fifo.buf <= wdata_i;
end if;
end if;
end process fifo_write;
fifo.data <= (others => (others => '0')); -- unused
end generate; -- /fifo_buffer


-- FIFO Memory - Read ---------------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
-- "asynchronous" read --
fifo_read_async:
if (FIFO_RSYNC = false) generate
@@ -175,7 +193,7 @@ begin
rdata <= fifo.data(to_integer(unsigned(fifo.r_pnt(fifo.r_pnt'left-1 downto 0))));
end if;
end process fifo_read;
end generate;
end generate; -- /fifo_read_async

-- synchronous read --
fifo_read_sync:
@@ -190,7 +208,7 @@ begin
end if;
end if;
end process fifo_read;
end generate;
end generate; -- /fifo_read_sync


-- Output Gate ----------------------------------------------------------------------------
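
The half-full flag in `fifo_half_level_complex` above depends on read and write pointers that carry one extra wrap bit on top of the memory address bits. The following is a minimal sketch of that arithmetic, assuming a FIFO depth of 4 and assuming that "full" corresponds to a fill level equal to the depth (the `fifo.full` logic is not part of this diff). The function `half_full_f` and the testbench `fifo_half_sketch_tb` are hypothetical names used only for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Self-checking sketch of the half-full computation; all names are hypothetical.
entity fifo_half_sketch_tb is
end entity fifo_half_sketch_tb;

architecture sim of fifo_half_sketch_tb is

  constant depth_c : natural := 4; -- assumed FIFO depth (power of two, > 1)
  constant pnt_w_c : natural := 3; -- log2(depth_c) address bits + 1 wrap bit

  -- Pointer difference = fill level (modulo 2*depth). The bit below the MSB is
  -- set for levels in [depth/2, depth-1]; the MSB is set when the level equals
  -- the depth (assumed here to be the "full" condition).
  function half_full_f(w_pnt, r_pnt : unsigned(pnt_w_c-1 downto 0)) return boolean is
    variable diff_v : unsigned(pnt_w_c-1 downto 0);
  begin
    diff_v := w_pnt - r_pnt; -- modular subtraction, wraps like the hardware
    return (diff_v(diff_v'left-1) = '1') or (diff_v(diff_v'left) = '1');
  end function half_full_f;

begin

  check: process
    variable r_v, w_v : unsigned(pnt_w_c-1 downto 0);
  begin
    -- sweep every read pointer position and every legal fill level
    for r in 0 to 2*depth_c-1 loop
      for level in 0 to depth_c loop
        r_v := to_unsigned(r, pnt_w_c);
        w_v := to_unsigned((r + level) mod (2*depth_c), pnt_w_c);
        assert half_full_f(w_v, r_v) = (level >= depth_c/2)
          report "half-full mismatch" severity failure;
      end loop;
    end loop;
    report "FIFO half-full sketch: all checks passed" severity note;
    wait; -- suspend the process forever
  end process check;

end architecture sim;
```

With a depth of 1 there is no bit below the MSB to inspect, which is consistent with the separate `fifo_half_level_simple` branch that ties `half_o` directly to the full flag.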
8 changes: 4 additions & 4 deletions rtl/core/neorv32_package.vhd
@@ -62,7 +62,7 @@ package neorv32_package is

-- Architecture Constants (do not modify!) ------------------------------------------------
-- -------------------------------------------------------------------------------------------
constant hw_version_c : std_ulogic_vector(31 downto 0) := x"01070804"; -- NEORV32 version - no touchy!
constant hw_version_c : std_ulogic_vector(31 downto 0) := x"01070805"; -- NEORV32 version - no touchy!
constant archid_c : natural := 19; -- official RISC-V architecture ID - hands off!

-- Check if we're inside the Matrix -------------------------------------------------------
@@ -1007,7 +1007,7 @@ package neorv32_package is
-- Tuning Options --
FAST_MUL_EN : boolean := false; -- use DSPs for M extension's multiplier
FAST_SHIFT_EN : boolean := false; -- use barrel shifter for shift operations
CPU_IPB_ENTRIES : natural := 2; -- entries in instruction prefetch buffer, has to be a power of 2, min 2
CPU_IPB_ENTRIES : natural := 1; -- entries in instruction prefetch buffer, has to be a power of 2, min 1
-- Physical Memory Protection (PMP) --
PMP_NUM_REGIONS : natural := 0; -- number of regions (0..16)
PMP_MIN_GRANULARITY : natural := 4; -- minimal region granularity in bytes, has to be a power of 2, min 4 bytes
@@ -1173,7 +1173,7 @@ package neorv32_package is
-- Tuning Options --
FAST_MUL_EN : boolean; -- use DSPs for M extension's multiplier
FAST_SHIFT_EN : boolean; -- use barrel shifter for shift operations
CPU_IPB_ENTRIES : natural; -- entries in instruction prefetch buffer, has to be a power of 2, min 2
CPU_IPB_ENTRIES : natural; -- entries in instruction prefetch buffer, has to be a power of 2, min 1
-- Physical Memory Protection (PMP) --
PMP_NUM_REGIONS : natural; -- number of regions (0..16)
PMP_MIN_GRANULARITY : natural; -- minimal region granularity in bytes, has to be a power of 2, min 4 bytes
@@ -1245,7 +1245,7 @@ package neorv32_package is
-- Extension Options --
FAST_MUL_EN : boolean; -- use DSPs for M extension's multiplier
FAST_SHIFT_EN : boolean; -- use barrel shifter for shift operations
CPU_IPB_ENTRIES : natural; -- entries is instruction prefetch buffer, has to be a power of 2, min 2
CPU_IPB_ENTRIES : natural; -- entries is instruction prefetch buffer, has to be a power of 2, min 1
-- Physical memory protection (PMP) --
PMP_NUM_REGIONS : natural; -- number of regions (0..16)
PMP_MIN_GRANULARITY : natural; -- minimal region granularity in bytes, has to be a power of 2, min 4 bytes
4 changes: 2 additions & 2 deletions rtl/core/neorv32_top.vhd
@@ -72,7 +72,7 @@ entity neorv32_top is
-- Tuning Options --
FAST_MUL_EN : boolean := false; -- use DSPs for M extension's multiplier
FAST_SHIFT_EN : boolean := false; -- use barrel shifter for shift operations
CPU_IPB_ENTRIES : natural := 2; -- entries in instruction prefetch buffer, has to be a power of 2, min 2
CPU_IPB_ENTRIES : natural := 1; -- entries in instruction prefetch buffer, has to be a power of 2, min 1

-- Physical Memory Protection (PMP) --
PMP_NUM_REGIONS : natural := 0; -- number of regions (0..16)
@@ -564,7 +564,7 @@ begin
-- Extension Options --
FAST_MUL_EN => FAST_MUL_EN, -- use DSPs for M extension's multiplier
FAST_SHIFT_EN => FAST_SHIFT_EN, -- use barrel shifter for shift operations
CPU_IPB_ENTRIES => CPU_IPB_ENTRIES, -- entries is instruction prefetch buffer, has to be a power of 2
CPU_IPB_ENTRIES => CPU_IPB_ENTRIES, -- entries in instruction prefetch buffer, has to be a power of 2, min 1
-- Physical Memory Protection (PMP) --
PMP_NUM_REGIONS => PMP_NUM_REGIONS, -- number of regions (0..16)
PMP_MIN_GRANULARITY => PMP_MIN_GRANULARITY, -- minimal region granularity in bytes, has to be a power of 2, min 4 bytes
2 changes: 2 additions & 0 deletions rtl/system_integration/neorv32_ProcessorTop_stdlogic.vhd
@@ -64,6 +64,7 @@ entity neorv32_ProcessorTop_stdlogic is
-- Extension Options --
FAST_MUL_EN : boolean := false; -- use DSPs for M extension's multiplier
FAST_SHIFT_EN : boolean := false; -- use barrel shifter for shift operations
CPU_IPB_ENTRIES : natural := 1; -- entries in instruction prefetch buffer, has to be a power of 2, min 1
-- Physical Memory Protection (PMP) --
PMP_NUM_REGIONS : natural := 0; -- number of regions (0..16)
PMP_MIN_GRANULARITY : natural := 4; -- minimal region granularity in bytes, has to be a power of 2, min 4 bytes
@@ -311,6 +312,7 @@ begin
-- Extension Options --
FAST_MUL_EN => FAST_MUL_EN, -- use DSPs for M extension's multiplier
FAST_SHIFT_EN => FAST_SHIFT_EN, -- use barrel shifter for shift operations
CPU_IPB_ENTRIES => CPU_IPB_ENTRIES, -- entries in instruction prefetch buffer, has to be a power of 2, min 1
-- Physical Memory Protection (PMP) --
PMP_NUM_REGIONS => PMP_NUM_REGIONS, -- number of regions (0..16)
PMP_MIN_GRANULARITY => PMP_MIN_GRANULARITY, -- minimal region granularity in bytes, has to be a power of 2, min 4 bytes
2 changes: 1 addition & 1 deletion rtl/system_integration/neorv32_SystemTop_AvalonMM.vhd
@@ -70,7 +70,7 @@ entity neorv32_top_avalonmm is
-- Extension Options --
FAST_MUL_EN : boolean := false; -- use DSPs for M extension's multiplier
FAST_SHIFT_EN : boolean := false; -- use barrel shifter for shift operations
CPU_IPB_ENTRIES : natural := 2; -- entries is instruction prefetch buffer, has to be a power of 2
CPU_IPB_ENTRIES : natural := 1; -- entries in instruction prefetch buffer, has to be a power of 2, min 1

-- Physical Memory Protection (PMP) --
PMP_NUM_REGIONS : natural := 0; -- number of regions (0..16)
1 change: 1 addition & 0 deletions rtl/system_integration/neorv32_SystemTop_axi4lite.vhd
@@ -309,6 +309,7 @@ begin
-- Extension Options --
FAST_MUL_EN => FAST_MUL_EN, -- use DSPs for M extension's multiplier
FAST_SHIFT_EN => FAST_SHIFT_EN, -- use barrel shifter for shift operations
CPU_IPB_ENTRIES => 2, -- entries is instruction prefetch buffer, has to be a power of 2, min 1
-- Physical Memory Protection (PMP) --
PMP_NUM_REGIONS => PMP_NUM_REGIONS, -- number of regions (0..16)
PMP_MIN_GRANULARITY => PMP_MIN_GRANULARITY, -- minimal region granularity in bytes, has to be a power of 2, min 4 bytes