GitHub: http://github.com/ultraembedded/biriscv
- 32-bit RISC-V ISA CPU core.
- Superscalar (dual-issue) in-order 6 or 7 stage pipeline.
- Supports the RISC-V integer (I), multiplication and division (M), and CSR instruction (Zicsr) extensions (RV32IMZicsr).
- Branch prediction (bimodal/gshare) with configurable depth branch target buffer (BTB) and return address stack (RAS).
- 64-bit instruction fetch, 32-bit data access.
- 2 x integer ALU (arithmetic, shifters and branch units).
- 1 x load store unit, 1 x out-of-pipeline divider.
- Issue and complete up to 2 independent instructions per cycle.
- Supports user, supervisor and machine mode privilege levels.
- Basic MMU support - capable of booting Linux with atomics (RV-A) SW emulation.
- Implements base ISA spec v2.1 and privileged ISA spec v1.11.
- Verified with random instruction sequences from Google's RISCV-DV, cosimulated against a C++ ISA model.
- Support for instruction / data cache, AXI bus interfaces or tightly coupled memories.
- Configurable number of pipeline stages, result forwarding options, and branch prediction resources.
- Synthesizable Verilog 2001, Verilator and FPGA friendly.
- CoreMark: 4.1 CoreMark/MHz
- Dhrystone: 1.9 DMIPS/MHz ('legal compile options' / 337 instructions per iteration)
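
As a rough sanity check on the Dhrystone figure, the two quoted numbers can be combined into an approximate instructions-per-cycle value. The sketch below assumes the conventional definition of 1 DMIPS as 1757 Dhrystone iterations per second (the VAX 11/780 baseline); the function name is illustrative and not part of the project.

```python
# Conventional Dhrystone baseline (assumption stated above):
# 1 DMIPS corresponds to 1757 iterations per second.
VAX_DHRYSTONES_PER_SEC = 1757


def dhrystone_ipc(dmips_per_mhz: float, instr_per_iter: int) -> float:
    """Estimate average instructions per cycle from a DMIPS/MHz score."""
    # Dhrystone iterations completed per second for each MHz of clock.
    iters_per_sec_per_mhz = dmips_per_mhz * VAX_DHRYSTONES_PER_SEC
    # One MHz of clock supplies 1e6 cycles per second.
    cycles_per_iter = 1_000_000 / iters_per_sec_per_mhz
    return instr_per_iter / cycles_per_iter


if __name__ == "__main__":
    # Figures quoted in the feature list: 1.9 DMIPS/MHz, 337 instructions per iteration.
    print(f"approx. IPC: {dhrystone_ipc(1.9, 337):.2f}")
```

Under that assumption the quoted figures work out to roughly 300 cycles per iteration, i.e. slightly more than one instruction per cycle on average for this benchmark.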
A sequence showing execution of 2 instructions per cycle:
- SiFive E76
  - RV32IMAFC
  - Dual issue in-order 8 stage pipeline
  - 4 ALU units (2 early, 2 late)
  - ✖️ Commercial closed source core/$$
- WD SweRV RISC-V Core EH1
  - RV32IMC
  - Dual issue in-order 9 stage pipeline
  - 4 ALU units (2 early, 2 late)
  - ✖️ System Verilog + auto signal hookup
  - ✖️ No data cache option
  - ✖️ Not able to boot Linux
- Boot Linux all the way to a functional userspace environment. ✔️
- Achieve competitive performance for this class of in-order machine (i.e. aim for 80% of WD SweRV CoreMark score). ✔️
- Reasonable PPA / FPGA resource friendly. ✔️
- Fit easily onto cheap hobbyist FPGAs (e.g. Xilinx Artix 7) without using all LUT resources and synthesize > 50MHz. ✔️
- Support various cache and TCM options. ✔️
- Be constructed using readable, maintainable and documented IEEE 1364-2001 Verilog. ✔️
- Simulate in open-source tools such as Verilator and Icarus Verilog. ✔️
- In later releases, add support for atomic extensions.
Booting the stock Linux 5.0.0-rc8 kernel built for RV32IMA to userspace on a Digilent Arty Artix 7 with biRISC-V (with atomic instructions emulated in the bootloader):
Based on my previous work:
To clone this project and its dependencies:
git clone --recursive https://github.com/ultraembedded/biriscv.git
To run a simple test image on the core RTL using Icarus Verilog:
# Install Icarus Verilog (Debian / Ubuntu / Linux Mint)
sudo apt-get install iverilog
# [or] Install Icarus Verilog (Redhat / Centos)
#sudo yum install iverilog
# Run a simple test image (test.elf)
cd tb/tb_core_icarus
make
The expected output is:
Starting bench
VCD info: dumpfile waveform.vcd opened for output.
Test:
1. Initialised data
2. Multiply
3. Divide
4. Shift left
5. Shift right
6. Shift right arithmetic
7. Signed comparision
8. Word access
9. Byte access
10. Comparision
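
One way to check a run automatically is to confirm that all ten numbered steps appear, in order, in the simulator output. The following is a minimal sketch under the assumption that the step lines look like those shown above (`N. description`); the script name `check_steps.py` and the function names are hypothetical and not part of the repository.

```python
import re
import sys

# Step numbers the test image reports in the expected output (1 to 10).
EXPECTED_STEPS = list(range(1, 11))


def reported_steps(log: str) -> list:
    """Return the step numbers found on lines such as '3. Divide', in order."""
    pattern = re.compile(r"^[ \t]*(\d+)\.\s", re.MULTILINE)
    return [int(match.group(1)) for match in pattern.finditer(log)]


def run_is_complete(log: str) -> bool:
    """True when every expected step appears exactly once and in order."""
    return reported_steps(log) == EXPECTED_STEPS


if __name__ == "__main__":
    # Hypothetical usage: make | python3 check_steps.py
    ok = run_is_complete(sys.stdin.read())
    print("all steps reported" if ok else "missing or out-of-order steps")
    sys.exit(0 if ok else 1)
```

The check only looks at the step numbers, so it would not notice a step whose description changed, and any unrelated output line that happens to start with a number and a period would cause a false failure.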
Param Name | Valid Range | Description |
---|---|---|
SUPPORT_SUPER | 1/0 | Enable supervisor / user privilege levels. |
SUPPORT_MMU | 1/0 | Enable basic memory management unit. |
SUPPORT_MULDIV | 1/0 | Enable HW multiply / divide (RV-M). |
SUPPORT_DUAL_ISSUE | 1/0 | Support superscalar operation. |
SUPPORT_LOAD_BYPASS | 1/0 | Support load result bypass paths. |
SUPPORT_MUL_BYPASS | 1/0 | Support multiply result bypass paths. |
SUPPORT_REGFILE_XILINX | 1/0 | Support Xilinx optimised register file. |
SUPPORT_BRANCH_PREDICTION | 1/0 | Enable branch prediction structures. |
NUM_BTB_ENTRIES | 2 - | Number of branch target buffer entries. |
NUM_BTB_ENTRIES_W | 1 - | Set to log2(NUM_BTB_ENTRIES). |
NUM_BHT_ENTRIES | 2 - | Number of branch history table entries. |
NUM_BHT_ENTRIES_W | 1 - | Set to log2(NUM_BHT_ENTRIES). |
BHT_ENABLE | 1/0 | Enable branch history table based prediction. |
GSHARE_ENABLE | 1/0 | Enable GSHARE branch prediction algorithm. |
RAS_ENABLE | 1/0 | Enable return address stack prediction. |
NUM_RAS_ENTRIES | 2 - | Number of return stack addresses supported. |
NUM_RAS_ENTRIES_W | 1 - | Set to log2(NUM_RAS_ENTRIES). |
EXTRA_DECODE_STAGE | 1/0 | Extra decode pipe stage for improved timing. |
MEM_CACHE_ADDR_MIN | 32'h0 - 32'hffffffff | Lowest cacheable memory address. |
MEM_CACHE_ADDR_MAX | 32'h0 - 32'hffffffff | Highest cacheable memory address. |
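
Because each `*_W` parameter has to be kept in step with its entry count by hand, a small helper can derive the widths before they are passed to the core. The sketch below assumes the table sizes are powers of two (so that log2 is an integer) and that the cacheable range is inclusive at both ends; neither assumption is stated in the table. The function names, the example sizes, and the example addresses are illustrative only.

```python
def index_width(num_entries: int) -> int:
    """Return log2(num_entries), assuming a power-of-two size of at least 2."""
    if num_entries < 2 or num_entries & (num_entries - 1):
        raise ValueError(f"{num_entries} is not a power of two >= 2")
    return num_entries.bit_length() - 1


def predictor_params(num_btb: int, num_bht: int, num_ras: int) -> dict:
    """Pair each entry count with its matching *_W width parameter."""
    params = {}
    for name, count in (("BTB", num_btb), ("BHT", num_bht), ("RAS", num_ras)):
        params[f"NUM_{name}_ENTRIES"] = count
        params[f"NUM_{name}_ENTRIES_W"] = index_width(count)
    return params


def is_cacheable(addr: int, addr_min: int, addr_max: int) -> bool:
    """Assumed semantics: MEM_CACHE_ADDR_MIN and MEM_CACHE_ADDR_MAX are inclusive."""
    return addr_min <= addr <= addr_max


if __name__ == "__main__":
    # Example sizes and addresses chosen for illustration only.
    for key, value in predictor_params(32, 512, 8).items():
        print(f"{key} = {value}")
    print(is_cacheable(0x8000_0000, 0x8000_0000, 0x8FFF_FFFF))
```

Rejecting sizes that are not powers of two is a design choice of this sketch; it catches a mismatched count and width pair early rather than leaving it to surface in simulation.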