Welcome to my GitHub repository dedicated to VLSI Physical Design for ASICs using open-source tools! Here, we embark on a journey that starts with processor specifications and leverages the power of the RISC-V ISA. We'll build processors from scratch and take them through the entire RTL-to-GDS flow while meeting Performance, Power, Area (PPA) and manufacturability requirements. The best part? We're doing it all with open-source tools, including the RISC-V toolchain, OpenLane, and many more.
-
Week 1: Build a RISC-V processor
-
Week 2: RTL Design using Verilog with SKY130 Technology
-
Week 3 and 4: Advanced Physical Design ASIC Flow using OpenLane
-
Week 5: Efabless Caravel Tapeout
- OS: Ubuntu 20 +
- Memory: 200 GB
- RAM: 6 GB
- OS: Ubuntu 22.04.3 LTS x86_64
- Host: Nitro AN515-44 V1.04
- Kernel: 6.2.0-26-generic
- Shell: bash 5.1.16
- DE: GNOME 42.9
- Terminal: gnome-terminal
- CPU: AMD Ryzen 5 4600H with Radeon Graphics (12)
- GPU: AMD ATI 05:00.0 Renoir
- GPU: NVIDIA 01:00.0 NVIDIA Corporation TU117M
- Memory: 750 GB
Week 1:
- RISC-V GNU Toolchain: A comprehensive set of tools for compiling and building software to run on RISC-V processors.
- RISC-V ISA Simulator: A RISC-V simulator used for functional verification and testing of RISC-V code without needing actual hardware.
- RISC-V Proxy Kernel: The RISC-V Proxy Kernel (pk) is a lightweight execution environment for running user-level applications on RISC-V processors.
- Iverilog: Iverilog is an open-source Verilog simulation and synthesis tool used for designing and verifying digital circuits written in Verilog.
Week 2:
- Yosys: Yosys is an open-source RTL synthesis tool used to convert digital designs written in HDLs like Verilog into netlists for FPGA or ASIC implementation.
- Iverilog: Iverilog is an open-source Verilog simulation and synthesis tool used for designing and verifying digital circuits written in Verilog.
Week 3 & 4:
- OpenLane: OpenLane is an open-source automated toolchain for designing ASICs from RTL to GDSII layout. It streamlines the ASIC design process by integrating various open-source tools, allowing for efficient chip development and tape-out without human intervention.
Week 5
- Caravel Efabless Github Build: Caravel is a standard SoC harness with on-chip resources to control and read/write operations from a user-dedicated space
Build toolchain and pre-requisite :
sudo apt update
chmod +x install_tools.sh
./install_tools.sh
chmod +x install_yosys.sh
./install_yosys.sh
Errors regarding tools installation are resolved in Resolve Errors Guide
Week 1: Build a RISC-V processor
DAY 1: Introduction to RISC-V ISA and GNU Compiler Toolchain
-
Apps: Application software, often referred to as "apps," performs specific tasks or functions for end-users.
-
System Software: This category acts as an intermediary between hardware components and user-facing applications. It provides essential services, manages resources, and enables application execution.
-
Operating System: The fundamental software managing hardware resources and offering services for users and applications. It controls memory, processes, files, and interfaces (e.g., Windows, macOS, Linux, Android).
-
Compiler: Translates high-level programming code (C, C++, Java, etc.) into assembly-level language.
-
Assembler: Converts assembly language code into machine code (e.g., 10101011100) for direct processor execution.
-
RTL (Register Transfer Level): Represents digital circuit behaviour using registers and data transfer operations.
-
Hardware: Physical components of a computer system or electronic device enabling various tasks.
RISC-V is an open-standard Instruction Set Architecture (ISA) that has gained significant attention and adoption in computer architecture and semiconductor design. RISC architectures keep the instruction set small and regular, so that most instructions are simple enough to execute in a single clock cycle on a basic implementation, which simplifies the hardware and helps instruction throughput.
- R-Type: Register-type instructions, involving operations between registers. Example: add, and, or.
- I-Type: Immediate-type instructions, using immediate values for operations. Example: addi, ori, lw.
- S-Type: Store-type instructions, storing data from a register to memory. Example: sw, sb.
- B-Type: Branch-type instructions, conditional branching based on comparisons. Example: beq, bne, blt.
- U-Type: Upper immediate-type instructions, used for large immediate values. Example: lui, auipc.
- J-Type: Jump-type instructions, unconditional jumps within the program. Example: jal. (jalr also performs an unconditional jump, but it is encoded as an I-type instruction.)
In addition to the base instructions, there are pseudo-instructions such as li and mv (assembler shorthands that expand to base instructions), Multiply Extension Instructions (mul, mulh, mulhu, and mulhsu), Single and Double Precision Floating Point Extensions, and so on. The extensions help improve execution speed for programs that use those operations.
The main objective of this lab is to compile simple C code using the gcc compiler and run it on native hardware. Similarly, the goal is to compile the same code using riscv64-unknown-elf-gcc, execute it on a RISC-V core within a simulator, and understand the process involved. The ultimate goal is to ensure that any high-level program written can be successfully executed on our hardware platform.
A simple C program to find the sum from 1 to N :
#include <stdio.h>
int main() {
    int sum=0 , n=5;
    for (int i=0;i<=n;++i)
    {
        sum = sum+i;
    }
    printf("The sum of numbers from 1 to %d is %d\n",n,sum);
    return 0;
}
execution command :
gcc sum_1_n.c -o sum_1_n.o
./sum_1_n.o
output :
compile the same using RISC-V compiler and view the output
riscv64-unknown-elf-gcc -O1 -mabi=lp64 -march=rv64i -o sum_obj.o sum_1_n.c
spike pk sum_obj.o
Additional info :
- -O1: This flag enables a basic level of optimization. It balances code size and execution speed while maintaining reasonable compilation times.
- -mabi=lp64: This flag defines the ABI (Application Binary Interface) with 64-bit pointers and long integers. It's a common choice for 64-bit RISC-V systems.
- -march=rv64i: This flag specifies the target architecture as the base integer-only RISC-V architecture for 64-bit systems. It focuses on the fundamental integer instructions.
To see the RISC-V disassembled code :
riscv64-unknown-elf-objdump -d sum_obj.o
To disassemble the object file and view its contents, use the following command:
riscv64-unknown-elf-objdump -d sum_obj.o | less
To navigate through less, use :
- Type /pattern to search for a specific pattern (for example, /main).
- Press ENTER to begin the search.
- To find the next occurrence, press n.
- To search for the previous occurrence, press N.
- To exit the less viewer, press q.
-O1 optimised main
Here we see that main consists of 15 instructions. Now let us compile the code using -Ofast and look at the number of instructions again.
riscv64-unknown-elf-gcc -Ofast -mabi=lp64 -march=rv64i -o sum_obj.o sum_1_n.c
-Ofast optimised main
Here we can see that main now consists of only 12 instructions, which is due to the optimisation we applied.
spike -d pk sum_obj.o
Unsigned numbers, also known as non-negative numbers, are numerical values that represent magnitudes without indicating direction or sign. Range : [0, (2^n)-1]
Signed numbers are numerical values that can represent both positive and negative magnitudes, along with zero. In two's complement with n bits, the range is : Positive : [0, 2^(n-1)-1] Negative : [-2^(n-1), -1]
let us run this C code to determine the range of integer types supported by RISC-V
#include <stdio.h>
#include <limits.h>
int main() {
    // Declare variables to hold the values
    unsigned long long int a;
    long long int b_max, b_min;
    // The <limits.h> constants are exact; pow() returns a double, which cannot represent 2^64 - 1
    // Assign the maximum value of a 64-bit unsigned number (2^64 - 1)
    a = ULLONG_MAX;
    // Assign the maximum value of a 64-bit signed number (2^63 - 1)
    b_max = LLONG_MAX;
    // Assign the minimum value of a 64-bit signed number (-2^63)
    b_min = LLONG_MIN;
    // Print the values
    printf("The max value of 64 bit unsigned number is %llu\n The max number of 64 bit signed number is %lld\n The min value of 64 bit signed number is %lld\n",a,b_max,b_min);
    return 0;
}
Output of code snippet :
We can play around with different data types to find their respective max and min values.
Day 2: Introduction to ABI and Basic Verification Flow
In Day 2 of your course, you will understand the RISC-V instruction set architecture (ISA) by exploring the various fields of RISC-V instructions and their functions. This knowledge is crucial for gaining a comprehensive understanding of how RISC-V processors execute instructions and how programs are executed at the hardware level.
R-Type: Operate on registers with a fixed operand format. Examples: ADD, SUB, AND, OR, XOR, SLL, SRL, SRA, SLT, SLTU
I-Type: Immediate operand and one register operand. Examples: ADDI, SLTI, XORI, LB, LH, LW, JALR
S-Type: Store values from registers to memory. Examples: SB, SH, SW
B-Type: Conditional branching based on comparisons. Examples: BEQ, BNE, BLT, BGE, BLTU, BGEU
U-Type: Larger immediate field for encoding larger constants. Examples: LUI, AUIPC
J-Type: Unconditional jumps and function calls. Example: JAL
- Opcode [7]: Indicates the operation type (arithmetic, logic, memory access, control flow) for the instruction, guiding the CPU's execution.
- rd (Destination Register) [5]: Represents the destination register, where the operation result will be stored after execution.
- rs1 (Source Register 1) [5]: Represents the first source register, holding the value used in the operation (typically the first operand).
- rs2 (Source Register 2) [5]: Represents the second source register, holding the value used in the operation (typically second operand).
- funct7 and funct3 (Function Fields) [7] [3]: Further specify the exact operation within an opcode category, enabling more instruction variations.
- imm (Immediate Value): Represents an embedded immediate constant within the instruction, used for offsets, constants, or data values.
In the context of computer architecture and programming, ABI stands for Application Binary Interface. It's a set of conventions and rules that dictate how different parts of a software system interact with each other at the binary level. The ABI defines details such as:
-
Calling Conventions: Specifies how function calls handle parameters and pass data, including the order of arguments, used registers, and stack frame management.
-
Register Usage: Defines how registers are allocated for passing parameters, returning values, and other purposes.
-
Data Alignment: Establishes rules for aligning data structures in memory to enhance access efficiency.
-
Stack Frame Layout: Determines how the stack is structured during function calls, managing local variable storage.
-
System Calls: Describes how applications request services from the operating system through system calls.
-
Exception Handling: Outlines how the system manages exceptions like hardware interrupts or software errors.
Data can be stored in the register by two methods :
- Directly store in registers
- Store into registers from memory
What sets RISC (Reduced Instruction Set Computer) architecture apart from CISC (Complex Instruction Set Computer) is its emphasis on simplicity and efficiency, particularly regarding memory operations.
In RISC, the load (L) and store (S) instructions play a fundamental role in memory access. They are used to efficiently transfer data between registers and memory. Additionally, arithmetic or logic operations often use register-to-register (reg-to-reg) instructions like ADD.
Consider adding two numbers from memory and storing the result back in memory:
LW R1, 0(R2) ; Load data from memory into register R1
LW R3, 4(R2) ; Load another data from memory into register R3
ADD R4, R1, R3 ; Add data in registers R1 and R3, store result in R4
SW R4, 8(R2) ; Store the result in R4 back into memory
In a little-endian system, the least significant byte (LSB) is stored at the lowest memory address, and the most significant byte (MSB) is stored at the highest memory address. For example, the 32-bit value 0x12345678 is stored as:
Memory Address: 0 1 2 3
Stored Value: 78 56 34 12
In a big-endian system, the most significant byte (MSB) is stored at the lowest memory address, and the least significant byte (LSB) is stored at the highest memory address. The same value 0x12345678 is stored as:
Memory Address: 0 1 2 3
Stored Value: 12 34 56 78
This is an interesting lab where we combine C code with assembly code. The C code calls a function, written in assembly, that computes the sum; the C code then displays the result.
The algorithm will look like this :
C code snippet : custom_call.c
#include <stdio.h>
extern int load(int x, int y); // Declare the external "load" function
int main() {
int result = 0; // Initialize the result variable
int count = 9; // Initialize the count variable
result = load(0x0, count+1); // Call the "load" function with arguments
printf("Sum of numbers from 1 to 9 is %d\n", result); // Print the result
return 0; // Return 0 to indicate successful execution
}
ASM code snippet : load.s
.section .text # Text section where the code resides
.global load # Declare the function "load" as global
.type load, @function # Define the type of "load" as a function
load: # Start of the "load" function
# Initialize a4 with the value of a0 (copy value from a0 to a4)
add a4, a0, zero
# Set a2 = a0 + a1 (loop upper bound; equals a1 here because a0 is 0)
add a2, a0, a1
# Initialize a3 with the value of a0 (copy value from a0 to a3)
add a3, a0, zero
loop: # Label for the loop
# Add the value in a3 to a4 (accumulate)
add a4, a3, a4
# Increment the value in a3 by 1
addi a3, a3, 1
# Compare a3 with a2 (comparison for loop termination)
blt a3, a2, loop # Branch to "loop" if a3 < a2
# Copy the accumulated value in a4 to a0 (result)
add a0, a4, zero
ret # Return from the function
- Compilation: To compile the C code and the assembly file, use the command
riscv64-unknown-elf-gcc -O1 -mabi=lp64 -march=rv64i -o custom_call.o custom_call.c load.s
This generates an executable named custom_call.o (despite the .o suffix, it is a fully linked program).
- Execution: To run the executable, use the command
spike pk custom_call.o
Execution output :
Let us run our simple C code on a RISC-V CPU, PicoRV32, written in Verilog. Steps :
-
We convert our C program to a hex file and load it into the memory of the CPU
-
Make use of testbench to run the code
-
Display the results
The PicoRV32a design and the shell scripts are already built in a GitHub repo
cd
git clone https://github.com/kunalg123/riscv_workshop_collaterals.git
Once cloned, navigate to riscv_workshop_collaterals/labs and run the following commands :
chmod 777 rv32im.sh
./rv32im.sh
snap of testbench showing firmware.hex :
to make the process easy we make use of shell script: rv32im.sh
DAY 3: Digital Logic with TL-Verilog and Makerchip
Logic gates are fundamental components in digital circuits, playing a crucial role in manipulating binary information. In the binary system, 0 represents false or low voltage, while 1 represents true or high voltage.
Boolean operations form the foundation of Boolean algebra, a mathematical structure dealing with variables that can have values of true (1) or false (0).
A 2:1 multiplexer, or 2-to-1 mux, is a digital circuit that selects one of two input data lines and directs it to a single output line based on a control signal. The control signal determines which of the two input lines is transmitted to the output.
In Verilog, a hardware description language, the ternary operator can be used to implement multiplexers concisely. For example:
assign f = s ? x1 : x2;
This assigns the value of x1 to f if s is true; otherwise, it assigns the value of x2.
Chaining multiple ternary operators can be done to implement more complex multiplexers:
assign f = sel[0] ? a : (sel[1] ? b : (sel[2] ? c : d));
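This Verilog code selects a if sel[0] is true, b if sel[1] is true (and sel[0] is not), c if sel[2] is true (and neither sel[0] nor sel[1] is), and d otherwise. The conditions are evaluated in order, so sel[0] has the highest priority.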
Makerchip is a powerful, free online environment provided by Redwood EDA for developing integrated circuits. It offers a seamless experience for coding, compiling, simulating, and debugging Verilog designs directly from your browser. This platform enables you to create digital sequential logic efficiently and quickly.
-
Go to Makerchip: - Visit http://makerchip.com/
-
Access IDE: - Click on "IDE" to enter the Integrated Development Environment.
-
Open Tutorials: - Navigate to the "Tutorials" section.
-
Load Pythagorean Example: - Select the "Validity Tutorial" and load the "Pythagorean Example."
-
Split Planes and Move Tabs: - Familiarize yourself with the interface by splitting planes and moving tabs to customize your workspace.
-
Zoom/Pan in Diagram: - Use the mouse wheel to zoom and pan within the diagram to get a closer look at the circuit.
-
Zoom Waveform:- Zoom in on the waveform by using the "Zoom In" button, allowing for a detailed examination.
-
Highlight in Diagram: - Click on $bb_sq in the waveform to highlight the corresponding element in the diagram.
To create an inverter using Makerchip, follow the steps below:
-
Open Examples: - Navigate to the "Examples" section under "Tutorials."
-
Load Makerchip Default Template:- Load the "Makerchip Default Template" to begin with a basic template for Verilog code.
-
Create an Inverter: - Replace the //.. placeholder at line 16 with the following line:
$out = ! $in1;
Ensure that the indentation is preserved with three spaces (no tabs).
-
Compile:- Click on the "E" menu to access the compilation options.
-
Compile the Code: - Compile the Verilog code to see the result of the inverter implementation.
- Sequential logic is sequenced by a clock signal.
- A D-FF transitions the next state to the current state on a rising clock edge.
- The circuit is constructed to enter a known state in response to a reset signal.
The Fibonacci series is a sequence where the next value is the sum of the previous two numbers. For example: 1, 1, 2, 3, 5, 8, ...
$num[31:0] = $reset ? 1 : (>>1$num + >>2$num);
In the TL-Verilog code above, $num represents the Fibonacci series. The value is set to 1 while reset ($reset) is asserted; otherwise, it is calculated as the sum of the two previous values (>>1$num and >>2$num).
A free-running counter is a simple counter that increments continuously.
$cnt[31:0] = $reset ? 0 : (1 + >>1$cnt);
In the TL-Verilog code above, $cnt represents the free-running counter. While reset ($reset) is asserted, the counter is set to 0. Otherwise, it increments by 1 in each clock cycle.
Week 2: RTL Design and Synthesis
Day 1: Introduction to synthesis
-
Design: Design refers to the implementation of a digital circuit or system using Verilog code, or a set of Verilog codes, that is intended to fulfil specific functionality based on given specifications. It involves creating the logical structure of the circuit, including the arrangement of components, interconnections, and the overall behaviour of the system.
-
Testbench: A testbench is a specialized environment created to verify and validate the functionality of the design. It serves as a platform for applying various input stimuli to the design and observing the corresponding outputs. The testbench is responsible for generating test cases, monitoring the responses of the design, and comparing the obtained results against expected outcomes.
-
Simulator: A simulator is a software tool used to execute simulations of the Verilog design described in the code. It emulates the behaviour of the design under different scenarios by processing the input vectors provided by the testbench. The simulator models the propagation delays, logic gates, and other components defined in the Verilog code, allowing engineers to analyze how the design responds to different input conditions.
A simulator processes Verilog code, including both the design and the testbench. It continually monitors input signals for changes. When inputs change, the simulator evaluates the design's response based on the logic defined in the code. The output is updated accordingly. This process helps simulate the behaviour of the digital circuit and verify its functionality.
-
Testbench and Design: Create a testbench (stimulus environment) and a Verilog design to be tested.
-
iVerilog: Use the iVerilog simulator to process the testbench and design. It simulates the behaviour of the design based on the provided testbench inputs.
-
VCD File: The simulation generates a Value Change Dump (VCD) file. This file captures the changing values of signals over time during simulation.
-
GtkWave: Open the VCD file in GtkWave, a waveform viewer. GtkWave displays the signal waveforms over time, allowing you to visually analyze the behaviour of the design and verify its correctness.
For this lab, we will rely on the following tools:
iverilog: This is an open-source simulator that we'll use for our simulations.
SKYWATER 130nm PDK: This open-source Process Design Kit (PDK), generously provided by Google and SkyWater, serves as the foundation for our design and synthesis work.
- Begin by making a new directory using the command: mkdir -p Week_2/Day_1
- Move into the newly created directory with cd Week_2/Day_1
- Clone a specific repository into this location using: git clone https://github.com/kunalg123/sky130RTLDesignAndSynthesisWorkshop.git
- This action will establish a directory named sky130RTLDesignAndSynthesisWorkshop within the Week_2/Day_1 directory.
- Inside the sky130RTLDesignAndSynthesisWorkshop directory, there will be two distinct folders:
- my_lib: This folder houses the sky130 standard cell libraries in the liberty format, accompanied by various associated Verilog modules.
- verilog_files: Within this folder, you'll find all the necessary source code and testbench components required for the lab exercises.
To get started, navigate to the verilog_files directory -> cd Week_2/Day_1/sky130RTLDesignAndSynthesisWorkshop/verilog_files
- Load Design and Testbench: Use the command iverilog good_mux.v tb_good_mux.v to compile both the design (good_mux.v) and its corresponding testbench (tb_good_mux.v). Upon successful compilation, an executable named a.out will be generated.
- Generate Simulation Output: Execute the newly generated ./a.out executable. This action will result in the creation of a tb_good_mux.vcd file.
- Visualize with GtkWave: Open GtkWave, and load the generated .vcd file (tb_good_mux.vcd). Utilize GtkWave's graphical user interface (GUI) to effectively debug and analyze the signals within the simulation.
GTKwave output :
Let's have a look at how the mux is designed in good_mux.v :
// Define a module named good_mux
module good_mux (input i0, input i1, input sel, output reg y);
always @ (*)
begin
if (sel)
y <= i1; // When sel is true, assign i1 to y
else
y <= i0; // When sel is false, assign i0 to y
end
endmodule
Let's look at the testbench file tb_good_mux.v :
`timescale 1ns / 1ps
// Define the testbench module
module tb_good_mux;
// Inputs
reg i0, i1, sel; // Input registers for data and select signal
// Outputs
wire y; // Output wire
// Instantiate the Unit Under Test (UUT)
good_mux uut (
.sel(sel), // Connect select signal to the UUT
.i0(i0), // Connect input 0 to the UUT
.i1(i1), // Connect input 1 to the UUT
.y(y) // Connect the output of the UUT to y
);
// Initialize simulation and dump VCD file
initial begin
$dumpfile("tb_good_mux.vcd"); // Specify the VCD file for waveform dumping
$dumpvars(0, tb_good_mux); // Dump variables for simulation
// Initialize Inputs
sel = 0; // Initialize select to 0
i0 = 0; // Initialize input 0 to 0
i1 = 0; // Initialize input 1 to 0
#300 $finish; // Finish simulation after 300 time units
end
// Generate clocking signals
always #75 sel = ~sel; // Toggle select signal every 75 time units
always #10 i0 = ~i0; // Toggle input 0 every 10 time units
always #55 i1 = ~i1; // Toggle input 1 every 55 time units
endmodule
-
Logic Synthesis: Transforming high-level circuit descriptions into optimized gate-level implementations.
-
Gate-Level Transformation: Converting abstract circuit representations into logic gate networks.
-
Optimization Techniques: Streamlining circuits by removing redundancy, minimizing gates, and optimizing fan-out.
-
Library Mapping: Using a standard cell library to select logic gates tailored to desired functions.
-
Technology Mapping: Mapping abstract logic gates to physical cells compatible with target technology.
-
Timing Analysis: Accounting for gate delays and optimizing paths to meet timing requirements.
-
Verification and Iteration: Repeating synthesis and verification stages until the design meets all goals.
-
Tool Dependence: Utilizing EDA tools for logic synthesis with algorithms and heuristics.
The Standard Cell Library is essential in logic design and synthesis:
-
Predefined Logic Gates: Contains logic gates like AND, OR, NOT, XOR, each with specific functions.
-
Characteristics: Gates have defined behaviour, delay, area, and power usage. Offers versions optimized for speed or power.
-
Compatibility: Tailored for specific technologies (CMOS, FPGA).
-
Hierarchy: Organized by complexity, from basic gates to flip-flops, and adders.
-
Formats: Available in formats like Liberty (.lib) files.
-
Customization: Supports creating custom cells for specific needs.
-
Design Impact: Choice of cells affects speed, area, and power.
Standard Cell Libraries bridge abstract designs to physical gate-level implementation, crucial for logic synthesis.
Yosys is an open-source framework for RTL (Register-Transfer Level) synthesis and optimization of digital designs. It's a command-line tool that takes Verilog (or other HDL) code as input and performs various synthesis and optimization tasks to produce a more efficient gate-level representation of the design.
Yosys can perform operations like technology mapping, constant propagation, optimization of logic structures, and much more. It's a versatile tool often used in digital design flows to generate gate-level netlists from high-level RTL descriptions.
To explore Yosys in more detail and access the Yosys manual, visit the official Yosys documentation: Yosys Manual
Yosys follows a structured flow for logic synthesis:
-
RTL Input: Begin with an RTL (Register-Transfer Level) description in HDL (Hardware Description Language) like Verilog.
-
Design Analysis: Perform design analysis to understand the structure, hierarchy, and functionality of the design.
-
HDL to Logic Gates: Yosys transforms the RTL description into a network of logic gates.
-
Technology Mapping: Map abstract logic gates to cells in the Standard Cell Library.
-
Optimization: Apply optimization techniques to reduce area, improve performance, and minimize power.
-
Timing Analysis: Analyze and optimize timing to meet specified constraints.
-
Gate-Level Netlist: Generate a gate-level netlist, representing the optimized design.
-
Output Formats: Yosys can produce output in various formats, including Verilog netlists or EDIF.
-
Verification and Testing: Verify the synthesized design's correctness through simulation and formal methods.
Yosys streamlines the process from RTL description to optimized gate-level implementation.
Let us do a lab where we verify the synthesized netlist of good_mux.v
-
Synthesized RTL Netlist and Testbench: Provide the synthesized RTL Netlist and its corresponding testbench.
-
Simulation with Iverilog: Use Iverilog to simulate the netlist with the given testbench.
-
VCD File Generation: During simulation, a VCD (Value Change Dump) file is generated.
-
Waveform Comparison: Compare the waveform generated by the simulation to the waveform obtained from pre-synthesis.
-
GtkWave for Analysis: Use GtkWave to visually analyze the waveforms and compare them side by side.
-
Check for Match: Check if the post-synthesis waveform matches the expected pre-synthesis waveform.
This process ensures that the synthesized netlist behaves correctly, matching the intended functionality.
let's try to answer why we have so many cells in the standard cell library
In a Standard Cell Library, various types of cells, each optimized for specific design considerations, contribute to design flexibility:
-
High-Density Cells: Optimized for compact layouts, allowing more cells in a given area. Typically have slower operating speeds and lower power consumption.
-
High-Speed Cells: Designed to operate at faster speeds. May consume more power and have larger layouts due to increased complexity.
-
Power Efficient Cells: Prioritize low power consumption over high-speed operation. May have longer propagation delays to reduce power usage.
-
Mixed-Type Cells: Combine characteristics of high-speed and low-power cells. Useful when designs require a balance between speed and energy efficiency.
-
Temperature and Voltage Variants: Libraries might offer cells optimized for specific temperature ranges or voltage levels.
-
Complex Cells: Include more complex functionality like multiplexers, adders, and memory elements.
-
Inverter Variants: Inverters designed for different driving strengths or noise tolerances.
-
Different Fan-out Cells: Cells optimized for driving varying numbers of fan-out loads.
These diverse cell types cater to different design goals, enabling designers to make informed choices based on performance, area, and power requirements.
Steps to Realize good_mux Design using Yosys
To synthesize the good_mux design using the sky130_fd_sc_hd__tt_025C_1v80.lib library:
- Go to Directory: Navigate to the verilog_files directory.
- Invoke Yosys: Start Yosys using the command yosys.
- Read Library: Load the library using read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
- Read Design: Read the good_mux.v design using read_verilog good_mux.v
- Synthesis: Perform synthesis on the good_mux design using synth -top good_mux
- Generate Netlist: Generate a netlist using ABC logic synthesis with abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
- Show Realized Logic: Visualize the realized logic using show
- Write Netlist: Save the synthesized netlist using write_verilog -noattr good_mux_netlist.v
- Edit Netlist: Open the netlist in a text editor with !nvim good_mux_netlist.v
These steps transform the good_mux design into logic gates from the sky130_fd_sc_hd__tt_025C_1v80.lib library, using Yosys for logic synthesis.
Day 2: Timing libs, hierarchical vs. flat synthesis and efficient flop coding styles
A Liberty file, often denoted as ".lib" in VLSI (Very Large Scale Integration) design, is a critical technical resource. It encapsulates precise timing and power characteristics of standard cells within a semiconductor library. These characteristics include essential information such as cell delay, setup and hold times, power consumption, and more. The Liberty file is indispensable for accurate and efficient digital circuit design, enabling designers to analyze and optimize their circuits for performance, power efficiency, and timing accuracy.
Generalized naming format for VLSI
The naming convention for VLSI libraries typically follows the structure below:
<Foundry/Technology>_<LibraryCategory>_<LibraryName>_<LibraryVariant>_<Temperature>_<SupplyVoltage>.lib
- <Foundry/Technology>: Denotes the semiconductor foundry or technology process used for the library.
- <LibraryCategory>: Signifies the category of the library, such as "fd" for fundamental or standard cell libraries.
- <LibraryName>: Indicates the specific name of the library within the category, housing various standard cell designs.
- <LibraryVariant>: Denotes the library variant or version, often reflecting specific characteristics or features.
- <Temperature>: Represents the temperature at which the library is characterized, typically in degrees Celsius.
- <SupplyVoltage>: Specifies the supply voltage at which the library is characterized, often in volts.
Using this generalized format, you can create consistent and informative library names that convey essential details about the library's characteristics and conditions of use in VLSI design.
The naming convention "sky130_fd_sc_hd__tt_025C_1v80.lib" that we are making use of can be broken down as follows:
- sky130: Denotes the technology or foundry.
- fd: Signifies the library category (fd: provided by the SkyWater foundry).
- sc: Indicates the specific library name (sc: standard cell).
- hd: Represents the library variant or version (hd: high density).
- tt: Refers to the process corner (tt: typical).
- 025C: Refers to the temperature (25°C).
- 1v80: Specifies the supply voltage (1.80 volts).
Hierarchical synthesis is a design approach that involves breaking down a complex design into logical modules or blocks and synthesizing each module separately. Each module can have its own hierarchy and communicate with other modules through well-defined interfaces. This approach offers several advantages:
- Enhanced Reusability: Individual modules can be designed and tested independently, making it easier to reuse them in other designs. This can save time and effort in future projects.
- Improved Maintainability: Hierarchical synthesis promotes a clean and organized design structure. Debugging and making changes to specific modules are more manageable because they are isolated from the rest of the design.
- Scalability: It is well-suited for large and complex designs as the hierarchy allows for a structured approach to managing complexity.
Follow these steps for hierarchical synthesis using Yosys:
- Navigate to the verilog_files directory.
- Invoke Yosys using the command yosys.
- Once Yosys is running, enter the following sequence of commands:
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog multiple_modules.v
synth -top multiple_modules
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show multiple_modules
write_verilog -noattr multiple_modules_hier.v
!nvim multiple_modules_hier.v
RTL of Hierarchical Synthesis
Flat synthesis is an alternative design approach where the entire design is synthesized as a single, monolithic entity. In this approach, all modules, submodules, and logic are flattened into a single level of hierarchy. This method is best suited for simpler designs where complexity is low, and maintainability is not a significant concern.
- Simplicity: Flat synthesis is straightforward and may be appropriate for small, uncomplicated designs where hierarchy introduces unnecessary complexity.
- Predictability: There is no hierarchy to manage, which can make it easier to predict how the design will behave.
To perform flat synthesis using Yosys, follow these steps:
- Navigate to the verilog_files directory.
- Invoke Yosys using the command yosys.
- Once Yosys is running, enter the following sequence of commands:
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog multiple_modules.v
synth -top multiple_modules
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
flatten # This step is crucial because it flattens the design, removing any hierarchy and combining all modules into a single level. This is the key step in achieving a flat synthesis.
show
write_verilog -noattr multiple_modules_flat.v
!nvim multiple_modules_flat.v
Logic Gate Synthesis
- Utilization of NAND Gates: When synthesizing functions such as OR and AND, the synthesis tool often favours NAND-based implementations. NOR-style gates, and the OR gates built from them, stack PMOS transistors in series. PMOS devices conduct with holes, whose mobility is lower than that of electrons, so stacked PMOS transistors need larger aspect ratios to drive logic levels effectively.
Submodule-Level Synthesis
Submodule-level synthesis offers several advantages:
- Reduced Synthesis Time: Submodule-level synthesis can significantly reduce synthesis time, especially in the context of large and complex designs.
- Reuse of Submodules: When a specific submodule is called multiple times within a design, a time-saving strategy involves synthesizing it just once and then reusing it by integrating it into the main or top-level module.
- Efficient Optimization: Submodules are often optimized more efficiently during synthesis compared to optimizing the entire top-level design. This optimization leads to improved overall design performance.
What is the need for Flip-Flops in designs? Flops are essential in digital circuits to mitigate the cumulative effects of glitches that can occur due to propagation delays in combinational logic. When multiple combinational blocks are interconnected, these glitches can accumulate and lead to erroneous states. Flops act as a buffer, storing the final stable value and eliminating any glitches before passing it to the next block.
We'll synthesize and explore the behaviour of different flip-flops in the following sections:
- Asynchronous Reset Flip-Flop Files: asyncres.v (Design) and asyncres_tb.v (Testbench)
- Asynchronous Set Flip-Flop Files: asyncset.v (Design) and asyncset_tb.v (Testbench)
- Synchronous and Asynchronous Reset Flip-Flop Files: sync_async_res.v (Design) and sync_async_res_tb.v (Testbench)
All these files are present under the week_2/day_2 section.
Here are the steps to synthesize flops in a digital design using Yosys and view the waveform using GtkWave:
- Prepare the Design Files: Ensure you have the necessary design files, including your Verilog design (dff.v) and a testbench file (dff_tb.v) for simulation.
- Synthesize Flops:
  - Begin by invoking Yosys: yosys
  - Inside Yosys, follow these commands:
# Read the Liberty library file
read_liberty -lib <PATH_TO_.lib_FILE>/sky130_fd_sc_hd__tt_025C_1v80.lib
# Read the Verilog design file
read_verilog dff.v
# Specify the top module for synthesis
synth -top dff
# Map flip-flops to library cells
dfflibmap -liberty <PATH_TO_.lib_FILE>/sky130_fd_sc_hd__tt_025C_1v80.lib
# Perform technology mapping
abc -liberty <PATH_TO_.lib_FILE>/sky130_fd_sc_hd__tt_025C_1v80.lib
# Write the synthesized Verilog file
write_verilog -noattr dff_mapped.v
# Display the design in Yosys (optional)
show
- Simulate the Design:
  - Use iverilog to compile your Verilog files and create an executable: iverilog dff.v dff_tb.v -o dff.out
  - Run the simulation: ./dff.out
- View the Waveform:
  - Use GtkWave to view the simulation waveform: gtkwave dff_tb.vcd
These steps will guide us through the process of synthesizing flops, simulating the design, and visualizing the waveform for verification.
Asynchronous Reset Flip-Flop (asyncres.v)
- Activating the asynchronous reset ('1') forces the stored value to '0', independent of the clock.
- On the positive clock edge, the stored value updates with the data input.
Asynchronous Set Flip-Flop (asyncset.v)
- Activating the asynchronous set input ('1') forces the stored value to '1', independent of the clock.
- On the positive clock edge, the stored value updates with the data input.
Synchronous and Asynchronous Reset Flip-Flop (sync_async_res.v)
- Combines both asynchronous and synchronous reset features.
- Asynchronous reset ('1') immediately sets the stored value to '0'.
- Synchronous reset ('1') at the positive clock edge also sets the stored value to '0'.
- On the positive clock edge, the stored value updates with the data input.
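The three behaviours listed above can be sketched in Verilog as follows. This is a minimal sketch, not the contents of the repository's asyncres.v, asyncset.v, and sync_async_res.v files; the module and port names (dff_asyncres_sketch, async_reset, and so on) are assumptions chosen for illustration.

```verilog
// Hypothetical sketches; names are illustrative, not the repository files.

// Asynchronous reset: q clears as soon as async_reset rises,
// without waiting for a clock edge.
module dff_asyncres_sketch (input clk, input async_reset, input d, output reg q);
  always @(posedge clk, posedge async_reset) begin
    if (async_reset)
      q <= 1'b0;
    else
      q <= d;
  end
endmodule

// Asynchronous set: q is forced to 1 as soon as async_set rises.
module dff_asyncset_sketch (input clk, input async_set, input d, output reg q);
  always @(posedge clk, posedge async_set) begin
    if (async_set)
      q <= 1'b1;
    else
      q <= d;
  end
endmodule

// Both resets: the asynchronous reset is in the sensitivity list and has
// priority; the synchronous reset is only sampled on the clock edge.
module dff_sync_async_res_sketch (input clk, input async_reset,
                                  input sync_reset, input d, output reg q);
  always @(posedge clk, posedge async_reset) begin
    if (async_reset)
      q <= 1'b0;
    else if (sync_reset)
      q <= 1'b0;
    else
      q <= d;
  end
endmodule
```

The only structural difference between the asynchronous and synchronous controls is whether the control signal appears in the sensitivity list.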
In this section, we'll delve into the concept of optimization, exploring its role in enhancing overall design performance and achieving better Power, Performance, and Area (PPA) metrics. Our primary focus will be on identifying optimization opportunities through simple examples. On Day 3, we will delve deeper into optimization principles and engage in hands-on labs.
1. mult_2.v: This is a simple design that multiplies input a by 2 and assigns the result to output y.
module mul2 (input [2:0] a, output [3:0] y);
assign y = a * 2;
endmodule
Synthesis steps
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog mult_2.v
synth -top mul2
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show
write_verilog -noattr mul2_netlist.v
!nvim mul2_netlist.v
Output of synthesis after optimization and its netlist :
Multiplying a number by 2 is a left shift by one bit, which means appending a "0" bit at the end of the number. The optimization therefore connects a directly to the upper bits of y and ties the last bit to "0" instead of inferring a dedicated multiplier circuit.
2. mult_8.v: This is a simple design that multiplies input a by 9 and assigns the result to output y.
module mult8 (input [2:0] a , output [5:0] y);
assign y = a * 9;
endmodule
Synthesis steps
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog mult_8.v
synth -top mult8
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
show
write_verilog -noattr mult8_netlist.v
!gvim mult8_netlist.v
Output of synthesis after optimization and its netlist :
The mult8 operation is essentially a multiplication by (8+1): append three zeros to the end of a (a left shift by 3) and add a to the result. Because a is only 3 bits wide, it lands exactly in the three appended zero positions, so the sum is just a written twice, {a, a}. No dedicated multiplier is inferred.
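Both identities can be checked exhaustively in simulation. The testbench below is a small sketch under stated assumptions: tb_mult_identities is a hypothetical name, and the two wires reuse the same expressions as mul2 and mult8 rather than instantiating the synthesized netlists. For a 3-bit a, a * 2 should equal {a, 1'b0}, and a * 9 should equal {a, a} because a * 8 has zeros in its low three bits.

```verilog
// Hypothetical self-checking testbench for the two shift identities.
module tb_mult_identities;
  reg  [2:0] a;
  wire [3:0] y2 = a * 2;   // same expression as module mul2
  wire [5:0] y9 = a * 9;   // same expression as module mult8
  integer i;
  integer errors;

  initial begin
    errors = 0;
    // Walk through all eight values of the 3-bit input.
    for (i = 0; i < 8; i = i + 1) begin
      a = i;
      #1;
      if (y2 !== {a, 1'b0}) errors = errors + 1;  // a*2 is a shifted left by 1
      if (y9 !== {a, a})    errors = errors + 1;  // a*9 is a repeated twice
    end
    if (errors == 0)
      $display("PASS: a*2 == {a,1'b0} and a*9 == {a,a} for all inputs");
    else
      $display("FAIL: %0d mismatches", errors);
    $finish;
  end
endmodule
```

It can be compiled and run with iverilog in the same way as the other testbenches in this section.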
Day 3: Combinational and Sequential Optimizations:
Optimization is crucial for achieving optimal performance, resource utilization, and power efficiency in digital circuits.
Why Optimization Matters
- Performance: Optimization enhances circuit performance, reducing latency and improving throughput.
- Area: Efficient designs occupy less physical space, reducing chip size and costs.
- Power: Optimized circuits consume less power, prolonging battery life and reducing heat generation.
Combinational Logic Optimization
- Constant Propagation: Replacing signals that are tied to constant values with those constants and simplifying the logic they feed.
- Boolean Logic Optimization: Simplifying logic expressions to reduce gate count and improve efficiency.
Sequential Logic Optimization
- Sequential Constant Propagation: Propagating constant values through sequential elements.
- State Optimization: Minimizing the number of states in finite state machines.
- Retiming: Moving registers across combinational logic to balance path delays, meet timing constraints, and enhance performance.
- Sequential Logic Cloning: Duplicating a register, or the logic around it, so that each copy can sit closer to the loads it drives.
In this section we synthesize a few combinational designs and see how optimization takes place.
Synthesis steps followed for all the designs
# Read the Liberty library file
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
# Read the Verilog design file
read_verilog <design>.v
# Specify the top module for synthesis
synth -top <design_name>
# Perform combinational logic optimization
opt_clean -purge # Use this command to optimize the combinational logic before linking to ABC
# Link to ABC for technology mapping
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
# Display the design in Yosys
show
Design 1: opt_check.v
module opt_check (input a , input b , output y);
assign y = a?b:0;
endmodule
Rather than inferring a MUX, synthesis yields an AND gate: the output equals b when a is 1 and is 0 the rest of the time, which is simply a & b.
Design 2: opt_check2.v
module opt_check2 (input a , input b , output y);
assign y = a?1:b;
endmodule
Rather than a MUX, an OR gate is inferred: the output is 1 when a is 1 and equals b when a is 0, which is simply a | b.
Design 3: opt_check3.v
module opt_check3 (input a , input b, input c , output y);
assign y = a?(c?b:0):0;
endmodule
Rather than a MUX tree, a 3-input AND gate is inferred: the output equals b only when both a and c are 1 and is 0 in every other case, which is a & b & c.
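The three reductions above can be confirmed with a truth-table sweep. This is a sketch under stated assumptions: tb_opt_check_equiv is a hypothetical testbench name, and the wires repeat the ternary expressions from opt_check, opt_check2, and opt_check3 instead of instantiating the synthesized netlists.

```verilog
// Hypothetical testbench: compares each ternary expression with the
// single gate it is expected to reduce to, over all input combinations.
module tb_opt_check_equiv;
  reg a, b, c;
  wire y1 = a ? b : 1'b0;                // expression from opt_check
  wire y2 = a ? 1'b1 : b;                // expression from opt_check2
  wire y3 = a ? (c ? b : 1'b0) : 1'b0;   // expression from opt_check3
  integer i;
  integer errors;

  initial begin
    errors = 0;
    for (i = 0; i < 8; i = i + 1) begin
      {a, b, c} = i;                     // low three bits of i
      #1;
      if (y1 !== (a & b))     errors = errors + 1;  // AND gate
      if (y2 !== (a | b))     errors = errors + 1;  // OR gate
      if (y3 !== (a & b & c)) errors = errors + 1;  // 3-input AND gate
    end
    if (errors == 0)
      $display("PASS: all three designs match their reduced gate");
    else
      $display("FAIL: %0d mismatches", errors);
    $finish;
  end
endmodule
```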
Design 4: multiple_module_opt.v
module sub_module1(input a , input b , output y);
assign y = a & b;
endmodule
module sub_module2(input a , input b , output y);
assign y = a^b;
endmodule
module multiple_module_opt(input a , input b , input c , input d , output y);
wire n1,n2,n3;
sub_module1 U1 (.a(a) , .b(1'b1) , .y(n1));
sub_module2 U2 (.a(n1), .b(1'b0) , .y(n2));
sub_module2 U3 (.a(b), .b(d) , .y(n3));
assign y = c | (b & n1);
endmodule
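Tracing the constants by hand suggests what the optimizer can do with this design: U1 computes n1 = a & 1 = a, while n2 and n3 drive nothing, so y reduces to c | (a & b). The sketch below checks that reduction in simulation. The design modules are copied from the listing above, tb_multiple_module_opt is a hypothetical testbench name, and the claim that U2 and U3 disappear from the netlist is an expectation to confirm in the synthesized output, not something this simulation proves.

```verilog
// Copies of the design modules from the listing above.
module sub_module1(input a, input b, output y);
  assign y = a & b;
endmodule

module sub_module2(input a, input b, output y);
  assign y = a ^ b;
endmodule

module multiple_module_opt(input a, input b, input c, input d, output y);
  wire n1, n2, n3;
  sub_module1 U1 (.a(a),  .b(1'b1), .y(n1));  // n1 = a
  sub_module2 U2 (.a(n1), .b(1'b0), .y(n2));  // n2 is never used
  sub_module2 U3 (.a(b),  .b(d),    .y(n3));  // n3 is never used
  assign y = c | (b & n1);
endmodule

// Hypothetical testbench: y should equal c | (a & b) and ignore d.
module tb_multiple_module_opt;
  reg a, b, c, d;
  wire y;
  integer i;
  integer errors;

  multiple_module_opt dut (.a(a), .b(b), .c(c), .d(d), .y(y));

  initial begin
    errors = 0;
    for (i = 0; i < 16; i = i + 1) begin
      {a, b, c, d} = i;
      #1;
      if (y !== (c | (a & b))) errors = errors + 1;
    end
    if (errors == 0)
      $display("PASS: y == c | (a & b) for all 16 input combinations");
    else
      $display("FAIL: %0d mismatches", errors);
    $finish;
  end
endmodule
```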
Synthesize the Design
# Read the Liberty library file
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
# Read the Verilog design file
read_verilog <design>.v
# Specify the top module for synthesis
synth -top <design_name>
# Map flip-flops to library cells
dfflibmap -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
# Perform technology mapping
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
# Display the design in Yosys
show
Simulate the Design
# Compile Verilog files
iverilog <design>.v <tb_design>.v -o <design>.out
# Run the simulation
./<design>.out
# View the waveform using GTKWAVE
gtkwave <tb_design>.vcd
Make sure to replace <design>, <design_name>, and <tb_design> with the appropriate file names and module names as needed for your specific design and testbench.
Design 1: dff_const1.v
module dff_const1(input clk, input reset, output reg q);
always @(posedge clk, posedge reset)
begin
if(reset)
q <= 1'b0;
else
q <= 1'b1;
end
endmodule
Design 2: dff_const2.v
module dff_const2(input clk, input reset, output reg q);
always @(posedge clk, posedge reset)
begin
if(reset)
q <= 1'b1;
else
q <= 1'b1;
end
endmodule
Design 3: dff_const3.v
module dff_const3(input clk, input reset, output reg q);
reg q1;
always @(posedge clk, posedge reset)
begin
if(reset)
begin
q <= 1'b1;
q1 <= 1'b0;
end
else
begin
q1 <= 1'b1;
q <= q1;
end
end
endmodule
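A short simulation sketch helps show why these designs look similar but are not equally optimizable. In dff_const1, q is 0 while reset is asserted and only becomes 1 on the first clock edge after reset is released, so q is not a constant and the flop has to stay. In dff_const2, both branches assign 1, so q is 1 whenever it is driven and the flop can be replaced by a constant. The testbench name tb_dff_const and the stimulus timings are assumptions; the two design modules are compacted copies of the listings above.

```verilog
// Compacted copies of dff_const1 and dff_const2 from above.
module dff_const1(input clk, input reset, output reg q);
  always @(posedge clk, posedge reset)
    if (reset) q <= 1'b0;
    else       q <= 1'b1;
endmodule

module dff_const2(input clk, input reset, output reg q);
  always @(posedge clk, posedge reset)
    if (reset) q <= 1'b1;
    else       q <= 1'b1;
endmodule

// Hypothetical testbench that prints both outputs side by side.
module tb_dff_const;
  reg clk, reset;
  wire q1, q2;

  dff_const1 u1 (.clk(clk), .reset(reset), .q(q1));
  dff_const2 u2 (.clk(clk), .reset(reset), .q(q2));

  always #5 clk = ~clk;   // rising edges at t = 5, 15, 25, ...

  initial begin
    clk = 0;
    reset = 0;
    $monitor("t=%0t reset=%b q1=%b q2=%b", $time, reset, q1, q2);
    #2  reset = 1;   // assert reset between clock edges
    #10 reset = 0;   // release at t = 12; q1 stays 0 until the edge at t = 15
    #20 $finish;
  end
endmodule
```

The interval between releasing reset and the next rising clock edge is where q1 and q2 differ, and that interval is what prevents dff_const1 from being treated as a constant.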
Design 1: counter_opt.v
module counter_opt (input clk , input reset , output q);
reg [2:0] count;
assign q = count[0];
always @(posedge clk ,posedge reset)
begin
if(reset)
count <= 3'b000;
else
count <= count + 1;
end
endmodule
Design 2: counter_opt2.v
module counter_opt (input clk , input reset , output q);
reg [2:0] count;
assign q = (count[2:0] == 3'b100);
always @(posedge clk ,posedge reset)
begin
if(reset)
count <= 3'b000;
else
count <= count + 1;
end
endmodule
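The two counters differ in how many of their bits reach the output. In counter_opt.v only count[0] is observed, and count[0] simply inverts on every clock regardless of the upper bits, so the two upper flops drive nothing visible and can be removed. In counter_opt2.v the comparison uses all three bits, so all three flops are needed. The sketch below checks the first claim in simulation by comparing counter_opt with a single toggle flop. The names toggle_sketch and tb_counter_opt are hypothetical, and the counter module is a compacted copy of counter_opt.v.

```verilog
// Compacted copy of counter_opt.v from above.
module counter_opt (input clk, input reset, output q);
  reg [2:0] count;
  assign q = count[0];
  always @(posedge clk, posedge reset)
    if (reset) count <= 3'b000;
    else       count <= count + 1;
endmodule

// Hypothetical single-flop equivalent: toggles on every clock edge.
module toggle_sketch (input clk, input reset, output reg q);
  always @(posedge clk, posedge reset)
    if (reset) q <= 1'b0;
    else       q <= ~q;
endmodule

// Hypothetical testbench comparing the two outputs.
module tb_counter_opt;
  reg clk, reset;
  wire q_counter, q_toggle;
  integer errors;

  counter_opt   u_cnt (.clk(clk), .reset(reset), .q(q_counter));
  toggle_sketch u_tgl (.clk(clk), .reset(reset), .q(q_toggle));

  always #5 clk = ~clk;

  // Compare on the falling edge, away from the active rising edge.
  always @(negedge clk)
    if (q_counter !== q_toggle) errors = errors + 1;

  initial begin
    clk = 0;
    reset = 0;
    errors = 0;
    #2  reset = 1;
    #10 reset = 0;
    #100;
    if (errors == 0)
      $display("PASS: counter_opt output matches a single toggle flop");
    else
      $display("FAIL: %0d mismatches", errors);
    $finish;
  end
endmodule
```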
Day 4: Gate Level simulation
Gate-level simulation is a crucial method in electronics design for verifying digital circuits at the level of individual logic gates and flip-flops. It offers several key benefits:
- Functionality Check: It allows for comprehensive functionality testing.
- Timing Verification: Ensures that timing requirements are met.
- Power Consumption Analysis: Assesses power consumption.
- Test Pattern Generation: Generates test patterns for integrated circuits.
This simulation operates at a lower abstraction level than higher-level simulations, making it essential for debugging and ensuring circuit correctness.
Usage
Gate-level simulation is typically used for post-synthesis verification to ensure that the design meets functionality and timing requirements. The required inputs include:
- Testbench: A testbench for the design.
- Synthesized Netlist: The netlist of the synthesized design.
- Gate-Level Verilog Models: Verilog models of the individual gates used in the design.
In cases where there's a discrepancy in simulation results for the post-synthesis netlist, it's referred to as a "synthesis simulation mismatch."
These steps outline the process of gate-level simulation, a critical phase in the verification and validation of digital circuit designs.
- Write RTL Code: Begin by creating RTL (Register-Transfer Level) code to describe the digital circuit. Verify its functionality using a testbench.
iverilog <design>.v <tb_design>.v
./a.out
gtkwave <tb_design>.vcd
- Synthesize RTL: Perform RTL synthesis to convert the high-level RTL code into a gate-level netlist.
read_liberty -lib ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
read_verilog <design>.v
synth -top <design_name>
abc -liberty ../lib/sky130_fd_sc_hd__tt_025C_1v80.lib
write_verilog -noattr <design_netlist>.v
show
- Compile and Simulate: Compile the gate-level netlist together with the gate-level Verilog models and simulate it using the same testbench that was used for RTL verification.
iverilog ../my_lib/verilog_model/primitives.v ../my_lib/verilog_model/sky130_fd_sc_hd.v <design_netlist>.v <tb_design>.v
./a.out
gtkwave <tb_design>.vcd
- Timing Analysis (If Necessary): If required, conduct timing analysis to ensure that the design meets timing constraints. Additionally, verify that the functionality matches expectations.
Synthesis-Simulation Mismatch
- Definition: Differences between a digital circuit's behaviour in RTL-level simulation and its behaviour post gate-level synthesis.
- Causes: Optimization, clock domain issues, library discrepancies, etc.
- Resolution: Ensure consistent tool versions, verify synthesis settings, debug with simulation tools, and follow best RTL coding practices.
- Importance: Crucial for reliable hardware implementation.
Blocking vs. Non-Blocking Statements
Blocking Statements
- Execution: Sequentially, in the order they appear.
- Usage: Describe combinational logic, with execution order significance.
- Example:
a = b + c; // 'a' is updated immediately, before the next statement executes
Non-Blocking Statements
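- Execution: Concurrently, within procedural blocks. All right-hand sides are evaluated first, and the updates take effect together at the end of the time step.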
- Usage: Model synchronous digital circuits, with parallel execution.
- Example:
always @(posedge clk)
begin
b <= a; // Concurrently scheduled assignment
c <= b; // Concurrently scheduled assignment
end
Caveats with Blocking Statements
- Sequential Execution: Blocking statements execute sequentially, potentially misrepresenting concurrent hardware behaviour.
- Order Dependency: The order of blocking statements can impact results, leading to race conditions.
- Combinational Logic: Primarily used for combinational logic modelling.
- Testbench Usage: Excessive use in testbenches can lead to simulation race conditions.
- Initialization Issues: Order-dependent initialization with blocking assignments can yield unexpected results.
- Mitigation: Use non-blocking statements for sequential logic modelling, employ good coding practices to minimize order dependencies, and enhance code clarity.
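The order dependency described above can be seen by running the same two-assignment shift with both assignment types. This is a minimal simulation sketch; the testbench name and the signal names (b_nb, c_nb, b_bl, c_bl) are assumptions chosen for illustration.

```verilog
// Hypothetical testbench contrasting non-blocking and blocking assignments.
module tb_blocking_vs_nonblocking;
  reg clk;
  reg a;
  reg b_nb, c_nb;   // updated with non-blocking assignments
  reg b_bl, c_bl;   // updated with blocking assignments

  // Non-blocking: c_nb samples the old b_nb, giving a two-stage shift register.
  always @(posedge clk) begin
    b_nb <= a;
    c_nb <= b_nb;
  end

  // Blocking: c_bl sees the freshly written b_bl, so it follows a after one edge.
  always @(posedge clk) begin
    b_bl = a;
    c_bl = b_bl;
  end

  initial begin
    clk = 0; a = 0;
    b_nb = 0; c_nb = 0; b_bl = 0; c_bl = 0;
    #1 a = 1;
    #4 clk = 1;    // first rising edge
    #1 $display("after 1 edge:  c_nb=%b c_bl=%b", c_nb, c_bl);  // c_nb=0 c_bl=1
    #4 clk = 0;
    #5 clk = 1;    // second rising edge
    #1 $display("after 2 edges: c_nb=%b c_bl=%b", c_nb, c_bl);  // c_nb=1 c_bl=1
    $finish;
  end
endmodule
```

With non-blocking assignments the value of a needs two clock edges to reach c_nb, while the blocking version collapses the two stages into one.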
Design 1: ternary_operator_mux.v
module ternary_operator_mux (input i0 , input i1 , input sel , output y);
assign y = sel?i1:i0;
endmodule
Design 2:bad_mux.v
module bad_mux (input i0 , input i1 , input sel , output reg y);
always @ (sel)
begin
if(sel)
y <= i1;
else
y <= i0;
end
endmodule
Design 3: blocking_caveat.v
module blocking_caveat (input a , input b , input c, output reg d);
reg x;
always @ (*)
begin
d = x & c;
x = a | b;
end
endmodule
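Both of the last two designs can simulate differently at RTL than after synthesis. In bad_mux the sensitivity list contains only sel, so RTL simulation ignores changes on i0 and i1 until sel changes, whereas a synthesized mux responds to them immediately. In blocking_caveat, d is computed from the previous value of x, so RTL simulation behaves as if x were stored, whereas the gate-level logic is purely combinational. The sketch below shows one way to rewrite them so the two simulations agree. The module names good_mux_sketch and blocking_fixed_sketch are hypothetical and are not files from the repository.

```verilog
// Hypothetical fix for bad_mux: a complete sensitivity list via @(*),
// and blocking assignments for combinational logic.
module good_mux_sketch (input i0, input i1, input sel, output reg y);
  always @(*) begin
    if (sel)
      y = i1;
    else
      y = i0;
  end
endmodule

// Hypothetical fix for blocking_caveat: compute x before it is used,
// so d always reflects the current a, b, and c.
module blocking_fixed_sketch (input a, input b, input c, output reg d);
  reg x;
  always @(*) begin
    x = a | b;
    d = x & c;
  end
endmodule
```

Running the gate-level simulation flow described earlier on the original and rewritten versions is a direct way to compare the waveforms.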