Zfinx
is an extension which changes all existing and future floating point extensions which use the F
floating point registers, so that instead they use the X
registers. Hence the name F-in-X
. This does not affect floating point instructions which are implemented as part of the Vector (v
) extension. Zfinx
additionally removes all
-
floating point load instructions (e.g.
FLW
) -
floating point store instructions (e.g.
FSW
) -
integer to/from floating point register move instructions (e.g.
FMV.X.W
)
In all cases the integer versions of these are required instead.
On a Zfinx
core the assembler syntax of floating point instructions changes so that they only refer to X
registers. Therefore on an RV32F core this is legal syntax:
FLW fa4, 12(sp) //load floating point data
FMADD.S fa1, fa2, fa3, fa4 //floating point arithmetic for RV32F
On a Zfinx
core, this syntax must be used as the F
registers are not implemented, and FLW
is not a supported instruction.
LW a4, 12(sp) //load integer data
FMADD.S a1, a2, a3, a4 //floating point arithmetic for RV32F Zfinx
Note that only the assembler syntax differs between the two FMADD.S
instructions, the encoding is the same.
The assembler syntax changes to avoid code-porting bugs, so that the registers must be updated and not just reused from non-Zfinx
code
Zfinx
may be used with any extensions which uses F
registers. The number of integer registers does not affect Zfinx
(I
or E
extensions)
although the relative sizes of XLEN
and FLEN
do affect the specification.
This specification uses D for 64-bit floating point, F for 32-bit floating point and Zfh for 16-bit floating point. Zfinx
behviour is only affected by the data width so future
formats are implicitly supported, e.g. 64 or 32-bit POSIT formats.
Zfinx
configurations
Architecture | Comment |
---|---|
RV32IFD Zfinx |
XLEN<FLEN |
RV32IFD Zfh Zfinx |
XLEN<FLEN |
RV32F Zfinx |
XLEN==FLEN |
RV32F Zfh Zfinx |
XLEN==FLEN |
RV64FD Zfinx |
XLEN==FLEN |
RV64FD Zfh Zfinx |
XLEN==FLEN |
RV64F Zfinx |
XLEN>FLEN |
RV64F Zfh Zfinx |
XLEN>FLEN |
Note that RV32FD [Zfh] Zfinx
requires register pairs so is more complex than the other cases.
RV128 and the Q
extension are not covered by this specification, but it is simple to extend this specification to include them.
The NaN-boxing behaviour of floating point arithmetic instructions is modified to suppress checking of sources only. Floating point results are always NaN-boxed to XLEN
bits.
NaN-boxing checking is removed as integer loads do not NaN-box their result, and so loading fewer than XLEN
bits (for example using LW
to load floating point data on an RV64 core) would otherwise require NaN-boxing in software which wastes performance and code-size
There are no other semantic differences for floating point instruction behaviour between a Zfinx
and a non-Zfinx
core, but there are some differences for special cases (such as x0
handling) as listed later in this specification.
If Zfinx
is specified then the compiler will have the following #define set
__riscv_zfinx
So software can use this to choose between Zfinx
or normal versions of floating point code.
Privileged code can detect whether Zfinx
is implemented by checking if:
-
mstatus.FS
is hardwired to zero, and -
misa.F
is 1 at reset, or is writeable
Non-privileged code can detect whether Zfinx
is implemented as follows.
li a0, 0 # set a0 to zero
#ifdef __riscv_zfinx
fneg.s a0, a0 # this will invert a0
#else
fneg.s fa0, fa0 # this will invert fa0
#endif
If a0 is non-zero then it’s a Zfinx
core, otherwise it’s a non-Zfinx core. Both branches result in the same encoding, but the assembly syntax is different for each variant
For Zfinx
cores mstatus.fs
is hardwired to zero, because all the integer registers already form part of the current context. Note however that fcsr
needs to be saved and restored. This gives a performance advantage when saving/restoring contexts.
Floating point instructions and fcsr
accesses do not trap if mstatus.fs
=0. This is different to non-Zfinx
cores.
For RV32D
, all D-extension instructions which are implemented with Zfinx
will access register pairs:
-
The specified register must be even, odd registers will cause an illegal instruction exception
-
Even registers will cause an even/odd pair to be accessed
-
Accessing Xn will cause the {Xn+1, Xn} pair to be accessed. For example if n = 2
-
X2 is the least significant half (bits [31:0]) for little endian mode
-
X3 the most significant half (bits [63:32]) for little endian mode
-
-
For big endian mode the register mapping is reversed, so X2 is the most significant half, and X3 is the least significant half.
-
-
X0 has special handling
-
Reading {X1, X0} will read all zeros
-
Writing {X1, X0} will discard the entire result, it will not write to X1
-
The register pairs are only used by the floating point arithmetic instructions. All integer loads and stores will only access XLEN
bits, not FLEN
.
Note:
-
Zp64 from the P-extension specifies consistent register pair handling.
-
Big endian mode is enabled in M-mode if
mstatus.MBE
=1, in S-mode ifmstatus.SBE
=1, or in U-mode ifmstatus.UBE
=1
If a floating point instruction targets x0 then it will still execute, and will set any required flags in fcsr
. It will not write to a target register. This matches the non-Zfinx
behaviour for
fcvt.w.s x0, f0
If the floating point source is invalid then it will set the fflags.NV
bit, regardless of whether Zfinx
is implemented. The target register is not written as it is x0.
If fcsr.RM
is in an illegal state then floating point instruction behaviour is the same whether the target register is x0 is not, i.e. targetting x0 doesn’t disable any execution side effects.
In the case of RV32D Zfinx
, register pairs are used. See above for x0 handling.
For Zfinx
the NaN-boxing is limited to XLEN
bits, not FLEN
bits. Therefore a FADD.S
executed on an RV64D
core will write a 64-bit value (the MSH will be all 1’s). On an RV32D Zfinx
core it will write a 32-bit register, i.e. a single X register only. This means there is semantic difference between these code sequences:
#ifdef __riscv_zfinx
fadd.s x2, x3, x4 # only write x2 (32-bits), x3 is not written
#else
fadd.s f2, f3, f4 # NaN-box 64-bit f2 register to 64-bits
#endif
NaN-box generation is supported by Zfinx
implementations. NaN-box checking is not supported by scalar floating point instructions. For example for RV64F
:
#ifdef __riscv_zfinx
lw[u] x1, 0(sp) # load 32-bits into x1 and sign / zero extend upper 32-bits
fadd.s x1, x1, x1 # use x1 but do not check source is Nan-boxed, NaN-box output
#else
flw.s f1, 0(sp) # load 32-bits into f1 and NaN-box to 64-bits (set upper 32-bits to 0xFFFFFFFF)
fadd.s f2, f1, f1 # check f1 is NaN-boxed, NaN-box output
#endif
Floating point loads are not supported on Zfinx
cores so x1 is not NaN-boxed in the example above, therefore the FADD.S
instruction does not check the input for NaN-boxing.
The result of FADD.S
is NaN-boxed, which means setting the upper half of the output register to all 1’s.
The table shows the effect of writing each possible width of value to the register file for all supported combinations. Note that Verilog syntax is used in the final column.
XLEN | Width of write to Xreg from FP instruction | Value written to Xreg |
---|---|---|
64 |
16 |
{48{1’b1}, result[15:0]} |
32 |
16 |
{16{1’b1}, result[15:0]} |
64 |
32 |
{32{1’b1}, result[31:0]} |
32 |
32 |
result[31:0] |
64 |
64 |
result[63:0] |
Little endian |
||
32 |
64 |
EvenXreg: result[31:0] Odd Xreg: result[63:32] special handling Xreg={0, 1} |
Big endian |
||
32 |
64 |
Odd Xreg: result[31:0] EvenXreg: result[63:32] special handling Xreg={0, 1} |
Therefore, for example, if an FADD.S
instruction is issued on an RV64F
core then the upper 32-bits will be set to one in the target integer register, or an FADD.H
(floating point add half-word) instruction will set the upper 48-bits to one.
Any references to F
registers, or removed instructions will cause assembler errors.
For example, the encoding for
FMADD.S <1>, <2>, <3>, <4>
will disassemble and execute as
FMADD.S f1, f2, f3, f4
on a non-Zfinx
core, or
FMADD.S x1, x2, x3, x4
on a Zfinx
core.
We considered allowing pseudo-instructions for the deleted instructions for easier code porting. For example allowing FLW to be a pseudo-instruction for LW, but decided not to. Because the register specifiers must change to integer registers, it makes sense to also remove the use of FLW etc. In this way the user is forced to rewrite their code for a Zfinx
core, reducing the chance of undiscovered porting bugs. This only affects assembly code, high level language code is unaffected as the compiler will target the correct architecture.
All floating point loads, stores and floating point to integer moves are removed on a Zfinx
core. The following three tables give suggested replacements.
Instruction | RV32F Zfh Zfinx | RV32D Zfh Zfinx | RV64F Zfh Zfinx | RV64D Zfh Zfinx | RV32F Zfinx | RV32D Zfinx | RV64F Zfinx | RV64D Zfinx |
---|---|---|---|---|---|---|---|---|
loads |
suggested replacement instructions |
|||||||
FLD frd, offset(xrs1) |
reserved |
LW,LW |
LD |
reserved |
LW, LW |
LD |
||
FLW frd, offset(xrs1) |
LW |
LW[U] and NaN-box in software |
LW |
LW[U] and NaN-box in software |
||||
FLH frd, offset(xrs1) |
LH[U] and NaN-box in software |
reserved |
||||||
C.FLD frd’, offset(xrs1’) |
reserved |
[C.]LW,[C.]LW |
[C.]LD |
reserved |
[C.]LW,[C.]LW |
[C.]LD |
||
C.FLDSP frd, uimm(x2) |
reserved |
C.LWSP,C.LWSP |
C.LDSP |
reserved |
C.LWSP,C.LWSP |
C.LDSP |
||
C.FLW frd, offset(xrs1) |
C.LW |
C.LW and NaN-box in software |
C.LW |
C.LW and NaN-box in software |
||||
C.FLWSP frd, uimm(x2) |
C.LWSP |
C.LWSP and NaN-box in software |
C.LWSP |
C.LWSP and NaN-box in software |
Instruction | RV32F Zfh Zfinx | RV32D Zfh Zfinx | RV64F Zfh Zfinx | RV64D Zfh Zfinx | RV32F Zfinx | RV32D Zfinx | RV64F Zfinx | RV64D Zfinx |
---|---|---|---|---|---|---|---|---|
stores |
suggested replacement instructions |
|||||||
FSD frd, offset(xrs1) |
reserved |
SW,SW |
SD |
reserved |
SW, SW |
SD |
||
FSW frd, offset(xrs1) |
SW |
|||||||
FSH frd, offset(xrs1) |
SH |
reserved |
||||||
C.FSD frd’, offset(xrs1’) |
reserved |
[C.]SW,[C.]SW |
[C.]SD |
reserved |
[C.]SW,[C.]SW |
[C.]SD |
||
C.FSDSP frd, uimm(x2) |
reserved |
C.SWSP,C.SWSP |
C.SDSP |
reserved |
C.SWSP,C.SWSP |
C.SDSP |
||
C.FSW frd, offset(xrs1) |
C.SW |
|||||||
C.FSWSP frd, uimm(x2) |
C.SWSP |
Instruction | RV32F Zfh Zfinx | RV32D Zfh Zfinx | RV64F Zfh Zfinx | RV64D Zfh Zfinx | RV32F Zfinx | RV32D Zfinx | RV64F Zfinx | RV64D Zfinx |
---|---|---|---|---|---|---|---|---|
moves |
suggested replacement instructions |
|||||||
FMV.X.D xrd, frs1 |
reserved |
MV,MV |
reserved |
MV |
reserved |
MV,MV |
reserved |
MV |
FMV.D.X frd, xrs1 |
reserved |
MV,MV |
reserved |
MV |
reserved |
MV,MV |
reserved |
MV |
FMV.X.W xrd, frs1 |
MV |
MV and sign extend in software |
MV |
MV and sign extend in software |
||||
FMV.W.X frd, xrs1 |
MV |
MV and NaN-box in software |
MV |
MV and NaN-box in software |
||||
FMV.X.H xrd, frs1 |
MV and sign extend in software |
reserved |
||||||
FMV.H.X frd, xrs1 |
MV and NaN-box in software |
reserved |
Notes:
-
Where a floating point load loads fewer than
XLEN
bits then software NaN-boxing in software is required to get the same semantics as a non-Zfinx
core -
Where a floating point move moves fewer than
XLEN
bits then either sign extension (if the target is anX
register) or NaN-boxing (if the target is anF
register) is required in software to get the same semantics
The B-extension is useful for sign extending and NaN-boxing.
To sign-extend using the B-extension:
FMV.X.H rd, rs1
is replaced by
SEXT.H rd, rs1
Without the B-extension two instructions are required: shift left 16 places, then arithmetic shift right 16 places.
NaN boxing in software is more involved, as the upper part of the register must be set to 1. The B-extension is also helpful in this case.
FMV.H.X a0, a1
is replaced by
C.ADDI a2, zero, -1
PACK a0, a1, a2
A non-Zfinx
core can run a Zfinx
binary. M-mode software can do this:
-
Set
mstatus.fs
=0 to cause every floating point instruction to trap -
When a floating point instruction traps, move the source operands from the X registers to the equivalent F registers (i.e. the same register numbers)
-
Set
mstatus.fs
to be non-zero -
Execute the original instruction which caused the trap
-
Move the result from the destination
F
register to theX
register /X
register pair (ForRV32D
) -
Set
mstatus.fs
=0 -
MRET
There are corner cases around the use of x0 and register pairs for RV32D
-
Two 32-bit
X
registers must be transferred to a single 64-bit F register to set up the source operands. This must be done by saving eachX
register to consecutive memory locations, and using a 64-bit floating point load (FLD
orC.FLD
) to load the data -
One 64-bit F register must be transferred to two 32-bit
X
registers to receive the result. This must be done with a 64-bit floating point store (FSD
orC.FSD
) and then two 32-bit loads (such asLW
orC.LW
). -
If the source register pair is {x1,x0}, the source data will read as all zeroes. Therefore f0 must be loaded with a 64-bit zero constant from memory.
-
If the destination register pair is {x1,x0} then the full output is discarded, do not transfer the resulting data to the {x1,x0} register pair which would result in the upper half being written to x1
A Zfinx
core cannot trap on floating point instructions by setting mstatus.fs
=0, so the reverse emulation isn’t possible. The code must be recompiled (or ported for assembler).
For details of the current calling conventions see:
https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md
C
The ABI when using Zfinx
is the standard integer calling convention as listed in the table below.
The Zfinx
ABI can be thought of as being similar to using the softfloat routines to execute floating point functionality, but replacing the call to the softfloat function with the actual floating point ISA instruction.
Note that RV32D
Zfinx
requires register pair handling. This does not require an ABI change as long types are already supported using register pairs. It is likely to require some work in the compiler (according to Jim Wilson).
To reduce the area overhead of FPU hardware new configurations will make the F[N]MADD.*, F[N]MSUB.*
and FDIV.*, FSQRT.*`
instructions optional in hardware. This then gives the choice of implementing them in software instead by:
-
Taking an illegal instruction trap, and calling the required software routine in the trap handler. This requires that the opcodes are not reallocated and gives binary compatibility between cores with/without hardware support for
F[N]MADD.*, F[N]MSUB.*
andFDIV.*, FSQRT.*
, but is lower performance than option 2 -
Use the GCC options below so that a software library is used to execute them
This argument already exists for RISCV
gcc -mno-fdiv
This argument exists for other architectures (e.g. MIPs) but not for RISCV, so it needs to be added
gcc -mno-fused-madd
To achieve this we break all current and future floating point extensions into three parts: Zf*base
, Zfma
and Zfdiv
. Zfinx
is orthogonal, and so is an additional modifier to these as described below.
Options, all start with Zf | Meaning |
---|---|
Zfhbase |
Support half precision base instructions |
Zffbase |
Support single precision base instructions |
Zfdbase |
Support double precision base instructions |
Zfqbase |
Support quad precision base instructions |
Zfldstmv |
Support load,store and integer to/from FP move for all FP extensions |
Zfma |
Support multiply-add for all FP extensions |
Zfdiv |
Support div/sqrt for all FP extensions |
Zfinx |
Share the integer register file for all FP extensions |
So the Zfldstmv
, Zfma
, Zfdiv
, Zfinx
options apply to all floating point extensions, including future ones. This keeps the support regular across the different options.
Therefore RV32FD Zfh Zfinx
can also be expressed as:
rv32_Zfhbase_Zffbase_Zfdbase_Zfma_Zfdiv_Zfinx
Also RV32FD Zfh
can be expressed as:
rv32_Zfhbase_Zffbase_Zfdbase_Zfldstmv_Zfma_Zfdiv
The options are designed to be additive, none of them remove instructions.
Small embedded cores which need to implement floating point extensions have some options:
-
Use software emulation of floating point instructions, so don’t implement a hardware FPU which gives minimum core area
-
The floating point library can be large, and expensive in terms of ROM or flash storage, costing power and energy consumption
-
The performance of this solution is very low
-
-
Low core area floating point implementations
-
Share the integer registers for floating point instructions (
Zfinx
)-
Will cause more register spills/fills than having a separate register file, but the effect of this is application dependant
-
No need for special instructions such as load and stores to access floating point registers, and moves between integer and floating point registers
-
-
There are still performance/area tradeoffs to make for the FPU design itself
-
e.g. pipelined versus iterative
-
-
Optionally remove multiply-add instructions to save area in the FPU and a register file read port
-
Optionally remove divide/square root instructions to to save area in the FPU
-
-
Dedicated FPU registers, and higher performance FPU implementations use the most area
-
Separate floating point registers allow fewer register spills/fills, and can also be used for integer code to prevent spilling to memory
-
There are the same performance/area tradeoffs for the FPU design
-
Zfinx
is implemented to allow core area reduction as the area of the F
register file is significant, for example:
-
RV32IF Zfinx
saves 1/2 the register file state compared toRV32IF
-
RV32EF Zfinx
saves 2/3 the register file state compared toRV32EF
Therefore Zfinx
should allow for small embedded cores to support floating point with
-
Minimal area increase
-
Similar context switch time as an integer only core
-
there are no
F
registers to save/restore
-
-
Reduced code size by removing the floating point library