`Zfinx` Specification

Overview

Zfinx is an extension which changes all existing and future floating point extensions that use the F floating point registers so that they use the X registers instead, hence the name F-in-X. This does not affect floating point instructions which are implemented as part of the Vector (V) extension. Zfinx additionally removes all:

floating point load instructions (e.g. FLW)
floating point store instructions (e.g. FSW)
integer to/from floating point register move instructions (e.g. FMV.X.W)

In all cases the integer versions of these are required instead.

On a Zfinx core the assembler syntax of floating point instructions changes so that they only refer to X registers. Therefore on an RV32F core this is legal syntax:

FLW     fa4, 12(sp)        //load floating point data
FMADD.S fa1, fa2, fa3, fa4 //floating point arithmetic for RV32F

On a Zfinx core, this syntax must be used as the F registers are not implemented, and FLW is not a supported instruction.

LW      a4, 12(sp)     //load integer data
FMADD.S a1, a2, a3, a4 //floating point arithmetic for RV32F Zfinx

Note that only the assembler syntax differs between the two FMADD.S instructions; the encoding is the same.

The assembler syntax changes to avoid code-porting bugs, so that the registers must be updated and not just reused from non-Zfinx code.

Zfinx may be used with any extension which uses F registers. The number of integer registers does not affect Zfinx (I or E extensions), although the relative sizes of XLEN and FLEN do affect the specification.

This specification uses D for 64-bit floating point, F for 32-bit floating point and Zfh for 16-bit floating point. Zfinx behaviour is only affected by the data width, so future formats are implicitly supported, e.g. 64 or 32-bit POSIT formats.

Table 1. supported Zfinx configurations

Architecture	Comment
RV32IFD Zfinx	XLEN<FLEN
RV32IFD Zfh Zfinx	XLEN<FLEN
RV32F Zfinx	XLEN==FLEN
RV32F Zfh Zfinx	XLEN==FLEN
RV64FD Zfinx	XLEN==FLEN
RV64FD Zfh Zfinx	XLEN==FLEN
RV64F Zfinx	XLEN>FLEN
RV64F Zfh Zfinx	XLEN>FLEN

Note that RV32FD [Zfh] Zfinx requires register pairs so is more complex than the other cases.
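
The comment column of Table 1 follows from comparing XLEN with the widest supported floating point format. The following is a minimal Python sketch of that comparison; the function name `classify` and the `FP_WIDTH` table are illustrative names, not part of this specification.

```python
# Data widths in bits for the floating point extensions named in this document.
FP_WIDTH = {"Zfh": 16, "F": 32, "D": 64}

def classify(xlen, fp_extensions):
    """Return the Table 1 comment for an XLEN and a list of FP extensions."""
    # FLEN is taken to be the width of the widest supported format.
    flen = max(FP_WIDTH[ext] for ext in fp_extensions)
    if xlen < flen:
        return "XLEN<FLEN"      # register pairs are needed
    if xlen == flen:
        return "XLEN==FLEN"
    return "XLEN>FLEN"          # results are NaN-boxed up to XLEN

print(classify(32, ["F", "D"]))    # XLEN<FLEN
print(classify(64, ["F", "Zfh"]))  # XLEN>FLEN
```

Only the `XLEN<FLEN` case needs the register pair handling described later in this specification.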

RV128 and the Q extension are not covered by this specification, but it is simple to extend this specification to include them.

Semantic Differences

The NaN-boxing behaviour of floating point arithmetic instructions is modified to suppress checking of sources only. Floating point results are always NaN-boxed to XLEN bits.

NaN-boxing checking is removed because integer loads do not NaN-box their result, so loading fewer than XLEN bits (for example using LW to load floating point data on an RV64 core) would otherwise require NaN-boxing in software, which costs performance and code size.

There are no other semantic differences for floating point instruction behaviour between a Zfinx and a non-Zfinx core, but there are some differences for special cases (such as x0 handling) as listed later in this specification.

Discovery

If Zfinx is specified then the compiler will have the following #define set

__riscv_zfinx

So software can use this to choose between Zfinx or normal versions of floating point code.

Privileged code can detect whether Zfinx is implemented by checking if:

mstatus.FS is hardwired to zero, and
misa.F is 1 at reset, or is writeable

Non-privileged code can detect whether Zfinx is implemented as follows.

li a0, 0 # set a0 to zero
#ifdef __riscv_zfinx
fneg.s a0, a0 # this will invert a0
#else
fneg.s fa0, fa0 # this will invert fa0
#endif

If a0 is non-zero then it’s a Zfinx core, otherwise it’s a non-Zfinx core. Both branches result in the same encoding, but the assembly syntax is different for each variant.
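
The reason the probe works can be sketched in Python. This is a simplified model, not an implementation: it tracks only the low 32 bits of a0, and the initial F register contents are an arbitrary, hypothetical value.

```python
SIGN_BIT = 0x8000_0000  # FNEG.S flips the sign bit of a 32-bit value

def probe(is_zfinx):
    """Model the discovery sequence; return True if a0 ends up non-zero."""
    a0 = 0                # li a0, 0
    fa0 = 0x3F80_0000     # hypothetical F register contents (1.0f)
    if is_zfinx:
        # The encoding operates on the X register file, so a0 changes.
        a0 ^= SIGN_BIT
    else:
        # The same encoding operates on the F register file; a0 is untouched.
        fa0 ^= SIGN_BIT
    return a0 != 0

print(probe(True))   # True
print(probe(False))  # False
```

On an RV64 Zfinx core the upper bits of a0 would also be set by NaN-boxing of the result, which does not change the outcome of the non-zero test.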

mstatus.fs

For Zfinx cores mstatus.fs is hardwired to zero, because all the integer registers already form part of the current context. This gives a performance advantage when saving/restoring contexts. Note however that fcsr still needs to be saved and restored.

Floating point instructions and fcsr accesses do not trap if mstatus.fs=0. This is different to non-Zfinx cores.

Register pair handling for XLEN < FLEN

For RV32D, all D-extension instructions which are implemented with Zfinx will access register pairs:

The specified register must be even; odd registers will cause an illegal instruction exception
Even registers will cause an even/odd pair to be accessed
1. Accessing Xn will cause the {Xn+1, Xn} pair to be accessed. For example if n = 2
  1. X2 is the least significant half (bits [31:0]) for little endian mode
  2. X3 the most significant half (bits [63:32]) for little endian mode
2. For big endian mode the register mapping is reversed, so X2 is the most significant half, and X3 is the least significant half.
X0 has special handling
1. Reading {X1, X0} will read all zeros
2. Writing {X1, X0} will discard the entire result, it will not write to X1

The register pairs are only used by the floating point arithmetic instructions. All integer loads and stores will only access XLEN bits, not FLEN.

Note:

Zp64 from the P-extension specifies consistent register pair handling.
Big endian mode is enabled in M-mode if mstatus.MBE=1, in S-mode if mstatus.SBE=1, or in U-mode if mstatus.UBE=1
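
The pair rules above can be sketched as a small Python model of a 32-entry, 32-bit X register file. The names `read_pair`, `write_pair` and `IllegalInstruction` are illustrative only, and the endianness is passed in as a flag rather than derived from mstatus.

```python
class IllegalInstruction(Exception):
    """Stands in for the illegal instruction exception."""

MASK32 = 0xFFFF_FFFF

def read_pair(xregs, n, big_endian=False):
    """Read the 64-bit value held in the {Xn+1, Xn} pair."""
    if n % 2:
        raise IllegalInstruction(f"odd register x{n}")
    if n == 0:
        return 0                      # {X1, X0} reads as all zeros
    lo, hi = (xregs[n + 1], xregs[n]) if big_endian else (xregs[n], xregs[n + 1])
    return (hi << 32) | lo

def write_pair(xregs, n, value, big_endian=False):
    """Write a 64-bit value to the {Xn+1, Xn} pair."""
    if n % 2:
        raise IllegalInstruction(f"odd register x{n}")
    if n == 0:
        return                        # whole result discarded, X1 untouched
    lo, hi = value & MASK32, (value >> 32) & MASK32
    if big_endian:
        xregs[n], xregs[n + 1] = hi, lo
    else:
        xregs[n], xregs[n + 1] = lo, hi

regs = [0] * 32
write_pair(regs, 2, 0x1122_3344_5566_7788)
print(hex(regs[2]), hex(regs[3]))  # 0x55667788 0x11223344
```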

x0 register target

If a floating point instruction targets x0 then it will still execute, and will set any required flags in fcsr. It will not write to a target register. This matches the non-Zfinx behaviour for

fcvt.w.s x0, f0

If the floating point source is invalid then it will set the fflags.NV bit, regardless of whether Zfinx is implemented. The target register is not written as it is x0.

If fcsr.RM is in an illegal state then floating point instruction behaviour is the same whether the target register is x0 or not, i.e. targeting x0 doesn’t disable any execution side effects.

In the case of RV32D Zfinx, register pairs are used. See above for x0 handling.

NaN-boxing

For Zfinx the NaN-boxing is limited to XLEN bits, not FLEN bits. Therefore a FADD.S executed on an RV64D core will write a 64-bit value (the MSH will be all 1’s). On an RV32D Zfinx core it will write a 32-bit register, i.e. a single X register only. This means there is a semantic difference between these code sequences:

#ifdef __riscv_zfinx
fadd.s x2, x3, x4 # only write x2 (32-bits), x3 is not written
#else
fadd.s f2, f3, f4 # NaN-box the 32-bit result to 64-bits in the f2 register
#endif

NaN-box generation is supported by Zfinx implementations. NaN-box checking is not supported by scalar floating point instructions. For example for RV64F:

#ifdef __riscv_zfinx
lw[u] x1, 0(sp)   # load 32-bits into x1 and sign / zero extend upper 32-bits
fadd.s x1, x1, x1 # use x1 but do not check source is NaN-boxed, NaN-box output
#else
flw    f1, 0(sp)  # load 32-bits into f1 and NaN-box to 64-bits (set upper 32-bits to 0xFFFFFFFF)
fadd.s f2, f1, f1 # check f1 is NaN-boxed, NaN-box output
#endif

Floating point loads are not supported on Zfinx cores so x1 is not NaN-boxed in the example above, therefore the FADD.S instruction does not check the input for NaN-boxing. The result of FADD.S is NaN-boxed, which means setting the upper half of the output register to all 1’s.

The table shows the effect of writing each possible width of value to the register file for all supported combinations. Note that Verilog syntax is used in the final column.

Table 2. NaN-boxing for supported configurations

XLEN	Width of write to Xreg from FP instruction	Value written to Xreg
64	16	{48{1’b1}, result[15:0]}
32	16	{16{1’b1}, result[15:0]}
64	32	{32{1’b1}, result[31:0]}
32	32	result[31:0]
64	64	result[63:0]
Little endian
32	64	EvenXreg: result[31:0] Odd Xreg: result[63:32] special handling Xreg={0, 1}
Big endian
32	64	Odd Xreg: result[31:0] EvenXreg: result[63:32] special handling Xreg={0, 1}

Therefore, for example, if an FADD.S instruction is issued on an RV64F core then the upper 32 bits of the target integer register will be set to all ones, and an FADD.H (floating point add half-word) instruction will set the upper 48 bits to all ones.
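
The single-register rows of Table 2 can be expressed as one formula: the result occupies the low bits and every bit above it, up to XLEN, is set to one. A minimal Python sketch follows; `nan_boxed_write` is an illustrative name, and the register pair rows (64-bit writes on XLEN=32) are deliberately left out.

```python
def nan_boxed_write(xlen, width, result):
    """Value written to an X register by an FP instruction (Table 2)."""
    if width > xlen:
        raise ValueError("width > XLEN needs a register pair, not modelled here")
    low_mask = (1 << width) - 1
    ones_above = ((1 << xlen) - 1) ^ low_mask   # all ones in bits [xlen-1:width]
    return ones_above | (result & low_mask)

# FADD.S producing 1.0f (0x3F800000) on a 64-bit core
print(hex(nan_boxed_write(64, 32, 0x3F80_0000)))  # 0xffffffff3f800000
```

When the width equals XLEN, `ones_above` is zero and the result is written unchanged, matching the `result[31:0]` and `result[63:0]` rows.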

Assembly Syntax and Code Porting

Any reference to F registers or to removed instructions will cause an assembler error.

For example, the encoding for

FMADD.S <1>, <2>, <3>, <4>

will disassemble and execute as

FMADD.S f1, f2, f3, f4

on a non-Zfinx core, or

FMADD.S x1, x2, x3, x4

on a Zfinx core.

We considered allowing pseudo-instructions for the deleted instructions for easier code porting, for example allowing FLW to be a pseudo-instruction for LW, but decided not to. Because the register specifiers must change to integer registers, it makes sense to also remove the use of FLW etc. In this way the user is forced to rewrite their code for a Zfinx core, reducing the chance of undiscovered porting bugs. This only affects assembly code; high level language code is unaffected as the compiler will target the correct architecture.

Replaced Instructions

All floating point loads, stores and floating point to integer moves are removed on a Zfinx core. The following three tables give suggested replacements.

Table 3. replacements for floating point load instructions

Instruction	RV32F Zfh Zfinx	RV32D Zfh Zfinx	RV64F Zfh Zfinx	RV32F Zfinx	RV32D Zfinx	RV64F Zfinx
loads	suggested replacement instructions
FLD frd, offset(xrs1)	reserved	LW,LW	LD	reserved	LW, LW	LD
FLW frd, offset(xrs1)	LW		LW[U] and NaN-box in software	LW		LW[U] and NaN-box in software
FLH frd, offset(xrs1)	LH[U] and NaN-box in software			reserved
C.FLD frd’, offset(xrs1’)	reserved	[C.]LW,[C.]LW	[C.]LD	reserved	[C.]LW,[C.]LW	[C.]LD
C.FLDSP frd, uimm(x2)	reserved	C.LWSP,C.LWSP	C.LDSP	reserved	C.LWSP,C.LWSP	C.LDSP
C.FLW frd, offset(xrs1)	C.LW		C.LW and NaN-box in software	C.LW		C.LW and NaN-box in software
C.FLWSP frd, uimm(x2)	C.LWSP		C.LWSP and NaN-box in software	C.LWSP		C.LWSP and NaN-box in software

Table 4. replacements for floating point store instructions

Instruction	RV32F Zfh Zfinx	RV32D Zfh Zfinx	RV64F Zfh Zfinx	RV32F Zfinx	RV32D Zfinx	RV64F Zfinx
stores	suggested replacement instructions
FSD frd, offset(xrs1)	reserved	SW,SW	SD	reserved	SW, SW	SD
FSW frd, offset(xrs1)	SW
FSH frd, offset(xrs1)	SH			reserved
C.FSD frd’, offset(xrs1’)	reserved	[C.]SW,[C.]SW	[C.]SD	reserved	[C.]SW,[C.]SW	[C.]SD
C.FSDSP frd, uimm(x2)	reserved	C.SWSP,C.SWSP	C.SDSP	reserved	C.SWSP,C.SWSP	C.SDSP
C.FSW frd, offset(xrs1)	C.SW
C.FSWSP frd, uimm(x2)	C.SWSP

Table 5. replacements for floating point move instructions

Instruction	RV32F Zfh Zfinx	RV32D Zfh Zfinx	RV64F Zfh Zfinx	RV64D Zfh Zfinx	RV32F Zfinx	RV32D Zfinx	RV64F Zfinx	RV64D Zfinx
moves	suggested replacement instructions
FMV.X.D xrd, frs1	reserved	MV,MV	reserved	MV	reserved	MV,MV	reserved	MV
FMV.D.X frd, xrs1	reserved	MV,MV	reserved	MV	reserved	MV,MV	reserved	MV
FMV.X.W xrd, frs1	MV		MV and sign extend in software		MV		MV and sign extend in software
FMV.W.X frd, xrs1	MV		MV and NaN-box in software		MV		MV and NaN-box in software
FMV.X.H xrd, frs1	MV and sign extend in software				reserved
FMV.H.X frd, xrs1	MV and NaN-box in software				reserved

Notes:

Where a floating point load loads fewer than XLEN bits then NaN-boxing in software is required to get the same semantics as a non-Zfinx core
Where a floating point move moves fewer than XLEN bits then either sign extension (if the target is an X register) or NaN-boxing (if the target is an F register) is required in software to get the same semantics

The B-extension is useful for sign extending and NaN-boxing.

To sign-extend using the B-extension:

FMV.X.H rd, rs1

is replaced by

SEXT.H rd, rs1

Without the B-extension two instructions are required: shift left 16 places, then arithmetic shift right 16 places.

NaN boxing in software is more involved, as the upper part of the register must be set to 1. The B-extension is also helpful in this case.

FMV.H.X a0, a1

is replaced by

C.LI a2, -1
PACK a0, a1, a2
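
The two software sequences can be checked with a short Python model. It is a sketch under stated assumptions: XLEN is 32 in the usage lines, `sext` models the shift-left / arithmetic-shift-right pair, and `pack` models the PACK instruction as placing the lower half of rs1 in the lower half of rd and the lower half of rs2 in the upper half. The function names are illustrative.

```python
def sext(value, width, xlen):
    """Sign-extend the low `width` bits using a shift pair (no B-extension)."""
    shift = xlen - width
    mask = (1 << xlen) - 1
    shifted = (value << shift) & mask   # SLLI rd, rs1, shift
    if shifted >> (xlen - 1):           # top bit set: treat as a negative number
        shifted -= 1 << xlen
    return (shifted >> shift) & mask    # SRAI rd, rd, shift

def pack(rs1, rs2, xlen):
    """Model PACK: rd = {rs2[xlen/2-1:0], rs1[xlen/2-1:0]}."""
    half = xlen // 2
    low_mask = (1 << half) - 1
    return ((rs2 & low_mask) << half) | (rs1 & low_mask)

a1 = 0x1234_BC00                        # low half holds a negative half-precision value
print(hex(sext(a1, 16, 32)))            # 0xffffbc00
print(hex(pack(a1, 0xFFFF_FFFF, 32)))   # 0xffffbc00
```

Because PACK works on XLEN/2-bit halves, on XLEN=64 the same sequence boxes a 32-bit value rather than a 16-bit one.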

Emulation

A non-Zfinx core can run a Zfinx binary. M-mode software can do this:

Set mstatus.fs=0 to cause every floating point instruction to trap
When a floating point instruction traps, move the source operands from the X registers to the equivalent F registers (i.e. the same register numbers)
Set mstatus.fs to be non-zero
Execute the original instruction which caused the trap
Move the result from the destination F register to the X register / X register pair (For RV32D)
Set mstatus.fs=0
MRET

There are corner cases around the use of x0 and register pairs for RV32D:

Two 32-bit X registers must be transferred to a single 64-bit F register to set up the source operands. This must be done by saving each X register to consecutive memory locations, and using a 64-bit floating point load (FLD or C.FLD) to load the data.
One 64-bit F register must be transferred to two 32-bit X registers to receive the result. This must be done with a 64-bit floating point store (FSD or C.FSD) and then two 32-bit loads (such as LW or C.LW).
If the source register pair is {x1,x0}, the source data will read as all zeroes. Therefore f0 must be loaded with a 64-bit zero constant from memory.
If the destination register pair is {x1,x0} then the full output is discarded. Do not transfer the resulting data to the {x1,x0} register pair, as this would result in the upper half being written to x1.
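
The RV32D transfers through memory can be sketched in Python using `struct` to stand in for the stores and loads. This assumes little endian mode, and the helper names are hypothetical; it models only the data movement, not the trap handling or the mstatus.fs updates.

```python
import struct

def load_source(xregs, n):
    """Build the 64-bit F register value for source pair {x(n+1), xn}."""
    if n == 0:
        return 0.0                                   # {x1,x0} reads as zero
    mem = struct.pack("<II", xregs[n], xregs[n + 1])  # SW xn, 0(sp); SW xn+1, 4(sp)
    return struct.unpack("<d", mem)[0]               # FLD fn, 0(sp)

def store_result(xregs, n, value):
    """Move a 64-bit F register result back to pair {x(n+1), xn}."""
    if n == 0:
        return                                       # discard, x1 must not change
    mem = struct.pack("<d", value)                   # FSD fn, 0(sp)
    xregs[n], xregs[n + 1] = struct.unpack("<II", mem)  # LW xn, 0(sp); LW xn+1, 4(sp)

regs = [0] * 32
regs[2], regs[3] = 0x0000_0000, 0x3FF0_0000          # 1.0 as a double
print(load_source(regs, 2))                          # 1.0
```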

A Zfinx core cannot trap on floating point instructions by setting mstatus.fs=0, so the reverse emulation isn’t possible. The code must be recompiled (or ported for assembler).

ABI

For details of the current calling conventions see:

https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md

The ABI when using Zfinx is the standard integer calling convention as listed in the table below.

The Zfinx ABI can be thought of as being similar to using the softfloat routines to execute floating point functionality, but replacing the call to the softfloat function with the actual floating point ISA instruction.

Note that RV32D Zfinx requires register pair handling. This does not require an ABI change as long types are already supported using register pairs. It is likely to require some work in the compiler (according to Jim Wilson).

Floating Point Configurations To Reduce Area

To reduce the area overhead of FPU hardware, new configurations will make the F[N]MADD.*, F[N]MSUB.*, FDIV.* and FSQRT.* instructions optional in hardware. This then gives the choice of implementing them in software instead by:

Taking an illegal instruction trap, and calling the required software routine in the trap handler. This requires that the opcodes are not reallocated and gives binary compatibility between cores with/without hardware support for F[N]MADD.*, F[N]MSUB.*, FDIV.* and FSQRT.*, but is lower performance than option 2
Using the GCC options below so that a software library is used to execute them

This argument already exists for RISC-V

gcc -mno-fdiv

This argument exists for other architectures (e.g. MIPS) but not for RISC-V, so it needs to be added

gcc -mno-fused-madd

To achieve this we break all current and future floating point extensions into three parts: Zf*base, Zfma and Zfdiv. Zfinx is orthogonal, and so is an additional modifier to these as described below.

Options, all start with Zf	Meaning
Zfhbase	Support half precision base instructions
Zffbase	Support single precision base instructions
Zfdbase	Support double precision base instructions
Zfqbase	Support quad precision base instructions
Zfldstmv	Support load,store and integer to/from FP move for all FP extensions
Zfma	Support multiply-add for all FP extensions
Zfdiv	Support div/sqrt for all FP extensions
Zfinx	Share the integer register file for all FP extensions

So the Zfldstmv, Zfma, Zfdiv, Zfinx options apply to all floating point extensions, including future ones. This keeps the support regular across the different options.

Therefore RV32FD Zfh Zfinx can also be expressed as:

rv32_Zfhbase_Zffbase_Zfdbase_Zfma_Zfdiv_Zfinx

Also RV32FD Zfh can be expressed as:

rv32_Zfhbase_Zffbase_Zfdbase_Zfldstmv_Zfma_Zfdiv

The options are designed to be additive; none of them remove instructions.
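
The two expansions above follow a regular pattern, which can be sketched in Python. This is only an illustration of the naming scheme: `expand` is a hypothetical helper, and it assumes a full configuration in which Zfma and Zfdiv are both present and Zfldstmv is included exactly when Zfinx is not.

```python
# Base option for each floating point format, in the order used above.
BASE_OPTION = [("Zfh", "Zfhbase"), ("F", "Zffbase"), ("D", "Zfdbase"), ("Q", "Zfqbase")]

def expand(xlen, fp_extensions, zfinx):
    """Expand e.g. RV32FD Zfh [Zfinx] into the split option string."""
    parts = [opt for ext, opt in BASE_OPTION if ext in fp_extensions]
    if not zfinx:
        parts.append("Zfldstmv")   # loads, stores and moves exist only without Zfinx
    parts += ["Zfma", "Zfdiv"]
    if zfinx:
        parts.append("Zfinx")
    return f"rv{xlen}_" + "_".join(parts)

print(expand(32, {"F", "D", "Zfh"}, zfinx=True))
print(expand(32, {"F", "D", "Zfh"}, zfinx=False))
```

An area-reduced configuration would simply omit Zfma or Zfdiv from the list, which this sketch does not model.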

Rationale, why implement Zfinx?

Small embedded cores which need to implement floating point extensions have some options:

Use software emulation of floating point instructions, so don’t implement a hardware FPU which gives minimum core area
1. The floating point library can be large, and expensive in terms of ROM or flash storage, costing power and energy consumption
2. The performance of this solution is very low
Low core area floating point implementations
1. Share the integer registers for floating point instructions (Zfinx)
  1. Will cause more register spills/fills than having a separate register file, but the effect of this is application dependent
  2. No need for special instructions such as load and stores to access floating point registers, and moves between integer and floating point registers
2. There are still performance/area tradeoffs to make for the FPU design itself
  1. e.g. pipelined versus iterative
3. Optionally remove multiply-add instructions to save area in the FPU and a register file read port
4. Optionally remove divide/square root instructions to save area in the FPU
Dedicated FPU registers, and higher performance FPU implementations use the most area
1. Separate floating point registers allow fewer register spills/fills, and can also be used for integer code to prevent spilling to memory
2. There are the same performance/area tradeoffs for the FPU design

Zfinx is implemented to allow core area reduction as the area of the F register file is significant, for example:

RV32IF Zfinx saves 1/2 the register file state compared to RV32IF
RV32EF Zfinx saves 2/3 the register file state compared to RV32EF
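
These fractions can be reproduced with a short calculation. The Python sketch below assumes 32 F registers in both cases, 32 X registers for the I base and 16 for the E base, all 32 bits wide, and counts x0 as a register; `state_saved` is an illustrative name.

```python
from fractions import Fraction

def state_saved(num_xregs, num_fregs=32, xlen=32, flen=32):
    """Fraction of register file bits removed by dropping the F registers."""
    x_bits = num_xregs * xlen
    f_bits = num_fregs * flen
    return Fraction(f_bits, x_bits + f_bits)

print(state_saved(32))  # RV32IF -> 1/2
print(state_saved(16))  # RV32EF -> 2/3
```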

Therefore Zfinx should allow for small embedded cores to support floating point with

Minimal area increase
Similar context switch time as an integer only core
1. there are no F registers to save/restore
Reduced code size by removing the floating point library
