IDIV_C and ISDIV_C instructions #26

SChernykh · 2019-02-21T09:53:32Z

What's the point in having them? ISDIV_C compiles to a lot of code:

	; ISDIV_C r4, -692911499
	mov rax, -893288710803585809
	imul r12
	xor eax, eax
	sar rdx, 25
	sets al
	add rdx, rax
	add r12, rdx

And can be clearly made faster on ASIC. And since it's essentially multiplication by constant, maybe replace these two instructions with explicit multiplication by 64-bit constant? Then we won't need to handle all edge cases and the code would become something like this:

	; IMUL_C r4, -893288710803585809
	mov rax, -893288710803585809
	imul r12
	add r12, rdx

P.S. Since constants are only 32-bit, we can define this instruction as multiplication by reciprocal without further edge case handling:

a *= (2^64-1)/C if C != 0
a *= some fixed 64-bit value if C = 0

The text was updated successfully, but these errors were encountered:

tevador · 2019-02-21T10:15:18Z

And can be clearly made faster on ASIC

It's not so clear to me. The 64-bit constant needs to be calculated using division and modulo operations. CPUs have to compile to machine code, so this calculation is done as part of the compilation step.

Having those instructions forces the ASIC to have a hardware divider while avoiding the high cost of division for CPUs in the main loop.

The ISDIV_C (signed) instruction could theoretically be removed, because it compiles to more instructions on average. The unsigned variant is much shorter, so I'd prefer to keep it.

; IDIV_C r5, 624165039
mov rax, 15866829597104432181
mul r13
shr rdx, 29
add r13, rdx

SChernykh · 2019-02-21T10:18:04Z

@tevador I'm talking about all these excess instructions that come after multiplication. We can still leave requirement to calculate 64-bit reciprocal, but leave only multiplication (see my updated comment above).

P.S. And we'll be able to use imul instruction to do this which is more flexible in general and faster than mul on some CPUs.

tevador · 2019-02-21T10:37:07Z

a *= (2^64-1)/C if C != 0
a *= some fixed 64-bit value if C = 0

That's not a bad idea. So it would become something like:

mov rax, 29554273182 ; = 0xffffffffffffffff / 624165039
imul r13, rax

The only potential issue here is that the constant would be 33-34 bits in most cases, which could be optimized by an ASIC. CPUs always do 64x64 multiplication.

SChernykh · 2019-02-21T10:49:00Z

The only potential issue here is that the constant would be 33-34 bits in most cases, which could be optimized by an ASIC. CPUs always do 64x64 multiplication.

It's easy to fix:

a *= (2^(64+N)-1)/C if C != 0

Where you select N such that the reciprocal is full 64-bit. So basically the same reciprocal calculation algorithm you used before.

tevador · 2019-02-21T10:53:15Z

@SChernykh Agreed.

So this

; IDIV_C r5, 624165039
mov rax, 15866829597104432181
mul r13
shr rdx, 29
add r13, rdx

will become

; IDIV_C r5, 624165039
mov rax, 15866829597104432181
imul r13, rax

and the signed variant can be removed.

SChernykh · 2019-02-21T11:17:25Z

Perfect! But the instruction should renamed to RCP_MUL_C or something like this.

SChernykh closed this as completed Feb 21, 2019

tevador mentioned this issue Feb 22, 2019

Performance and portability testing #25

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IDIV_C and ISDIV_C instructions #26

IDIV_C and ISDIV_C instructions #26

SChernykh commented Feb 21, 2019 •

edited

Loading

tevador commented Feb 21, 2019

SChernykh commented Feb 21, 2019 •

edited

Loading

tevador commented Feb 21, 2019

SChernykh commented Feb 21, 2019

tevador commented Feb 21, 2019

SChernykh commented Feb 21, 2019

IDIV_C and ISDIV_C instructions #26

IDIV_C and ISDIV_C instructions #26

Comments

SChernykh commented Feb 21, 2019 • edited Loading

tevador commented Feb 21, 2019

SChernykh commented Feb 21, 2019 • edited Loading

tevador commented Feb 21, 2019

SChernykh commented Feb 21, 2019

tevador commented Feb 21, 2019

SChernykh commented Feb 21, 2019

SChernykh commented Feb 21, 2019 •

edited

Loading

SChernykh commented Feb 21, 2019 •

edited

Loading