Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IDIV_C and ISDIV_C instructions #26

Closed
SChernykh opened this issue Feb 21, 2019 · 6 comments
Closed

IDIV_C and ISDIV_C instructions #26

SChernykh opened this issue Feb 21, 2019 · 6 comments

Comments

@SChernykh
Copy link
Collaborator

SChernykh commented Feb 21, 2019

What's the point in having them? ISDIV_C compiles to a lot of code:

	; ISDIV_C r4, -692911499
	mov rax, -893288710803585809
	imul r12
	xor eax, eax
	sar rdx, 25
	sets al
	add rdx, rax
	add r12, rdx

And can be clearly made faster on ASIC. And since it's essentially multiplication by constant, maybe replace these two instructions with explicit multiplication by 64-bit constant? Then we won't need to handle all edge cases and the code would become something like this:

	; IMUL_C r4, -893288710803585809
	mov rax, -893288710803585809
	imul r12
	add r12, rdx

P.S. Since constants are only 32-bit, we can define this instruction as multiplication by reciprocal without further edge case handling:

a *= (2^64-1)/C if C != 0
a *= some fixed 64-bit value if C = 0
@tevador
Copy link
Owner

tevador commented Feb 21, 2019

And can be clearly made faster on ASIC

It's not so clear to me. The 64-bit constant needs to be calculated using division and modulo operations. CPUs have to compile to machine code, so this calculation is done as part of the compilation step.

Having those instructions forces the ASIC to have a hardware divider while avoiding the high cost of division for CPUs in the main loop.

The ISDIV_C (signed) instruction could theoretically be removed, because it compiles to more instructions on average. The unsigned variant is much shorter, so I'd prefer to keep it.

; IDIV_C r5, 624165039
mov rax, 15866829597104432181
mul r13
shr rdx, 29
add r13, rdx

@SChernykh
Copy link
Collaborator Author

SChernykh commented Feb 21, 2019

@tevador I'm talking about all these excess instructions that come after multiplication. We can still leave requirement to calculate 64-bit reciprocal, but leave only multiplication (see my updated comment above).

P.S. And we'll be able to use imul instruction to do this which is more flexible in general and faster than mul on some CPUs.

@tevador
Copy link
Owner

tevador commented Feb 21, 2019

a *= (2^64-1)/C if C != 0
a *= some fixed 64-bit value if C = 0

That's not a bad idea. So it would become something like:

mov rax, 29554273182 ; = 0xffffffffffffffff / 624165039
imul r13, rax

The only potential issue here is that the constant would be 33-34 bits in most cases, which could be optimized by an ASIC. CPUs always do 64x64 multiplication.

@SChernykh
Copy link
Collaborator Author

The only potential issue here is that the constant would be 33-34 bits in most cases, which could be optimized by an ASIC. CPUs always do 64x64 multiplication.

It's easy to fix:

a *= (2^(64+N)-1)/C if C != 0

Where you select N such that the reciprocal is full 64-bit. So basically the same reciprocal calculation algorithm you used before.

@tevador
Copy link
Owner

tevador commented Feb 21, 2019

@SChernykh Agreed.

So this

; IDIV_C r5, 624165039
mov rax, 15866829597104432181
mul r13
shr rdx, 29
add r13, rdx

will become

; IDIV_C r5, 624165039
mov rax, 15866829597104432181
imul r13, rax

and the signed variant can be removed.

@SChernykh
Copy link
Collaborator Author

Perfect! But the instruction should renamed to RCP_MUL_C or something like this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants