Added Machine-Prime, simplified nearest_prime #84

Merged Nov 29, 2024 (1 commit)

Conversation

JASory
Contributor

JASory commented Nov 29, 2024

Added the low-memory variant of Machine-Prime (which uses a modified BPSW test) as an optional dependency. The Criterion benchmark shows that it is approximately 16x faster than the existing version. Machine-Prime's other variants are faster still, but they require more memory, and this is already a memory-intensive library.

Machine-Prime is also no_std and usable in const functions.

Simplified the next_prime and previous_prime functions.
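
For readers who have not seen the functions, here is a minimal sketch of how next_prime and previous_prime can be written as thin loops over a primality check. It is not the code from this PR: the names, signatures, and `Option` return type are assumptions made for illustration, and `is_prime` is a plain trial-division stand-in rather than Machine-Prime's BPSW-based test. Everything is a `const fn` to show that the same helpers can run at compile time.

```rust
/// Stand-in primality check by trial division (hypothetical; the PR uses
/// Machine-Prime's test here instead).
pub const fn is_prime(n: u64) -> bool {
    if n < 2 {
        return false;
    }
    if n % 2 == 0 {
        return n == 2;
    }
    let mut d = 3;
    // `d <= n / d` is equivalent to `d * d <= n` but cannot overflow.
    while d <= n / d {
        if n % d == 0 {
            return false;
        }
        d += 2;
    }
    true
}

/// Smallest prime strictly greater than `n`, or `None` if none fits in a `u64`.
pub const fn next_prime(mut n: u64) -> Option<u64> {
    while n < u64::MAX {
        n += 1;
        if is_prime(n) {
            return Some(n);
        }
    }
    None
}

/// Largest prime strictly less than `n`, or `None` if there is none.
pub const fn previous_prime(mut n: u64) -> Option<u64> {
    while n > 2 {
        n -= 1;
        if is_prime(n) {
            return Some(n);
        }
    }
    None
}

// Evaluated by the compiler, not at run time.
pub const NEXT_AFTER_100: Option<u64> = next_prime(100);
```

With this shape, swapping in a faster `is_prime` speeds up both search functions without touching their loops.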

Owner

JSorngard commented Nov 29, 2024

Very impressive!

The is_prime function gets a ~440% increase in throughput on my machine when the feature is on:
[image: benchmark results]

The plot of the execution time distributions is quite funny as well:
[image: execution time distributions]
Red is with the feature off and blue is with it on.
Quite a bit faster.

Cargo.toml (review thread resolved)
@JSorngard JSorngard changed the base branch from main to machine_prime_integration November 29, 2024 12:30
@JSorngard JSorngard merged commit 53a50a7 into JSorngard:machine_prime_integration Nov 29, 2024
4 of 6 checks passed
Contributor Author

JASory commented Nov 29, 2024

The difference between our benchmarks is interesting. I consistently get a 16x speed-up, but that seems to be because I wrote and optimised it on lower-end hardware. I wonder if the trial division is somehow getting vectorised on your machine. Either way, the worst case is equivalent to only 2.5 Fermat tests, so it will win pretty much regardless of which compiler optimisations are applied.

On an i5-5300U processor:

Time: 12.021 ms -> 0.666 ms

Throughput: 0.831M/s -> 15M/s
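
To give a sense of the unit of cost mentioned above, here is a minimal sketch of a single Fermat test: one modular exponentiation by square-and-multiply. It is not Machine-Prime's implementation; the function names and the `u128` widening are assumptions chosen for clarity rather than speed.

```rust
/// Computes `base^exp mod m` by square-and-multiply.
/// `m` must be nonzero; intermediates are widened to `u128` so that
/// the products cannot overflow.
pub const fn pow_mod(base: u64, mut exp: u64, m: u64) -> u64 {
    if m == 1 {
        return 0;
    }
    let m = m as u128;
    let mut b = base as u128 % m;
    let mut acc: u128 = 1;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * b % m;
        }
        b = b * b % m;
        exp >>= 1;
    }
    acc as u64
}

/// Base-`a` Fermat test: returns true if `a^(n-1) mod n == 1`.
/// Every prime `n` that does not divide `a` passes, but so do some
/// composites (Fermat pseudoprimes), so a pass is not a proof.
pub const fn fermat_test(n: u64, a: u64) -> bool {
    n > 1 && pow_mod(a, n - 1, n) == 1
}
```

For a 64-bit `n` the loop runs at most 64 times, so a bounded number of such tests gives a predictable worst case. Note that 341 = 11 * 31 passes the base-2 test despite being composite, which is why a bare Fermat test is not sufficient on its own.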

Owner

JSorngard commented Nov 29, 2024

Interesting :o
Reading over the assembly on godbolt.org, I cannot see any use of vector registers. Still, modern processors are such massively complex beasts that there might be a technique I am not aware of that speeds it up a lot on my CPU.

My CPU is a 5800X3D, but I don't think the extra cache should help in this situation, so it must be something else.

Regardless, I appreciate your excellent contribution!

JSorngard added a commit that referenced this pull request Nov 29, 2024
… by @JASory in #84 (#87)

* Undo typed stride as that introduces a runtime branch. Use suggestion by @JASory in #84

* Add debug assert