CHANGELOG.md entry

Summary

Replace the `zerocopy` dependency with `unsafe` code (up from 12 to 17 instances).

Add benchmarks for some SIMD / wide types.

Remove two `#[inline(never)]` attributes which were apparently motivated by benchmark results, but caused more harm than help with the new benches.

Motivation
zerocopy

I'm not a big fan of this, but together with #1575 it removes the dependency on `zerocopy` v0.8, so it is probably an improvement.

Project Safe Transmute

If this project lands safe transmute support in the standard library, we would of course want to use that.
Details
Replacing `zerocopy::transmute!` with `core::mem::transmute` is easy and results in identical code generation (tested with `StdRng` and `SmallRng`); this reverts a change in #1349.

Replacing the `fill` impls is more complex but I believe acceptable; this reverts a change in #1502.

In both cases, this would have resulted in a usage of `unsafe` in a macro where safety depends on a type passed by the macro caller. In the first case I decided to inline the three macro usages, while in the second I prefixed the macro name with `unsafe_`.
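
To illustrate the pattern discussed under Details, here is a minimal sketch of a macro whose expansion contains `unsafe` code and whose soundness depends on the type the caller passes in. The macro name `unsafe_impl_from_bytes` and the generated function names are hypothetical and not taken from this PR; plain arrays stand in for the SIMD types.

```rust
use core::mem::transmute;

// Hypothetical macro (not from the PR). The `unsafe_` prefix signals that the
// macro caller must uphold a safety contract: `$t` must be exactly `$n` bytes
// (the compiler checks this for `transmute`) and, more importantly, every bit
// pattern must be a valid value of `$t` (the compiler does NOT check this).
macro_rules! unsafe_impl_from_bytes {
    ($name:ident, $t:ty, $n:expr) => {
        fn $name(bytes: [u8; $n]) -> $t {
            // SAFETY: guaranteed by the macro caller, see the contract above.
            unsafe { transmute::<[u8; $n], $t>(bytes) }
        }
    };
}

// Both target types are plain integer data, so any 16 bytes are valid.
unsafe_impl_from_bytes!(u32x4_from_bytes, [u32; 4], 16);
unsafe_impl_from_bytes!(u128_from_bytes, u128, 16);
```

Calling `u128_from_bytes([0xff; 16])` yields `u128::MAX`. Invoking the macro with a type such as `[bool; 16]` would still compile but would be unsound, which is the hazard the `unsafe_` prefix is meant to flag.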
Benchmark results
Unfinished business?
The `Simd` and `m128i` etc. type generation should be equivalent, but in terms of code they are not; the `Simd` impls currently use `fill` to avoid more `unsafe` code here.

Notice from the above that `u32x4`, `u16x8` and `u8x16` are the same size as `u128` and `m128i` but cost about twice as much to generate here. This indicates the `fill` code may be sub-optimal.

Additionally, the `m128i` impl performed even worse when transmuting a `u128` value (~4.3ns or +130%) which, as far as I can tell, is purely because the `u128` value is returned via `rax, rdx` while the `__m128i` value is returned via `rdx, r10` (with `rax` equal to the struct address). I don't understand this.