Before optimization
After optimization
Speedup
matrix_randomize: 6.18x
matrix_transpose: 4.21x
matrix_multiply: 62.52x
matrix_RtAR: 58.85x
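Optimization methods
How did you optimize each of the functions below? What was the idea, and which topics from the lectures did you apply?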
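matrix_randomize: _mm_stream_si32 writes bypass the cache. _mm_stream_ps also bypasses the cache, but it requires computing random four times first, which may make the loop CPU-bound. Alternatively, one could design a random that outputs one 128-bit vector per call.
matrix_transpose: the Morton-order traversal built into simple_partitioner, plus _mm_stream_si32 writes that bypass the cache.
matrix_multiply: in out(x, y) += lhs(x, t) * rhs(t, y); keep t and y fixed in the innermost loop so that only x varies; otherwise the accesses jump around in memory and are inefficient.
matrix_RtAR: the static keyword gives a simple form of pooling that avoids repeated allocation and destruction.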
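
The loop-order idea for matrix_multiply can be illustrated with a minimal sketch. This is not the code from this pull request: the Matrix struct and the name matrix_multiply_sketch are hypothetical, and the sketch assumes x is the contiguous (fastest-varying) index. With y and t fixed in the inner loop, rhs(t, y) is a constant and both out(x, y) and lhs(x, t) are read sequentially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal matrix type for illustration only.
// Assumption: x is the contiguous (fastest-varying) index.
struct Matrix {
    std::size_t nx, ny;
    std::vector<float> data;
    Matrix(std::size_t nx_, std::size_t ny_)
        : nx(nx_), ny(ny_), data(nx_ * ny_, 0.0f) {}
    float &operator()(std::size_t x, std::size_t y) { return data[y * nx + x]; }
    float operator()(std::size_t x, std::size_t y) const { return data[y * nx + x]; }
};

// Computes out(x, y) = sum over t of lhs(x, t) * rhs(t, y).
// Loop order is y, t, x: the innermost loop only moves x, so every
// access in it walks memory sequentially instead of striding.
void matrix_multiply_sketch(Matrix &out, const Matrix &lhs, const Matrix &rhs) {
    const std::size_t nx = lhs.nx, nt = lhs.ny, ny = rhs.ny;
    assert(rhs.nx == nt);
    assert(out.nx == nx && out.ny == ny);
    for (std::size_t y = 0; y < ny; y++) {
        // Clear the output column before accumulating into it.
        for (std::size_t x = 0; x < nx; x++)
            out(x, y) = 0.0f;
        for (std::size_t t = 0; t < nt; t++) {
            const float r = rhs(t, y);  // fixed for the whole inner loop
            for (std::size_t x = 0; x < nx; x++)
                out(x, y) += lhs(x, t) * r;
        }
    }
}
```

If the storage were instead contiguous in y, the same reasoning would put y in the innermost loop; the point is only that the innermost index should be the contiguous one.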