04Sample
Convolution execution example
output[h, w, oc] += input[h + fw, w + fh, ic] * kernel[fw, fh, c, oc]
Should this be changed to
output[h, w, oc] += input[h + fw, w + fh, ic] * kernel[fw, fh, ic, oc] ?
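
To show why the kernel index has to be `ic`, here is a minimal sketch of the loop nest around the proposed line. It is not the book's code: the function name `conv2d_direct`, the nested-list layout, and the stride-1, no-padding setup are assumptions made for illustration. The index names and their order follow the quoted line, so `fw` offsets the `h` axis and `fh` offsets the `w` axis.

```python
def conv2d_direct(inp, kernel):
    """Naive direct convolution (stride 1, no padding; both are assumptions).

    inp:    nested list indexed as inp[h][w][ic]
    kernel: nested list indexed as kernel[fw][fh][ic][oc]
    """
    H, W, IC = len(inp), len(inp[0]), len(inp[0][0])
    FW, FH = len(kernel), len(kernel[0])
    # The kernel's third axis must match the input channel count,
    # which is why it is indexed by `ic` rather than an undefined `c`.
    assert len(kernel[0][0]) == IC
    OC = len(kernel[0][0][0])

    OH, OW = H - FW + 1, W - FH + 1
    out = [[[0.0] * OC for _ in range(OW)] for _ in range(OH)]

    for h in range(OH):
        for w in range(OW):
            for oc in range(OC):
                for fw in range(FW):
                    for fh in range(FH):
                        for ic in range(IC):
                            out[h][w][oc] += (
                                inp[h + fw][w + fh][ic] * kernel[fw][fh][ic][oc]
                            )
    return out
```

With `c` in place of `ic`, the innermost loop has no variable that selects the kernel's input-channel slice, so the sum over input channels cannot be expressed.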
03MobileParallel
Design for reducing the number of convolution kernels
Change Ci to C1
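02CPUISA
Instruction example walkthrough
The immediate in the figure converts to 350 in decimal; this instruction adds the value in register Addr2 to the immediate 355 and stores the result in register Addr1.
Change to
The immediate in the figure converts to 350 in decimal; this instruction adds the value in register Addr2 to the immediate 350 and stores the result in register Addr1.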
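01Works
Taking the Intel Exon 8280 chip as an example
11659 bits (byte) of data, $AX+Y$ will transfer 16 bits (in C/C++ the double data type occupies 8 bytes of memory) of data within 89 ns
Are byte and bit being confused here? One double is 8 bytes, so 2 doubles are 16 bytes; one byte is 8 bits, so shouldn't this be 128 bits?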
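
The arithmetic behind this question can be checked with a short sketch. The helper name `transfer_size` is hypothetical, and the count of two doubles is taken from the comment above rather than from the book.

```python
import struct

BITS_PER_BYTE = 8
# Standard-size IEEE 754 double: 8 bytes ("<d" uses the standard size).
DOUBLE_BYTES = struct.calcsize("<d")


def transfer_size(num_doubles):
    """Return (bytes, bits) occupied by `num_doubles` double values."""
    nbytes = num_doubles * DOUBLE_BYTES
    return nbytes, nbytes * BITS_PER_BYTE


# Two doubles: 16 bytes, i.e. 128 bits.
size_bytes, size_bits = transfer_size(2)
print(size_bytes, size_bits)
```

So the quoted figure of 16 is consistent with bytes, while the same quantity expressed in bits would be 128.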
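04History
"H100 has a total of 8 GPCs, 66 TPCs, and 132 SMs, for a total of 16896 CUDA cores, 528 Tensor cores, and 50 MB of L2 cache. The memory is next-generation HBM3, with 80 GB capacity, a 5120-bit bus width, and bandwidth of up to 3 TB/s."
Is the figure below this sentence a diagram of the GH100 rather than the H100?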