amend statements #283

Merged
merged 1 commit into from
Oct 9, 2024

Conversation

BoneInscri
Contributor

04Sample
Convolution execution example
output[h, w, oc] += input[h + fw, w + fh, ic] * kernel[fw, fh, c, oc]
should be changed to
output[h, w, oc] += input[h + fw, w + fh, ic] * kernel[fw, fh, ic, oc] ?
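To make the index fix concrete, here is a minimal sketch of the corrected accumulation as plain nested loops. It assumes stride 1, no padding, an input laid out as [H][W][IC], and a kernel laid out as [FW][FH][IC][OC]; the function name `conv2d_naive` and these layouts are illustrative assumptions, not taken from the repository.

```python
def conv2d_naive(inp, kernel):
    """Direct convolution following the corrected statement.

    inp:    nested lists shaped [H][W][IC]
    kernel: nested lists shaped [FW][FH][IC][OC]
    Assumes stride 1 and no padding (illustrative choices).
    """
    H, W, IC = len(inp), len(inp[0]), len(inp[0][0])
    FW, FH, OC = len(kernel), len(kernel[0]), len(kernel[0][0][0])
    # The input-channel axis must match between input and kernel,
    # which is why the kernel index has to be ic rather than c.
    assert len(kernel[0][0]) == IC

    OH, OW = H - FW + 1, W - FH + 1
    out = [[[0.0] * OC for _ in range(OW)] for _ in range(OH)]
    for h in range(OH):
        for w in range(OW):
            for oc in range(OC):
                for fw in range(FW):
                    for fh in range(FH):
                        for ic in range(IC):
                            out[h][w][oc] += (
                                inp[h + fw][w + fh][ic] * kernel[fw][fh][ic][oc]
                            )
    return out


# 3x3 input with 2 channels, 2x2 kernel with 2 input channels and 1 output channel.
inp = [[[1.0, 2.0] for _ in range(3)] for _ in range(3)]
kernel = [[[[1.0], [1.0]] for _ in range(2)] for _ in range(2)]
result = conv2d_naive(inp, kernel)
```

The same loop variable `ic` indexes both the input and the kernel, so the reduction runs over the shared input-channel axis; with a separate, undefined `c` the statement would not describe a valid reduction.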

03MobileParallel
Design for reducing the number of convolution kernels
Change Ci to C1

02CPUISA
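Instruction example analysis
The immediate value in the figure converts to 350 in decimal. This instruction adds the immediate 355 to the value in register Addr2 and stores the result in register Addr1.
should be changed to
The immediate value in the figure converts to 350 in decimal. This instruction adds the immediate 350 to the value in register Addr2 and stores the result in register Addr1.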
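The consistency check behind this fix can be sketched in a few lines. The figure's actual bit layout is not reproduced in this thread, so the bit string below is simply 350 written in binary with no field padding, and the helper `add_immediate` and the starting register values are hypothetical.

```python
def add_immediate(regs, dst, src, imm):
    """Return a copy of regs in which regs[dst] = regs[src] + imm."""
    updated = dict(regs)
    updated[dst] = regs[src] + imm
    return updated


# Hypothetical immediate field: 350 in binary (256 + 64 + 16 + 8 + 4 + 2).
IMM_BITS = "101011110"
imm = int(IMM_BITS, 2)

# Register names follow the quoted text; the initial values are made up.
regs = add_immediate({"Addr1": 0, "Addr2": 10}, "Addr1", "Addr2", imm)
print(imm, regs)
```

Whatever the exact encoding in the figure, the decimal value stated for the immediate and the value used in the addition have to be the same number, which is what the corrected sentence restores.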

01Works
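Taking the Intel Exon 8280 chip as an example
11659 bits (byte) of data; $AX+Y$ will transfer 16 bits of data within 89 ns (in C/C++ a double occupies 8 bytes of memory)
Are byte and bit being mixed up here? One double is 8 bytes, so two doubles are 16 bytes; with 8 bits per byte, shouldn't that be 128 bits?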
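The unit arithmetic in this comment can be checked directly. This is a small sketch that assumes, as the comment does, that two doubles are transferred; the variable names are illustrative.

```python
import struct

BITS_PER_BYTE = 8
# Standard size of an IEEE 754 double in the struct module: 8 bytes.
DOUBLE_BYTES = struct.calcsize("<d")

num_doubles = 2  # assumption taken from the review comment
total_bytes = num_doubles * DOUBLE_BYTES
total_bits = total_bytes * BITS_PER_BYTE

print(f"{num_doubles} doubles = {total_bytes} bytes = {total_bits} bits")
```

So a figure of 16 only makes sense in bytes; expressed in bits the same transfer is 128.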

04History
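"The H100 has 8 GPCs, 66 TPCs, and 132 SMs, for a total of 16896 CUDA cores, 528 Tensor cores, and 50 MB of L2 cache. Its memory is the new-generation HBM3, with a capacity of 80 GB, a 5120-bit bus width, and bandwidth of up to 3 TB/s."
Is the figure below this sentence a diagram of the GH100 rather than the H100?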

chenzomi12 merged commit 3952d94 into chenzomi12:main on Oct 9, 2024
@chenzomi12
Owner

Thanks for the contribution!

2 participants