benchmark results

i used two different seeds i refer to as the "special seed" and the "benchmark seed" to evaluate the performance of our implementation.

i opted for logarithmic scaling on all y-axes in our graphs to enhance the visibility of function shapes/gradients over the marginal differences.

the findings highlight the importance of considering load size and parallel processing efficiency in computational analysis. variation in seed choice has minimal impact compared to load size on results. larger datasets show sharper decline in runtime with increasing processes, while smaller ones exhibit diminishing returns and eventual increase in runtime due to overhead. parallel efficiency decreases with more processes, with the b-case starting at a loir efficiency than the s-case, indicating higher compute-intensity in the latter.

i also found out that the best patch size is 29, which leads to an average runtime of 0.702 seconds.

metrics

speedup

what difference does parallelization make?
$S_a(n,p) = \frac{T_{\text{seq}}(n)}{T_{\text{par}}(n,p)}$ = absolute speedup
$S_r(n,p) = \frac{T_{\text{par}}(n, 1)}{T_{\text{par}}(n,p)}$ = relative speedup
where:
- $n$ = input size
- $p$ = number of processors
- $T_{\text{par}}(n,p)$ = parallel runtime
- $T_{\text{seq}}(n)$ = sequential runtime

parallelization efficiency

what difference does each processor make?
$E(n,p) = \frac{T_{\text{seq}}(n)}{p \cdot T_{\text{par}}(n,p)} = \frac{1}{p} \cdot S_a(n,p)$

raw data

special seed

size	p	mean runtime (s)	speed-up	par. eff.
155	1	0.276419	1	1.42473
155	2	0.14614	1.89147	1.34742
155	4	0.0850049	3.2518	1.15824
155	8	0.0684728	4.03691	0.71894
155	16	0.0727858	3.7977	0.338169
155	24	0.0845242	3.27029	0.194137
155	32	0.0975262	2.8343	0.126191
1100	1	13.42	1	1.40137
1100	2	6.66189	2.01444	1.41149
1100	4	3.37751	3.97333	1.39203
1100	8	1.72664	7.7723	1.36149
1100	16	0.908652	14.7691	1.29356
1100	24	0.654567	20.5021	1.19713
1100	32	0.564249	23.7838	1.04156

the speed-up was calculated using $p=1$ as the reference point.

the parallel efficiency was calculated using an average of the sequential runtime.

srun -p q_student -t 1 -N 1 -c 32 python3 julia.py --size 155 --nprocs 1 # 155;20;1;0.39382300106808543
srun -p q_student -t 1 -N 1 -c 32 python3 julia.py --size 1100 --nprocs 1 # 1100;20;1;18.806384983938187

benchmark seed

size	p	mean runtime (s)	speed-up	par. eff.
155	1	0.394086	1	0.711286
155	2	0.210897	1.86862	0.66456
155	4	0.121818	3.23504	0.575259
155	8	0.0862224	4.57057	0.406373
155	16	0.0852527	4.62256	0.205497
155	24	0.0963721	4.08921	0.121191
155	32	0.108531	3.6311	0.0807109
1100	1	19.1049	1	0.68691
1100	2	9.67163	1.97536	0.678447
1100	4	4.90018	3.89883	0.669536
1100	8	2.4526	7.78967	0.66885
1100	16	1.28631	14.8525	0.637646
1100	24	0.900398	21.2183	0.607295
1100	32	0.746145	25.6049	0.549633

same as before, just with a different seed.

srun -p q_student -t 1 -N 1 -c 32 python3 julia.py --size 155 --nprocs 1 --benchmark # 155;20;1;0.2803074959665537
srun -p q_student -t 1 -N 1 -c 32 python3 julia.py --size 1100 --nprocs 1 --benchmark # 1100;20;1;13.123375411145389

system specs

the system used for benchmarking was a hydra node with the following specs:

bopc23s9@hydra-head:~$ lscpu

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          16
On-line CPU(s) list:             0-15
Thread(s) per core:              1
Core(s) per socket:              16
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           85
Model name:                      Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
Stepping:                        4
CPU MHz:                         1000.151
CPU max MHz:                     3700.0000
CPU min MHz:                     1000.0000
BogoMIPS:                        4200.00
L1d cache:                       512 KiB
L1i cache:                       512 KiB
L2 cache:                        16 MiB
L3 cache:                        22 MiB
NUMA node0 CPU(s):               0-15
...

Name		Name	Last commit message	Last commit date
Latest commit History 82 Commits
.vscode		.vscode
assets		assets
data		data
docs		docs
src		src
.gitignore		.gitignore
.prettierrc.json		.prettierrc.json
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
run.sh		run.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

benchmark results

metrics

raw data

system specs

About

Languages

License

sueszli/julia.py

Folders and files

Latest commit

History

Repository files navigation

benchmark results

metrics

raw data

system specs

About

Topics

Resources

License

Stars

Watchers

Forks

Languages