This repository has been archived by the owner on Dec 22, 2022. It is now read-only.

Benchmarking support, cmake version++, and sample csr() #99

Merged: 4 commits, Dec 23, 2021


neoblizz
Member

No description provided.

@neoblizz
Member Author

neoblizz commented Dec 23, 2021

This is brilliant. nvbench actually outputs Markdown (or other formats, such as JSON):

Devices

[0] NVIDIA GeForce GTX 1080

  • SM Version: 610 (PTX Version: 610)
  • Number of SMs: 20
  • SM Default Clock Rate: 1733 MHz
  • Global Memory: 7203 MiB Free / 8191 MiB Total
  • Global Memory Bus Peak: 320 GB/sec (256-bit DDR @5005MHz)
  • Max Shared Memory: 96 KiB/SM, 48 KiB/Block
  • L2 Cache Size: 2048 KiB
  • Maximum Active Blocks: 32/SM
  • Maximum Active Threads: 2048/SM, 1024/Block
  • Available Registers: 65536/SM, 65536/Block
  • ECC Enabled: No

Log

Run:  [1/1] parallel_for [Device=0]
Pass: Cold: 0.048905ms GPU, 0.095916ms CPU, 0.50s total GPU, 10224x

Benchmark Results

parallel_for

[0] NVIDIA GeForce GTX 1080

| Samples | CPU Time  | Noise   | GPU Time  | Noise  |
|---------|-----------|---------|-----------|--------|
| 10224x  | 95.916 us | 120.59% | 48.905 us | 41.27% |
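
The log line and the results table report the same measurement in different units, so one can be checked against the other. The following is a minimal sketch of that cross-check, assuming the `Pass: Cold:` line always has the shape shown in the log above; the function name `parse_cold_pass` and the regular expression are illustrative and not part of nvbench.

```python
import re

# The "Pass" line copied verbatim from the log above.
LOG_LINE = "Pass: Cold: 0.048905ms GPU, 0.095916ms CPU, 0.50s total GPU, 10224x"

# Hypothetical pattern: it assumes the field order and units seen in this one log line.
COLD_PASS = re.compile(
    r"Cold: (?P<gpu_ms>[\d.]+)ms GPU, (?P<cpu_ms>[\d.]+)ms CPU, "
    r"(?P<total_s>[\d.]+)s total GPU, (?P<samples>\d+)x"
)


def parse_cold_pass(line):
    """Extract the timings from a cold-pass log line, converting ms to us."""
    match = COLD_PASS.search(line)
    if match is None:
        raise ValueError("line does not look like a cold-pass summary")
    gpu_ms = float(match["gpu_ms"])
    samples = int(match["samples"])
    return {
        "samples": samples,
        "gpu_us": gpu_ms * 1000.0,                # table column "GPU Time"
        "cpu_us": float(match["cpu_ms"]) * 1000.0,  # table column "CPU Time"
        "reported_total_s": float(match["total_s"]),
        # samples x mean GPU time, converted from ms to s
        "estimated_total_s": samples * gpu_ms / 1000.0,
    }


if __name__ == "__main__":
    for key, value in parse_cold_pass(LOG_LINE).items():
        print(f"{key}: {value}")
```

For this run, the mean GPU time multiplied by the sample count comes to roughly 0.50 s, which agrees with the `0.50s total GPU` figure in the log.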

@neoblizz neoblizz marked this pull request as ready for review December 23, 2021 23:12
@neoblizz neoblizz merged commit 55294c9 into dev Dec 23, 2021
@neoblizz neoblizz deleted the bench branch December 23, 2021 23:13
@neoblizz neoblizz added the 🏭 build Build related issues. label Jun 26, 2022

Successfully merging this pull request may close these issues.

Implement benchmarking support for essentials using NVIDIA/nvbench.