1D NBODY scores #51
How could you have 2 CPUs and this many GPUs on the same mainboard? Okay, the duplicate CPU could be a simple driver issue; I had it once too: one beta OpenCL 2.0 platform and one real OpenCL 1.2 platform. This happens on Intel laptops too. What were the PCI-e multipliers? I guess device 0 was 16x and the others much less, like 2x? I'm adding this score now but will update it when you send the PCI-e and VMware info. Thank you. Why would the GTX trail behind all the others? PCI-e 1x? Did you use a PCI-e riser? Maybe the operating system was giving that card less bandwidth for some reason? If streaming is already disabled, what happens if you try the following?
If there is no zeroCopy field, … The new version needs both the device-side streaming parameter and the array's zero copy field set to true for zero copy access to host data, but it is slow when you access the data many times.
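For reference, here is the same zero-copy idea in raw OpenCL 1.2, which is the mechanism such streaming/zero-copy flags sit on top of; a minimal sketch with placeholder names and sizes, not this project's actual API:

```c
/* Minimal sketch of zero-copy host access in raw OpenCL 1.2.
 * Names and sizes are placeholders, not this project's API. */
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    cl_int err;

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    cl_command_queue q = clCreateCommandQueue(ctx, device, 0, &err);

    /* CL_MEM_ALLOC_HOST_PTR asks the driver for pinned, host-visible
     * memory so the GPU can read it over PCI-e without a staging copy. */
    size_t n = 32 * 1024;                       /* 32k particles, 4 B each */
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR,
                                n * sizeof(float), NULL, &err);

    /* Map to fill from the host, unmap before kernels touch it. */
    float *p = (float *)clEnqueueMapBuffer(q, buf, CL_TRUE, CL_MAP_WRITE, 0,
                                           n * sizeof(float), 0, NULL, NULL, &err);
    for (size_t i = 0; i < n; ++i) p[i] = (float)i;
    clEnqueueUnmapMemObject(q, buf, p, 0, NULL, NULL);
    clFinish(q);

    /* ... kernels can now access buf directly over the bus; each access
     * pays PCI-e latency, which is why repeated access is slow ... */

    clReleaseMemObject(buf);
    clReleaseCommandQueue(q);
    clReleaseContext(ctx);
    return 0;
}
```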
Maybe your 8 GPU system needs millions of particles to compute efficiently; an N-body step does O(N²) work but moves only O(N) bytes, so 32k elements is just data overhead for 8 GPUs.
I re-wrote it for 1M particles on the benchmark page; it took 8.1 seconds on RX550+R7_240.
Disabling any "stream" or "zero copy" should make it 8 times faster for your system. New version also has |
Hey. They are two different systems. The VM is just 1 CPU. The other PC has 8 GPUs. Here is a dump of clinfo. Maybe this answers some questions?
OK, the two-CPU issue must be the AMD APP version plus Intel's own implementation. So I was right about the 1x riser, but I didn't expect all of them to be 1x. Also, this info does not show which card gets the most PCI-e bandwidth; maybe the operating system handles it. Did you try the 1M particle version that I put on the benchmark page? Impressive system, by the way. Mining?
Yup, I see that; the CPU is listed once for Intel and once for AMD. Yes, all on 1x, so that they fit :) Kind of like this. I have not tried it yet; I will try today or tomorrow. Yes, Ethereum. But I am more interested in the future of Ethereum network utility, such as the Golem token.
Nice open-air case for overclocking. PCI-e 2.0 at 1x mode should be 300-400 MB/s in reality, and only for big arrays (at least 8-10 MB), so that's normal. When you run 1M particles, the system will show its real value; for now it must be only 40-50 MB/s for the 128 kB of data. Also, the GPUs did not each copy the whole data; they accessed RAM in a chaotic manner, which must have made it even slower. The 1M version does not have streaming, so it should be OK; you can also re-test the 32k version with streaming disabled.
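Plugging the figures from this thread into a quick estimate shows why 32k particles is transfer-bound while 1M is compute-bound; the per-GPU FLOP rate and flops-per-pair are rough assumptions, not measured values:

```c
/* Back-of-envelope using the figures above: 4 bytes per particle,
 * ~45 MB/s effective for small (128 kB) transfers, ~300 MB/s for big
 * arrays over PCI-e 2.0 x1. The ~1 TFLOP/s per GPU and ~4 flops per
 * particle pair are rough assumptions. */
#include <stdio.h>

int main(void)
{
    double flops = 1e12;                     /* assumed per-GPU rate */
    double n[2]  = { 32e3, 1e6 };

    for (int i = 0; i < 2; ++i) {
        double bytes  = n[i] * 4.0;          /* O(N) data per step */
        double bus    = bytes < 1e6 ? 45e6 : 300e6;  /* small vs big arrays */
        double t_copy = bytes / bus;
        double t_comp = (n[i] * n[i] / 8.0) * 4.0 / flops; /* O(N^2), 8 GPUs */
        printf("N=%.0f: copy %.2g s vs compute %.2g s\n", n[i], t_copy, t_comp);
    }
    return 0;
}
```

With these assumptions, at N = 32k the copy (~2.8 ms) dominates the compute (~0.5 ms), while at N = 1M the compute (~0.5 s) dwarfs the copy (~13 ms), which matches the argument above.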
v1.3.3 will have a fully functional "task pool" feature: you can feed independent workloads to it and it schedules them to idle GPUs to keep them busy, even if they are not load-balanceable (one kernel goes to a GPU, another kernel goes to another GPU, if any is idle). In v1.3.2 it has very limited capabilities: it uses only a single command queue per device and synchronizes after each task. This could be part of a backend for something like the "Golem token" (just the compute part, of course).
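A minimal sketch of that scheduling idea in raw OpenCL, assuming one in-order queue per device and polling event status to detect idleness; this illustrates the concept only and is not the library's actual API:

```c
/* Task pool sketch: each independent task (kernel launch) goes to the
 * first device whose previous task has finished. Resource releases and
 * error checks are omitted for brevity. */
#include <CL/cl.h>
#include <stdio.h>

#define MAX_DEV 8

static const char *src =
    "__kernel void task(__global float *a){ a[get_global_id(0)] += 1.0f; }";

int main(void)
{
    cl_platform_id plat;  cl_uint ndev;  cl_device_id dev[MAX_DEV];
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, MAX_DEV, dev, &ndev);

    cl_int err;
    cl_context ctx = clCreateContext(NULL, ndev, dev, NULL, NULL, &err);
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, &err);
    clBuildProgram(prog, ndev, dev, "-cl-std=CL1.2", NULL, NULL);

    cl_command_queue q[MAX_DEV];
    cl_kernel k[MAX_DEV];
    cl_mem buf[MAX_DEV];
    cl_event ev[MAX_DEV] = { 0 };
    size_t gsz = 1024;

    for (cl_uint d = 0; d < ndev; ++d) {
        q[d] = clCreateateCommandQueue == 0 ? NULL : NULL; /* placeholder */
        q[d] = clCreateCommandQueue(ctx, dev[d], 0, &err);
        k[d] = clCreateKernel(prog, "task", &err);
        buf[d] = clCreateBuffer(ctx, CL_MEM_READ_WRITE,
                                gsz * sizeof(float), NULL, &err);
        clSetKernelArg(k[d], 0, sizeof(cl_mem), &buf[d]);
    }

    /* Feed 100 independent tasks; a device counts as idle once its last
     * event reports CL_COMPLETE. (Busy-wait kept simple on purpose.) */
    for (int task = 0; task < 100; ) {
        for (cl_uint d = 0; d < ndev && task < 100; ++d) {
            cl_int st = CL_COMPLETE;
            if (ev[d])
                clGetEventInfo(ev[d], CL_EVENT_COMMAND_EXECUTION_STATUS,
                               sizeof(st), &st, NULL);
            if (st == CL_COMPLETE) {
                if (ev[d]) clReleaseEvent(ev[d]);
                clEnqueueNDRangeKernel(q[d], k[d], 1, NULL, &gsz, NULL,
                                       0, NULL, &ev[d]);
                clFlush(q[d]);   /* push the work to the device now */
                ++task;
            }
        }
    }
    for (cl_uint d = 0; d < ndev; ++d) clFinish(q[d]);
    printf("done\n");
    return 0;
}
```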
v1.4.1_update4 now properly targets OpenCL 1.2. This should fix some previously failing functions, such as atom_xchg() in kernel codes. I didn't try it on Nvidia as I don't have any (yet).
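For anyone hitting the same thing: atom_xchg() is the extension-named 32-bit atomic from cl_khr_global_int32_base_atomics, so a generic kernel like the following should build under -cl-std=CL1.2 once the pragma is in place (example code, not taken from this project):

```c
/* Generic atom_xchg() example for OpenCL 1.2 builds. */
#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable

__kernel void take_lock(__global int *lock, __global int *owner)
{
    /* Atomically swap 1 into *lock; the work-item that reads back 0
     * is the one that acquired the lock. */
    int old = atom_xchg(lock, 1);
    if (old == 0)
        *owner = (int)get_global_id(0);
}
```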
- VMware Virtual Machine, development, CPU only. Drivers: http://registrationcenter-download.intel.com/akdlm/irc_nas/9022/opencl_runtime_16.1.1_x64_setup.msi
- ASUS Z270A, 8 GPU build, production.