Newer Nvidia GPUs with more than 4GB memory, report max 4GB memory under CUDA #1773
I think the 'int' is the result of the function call, i.e. the status code. The function returns the max memory value in its first argument, which is a size_t.
This seems to be a "feature" of the CUDA SDK. Older versions always report the memory as a 32-bit integer. Exhibit 1, Exhibit 2. I guess one has to experiment to find which CUDA SDK reports the correct amount and build the client with that. The client itself uses a size_t to retrieve the value, which then gets converted into a double. If we compile on a 64-bit machine, the size_t should be big enough.
@RichardHaselgrove, you've been working with similar stuff while fixing #2706.
And even longer than that. We had a problem when the first 2 GB cards came out, with garbage sizes reported for a while. Berkeley didn't have access to the latest hardware, so Rom remoted into a machine I'd just built to debug and create 26f9e38. The 4 GB reporting limit still applies (and has been reported again in the context of the RTX 2080 release), but fixing it would involve delving into the NVidia SDKs to find a usable 64-bit implementation which could co-exist with our legacy 32-bit code and wouldn't lose our support for older cards.
All right, I got a code fix in for this problem. The problem lies in the use of cuDeviceTotalMem, which returns a 32-bit value. Using cuDeviceTotalMem_v2 will fix this for GPUs with more than 4 GB. It needs further testing with older GPUs and on 32-bit OSes.
And the corresponding explanation:
The above explanation may be a bit confusing, as it was concatenated from multiple posts I wrote on the Seti@Home GPUUG team forum.
@Ville-Saari, if it's your code above, could you please make a proper PR?
I'm working on converting this code for both Windows and Linux. The original patch gives me compiler warnings on VS 2013, but it works. If anybody wants to re-code it in accordance with BOINC house style, please feel free. The theory is simple: if the _v2 versions of cuDeviceTotalMem and cuMemGetInfo exist in the active library, use them; otherwise, keep the originals. I think the _v2 entry points became available around 2012, with CUDA 4.
Those 32-bit checks won't actually produce any code, as they are constant expressions: the compiler removes the test and conditionally compiles one of the resulting branches depending on which architecture it is compiling for. I think using the 32-bit versions of the calls (without _v2 in their name) when running on a 64-bit system may even be dangerous. The fact that the total memory query works at all with the 32-bit call relies on the code running on a little-endian system and the variable being explicitly initialized to 0, so the result ends up in the less significant half of the 64-bit integer. That function is meant to receive a pointer to a 32-bit value.
I don't know why this is still hanging out there. I've been using Ville's corrected code for years now. One simple change to a client module and the problem is solved. Why hasn't anybody simply merged the code?
Probably because nobody has created a proper PR for this.
Why can't you do that? You are a developer. Forget about Ville doing it. He left BOINC once Seti finished. No sign of him since then.
Actually, he replied back two years ago: #1773 (comment) |
So since that is never going to happen (he left no contact information to get hold of him), will this issue never be fixed?
If he gave his permission and you have proof of it, I can use that code and create a PR.
Usually you have to reimplement/fix the functionality without using the original code. Otherwise it's a violation of intellectual property rights.
So I need to find an explicit permission message from Ville somewhere in all the threads he posted. |
Vitalii, I have these forum messages as the closest I can come to an explicit permission for Ville's code fix.
Here you are:
And Juan's reply right after that.
OK, you only got the executables. So how did the code snippet end up here on #1773? I see Ville posted an explanation here of the problem and its solution, so I can assume he posted the code snippet. Why doesn't that satisfy the requirements?
OK, it looks like Jord got the code from Ville from the posts. |
Checked my PMs. Yes, Juan declined to send me code (collective decision of GPUUG), but that was to do with GPU spoofing to break the 'max tasks per GPU' cache limit. I was sent code to address this current issue, but that was by Ian&Steve C. - and very late in the lifetime of the SETI@Home project, when we were all getting very tired and irritable. The associated discussion thread in the forum was eventually hidden.
Late last night I saw I was being mentioned, and I must say I didn't even remember any of this. So I looked around and found that I got a PM at Seti from Juan BFP with the code, and one following with a long explanation. I'll add that to this thread. It may explain the problems you guys talk about in the other ticket. |
Found one more private message from Juan, which I read as permission to use the code:
Well, at one level, we don't need any permissions from anyone. The calls we're using are all documented in the CUDA Toolkit Documentation (currently at v11.7.0), in section 6.5, Device Management, of the CUDA Driver API.
It's what Vitalii was asking for:
I think David - with #4757 - has somewhat pre-empted that request. For the record, the majority of the _v2 variants aren't explicitly covered in the toolkit documentation, but are listed in https://forums.developer.nvidia.com/t/driverapi-problem-with-cudevicetotalmem/25585 (January 2012). |
@KeithMyers and @Ageless93 posted two different diffs. Both of them look similar to what David did in #4757, but the latter is not working properly, as far as I can see from the comments.
Yes, but I don't think it was his code, which is why he never made the PR. I think, going by what Juan posted to me (and what I quoted above for the second time; I see I quoted it before, on the 21st of March 2020), it was his code.
@Ageless93, ok, then it would be nice to reach out to Juan somehow and kindly ask them to share their work.
Is there a case where #4757 isn't working? |
@davidpanderson, according to this report, it doesn't work on Linux: #4757 (comment)
It was always Ville's code. He wrote it. Juan was just acting as a liaison until Ville reached out himself in here. Both Juan and Ville have been MIA and unreachable for over a year. That's why neither of them has been here to contribute comments. We already tried reaching both of them.
As reported on the BOINC forums, with newer Nvidia GPUs that have more than 4 gigabytes of video memory, BOINC reports a max of 4096 MB for CUDA. OpenCL detects the memory correctly.
Checking https://github.com/BOINC/boinc/blob/master/client/gpu_nvidia.cpp, line 195, total memory is checked with an int. An int is 32-bit, so the maximum amount of memory it can represent is 4 GB. In this case I think it needs a long long int here (referencing http://en.cppreference.com/w/cpp/language/types) to be able to detect more than 4 GB of memory (64-bit).