Array might not be zero-initialized #79882
Description

We have found that the elements of an array might not always be initialized to the default value. For example, after calling new double[size], some elements might be nonzero. We started noticing this problem after upgrading to .NET 7. It seems that this happens especially when a GC heap size limit is set. (Initially, we observed this with an 8 GB limit.)

Reproduction Steps

Run this C# program (a minimal sketch follows below) with GC HeapHardLimit set to 1 GB.

Expected behavior

We expect that the Exception is not thrown, because the elements of the array should be initialized to zero.

Actual behavior

With high probability, the Exception is thrown, because an element is not zero.

Regression?

We haven't noticed this issue in .NET 6.

Known Workarounds

No response

Configuration

.NET version 7.0.100
MS Windows 10 Pro, x64

Other information

No response
Comments

Tagging subscribers to this area: @dotnet/gc

Anecdotally, I can reproduce this on Windows 11 ARM. I cannot reproduce this on macOS with Apple Silicon.

Thanks for reporting this issue. Unfortunately, it's not reproing on my machine. Are you running with WKS or SVR GC? Would it be possible to share a dump when it's non-zero? (calling @cshung @PeterSolMS, fyi)
@mangod9 The dump is too big for GitHub even zipped, so it is attached here: https://1drv.ms/u/s!AqFBIHiaPFEtk0Q6Wf5oOTFCj1KW?e=kN6uHm I used a slightly modified version of the code, which is below.

using System;
using System.Diagnostics;

Console.WriteLine(Environment.ProcessId);
var rand = new Random();
for (int k = 0; k < 1_000_000; k++)
{
    Test(k);
}
Console.Write("Succeeded");

void Test(int iteration)
{
    int size = (int)Math.Pow(10, 5 + rand.NextDouble() * (1 + rand.NextDouble()));
    var m = new double[size];

    // A freshly allocated array should contain only zeros.
    for (int i = 0; i < size; i++)
    {
        if (m[i] != 0)
        {
            throw new Exception(m[i].ToString() + "; " + i + "/" + size + "; iteration " + iteration.ToString());
        }
    }

    // Dirty the array so that memory reused without re-zeroing is easy to spot.
    for (int i = 0; i < size; i++)
    {
        m[i] = 21;
    }
}

With a runtimeconfig.json containing:

{
    "configProperties": {
        "System.GC.HeapHardLimit": 1000000000
    }
}
This does not reproduce when I run with the Server GC.

Hi, it's Workstation GC, a console app, all default settings except the System.GC.HeapHardLimit. Thank you, @vcsjones
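For anyone else trying to reproduce this, it may help to confirm which GC flavor the process actually ended up with; a small sketch using standard APIs (the printed text is just illustrative):

```csharp
using System;
using System.Runtime;

// Workstation vs. server GC apparently changes whether the bug reproduces,
// so print which one this process is actually using.
Console.WriteLine($"Server GC: {GCSettings.IsServerGC}");
Console.WriteLine($"Latency mode: {GCSettings.LatencyMode}");
```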
Yes. Result:
It just seems to be some uninitialized value.
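To check whether the nonzero value looks like leftover data from a previously used array rather than random corruption, one could log the raw bit pattern of the first offending element. A small diagnostic sketch (the helper name and output format are made up for illustration):

```csharp
using System;

// A freshly allocated array should produce no output here.
ReportFirstNonZero(new double[100_000], iteration: 0);

static void ReportFirstNonZero(double[] m, int iteration)
{
    for (int i = 0; i < m.Length; i++)
    {
        if (m[i] != 0)
        {
            long bits = BitConverter.DoubleToInt64Bits(m[i]);
            Console.WriteLine(
                $"iteration {iteration}: m[{i}] = {m[i]} (raw bits 0x{bits:X16}), length {m.Length}");
            return;
        }
    }
}
```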
Also of note: I can still reproduce this with

Ok, could you please share the dump for that case then? The one you shared has the array elements all set to 21, which is weird too. I want to check whether only a few elements are uninitialized within it (most likely some heap corruption under low memory conditions).

@mangod9 https://1drv.ms/u/s!AqFBIHiaPFEtk0Uw3R1wx0oNckIS?e=TNpBHb (I zipped it this time).

Changing the GC to the older version with
This doesn't repro for me on x64, either Intel or AMD. As the dumps are from ARM64, I will try that next.

Have you set the GC heap limit? We have easily reproduced this on different machines, both AMD and Intel. I will send a memory dump for x64 if it helps. Btw, it usually fails within a few seconds, so the number of iterations may be lowered if it runs too long.

Interesting... Yes, I set the memory limit, but with an environment variable. Could you attach your .runtimeconfig.json file?

Here is the dump for x64. I had to zip it, split it into two files, and then change the suffix from zip.001 to 001.zip (you might need to change it back and use 7-Zip to unpack it), because otherwise GitHub refused to upload it, so sorry for the inconvenience.
The environment variable works as well, but I think the value must be hexadecimal in that case.

Thank you for the dump! Right, for the environment variable, the value must be hex. I put 3B9ACA00, which should be the same thing...
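As a sanity check that the decimal runtimeconfig value and the hex environment variable configure the same limit, one could print the limit the GC actually sees; a small sketch (GCMemoryInfo.TotalAvailableMemoryBytes reflects the hard limit when one is configured):

```csharp
using System;

// 0x3B9ACA00 == 1_000_000_000, so both settings should report the same value here.
GC.Collect(); // GetGCMemoryInfo reports data for the most recent GC
GCMemoryInfo info = GC.GetGCMemoryInfo();
Console.WriteLine($"Total available memory for the GC: {info.TotalAvailableMemoryBytes:N0} bytes");
```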
Ok, now I can repro the issue...

Purely out of curiosity, but what was the factor (or factors) in you not being able to reproduce it originally?
Not quite sure why it wasn't reproing for me earlier. Could have been an incorrect hard limit specification. The symptoms indicate some incorrect handling at region boundaries. I modified the code to assign the array in the following way:

for (int i = 0; i < size; i++)
    m[i] = iteration;

With that I see this output, indicating a previous array instance is somehow getting assigned to the new instance, possibly as part of GC relocation. Interestingly, it doesn't repro with

And the offending array object is usually at the region boundary:
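To check the "offending object at a region boundary" observation without a debugger, one could pin the array and look at where the suspicious element falls relative to an assumed region size. A rough sketch (the 4 MB region size is an assumption about the .NET 7 default, not something read from an API, and pinning a large array like this is only reasonable in a throwaway repro):

```csharp
using System;
using System.Runtime.InteropServices;

// Demo: allocate an array and report where element 0 falls relative to an
// assumed 4 MB region boundary.
var demo = new double[100_000];
ReportRegionOffset(demo, 0);

static void ReportRegionOffset(double[] m, int badIndex)
{
    const long assumedRegionSize = 4 * 1024 * 1024; // assumption, not an API value

    // Pin the array so we can take a stable address of its data.
    GCHandle handle = GCHandle.Alloc(m, GCHandleType.Pinned);
    try
    {
        long elementAddress = handle.AddrOfPinnedObject().ToInt64() + (long)badIndex * sizeof(double);
        Console.WriteLine(
            $"m[{badIndex}] at 0x{elementAddress:X}, offset into assumed region: 0x{elementAddress % assumedRegionSize:X}");
    }
    finally
    {
        handle.Free();
    }
}
```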
The bug is understood and fixed. Feel free to skip to the end for the fixes if the causes section appears too technical.

Causes

There are two underlying issues here, and both have something to do with overlapped commits.

Overlapped commits

Suppose some memory address range was reserved but not committed. When we commit it, the operating system makes sure the memory we get is zero-initialized. Later on, we can de-commit it. The current process is then no longer allowed to access the de-committed memory (accessing it would lead to an AV), and the memory is eligible to be used by other processes. If we commit it again, we may or may not get back the same physical memory, but the operating system guarantees the memory is zero-initialized again. Given the API, we could also commit the same address range again before de-committing it. Technically, this is a bug, but most operating systems simply forgive us and do nothing about it. However, that does not guarantee the memory is zero-initialized; instead, the memory just stays what it was, and that's why we observe the dirty content in the arrays. What went wrong in our particular case is that we forgot to de-commit, which led to these overlapped commits.
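As an illustration of the overlapped-commit behavior described above, here is a small Windows-only sketch using VirtualAlloc/VirtualFree directly (this is not the GC's code; it just demonstrates that committing an already-committed page does not re-zero it, while de-committing and committing again does):

```csharp
using System;
using System.Runtime.InteropServices;

internal static class OverlappedCommitDemo
{
    private const uint MEM_COMMIT = 0x1000, MEM_RESERVE = 0x2000, MEM_DECOMMIT = 0x4000;
    private const uint PAGE_READWRITE = 0x04;

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern IntPtr VirtualAlloc(IntPtr lpAddress, nuint dwSize, uint flAllocationType, uint flProtect);

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool VirtualFree(IntPtr lpAddress, nuint dwSize, uint dwFreeType);

    private static void Main()
    {
        nuint size = 4096;

        // Reserve and commit one page: the OS hands it back zero-initialized.
        IntPtr p = VirtualAlloc(IntPtr.Zero, size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
        if (p == IntPtr.Zero) throw new InvalidOperationException("VirtualAlloc failed");
        Marshal.WriteInt64(p, 0x2121212121212121);

        // Overlapped commit: committing an already-committed page succeeds,
        // but the old content survives -- it is NOT re-zeroed.
        VirtualAlloc(p, size, MEM_COMMIT, PAGE_READWRITE);
        Console.WriteLine($"After overlapped commit:  0x{Marshal.ReadInt64(p):X}");

        // De-commit first, then commit again: now the page comes back zeroed.
        VirtualFree(p, size, MEM_DECOMMIT);
        VirtualAlloc(p, size, MEM_COMMIT, PAGE_READWRITE);
        Console.WriteLine($"After de-commit + commit: 0x{Marshal.ReadInt64(p):X}");
    }
}
```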
Missed de-commits

On the mark arrays: When we are done with a GC, we determine whether we should return some memory to the operating system through de-commits. In this test case, we often find large chunks of memory left unused, so we often de-commit. The mark array is a data structure we use to maintain information related to a region, so whenever we de-commit a region, we should also de-commit the corresponding mark array. But we didn't. Later on, when the region is needed again, we commit the corresponding mark array again. Whenever we commit, however, we increase a counter so that we know how much memory we have committed, and when it reaches the heap hard limit, we need to back off. Without the de-commit, the counter double-counts. After some number of iterations, the double counting leads us to the memory limit, and we start to hit the second issue.

On the regions failing to commit mark arrays: After the committed-memory counter is exhausted by the mark array, we start failing to commit the mark array. In that case, the region is no longer useful, and we have to delete it. Before my bug fix, the region was returned directly to the global region allocator without de-committing it first. If the region is used again, the code assumes whatever we get from the region allocator is cleanly committed (because we do call commit on those memory addresses, assuming they are not committed, but that is an overlapped commit because we never de-committed), so we end up with dirty data in the user arrays.

Fixes

I have a work-in-progress PR here that should fix the problem. The change simply de-commits the memory as we should have. Before the change, the code hit the exception almost right away. After the change (on the same base), the code survived at least 20k iterations on a debug build, so it should be good functionally. Performance-wise, however, this is doing more de-commit operations, so it is likely to regress some scenarios, although it isn't clear to me whether that is avoidable. This particular test case is churning the LOH, and it is well known that churning the LOH is a bad idea for the GC, so I wouldn't worry too much about the performance of this test case in particular, but we need to make sure the general case is not regressed. Since we are not in a hurry, I'd love to do more performance analysis before we merge the PR, preferably after the vacation when the full team is available to work on it.
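To make the mark-array double-counting described above concrete, here is a toy model of the commit accounting (this is not the GC's actual bookkeeping; the region and mark-array sizes are assumptions chosen for illustration): every commit charges a shared budget, the region itself is de-committed and refunded each cycle, but the mark array never is, so the budget drains even though real usage stays flat.

```csharp
using System;

// Toy model of the GC's commit budget; not the actual runtime code.
const long hardLimit = 1_000_000_000;        // matches the repro's HeapHardLimit
const long regionSize = 4 * 1024 * 1024;     // assumed region size
const long markArraySize = regionSize / 64;  // assumed per-region mark-array cost

long committedBytes = 0;

for (int cycle = 0; ; cycle++)
{
    if (!TryCommit(regionSize) || !TryCommit(markArraySize))
    {
        Console.WriteLine($"Commit budget exhausted after {cycle} cycles; committed = {committedBytes:N0} bytes");
        break;
    }

    // The bug being modeled: the region is de-committed and refunded,
    // but the mark array is not, so it is charged again next cycle.
    Decommit(regionSize);
    // Decommit(markArraySize);  // the missing de-commit
}

bool TryCommit(long bytes)
{
    if (committedBytes + bytes > hardLimit) return false;
    committedBytes += bytes;
    return true;
}

void Decommit(long bytes) => committedBytes -= bytes;
```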
Thanks @cshung for investigating. I wonder why this issue doesn't repro under GCStress?
No idea why. This isn't the first time I have seen a situation like this. In general, I have observed that these memory-corruption repros are often quite fragile; small changes often make them stop reproing. Therefore, in practice, I tend to make as few changes to the repro or the runtime as necessary for the investigation. For example, in this investigation I hit an assert while running under debug mode and figured out the first bug, related to the de-committing of the mark array. The natural thing to do was to fix it. However, once I fixed it, on my box the test case no longer reproed after running for a whole night with that fix on. But it was not clear to me how we could get to a state where we have dirty array content with just the mark array issue. To figure that out, I chose the less invasive approach of simply commenting out the assert, and that's how I found the second bug. It is not perfect, and it doesn't explain why, but this pragmatic approach worked. I recommend it.