Test failure: System.Security.Cryptography.X509Certificates.Tests.CertificateCreation.CertificateRequestChainTests/CreateChain_Hybrid #25979
Not much to go on, but we can start tracking. |
But NTE_INTERNAL_ERROR is the moral equivalent of Debug.Assert. A late check says that something previously went wrong and didn't notice, and now it doesn't know what went wrong or why, just that hum was not the right answer. |
Same cause on a different test
What can we do about this kind of flakiness? It happened on the same configuration, so should we disable it for Win81? |
I don't think we can disable all of crypto on Win8.1 😄. I've reached out to Windows CNG to see if they have any advice. |
Interesting that it's the same test as before. While sending data to Windows I noticed that the second ever report of this error was for RSA (not DSA) and on Windows 10. The first was DSA, Windows version unknown, a different test. If there's a semi-reliable way to force this to happen I'd love to send a repro to Windows and let them catch it in action. |
I would try to run the test on a Windows 8.1 machine in a loop (200-500 times should suffice). |
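A minimal sketch of such a repro loop, written in Python only for brevity (the tests themselves are C#). The helper name `run_until_failure` and its parameters are hypothetical; `command` stands for whatever command line runs the single test on the machine, which this thread does not specify.

```python
import subprocess


def run_until_failure(command, max_runs=500):
    """Run `command` up to `max_runs` times.

    Returns the 1-based index of the first run that exits with a
    non-zero code, or None if every run succeeded.
    """
    for attempt in range(1, max_runs + 1):
        # Capture output so the failing run's log can be inspected afterwards.
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"run {attempt} failed with exit code {result.returncode}")
            # Print only the tail of the log; the failure is usually near the end.
            print(result.stdout[-2000:])
            return attempt
    return None
```

Stopping at the first failure, rather than counting failures, leaves the machine in the failing state so it can be inspected or handed over for investigation.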
Failed again but this time in System.Net.Security.Tests: |
Windows suggests that this error code mostly means there's a disk problem (e.g. disk full). If the frequency has started to pick up on this then perhaps our Windows 8.1 machines/images need some sort of cleanup. |
@MattGal disk space problems on win81 (see above). Can you please verify? |
Even if the disk/drive as a whole is fine, it may be that the CNG key directories are having issues (https://docs.microsoft.com/en-us/windows/desktop/seccng/key-storage-and-retrieval#key-directories-and-files). The most likely culprit being |
What's your recommendation? Move all tests to Outerloop for that one configuration? Or could there be a way to diagnose the health of the machine? |
It would literally mean every crypto test and every System.Net.Security test. So, I don't think that would be a good strategy (particularly if the problem is environmental).
I'm open to suggestion. Adding a cctor which prints the size on disk remaining and number of files in Can we inject a diagnostic |
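As a rough illustration of the kind of diagnostic described above (remaining disk space plus a file count for one directory), here is a sketch in Python. It is an assumption-laden stand-in, not the test infrastructure's actual code: the function name `machine_health_snapshot` is hypothetical, and `directory` is a placeholder for whichever key directory is of interest.

```python
import os
import shutil


def machine_health_snapshot(directory):
    """Return free disk space and file count for `directory`.

    `directory` is a placeholder; the caller decides which path to probe.
    """
    # disk_usage reports totals for the volume that contains `directory`.
    usage = shutil.disk_usage(directory)
    # Count only regular files directly inside the directory (no recursion).
    file_count = sum(1 for entry in os.scandir(directory) if entry.is_file())
    return {"free_bytes": usage.free, "file_count": file_count}
```

Logging this snapshot once per test process would at least show whether a failing machine was low on space or had an unusually large number of files at the time.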
@bartonjs / @ViktorHofer Unfortunately the machine(s) that ran your tests have long since been recycled, but I am very skeptical of the "disk full" theory going on here. The Windows 8.1 VMs get created with 100+ GB free disk space on the C:\ drive where %APPDATA% is located, refuse to take work if there's less than 3 GB available on this disk, and get deleted whenever there isn't active work to be done. Have you actually run this on a Windows 8.1 / Server 2K12R2 machine? It might be an actual problem, though it could be caused by other work items messing with the machine state too. |
@MattGal Yep, my main workstation is 2K12R2. I've never seen NTE_INTERNAL_ERROR on it. |
@bartonjs good to hear it works on your box :) I'm standing up a machine to investigate and will share a repro if I get you one. |
@bartonjs no repro, I'm letting it loop a few hundred times on a machine with the same image to check it out. I also checked the execution history of a00007K and noted only 40 other work items ran before it (none of which, in a spot check, logged "now ruining certs on the machine", though they could be at fault) ... this would indicate the odds of the disk being full when that happened are close to 0. I think the best possible way to investigate this would be to catch it at a time when the machine having the trouble is still alive, and let me know via Teams so I can make it not get cleaned up, and jump to it. |
Thanks a lot for spending time on this. I will let you know when it happens again. |
Just hit something similar in the test System.Security.Cryptography.Dsa.Tests.DSASignVerify_Array.InvalidKeySize_DoesNotInvalidateKey
Starting: System.Security.Cryptography.Cng.Tests (parallel test collections = on, max threads = 2)
System.Security.Cryptography.Dsa.Tests.DSASignVerify_Array.InvalidKeySize_DoesNotInvalidateKey [FAIL]
Internal.Cryptography.CryptoThrowHelper+WindowsCryptographicException : An internal consistency check failed.
Stack Trace:
/_/src/libraries/System.Security.Cryptography.Cng/src/System/Security/Cryptography/CngKey.Create.cs(54,0): at System.Security.Cryptography.CngKey.Create(CngAlgorithm algorithm, String keyName, CngKeyCreationParameters creationParameters)
/_/src/libraries/System.Security.Cryptography.Cng/src/Internal/Cryptography/CngAlgorithmCore.cs(72,0): at Internal.Cryptography.CngAlgorithmCore.GetOrGenerateKey(Int32 keySize, CngAlgorithm algorithm)
/_/src/libraries/System.Security.Cryptography.Cng/src/System/Security/Cryptography/DSACng.Key.cs(24,0): at System.Security.Cryptography.DSACng.get_Key()
/_/src/libraries/System.Security.Cryptography.Cng/src/System/Security/Cryptography/DSACng.Key.cs(51,0): at System.Security.Cryptography.DSACng.GetDuplicatedKeyHandle()
/_/src/libraries/Common/src/System/Security/Cryptography/DSACng.SignVerify.cs(178,0): at System.Security.Cryptography.DSACng.ComputeQLength()
/_/src/libraries/Common/src/System/Security/Cryptography/DSACng.SignVerify.cs(156,0): at System.Security.Cryptography.DSACng.AdjustHashSizeIfNecessary(ReadOnlySpan`1 hash, Span`1 stackBuf)
/_/src/libraries/Common/src/System/Security/Cryptography/DSACng.SignVerify.cs(32,0): at System.Security.Cryptography.DSACng.CreateSignature(Byte[] rgbHash)
/_/src/libraries/System.Security.Cryptography.Algorithms/src/System/Security/Cryptography/DSA.cs(136,0): at System.Security.Cryptography.DSA.SignData(Byte[] data, Int32 offset, Int32 count, HashAlgorithmName hashAlgorithm)
/_/src/libraries/System.Security.Cryptography.Algorithms/src/System/Security/Cryptography/DSA.cs(88,0): at System.Security.Cryptography.DSA.SignData(Byte[] data, HashAlgorithmName hashAlgorithm)
/_/src/libraries/Common/tests/System/Security/Cryptography/AlgorithmImplementations/DSA/DSASignVerify.cs(13,0): at System.Security.Cryptography.Dsa.Tests.DSASignVerify_Array.SignData(DSA dsa, Byte[] data, HashAlgorithmName hashAlgorithm)
/_/src/libraries/Common/tests/System/Security/Cryptography/AlgorithmImplementations/DSA/DSASignVerify.cs(119,0): at System.Security.Cryptography.Dsa.Tests.DSASignVerify.InvalidKeySize_DoesNotInvalidateKey()
System.Security.Cryptography.Rsa.Tests.KeyGeneration.GenerateMaxKey [SKIP]
Condition(s) not met: "IsStressModeEnabled"
Finished: System.Security.Cryptography.Cng.Tests
=== TEST EXECUTION SUMMARY ===
System.Security.Cryptography.Cng.Tests Total: 1165, Errors: 0, Failed: 1, Skipped: 1, Time: 78.518s |
Interesting. This is the only failure like this in the past 1000+ runs of this work item, but the machine is actually still around. The machine this failed on passed this test around 2020-07-21 12:16 UTC, then failed it in this run at 2020-07-21 14:52, running 32 other work items in between, all of which passed and didn't crash. If there's something super interesting about the machine doing this, ping me ASAP and I can try to keep it around; otherwise I'd chalk this up to an unusual Windows state issue. |
failed again in job: runtime-coreclr libraries-jitstress2-jitstressregs 20200808.1 failed test: System.Security.Cryptography.X509Certificates.Tests.DynamicChainTests.BuildInvalidSignatureTwice(endEntityErrors: NotSignatureValid, intermediateErrors: NoError, rootErrors: UntrustedRoot) Error message
|
To quote @bartonjs :
|
@bartonjs since this has been open for more than 3 years and we don't have any good path forward to fix or repro this would it make sense to perhaps add retry logic inside |
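For readers unfamiliar with the idea, a retry of this kind can be sketched as below. This is a generic illustration in Python, not the library's code: `TransientError` is a hypothetical stand-in for the intermittent "internal consistency check failed" error, and `call_with_retry` and its parameters are invented names. Whether retrying actually helps with this failure is not established by the sketch.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for an intermittent failure such as NTE_INTERNAL_ERROR."""


def call_with_retry(operation, attempts=3, delay_seconds=0.1):
    """Call `operation` until it succeeds or `attempts` tries are used up.

    Only TransientError is retried; any other exception propagates at once.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError as error:
            last_error = error
            # Pause briefly before the next try, but not after the final one.
            if attempt < attempts:
                time.sleep(delay_seconds)
    # Every attempt failed; surface the most recent error to the caller.
    raise last_error
```

Retrying only one narrowly defined error type keeps genuine bugs visible, at the cost of hiding how often the transient failure occurs unless each retry is also logged.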
Assuming the query below catches all failures covered by this issue (?) then it's failed 4 times in the last 2 months. Failures were within either @bartonjs @vcsjones thoughts about some kind of workaround here? Without a plan I guess we'll just hit this in a PR periodically, albeit not super often.
TestResults
| join kind=inner WorkItems on WorkItemId
| join kind=inner Jobs on JobId
| where Finished >= now(-60d)
| where Result == "Fail"
| where FriendlyName startswith "System.Security.Cryptography."
| where Message contains "consistency check"
| project
Pipeline = tostring(parse_json(Properties).DefinitionName),
Pipeline_Configuration = tostring(parse_json(Properties).configuration),
OS = QueueName,
Arch = tostring(parse_json(Properties).architecture)
//Test = Type1,
//Result
,Finished,
//Duration,
Method,
//Build = tostring(parse_json(Properties).BuildNumber),
Message,
FriendlyName//,
,StackTrace
|
We added a retry a long while ago, I don't believe it helped. The underlying OS team's response, essentially, was that if we reimaged our test machines more often we'd probably never see the problems (and that they've never seen these sorts of errors on normal user machines). |
For every queue listed in the above query we stay up to date with the most recent Azure gallery images for Server 2K12 R2 and 2016, so the images aren't too old, and most of these machines only live a few hours (pick any work item where this failed and I can demonstrate this via Kusto query). Is it possible that the problem isn't "we don't reimage often enough"? |
No hit in last 10 days (Runfo says 3 hits per month), removing blocking-clean-ci label. |
Just failed here on Windows 10:
|
There are two things we can do here that might help
The first isn't for 7; the second we can try once higher priority issues run out (which may or may not still be during 7). |
New hit in release/7.0 branch - 9/12 Rolling run 13309: Platform net7.0-windows-Release-x64-NativeAOT_Release-(Windows.Nano.1809.Amd64.Open)windows.10.amd64.serverrs5.open:
|
Another hit on different test in release/7.0 branch - 9/13 Rolling run 16085: Platform: net7.0-windows-Release-x64-CoreCLR_release-(Windows.Nano.1809.Amd64.Open)windows.10.amd64.serverrs5.open
|
@bartonjs not sure if we were just lucky to get the failure 2x recently, or if it is more common now for some reason. |
Another hit in release/7.0-rc2 branch - 9/22 Rolling run 26747: Platform: net7.0-windows-Release-x64-CoreCLR_release-(Windows.Nano.1809.Amd64.Open)windows.10.amd64.serverrs5.open
|
@jeffhandley this seems to be happening quite a bit lately seemingly on the mono configurations. Can you have someone take a look? |
The test
System.Security.Cryptography.X509Certificates.Tests.CertificateCreation.CertificateRequestChainTests/CreateChain_Hybrid
has failed. Failing configurations:
Runfo Tracking Issue: buildinvalidsignaturetwice
Build Result Summary
Known Issue Error Message
Fill the error message using known issues guidance.