Windows 11 Get-TpmSupportedFeature fails triggering failures in system log #852
Comments
Updated the title to be more focused on the direct issue at hand |
I will have a look at it when I get a chance. Had you tried previous Win11 versions? |
Howdy! I haven't tried previous win11 versions -- however I do have some new, semi bizarre, news. One of my coworkers installed Windows 11, using swtpm 0.7.3, under Ubuntu 24.04 and somehow it's working fine in that scenario. So tomorrow I will gather the component info from above for that OS and anything else we can discover. This may imply the problem lies in EL9 or in some configuration ovirt is using. (his setup is just libvirt/kvm based, doesn't involve ovirt) |
That definitely sounds promising! I won't be able to test upgrading libtpms and reactivating tpm on windows until tomorrow -- but we did get the component versions from Ubuntu 24.04: swtpm: 0.7.3-0ubuntu5 I'll try upgrading libtpms on Rocky 9 tomorrow to see if it makes a difference! Is there anything in the localca configs that could have any likelihood of having an effect btw? Nothing stood out to me. |
Especially compare the libtpms versions since this is the vTPM implementation. v0.9.1 is quite old and you seem to be using it. v0.9.6 is recent and may have some of the necessary bugfixes:
|
Unlikely I would say. |
Mostly just adding some notes here while I wait for a reinstall. ovirt is running swtpm like this: Simply updating libtpms to 0.9.6 did not immediately help. (I reenabled the TPM in ovirt and rebooted the machine and immediately the errors came back) However, I have now reset swtpm to the version that comes with the OS, 0.8.0, and made sure libtpms was still at 0.9.6, and performed a full recreation of the VM to make sure everything is fresh and it is in the process of reinstalling right now. |
On the Ubuntu 24.04 box that is working, the call is: One difference is that one is storing its state and such on gluster while the other is a simple direct file system (xfs). Not sure why that would matter but bringing it up. Just to compare again -- swtpm configs in etc are---
one that doesn't:
basically precisely the same. |
Another data point -- this also worked just fine under Ubuntu 22.04 with the following components: Again I'm noticing the newer libtpms as you mentioned previously. =) |
Ok after rebuild of the VM, I'm still getting the same TPM error. I'm starting to wonder if Gluster is the problem somehow. I had found an issue in proxmox's issue tracker where swtpm was having problems when gluster was involved, but that appears to have something to do with how proxmox references their storage rather than anything to do with the storage itself. Any ideas of what else I could try? (and can you think of any reason gluster might cause a problem?) |
Well I just looked back at the command line under ovirt and -- none of those paths are in gluster so that shoots down that idea. =/ |
Edit: duh -- this is because I connected to it remotely. Nothing truly new here. Some new information included in the event log that I hadn't seen before:
|
Interesting. As root on the ub24 box, if I try to print states of a running tpm I get:
On the rl9 box, I don't get a perm denied:
|
What do you mean by 'rebuild of the VM'? My guess is that the issue has nothing to do with the version of swtpm but either libtpms (98%) or libtpms interface with openssl (2%). What I would do at this point is build libtpms v0.9.6 (from git) and make sure that it gets installed in the right place and overwrites the old version of libtpms (with which Win11 had issues). Then try Win11 with this. If success, then build libtpms v0.9.3 (from git) and install it and start VM again. If it fails at this point it's something in libtpms.
I never used gluster. If you could eliminate this variable maybe that would help. So, if vTPM is working while the VM is in UEFI (there's a menu there where you can make changes to the TPM config like the choice of active PCR banks) then that would eliminate gluster IMO. |
I definitely am able to mess with it in the UEFI menu -- I tried that at one point in my desperate search for a solution. ;) Rebuild the VM just means I'm deleting the VM completely and recreating it. This deletes the previous swtpm files and regenerates them all from scratch. I don't know that there's even any point to doing that -- but as you can probably tell I'm grasping at straws so trying everything. Plus I thought maybe there's a chance a different version of swtpm might build its files slightly differently. I'll poke around with libtpms some more and see if I come across anything. I also just put selinux into permissive mode to see if that was causing a problem but no... it is not. =/ |
It's quite possible that this is due to (incomplete) AppArmor profile on Ubuntu. |
Aha -- so it would seem: TBH the syntax looks sane to me. That all said -- I'm not sure print-states works exactly the same with 0.7.3 as it does with 0.8.x. If I put it in complain mode it looks like it's trying to take over the tpm instead of just looking at it:
It fails saying: So at least it didn't -really- mess with it. Anyway that was a bit of a tangent... Can you think of any good reason to downgrade the Rocky 9 box to 0.7.3 like Ubuntu has instead of sticking with 0.8.0? (or 0.8.2 which I temporarily updated to) |
Very random but -- every time I use wget on this host I'm getting:
(and then it proceeds to download what I asked for) That's from the host itself, not the VM. |
Iirc this is output from/due to the Intel TSS2 pkcs11 driver. |
So probably not related? Ok. Is there any reason at all I should continue to rebuild the tpm data files when doing this testing or is there practically no way that would be a problem? Like if I just do the equivalent of "stop swtpm, replace rpm, start swtpm" should that be enough to debug what I'm doing? It most certainly gets annoying waiting for windows 11 to reinstall =) |
Do not reinstall win11 just to test vTPM. You may need to figure out what the uuid is of your VM (virsh dumpxml ) and then remove the swtpm state files once you stopped the VM: |
You're not going to believe this but:
It was the 2%! I upgraded openssl from 3.0.7 to fedora 37's 3.0.9 and the problem went away! Specifically I grabbed: https://rpmfind.net/linux/fedora/linux/updates/37/Everything/x86_64/Packages/o/openssl-3.0.9-1.fc37.x86_64.rpm (and -libs of course) Thanks so much for pointing me in the right direction! And I hope this little adventure helps you in future reports that might come in about it. =) |
Great! So we now know what the problem and solution are. If it's OpenSSL I would tell your distro. |
Done: https://bugs.rockylinux.org/view.php?id=6931 |
You're right, RHEL 9.4 uses |
And from swtpm_setup.conf:
Sure doesn't LOOK like sha1 is enabled to me. How odd. (unless of course I'm misreading this) |
Hm. I don't know what this is. If SHA 1 was being used for TPM2 selftests, which it isn't, the TPM 2 would be in failure mode if it didn't work and you wouldn't have said "Get-TPM, tpm.msc, etc all show a happy healthy TPM chip". So I don't know what is trying to get a signature with SHA-1 here and fails. Maybe Windows is trying it and the TPM doesn't do it? |
Yeah it's strictly the Get-TpmSupportedFeature call that triggers the error -- what that's asking for I do not know but I'd guess that this is Windows' fault at some level. ;) but at least there's a workaround. I DO need to make sure setting that isn't going to interfere with our ability to join our active directory domain, but I doubt it will. I wonder if there's a way to see specifically what is being requested when Get-TpmSupportedFeature is run. |
Since libvirt won't provide logging level parameters to swtpm you have to hook gdb onto the swtpm process and run the following command before you start the command that's causing the failure. I hope you have control over the Win tool that's actually causing the issue so that the request and response become visible in the log:
Now swtpm writes all TPM commands and responses into You can then run PS: you may have to install debuginfo packages: |
Here is the log of what happened when I ran: Get-TpmSupportedFeature |
Looking for a failure on the responses I found this command/response pair here that indicates a failure (it continued afterwards without failures):
The command that failed is TPM_CC_CertifyCreation with failure 0x101:
It's not clear to me why it is failing. You could try to set a break point on |
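For readers following along in the swtpm log, here is a minimal sketch of how a TPM 2.0 response header can be decoded to spot a code like 0x101. It assumes the standard 10-byte header layout (2-byte tag, 4-byte size, 4-byte response code, all big-endian). The function names and the example bytes are hypothetical and are not taken from the attached log.

```python
import struct

# Small subset of TPM 2.0 response codes; extend as needed.
TPM_RC_NAMES = {
    0x000: "TPM_RC_SUCCESS",
    0x101: "TPM_RC_FAILURE",
}


def parse_response_header(raw: bytes):
    """Return (tag, size, response_code) from the first 10 bytes of a response."""
    if len(raw) < 10:
        raise ValueError("a TPM 2.0 response header is 10 bytes long")
    # Big-endian: 16-bit tag, 32-bit total size, 32-bit response code.
    tag, size, rc = struct.unpack(">HII", raw[:10])
    return tag, size, rc


def describe(rc: int) -> str:
    """Map a response code to a name, falling back to its hex value."""
    return TPM_RC_NAMES.get(rc, f"unknown (0x{rc:03x})")


# Hypothetical response bytes (NOT copied from the log in this issue):
# tag 0x8001, size 10, response code 0x101.
example = bytes.fromhex("80 01 00 00 00 0a 00 00 01 01")
tag, size, rc = parse_response_header(example)
print(hex(tag), size, describe(rc))
```

Running each response from the log through something like this makes it easier to pick out the one non-zero response code among many successful ones.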
It's been .. probably over a decade since I did any serious debugging with gdb so I'm a tad rusty. So I'm attaching here what I saw in case this means anything to you. After the last line it continued on without intervention. |
Thanks for the log but there's nothing in there that would indicate a failure let alone the reason for the failure... |
Hrm that's unfortunate. Well -- I mean from my perspective I'm "happy enough" with the solution. I'm happy to help you debug it more if you really want to, since I have an active way to test it, but if you'd rather let it be I'm ok with that too since I have a viable workaround. What do you think? |
Of course I would be curious what is causing the failure but you would have to step through the code, possibly multiple times, with an eye on the value returned from the function or the functions it calls, especially the ones further down the call stack doing the RSA or ECDSA signature. [I would first set a breakpoint in this function again and then set breakpoints after function calls and at the/each return statement. Once you are in this function you can use Though for now we know that enabling SHA1 signing for OpenSSL on the host resolves the issue. It seems to be related to RSA/ECDSA signing with SHA1 that OpenSSL refuses to do due to the policy (sha1 signing not being allowed anymore for some time now) - so in a way we already know where it's coming from. It would be unfortunate that Microsoft hasn't updated the tool and switched it to SHA256 signing. |
Ok -- well probably tomorrow I'll dive into that a bit more and see what I can figure out! |
Once you have entered |
Hey -- I haven't made much headway poking around in gdb so far. One interesting thing that occurred, however, is that Windows started giving me slightly more interesting information. I'm not really sure what triggered it -- but it says that the TPM failed to execute a command, for the following two ordinals: Not sure if that's helpful in any way -- and for all I know that's just the result of gdb holding up the response while I poke around. But I wanted to share it in case there's any aha type of moment. The only slightly interesting but probably not interesting at all thing I've come across so far is that hashAlg is set to 13. For some reason gdb seems to get to a point where, despite me 'step'ing forward, it suddenly lurches forward to continue. Seemingly out of nowhere. |
Oh and neither of these break points ever caught: |
The TPM commands may time out while you single step through them and the application or driver/OS may report this. |
Would it be helpful in any way for me to capture the same log level 10 output from a -successful- run with SHA1 enabled and upload that? |
Sure, you could do that. I would not expect any failures in such a log. |
CryptRsaSign and CryptEccSign should be correct. Otherwise maybe add a breakpoint in CryptSign also.
|
One thing I'm noticing right off the bat is that it's FAR longer. |
As far as CryptRsaSign and CryptEccSign -- I'm guessing it never makes it to those then. Hrm. I'll try CryptSign in a moment. |
There are no serious error messages in this log. What can be found are responses like these here where the TPM cannot store another object due to space restrictions. The client then does a TPM_SaveContext and tries again the failed command after creating space and then it works.
|
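The retry behaviour described in the comment above can be sketched roughly as follows. This is an illustration only: it assumes the warning in question is TPM_RC_OBJECT_MEMORY (0x902, i.e. RC_WARN 0x900 + 0x002), which the comment does not state explicitly, and `run_with_retry`, `fake_load`, and `fake_free` are hypothetical names standing in for the client and the context save it performs.

```python
TPM_RC_SUCCESS = 0x000
TPM_RC_OBJECT_MEMORY = 0x902  # assumed code: RC_WARN (0x900) + 0x002


def is_warning(rc: int) -> bool:
    """True for format-zero TPM 2.0 codes in the RC_WARN (0x900) range."""
    format_zero = (rc & 0x080) == 0
    return format_zero and (rc & 0x900) == 0x900


def run_with_retry(send_command, free_object_slot, max_attempts=2):
    """Call send_command(); on TPM_RC_OBJECT_MEMORY, free a slot and retry."""
    rc = send_command()
    attempts = 1
    while rc == TPM_RC_OBJECT_MEMORY and attempts < max_attempts:
        free_object_slot()  # hypothetical stand-in for saving a context out of the TPM
        rc = send_command()
        attempts += 1
    return rc


# Fake TPM with a single object slot that starts out occupied.
slots = {"used": 1, "capacity": 1}


def fake_load():
    """Pretend to load an object; fail with a warning when no slot is free."""
    if slots["used"] >= slots["capacity"]:
        return TPM_RC_OBJECT_MEMORY
    slots["used"] += 1
    return TPM_RC_SUCCESS


def fake_free():
    """Pretend to make room by releasing one slot."""
    slots["used"] -= 1


print(hex(run_with_retry(fake_load, fake_free)))
```

The point of the sketch is that warning-class codes such as this one are expected during normal operation and are recoverable, unlike the 0x101 failure seen in the earlier log.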
That's interesting -- that log is the "successful" run with sha1 enabled. Kind of surprised it has any failures at all. So this one has some minor little errors but no big deal. The 'bad' one fails on TPM2_CertifyCreation --- and I'm assuming that works fine in this second run? |
I was able to break at CryptSign btw -- any chance part of the problem with CryptRsaSign and CryptEccSign is that I'm running patched-by-redhat swtpm 0.8.0 and libtpms 0.9.1? Maybe those weren't there in those older versions? That all said I'm still not seeing anything that jumps out to me. There's so many checks and such that it's a little hard for me to follow without being more familiar with the code. (meaning I don't really know what to look for, what's "bad", that kinda thing) but I definitely don't see anything obvious happening. =/ |
Where did it go from there?
These functions have been around for quite a while... in fact CryptSign will call these OR call CryptHmacSign().
Once you break in CryptSign I would use 'display result' and then step through this function using 'next'. Hopefully it would show the value of result instead of indicating that it has been optimized out. |
I'm attaching a very long log of me poking around -- where you can clearly see CryptRsaSign being called so not sure why it didn't find it before. Afterwards I did a focused check using the method you described in case that is helpful: I did stop the "try3" one a little short but basically it proceeded back to main loop shortly after. But the result value appears to not have lined up with a success. |
not clear to me what's going on here ... |
Yeah ... if only we had the source to the microsoft TPM driver... <_< Well. I'm out of ideas. If I have any further ones I'll give them a shot and see what I can figure out, and if I have any solutions/suggestions/findings I'll share them with you in the future. I'm going to turn the ovirt stuff back over to my coworker so he can finish setting up a clean cluster and testing a few things. Thank you for working with me on this and maybe someday we'll figure out what is really going on or Microsoft will fix their driver. It hasn't been updated since 2006. At this point I suggest we close this issue as "has a workaround" basically. (I'll leave that to you -- in case you disagree and want to keep pursuing it =) ) |
Thanks for investigating. |
Describe the bug
Windows 11 is able to install just fine with swtpm backing it, and Get-TPM, tpm.msc, etc all show a happy healthy TPM chip, but when the SCCM agent tries to install (and retries over and over), we get a TPM error in the system log that repeats over and over as SCCM continues to try to access the virtual chip. Furthermore, Get-TpmSupportedFeature returns absolutely nothing, which I suspect is at the root of the problem, and each run of it generates an error.
Required: To Reproduce (without these steps your issue may be deleted)
Steps to reproduce the behavior/issue showing all commands on command line, needed XML or JSON (if necessary), etc.:
Note: This is going to be difficult unless you have an SCCM environment to test with. However if my suspicion is correct, we can simply test it by executing Get-TpmSupportedFeature
Expected behavior
It should return:
Desktop (please complete the following information):
Versions of relevant components
Log files
swtpmlog.txt
Please note I see zero errors from the host OS perspective... just from within Windows itself.
Event Viewer log entry reads:
The initialization of the Trusted Platform Module (TPM) failed. The TPM may be in failure mode. To allow diagnosis, contact the TPM manufacturer with the attached information.
With the following details:
swtpmwinevent.txt
Additional context
As far as I can tell, SCCM tries to access the TPM's features, finds nothing, and croaks out during initialization. It then retries immediately, failing over and over and over. A temporary workaround I have is to install the OS with the TPM assigned, then once it's installed 'rip out' the TPM in ovirt's configs. After that everything works fine. (Windows doesn't actually NEED the TPM unless you are doing something like bitlocker)