Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows 11 Get-TpmSupportedFeature fails triggering failures in system log #852

Closed
ncstate-daniel opened this issue May 29, 2024 · 59 comments
Labels
OpenSSL 3.0 Issue related to OpenSSL 3.0 Rocky Linux 9 Issue in Rocky Linux 9 SHA1-Signature Failure Signing a SHA1 fails Win11 Windows 11 Windows

Comments

@ncstate-daniel
Copy link

Describe the bug

Windows 11 is able to install just fine with swtpm backing it, and Get-TPM, tpm.msc, etc all show a happy healthy TPM chip, but when the SCCM agent tries to install (and retries over and over), we get a TPM error in the system log that repeats over and over as SCCM continues to try to access the virtual chip. Furthermore, Get-TpmSupportedFeature returns absolutely nothing, which I suspect is at the root of the problem, and each run of it generates an error.

Required: To Reproduce (without these steps your issue may be deleted)

Steps to reproduce the behavior/issue showing all commands on command line, needed XML or JSON (if necessary), etc.:

Note: This is going to be difficult unless you have an SCCM environment to test with. However if my suspicion is correct, we can simply test it by executing Get-TpmSupportedFeature

  1. Install a fresh Windows 11 machine under, in this case, ovirt under Rocky Linux 9 (including the expected os drivers for virtio and whatnot)
  2. Connect to your new Windows 11 machine and run Get-TpmSupportedFeature in powershell (it should return absolutely nothing, may even pause)
  3. Open Event Viewer and go to the System log (you should see a system error for TPM, attached details below)
  4. Repeat the powershell command multiple times and watch it generate new System logs

Expected behavior

It should return:

PS C:\Users\daniel.admin> get-tpmsupportedfeature
key attestation

Desktop (please complete the following information):

  • OS: Windows 11
  • Version 23H2

Versions of relevant components

  • swtpm: 0.7.3, 0.8.0, 0.8.2 (tried all three)
  • libtpms: 0.9.1
  • openssl: 3.0.7
  • gnutls: 3.8.3
  • ovirt: 4.5.6
  • Host OS: Rocky Linux 9

Log files
swtpmlog.txt
Please note I see zero errors from the host OS perspective... just from within Windows itself.

Event Viewer log entry reads:
The initialization of the Trusted Platform Module (TPM) failed. The TPM may be in failure mode. To allow diagnosis, contact the TPM manufacturer with the attached information.
With the following details:
swtpmwinevent.txt

Additional context
As far as I can tell, SCCM tries to access the TPM's features, finds nothing, and croaks out during initialization. It then retries immediately, failing over and over and over. A temporary workaround I have is to install the OS with the TPM assigned, then once it's installed 'rip out' the TPM in ovirt's configs. After that everything works fine. (Windows doesn't actually NEED the TPM unless you are doing something like bitlocker)

@ncstate-daniel ncstate-daniel changed the title Windows 11 w/ SCCM spamming failures in system log Windows 11 Get-TpmSupportedFeature fails triggering failures in system log May 29, 2024
@ncstate-daniel
Copy link
Author

Updated the title to be more focused on the direct issue at hand

@stefanberger
Copy link
Owner

I will have a look at it when I get a chance. Had you tried previous Win11 versions?

@ncstate-daniel
Copy link
Author

Howdy! I haven't tried previous win11 versions -- however I do have some new, semi bizarre, news. One of my coworkers installed Windows 11, using swtpm 0.7.3, under Ubuntu 24.04 and somehow it's working fine in that scenario. So tomorrow I will gatherer the component info from above for that OS and anything else we can discover. This may imply the problem lies in EL9 or in some configuration ovirt is using. (his setup is just libvirt/kvm based, doesn't involved ovirt)

@ncstate-daniel
Copy link
Author

That definitely sounds promising! I won't be able to test upgrading libtpm and reactivating tpm on windows until tomorrow -- but we did get the component versions from Ubuntu 24.04:

swtpm: 0.7.3-0ubuntu5
libtpms: 0.9.3-0ubuntu4
openssl: 3.0.13-0ubuntu3.1
gnutls: 3.8.3-1.1ubuntu3.1
ovirt: N/A
Host OS: Ubuntu 24.04

I'll try upgrading libtpms on Rocky 9 tomorrow to see if it makes a difference!

Is there anything in the localca configs that could have any likelihood of having an effect btw? Nothing stood out to me.

@stefanberger
Copy link
Owner

stefanberger commented May 30, 2024

Especially compare the libtpms versions since this is the vTPM implementation. v0.9.1 is quite old and you seem to be using it. v0.9.6 is recent and may have some of the necessary bugfixes:

version 0.9.7 [not yet released]:
  - tpm2: Fix issue in CryptParameterEncryption() (TPM 2 errata v1.4)
  - tpm2: Sync fix in TPM2_PolicyAuthorize() with upstream
  - tpm2: Sync CryptParameterDecrypt implementation  with upstream
  - tpm2: Fix issue related to CryptGenerateKeyDes (TPM 2 errata v1.4)
  - tpm2: Check size of TPM2B_NAME buffer before reading 2 bytes from it

version 0.9.6:
  - tpm2: Check size of buffer before accessing it (CVE-2023-1017 & -1018)

version 0.9.5:
  - tpm2: Do not set RSA_FLAG_NO_BLINDING on RSA keys anymore
  - tpm2: Fix a potential overflow expression (coverity)
  - tpm2: Fix size check in CryptSecretDecrypt

version 0.9.4:
  - tpm: #undef printf in case it is #define'd (OSS-Fuzz)
  - tpm2: Check return code of BN_div()
  - tpm2: Initialize variables due to gcc complaint (s390x, false positive)
  - tpm12: Initialize variables due to gcc complaint (s390x, false positive)
  - build-sys: Fix configure script to support _FORTIFY_SOURCE=3

version 0.9.3:
  - build-sys: Add probing for -fstack-protector
  - tpm2: Do not call EVP_PKEY_CTX_set0_rsa_oaep_label() for label of size
    (OSSL 3)

version 0.9.2:
  - tpm2: When writing state initialize s_ContextSlotMask if not set

@stefanberger
Copy link
Owner

Is there anything in the localca configs that could have any likelihood of having an effect btw? Nothing stood out to me.

Unlikely I would say.

@ncstate-daniel
Copy link
Author

Mostly just adding some notes here while I wait for a reinstall.

ovirt is running swtpm like this:
/usr/bin/swtpm socket --ctrl type=unixio,path=/run/libvirt/qemu/swtpm/20-mesmer-wn11.chass.nc-swtpm.sock,mode=0600 --tpmstate dir=/var/lib/libvirt/swtpm/fb32dca3-4e05-43f6-9482-da650272ce9d/tpm2,mode=0600 --log file=/var/log/swtpm/libvirt/qemu/mesmer-wn11.chass.ncsu.edu-swtpm.log --terminate --tpm2

Simply updating libtpms to 0.9.6 did not immediately help. (I reenabled the TPM in ovirt and rebooted the machine and immediately the errors came back) However, I have now reset swtpm to the version that comes with the OS, 0.8.0, and made sure libtpms was still at 0.9.6, and performed a full recreation of the VM to make sure everything is fresh and it is in the process of reinstalling right now.

@ncstate-daniel
Copy link
Author

On the Ubuntu 24.04 box that is working, the call is:
/usr/bin/swtpm socket --ctrl type=unixio,path=/run/libvirt/qemu/swtpm/8-skeksis-swtpm.sock,mode=0600 --tpmstate dir=/var/lib/libvirt/swtpm/af05c934-da3b-4fd1-9ca6-e7eb6003d168/tpm2,mode=0600 --log file=/var/log/swtpm/libvirt/qemu/skeksis-swtpm.log --terminate --tpm2
I see no real difference there.

One difference is that one is storing it's state and such on gluster while the other is a simple direct file system (xfs). Not sure why that would matter but bringing it up.

Just to compare again -- swtpm configs in etc are---
one that works:

root@mystic:/home/foreman_re# cat /etc/swtpm*
statedir = /var/lib/swtpm-localca
signingkey = /var/lib/swtpm-localca/signkey.pem
issuercert = /var/lib/swtpm-localca/issuercert.pem
certserial = /var/lib/swtpm-localca/certserial
--platform-manufacturer Fedora
--platform-version 2.1
--platform-model QEMU
# Program invoked for creating certificates
create_certs_tool= /usr/bin/swtpm_localca
create_certs_tool_config = /etc/swtpm-localca.conf
create_certs_tool_options = /etc/swtpm-localca.options
# Comma-separated list (no spaces) of PCR banks to activate by default
active_pcr_banks = sha256

one that doesn't:

[root@hss53vm foreman_re]# cat /etc/swtpm*
statedir = /var/lib/swtpm-localca
signingkey = /var/lib/swtpm-localca/signkey.pem
issuercert = /var/lib/swtpm-localca/issuercert.pem
certserial = /var/lib/swtpm-localca/certserial
--platform-manufacturer Fedora
--platform-version 2.1
--platform-model QEMU
# Program invoked for creating certificates
create_certs_tool= /usr/bin/swtpm_localca
create_certs_tool_config = /etc/swtpm-localca.conf
create_certs_tool_options = /etc/swtpm-localca.options
# Comma-separated list (no spaces) of PCR banks to activate by default
active_pcr_banks = sha256

basically precisely the same.

@ncstate-daniel
Copy link
Author

Another data point -- this also worked just fine under Ubuntu 22.04 with the following components:
swtpm: 0.6.3-0ubuntu3.2
libtpms: 0.9.3-0ubuntu1.22.04.1
openssl: 3.0.2-0ubuntu1.15
gnutls: 3.7.3-4ubuntu1.5
ovirt: N/A
Host OS: Ubuntu 22.02

Again I'm noticing the newer libtpms as you mentioned previously. =)

@ncstate-daniel
Copy link
Author

ncstate-daniel commented May 30, 2024

Ok after rebuild of the VM, I'm still getting the same TPM error. I'm starting to wonder if Gluster is the problem somehow. I had found an issue in proxmox's issue tracker where swtpm was having problems when gluster was involved, but that appears to have something to do with how proxmox references their storage rather than anything to do with the storage itself. Any ideas of what else I could try? (and can you think of any reason gluster might cause a problem?)

@ncstate-daniel
Copy link
Author

Well I just looked back at the command line under ovirt and -- none of those paths are in gluster so that shoots down that idea. =/

@ncstate-daniel
Copy link
Author

ncstate-daniel commented May 30, 2024

Edit: duh -- this is because I connected to it remotely. Nothing truly new here.

Some new information included in the event log that I hadn't seen before:

The description for Event ID 27 from source TPM cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event: 

30
405
1229081856
1398218784
542396493
0
0
1
538513443
1455670
0

The system cannot find the file specified

@ncstate-daniel
Copy link
Author

Interesting. As root on the ub24 box, if I try to print states of a running tpm I get:

root@mystic:/home/foreman_re# swtpm socket --print-states --tpmstate dir=/var/lib/libvirt/swtpm/af05c934-da3b-4fd1-9ca6-e7eb6003d168/tpm2,mode=0600
swtpm: SWTPM_NVRAM_Lock_Dir: Could not open lockfile: Permission denied

On the rl9 box, I don't get a perm denied:

[root@hss52vm x86_64]# swtpm socket --print-states --tpmstate dir=/var/lib/libvirt/swtpm/fb32dca3-4e05-43f6-9482-da650272ce9d/tpm2,mode=0600
{ "type": "swtpm", "states": [] }

@stefanberger
Copy link
Owner

Ok after rebuild of the VM, I'm still getting the same TPM error.

What do you mean by 'rebuild of the VM'?

My guess is that the issue has nothing to do with the version of swtpm but either libtpms (98%) or libtpms interface with openssl (2%).

What I would do at this point is build libtpms v0.9.6 (from git) and make sure that it gets installed in the right place and overwrites the old version of libtpms (with which Win11 had issues). Then try Win11 with this. If success, then build libtpms v0.9.3 (from git) and install it and start VM again. If it fails at this point it's something in libtpms.

I'm starting to wonder if Gluster is the problem somehow. I had found an issue in proxmox's issue tracker where swtpm was having problems when gluster was involved, but that appears to have something to do with how proxmox references their storage rather than anything to do with the storage itself. Any ideas of what else I could try? (and can you think of any reason gluster might cause a problem?)

I never used gluster. If you could eliminate this variable maybe that would help. So, if vTPM is working while the VM is in UEFI (there's a menu there where you can make changes to the TPM config like the choice of active PCR banks) then that would eliminate gluster IMO.

@ncstate-daniel
Copy link
Author

I definitely am able to mess with it in the UEFI menu -- I tried that at one point in my desperate search for a solution. ;)

Rebuild the VM just means I'm deleting the VM completely and recreating it. This deletes the previous swtpm files and regenerates them all from scratch. I don't know that there's even any point to doing that -- but as you can probably tell I'm grasping at straws so trying everything. Plus I thought maybe there's a chance a different version of swtpm might build its files slightly differently.

I'll poke around with libtpms some more and see if I come across anything. I also just put selinux into permissive mode to see if that was causing a problem but no... it is not. =/

@stefanberger
Copy link
Owner

Interesting. As root on the ub24 box, if I try to print states of a running tpm I get:

root@mystic:/home/foreman_re# swtpm socket --print-states --tpmstate dir=/var/lib/libvirt/swtpm/af05c934-da3b-4fd1-9ca6-e7eb6003d168/tpm2,mode=0600
swtpm: SWTPM_NVRAM_Lock_Dir: Could not open lockfile: Permission denied

It's quite possible that this is due to (incomplete) AppArmor profile on Ubuntu.

@ncstate-daniel
Copy link
Author

Aha -- so it would seem:
[609320.685646] audit: type=1400 audit(1717082960.702:4389): apparmor="DENIED" operation="open" class="file" profile="swtpm" name="/var/lib/libvirt/swtpm/af05c934-da3b-4fd1-9ca6-e7eb6003d168/tpm2/.lock" pid=53739 comm="swtpm" requested_mask="wc" denied_mask="wc" fsuid=0 ouid=131

TBH the syntax looks sane to me. That all said -- I'm not sure print-states works exactly the same with 0.7.3 as it idoes 0.8.x. If I put it in complain mode it looks like it's trying to take over the tpm instead of just looking at it:

[610152.692707] audit: type=1400 audit(1717083792.711:4427): apparmor="ALLOWED" operation="open" class="file" profile="swtpm" name="/var/lib/libvirt/swtpm/af05c934-da3b-4fd1-9ca6-e7eb6003d168/tpm2/.lock" pid=54072 comm="swtpm" requested_mask="wc" denied_mask="wc" fsuid=0 ouid=131
[610152.692723] audit: type=1400 audit(1717083792.711:4428): apparmor="ALLOWED" operation="truncate" class="file" profile="swtpm" name="/var/lib/libvirt/swtpm/af05c934-da3b-4fd1-9ca6-e7eb6003d168/tpm2/.lock" pid=54072 comm="swtpm" requested_mask="w" denied_mask="w" fsuid=0 ouid=131
[610152.692765] audit: type=1400 audit(1717083792.711:4429): apparmor="ALLOWED" operation="file_lock" class="file" profile="swtpm" name="/var/lib/libvirt/swtpm/af05c934-da3b-4fd1-9ca6-e7eb6003d168/tpm2/.lock" pid=54072 comm="swtpm" requested_mask="wk" denied_mask="wk" fsuid=0 ouid=131

It fails saying:
swtpm: SWTPM_NVRAM_Lock_Dir: Could not lock access to lockfile: Resource temporarily unavailable

So at least it didn't -really- mess with it.

Anyway that was a bit of a tangent... Can you think of any good reason to downgrade the Rocky 9 box to 0.7.3 like Ubuntu has instead of sticking with 0.8.0? (or 0.8.2 which I temporarily updated to)

@ncstate-daniel
Copy link
Author

Very random but -- every time I use wget on this host I'm getting:

ERROR:tcti:src/tss2-tcti/tctildr-dl.c:154:tcti_from_file() Could not initialize TCTI file: libtss2-tcti-tabrmd.so.0
ERROR:tcti:src/tss2-tcti/tcti-device.c:452:Tss2_Tcti_Device_Init() Failed to open specified TCTI device file /dev/tpmrm0: No such file or directory
ERROR:tcti:src/tss2-tcti/tctildr-dl.c:154:tcti_from_file() Could not initialize TCTI file: libtss2-tcti-device.so.0
ERROR:tcti:src/tss2-tcti/tcti-device.c:452:Tss2_Tcti_Device_Init() Failed to open specified TCTI device file /dev/tpm0: No such file or directory
ERROR:tcti:src/tss2-tcti/tctildr-dl.c:154:tcti_from_file() Could not initialize TCTI file: libtss2-tcti-device.so.0
WARNING:tcti:src/util/io.c:262:socket_connect() Failed to connect to host 127.0.0.1, port 2321: errno 111: Connection refused
ERROR:tcti:src/tss2-tcti/tcti-swtpm.c:614:Tss2_Tcti_Swtpm_Init() Cannot connect to swtpm TPM socket
ERROR:tcti:src/tss2-tcti/tctildr-dl.c:154:tcti_from_file() Could not initialize TCTI file: libtss2-tcti-swtpm.so.0
WARNING:tcti:src/util/io.c:262:socket_connect() Failed to connect to host 127.0.0.1, port 2321: errno 111: Connection refused
ERROR:tcti:src/tss2-tcti/tctildr-dl.c:154:tcti_from_file() Could not initialize TCTI file: libtss2-tcti-mssim.so.0
ERROR:tcti:src/tss2-tcti/tctildr-dl.c:254:tctildr_get_default() No standard TCTI could be loaded
ERROR:tcti:src/tss2-tcti/tctildr.c:428:Tss2_TctiLdr_Initialize_Ex() Failed to instantiate TCTI
ERROR:fapi:src/tss2-fapi/api/Fapi_Initialize.c:227:Fapi_Initialize_Finish() Initializing TCTI. ErrorCode (0x000a000a)

(and then it proceeds to download what I asked for)

That's from the host itself, not the VM.

@stefanberger
Copy link
Owner

Very random but -- every time I use wget on this host I'm getting:

ERROR:tcti:src/tss2-tcti/tctildr-dl.c:154:tcti_from_file() Could not initialize TCTI file: libtss2-tcti-tabrmd.so.0
ERROR:tcti:src/tss2-tcti/tcti-device.c:452:Tss2_Tcti_Device_Init() Failed to open specified TCTI device file /dev/tpmrm0: No such file or directory
ERROR:tcti:src/tss2-tcti/tctildr-dl.c:154:tcti_from_file() Could not initialize TCTI file: libtss2-tcti-device.so.0
ERROR:tcti:src/tss2-tcti/tcti-device.c:452:Tss2_Tcti_Device_Init() Failed to open specified TCTI device file /dev/tpm0: No such file or directory
ERROR:tcti:src/tss2-tcti/tctildr-dl.c:154:tcti_from_file() Could not initialize TCTI file: libtss2-tcti-device.so.0
WARNING:tcti:src/util/io.c:262:socket_connect() Failed to connect to host 127.0.0.1, port 2321: errno 111: Connection refused
ERROR:tcti:src/tss2-tcti/tcti-swtpm.c:614:Tss2_Tcti_Swtpm_Init() Cannot connect to swtpm TPM socket
ERROR:tcti:src/tss2-tcti/tctildr-dl.c:154:tcti_from_file() Could not initialize TCTI file: libtss2-tcti-swtpm.so.0
WARNING:tcti:src/util/io.c:262:socket_connect() Failed to connect to host 127.0.0.1, port 2321: errno 111: Connection refused
ERROR:tcti:src/tss2-tcti/tctildr-dl.c:154:tcti_from_file() Could not initialize TCTI file: libtss2-tcti-mssim.so.0
ERROR:tcti:src/tss2-tcti/tctildr-dl.c:254:tctildr_get_default() No standard TCTI could be loaded
ERROR:tcti:src/tss2-tcti/tctildr.c:428:Tss2_TctiLdr_Initialize_Ex() Failed to instantiate TCTI
ERROR:fapi:src/tss2-fapi/api/Fapi_Initialize.c:227:Fapi_Initialize_Finish() Initializing TCTI. ErrorCode (0x000a000a)

Iirc this is output from/due to the Intel TSS2 pkcs11 driver.

@ncstate-daniel
Copy link
Author

So probably not related? Ok.

Is there any reason at all I should continue to rebuild the tpm data files when doing this testing or is there practically no way that would be a problem? Like if I just do the equiv of "stop swtpm, replace rpm, start swtpm" should that be enough to debug what i'm doing? it most certainly gets annoying waiting for windows 11 to reinstall =)

@stefanberger
Copy link
Owner

stefanberger commented May 30, 2024

So probably not related? Ok.

Is there any reason at all I should continue to rebuild the tpm data files when doing this testing or is there practically no way that would be a problem? Like if I just do the equiv of "stop swtpm, replace rpm, start swtpm" should that be enough to debug what i'm doing? it most certainly gets annoying waiting for windows 11 to reinstall =)

Do not reinstall win11 just to test vTPM. You may need to figure out what the uuid is of your VM (virsh dumpxml ) and then remove the swtpm state files once you stopped the VM: rm -rf /var/lib/libvirt/swtpm/<uuid>. Once you restart the VM swtpm_localca will be called and create the initial state. You should see this in the log (/var/log/swtpm/libvirt/qemu/<vmname>-swtpm.log).

@ncstate-daniel
Copy link
Author

ncstate-daniel commented May 30, 2024

You're not going to believe this but:

My guess is that the issue has nothing to do with the version of swtpm but either libtpms (98%) or libtpms interface with openssl (2%).

It was the 2%! I upgraded openssl from 3.0.7 to fedora 37's 3.0.9 and the problem went away!

Specifically I grabbed: https://rpmfind.net/linux/fedora/linux/updates/37/Everything/x86_64/Packages/o/openssl-3.0.9-1.fc37.x86_64.rpm (and -libs of course)

Thanks so much for pointing me in the right direction! And I hope this little adventure helps you in future reports that might come in about it. =)

@stefanberger
Copy link
Owner

You're not going to believe this but:

My guess is that the issue has nothing to do with the version of swtpm but either libtpms (98%) or libtpms interface with openssl (2%).

It was the 2%! I upgraded openssl from 3.0.7 to fedora 37's 3.0.9 and the problem went away!

Great! So we now know what the problem and solution is. If it's OpenSSL I would tell your distro.
Ideally your distro would also update libtpms after the CVE fixes etc. that went in in the meantime.

@stefanberger stefanberger added OpenSSL 3.0 Issue related to OpenSSL 3.0 Rocky Linux 9 Issue in Rocky Linux 9 labels May 30, 2024
@ncstate-daniel
Copy link
Author

Done: https://bugs.rockylinux.org/view.php?id=6931
I don't know if they'd be willing to upgrade beyond what RHEL9 provides, so I may need to take it up to Red Hat instead, but figured I would start from here and see where things go.

@stefanberger
Copy link
Owner

Done: https://bugs.rockylinux.org/view.php?id=6931
I don't know if they'd be willing to upgrade beyond what RHEL9 provides, so I may need to take it up to Red Hat instead, but figured I would start from here and see where things go.

You're right, RHEL 9.4 uses openssl-3.0.7-27.el9.x86_64.rpm. Strange but I haven't heard anything about this so far. Maybe they applied some necessary patches 'hidden' in release version '-27'.

@ncstate-daniel
Copy link
Author

ncstate-daniel commented Jun 19, 2024

Screenshot 2024-06-19 110155

And from swtpm_setup.conf:

# Program invoked for creating certificates
create_certs_tool= /usr/bin/swtpm_localca
create_certs_tool_config = /etc/swtpm-localca.conf
create_certs_tool_options = /etc/swtpm-localca.options
# Comma-separated list (no spaces) of PCR banks to activate by default
active_pcr_banks = sha256

Sure doesn't LOOK like sha1 is enabled to me. How odd. (unless of course I'm misreading this)

@stefanberger
Copy link
Owner

Hm. I don't know what this is. If SHA 1 was being used for TPM2 selftests, which it isn't, the TPM 2 would be in failure mode if it didn't work and you wouldn't have said "Get-TPM, tpm.msc, etc all show a happy healthy TPM chip". So I don't know what is trying to get a signature with SHA-1 here and fails. Maybe Windows is trying it and the TPM doesn't do it?

@ncstate-daniel
Copy link
Author

Yeah it's strictly the Get-TpmSupportedFeature call that triggers the error -- what that's asking for I do not know but I'd guess that this is Window's fault at some level. ;) but at least there's a workaround. I DO need to make sure setting that isn't going to interfere with our ability to join our active directory domain, but I doubt it will. I wonder if there's a way to see specifically what is being requested when get-tpmsupportedfeature is run.

@stefanberger
Copy link
Owner

stefanberger commented Jun 19, 2024

Since libvirt won't provide logging level parameters to swtpm you have to hook gdb onto the swtpm process and run the following command before you start the command that's causing the failure. I hope you have control over the Win tool that's actually causing the issue so that the request and response become visible in the log:

$ sudo gdb pid 1321582
[...]
(gdb) call (int)log_set_level(10)
$1 = 0
(gdb) c

Now swtpm writes all TPM commands and responses into /var/log/swtpm/libvirt/qemu/<vm name>-swtpm.log.

You can then run call (int)log_set_level(0) to turn off the logging again.

PS: you may have to install debuginfo packages: sudo dnf debuginfo-install swtpm libtpms.

@ncstate-daniel
Copy link
Author

swtpm_log_win11_tpm.txt

Here is the log of what happened when I ran: Get-TpmSupportedFeature

@stefanberger
Copy link
Owner

Looking for a failure on the responses I found this command/response pair here that indicates a failure (it continued afterwards without failures):

 SWTPM_IO_Read: length 225
 80 02 00 00 00 E1 00 00 01 4A 80 00 00 02 80 00
 00 02 00 00 00 49 02 00 00 00 00 20 47 B6 8C 4E
 73 6D 5B EB 8E FD C8 2F 28 CF FA 90 B3 62 F9 E9
 55 5A 8D F2 5D 64 E7 25 8A 82 A9 C7 00 00 20 54
 1D 28 84 3A 59 71 B3 DC 9C E9 DC B4 44 2B 0F 09
 A9 38 DD 80 7E B9 AD 0D 06 98 E7 22 75 F7 81 00
 14 3C 86 1B 8F BA 1E 7D 9E B3 63 00 66 E8 A4 EB
 6E C3 03 D5 7B 00 20 02 B3 D6 FC 4F 8B 2C A5 E5
 EF BE 23 88 B4 9E 85 1A 7A 69 7E 07 20 FF 73 89
 BD 7A 6E 8E 3A AC AD 00 10 80 21 40 00 00 01 00
 40 C2 42 6B 94 F4 7C FB DB A7 13 CD 9A 56 5E 5C
 F1 78 0E 0D 86 FC 94 5C 1B 8A F2 B3 DF AC 0F 9F
 27 10 41 16 13 92 14 A2 6B 2F 0F BF 0D B8 78 76
 FB 7F 04 38 17 93 1C CA 6E 26 A7 CB 59 82 B7 49
 D1
 SWTPM_IO_Write: length 10
 80 01 00 00 00 0A 00 00 01 01

The command that failed is TPM_CC_CertifyCreation with failure 0x101:

$ tssreturncode 0x101
TPM_RC_FAILURE - commands not being accepted because of a TPM failure
 80 02 00 00 00 E1 00 00 01 4A                      header: CC = TPM_CC_CertifyCreation
 80 00 00 02                                        handle of previously loaded key
 80 00 00 02                                        handle of previously loaded key

 00 00 00 49                                        authSize 
 
 02 00 00 00                                        sessionHandle

 00 20 47 B6 8C 4E                                  32-byte TPM2B buffer -- nonce?
 73 6D 5B EB 8E FD C8 2F 28 CF FA 90 B3 62 F9 E9
 55 5A 8D F2 5D 64 E7 25 8A 82 A9 C7 
 
 00                                                 sessionAttributes

 00 20 54                                           32-byte TPM2B buffer -- hmac?
 1D 28 84 3A 59 71 B3 DC 9C E9 DC B4 44 2B 0F 09
 A9 38 DD 80 7E B9 AD 0D 06 98 E7 22 75 F7 81

 00                                                 20-byte TPM2B_DATA qualifyingHash
 14 3C 86 1B 8F BA 1E 7D 9E B3 63 00 66 E8 A4 EB
 6E C3 03 D5 7B 

 00 20 02 B3 D6 FC 4F 8B 2C A5 E5                   32-byte TPM2B_DIGEST creationHash
 EF BE 23 88 B4 9E 85 1A 7A 69 7E 07 20 FF 73 89
 BD 7A 6E 8E 3A AC AD 

 00 10                                              TPM_ALG_NULL	   -- inScheme

 80 21                                              TPM_ST_CREATION    -- start of creationTicket

 40 00 00 01                                        TPM_RH_OWNER (hierarchy)
 00                                                 64-byte TPM2B_DIGEST
 40 C2 42 6B 94 F4 7C FB DB A7 13 CD 9A 56 5E 5C
 F1 78 0E 0D 86 FC 94 5C 1B 8A F2 B3 DF AC 0F 9F
 27 10 41 16 13 92 14 A2 6B 2F 0F BF 0D B8 78 76
 FB 7F 04 38 17 93 1C CA 6E 26 A7 CB 59 82 B7 49
 D1

It's not clear to me why it is failing. You could try to set a break point on TPM2_CertifyCreation and single-step through it.

@ncstate-daniel
Copy link
Author

It's been .. probably over a decade since I did any serious debugging with gdb so I'm a tad rusty. So I'm attaching here what I saw in case this means anything to you. After the last line it continued on without intervention.
swtpm_log_trace.txt

@stefanberger
Copy link
Owner

Thanks for the log but there's nothing in there that would indicate a failure let alone the reason for the failure...

@ncstate-daniel
Copy link
Author

Hrm that's unfortunate. Well -- I mean from my perspective I'm "happy enough" with the solution. I'm happy to help you debug it more if you really want to, since I have an active way to test it, but if you'd rather let it be I'm ok with that too since I have a viable workaround. What do you think?

@stefanberger
Copy link
Owner

Of course I would be curious what is causing the failure but you would have to step through the code, possibly multiple times, with an eye on the value returned from the function or the functions it calls, especially the ones further down the call stack doing the RSA or ECDSA signature.

[I would first set a breakpoint in this function again and then set breakpoints after function calls and at the/each return statement. Once you are in this function you can use list to show the function and break <line number> to set a breakpoint at a specific line. And then display $ret or print $ret could show you what the value of ret is, unless it has been optimized out. ]

Though for now we know that enabling SHA1 signing for OpenSSL on the host resolves the issue. It seems to be related to RSA/ECDSA signing with SHA1 that OpenSSL refuses to do due to the policy (sha1 signing not being allowed anymore for some time now) - so in a way we already know where it's coming from. It would be unfortunate that Microsoft hasn't updated the tool and switched it to SHA256 signing.

@ncstate-daniel
Copy link
Author

Ok -- well probably tomorrow I'll dive into that a bit more and see what I can figure out!

@stefanberger
Copy link
Owner

Ok -- well probably tomorrow I'll dive into that a bit more and see what I can figure out!

Once you have entered TPM2_CertifyCreation you could set breakpoints at CryptRsaSign and CryptEccSign and issue continue and it will probably then stop next in either one of them and you would single step through those with an eye on return values...

@ncstate-daniel
Copy link
Author

Hey -- I haven't made much of any headway poking around in gdb so far. One interesting thing that occurred, however, is that Windows started giving me slightly more interesting information. I'm not real sure what triggered it -- but it says that the TPM failed to execute a command, for the following two ordinals:
378
374

Not sure if that's helpful in any way -- and for all i know that's just the result of gdb holding up the response while i poke around. But I wanted to share it in case there's any aha type of moment.

The only slightly interesting but probably not interesting at all thing I've come across so far is that hashAlg is set to 13.

For some reason gdb seems to get to a point where, despite me 'step'ing forward, it suddenly lurches forward to continue. Seemingly out of nowhere.

@ncstate-daniel
Copy link
Author

Oh and neither of these break points ever caught:
CryptRsaSign and CryptEccSign
That said:
(gdb) break CryptRsaSign
Function "CryptRsaSign" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 2 (CryptRsaSign) pending.
(gdb) break CryptEccSign
Function "CryptEccSign" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 3 (CryptEccSign) pending.

@stefanberger
Copy link
Owner

The TPM commands may time out while you single step through them and the application oR driver/OS may report this.

@ncstate-daniel
Copy link
Author

Would it be helpful in any way for me to capture the same log level 10 output from a -successful- run with SHA1 enabled and upload that?

@stefanberger
Copy link
Owner

Would it be helpful in any way for me to capture the same log level 10 output from a -successful- run with SHA1 enabled and upload that?

Sure, you could do that. I would not expect any failures in such a log.

@stefanberger
Copy link
Owner

CryptRsaSign and CrytptEccSign should be correct. Otherwise maybe add a breakpoint in CryptSign also.

grep -r -E "(CryptRsaSign|CryptEccSign)" src/tpm2/crypto/openssl/
src/tpm2/crypto/openssl/consttime.txt:CryptEccSignature.c:
src/tpm2/crypto/openssl/CryptEccSignature.c:/*            $Id: CryptEccSignature.c 1658 2021-01-22 23:14:01Z kgoldman $      */
src/tpm2/crypto/openssl/CryptEccSignature.c:/* 10.2.12 CryptEccSignature.c */
src/tpm2/crypto/openssl/CryptEccSignature.c:#include "CryptEccSignature_fp.h"
src/tpm2/crypto/openssl/CryptEccSignature.c:/* 10.2.12.3.6 CryptEccSign() */
src/tpm2/crypto/openssl/CryptEccSignature.c:CryptEccSign(
src/tpm2/crypto/openssl/CryptRsa.c:/* 10.2.17.4.17 CryptRsaSign() */
src/tpm2/crypto/openssl/CryptRsa.c:CryptRsaSign(
src/tpm2/crypto/openssl/CryptRsa.c:CryptRsaSign(

@ncstate-daniel
Copy link
Author

One thing I'm noticing right off the bat is that it's FAR longer.

swtpm_log_win11_working_tpm.txt

@ncstate-daniel
Copy link
Author

As far as CryptRsaSign and CrytptEccSign -- I'm guessing it never makes it to those then. Hrm. I'll try CryptSign in a moment.

@stefanberger
Copy link
Owner

One thing I'm noticing right off the bat is that it's FAR longer.

swtpm_log_win11_working_tpm.txt

There are no serious error messages in this log. What can be found are responses like these here where the TPM cannot store another object due to space restrictions. The client then does a TPM_SaveContext and tries again the failed command after creating space and then it works.

 SWTPM_IO_Write: length 10
 80 01 00 00 00 0A 00 00 09 02

$ tssreturncode 0x902
TPM_RC_OBJECT_MEMORY - out of memory for object contexts

@ncstate-daniel
Copy link
Author

That's interesting -- that log is the "successful" run with sha1 enabled. Kind of surprised it has any failures at all.

So this one has some minor little errors but no big deal. The 'bad' one fails on TPM2_CertifyCreation --- and i'm assuming works fine in this second run?

@ncstate-daniel
Copy link
Author

I was able to break at CryptSign btw -- any chance part of the problem with CryptRsaSign and CrytptEccSign is that I'm running patched-by-redhat swtpm 0.8.0 and libtpms 0.9.1? Maybe those weren't there in those older versions?

That all said I'm still not seeing anything that jumps out to me. There's so many checks and such that it's a little hard for me to follow without being more familiar with the code. (meaning I don't really know what to look for, what's "bad", that kinda thing) but I definitely don't see anything obvious happening. =/

@stefanberger
Copy link
Owner

I was able to break at CryptSign btw

Where did it go from there?

-- any chance part of the problem with CryptRsaSign and CrytptEccSign is that I'm running patched-by-redhat swtpm 0.8.0 and libtpms 0.9.1? Maybe those weren't there in those older versions?

These functions have been around for quite a while... in fact CryptSign will call these OR call CryptHmacSign().

That all said I'm still not seeing anything that jumps out to me. There's so many checks and such that it's a little hard for me to follow without being more familiar with the code. (meaning I don't really know what to look for, what's "bad", that kinda thing) but I definitely don't see anything obvious happening. =/

Once you break in CryptSign I would use 'display result' and then step through this function using 'next'. Hopefully it would show the value of result instead of indicating that it has been optimized out.

@ncstate-daniel
Copy link
Author

I'm attaching a very long log of me poking around -- where you can clearly see CryptRsaSign being called so not sure why it didn't find it before.

swtpm_log_trace_try2.txt

Afterwards I did a focused check using the method you descripted in case that is helpful:

swtpm_log_trace_try3.txt

I did stop the "try3" one a little short but basically it proceeded back to main loop shortly after. But the result value appears to not have lined up with a success.

@stefanberger
Copy link
Owner

not clear to me what's going on here ...

@ncstate-daniel
Copy link
Author

Yeah ... if only we had the source to the microsoft TPM driver... <_<

Well. I'm out of ideas. If I have any further ones I'll give them a shot and see what I can figure out, and if I have any solutions/suggestions/findings I'll share them with you in the future. I'm going to turn the ovirt stuff back over to my coworker so he can finish setting up a clean cluster and testing a few things.

Thank you for working with me on this and maybe someday we'll figure out what is really going on or Microsoft will fix their driver. It hasn't been updated since 2006.

At this point I suggest we close this issue as "has a workaround" basically. (I'll leave that to you -- in case you disagree and want to keep pursuing it =) )

@stefanberger stefanberger added the SHA1-Signature Failure Signing a SHA1 fails label Jun 21, 2024
@stefanberger
Copy link
Owner

Thanks for investigating.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OpenSSL 3.0 Issue related to OpenSSL 3.0 Rocky Linux 9 Issue in Rocky Linux 9 SHA1-Signature Failure Signing a SHA1 fails Win11 Windows 11 Windows
Projects
None yet
Development

No branches or pull requests

2 participants