Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

coreos.ignition.ssh.key failing in kola-azure #1553

Open
marmijo opened this issue Aug 22, 2023 · 3 comments · Fixed by coreos/afterburn#1074
Open

coreos.ignition.ssh.key failing in kola-azure #1553

marmijo opened this issue Aug 22, 2023 · 3 comments · Fixed by coreos/afterburn#1074
Assignees
Labels
jira for syncing to jira status/pending-upstream-release Fixed upstream. Waiting on an upstream component source code release.

Comments

@marmijo
Copy link
Member

marmijo commented Aug 22, 2023

coreos.ignition.ssh.key has been failing in the kola-azure jenkins job since August 15, 2023 with the following error:

[2023-08-17T15:08:35.184Z] --- FAIL: coreos.ignition.ssh.key (331.60s)
[2023-08-17T15:08:35.184Z]         harness.go:1789: mach.Start() failed: machine "kola-975f064f-cccec36d80" failed basic checks: detected failed or stuck systemd units

Looking into the journal log file for the test shows:
journal.txt
console.txt

Aug 17 19:46:17.528507 afterburn[1523]: Aug 17 19:46:17.528 INFO Fetching http://168.63.129.16/?comp=versions: Attempt #1
Aug 17 19:46:17.552714 afterburn[1523]: Aug 17 19:46:17.552 INFO Fetch successful
Aug 17 19:46:17.569624 afterburn[1523]: Aug 17 19:46:17.569 INFO Fetching http://168.63.129.16/machine/?comp=goalstate: Attempt #1
Aug 17 19:46:17.574856 afterburn[1523]: Aug 17 19:46:17.574 INFO Fetch successful
Aug 17 19:46:17.883244 afterburn[1523]: Aug 17 19:46:17.883 INFO Fetching http://168.63.129.16/machine/5c91c37d-f891-4428-93c9-81e602a91a74/a1fc44fd%2Dc611%2D4ea8%2Dbc6b%2D678f2ba3b71e.%5Fkola%2Dc954503e%2D36b448701c?comp=certificates&incarnation=1: Attempt #1
Aug 17 19:46:17.898618 afterburn[1523]: Aug 17 19:46:17.898 INFO Fetch successful
Aug 17 19:46:17.927929 afterburn[1523]: Error: failed to run
Aug 17 19:46:17.927929 afterburn[1523]: Caused by:
Aug 17 19:46:17.927929 afterburn[1523]:     0: writing ssh keys
Aug 17 19:46:17.927929 afterburn[1523]:     1: failed to convert pkcs12 blob to ssh pubkey
Aug 17 19:46:17.927929 afterburn[1523]:     2: failed to get chain from pkcs12
Aug 17 19:46:17.929015 systemd[1]: afterburn-sshkeys@core.service: Main process exited, code=exited, status=1/FAILURE
Aug 17 19:46:17.929193 systemd[1]: afterburn-sshkeys@core.service: Failed with result 'exit-code'.
Aug 17 19:46:17.934176 systemd[1]: Failed to start afterburn-sshkeys@core.service - Afterburn (SSH Keys).
Aug 17 19:46:17.942790 systemd[1]: Dependency failed for afterburn-sshkeys.target - Synchronize afterburn-sshkeys@.service template instances.
Aug 17 19:46:17.947667 systemd[1]: Dependency failed for multi-user.target - Multi-User System.
Aug 17 19:46:17.951000 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=afterburn-sshkeys@core comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Aug 17 19:46:17.952533 systemd[1]: multi-user.target: Job multi-user.target/start failed with result 'dependency'.
Aug 17 19:46:17.952960 kernel: audit: type=1130 audit(1692301577.951:80): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=afterburn-sshkeys@core comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Aug 17 19:46:17.952564 systemd[1]: afterburn-sshkeys.target: Job afterburn-sshkeys.target/start failed with result 'dependency'.

Afterburn fails to convert pkcs12 blob to ssh pubkey
Before this failure, the fetching section looked like this:

Aug 14 00:49:48.769452 afterburn[1511]: Aug 14 00:49:48.769 INFO Fetching http://168.63.129.16/?comp=versions: Attempt #1    
Aug 14 00:49:48.790498 afterburn[1511]: Aug 14 00:49:48.790 INFO Fetch successful                                            
Aug 14 00:49:48.809532 afterburn[1511]: Aug 14 00:49:48.809 INFO Fetching http://168.63.129.16/machine/?comp=goalstate: Attem
Aug 14 00:49:48.819160 afterburn[1511]: Aug 14 00:49:48.818 INFO Fetch successful                                            
Aug 14 00:49:48.819462 afterburn[1511]: Aug 14 00:49:48.819 WARN SSH pubkeys requested, but not provisioned for this instance

Which is what other test's logs look like when an SSH key IS set. So it's looking like an SSH key is being set somehow, even though one isnt specified. This is causing the .ssh/authorized_keys.d/afterburn file to be created and the kola test fails.

The certificates endpoint is now included in http://168.63.129.16/machine/?comp=goalstate so now it's being fetched and afterburn is failing directly after fetching the certificate.

marmijo added a commit to marmijo/fedora-coreos-config that referenced this issue Aug 28, 2023
This test is failing for all streams on kola-azure.
Snooze the test, with `warn: true`, for a week while we
investigate coreos/fedora-coreos-tracker#1553
marmijo added a commit to marmijo/fedora-coreos-config that referenced this issue Aug 28, 2023
This test is failing for all streams on kola-azure.
Snooze the test, with `warn: true`, for a week while we
investigate coreos/fedora-coreos-tracker#1553
marmijo added a commit to coreos/fedora-coreos-config that referenced this issue Aug 28, 2023
This test is failing for all streams on kola-azure.
Snooze the test, with `warn: true`, for a week while we
investigate coreos/fedora-coreos-tracker#1553
marmijo added a commit to marmijo/fedora-coreos-config that referenced this issue Sep 5, 2023
This test is still failing. Let's snooze for another week and work
to find a fix for coreos/fedora-coreos-tracker#1553
@marmijo marmijo self-assigned this Sep 5, 2023
@marmijo marmijo added the jira for syncing to jira label Sep 5, 2023
aaradhak pushed a commit to coreos/fedora-coreos-config that referenced this issue Sep 5, 2023
This test is still failing. Let's snooze for another week and work
to find a fix for coreos/fedora-coreos-tracker#1553
@marmijo marmijo added jira for syncing to jira and removed jira for syncing to jira labels Sep 7, 2023
@mike-nguyen mike-nguyen added jira for syncing to jira and removed jira for syncing to jira labels Sep 11, 2023
marmijo added a commit to marmijo/fedora-coreos-config that referenced this issue Sep 13, 2023
This test is still failing so let's bump the snooze while we work
to find a fix for coreos/fedora-coreos-tracker#1553
dustymabe pushed a commit to coreos/fedora-coreos-config that referenced this issue Sep 13, 2023
This test is still failing so let's bump the snooze while we work
to find a fix for coreos/fedora-coreos-tracker#1553
gursewak1997 added a commit to gursewak1997/fedora-coreos-config that referenced this issue Sep 27, 2023
This test is still failing so let's bump the snooze while we work
to find a fix for coreos/fedora-coreos-tracker#1553
marmijo pushed a commit to coreos/fedora-coreos-config that referenced this issue Sep 27, 2023
This test is still failing so let's bump the snooze while we work
to find a fix for coreos/fedora-coreos-tracker#1553
HuijingHei pushed a commit to HuijingHei/fedora-coreos-config that referenced this issue Oct 10, 2023
This test is failing for all streams on kola-azure.
Snooze the test, with `warn: true`, for a week while we
investigate coreos/fedora-coreos-tracker#1553
HuijingHei pushed a commit to HuijingHei/fedora-coreos-config that referenced this issue Oct 10, 2023
This test is still failing. Let's snooze for another week and work
to find a fix for coreos/fedora-coreos-tracker#1553
HuijingHei pushed a commit to HuijingHei/fedora-coreos-config that referenced this issue Oct 10, 2023
This test is still failing so let's bump the snooze while we work
to find a fix for coreos/fedora-coreos-tracker#1553
HuijingHei pushed a commit to HuijingHei/fedora-coreos-config that referenced this issue Oct 10, 2023
This test is still failing so let's bump the snooze while we work
to find a fix for coreos/fedora-coreos-tracker#1553
HuijingHei pushed a commit to HuijingHei/fedora-coreos-config that referenced this issue Oct 10, 2023
This test is failing for all streams on kola-azure.
Snooze the test, with `warn: true`, for a week while we
investigate coreos/fedora-coreos-tracker#1553
HuijingHei pushed a commit to HuijingHei/fedora-coreos-config that referenced this issue Oct 10, 2023
This test is still failing. Let's snooze for another week and work
to find a fix for coreos/fedora-coreos-tracker#1553
HuijingHei pushed a commit to HuijingHei/fedora-coreos-config that referenced this issue Oct 10, 2023
This test is still failing so let's bump the snooze while we work
to find a fix for coreos/fedora-coreos-tracker#1553
HuijingHei pushed a commit to HuijingHei/fedora-coreos-config that referenced this issue Oct 10, 2023
This test is still failing so let's bump the snooze while we work
to find a fix for coreos/fedora-coreos-tracker#1553
@jlebon
Copy link
Member

jlebon commented Jan 4, 2024

@marmijo As part of this, since it's touching the same part of the codebase, could you look at question 1 from coreos/coreos-assembler#1835? Possibly it can be closed out if it's no longer an issue, or if there's a valid rationale for the Afterburn message.

marmijo added a commit to marmijo/fedora-coreos-config that referenced this issue Feb 5, 2024
This test is still failing. Let's bump the snooze while we
continue to investigate:
coreos/fedora-coreos-tracker#1553
jlebon pushed a commit to coreos/fedora-coreos-config that referenced this issue Feb 5, 2024
This test is still failing. Let's bump the snooze while we
continue to investigate:
coreos/fedora-coreos-tracker#1553
marmijo added a commit to marmijo/fedora-coreos-config that referenced this issue Feb 27, 2024
jlebon pushed a commit to coreos/fedora-coreos-config that referenced this issue Feb 27, 2024
aaradhak pushed a commit to aaradhak/fedora-coreos-config that referenced this issue Mar 18, 2024
This test is still failing so let's bump the snooze while we work
to find a fix for coreos/fedora-coreos-tracker#1553
aaradhak pushed a commit to aaradhak/fedora-coreos-config that referenced this issue Mar 18, 2024
This test is still failing. Let's bump the snooze while we
continue to investigate:
coreos/fedora-coreos-tracker#1553
aaradhak pushed a commit to aaradhak/fedora-coreos-config that referenced this issue Mar 18, 2024
@marmijo
Copy link
Member Author

marmijo commented May 8, 2024

For some reason, azure started populating the certs endpoint even when an SSH key is not provisioned. The afterburn logic currently expects the certs endpoint to either exist with an SSH public key in the certificate chain, or not exist at all. It looks like we are hitting a third case where the certs endpoint exists, but the certificate chain inside the referenced PKCS#12 file is empty. After speaking with @dustymabe, we decided to allow this case to succeed and still warn that an SSH public key was not provisioned.

coreos/afterburn#1074 should resolve this issue.

@travier travier added the status/pending-upstream-release Fixed upstream. Waiting on an upstream component source code release. label Jun 24, 2024
@marmijo
Copy link
Member Author

marmijo commented Jun 24, 2024

I'll reopen this issue until a new Afterburn release can be cut (which will include the change made in coreos/afterburn#1074) so we can continue to use it to snooze coreos.ignition.ssh.key in the FCOS pipeline.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
jira for syncing to jira status/pending-upstream-release Fixed upstream. Waiting on an upstream component source code release.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants