Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

grub2-mkconfig failing on F29 #1598

Closed
miabbott opened this issue Oct 3, 2018 · 19 comments
Closed

grub2-mkconfig failing on F29 #1598

miabbott opened this issue Oct 3, 2018 · 19 comments

Comments

@miabbott
Copy link
Member

miabbott commented Oct 3, 2018

Our automated tests caught this on Fedora 29 Atomic Host, starting from fedora/29/x86_64/testing/atomic-host ref. After installing wget and rebooting, the pending deployment scheduled for next boot is gone.

# rpm -q rpm-ostree                                                           
rpm-ostree-2018.8-1.fc29.x86_64
     
# rpm-ostree status                  
State: idle                                                                                 
AutomaticUpdates: disabled
Deployments:
● ostree://fedora-atomic:fedora/29/x86_64/testing/atomic-host
                   Version: 29.20181003.0 (2018-10-03 14:59:08)
                    Commit: 5dbe63cb0db058eedba63e645df08ff4c0b0b4a8bf20e1862e4921b5232cd5d1
              GPGSignature: Valid signature by 5A03B4DD8254ECA02FDA1637A20AA56B429476B4

  ostree://fedora-atomic:fedora/29/x86_64/atomic-host
                   Version: 29.20180929.n.0 (2018-09-30 02:35:33)
                    Commit: af001dfbfbf1f2c5c083f5da9ecec992f3ee3bafafc652e33a22638f52911f39
              GPGSignature: Valid signature by 5A03B4DD8254ECA02FDA1637A20AA56B429476B4
[root@micah-f29ah-vm1003a ~]# rpm-ostree install wget
Checking out tree 5dbe63c... done
Enabled rpm-md repositories: updates-testing updates fedora
Updating metadata for 'updates-testing': [=============] 100%
rpm-md repo 'updates-testing'; generated: 2018-10-01 21:51:38
Updating metadata for 'updates': [=============] 100%
rpm-md repo 'updates'; generated: 2018-02-20 19:18:14
Updating metadata for 'fedora': [=============] 100%
rpm-md repo 'fedora'; generated: 2018-10-02 09:59:20
Importing metadata [=============] 100%
Resolving dependencies... done
Will download: 1 package (723.6 kB)
  Downloading from updates-testing: [=============] 100%
Importing (1/1) [=============] 100%
Checking out packages (1/1) [=============] 100%
Running pre scripts... 0 done
Running post scripts... 7 done
Writing rpmdb... done
Writing OSTree commit... done
Added:
  wget-1.19.5-5.fc29.x86_64
Run "systemctl reboot" to start a reboot

# rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
  ostree://fedora-atomic:fedora/29/x86_64/testing/atomic-host
                   Version: 29.20181003.0 (2018-10-03 14:59:08)
                BaseCommit: 5dbe63cb0db058eedba63e645df08ff4c0b0b4a8bf20e1862e4921b5232cd5d1
              GPGSignature: Valid signature by 5A03B4DD8254ECA02FDA1637A20AA56B429476B4
           LayeredPackages: wget

● ostree://fedora-atomic:fedora/29/x86_64/testing/atomic-host
                   Version: 29.20181003.0 (2018-10-03 14:59:08)
                    Commit: 5dbe63cb0db058eedba63e645df08ff4c0b0b4a8bf20e1862e4921b5232cd5d1
              GPGSignature: Valid signature by 5A03B4DD8254ECA02FDA1637A20AA56B429476B4

  ostree://fedora-atomic:fedora/29/x86_64/atomic-host
                   Version: 29.20180929.n.0 (2018-09-30 02:35:33)
                    Commit: af001dfbfbf1f2c5c083f5da9ecec992f3ee3bafafc652e33a22638f52911f39
              GPGSignature: Valid signature by 5A03B4DD8254ECA02FDA1637A20AA56B429476B4

# systemctl reboot                                                                             
# Connection to 10.8.251.69 closed by remote host.                   
Connection to 10.8.251.69 closed.  
                                                                                         
$ ssh 10.8.251.69                                                                                                  
Warning: Permanently added '10.8.251.69' (ECDSA) to the list of known hosts.                                                                      
Last login: Wed Oct  3 18:14:25 2018 from 10.18.41.60                                                                          
$ rpm-ostree status                                            
State: idle                                                                                                                 
AutomaticUpdates: disabled                                                                                                     
Deployments:                                                                                     
● ostree://fedora-atomic:fedora/29/x86_64/testing/atomic-host                                                                                                                                                      
                   Version: 29.20181003.0 (2018-10-03 14:59:08)                                                             
                    Commit: 5dbe63cb0db058eedba63e645df08ff4c0b0b4a8bf20e1862e4921b5232cd5d1                                                                                      
              GPGSignature: Valid signature by 5A03B4DD8254ECA02FDA1637A20AA56B429476B4                                                          
                                                                                 
  ostree://fedora-atomic:fedora/29/x86_64/atomic-host                          
                   Version: 29.20180929.n.0 (2018-09-30 02:35:33)                                                                                  
                    Commit: af001dfbfbf1f2c5c083f5da9ecec992f3ee3bafafc652e33a22638f52911f39                                                      
              GPGSignature: Valid signature by 5A03B4DD8254ECA02FDA1637A20AA56B429476B4                      

From the journal where the transaction happened:

$ sudo journalctl -b -1 -u rpm-ostreed
-- Logs begin at Wed 2018-10-03 17:48:58 UTC, end at Wed 2018-10-03 18:37:53 UTC. --
Oct 03 18:14:31 micah-f29ah-vm1003a systemd[1]: Starting RPM-OSTree System Management Daemon...
Oct 03 18:14:31 micah-f29ah-vm1003a rpm-ostree[1191]: Reading config file '/etc/rpm-ostreed.conf'
Oct 03 18:14:31 micah-f29ah-vm1003a systemd[1]: Started RPM-OSTree System Management Daemon.
Oct 03 18:14:31 micah-f29ah-vm1003a rpm-ostree[1191]: In idle state; will auto-exit in 64 seconds
Oct 03 18:14:31 micah-f29ah-vm1003a rpm-ostree[1191]: client(id:cli dbus:1.22 unit:session-1.scope uid:0) added; new total=1
Oct 03 18:14:31 micah-f29ah-vm1003a rpm-ostree[1191]: Initiated txn PkgChange for client(id:cli dbus:1.22 unit:session-1.scope uid:0): /org/projectatomic/rpmostree1/fedora_atomic
Oct 03 18:14:37 micah-f29ah-vm1003a rpm-ostree[1191]: Txn PkgChange on /org/projectatomic/rpmostree1/fedora_atomic failed: Operation was cancelled
Oct 03 18:14:37 micah-f29ah-vm1003a rpm-ostree[1191]: client(id:cli dbus:1.22 unit:session-1.scope uid:0) vanished; remaining=0
Oct 03 18:14:37 micah-f29ah-vm1003a rpm-ostree[1191]: In idle state; will auto-exit in 60 seconds
Oct 03 18:14:39 micah-f29ah-vm1003a rpm-ostree[1191]: client(id:cli dbus:1.24 unit:session-1.scope uid:0) added; new total=1
Oct 03 18:14:39 micah-f29ah-vm1003a rpm-ostree[1191]: client(id:cli dbus:1.24 unit:session-1.scope uid:0) vanished; remaining=0
Oct 03 18:14:39 micah-f29ah-vm1003a rpm-ostree[1191]: In idle state; will auto-exit in 63 seconds
Oct 03 18:14:41 micah-f29ah-vm1003a rpm-ostree[1191]: client(id:cli dbus:1.25 unit:session-1.scope uid:0) added; new total=1
Oct 03 18:14:41 micah-f29ah-vm1003a rpm-ostree[1191]: Initiated txn PkgChange for client(id:cli dbus:1.25 unit:session-1.scope uid:0): /org/projectatomic/rpmostree1/fedora_atomic
Oct 03 18:15:58 micah-f29ah-vm1003a rpm-ostree[1191]: Preparing pkg txn; enabled repos: ['updates-testing', 'updates', 'fedora'] solvables: 64721
Oct 03 18:16:05 micah-f29ah-vm1003a rpm-ostree[1191]: Imported 1 pkg
Oct 03 18:16:06 micah-f29ah-vm1003a rpm-ostree[1191]: Executed %transfiletriggerin(shared-mime-info) for usr/share/mime in 139ms; 792 matched files
Oct 03 18:16:06 micah-f29ah-vm1003a rpm-ostree[1191]: Executed %transfiletriggerin(systemd-udev) for usr/lib/udev/hwdb.d in 60ms; 18 matched files
Oct 03 18:16:06 micah-f29ah-vm1003a rpm-ostree[1191]: Executed %transfiletriggerin(systemd-udev) for usr/lib/udev/rules.d in 51ms; 50 matched files
Oct 03 18:16:06 micah-f29ah-vm1003a rpm-ostree[1191]: No files matched %transfiletriggerin(lib) for glibc-common
Oct 03 18:16:06 micah-f29ah-vm1003a rpm-ostree[1191]: No files matched %transfiletriggerin(lib64) for glibc-common
Oct 03 18:16:06 micah-f29ah-vm1003a rpm-ostree[1191]: Executed %transfiletriggerin(glibc-common) for lib, lib64, usr/lib, usr/lib64 in 218ms; 24593 matched files
Oct 03 18:16:06 micah-f29ah-vm1003a rpm-ostree[1191]: Executed %transfiletriggerin(info) for usr/share/info in 167ms; 3 matched files
Oct 03 18:16:06 micah-f29ah-vm1003a rpm-ostree[1191]: Executed %transfiletriggerin(glib2) for usr/lib64/gio/modules in 112ms; 4 matched files
Oct 03 18:16:06 micah-f29ah-vm1003a rpm-ostree[1191]: Executed %transfiletriggerin(glib2) for usr/share/glib-2.0/schemas in 145ms; 30 matched files
Oct 03 18:16:06 micah-f29ah-vm1003a rpm-ostree[1191]: sanitycheck(/usr/bin/true) successful
Oct 03 18:16:08 micah-f29ah-vm1003a rpm-ostree[1191]: Wrote commit: 0e1408efd0f4be416b997e84f1dcaa2ccdff5f48973ab62cf841e56bf9f3cf06; New objects: meta:36 content:11 totaling 17.0 MB)
Oct 03 18:16:16 micah-f29ah-vm1003a rpm-ostree[1191]: Txn PkgChange on /org/projectatomic/rpmostree1/fedora_atomic successful
Oct 03 18:16:16 micah-f29ah-vm1003a rpm-ostree[1191]: client(id:cli dbus:1.25 unit:session-1.scope uid:0) vanished; remaining=0
Oct 03 18:16:16 micah-f29ah-vm1003a rpm-ostree[1191]: In idle state; will auto-exit in 62 seconds
Oct 03 18:17:18 micah-f29ah-vm1003a rpm-ostree[1191]: In idle state; will auto-exit in 60 seconds
Oct 03 18:20:52 micah-f29ah-vm1003a systemd[1]: Starting RPM-OSTree System Management Daemon...
Oct 03 18:20:52 micah-f29ah-vm1003a rpm-ostree[1739]: Reading config file '/etc/rpm-ostreed.conf'
Oct 03 18:20:52 micah-f29ah-vm1003a systemd[1]: Started RPM-OSTree System Management Daemon.
Oct 03 18:20:52 micah-f29ah-vm1003a rpm-ostree[1739]: In idle state; will auto-exit in 63 seconds
Oct 03 18:20:52 micah-f29ah-vm1003a rpm-ostree[1739]: client(id:cli dbus:1.26 unit:session-1.scope uid:0) added; new total=1
Oct 03 18:20:53 micah-f29ah-vm1003a rpm-ostree[1739]: client(id:cli dbus:1.26 unit:session-1.scope uid:0) vanished; remaining=0
Oct 03 18:20:53 micah-f29ah-vm1003a rpm-ostree[1739]: In idle state; will auto-exit in 63 seconds
Oct 03 18:21:21 micah-f29ah-vm1003a systemd[1]: Stopping RPM-OSTree System Management Daemon...
Oct 03 18:21:22 micah-f29ah-vm1003a systemd[1]: Stopped RPM-OSTree System Management Daemon.

I know @jlebon was digging into this a bit after seeing results off projectatomic/atomic-host-tests#437, but wanted to get this logged

@cgwalters
Copy link
Member

Dup of #1567 ?

@dustymabe
Copy link
Member

I think so see #1567 (comment)

micah can you journalctl -b -1 -u ostree-finalize-staged ?

@jlebon
Copy link
Member

jlebon commented Oct 3, 2018

I don't think it's a dupe. If you look at projectatomic/atomic-host-tests#437 (comment), you can see it's not failing from a timeout, but from grub2-mkconfig.

@miabbott
Copy link
Member Author

miabbott commented Oct 3, 2018

Yeah, @jlebon is onto something

$ sudo journalctl -b -1 -u ostree-finalize-staged
-- Logs begin at Wed 2018-10-03 17:48:58 UTC, end at Wed 2018-10-03 18:48:03 UTC. --
Oct 03 18:16:08 micah-f29ah-vm1003a systemd[1]: Started OSTree Finalize Staged Deployment.
Oct 03 18:21:21 micah-f29ah-vm1003a systemd[1]: Stopping OSTree Finalize Staged Deployment...
Oct 03 18:21:23 micah-f29ah-vm1003a ostree[1889]: Copying /etc changes: 19 modified, 0 removed, 51 added
Oct 03 18:21:24 micah-f29ah-vm1003a ostree[1889]: error: Bootloader write config: grub2-mkconfig: Child process exited with code 1
Oct 03 18:21:24 micah-f29ah-vm1003a systemd[1]: ostree-finalize-staged.service: Control process exited, code=exited status=1
Oct 03 18:21:24 micah-f29ah-vm1003a systemd[1]: ostree-finalize-staged.service: Failed with result 'exit-code'.
Oct 03 18:21:24 micah-f29ah-vm1003a systemd[1]: Stopped OSTree Finalize Staged Deployment.

@miabbott
Copy link
Member Author

miabbott commented Oct 3, 2018

$ rpm-ostree db diff af001dfbfbf1f2c5c083f5da9ecec992f3ee3bafafc652e33a22638f52911f39 5dbe63cb0db058eedba63e645df08ff4c0b0b4a8bf20e1862e4921b5232cd5d1
ostree diff commit old: af001dfbfbf1f2c5c083f5da9ecec992f3ee3bafafc652e33a22638f52911f39
ostree diff commit new: 5dbe63cb0db058eedba63e645df08ff4c0b0b4a8bf20e1862e4921b5232cd5d1
Upgraded:                                                                  
...
  grub2-common 1:2.02-58.fc29 -> 1:2.02-60.fc29
  grub2-efi-x64 1:2.02-58.fc29 -> 1:2.02-60.fc29  
  grub2-pc 1:2.02-58.fc29 -> 1:2.02-60.fc29        
  grub2-pc-modules 1:2.02-58.fc29 -> 1:2.02-60.fc29                    
  grub2-tools 1:2.02-58.fc29 -> 1:2.02-60.fc29   
  grub2-tools-extra 1:2.02-58.fc29 -> 1:2.02-60.fc29      
  grub2-tools-minimal 1:2.02-58.fc29 -> 1:2.02-60.fc29
...

@cgwalters
Copy link
Member

Ugh, the sooner we can get rid of grub2-mkconfig and particularly the abomination that is os-prober the better.

I wonder if we're potentially racing with /boot being unmounted?

@dustymabe
Copy link
Member

rpm-ostree db diff

well that looks like a smoking gun :)

@jlebon jlebon changed the title deployment with layered pkg disappears after reboot grub2-mkconfig failing on F29 Oct 3, 2018
@miabbott
Copy link
Member Author

miabbott commented Oct 3, 2018

https://bodhi.fedoraproject.org/updates/FEDORA-2018-d4d38d8885

Adam Williamson reporting problems in OpenQA there, too

@dustymabe
Copy link
Member

micah can you run the transaction and then run ostree admin finalize-staged manually to see if it happens when the system is fully up and not rebooting?

@miabbott
Copy link
Member Author

miabbott commented Oct 3, 2018

# rpm-ostree install wget
Checking out tree 5dbe63c... done
Enabled rpm-md repositories: updates-testing updates fedora
rpm-md repo 'updates-testing' (cached); generated: 2018-10-01 21:51:38
rpm-md repo 'updates' (cached); generated: 2018-02-20 19:18:14
rpm-md repo 'fedora' (cached); generated: 2018-10-02 09:59:20
Importing metadata [=============] 100%
Resolving dependencies... done
Checking out packages (1/1) [=============] 100%
Running pre scripts... 0 done
Running post scripts... 7 done
Writing rpmdb... done
Writing OSTree commit... done
Freed: 16.5 MB (pkgcache branches: 0)
Added:
  wget-1.19.5-5.fc29.x86_64
Run "systemctl reboot" to start a reboot

# ostree admin finalize-staged
error: Bootloader write config: grub2-mkconfig: Child process exited with code 1

@dustymabe
Copy link
Member

good to know. can we add bad karma to that update?

@miabbott
Copy link
Member Author

miabbott commented Oct 3, 2018

Will do

@cgwalters
Copy link
Member

There's a bigger picture issue here in that I am doubtful that the boot-counting implementation that ships in grub as it exists today is something we want for Fedora {Atomic Host, CoreOS}.

@cgwalters
Copy link
Member

We could try just nuking those bits in postprocess...

@dustymabe
Copy link
Member

We could try just nuking those bits in postprocess...

👍 assuming there isn't an rpm that we can easily just yank out

@LorbusChris
Copy link
Collaborator

This is possibly caused by https://bugzilla.redhat.com/show_bug.cgi?id=1614637

I made a PR to fix it here: rhboot/grub2#42

With the current implementation it is required to do the following in userspace (this is how greenboot intends to do it, too).

On a successful boot (could also be run as a one-shot service when using proposed system-boot-success.target/boot-complete.target and reaching it):

grub2-editenv - set boot_success=1

On OS upgrade to activate the boot counter:

rpm-ostree upgrade
grub2-editenv - set boot_success=0
grub2-editenv - set boot_counter=3 # or any number >= 1
reboot

@cgwalters what are your thoughts on a boot counting implementation, i.e. what would it look like if not the way it is now? I'd once again like to start a broader discussion around this!

@cgwalters
Copy link
Member

Let's track that discussion in coreos/fedora-coreos-config#21 ?

@LorbusChris
Copy link
Collaborator

LorbusChris commented Oct 9, 2018

Ack. The explanatory part in the comment above also plays into: coreos/fedora-coreos-tracker#47

@cgwalters
Copy link
Member

Closing as this is tracked elsewhere.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants