Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #36729 - Implement pruning outdated TFTP files #873

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

ekohl
Copy link
Member

@ekohl ekohl commented Sep 7, 2023

This is about replacing foreman_maintain's check_tftp_storage with a native implementation. It is implemented by a /tftp/prune API endpoint which accepts the duration as a parameter.

There is no plugin capability defined since Foreman can just call the endpoint and process HTTP 404 as Not Implemented.

A follow up in Foreman would implement a way to call this endpoint for all Smart Proxies with the TFTP feature present. Foreman does not depend on foreman-tasks natively, but it could be implemented in some cron-style way to make the system handle it automatically instead of putting the burden on the admin.

@ekohl
Copy link
Member Author

ekohl commented Sep 7, 2023

This does deserve tests.

@ehelms
Copy link
Member

ehelms commented Sep 7, 2023

This presents on the surface as self-healing activity the smart-proxy could just do either triggered via a cron/systemd timer or through some internal process on the smart-proxy host. Is that approach a possibility?

This is about replacing foreman_maintain's check_tftp_storage[1] with a
native implementation. It is implemented by a /tftp/prune API endpoint
which accepts the duration as a parameter.

There is no plugin capability defined since Foreman can just call the
endpoint and process HTTP 404 as Not Implemented.

[1]: https://github.com/theforeman/foreman_maintain/blob/1e20bbd9f32a4f47193c4833fbd4012772b34dc4/definitions/checks/foreman_proxy/check_tftp_storage.rb
Copy link
Member Author

@ekohl ekohl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While working on this I did realize a potential flaw in foreman_maintain's logic: one host can download the TFTP files. The token can then expire, but another host may still use the same files.

After some digging I found that we define the prefix here:
https://github.com/theforeman/foreman/blob/b9986c9648b55c0d8fae4ab5ff5845763ddbea1f/app/models/operatingsystem.rb#L189-L196

So it's shared on the unique ID of the medium provider.

We do appear to always call curl (via HttpDownload) so perhaps that touches the file. If not, implementing that would keep it "fresh".

This presents on the surface as self-healing activity the smart-proxy could just do either triggered via a cron/systemd timer or through some internal process on the smart-proxy host. Is that approach a possibility?

theforeman/foreman#9825 implements a rake task. Completely untested now, but it shows the end-to-end idea. We can have a discussion there about other implementations (API, UI action, tasks, etc).

@ehelms
Copy link
Member

ehelms commented Sep 7, 2023

What is token_duration in the rake task?

@ehelms
Copy link
Member

ehelms commented Sep 7, 2023

@ekohl
Copy link
Member Author

ekohl commented Sep 7, 2023

Thanks, the setting name is rather generic sounding for such a specific use case. I am guessing cause this dates back a loooong time.

A quick look tells me theforeman/foreman@81159d4 introduced the setting. 2012 is indeed a loooong time ago.

@ehelms
Copy link
Member

ehelms commented Sep 7, 2023

👍 With tests, I'd be glad to see this merged and move forward with the Foreman part.

@ekohl
Copy link
Member Author

ekohl commented Sep 8, 2023

While looking at this further I noticed that it does keep the mtime from the original mirror, not the time of writing. So in practice this will always remove all files. Not just the ones that expired. That means that you can also easily purge all files that are also in use.

That really makes me question the foreman-maintain version. While the system can recover (move a host out of build mode and back in build mode), it can be frustrating for users.

This makes me question what makes sense here. Should Foreman perhaps calculate a list of acceptable filenames (IIRC it can do so) and any other file is removed.

def self.outdated_files(age)
delete_before = Time.now - age
Dir.glob(File.join(Proxy::TFTP::Plugin.settings.tftproot, "boot", "*-{vmlinuz,initrd.img}")).filter do |file|
File.file?(file) && File.mtime(file) < delete_before
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could ctime be used? Assuming the file is written once and then never changed, would ctime ever change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but then you still have the problem where:

  • Host X provisions RHEL 9
  • Host X's token expires
  • Host Y enters build mode for RHEL 9
  • Pruning happens
  • Host Y attempts to retrieve TFTP files

Note that's also a problem with the current foreman_maintain implementation. Perhaps few users hit this because they may use external Smart Proxies and never run the code on them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that Host Y enters build mode for RHEL 9 does a check if the files already exist on the smart-proxy, and if they do no operation is performed. And in theory, we would need to "touch" the files somehow to indicate. Which I think then is where you were getting at Foreman (or maybe the smart-proxy object?) having a canonical list of needed files?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a host enters build mode (which can also be rebuilding an existing host), it immediately retrieves the files. Then the host still needs to boot. A BIOS (or UEFI) can take a significant amount of time on server class hardware (a minute is often on the low end). So if the pruning happens between those 2 events (Smart Proxy fetches TFTP files and host fetches TFTP files from Smart Proxy) you have a problem.

I think this requires a bit more thought on what the best way is, but bringing it back to the original issue: I'd recommend removing this from a health check and instead make it a separate step.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants