Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/execute shutdown command #7709

Closed

Conversation

lestat80
Copy link
Contributor

This feature adds the shutdown-host endpoint to the API.

This endpoint allows submitting shutdown commands to Icinga2 nodes in a distributed monitoring environment.
Commands are distributed via cluster messages.

@dnsmichi
Copy link
Contributor

What exactly is a "shutdown host" and how does it behave? Which problem are you aiming to solve with this pull request?

Please provide more details and the story behind this, also in regard of the technical design, compatibility and maintenance purpose (troubleshooting this feature) - since this involves API and cluster as I've seen from a quick look into the diff.

@dnsmichi
Copy link
Contributor

Also, the code portions which involve sending a command to a command_endpoint duplicate the existing code parts we already have in checkable-check.cpp and likewise. To anyone who reviews this - this code duplication must be resolved before approval.

@dnsmichi dnsmichi added needs feedback We'll only proceed once we hear from you again area/api REST API area/distributed Distributed monitoring (master, satellites, clients) labels Dec 11, 2019
macros = params->Get("macros");

Array::Ptr command_line_tmp = command->GetCommandLine();
Value shutdown_command_value = params->Get("shutdown_command");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This value is retrieved in various places, but it never gets set. Is that a new command object attribute or is the PR incomplete?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The endpoint is called by passing a parameter called shutdown_command, which represents a valid shutdown command for the target host. An example follows:

curl -s -k -u root:admin -H 'X-HTTP-Method-Override: POST' -H 'Accept: application/json' https://localhost:5665/v1/actions/shutdown-host -d '{ "type": "Host", "filter": "host.name==\"<hostname>\"", "shutdown_command": ["systemctl", "poweroff"]}'

@lestat80
Copy link
Contributor Author

The user scenario behind the feature is the need of being able to shut down a host in a clean way when a monitoring condition is met (it can be a temperature level above threshold, for example).

The user can specify a shutdown command for a single host or a group of hosts. Then, when a condition is met, this command is triggered by calling the API endpoint we implemented. The logic behind the execution of the command resorts to the clustering infrastructure in order to distribute the message containing the command to execute to the target host.

@dnsmichi
Copy link
Contributor

So basically this implements a remote command execution triggered with an API endpoint? Or do I still need a CheckCommand object on the agent for this? From a security perspective, I see quite a few points where this can be considered harmful.

Also, the implementation mimics the command endpoint way of doing things, whose design was incomplete. Imho this requirement follows along the inventory and discovery ideas we had in the past, and should be taken into account in their design.

Is there a specific reason not to use Ansible or Puppet's Bolt for this type of shutdown scenario? Especially since you are only adding a transport layer with using the Icinga cluster protocol for this, I'm not yet convinced that Icinga is the right environment for this.

@lestat80
Copy link
Contributor Author

The Icinga2 clustering infrastructure gives us the possibility of using a secure channel to propagate the shutdown messages from the master to any agent in the cluster. Having this in place, it's a matter of sending the right message to the proper recipient without adding an extra ingredient as Ansible (which is, I agree on that, capable of doing similar things if properly used).

To execute the commands, we deploy a dummy shutdown command in director-global, which is then used as placeholder for the correct command that is received by the agent via cluster messages

@u238
Copy link

u238 commented Dec 12, 2019

Hi @dnsmichi ,
we know the implementation can be improved and if you point us to the right direction we will obviously make the changes accordingly.

The main advantage for using icinga2 for this purpose is in fact the already set-up secure communication and to install/configure one single agent on every machine.

Another future plan for us is developing a "service test" feature (based on this one) where a user can test a particular command by tuning the parameters before deploying them in the director.

@dnsmichi
Copy link
Contributor

Please coordinate the design and concept with @lippserd to ensure this is something which adds properly to Icinga's feature set.

Creating a pull request and code without discussing the design and technical details beforehand adds a burden for both sides investing time into it. In its current state, I'll stall the PR until the design is fixed.

@dnsmichi dnsmichi added stalled Blocked or not relevant yet TBD To be defined - We aren't certain about this yet and removed needs feedback We'll only proceed once we hear from you again labels Dec 18, 2019
@lestat80
Copy link
Contributor Author

lestat80 commented Dec 18, 2019

Hi @dnsmichi,
during our monthly collaboration meeting with you guys in November, we tried to schedule a dedicated design meeting for this feature.
The meetings were not scheduled and in the end, we agreed on sending a git patch via email for getting feedback on the code at the beginning of December.

The patch at the end was not reviewed, and during the following meeting in December, we were asked to open this PR for an open discussion with you about the design and where to go from here.

For us, this is a starting point for opening a discussion to identify if this use case is interesting for the community. As you can imagine, we are now confused about what we should do now.

Could you please help us to understand how is the best way to proceed so that we can learn how to contribute to Icinga in the correct way?
Thanks.

@dnsmichi
Copy link
Contributor

Hi,

unfortunately I'm not aware of these meetings nor patches, I just saw the PR without any details and was wondering what this is. That is why I started reviewing and analysing it, that's a normal thing in my daily work contributing to Icinga.

Meanwhile I have learned that you are in discussion with Eric, that is why I have updated this issue since I've been mentioned here. Otherwise the conversation would look incomplete - please join Eric on the matter and clarify how we can proceed here.

Cheers,
Michael

@Al2Klimov
Copy link
Member

Hello @lestat80 and thank you for contributing!

Why a shutdown command and not a general command?

What problem not solved by EventCommands does this solve?

Best,
AK

@Al2Klimov Al2Klimov added the needs feedback We'll only proceed once we hear from you again label May 18, 2020
@Al2Klimov Al2Klimov assigned Al2Klimov and unassigned lestat80 Jun 29, 2020
@Al2Klimov Al2Klimov added duplicate This issue or pull request already exists and removed TBD To be defined - We aren't certain about this yet needs feedback We'll only proceed once we hear from you again labels Jun 29, 2020
@Al2Klimov Al2Klimov added the queue/reserved Before doing anything: Ask the one who has added this label! label Aug 11, 2020
@Al2Klimov Al2Klimov added the needs feedback We'll only proceed once we hear from you again label Aug 11, 2020
@Decstasy
Copy link

I'm sorry but what the hell? Are you serious to implement a shutdown via an monitoring agent???

A event handler should be sufficient in every case to shutdown a machine. For what purpose should icinga2 do this? There are much better ways to stonith machines or automatically shut them down all at once.

If there is a slightly bug which would allow me to initiate shutdowns without permissions, I could shutdown thousands of michines by hundrets of customers at once. Imho it's mental to merge such an pull request.

@Al2Klimov Al2Klimov added this to the 2.13.0 milestone Nov 23, 2020
@icinga-probot icinga-probot bot removed this from the 2.13.0 milestone Dec 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api REST API area/distributed Distributed monitoring (master, satellites, clients) duplicate This issue or pull request already exists needs feedback We'll only proceed once we hear from you again queue/reserved Before doing anything: Ask the one who has added this label! stalled Blocked or not relevant yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants