-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check metrics authentication validity before perf_capture_queue #493
Check metrics authentication validity before perf_capture_queue #493
Conversation
app/models/manageiq/providers/kubernetes/container_manager/metrics_capture.rb
Show resolved
Hide resolved
Nice |
5d3c22d
to
d70847b
Compare
Check that the EMS's metrics authentication is valid before queueing perf_capture
d70847b
to
be53bdc
Compare
end | ||
|
||
def verify_metrics_connection!(ems) | ||
t = target || ems |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kbrock do we have a general way of handling this? When called from perf_capture_timer
as ems.perf_capture_object.perf_capture_all_queue
target
is nil
and only ems
is set
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't like how @target
is set behind the scenes.
perf_capture_gap, perf_capture_all_queue, perf_capture_realtime_queue ( /via ui)
if you are running capture for a whole ems, target
is null.
If you are running capture for a single object, target
is single object.
If you are running capture for a batch/vmware, target
/targets
is an array
I think I'd just use ems
Note: you need to handle multiple targets case. (I would like everyone to have targets
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to ensure that prometheus
exists in here? (aka capture_context
)
That way verify does all the raises here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd just use ems
Well when we use this below it is on a single object so this has to work with target
or ems
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to ensure that prometheus exists in here? (aka capture_context)
Yeah I was thinking that but we use the return of capture_context
and returning something from a verify_*
method seems weird.
Also building a prometheus_capture_context object seems like overkill when it isn't used in the perf_capture_timer
case here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I probably could call this from capture_context
and raise if that is nil in that method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I pushed a second commit which moves some things around, LMK if you like it better or worse and it is easy to revert out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You know, it doesn't look like it needs to know about target at all beside building a couple of printf strings. Looks good for now.
TANGENT:
Yea, I guess the code is currently coded to reference a single target.
I have on my short list to convert this over to targets
.
For some reason, I can't find the initialize method on PrometheusCaptureContext
.
But from what I can tell, that code assumes a single target when building the labels.
Not sure how hard it would be to handle a list of targets over there.
@@ -59,6 +59,33 @@ | |||
end | |||
end | |||
|
|||
context "#perf_capture_all_queue" do | |||
it "returns the objects" do | |||
expect(@ems_kubernetes.perf_capture_object.perf_capture_all_queue).to include("Container" => [@container], "ContainerNode" => [@node]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO why only @container
and @node
and not @group
|
||
context "with a missing metrics endpoint" do | ||
before do | ||
@ems_kubernetes.endpoints.find_by(:role => "prometheus").destroy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOTE I don't love how this spec sets up these ivars since it makes it difficult to override in sub-contexts. I worked around it but I think I should refactor to use let
context "with invalid authentication on the metrics endpoint" do | ||
before do | ||
@ems_kubernetes.authentications.find_by(:authtype => "prometheus").update!(:status => "Error") | ||
@ems_kubernetes.reload |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOTE a good example of why the before {}
setting up ivars isn't great
context = capture_context(ems, target, start_time, end_time) | ||
|
||
raise TargetValidationWarning, "no metrics endpoint found for #{target_name}" if context.nil? | ||
context = build_capture_context!(ems, target, start_time, end_time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea. this is nice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
32e316c
to
b8a12a9
Compare
0aac8bf
to
0c258a3
Compare
Checked commits agrare/manageiq-providers-kubernetes@be53bdc~...0c258a3 with ruby 2.6.10, rubocop 1.28.2, haml-lint 0.35.0, and yamllint spec/factories/ext_management_system.rb
|
@kbrock can you re-review? I refactored the specs to use |
rescue TargetValidationError, TargetValidationWarning => e | ||
_log.send(e.log_severity, e.message) | ||
return [] | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do all providers implement similar logic? Can we extract this so other providers will also bypass metrics capture with invalid auth?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a "future" PR question... I'm fine with fixing kubernetes in this way now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a little tricky because not all providers have different endpoints and/or authentications so what you need to check varies.
What I've considered doing is consolidating this in the supports :metrics {}
block in ext_management_system so that the logic could be limited to the provider but that is going to be more of a cross-provider change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updating supports
to handle all the actual logic totally got me in a few PRs. they are still outstanding :( ✨
raise TargetValidationError, "no provider for #{target_name}" if ems.nil? | ||
|
||
raise TargetValidationWarning, "no metrics endpoint found for #{target_name}" if metrics_connection(ems).nil? | ||
raise TargetValidationWarning, "metrics authentication isn't valid for #{target_name}" unless metrics_connection_valid?(ems) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to check if the ems is paused?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should do that up in core perf_capture_timer since that is going to be the same for all providers, good call though. I'm a little wary of that just because it feels like every single queue method would have to implement this check but I don't know that there's a more central place to handle it and this is a real-world problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FactoryBot.create(:kubernetes_node, :name => 'node', :ext_management_system => ems, :ems_ref => 'target').tap do |node| | ||
node.computer_system.hardware = FactoryBot.create(:hardware, :cpu_total_cores => 2, :memory_mb => 2_048) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets keep. But in the future if we merge ManageIQ/manageiq#22565 this could be:
FactoryBot.create(:kubernetes_node, :name => 'node', :ext_management_system => ems, :ems_ref => 'target').tap do |node| | |
node.computer_system.hardware = FactoryBot.create(:hardware, :cpu_total_cores => 2, :memory_mb => 2_048) | |
end | |
FactoryBot.create(:kubernetes_node, :name => 'node', :ext_management_system => ems, :ems_ref => 'target', :hardware => FactoryBot.create(:hardware, :cpu_total_cores => 2, :memory_mb => 2_048)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 yeah I wanted to keep this dependent-less for simpler backport but that's a much better way to do it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed. keep. reduce dependencies
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these updates look good.
introduced by: ManageIQ/manageiq-providers-kubernetes#493 This makes sure the ems is all good and tests pass
Backported to
|
…_before_queueing_perf_captures Check metrics authentication validity before perf_capture_queue (cherry picked from commit ba04739)
introduced by: ManageIQ/manageiq-providers-kubernetes#493 This makes sure the ems is all good and tests pass
Check that the EMS's metrics authentication is valid before queueing perf_capture calls, potentially flooding the queue with calls that are going to fail later.
requires: