Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NETOBSERV-963 revert most of cert watching #312

Merged
merged 1 commit into from
Apr 3, 2023

Conversation

jotak
Copy link
Member

@jotak jotak commented Mar 30, 2023

Reverting most of certificate watching (NETOBSERV-684) as it generates a lot of pods restart.

We did not necessarily have to do this certificate watching as CM/secrets are updated within volumes.

We might however monitor carefully if new (or old) problems arise, potentially due to the kubelet sync delay for updating volumes

And also make sure certificates aren't cached in our different workloads

Reverting most of certificate watching (NETOBSERV-684) as it generates a
lot of pods restart.

We did not necessarily have to do this certificate watching as
CM/secrets are updated within volumes.

We might however monitor carefully if new (or old) problems arise,
potentially due to the kubelet sync delay for updating volumes

And also make sure certificates aren't cached in our different workloads
@jotak jotak changed the title NETOBSERV-963, revert most of cert watching NETOBSERV-963 revert most of cert watching Mar 30, 2023
@codecov
Copy link

codecov bot commented Mar 30, 2023

Codecov Report

Merging #312 (8447ed6) into main (55f6bd7) will decrease coverage by 0.02%.
The diff coverage is 80.00%.

@@            Coverage Diff             @@
##             main     #312      +/-   ##
==========================================
- Coverage   50.41%   50.39%   -0.02%     
==========================================
  Files          43       43              
  Lines        5096     5072      -24     
==========================================
- Hits         2569     2556      -13     
+ Misses       2318     2311       -7     
+ Partials      209      205       -4     
Flag Coverage Δ
unittests 50.39% <80.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
controllers/ebpf/agent_controller.go 83.25% <0.00%> (+1.00%) ⬆️
controllers/reconcilers/client_helper.go 46.09% <ø> (ø)
controllers/flowlogspipeline/flp_common_objects.go 85.25% <50.00%> (-0.03%) ⬇️
pkg/helper/certificates.go 75.30% <66.66%> (-0.31%) ⬇️
controllers/consoleplugin/consoleplugin_objects.go 92.74% <100.00%> (-0.03%) ⬇️
...trollers/consoleplugin/consoleplugin_reconciler.go 62.96% <100.00%> (+0.78%) ⬆️
controllers/flowcollector_controller.go 53.90% <100.00%> (-0.19%) ⬇️
controllers/flowlogspipeline/flp_ingest_objects.go 74.62% <100.00%> (ø)
...trollers/flowlogspipeline/flp_ingest_reconciler.go 63.71% <100.00%> (+1.32%) ⬆️
...ntrollers/flowlogspipeline/flp_monolith_objects.go 88.88% <100.00%> (ø)
... and 3 more

... and 1 file with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@stleerh
Copy link
Contributor

stleerh commented Mar 30, 2023

The solution was to:
remove cert watcher annotations, and make sure we don't cache certificates in FLP / console plugin

I see lots of code removed for the watcher, but were there code changes to remove caching?

@jotak
Copy link
Member Author

jotak commented Mar 30, 2023

@stleerh if there are any remaining cached data, that wouldn't be in the operator, but in flp/plugin/ebpf.
I reviewed the code for Loki and didn't find cached certificate. Haven't done so for Kafka yet - so yes, this is something to do.
Anyway we need to test it carefully and make sure it works in case of certificate rotation.

@nathan-weinberg
Copy link
Contributor

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Mar 30, 2023
@github-actions
Copy link

New images:

  • quay.io/netobserv/network-observability-operator:981c0ae
  • quay.io/netobserv/network-observability-operator-bundle:v0.0.0-981c0ae
  • quay.io/netobserv/network-observability-operator-catalog:v0.0.0-981c0ae

They will expire after two weeks.

Catalog source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-981c0ae
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

@msherif1234
Copy link
Contributor

@stleerh if there are any remaining cached data, that wouldn't be in the operator, but in flp/plugin/ebpf. I reviewed the code for Loki and didn't find cached certificate. Haven't done so for Kafka yet - so yes, this is something to do. Anyway we need to test it carefully and make sure it works in case of certificate rotation.

@jotak do u know of a way to force cert rotation ?

@jotak
Copy link
Member Author

jotak commented Mar 30, 2023

do u know of a way to force cert rotation ?

@msherif1234 no, apart from deleting / recreating the LokiStack, which would generate a new certificate

@jotak
Copy link
Member Author

jotak commented Mar 30, 2023

@msherif1234 no, apart from deleting / recreating the LokiStack, which would generate a new certificate

Actually this is wrong: while deleting + recreating LokiStack does delete and recreate the certificate CM, this is the same certificate that is inside. I didn't notice that the first time I tried.

@nathan-weinberg
Copy link
Contributor

I've done four different performance tests with this image and have not hit the FLP bug once

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved QE has approved this pull request label Mar 30, 2023
@jpinsonneau
Copy link
Contributor

@stleerh if there are any remaining cached data, that wouldn't be in the operator, but in flp/plugin/ebpf.

Yes I guess we need to revisit eBPF agent kafka certificate reading that seems to be done at startup only:
https://github.com/netobserv/netobserv-ebpf-agent/blob/main/cmd/netobserv-ebpf-agent.go#L42
https://github.com/netobserv/netobserv-ebpf-agent/blob/main/pkg/agent/agent.go#L211

Copy link
Contributor

@OlivierCazade OlivierCazade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jotak
Copy link
Member Author

jotak commented Apr 3, 2023

/approve

@openshift-ci
Copy link

openshift-ci bot commented Apr 3, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jotak

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Apr 3, 2023
@openshift-merge-robot openshift-merge-robot merged commit bb80b4f into netobserv:main Apr 3, 2023
acmenezes pushed a commit to acmenezes/network-observability-operator that referenced this pull request Apr 5, 2023
Reverting most of certificate watching (NETOBSERV-684) as it generates a
lot of pods restart.

We did not necessarily have to do this certificate watching as
CM/secrets are updated within volumes.

We might however monitor carefully if new (or old) problems arise,
potentially due to the kubelet sync delay for updating volumes

And also make sure certificates aren't cached in our different workloads

Signed-off-by: acmenezes <adcmenezes@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved lgtm ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. qe-approved QE has approved this pull request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants