Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LOC: request device configuration and publish metrics/info #3037

Merged
merged 17 commits into from
Mar 14, 2023

Conversation

rouming
Copy link
Contributor

@rouming rouming commented Feb 13, 2023

This work introduces two LOC related things if LOC configuration exists (LOC URL is set by a controller):

  1. Device configuration can be fetched from another source, aka LOC, if main controller is unreachable or returns an error.

  2. /info and /metrics are published to the LOC URL along with publishing to a controller. The main difference is that /info is made recurring for the LOC case, because LOC is supposed to be stateless so all /info structures are published without any sending error handling.

Most of the commits are API changes to support different destinations for publishing requests: controller, loc, local server.

@rouming rouming changed the title [RFC] Request device configugration from LOC [RFC] Request device configuration from LOC Feb 13, 2023
@rouming rouming force-pushed the request-config-from-loc branch from acab4ed to a8b5dc9 Compare February 13, 2023 17:01
@rouming
Copy link
Contributor Author

rouming commented Feb 13, 2023

Difference to the previous version:

  • Rebased on the latest master

@eriknordmark
Copy link
Contributor

  • cca1885 handleconfig: handle 'CertInvalid' and 'CertMiss' similar in the getLatestConfig()
    Do we really handle 'CertMiss' and 'CertInvalid' differently? I unify the logic.

They are different things; CertMiss is about the controller (signing) certificate, and CertInvalid is about the TLS certificate chain. We should probably add that information in the comments for the enums to make this more clear.

Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some detailed comments in the diffs.

@rouming rouming force-pushed the request-config-from-loc branch from a8b5dc9 to 507a84d Compare February 21, 2023 08:00
@rouming rouming changed the title [RFC] Request device configuration from LOC LOC: request device configuration and publish metrics/info Feb 21, 2023
@rouming
Copy link
Contributor Author

rouming commented Feb 21, 2023

Difference to the previous version:

  • Unite two LOC related works in one PR: fetch config, publish /info, /metrics
  • Remove config refactoring commits as not very much helpful

@rouming
Copy link
Contributor Author

rouming commented Feb 21, 2023

  • cca1885 handleconfig: handle 'CertInvalid' and 'CertMiss' similar in the getLatestConfig()
    Do we really handle 'CertMiss' and 'CertInvalid' differently? I unify the logic.

They are different things; CertMiss is about the controller (signing) certificate, and CertInvalid is about the TLS certificate chain. We should probably add that information in the comments for the enums to make this more clear.

Commit is removed.

Copy link
Contributor

@milan-zededa milan-zededa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Starting eden tests, but left some review comments + there are some revive issues worth fixing.

// destination types, where info should be sent
const (
ControllerDest infoDest = 1
LocalServerDest = 2
Copy link
Contributor

@milan-zededa milan-zededa Feb 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's maybe start consistently denoting Local Profile Server with LPS (similarly to LOC abbreviation). Se here I would suggest LPSDest instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Do we have shorter name for the "Controller"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we don't. But LPS is more accurate that LocalServer.

TriggerHwInfo chan<- struct{}
TriggerDeviceInfo chan<- infoDest
TriggerHwInfo chan<- infoDest
TriggerCloudLocationInfo chan<- infoDest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these Trigger* fields can start with lower-case, zedagent context is not exported (I realize this mistake was there already).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep.

@@ -235,10 +235,68 @@ var logger *logrus.Logger
var log *base.LogObject
var zedcloudCtx *zedcloud.ZedCloudContext

type infoDest uint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about publishDest instead to make it more clear and cover metrics as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And maybe also with s or Set suffix to make it clear that a value of this type is essentially a set, built using bit flags.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bitset?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suffix, infoDestBitset

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to use a different name to cover metrics as well. publishDest or apiDest are names to consider.

pkg/pillar/cmd/zedagent/zedagent.go Outdated Show resolved Hide resolved
}

func triggerPublishAllInfo(ctxPtr *zedagentContext) {
func triggerPublishLocationToController(ctxPtr *zedagentContext) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be triggerPublishLocation (without ToController)?
Also triggerPublishLocationToControllerWithDest should be probably just triggerPublishLocationWithDest

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, can be removed.

pkg/pillar/cmd/zedagent/parseconfig.go Show resolved Hide resolved
Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kick of regression tests

Are there some eden tests being added for the LOC part?

@eriknordmark
Copy link
Contributor

Dco is saying some commit is not signed

Copy link
Contributor

@milan-zededa milan-zededa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I would suggest to squash this PR into a smaller number of commits for easier to navigate git history.

@eriknordmark
Copy link
Contributor

Looks like yetus is tripping on:
yetus: pkg/pillar/cmd/zedagent/handlelps.go#L118golangcilint: getconfigCtx.localServerMap undefined (type *getconfigContext has no field or method localServerMap) (typecheck)

)

const (
defaultLPSPort = "8888"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I suspect it is the renaming of the constants and other names which makes GH not realize this is a rename of a file; it reports it as a delete of the old name and a create of this file.
Don't see a way to make it treat it as a rename + set of content changes.

Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Run eden again

Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there are still comment comment suggestions plus yetus failures of the form
golangcilint: getconfigCtx.localServerMap undefined (type *getconfigContext has no field or method localServerMap) (typecheck)
to address then we can merge this.

rouming added 17 commits March 9, 2023 11:10
getLatestConfig() function is by no means a generic function, it is
tightly coupled with the whole logic of how config is parsed, is read
from the file, etc, so we can construct and define the URL inside the
function. This will be especially useful in the next patches, where
config can be read also from the LOC server, so two sources of config
will be supported.

Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
Don't set 'configGetStatus' to success if inhaleDeviceConfig()
has failed. In this patch inhaleDeviceConfig() is called before
statuses have been updated and published.

Also consider as fatal if proto marshal fails.

Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
Parse 'LOCConfig' by assigning members of the 'types.LOCConfig'
copying them from the 'zconfig.LOCConfig' structure.

Set the 'locConfig' to nil if 'LocUrl' is invalid.

Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
Request configuration from the LOC if attempt to get the configuration
from the controller has failed, but initial configuration was received
either from the controller (onboarded), either successfully read from
the file (/lastconfig).

The LOC config won't be used if it is obsolete, see the corresponding
check in the parseConfig() function.

Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
sendMetricsProtobuf() sends metrics to the main controller
and to the LOC as well. Don't block the execution context
by LOC send, so call it from a goroutine.

Also make sendMetricsProtobuf() private by renaming it.

Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
Don't access any members of the zedcloud context from the deferred
context, well, except only log. The main goal of this patch is to
support multiple deferred contexts for one zedcloud in the next
patch.

Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
…ation

All errors will be ignored and an item will be removed from the queue
regardless of an error while send, if 'ignoreErr' is passed as true.
The default behavior (ignoreErr == false) to retry sending from
the message which failed. Ignoring errors will be needed for the
LOC /info updates.

Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
Need a generic API for kicking the timer for immediate execution

Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
Special deferred context for periodic requests ("info"
requests in the next patches), where errors can be
ignored and run to a completion happens in a separate
goroutine in order not to block the main event loop.

Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
…tion

For now we have two destinations for the "info" message: actual
controller (cloud) and local server. By default "info" message
is not periodic, but is sent only on update, which causes a
problem for the stateless LOC implementation (not part of this
patch set).

The idea of a new 'destinationBitset' enum is to have a
controller and a local server be served as usual - by the "info"
messages updates, and only LOC should receive periodic "info"
messages. In order to send periodic "info" messages only to
LOC we need to define destination type (the purpose of this
patch).

This patch defines a type for different destinations and a
function 'queueInfoToDest()' which schedules deferred
requests and runs to a completion for the cloud case and
kicks the timer and offloads the execution for the LOC case.

The LOC deferred request is special: created with the ignore
error flag and runs to a completion in the separate goroutine.

In the following patches type will be used for all Publish*()
calls.

Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
Nothing to look at, just hide a couple of members and make
them private since are not used from the other packages.

Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
…h*() calls

Pass the destination type to all sort of "info" Publish*() calls.
By default 'AllDest' (all three destinations: controller, LPS,
LOC) type is passed. In the next patches the destination will
be used.

Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
Now all functions which publish "info" messages use multiple URLs.

Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
This flag forces deferred request to be created on the periodic
queue regardless of the destination type. This will be needed
for the location "info" requests, which are always periodic.

Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
Switch location pulishing to the deferred requests for multiple
destination.

Location publishing happens recurrently, thus errors for all
destinations must be ignored.

Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
Just reuse the same timer which sends metrics update for the
all info case only for the LOC as a destination.

Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
@rouming rouming force-pushed the request-config-from-loc branch from c0ce191 to 91f617a Compare March 9, 2023 10:11
@rouming
Copy link
Contributor Author

rouming commented Mar 9, 2023

Difference to the previous version:

  • Rebased on master
  • Removed all commits related to the renaming of the LocalServer to LPS etc. I do not feel comfortable to commit these changes in this PR (especially because don't want to fight with yetus, which complains about the variable which was renamed, so basically the member 'localServerMap' is 'lpsMap' now)

Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@eriknordmark
Copy link
Contributor

I'll kick off another run, but @rouming can you figure out why TestEdenScripts/port_forward fails 2/2 times in https://github.com/lf-edge/eve/actions/runs/4376395843/jobs/7658444261 and the eden tests run out of time?

@rouming
Copy link
Contributor Author

rouming commented Mar 10, 2023

I'll kick off another run, but @rouming can you figure out why TestEdenScripts/port_forward fails 2/2 times in https://github.com/lf-edge/eve/actions/runs/4376395843/jobs/7658444261 and the eden tests run out of time?

Can try

@rouming
Copy link
Contributor Author

rouming commented Mar 13, 2023

I'll kick off another run, but @rouming can you figure out why TestEdenScripts/port_forward fails 2/2 times in https://github.com/lf-edge/eve/actions/runs/4376395843/jobs/7658444261 and the eden tests run out of time?

Can try

@eriknordmark Port forwarding does not work. As far as I can tell all nat iptables rules exist (2223->22), but indeed you can't ssh to the app using eve ip addr: $ ssh 192.168.0.10 -p2223, but directly app-vm accepts the connection: $ ssh 10.11.12.2, I will try to debug further.

These eden tests successfully passed on this PR on my local machine:
./eden test tests/eclient -e profile -v debug
./eden test tests/eclient -e radio_silence -v debug
./eden test tests/eclient -e dev_local_info -v debug

so I'm about to rebase and merge this PR.

@rouming rouming merged commit 9c89b1a into lf-edge:master Mar 14, 2023
@rouming rouming deleted the request-config-from-loc branch March 14, 2023 10:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants