-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LOC: request device configuration and publish metrics/info #3037
Conversation
acab4ed
to
a8b5dc9
Compare
Difference to the previous version:
|
They are different things; CertMiss is about the controller (signing) certificate, and CertInvalid is about the TLS certificate chain. We should probably add that information in the comments for the enums to make this more clear. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some detailed comments in the diffs.
a8b5dc9
to
507a84d
Compare
Difference to the previous version:
|
Commit is removed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Starting eden tests, but left some review comments + there are some revive issues worth fixing.
pkg/pillar/cmd/zedagent/zedagent.go
Outdated
// destination types, where info should be sent | ||
const ( | ||
ControllerDest infoDest = 1 | ||
LocalServerDest = 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's maybe start consistently denoting Local Profile Server with LPS (similarly to LOC abbreviation). Se here I would suggest LPSDest
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Do we have shorter name for the "Controller"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we don't. But LPS is more accurate that LocalServer.
pkg/pillar/cmd/zedagent/zedagent.go
Outdated
TriggerHwInfo chan<- struct{} | ||
TriggerDeviceInfo chan<- infoDest | ||
TriggerHwInfo chan<- infoDest | ||
TriggerCloudLocationInfo chan<- infoDest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these Trigger* fields can start with lower-case, zedagent context is not exported (I realize this mistake was there already).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep.
pkg/pillar/cmd/zedagent/zedagent.go
Outdated
@@ -235,10 +235,68 @@ var logger *logrus.Logger | |||
var log *base.LogObject | |||
var zedcloudCtx *zedcloud.ZedCloudContext | |||
|
|||
type infoDest uint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about publishDest
instead to make it more clear and cover metrics as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And maybe also with s
or Set
suffix to make it clear that a value of this type is essentially a set, built using bit flags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bitset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The suffix, infoDestBitset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to use a different name to cover metrics as well. publishDest or apiDest are names to consider.
pkg/pillar/cmd/zedagent/zedagent.go
Outdated
} | ||
|
||
func triggerPublishAllInfo(ctxPtr *zedagentContext) { | ||
func triggerPublishLocationToController(ctxPtr *zedagentContext) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be triggerPublishLocation
(without ToController
)?
Also triggerPublishLocationToControllerWithDest
should be probably just triggerPublishLocationWithDest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kick of regression tests
Are there some eden tests being added for the LOC part?
Dco is saying some commit is not signed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but I would suggest to squash this PR into a smaller number of commits for easier to navigate git history.
Looks like yetus is tripping on: |
pkg/pillar/cmd/zedagent/handlelps.go
Outdated
) | ||
|
||
const ( | ||
defaultLPSPort = "8888" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I suspect it is the renaming of the constants and other names which makes GH not realize this is a rename of a file; it reports it as a delete of the old name and a create of this file.
Don't see a way to make it treat it as a rename + set of content changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Run eden again
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like there are still comment comment suggestions plus yetus failures of the form
golangcilint: getconfigCtx.localServerMap undefined (type *getconfigContext has no field or method localServerMap) (typecheck)
to address then we can merge this.
getLatestConfig() function is by no means a generic function, it is tightly coupled with the whole logic of how config is parsed, is read from the file, etc, so we can construct and define the URL inside the function. This will be especially useful in the next patches, where config can be read also from the LOC server, so two sources of config will be supported. Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
Don't set 'configGetStatus' to success if inhaleDeviceConfig() has failed. In this patch inhaleDeviceConfig() is called before statuses have been updated and published. Also consider as fatal if proto marshal fails. Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
Parse 'LOCConfig' by assigning members of the 'types.LOCConfig' copying them from the 'zconfig.LOCConfig' structure. Set the 'locConfig' to nil if 'LocUrl' is invalid. Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
Request configuration from the LOC if attempt to get the configuration from the controller has failed, but initial configuration was received either from the controller (onboarded), either successfully read from the file (/lastconfig). The LOC config won't be used if it is obsolete, see the corresponding check in the parseConfig() function. Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
sendMetricsProtobuf() sends metrics to the main controller and to the LOC as well. Don't block the execution context by LOC send, so call it from a goroutine. Also make sendMetricsProtobuf() private by renaming it. Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
Don't access any members of the zedcloud context from the deferred context, well, except only log. The main goal of this patch is to support multiple deferred contexts for one zedcloud in the next patch. Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
…ation All errors will be ignored and an item will be removed from the queue regardless of an error while send, if 'ignoreErr' is passed as true. The default behavior (ignoreErr == false) to retry sending from the message which failed. Ignoring errors will be needed for the LOC /info updates. Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
Need a generic API for kicking the timer for immediate execution Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
Special deferred context for periodic requests ("info" requests in the next patches), where errors can be ignored and run to a completion happens in a separate goroutine in order not to block the main event loop. Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
…tion For now we have two destinations for the "info" message: actual controller (cloud) and local server. By default "info" message is not periodic, but is sent only on update, which causes a problem for the stateless LOC implementation (not part of this patch set). The idea of a new 'destinationBitset' enum is to have a controller and a local server be served as usual - by the "info" messages updates, and only LOC should receive periodic "info" messages. In order to send periodic "info" messages only to LOC we need to define destination type (the purpose of this patch). This patch defines a type for different destinations and a function 'queueInfoToDest()' which schedules deferred requests and runs to a completion for the cloud case and kicks the timer and offloads the execution for the LOC case. The LOC deferred request is special: created with the ignore error flag and runs to a completion in the separate goroutine. In the following patches type will be used for all Publish*() calls. Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
Nothing to look at, just hide a couple of members and make them private since are not used from the other packages. Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
…h*() calls Pass the destination type to all sort of "info" Publish*() calls. By default 'AllDest' (all three destinations: controller, LPS, LOC) type is passed. In the next patches the destination will be used. Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
Now all functions which publish "info" messages use multiple URLs. Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
This flag forces deferred request to be created on the periodic queue regardless of the destination type. This will be needed for the location "info" requests, which are always periodic. Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
Switch location pulishing to the deferred requests for multiple destination. Location publishing happens recurrently, thus errors for all destinations must be ignored. Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
Just reuse the same timer which sends metrics update for the all info case only for the LOC as a destination. Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
c0ce191
to
91f617a
Compare
Difference to the previous version:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I'll kick off another run, but @rouming can you figure out why TestEdenScripts/port_forward fails 2/2 times in https://github.com/lf-edge/eve/actions/runs/4376395843/jobs/7658444261 and the eden tests run out of time? |
Can try |
@eriknordmark Port forwarding does not work. As far as I can tell all nat iptables rules exist (2223->22), but indeed you can't ssh to the app using eve ip addr: $ ssh 192.168.0.10 -p2223, but directly app-vm accepts the connection: $ ssh 10.11.12.2, I will try to debug further. These eden tests successfully passed on this PR on my local machine: so I'm about to rebase and merge this PR. |
This work introduces two LOC related things if LOC configuration exists (LOC URL is set by a controller):
Device configuration can be fetched from another source, aka LOC, if main controller is unreachable or returns an error.
/info and /metrics are published to the LOC URL along with publishing to a controller. The main difference is that /info is made recurring for the LOC case, because LOC is supposed to be stateless so all /info structures are published without any sending error handling.
Most of the commits are API changes to support different destinations for publishing requests: controller, loc, local server.