-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check cert dates #2952
Check cert dates #2952
Conversation
c702d84
to
7defb9f
Compare
pkg/pillar/zedcloud/authen.go
Outdated
@@ -511,6 +512,20 @@ func VerifySigningCertChain(log *base.LogObject, certs []*zcert.ZCert) ([]byte, | |||
} | |||
} | |||
} | |||
|
|||
// verify date range of certificates | |||
for _, cert := range certs { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function gets called very time we receives a message from the cloud, to check the validity on every time is not necessary. have a way to check say once a day would be good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
I don't know if it is better having a separate timer for this, or just have this function track the last time it checked the certificates.
I try to understand the consumer of the novel information you try to show. You set check inside the handler of |
@giggsoff it is useful to have some warnings from the devices, since currently when the controller certificates are about to expire there is no indication until 1) they have expired and 2) a device is installed or rebooted, since there are no periodic checks at all in running devices. |
certName, | ||
-validDuration) | ||
certValid = false | ||
CurrCertStatus.CertStatus = aCertIsINVALID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the function which checks this should also 1) stop accepting new configuration from the controller 2) periodically fetch from /certs to get updated and valid certificate(s), and 3) put the device in maintenance mode until valid certs are received.
We can do 3) as a separate PR now or later; requires definiting a new MaintenanceModeReason value similar to the vaultlocked, and have EVE-OS set that new value and then clear it later. #3 makes sure that things are clearly flagged should be controller send unusable invalid certificate(s).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function posts logs based on the particular threshold. (Default is 60 days prior a Notice log, escalating to Warning at 30 days -- both deprived from the config, and thus can be administratively set). Obviously, it posts a Failing log when out of date.
I can refactor this into a function to test the cert against a given date, returning the cert state based on the date specified, and a higher level function that evaluates the cert with the current date/time, and do the logging in a caller to that function. Thought about that from the start, but opted for what seemed to be the most straight forward approach.
It's about an hour or so to do the refactor. Let me know if you prefer it that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There already is a function in the x509 package to to the validation based on a given date.
And I swear I added the name of that function in a comment in this PR a few weeks back; not the first time I find the gh looses comments.
Set CurrentTime with and call https://pkg.go.dev/crypto/x509#Certificate.Verify just like we do elsewhere in EVE-OS code.
certInfoThreshold = 30 | ||
certWarnThreshold = 15 | ||
) | ||
|
||
func verifySignature(log *base.LogObject, certByte []byte, interm *x509.CertPool) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this function does not get called until a new certificate chain is downloaded from the controller. We want a periodic verification that the chain is still valid, which means having zedagent run a timer to verify the chain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take a look, but I thought it was executed with roughly the same frequency of where I originally had the check code. I obviously need to take a better look.
CertExpireInfo GlobalSettingKey = "timer.certcheck.infolog" | ||
CertExpireWarn GlobalSettingKey = "timer.certcheck.warnlog" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, these are not used.
|
||
var ( | ||
lastCertTimeCheck time.Time | ||
CurCertStatus = allCertsAreValid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this need to be a global variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Best way I could think to expose a warning. Also, a way to clear once any problems have been cleared.
certInfoThreshold = 30 | ||
certWarnThreshold = 15 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't these removed and use the variables you added to types/global.go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a place holder until I understand and create the appropriate accessors.
@@ -559,6 +576,43 @@ func verifySignature(log *base.LogObject, certByte []byte, interm *x509.CertPool | |||
log.Errorln("verifySignature: " + errStr) | |||
return errors.New(errStr) | |||
} | |||
|
|||
// verify cert validity dates against thresholds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is both simpler and more correct to do not this here, but instead add a "when" argument to verifySignature() and call verifySignature() with different values of when (not+30 days etc)
And the feed what when into x509.VerifyOptions so that the leafcert.Verify above will do all the checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds better, I assume that I t would be better to implement a timer callout that fetches the thresholds, locates the cert chain (or chains) to process, and then call verifySignature with each of the threshold values.
Sound more what you'd like to see?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question, should we return a failure, or just log on the threshold events?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now let's get the logging in place (with the periodic check) and in a future PR once we've gone through a certificate roll in production, we can add the hard fail.
In terms of placement of the periodic check, after you started this work we've made controllerCertsTask() periodically try to re-fetch the certificates. It probably makes sense having that function also re-verify the validity of the certificates both each time when it fetched new ones, and infrequently (once a day?) when it did not fetch new certificates.
check cert valid for date, info at longest threshold, warn at the shortest threshold, and error when out of date. Attempt to include these thresholds within the global configuration data. This commit for review and help integrating service into the rest of EVE. SEE: EVE-125
Correct use of threshold inputs to days. Make default log a debug/trace message Config name change to better clarify its use.
Use the `Verify` interface to a x509 Certificate to validate the Certificate for the next fifteen or thirty days. Events are logged and currently return an error to the caller. Also reflects the most urgent state seen in `CurCertStatus`. A WARN log is issued for a cert expiring in the next fifteen days, and INFO if within the next thirty days. Had to pull in the `logrus` package for an INFO log. Signed-off-by: Gerald (Bob) Lee <gerald@zededa.com>
9205aa6
to
bff95fd
Compare
This changes the TAG variable from being the branch to the actual tag e.g., 10.1 vs. 10.1.0, which is needed so we can pull the correct container from docker hub. Signed-off-by: eriknordmark <erik@zededa.com>
The format changes in the new version we pulled in PR lf-edge#3293 resulted in producing log entries in /persist/newlog/collect/ which are effectively empty. The change in memlogd as seen by logread is a common vs a a semicolon but the underlying change is to produce json which we need to parse in newlogd. Signed-off-by: eriknordmark <erik@zededa.com>
The old getSourceFromMsg() is no longer needed, and the repaireMsg() has become unnecessary. Signed-off-by: eriknordmark <erik@zededa.com>
The messages from containerd are not in json format but with time="X" level=Y msg="Z". This adds back the ability to parse those (without trying to repair some json format). Signed-off-by: eriknordmark <erik@zededa.com>
Signed-off-by: eriknordmark <erik@zededa.com>
Signed-off-by: eriknordmark <erik@zededa.com>
when grub queries for available disks it doesn't take into account that the disk can be removable e.g. USB stick. The disk can appear in front of regular HDDs and the numbering will be different e.g. hd0 become hd1 when the USB stick is plugged in. It is not a problem for GRUB to find a correct partition in this case and the system can be booted just fine. However every command from grub.cfg is measured into PCR8 while being executed and HDD names appear in those commands e.g. 'set root=(hd2,gpt5)'. If any key is sealed into TPM using PCR8 then that key cannot be unsealed when a random USB stick is inserted (or removed if it was inserted when the key was sealed) the original issue should not affect PC BIOS case because USB devices are usually emulated as either CD or floppy drives and have their unique numbering Signed-off-by: Mikhail Malyshev <mikem@zededa.com>
When a cloned ./grub repository exists in the pkg/grub it is taken instead of tar file. autogen.sh must be run in this case and gawk util is required Signed-off-by: Mikhail Malyshev <mikem@zededa.com>
…nstanceStatus. This commit introduces robust serialization and deserialization functionality for the snapshot instance status, ensuring continuity of snapshot lists after system reboots. Key changes: Addition of the serializeSnapshotInstanceStatus function, which serializes the SnapshotInstanceStatus into a JSON-formatted file. Addition of the deserializeSnapshotInstanceStatus function. During deserialization, comprehensive field checks are performed. These checks ensure that all expected fields are present, and unknown fields are appropriately flagged. This ensures a safe deserialization process even when the struct format changes over time. We've defined a list of critical fields essential for snapshot restoration. If any of these critical fields are missing in the serialized data, the deserialization will report an error, avoiding potential runtime issues. The moveSnapshotToAvailable function was updated to call the serialization function. The restoreAvailableSnapshots function was added to the Run function in zedmanager.go. It handles the restoration of the snapshot status for each snapshot present in the snapshots directory after a system reboot. With these changes, our snapshot management becomes more resilient and flexible, easily accommodating changes in the SnapshotInstanceStatus structure. Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
This commit introduces several utility functions in the zedmanagertypes.go file. These functions are designed to facilitate the access of snapshot-related files across different modules. Each of these functions constructs the appropriate file path using the provided snapshot ID, enabling easy access to the corresponding snapshot-related files. Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
Enhanced `lookupAppInstanceConfig` function to include an additional boolean parameter that checks local AppInstanceConfig when set to true. Also, added new function `lookupLocalAppInstanceConfig` to retrieve local AppInstanceConfig. Added subscription and publication for local AppInstanceConfig in the zedmanager context. This serves as a placeholder for a config copy that Eve reverts to when necessary. So, this subscriptions is checked first now. It includes logic to use local AppInstanceConfig in `checkDelayedStartApps` and `handleModify`. Updated all calls to `lookupAppInstanceConfig` in multiple files to include the new boolean parameter. Implemented `unpublishLocalAppInstanceConfig` and `publishLocalAppInstanceConfig` for managing local AppInstanceConfig publication. Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
…unt. The increment in the reference count for VolumeRefConfig was removed. This operation was intended to prevent a volume from being deleted when it's temporarily not in use by any application. However, this extra layer of protection is not necessary as VolumeConfig is sufficient to maintain the corresponding file. Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
Remove serialization of VolumeRefStatus to avoid persistent subscription issues during Eve upgrades. This commit removes the serialization and deserialization of VolumeRefStatus to avoid potential problems with persistent subscriptions during Eve upgrades. Changes in struct formats during an upgrade could lead to issues, hence this preventative measure. Specifically, the serialization and deserialization methods for VolumeRefStatus have been removed, along with the related helper methods such as getVolumeRefStatusFilename. Consequently, all error handling tied to these operations has also been discarded. Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
…n names for clarity. This commit refactors the names of several serialization and deserialization functions for improved clarity and code readability. The changes include: - `restoreConfigFromSnapshot` is now `restoreAppInstanceConfigFromSnapshot` - `addFixupsIntoSnappedConfig` is now `fixupAppInstanceConfig` - `deserializeConfigFromSnapshot` is now `deserializeAppInstanceConfigFromSnapshot` - `saveConfigForSnapshots` is now `saveAppInstanceConfigForSnapshot` - `serializeConfigToSnapshot` is now `serializeAppInstanceConfigToSnapshot` The function argument lists and bodies remain unchanged; only the function names have been modified. The new names make it clear that these functions pertain to AppInstanceConfigs, improving their self-documentation and making the codebase easier to understand and maintain. Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
This commit replaces the use of locally defined functions and string manipulations for deriving snapshot-related directory paths with newly defined, centralized functions. Replaced `getSnapshotDir` function and subsequent filename derivation with `types.GetSnapshotDir` and `types.GetSnapshotAppInstanceConfigFile` respectively. Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
Refactoring of handlers for VolumesSnapshotStatus: The functions handleVolumesSnapshotStatusCreate, handleVolumesSnapshotStatusModify, and handleVolumesSnapshotStatusDelete have been refactored. Now, they call a new function handleVolumesSnapshotStatusImpl which takes in the context and volumesSnapshotStatus as arguments. Depending on the ResultOfAction in the volumesSnapshotStatus, it calls one of the new functions reactToSnapshotCreate, reactToSnapshotRollback, or reactToSnapshotDelete. Separate handlers for different snapshot actions: The new functions reactToSnapshotCreate, reactToSnapshotRollback, and reactToSnapshotDelete handle the respective snapshot actions. Each of these functions fetches the appInstanceStatus and performs specific operations based on the action and whether there is an error in the volumesSnapshotStatus. Snapshot deletion handled separately: A new function DeleteSnapshotFiles has been added to delete snapshot files. It checks if a directory exists and then removes it. This is used by reactToSnapshotDelete to remove snapshot files. Changes in error handling: The error handling has been adjusted to consider the error severity in reactToSnapshotCreate. Adjustments in handleVolumesSnapshotStatusDelete: Now, it sets the ResultOfAction in volumesSnapshotStatus to VolumesSnapshotDelete before calling handleVolumesSnapshotStatusImpl. Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
In updatestatus.go, the check for RollbackInProgress has been removed from the updateAIStatusUUID and doUpdate functions, thus not skipping the update process during a rollback. We should not skip now, as the config version is taken from a special local subscription that provides us with the proper version. The check for RollbackInProgress has also been removed from the doInstall function. Instead, a new check for RollbackInProgress is added after checking if an app purge is in progress. This means the system will not activate an application if a rollback is still in progress. In zedmanager.go, the triggering of the rollback has been moved to the handleModify function from the doUpdate function. This means a rollback request can now be handled directly when the app configuration is modified. The initiation of the rollback now sets HasRollbackRequest and RollbackInProgress to true. The rollback also publishes the application instance status and updates the configuration to the one from the snapshot. The handleModify function also has changes to the purge handling. The needPurge condition now has one more option: if HasRollbackRequest is true. This is to ensure that a purge occurs during a rollback (we need it to upgrade app instance volumes). The rollback handling that was previously in handleModify has been removed, as it's now handled earlier in the same function. In zedmanagertypes.go, the comments for HasRollbackRequest and RollbackInProgress in the SnapshottingStatus struct have been updated to reflect the new handling of rollbacks. The ConfigBeforeRollback field has been removed because the rollback flow is now controlled by the RollbackInProgress field, and the version of the configuration available in the module is the correct one, not the previous one. Hence, it doesn't need to store the previous configuration version. Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
…, synced one. Substituted os.WriteFile with fileutils.WriteRename for a snapshot-related file operation in zedmanager.go. This alteration ensures atomicity and synchronicity in disk writes, preventing the creation of partially written files during snapshot serialization. Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
This commit introduces a function to clean up and remove unused VolumeRefStatuses. The function iterates through the VolumeRefStatusList and checks if the volume is still referenced in the current configuration. If not, the volume status is removed from the list and its associated VolumeRefConfig is unpublished. This ensures that any volume statuses that are no longer in use are properly cleaned up. Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
This commit significantly revises the triggerRollback function within the Zedmanager's updatestatus module. Instead of restoring VolumeRefStatuses and the AppInstanceConfig from a snapshot, the function now creates or updates a VolumesSnapshotConfig to indicate a rollback action. If a VolumesSnapshotConfig is found for the snapshot in question, its action is set to rollback. If no matching VolumesSnapshotConfig is found, a new one is created with the relevant VolumeIDs from the current AppInstanceStatus, and the action set to rollback. This change simplifies the rollback process and makes it more resilient to system reboots. Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
This commit introduces snapshot cleanup during the uninstallation process of an application. Now, when an application is being uninstalled, all related snapshot files will be deleted. Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
The comment about the ConfigID being the ID of the config that created the snapshot is no longer relevant or accurate, so it has been removed to avoid confusion. Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
This commit introduces significant changes to the handling of volume snapshots in our system. Key changes include: In volumemgr.go, the handlers for creating, modifying, and deleting volume snapshots have been reworked. The new handlers are `handleVolumesSnapshotConfigCreate`, `handleVolumesSnapshotConfigModify`, and `handleVolumesSnapshotConfigDelete`. These handlers now call a common implementation function `handleVolumesSnapshotConfigImpl`. In handlesnapshot.go, the functions `handleVolumesSnapshotCreate`, `handleVolumesSnapshotModify`, and `handleVolumesSnapshotDelete` have been significantly reworked. They now call `handleVolumesSnapshotConfigImpl` with the appropriate action set in the snapshot config. The function `handleVolumesSnapshotConfigImpl` handles the creation, rollback, and deletion of snapshots based on the action in the config. New functions `createSnapshot`, `rollbackSnapshot`, and `deleteSnapshot` have been introduced in handlesnapshot.go to handle the respective actions. The functions `publishVolumesSnapshotStatus` and `unpublishVolumesSnapshotStatus` remain largely unchanged, but the function `lookupVolumesSnapshotStatus` has been updated to check if the snapshot status exists in the store if it is not found in the published state. It's necessary to be able to do rollback after the system reboot. For that a new function `deserializeVolumesSnapshotStatus` has been introduced to read the snapshot status from a file. In updatestatus.go, the function `triggerSnapshotDeletion` has been updated to set the `Action` of `volumesSnapshotConfig` to `types.VolumesSnapshotDelete` before unpublishing the snapshot config. It's an optional change. Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
… in handleDeferredVolumeCreate. After reworking the snapshot serialization, volume status is no longer stored. As a result, it becomes an unexpected situation when a status exists at the moment of calling this function. Therefore, the log level has been changed from Warning to Fatal to reflect the seriousness of this unexpected state. Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
This commit refactors the deserialization logic used in volumemgr and zedmanager packages into a new utility function, utils.DeserializeToStruct, to reduce code duplication and improve readability. The redundant code in the handleSnapshot and handlevolumemgr files is replaced with this new utility function. In addition, a new utility file deserialize.go is introduced to contain the shared deserialization logic. The related imports are also updated accordingly. Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
Refactor the DeserializeToStruct function to accept a pointer to an empty structure instead of requiring the caller to provide a reflect.Type and separately manage critical fields. Remove the usage of reflect package from snapshot handling methods. Reflected code has been replaced by direct deserialization into the structs. Replaced the critical fields maps in VolumesSnapshotStatus and SnapshotInstanceStatus structures with struct tags. Introduced 'mandatory' tag to mark critical fields directly in the struct declaration. This approach is more idiomatic and makes it easier to understand which fields are critical. Updated extractFields function to identify the mandatory fields based on the new 'mandatory' struct tag. Simplified error handling in DeserializeToStruct function by directly returning formatted error messages. Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
This commit introduces a suite of tests for the DeserializeToStruct function in the utils package. The tests cover various scenarios including simple struct, error conditions, mandatory field logic, extra field in file, missing non-mandatory field in file, struct with anonymous field, nested structs, array of structs, and anonymous structs. These tests ensure the robustness and reliability of the DeserializeToStruct function. Signed-off-by: Nikolay Martyanov <nikolay@zededa.com>
Signed-off-by: Avi Deitcher <avi@deitcher.net>
Some pillar/types on which Edgeview depends were changed during the Zedrouter refactoring. Please note that these types are not persisted, so we do not concern ourselves with backward compatibility (as it would significantly limit development). The Edgeview changes done in this commit only affects the server side, meaning that users will not have to update the client-side container. Signed-off-by: Milan Lenco <milan@zededa.com>
Add GNU tar to the container, otherwise running collect-info.sh script while the device is offline fails to gzip the collected logs, because the BusyBox tar lacks some features compared to GNU version. Signed-off-by: Shahriyar Jalayeri <shahriyar@zededa.com>
Signed-off-by: Shahriyar Jalayeri <shahriyar@zededa.com>
Signed-off-by: Shahriyar Jalayeri <shahriyar@zededa.com>
check cert valid for date, info at longest threshold, warn at the shortest threshold, and error when out of date. Attempt to include these thresholds within the global configuration data. This commit for review and help integrating service into the rest of EVE. SEE: EVE-125
@famleebob looks like the PR got messed up since it is showing there are 126 commits. |
@famleebob are you still keen to continue work on this? or we can close? |
Labeled as stalled. Closing this because no activity. |
This is a draft pull request, for review and discussion. Needs to be re-based, and interface work added.