Skip to content
This repository has been archived by the owner on May 21, 2024. It is now read-only.

Secondary metadata error #1612

Merged
merged 4 commits into from
Mar 26, 2020
Merged

Conversation

lbonn
Copy link
Contributor

@lbonn lbonn commented Mar 23, 2020

The test is done with a simple fiu_fail

rotateSecondaryRoot(Uptane::RepositoryType::Image(), *(sec->second));
if (!sec->second->putMetadata(meta)) {
LOG_ERROR << "Sending metadata to " << sec->first << " failed";
put_meta_succeed = false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we abort if any installation fails? Or is that a separate problem to solve since we don't handle atomicity well at present and that's a bigger topic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question. After further thinking, it might be worse than ignoring the check as it allows an hostile secondary to block a primary (and other secondaries) update, even if the metadata checks out from the PoV of the primary.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting point. I would think that would be okay, but we need to properly decide what to do in these cases and this is decidedly out of the scope of this work.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that even if we don't interrupt on the first putMetadata error anyway sendFirmware/install won't be called even for those Secondaries for which putMetdata was successful.
I agree with Patrick that it has to be discussed with a wider audience including product owners/managers, personally, I think aktualizr should carry on "metadata putting" and installation/update process if there is at least one "successful" case and then retry if there were non-verification/validation errors and then report a result to the backend...

if (fiu_fail("secondary_putmetadata") != 0) {
return false;
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this go in VirtualSecondary instead of ManagedSecondary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, though it needs an overriding of the method there but no big deal.

if (!sendMetadataToEcus(updates)) {
result.dev_report = {false, data::ResultCode::Numeric::kInternalError, "Metadata verification failed"};
return std::make_tuple(result, "Secondary metadata verification failed", false);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just confirming: if this fails, does the backend recognize it and abort the update, or does it stay pending and cause aktualizr to retry it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually a good question. It's out of the scope of the given PR nevertheless I think it makes sense to return something more meaningful to putMetadata, not just bool like we do for install. So, the follow-up behavior can depend on an error type, e.g. if it's some network-related error then it makes sense to retry if Uptane verification of metadata actually fails then there is no point to retry.

I suppose proper testing of the actualize behavior after putMetadata failure requires adding test(s) to the python-based tests and OTF tests in order to see how it behaves if the uptane cycle is running.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@patrickvacek I just had a tried and the installation shows as failed on the UI, with nothing pending.

However, if I re-run aktualizr check after the failure, it still shows 1 new update to install, which it will do if aktualizr once is called without the injected error.

So it's not ideal but it sounds like we can't really fix that on the client?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this actually might require a client-side fix. I'm wondering if we need to drop the Director Targets metadata in this case (and all the similar returns for the metadata checks and such above this in this function). I want to avoid more of these cases where we get stuck in a loop, and as long as the empty targets optimization is in place, we have to work around it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah you're right, it doesn't persist if we drop the metadata. That's what I pushed in the last version.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to know that fixed it, but annoying that we have to keep expanding this target dropping business. I suspect all of the errors in that lambda function should return true for drop_targets.

src/libaktualizr/primary/aktualizr_test.cc Show resolved Hide resolved
@@ -72,6 +73,10 @@ void ManagedSecondary::rawToMeta() {
}

bool ManagedSecondary::putMetadata(const Uptane::RawMetaPack &meta_pack) {
if (fiu_fail("secondary_putmetadata") != 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it's fine in the given context to inject into the "production" code some test code but IMHO, implementation of SecondaryInterface that realizes specific test case and is instantiated within a corresponding test and then just aktualizr->AddSecondary would do more robust and flexible. Otherwise, adding a new test case means adding another if (fiu_fail("") != 0) into the "production" code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's a valid point. I've chosen the fiu solution one part because it's simpler and it's also easier to trigger the error against the real backend by scheduling an update and launching aktualizr with fiu.

For now, the code clutter seem minimal enough to me but I understand that's subjective.

Copy link
Collaborator

@mike-sul mike-sul Mar 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you get rid of this fiu_fail bit now that it's in VirtualSecondary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops I thought I did. Thanks!

@lbonn lbonn force-pushed the fix/OTA-4342/secondary-metadata-error branch from 3fec14f to ee74b4f Compare March 24, 2020 14:52
@codecov-io
Copy link

codecov-io commented Mar 24, 2020

Codecov Report

Merging #1612 into master will decrease coverage by 0.12%.
The diff coverage is 76%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1612      +/-   ##
==========================================
- Coverage   82.54%   82.41%   -0.13%     
==========================================
  Files         189      189              
  Lines       11996    12005       +9     
==========================================
- Hits         9902     9894       -8     
- Misses       2094     2111      +17
Impacted Files Coverage Δ
src/virtual_secondary/virtualsecondary.h 100% <ø> (ø) ⬆️
src/libaktualizr/primary/sotauptaneclient.h 100% <ø> (ø) ⬆️
src/virtual_secondary/managedsecondary.cc 86.77% <100%> (+0.11%) ⬆️
src/virtual_secondary/virtualsecondary.cc 100% <100%> (ø) ⬆️
src/libaktualizr/primary/sotauptaneclient.cc 90.56% <71.42%> (+0.05%) ⬆️
src/libaktualizr/storage/sqlstorage_base.h 60% <0%> (-40%) ⬇️
...aktualizr_secondary/aktualizr_secondary_factory.cc 61.9% <0%> (-33.34%) ⬇️
src/libaktualizr/storage/sqlstorage_base.cc 74.49% <0%> (-4.7%) ⬇️
src/libaktualizr/package_manager/ostreemanager.cc 80.51% <0%> (-1.11%) ⬇️
src/aktualizr_info/main.cc 91.48% <0%> (-0.86%) ⬇️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f90e899...bac4123. Read the comment docs.

@lbonn lbonn force-pushed the fix/OTA-4342/secondary-metadata-error branch from ee74b4f to 5a4f8a1 Compare March 24, 2020 15:45
@lbonn
Copy link
Contributor Author

lbonn commented Mar 25, 2020

Added a test on secondary side, using mocks.

Is there still something missing in your opinion?

Signed-off-by: Laurent Bonnans <laurent.bonnans@here.com>
Signed-off-by: Laurent Bonnans <laurent.bonnans@here.com>
Signed-off-by: Laurent Bonnans <laurent.bonnans@here.com>
Signed-off-by: Laurent Bonnans <laurent.bonnans@here.com>
@lbonn lbonn force-pushed the fix/OTA-4342/secondary-metadata-error branch from bac4123 to 50de0d3 Compare March 26, 2020 09:02
@lbonn lbonn merged commit 740da18 into master Mar 26, 2020
@lbonn lbonn deleted the fix/OTA-4342/secondary-metadata-error branch March 26, 2020 12:49
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants