-
Notifications
You must be signed in to change notification settings - Fork 60
Conversation
rotateSecondaryRoot(Uptane::RepositoryType::Image(), *(sec->second)); | ||
if (!sec->second->putMetadata(meta)) { | ||
LOG_ERROR << "Sending metadata to " << sec->first << " failed"; | ||
put_meta_succeed = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we abort if any installation fails? Or is that a separate problem to solve since we don't handle atomicity well at present and that's a bigger topic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question. After further thinking, it might be worse than ignoring the check as it allows an hostile secondary to block a primary (and other secondaries) update, even if the metadata checks out from the PoV of the primary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting point. I would think that would be okay, but we need to properly decide what to do in these cases and this is decidedly out of the scope of this work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that even if we don't interrupt on the first putMetadata error anyway sendFirmware/install won't be called even for those Secondaries for which putMetdata was successful.
I agree with Patrick that it has to be discussed with a wider audience including product owners/managers, personally, I think aktualizr should carry on "metadata putting" and installation/update process if there is at least one "successful" case and then retry if there were non-verification/validation errors and then report a result to the backend...
if (fiu_fail("secondary_putmetadata") != 0) { | ||
return false; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this go in VirtualSecondary instead of ManagedSecondary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, though it needs an overriding of the method there but no big deal.
if (!sendMetadataToEcus(updates)) { | ||
result.dev_report = {false, data::ResultCode::Numeric::kInternalError, "Metadata verification failed"}; | ||
return std::make_tuple(result, "Secondary metadata verification failed", false); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just confirming: if this fails, does the backend recognize it and abort the update, or does it stay pending and cause aktualizr to retry it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually a good question. It's out of the scope of the given PR nevertheless I think it makes sense to return something more meaningful to putMetadata
, not just bool like we do for install
. So, the follow-up behavior can depend on an error type, e.g. if it's some network-related error then it makes sense to retry if Uptane verification of metadata actually fails then there is no point to retry.
I suppose proper testing of the actualize behavior after putMetadata failure requires adding test(s) to the python-based tests and OTF tests in order to see how it behaves if the uptane cycle is running.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@patrickvacek I just had a tried and the installation shows as failed on the UI, with nothing pending.
However, if I re-run aktualizr check
after the failure, it still shows 1 new update to install, which it will do if aktualizr once
is called without the injected error.
So it's not ideal but it sounds like we can't really fix that on the client?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this actually might require a client-side fix. I'm wondering if we need to drop the Director Targets metadata in this case (and all the similar returns for the metadata checks and such above this in this function). I want to avoid more of these cases where we get stuck in a loop, and as long as the empty targets optimization is in place, we have to work around it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah you're right, it doesn't persist if we drop the metadata. That's what I pushed in the last version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad to know that fixed it, but annoying that we have to keep expanding this target dropping business. I suspect all of the errors in that lambda function should return true for drop_targets
.
@@ -72,6 +73,10 @@ void ManagedSecondary::rawToMeta() { | |||
} | |||
|
|||
bool ManagedSecondary::putMetadata(const Uptane::RawMetaPack &meta_pack) { | |||
if (fiu_fail("secondary_putmetadata") != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose it's fine in the given context to inject into the "production" code some test code but IMHO, implementation of SecondaryInterface that realizes specific test case and is instantiated within a corresponding test and then just aktualizr->AddSecondary would do more robust and flexible. Otherwise, adding a new test case means adding another if (fiu_fail("") != 0)
into the "production" code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's a valid point. I've chosen the fiu solution one part because it's simpler and it's also easier to trigger the error against the real backend by scheduling an update and launching aktualizr with fiu.
For now, the code clutter seem minimal enough to me but I understand that's subjective.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you get rid of this fiu_fail
bit now that it's in VirtualSecondary
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops I thought I did. Thanks!
3fec14f
to
ee74b4f
Compare
Codecov Report
@@ Coverage Diff @@
## master #1612 +/- ##
==========================================
- Coverage 82.54% 82.41% -0.13%
==========================================
Files 189 189
Lines 11996 12005 +9
==========================================
- Hits 9902 9894 -8
- Misses 2094 2111 +17
Continue to review full report at Codecov.
|
ee74b4f
to
5a4f8a1
Compare
Added a test on secondary side, using mocks. Is there still something missing in your opinion? |
Signed-off-by: Laurent Bonnans <laurent.bonnans@here.com>
Signed-off-by: Laurent Bonnans <laurent.bonnans@here.com>
Signed-off-by: Laurent Bonnans <laurent.bonnans@here.com>
Signed-off-by: Laurent Bonnans <laurent.bonnans@here.com>
bac4123
to
50de0d3
Compare
The test is done with a simple fiu_fail