WIP: fix empty version when loading upgrade action from disk #4184

AndersonQ · 2024-02-02T13:50:59Z

What does this PR do?

Why is it important?

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have made corresponding change to the default configuration files
I have added tests that prove my fix is effective or that my feature works
I have added an entry in ./changelog/fragments using the changelog tool
I have added an integration test or an E2E test

Author's Checklist

[ ]

How to test this PR locally

Related issues

Closes Scheduled upgrade fails with error parsing version "" if agent restarts before the upgrade starts #3912

Use cases

Screenshots

Logs

Questions to ask yourself

How are we going to support this in production?
How are we going to measure its adoption?
How are we going to debug this?
What are the metrics I should take care of?
...

AndersonQ · 2024-02-02T14:54:04Z

TL;DR: YAML isn't JSON. Ok, let me explain that:

When using JSON it's possible to have a []byte field and when unmarshaling it have the byte representation of the JSON string and it can be parsed later. That's the strategy used to marshal/unmarshal a
[].fleet.Action, an interface, which actually parses the data into a []fleet.FleetAction then into the specific Action.

It works for the fleet defined JSON action, which has a opaque data field with a different schema for each cation type. In Go, it's a json.RawMessage, a.k.a []byte and it works just fine. However the same isn't true for YAML, it's not even possible to have a raw field (see this), and we use yaml.v2. Besides the concrete Action structs we use do not have a data field, they're flat. What makes their marshaled version different from what they were unmarshaled from.

That's why we're loosing data like the version for the upgrade actions, the original data inside the data field is marshaled to disk in a different place than the one expected by unmarshal. Again, as JSON isn't YAML, even if the specific fields for each action were saved to disk under a data field, the code would break anyway because the data field would fail to unmarshal to a []byte/json.RawMessage.

Some possible solutions:

Refactor Actions to Reflect JSON Schema:
So far I believe the best solution would be to refactor all the actions to reflect the JSON schema by having a nested Data struct with the action's specific fields. This should make the marshal/unmarshal trivial. However this means changing a substantial amount of code and therefore the potential for things to break.
Do not use YAML for actions:
We could also just get rid of the yaml for actions. I confess right now I wonder if we really need yaml for them. Using only JSON could be a easy solution. And of course adding some good old panic when marshalling/unmarshalling the actions so we'd be sure no one uses them with YAML.
Explore if yaml.v3 can replicate the JSON raw message behaviour:
We could try to use yaml.v3 to see if it'd be possible to replicate the JSON behaviour. However, I haven't explored all that could be done if we'd to use yaml.v3 because I'm concerned if we don't change everywhere we might end up with some package using the v2 and messing things when marshalling/unmarshalling the actions.
Have specific structs used only to marshal/unmarshal actions:
Another option is to have specific structs to represent the YAML version of the actions, those structs would only be used for marshalling/unmarshalling what would hide the complexity of making the actions work in JSON form as well as in YAML form.

Regardless of the solution: we also need some good set of tests to ensure this mix and match of types that an action can be do not loose data when persisting them to disk and reading them back. Mainly if we add new fields or action types. It's exactly what our current tests are missing, to assert the specific fields for each action type.

Right now, for me, I'd like to avoid future problems, what requires a solution allowing the model of the actions be altered or new actions be created without requiring to change any marshal/unmarshal code. What makes me side with 1) and accept the burden of the refactor. However I haven't accessed how far all the actions spread in the agent's code, so I cannot guess the amount of work it'd require.

cmacknz · 2024-02-02T16:25:45Z

Consistently using JSON everywhere seems like the correct approach to eliminate compatibility errors. I am unsure why we use YAML to store actions on disk. Perhaps @michel-laterman knows or can give us guidance on the best approach since he was involved in the original implementation.

The one complication to changing the format is that actions are persisted on disk, which means if we change the format we have to treat this as a data migration.

I don't think there are a lot of actions persisted in general, in fact I'm not sure the most important one in upgrades even gets persisted. That said we should avoid throwing away actions that otherwise would have been processed. I can think of a couple of ways to deal with this, but am open to suggestion on which approach we should take.

michel-laterman · 2024-02-05T15:38:40Z

@cmacknz I think yaml was used to be consistent with the format of the other files the elastic-agent wrote to disk.

…-state-empty-version

It's needed so we can set the encrypted store vault's path

…y-version

…lastic-agent into 3912-fix-state-empty-version

elastic-sonarqube · 2024-02-12T14:08:35Z

Quality Gate passed

The SonarQube Quality Gate passed, but some issues were introduced.

4 New issues
0 Security Hotspots
47.0% 47.0% Coverage on New Code
0.0% 0.0% Duplication on New Code

See analysis details on SonarQube

mergify · 2024-02-13T16:13:43Z

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b 3912-fix-state-empty-version upstream/3912-fix-state-empty-version
git merge upstream/bugfix/3912-borken-stat-store
git push upstream 3912-fix-state-empty-version

AndersonQ · 2024-04-03T07:20:26Z

closed in favour of #4441

AndersonQ added Team:Elastic-Agent Label for the Agent team backport-v8.12.0 Automated backport with mergify labels Feb 2, 2024

AndersonQ self-assigned this Feb 2, 2024

AndersonQ added 8 commits February 8, 2024 15:05

WIP: marshaling a []byte for the data property/field

884c800

WIP: json works

2e4ae0f

wip

d4d7466

wip

5f92fd9

update action models on fleetapi

24e7850

action store tests pass. But there is no migration

364a14d

add list of files using fleetapi.Action

29ce313

use new fleetapi models

1313ded

AndersonQ force-pushed the 3912-fix-state-empty-version branch from 3dd8f7b to 1313ded Compare February 8, 2024 14:05

AndersonQ added 2 commits February 9, 2024 07:51

more fixes

c274e24

use golden file for store migration test

0e62b48

AndersonQ changed the base branch from main to bugfix/3912-borken-stat-store February 9, 2024 16:37

AndersonQ added 11 commits February 9, 2024 17:38

Merge branch 'main' of github.com:elastic/elastic-agent into 3912-fix…

98b2f86

…-state-empty-version

add warning

77d01f5

add TODO

64039db

skip on darwin

21f8998

improve skip message

aa94d7a

remove withFile function

5e34095

fix skip message

eab1465

migrate take in a storage.Store instead of the storage's path

67f2de4

It's needed so we can set the encrypted store vault's path

adjust the tests for new migrate signature

1d316d5

fix windows tests

7f77eb0

Merge branch 'bugfix/3912-borken-stat-store' into 3912-fix-state-empt…

a688dcf

…y-version

AndersonQ changed the title ~~WIP: fix empty version when loading upgrade action from disk~~ Makes internal action schema to match Fleet Server schema Feb 12, 2024

AndersonQ added 2 commits February 12, 2024 14:36

fix after cherrypick

571f228

Merge branch '3912-fix-state-empty-version' of github.com:AndersonQ/e…

29a999f

…lastic-agent into 3912-fix-state-empty-version

AndersonQ changed the title ~~Makes internal action schema to match Fleet Server schema~~ WIP: fix empty version when loading upgrade action from disk Feb 12, 2024

mergify bot mentioned this pull request Feb 12, 2024

Match fleet server action model and use on JSON for serialisation #4240

Merged

3 tasks

AndersonQ closed this Apr 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: fix empty version when loading upgrade action from disk #4184

WIP: fix empty version when loading upgrade action from disk #4184

AndersonQ commented Feb 2, 2024 •

edited

Loading

AndersonQ commented Feb 2, 2024

cmacknz commented Feb 2, 2024 •

edited

Loading

michel-laterman commented Feb 5, 2024

elastic-sonarqube bot commented Feb 12, 2024

mergify bot commented Feb 13, 2024

AndersonQ commented Apr 3, 2024

WIP: fix empty version when loading upgrade action from disk #4184

WIP: fix empty version when loading upgrade action from disk #4184

Conversation

AndersonQ commented Feb 2, 2024 • edited Loading

What does this PR do?

Why is it important?

Checklist

Author's Checklist

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

Questions to ask yourself

AndersonQ commented Feb 2, 2024

cmacknz commented Feb 2, 2024 • edited Loading

michel-laterman commented Feb 5, 2024

elastic-sonarqube bot commented Feb 12, 2024

Quality Gate passed

mergify bot commented Feb 13, 2024

AndersonQ commented Apr 3, 2024

AndersonQ commented Feb 2, 2024 •

edited

Loading

cmacknz commented Feb 2, 2024 •

edited

Loading