Add option for detached HUP on startOsUpdate #1443

Merged
flowzone-app bot merged 1 commit into master from jaomaloy/detached-hup
Oct 25, 2024

Conversation

jaomaloy

@jaomaloy jaomaloy commented Oct 3, 2024

This calls the v2 actions endpoint for resinhup, which runs a detached version of HUP. Detached mode increases HUP reliability on slow networks but offers no status updates such as in_progress.
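
For illustration, a minimal sketch of how a caller might opt into detached mode. The call shape `startOsUpdate(uuid, targetVersion, { runDetached: true })` matches the REPL sessions later in this conversation; the `DeviceModelLike` interface, the `fakeDevice` stub and the `triggerDetachedUpdate` helper are hypothetical stand-ins so the snippet runs without the SDK or a real device.

```typescript
// Hypothetical minimal result shape; field names follow the REPL output in this thread.
interface OsUpdateResult {
  status: string;
  action?: string;
  lastRun?: number | null;
  parameters?: { target_version: string };
}

// Hypothetical subset of balena.models.device used by this sketch.
interface DeviceModelLike {
  startOsUpdate(
    uuid: string,
    targetOsVersion: string,
    options?: { runDetached?: boolean },
  ): Promise<OsUpdateResult>;
}

// Stub standing in for the SDK: detached requests report 'triggered', others 'in_progress'.
const fakeDevice: DeviceModelLike = {
  async startOsUpdate(_uuid, targetOsVersion, options) {
    return {
      status: options?.runDetached ? 'triggered' : 'in_progress',
      action: 'resinhup',
      lastRun: Date.now(),
      parameters: { target_version: targetOsVersion },
    };
  },
};

// Start a detached update and fail loudly if the request was not accepted as 'triggered'.
async function triggerDetachedUpdate(
  device: DeviceModelLike,
  uuid: string,
  targetOsVersion: string,
): Promise<OsUpdateResult> {
  const result = await device.startOsUpdate(uuid, targetOsVersion, { runDetached: true });
  if (result.status !== 'triggered') {
    throw new Error(`Unexpected status for detached update: ${result.status}`);
  }
  return result;
}
```

Because detached mode reports no progress, the sketch only checks that the request was accepted; it does not poll for completion.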

Resolves: #
HQ:
See:
Depends-on:
Change-type: major|minor|patch


Contributor checklist
  • Includes tests
  • Includes typings
  • Includes updated documentation
  • Includes updated build output

@jaomaloy jaomaloy requested review from Page- and kb2ma October 3, 2024 12:03
@kb2ma
Contributor

kb2ma commented Oct 4, 2024

One more thing to add -- OsUpdateActionResult needs a 'triggered' status option. For v2, the status is either 'idle', 'triggered', or possibly 'error'. It returns 'error', for example, if the request contains a bad parameter; in that case the HUP is not started on the device. See balena-proxy src/services/actions-backend/app.ts.

We could create an OsUpdateActionResult2 for just v2, but I don't think it's worth the effort. We can just remove the unused OsUpdateActionResult status values when v1 is removed.
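
A rough sketch of the single shared result type suggested above. The status values are the ones that appear in this conversation; the SDK's real union may contain more. The names `V1Status`, `V2Status`, `OsUpdateActionResultSketch` and `isV2Status` are hypothetical and are not the SDK's actual typings.

```typescript
// Statuses seen in this thread; the SDK's real union may contain more values.
type V1Status = 'idle' | 'in_progress' | 'done' | 'error';
type V2Status = 'idle' | 'triggered' | 'error';

// Hypothetical combined result type: one shape shared by v1 and v2 responses.
interface OsUpdateActionResultSketch {
  status: V1Status | V2Status;
  action?: 'resinhup';
  lastRun?: number | null;
  parameters?: { target_version: string };
  stdout?: string;
  error?: string;
}

const V2_STATUSES: readonly string[] = ['idle', 'triggered', 'error'];

// True when a status is one that the v2 (detached) endpoint can report.
function isV2Status(status: string): status is V2Status {
  return V2_STATUSES.indexOf(status) !== -1;
}

// Example value shaped like the detached-mode response shown later in the thread.
const exampleResult: OsUpdateActionResultSketch = {
  status: 'triggered',
  action: 'resinhup',
  lastRun: 1729569204298,
  parameters: { target_version: '4.0.26+rev1' },
};
```

Keeping one type means the v1-only values can simply be deleted from the union once v1 is removed, which is the trade-off described above.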

@jaomaloy jaomaloy force-pushed the jaomaloy/detached-hup branch 2 times, most recently from 4f63a6d to bd2d9a9 Compare October 7, 2024 09:57
@Page- Page- requested a review from thgreasi October 7, 2024 16:55
@jaomaloy jaomaloy force-pushed the jaomaloy/detached-hup branch 2 times, most recently from 8f12260 to 5355aa8 Compare October 8, 2024 06:22
@jaomaloy jaomaloy marked this pull request as ready for review October 16, 2024 04:44
@flowzone-app flowzone-app bot enabled auto-merge October 16, 2024 07:55
@jaomaloy jaomaloy requested review from Page- and kb2ma October 16, 2024 08:02
@jaomaloy jaomaloy force-pushed the jaomaloy/detached-hup branch 2 times, most recently from 22ea52b to e1c3e18 Compare October 18, 2024 13:05
@jaomaloy jaomaloy force-pushed the jaomaloy/detached-hup branch 2 times, most recently from 14b4fb9 to b38519e Compare October 18, 2024 15:40
@kb2ma
Contributor

kb2ma commented Oct 18, 2024

@jaomaloy I've gone through all the code now, and overall it is more compact and looks good. Please update the API comments in src/models/device.ts to describe that startOsUpdate() is deprecated for runDetached=false. These marks will update the online API docs to show in strikethrough font.

Also add a comment that getOsUpdateStatus() is deprecated as well, and will not return a useful status for runDetached=true.

@jaomaloy
Author

> @jaomaloy I've gone through all the code now, and overall it is more compact and looks good. Please update the API comments in src/models/device.ts to describe that startOsUpdate() is deprecated for runDetached=false. These marks will update the online API docs to show in strikethrough font.
>
> Also add a comment that getOsUpdateStatus() is deprecated as well, and will not return a useful status for runDetached=true.

We will do this in the next step, correct? This PR doesn't deprecate anything yet; it only adds the runDetached=true option so the rest of our code can start using it.

@kb2ma
Contributor

kb2ma commented Oct 21, 2024

@jaomaloy we want to deprecate now to give users notice of the change and time to adapt.

@kb2ma
Contributor

kb2ma commented Oct 22, 2024

Tests look OK, except for what I describe below.

When I test v1, I see these results when starting the update, as expected.

> await balena.models.device.startOsUpdate("43c932f91a2e305a13f6b692faf39230", "2.99.27")
{
  status: 'in_progress',
  lastRun: 1729563480785,
  parameters: { target_version: '2.99.27' },
  action: 'resinhup'
}
> await balena.models.device.getOsUpdateStatus("43c932f91a2e305a13f6b692faf39230")
{
  status: 'in_progress',
  lastRun: 1729563480785,
  parameters: { target_version: '2.99.27' },
  action: 'resinhup'
}

However, when running v2, I see these results when starting the update. I actually would expect to see similar results -- basically whatever is in the action server's ACTION_STATUS dictionary.

> await balena.models.device.startOsUpdate("43c932f91a2e305a13f6b692faf39230", "4.0.26+rev1", {runDetached: true})
{
  status: 'triggered',
  lastRun: 1729569204298,
  parameters: { target_version: '4.0.26+rev1' },
  action: 'resinhup'
}
> await balena.models.device.getOsUpdateStatus("43c932f91a2e305a13f6b692faf39230")
{ status: 'error' }

Looking into the error, I see this in the log for balena-proxy -- generating an error just looking up the device. Of course we don't support getOsUpdateStatus() when running in detached mode. However, it should not generate an error either. Do you get the same results? My concern is that there may be some code or rules in pinejs that are sensitive to the status of the device.

Oct 22 04:28:24 d90c519 2c27e73cdc26[1631]: [40938.112701] actions-backend[7773]: ::ffff:127.0.0.1 GET actions.devices.d90c5192ae585eaed21f1f48258c954c.kb2ma.balena-dev.com /v1/43c932f91a2e305a13f6b692faf39230/resinhup 400 12 10.968ms curl/7.81.0
Oct 22 04:28:24 d90c519 2c27e73cdc26[1631]: [40938.112886] proxy[7757]: ::ffff:192.168.1.100 GET actions.devices.d90c5192ae585eaed21f1f48258c954c.kb2ma.balena-dev.com /v1/43c932f91a2e305a13f6b692faf39230/resinhup 400 12 15.058ms curl/7.81.0
Oct 22 04:28:24 d90c519 2c27e73cdc26[1631]: [40938.113011] actions-backend[7773]: error: Error: Actions Backend error: -  StatusError
Oct 22 04:28:24 d90c519 2c27e73cdc26[1631]: [40938.113076] actions-backend[7773]:     at PinejsClientRequest._request (/usr/src/app/node_modules/pinejs-client-request/request.ts:143:10)
Oct 22 04:28:24 d90c519 2c27e73cdc26[1631]: [40938.113122] actions-backend[7773]:     at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
Oct 22 04:28:24 d90c519 2c27e73cdc26[1631]: [40938.113166] actions-backend[7773]:     at async PinejsClientRequest.callWithRetry (/usr/src/app/node_modules/pinejs-client-core/src/index.ts:1173:11)
Oct 22 04:28:24 d90c519 2c27e73cdc26[1631]: [40938.113205] actions-backend[7773]:     at async PinejsClientRequest.get (/usr/src/app/node_modules/pinejs-client-core/src/index.ts:1308:18)
Oct 22 04:28:24 d90c519 2c27e73cdc26[1631]: [40938.113245] actions-backend[7773]:     at async getDeviceInfo (/usr/src/app/src/common/api-utils.ts:146:17)
Oct 22 04:28:24 d90c519 2c27e73cdc26[1631]: [40938.113283] actions-backend[7773]:     at async <anonymous> (/usr/src/app/src/services/actions-backend/app.ts:689:23)

@kb2ma
Contributor

kb2ma commented Oct 22, 2024

There is another problem. If I run a HUP in non-detached mode, and then run a second HUP in detached mode, the second HUP actually runs in non-detached mode.

1st HUP:

> await balena.models.device.getOsUpdateStatus("447c7aeda6702cac332d48bc65d78797")
{ status: 'idle', action: 'resinhup', lastRun: null }
> await balena.models.device.startOsUpdate("447c7aeda6702cac332d48bc65d78797", "2.99.27", {runDetached: false})
{
  status: 'in_progress',
  lastRun: 1729575207485,
  parameters: { target_version: '2.99.27' },
  action: 'resinhup'
}
> await balena.models.device.getOsUpdateStatus("447c7aeda6702cac332d48bc65d78797")
{
  status: 'in_progress',
  lastRun: 1729575207485,
  parameters: { target_version: '2.99.27' },
  action: 'resinhup'
}
> await balena.models.device.getOsUpdateStatus("447c7aeda6702cac332d48bc65d78797")
{
  status: 'done',
  action: 'resinhup',
  stdout: '[upgrade-2.x.sh][000000000][INFO]Loading info from config.json\n',
  error: 'failed to get digest sha256:cfe5adead644da98672552df619e3758fe4e8773abe02f54b95bf01cd2fd71f4: open /mnt/sysroot/inactive/balena/image/overlay2/imagedb/content/sha256/cfe5adead644da98672552df619e3758fe4e8773abe02f54b95bf01cd2fd71f4: no such file or directory\n' +
    "WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested\n" +
    "WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested\n" +
    "WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested\n" +
    'Running as unit: run-update-supervisor.service\n' +
    'Finished with result: success\n' +
    'Main processes terminated with: code=exited/status=0\n' +
    'Service runtime: 9.969s\n',
  lastRun: 1729575327902,
  parameters: { target_version: '2.99.27' }
}

2nd HUP. Note that the status should be 'triggered', not 'in_progress'.

> await balena.models.device.startOsUpdate("447c7aeda6702cac332d48bc65d78797", "4.0.26", {runDetached: true})
{
  status: 'in_progress',
  lastRun: 1729575680882,
  parameters: { target_version: '4.0.26' },
  action: 'resinhup'
}
> await balena.models.device.getOsUpdateStatus("447c7aeda6702cac332d48bc65d78797")
{
  status: 'in_progress',
  lastRun: 1729575680882,
  parameters: { target_version: '4.0.26' },
  action: 'resinhup'
}

Proxy log shows the second attempt did indeed call the /v1 endpoint:

Oct 22 05:41:20 d90c519 2c27e73cdc26[1631]: [45314.541747] actions-backend[7773]: ::ffff:127.0.0.1 POST actions.devices.d90c5192ae585eaed21f1f48258c954c.kb2ma.balena-dev.com /v1/447c7aeda6702cac332d48bc65d78797/resinhup 200 109 63.712ms node-fetch/1.0 (+https://github.com/bitinn/node-fetch)
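
The cause is not identified at this point in the thread. Purely as an illustration of one way a client can keep the two endpoints apart, the sketch below keys a cache of helpers on the endpoint version, so a helper created for a v1 call is never reused for a detached call. `getOsUpdateHelper` does take a version argument in the SDK snippet quoted further down, but the cache, the `OsUpdateHelper` shape, and the `/v2` path layout here are assumptions.

```typescript
type ActionsVersion = 'v1' | 'v2';

// Hypothetical helper shape: it only builds the request path for a device.
interface OsUpdateHelper {
  version: ActionsVersion;
  pathFor(uuid: string): string;
}

// One helper per version, so a cached v1 helper is never returned for a v2 request.
const helpers = new Map<ActionsVersion, OsUpdateHelper>();

function getOsUpdateHelperSketch(version: ActionsVersion): OsUpdateHelper {
  let helper = helpers.get(version);
  if (helper === undefined) {
    helper = {
      version,
      // Mirrors the /v1/<uuid>/resinhup entries in the proxy log; /v2 is assumed to be analogous.
      pathFor: (uuid: string) => `/${version}/${uuid}/resinhup`,
    };
    helpers.set(version, helper);
  }
  return helper;
}

// Map the runDetached option to an endpoint version on every call.
function versionFor(options?: { runDetached?: boolean }): ActionsVersion {
  return options?.runDetached === true ? 'v2' : 'v1';
}
```

With this shape, running a non-detached update first and a detached one second resolves to two different helpers, which is the behaviour the test sequence above expects.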

@jaomaloy
Author


The { status: 'error' } result from getOsUpdateStatus() is likely because, when a v2 update is running, the SDK forces the result to 'error'. I could make this return a new error type instead?

		getOsUpdateStatus: async (uuid: string): Promise<OsUpdateActionResult> => {
			try {
				const osUpdateHelper = await getOsUpdateHelper('v1');
				const result = await osUpdateHelper.getOsUpdateStatus?.(uuid);

				if (result === undefined) {
					return { status: 'error' } as OsUpdateActionResult;
				}

				return result;
			} catch (err) {
				if (err.statusCode !== 400) {
					throw err;
				}

				// as an attempt to reduce the requests for this method
				// check whether the device exists only when the request rejects
				// so that it's rejected with the appropriate BalenaDeviceNotFound error
				await exports.get(uuid, { $select: 'id' });
				// if the device exists, then re-throw the original error
				throw err;
			}
		},

@kb2ma
Contributor

kb2ma commented Oct 22, 2024

It's good that the SDK code is catching this case, and I don't think we need a separate error. My concern is that the error occurs on line 689 in balena-proxy src/services/actions-backend/app.ts, where the resinhup GET is just looking up device info.

For v2, the resinhup GET should still return either 'triggered' or 'idle'. They are not really useful, but I don't understand why an error would be generated in this case. I'm concerned that backend modules like the UI will use the SDK as well, and also will get this odd error.

Is there something in the pinejs rules or SQL that is failing because the device is in an unexpected state? @Page- , any insights here?

@jaomaloy jaomaloy force-pushed the jaomaloy/detached-hup branch 2 times, most recently from 5271959 to 37992da Compare October 22, 2024 17:45
@jaomaloy jaomaloy force-pushed the jaomaloy/detached-hup branch 2 times, most recently from 00a7865 to 05e8d91 Compare October 24, 2024 13:23
@@ -2262,6 +2270,10 @@ const getDeviceModel = function (
* Unsupported (unpublished) version will result in rejection.
* The version **must** be the exact version number, a "prod" variant and greater than the one running on the device.
* To resolve the semver-compatible range use `balena.model.os.getMaxSatisfyingVersion`.
* @param {Object} [options] - options
* @param {Boolean} [options.runDetached] - run the update in detached mode.
* Default behaviour is runDetached=false but is DEPRECATED and will be removed in a future release. Use runDetached=true
Contributor

Nice comment!

@kb2ma
Contributor

kb2ma commented Oct 25, 2024

LGTM, please rebase and squash commits. @Page- do you want to take another look?

@jaomaloy jaomaloy force-pushed the jaomaloy/detached-hup branch 2 times, most recently from fef2e74 to 48bdc1d Compare October 25, 2024 13:27
@kb2ma
Contributor

kb2ma commented Oct 25, 2024

When I do a local build, the DOCUMENTATION.md file is updated for the deprecations. I expect this update should be included with the PR. Also, please confirm that the documentation for getOsUpdateStatus() is accurate. I'm not sure it works to put the @summary and @deprecated on a single line.

@kb2ma kb2ma self-requested a review October 25, 2024 14:07
Contributor

@kb2ma kb2ma left a comment

Need to resolve comment above on docs.

@jaomaloy
Author

> When I do a local build, the DOCUMENTATION.md file is updated for the deprecations. I expect this update should be included with the PR. Also, please confirm that the documentation for getOsUpdateStatus() is accurate. I'm not sure it works to put the @summary and @deprecated on a single line.

Yep, @deprecated needed to be on its own line.
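
For reference, a sketch of a JSDoc block with `@summary` and `@deprecated` on separate lines. The comment wording, the function name `getOsUpdateStatusDoc` and the stub body are illustrative assumptions, not the text merged in this PR.

```typescript
/**
 * @summary Get the OS update status of a device
 * @deprecated Will not return a useful status when the update was started with runDetached=true.
 *
 * @param {String} uuid - full device uuid
 * @returns {Promise<{ status: string }>} hypothetical, simplified result
 */
async function getOsUpdateStatusDoc(uuid: string): Promise<{ status: string }> {
  // Stub body: the point of this example is the comment layout above.
  return { status: uuid.length > 0 ? 'idle' : 'error' };
}
```

Each block tag starts its own line, so the deprecation text does not get folded into the summary.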

This will call the v2 actions endpoint for resinhup
which runs a detached version of HUP that increases
HUP reliability on slow networks but will offer
no status updates such as in_progress.

Change-type: minor
Contributor

@kb2ma kb2ma left a comment

Documentation looks good now.

@flowzone-app flowzone-app bot merged commit cb0cd3c into master Oct 25, 2024
53 checks passed
@flowzone-app flowzone-app bot deleted the jaomaloy/detached-hup branch October 25, 2024 15:01
3 participants