guac linuxvm status: "deployment failed" on install in workspace after upgrading bundle w/ make user_resource_bundle locally #3823
Comments
I completely removed the existing TRE instance and ran `make all` to start afresh. It got as far as deploying the shared firewall bundle and then gave the same error as before.

I am running `make all` from the devcontainer and my dev machine has an M2 chip. I can build and publish bundles OK, but I'm unable to deploy them.
Hi @m1p1h, I'll try to validate the first issue. User VM upgrades aren't something that happen very often, as there is a risk the VM will be replaced. What's the upgrade scenario? For both issues, is there anything in the API logs? ( https://microsoft.github.io/AzureTRE/latest/troubleshooting-faq/app-insights-logs/ ) Also, what release/branch are you deploying from? Thanks.
Hi @marrobi, thanks for looking into this. I'm part of the nwsde dev team and I need a way to test changes locally within a running TRE instance (ideally without having to rebuild the whole thing every time). So in the initial case, I'm looking to update the windowsvm and linuxvm user resources. But I think there's a bigger issue: any CNAB I build and deploy from my local machine hits the error above, while a colleague running the same make with the same code doesn't. The resource_processor appears unable to see the bundles in the ACR, even though I can see via the portal that they exist and that they are registered correctly in Cosmos (checked via the API).

My dev machine is a Mac Air (M2 chip) running Sonoma (14.2.1) with Docker Desktop (4.26.1, Engine: 24.0.7, using Rosetta emulation). I'm running AzureTRE release 0.16.0. The logs don't give much else (see attached).
Ok, so the "Unable to find image ... locally" message is standard; Docker always shows this when an image does not exist locally, then pulls it from the registry. @jjgriff93 @martinpeck any comments on the Mac setup? (I run Windows and WSL so can't comment.)

A separate tip though: when iterating locally, it's often possible to just deploy the Terraform (as long as no VNet access is required), which means you can iterate faster. You need to ensure there is a deploy.sh script and that the .env file is correct.
@m1p1h those logs are all resource processor logs, not API logs. Can you check with `AppRoleName` set to `API` when searching the logs? Thanks.
Here's the logs for api and resource_processor... |
Doesn't give much away. Can you set debug on the API and try again? https://microsoft.github.io/AzureTRE/latest/troubleshooting-faq/debug-api/ The other thing would be to jump onto the resource processor and watch the logs as the bundle is installed; see "Logs" at https://microsoft.github.io/AzureTRE/latest/troubleshooting-faq/troubleshooting-rp/
Hello - I recall having an issue with the resource processor not picking up messages when it was bundled and deployed from my machine (M1), but I never got to the bottom of it - I generally used Codespaces (Linux) for working on the TRE. However, when I was experimenting with this I remember getting further when using QEMU and getting Docker to build linux/amd64 images, so you could give this a try. You can do this by modifying `FROM debian:bullseye-slim` to `FROM --platform=linux/amd64 debian:bullseye-slim`.
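For reference, the change described above would look like this in a bundle's Dockerfile (a sketch; only the `FROM` line comes from the comment above, the rest is illustrative):

```dockerfile
# Pin the build platform so the image targets amd64 even when built on an
# Apple Silicon (arm64) host.
FROM --platform=linux/amd64 debian:bullseye-slim

# ...rest of the Dockerfile unchanged...
```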
Hmm, I think that is covered in:

```shell
ARCHITECTURE=$(uname -m)
if [ "${ARCHITECTURE}" == "arm64" ]; then
  DOCKER_BUILD_COMMAND="docker buildx build --platform linux/amd64"
else
  DOCKER_BUILD_COMMAND="docker build"
fi
```

@m1p1h what does `uname -m` return for you?
Returns arm64. I can also see that docker buildx is being used. |
Although looking at the built images, it does look like for some bundles the arch is still arm64. The resource processor does get built for the amd64 arch, but tre-shared-service-firewall (where the error happens) is being built as arm64.
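One way to confirm what each image was actually built for is to ask Docker directly. A minimal sketch (the `image_arch` helper is mine, and this assumes the images exist in the local Docker daemon):

```shell
#!/usr/bin/env bash
# Print the architecture an image was built for, or "unknown" when the
# image is missing or docker is unavailable.
image_arch() {
  local arch
  arch=$(docker image inspect --format '{{ .Architecture }}' "$1" 2>/dev/null) \
    || { echo "unknown"; return; }
  echo "$arch"
}

# Prints e.g. amd64 or arm64, or "unknown" if the image is not present.
image_arch tre-shared-service-firewall
image_arch tre-service-guacamole-linuxvm
```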
In `cli/scripts/build.sh` I don't see this. Am I looking in the wrong place?
I searched a completely different repo for a different project! Maybe that's the solution we need... in |
@marrobi will do. I did quickly try with the code above in bundle_runtime_image_build.sh, but it still built some bundle images as arm64 archs, resulting in the same error. That might be because inside the devcontainer `uname -m` returns x86_64.
Hmm, interesting. Wonder if there's a way to find the Docker architecture.
I think we can use `docker info --format '{{ .Architecture }}'`, which will give aarch64 for Docker instances running on arm64.
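That suggestion could be wired into the build script roughly like this (a sketch; the `pick_build_command` helper and variable names are mine, not the actual TRE script):

```shell
#!/usr/bin/env bash
# Choose the build command from the Docker daemon's architecture rather than
# the host's `uname -m`, which can be misleading inside an emulated devcontainer.
pick_build_command() {
  # $1 is the value of: docker info --format '{{ .Architecture }}'
  case "$1" in
    aarch64|arm64) echo "docker buildx build --platform linux/amd64" ;;
    *)             echo "docker build" ;;
  esac
}

# Fall back to x86_64 if docker is not reachable.
DOCKER_ARCH=$(docker info --format '{{ .Architecture }}' 2>/dev/null) || DOCKER_ARCH="x86_64"
DOCKER_BUILD_COMMAND=$(pick_build_command "$DOCKER_ARCH")
echo "Using: $DOCKER_BUILD_COMMAND"
```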
Just to confirm the change to the docker build command would work, I hardcoded the build command in devops/scripts/bundle_runtime_image_build.sh to:

```shell
docker buildx build --platform linux/amd64 --build-arg BUILDKIT_INLINE_CACHE=1 \
  -t "${FULL_IMAGE_NAME_PREFIX}/${image_name}:${version}" \
  "${docker_cache[@]}" -f "${docker_file}" "${docker_context}"
```

But I still see the same issue where some bundles (tre-shared-service-firewall) are still built with an arm64 arch. This suggests some bundle images are being built elsewhere. I notice the Makefile defines a `build_image` function, but it looks like that is used only to build the api, resource processor, and airlock processor images.
Thinking about it, the Ah, actually its |
Looks like the answer is we need to add it to each porter bundle's Dockerfile - getporter/porter#2021 (comment). Can you try this with
Actually, this is caused by us having custom |
@marrobi, could you give me access to create a branch? |
@m1p1h you need to create a fork. Then a PR back if all is good. Thanks. |
I'm trying to test an upgrade to the linuxvm user resource by running the following make target locally:

```shell
make user_resource_bundle BUNDLE=guacamole-azure-linuxvm WORKSPACE_SERVICE=guacamole
```
The make script completes successfully. However, when I try to deploy a linuxVM in a workspace with the upgraded version I get a deployment_failed error:
```text
4bd2d574-628c-4bff-8372-e26cc4dc876c: Error message: Unable to find image '***.azurecr.io/tre-service-guacamole-linuxvm@sha256:2d7aa9e5c8318941f02dd57e7975e29a502c9ae7242a9f24fee30260165646b8' locally
exec /cnab/app/run: exec format error
2 errors occurred:
* container exit code: 1, message: <nil>. fetching outputs failed: error copying outputs from container: Error response from daemon: Could not find the file /cnab/app/outputs in container a5c93eddd67a154087f7f979a6bc68ec31fbcc6d4222e7b91c29290bac7597e6
* required output hostname is missing and has no default

Command executed: az cloud set --name AzureCloud && az login --identity -u bba7efea-eed7-4319-8695-24c61d9dc0c4 && az acr login --name ***acr && porter install "4bd2d574-628c-4bff-8372-e26cc4dc876c" --reference ***acr.azurecr.io/tre-service-guacamole-linuxvm:v0.8.0 --param arm_environment="public" --param arm_use_msi="true" --param azure_environment="AzureCloud" --param id="4bd2d574-628c-4bff-8372-e26cc4dc876c" --param os_image="Ubuntu 18.04" --param parent_service_id="ba82e7c8-f5ce-4abb-994c-86bfaeb501cf" --param shared_storage_access="True" --param tfstate_container_name="tfstate" --param tfstate_resource_group_name="rg-***-mgmt" --param tfstate_storage_account_name="***mgmtstore" --param tre_id="***" --param vm_size="2 CPU | 8GB RAM" --param workspace_id="2954681e-b9fd-4551-b15b-ae1cbc4ca9d2" --force --credential-set arm_auth --credential-set aad_auth
```
For this test, I don't actually change the linuxvm code apart from the version number in the porter.yaml file. The porter bundle/image is definitely in the ACR, in this case v0.8.0. The odd thing is that a colleague can run the same make locally and deploy the linuxvm CNAB with a later version number than the one I deployed into the same AzureTRE instance, and everything works in the sense that we can successfully deploy a linuxvm from that upgraded user resource.

I have the same permissions as they do. The only difference we can see is that I'm running the AzureTRE devcontainer on a Mac and they're on Windows, which shouldn't affect anything.
Any suggestions on what else we might want to consider?