Upgrade keycloak to version 24 #3796

Merged: 6 commits into main on Sep 4, 2024
Conversation

@rocketeerbkw (Member) commented on Aug 22, 2024

General Checklist

  • Affected Issues have been mentioned in the Closing issues section
  • Documentation has been written/updated
  • PR title is ready for inclusion in changelog

Database Migrations

  • If your PR contains a database migration, it MUST be the latest in date order alphabetically

We've been on Keycloak 21 for a while; this PR upgrades us to 24. A number of API changes have been made, so there is a refactor to account for those. I was also able to get rid of our keycloak-admin-client fork.

There are no migrations needed other than the normal keycloak upgrade process.

@shreddedbacon (Member)

The remoteshell test fails because the sshportal calls to Keycloak to get the group information fail.

{"time":"2024-08-23T02:14:26.766545934Z","level":"ERROR","msg":"couldn't generate group name to project IDs map","query":{"sshFingerprint":"SHA256:kDhWiCrizJSFOAPdDcpVIOV3W9f2VlwtjUrDUudJgTg","namespaceName":"ci-features-control-k8s-remoteshell","projectID":181,"environmentID":37,"sessionID":"6e63bce76a21589a6942e50c58df156ce75ea69a664e2a7729013d303996c2bd"},"error":"couldn't get group ID for group: ci-group"}

Since sshportal handles all of the group/access/permission logic independently of the API, it will likely also need updates to support the sparse group queries against Keycloak, or it could just use the API.

@smlx (Member) commented on Aug 23, 2024

The problem turned out to have nothing really to do with ssh-portal; it's just an incompatibility with Keycloak permissions in newer versions. Details and a backward-compatible fix here: #3797

I got the CI run to pass by logging in to Keycloak in CI and adding the view-users permission manually. So once this is rebased on #3797 it should go green.

@rocketeerbkw marked this pull request as ready for review on August 28, 2024 at 21:50
@shreddedbacon (Member) left a comment

I've checked over this and had a play with it locally both in k3d/local-stack and docker compose. I wasn't able to detect anything wrong. The changes seem to be pretty straightforward.

I do have a question about the choice not to use Redis for the sparse queries, given past experiences with Keycloak performance issues. Maybe it will be fine now with sparse groups, though. I guess we can find out in the test instance after merging.

The only issue I found was during local testing in docker compose: the API hangs waiting for Keycloak here. Restarting the API container locally allows it to succeed, but by the time the API is restarted the seeding process has already failed and needs to be re-run.

It's fine in Kubernetes because the liveness probe on the API pod eventually fails and causes it to restart, which has the same effect as restarting the container locally.

I tried adding a jank timeout like this (a rough sketch of the idea follows), which gives up on the first attempt after a few seconds; that's enough to ensure the next or subsequent attempts succeed. This works, but there is probably a better way to do it.
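For illustration only, a minimal sketch of that kind of timeout wrapper, assuming Node 18+ with a global fetch; withTimeout, waitForKeycloak, and the use of KEYCLOAK_URL here are illustrative names, not the actual Lagoon code:

```typescript
// Sketch only: cap each connection attempt with Promise.race so a hung
// request can't block API startup indefinitely. Assumes Node 18+ global fetch;
// names are illustrative, not the actual Lagoon implementation.
const withTimeout = <T>(promise: Promise<T>, ms: number): Promise<T> =>
  Promise.race([
    promise,
    new Promise<T>((_, reject) =>
      setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms),
    ),
  ]);

const waitForKeycloak = async (): Promise<void> => {
  for (;;) {
    try {
      // Give up on this attempt after 5s and fall through to a retry.
      await withTimeout(fetch(`${process.env.KEYCLOAK_URL}`), 5000);
      return;
    } catch (err) {
      console.warn('Keycloak not reachable yet, retrying in 1s:', err);
      await new Promise(resolve => setTimeout(resolve, 1000));
    }
  }
};
```

Note that Promise.race only abandons the attempt; it does not cancel the underlying request, which is part of why this counts as jank.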

Review threads on services/api/src/models/group.ts (resolved)
@shreddedbacon added this to the 2.21.0 milestone on Sep 3, 2024
@rocketeerbkw (Member, Author)

(Quoting @shreddedbacon's review comment above about the API hanging waiting for Keycloak during local docker compose testing.)

This is a weird one. I traced the error down to a fetch call that appears to crash Node.js; it doesn't matter whether it's async/await or a promise chain, it can't be caught by try/catch. For me it appears to be related to the 172.17.0.1 IP. If you remove the KEYCLOAK_URL env vars in docker-compose.yml, does that fix it for you? If so, then it's probably a local-only issue and not something that would break in a real deployment.

@shreddedbacon (Member) commented on Sep 4, 2024

(Quoting @rocketeerbkw's reply above, asking whether removing the KEYCLOAK_URL env vars in docker-compose.yml fixes the hang.)

I'll have another look real quick. But I don't understand why it fails with that IP on the first attempt, yet succeeds after a restart. Either way, a hang like this with no timeout seems like a bad thing to have.

The jank timeout wrapper I tried didn't require any manual intervention or changes to the docker-compose file. So I'm keen to understand the fetch crash a bit more, and why it isn't captured by the try in that waitForKeycloak code.
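For comparison, a hedged sketch of capping the fetch itself with AbortSignal.timeout (available in Node 18+): unlike a Promise.race wrapper, this aborts the underlying request, so a stalled connection surfaces as a catchable error rather than an indefinite hang. The function name and probe URL are illustrative, not the actual waitForKeycloak code:

```typescript
// Sketch only: bound the request itself rather than racing a timer.
// Assumes Node 18+ global fetch with AbortSignal.timeout support.
const probeKeycloak = async (baseUrl: string): Promise<boolean> => {
  try {
    const res = await fetch(baseUrl, { signal: AbortSignal.timeout(5000) });
    return res.ok;
  } catch (err) {
    // A timed-out or aborted request lands here instead of hanging the caller,
    // so a surrounding retry loop keeps control.
    console.warn('Keycloak probe failed:', err);
    return false;
  }
};
```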

@shreddedbacon (Member)

Yeah, removing KEYCLOAK_URL in the api service does indeed work, so I guess let's just do that for now.

But it would still be worth figuring out whether this is just a weird local edge case or whether it could happen in a running pod. It looks like port 3000 is inaccessible when the condition occurs, so in Kubernetes the pod will just go into a crash loop. If it keeps hitting that bug, the API could be completely unusable.

Maybe I'm overreacting though 🙂

@shreddedbacon (Member) left a comment

Good for now, we can evaluate performance issues in the test instance as required.

@tobybellwood merged commit 43b005e into main on Sep 4, 2024
1 check passed
@tobybellwood deleted the keycloak-upgrade branch on September 4, 2024 at 23:12