Upgrade keycloak to version 24 #3796
Conversation
Since sshportal handles all of the group/access/permission logic independently of the API, it will likely need updates too to support the sparse group queries against Keycloak, or it could just use the API.
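Purely as an illustration of what a sparse group query against the Keycloak admin REST API can look like (the base URL, realm name, and token handling here are assumptions, not code from this PR or from sshportal):

```typescript
// Illustrative sketch of a "sparse" group lookup: ask for brief representations
// and page through results instead of pulling the whole group tree.
// The URL prefix depends on KC_HTTP_RELATIVE_PATH; realm and token are assumed.
const KEYCLOAK_URL = 'http://keycloak:8080/auth';

const findGroups = async (token: string, search: string) => {
  const params = new URLSearchParams({
    search,                      // only groups matching this name fragment
    briefRepresentation: 'true', // skip attribute/sub-group expansion
    first: '0',
    max: '50',                   // page rather than fetching everything
  });
  const res = await fetch(
    `${KEYCLOAK_URL}/admin/realms/lagoon/groups?${params}`,
    { headers: { Authorization: `Bearer ${token}` } },
  );
  if (!res.ok) {
    throw new Error(`group query failed: ${res.status}`);
  }
  return res.json();
};
```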
The problem turned out to be nothing really to do with … I got the CI run to pass by logging in to Keycloak in CI and adding the …
force-pushed from 3d9953a to 66d592a
I've checked over this and had a play with it locally both in k3d/local-stack and docker compose. I wasn't able to detect anything wrong. The changes seem to be pretty straightforward.
I do have a question around the choice to not use redis for the sparse queries, given past experiences with keycloak performance issues. Maybe it will be fine now with sparse groups though. I guess we can find out in test after merging.
The only issue I found was during local testing in docker compose: the API hangs waiting for Keycloak here. Restarting the API container locally lets it succeed, but by then the seeding process has already failed and needs to be re-run.
It's fine in Kubernetes because the liveness probe eventually fails on the API pod and causes it to restart, which has the same effect as restarting the container locally.
I tried adding a janky timeout like this, which gives up on that first attempt after a few seconds; that's enough to ensure the next or subsequent attempts succeed. This works, but there is probably a better way to do it.
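A rough sketch of that kind of give-up-and-retry wrapper, assuming a Node-style readiness probe; the function names and timings are illustrative, not the actual API code:

```typescript
// Race a probe against a timer so a hung attempt is abandoned instead of
// blocking forever, then retry until Keycloak responds.
const withTimeout = <T>(promise: Promise<T>, ms: number): Promise<T> =>
  Promise.race([
    promise,
    new Promise<T>((_, reject) =>
      setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms),
    ),
  ]);

// checkKeycloakReady is a stand-in for whatever probe the API performs while
// waiting for Keycloak during startup/seeding.
const waitForKeycloakWithTimeout = async (
  checkKeycloakReady: () => Promise<void>,
) => {
  for (;;) {
    try {
      // Abandon a hung first attempt after a few seconds so the next
      // attempt gets a chance to succeed.
      await withTimeout(checkKeycloakReady(), 5000);
      return;
    } catch (err) {
      console.log(`Keycloak not ready yet: ${err}`);
      await new Promise(resolve => setTimeout(resolve, 1000));
    }
  }
};
```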
This is a weird one. I traced the error down to a …
force-pushed from 66d592a to d7a076f
I'll have another look real quick. But I don't understand why it fails with the IP on the first attempt, yet succeeds after restarting the pod. To me, a hang like this with no timeout seems like a bad thing to have. The janky timeout wrapper I tried didn't require any manual intervention or changes to the docker-compose file. So I'm keen to understand the fetch crash a bit more, and why the crash isn't captured by the try in that waitforkeycloak code.
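Not the actual waitforkeycloak code, but for context on how a crash can slip past a try block: in Node, a promise rejection only lands in a catch if the promise is awaited inside that try; otherwise it surfaces as an unhandled rejection and can take the process down. A purely illustrative sketch:

```typescript
// Purely illustrative: how the same failing fetch behaves around try/catch.
const probe = () => fetch('http://keycloak:8080/auth'); // may reject, e.g. ECONNREFUSED

const handled = async () => {
  try {
    await probe(); // awaited inside the try, so a network failure lands in catch
  } catch (err) {
    console.log('caught and can be retried:', err);
  }
};

const notHandled = () => {
  try {
    // Fired without await: the try block exits before the promise settles, so a
    // later rejection bypasses this catch, becomes an unhandledRejection, and
    // crashes the process on recent Node versions.
    probe().then(res => console.log(res.status));
  } catch (err) {
    console.log('only synchronous throws land here:', err);
  }
};
```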
Yeah, removing … But it would still be worth figuring out whether this is just a weird local edge case or whether it could happen in a running pod. It looks like port 3000 is inaccessible when the condition occurs, so in Kubernetes it will just go into a crash loop. But if it keeps hitting that bug then the API could be completely unusable. Maybe I'm overreacting though 🙂
Good for now, we can evaluate performance issues in the test instance as required.
General Checklist
Database Migrations
We've been on Keycloak 21 for a while; this PR upgrades us to 24. A number of API changes have been made, so there is a refactor to account for those. I was also able to get rid of our keycloak-admin-client fork.
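For reference, a minimal sketch of what the upstream @keycloak/keycloak-admin-client usage looks like, assuming that package is the replacement for the fork; the base URL, realm, and credentials below are placeholders, not Lagoon's actual configuration:

```typescript
// Sketch only: authenticate with the admin-cli client and run a group query.
import KcAdminClient from '@keycloak/keycloak-admin-client';

const kcAdminClient = new KcAdminClient({
  baseUrl: 'http://keycloak:8080/auth', // prefix depends on KC_HTTP_RELATIVE_PATH
  realmName: 'master',
});

const main = async () => {
  await kcAdminClient.auth({
    username: 'admin',
    password: 'admin', // placeholder credentials
    grantType: 'password',
    clientId: 'admin-cli',
  });

  // Switch to the application realm and list a page of matching groups.
  kcAdminClient.setConfig({ realmName: 'lagoon' });
  const groups = await kcAdminClient.groups.find({ search: 'project-', max: 20 });
  console.log(groups.map(g => g.name));
};

main().catch(console.error);
```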
There are no migrations needed other than the normal keycloak upgrade process.