[drbd] 9.2.2-v1.4.1 causes immediate crash/restart on replicated volumes #155

Closed
themicknugget opened this issue May 7, 2023 · 15 comments · Fixed by siderolabs/pkgs#743

@themicknugget

themicknugget commented May 7, 2023

I am testing piraeus-datastore on Talos. It worked wonderfully on v1.3.7, but after upgrading to v1.4.1 (with the drbd extension 9.2.2-v1.4.1), provisioning a PVC with more than one replica (which requires communication between nodes) causes the "primary" node for the provisioned volume to restart without any kernel panic or console message.

I have attached a "talosctl dmesg" log in hopes that it will help troubleshoot.

Thank you!
elite6.log

@themicknugget changed the title from "[drbd] 9.2.2-v1.4.1 causes immediate crash/restart" to "[drbd] 9.2.2-v1.4.1 causes immediate crash/restart on replicated volumes" on May 7, 2023

@japtain-cack

I'm experiencing the same issue. I downgraded my cluster back to 1.3.7 and it seems to be working again.

@frezbo
Member

frezbo commented May 8, 2023

The dmesg logs don't show anything useful; it's probably the kernel module. Version 9.2.3 is available now, and the changelog is worth checking (it seems to contain a lot of fixes). Would someone be able to build it manually and test?

@themicknugget
Author

I attempted to compile an extension with 9.2.3, but I couldn't get it to work because I'm not familiar with the Talos build process. I believe my drbd kernel module was rejected because the key didn't match. If someone could build one, I'd love to test it and report back.

@themicknugget
Author

Not that I expected anything different, since I didn't see a drbd version bump, but I wanted to mention that I'm still experiencing this issue with Talos 1.4.2 (and the drbd 9.2.2-v1.4.2 extension).

@themicknugget
Author

@frezbo could you please outline the process for compiling an extension for DRBD 9.2.3 so that I can test? Thank you!

@frezbo
Member

frezbo commented May 15, 2023

@themicknugget could you try with this installer and extensions?

  • ghcr.io/frezbo/installer:v1.4.0-alpha.4-61-gcc3128d94-dirty
  • ghcr.io/frezbo/drbd:9.2.3-v1.4.0-alpha.4-2-g0855dd7-dirty

@themicknugget
Author

@frezbo Thank you so much for building that for me! I tried it: the installer could be pulled, but I got this error for the extension:

failed to resolve reference "ghcr.io/frezbo/drbd:9.2.3-v1.4.0-alpha.4-2-g0855dd7-dirty": failed to authorize: failed to fetch anonymous token: unexpected status: 401 Unauthorized

It seems it's still marked private?

@frezbo
Member

frezbo commented May 15, 2023

should be fixed now

@themicknugget
Author

Thank you! Unfortunately, in my testing it appears to crash at the same point as 9.2.2. Did you enable additional debugging in the kernel that I can collect for you?

@frezbo
Member

frezbo commented May 15, 2023

> Did you enable additional debugging in the kernel that I can collect for you?

This is just the standard build. I guess it's better to create an issue with drbd, since it seems to have problems with Linux 6.1.

@themicknugget
Author

@frezbo FYI an issue was created and a fixed release of drbd is coming: LINBIT/drbd#57 (comment)

@Jonomir

Jonomir commented Jun 6, 2023

Yesterday DRBD 9.2.4 was released, containing the fix for this issue.

If we could get the extension updated to 9.2.4-v1.4.5, this would be fantastic :)
Thank you!

@frezbo
Member

frezbo commented Jun 6, 2023

> Yesterday DRBD 9.2.4 was released, containing the fix for this issue.
>
> If we could get the extension updated to 9.2.4-v1.4.5, this would be fantastic :) Thank you!

Awesome, will get it updated for the 1.5 release as part of our normal deps update.

frezbo added a commit to frezbo/pkgs that referenced this issue Jun 20, 2023
Bump drbd to a non-broken version.

Fixes: siderolabs/extensions#155

Signed-off-by: Noel Georgi <git@frezbo.dev>
(cherry picked from commit f7cd916)
@frezbo
Member

frezbo commented Jun 21, 2023

The next releases of Talos (both 1.5 and 1.4) should include the fixed drbd version, 9.2.4.

@Jonomir

Jonomir commented Jun 21, 2023

Thanks a lot :)
