-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix NudmSDM url after udm crash #155
Fix NudmSDM url after udm crash #155
Conversation
Can one of the admins verify this patch? |
1 similar comment
Can one of the admins verify this patch? |
test this please |
Hi @ghislainbourgeois, I am not able to reproduce the issue you are indicating here. I am deploying the sd-core using AiaB. After deleting the UDM pod, the new test/simulation successfully completes. Any suggestion on how to reproduce the issue? or am I missing something? |
9f0c5bc
to
95f22e6
Compare
Hi @gab-arrobo, I tried deploying AiaB to test directly with it, but the UPF does not deploy properly, with an error on the
The first try is taking a long time, because the AMF tries to contact the old UDM IP, and needs to timeout. At that point the cache on the AMF will be cleared and the next attempts work fine. This PR makes it so that the timeout will not need to happen. I also just rebased this PR to be up-to-date with the |
@ghislainbourgeois, FYI, a temporary solution to deploy AiaB is by making the following changes in the sd-core-5g-values.yaml file: diff --git a/sd-core-5g-values.yaml b/sd-core-5g-values.yaml
index d4e145c..50e1312 100644
--- a/sd-core-5g-values.yaml
+++ b/sd-core-5g-values.yaml
@@ -234,9 +234,10 @@ omec-user-plane:
images:
repository: "registry.opennetworking.org/docker.io/"
# uncomment below section to add update bess image tag
- #tags:
+ tags:
# bess: <bess image tag>
# pfcpiface: <pfcp image tag>
+ tools: busybox:stable
config:
upf:
name: "oaisim" |
ok to test |
I am going to give it a try using OnRamp |
Thanks, I tested with AiaB, and I cannot reproduce this issue with it. I have not figured out what is different in that deployment however. |
I think the issue might be related to the image tag used for the AMF (the Helm Charts used by AiaB use this image: |
It is using the same tag: |
How should we proceed to reproduce the issue? It would be good to understand why the issue shows up in OnRamp but not in AiaB |
I am working on this today, trying to understand the difference there. |
I cannot say for sure, but it looks like the biggest difference is that the onramp quick start guide does not deploy Aether, but only SD-Core. In our distribution, we also do not currently deploy Aether. I am not sure why Aether would prevent this bug from happening, but I would argue that SD-Core should not depend on additional software to do the right thing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
@@ -1173,18 +1173,16 @@ func communicateWithUDM(ue *context.AmfUe, accessType models.AccessType) error { | |||
|
|||
func getSubscribedNssai(ue *context.AmfUe) { | |||
amfSelf := context.AMF_Self() | |||
if ue.NudmSDMUri == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that you are removing this variable from here, would it be possible that you also remove from the other places? because I think it is actually used anywhere in the code (besides some "self-checks" as shown below). It can be done as part of this PR or in another PR.
ue.NudmSDMUri = sdmUri
if ue.NudmUECMUri == "" || ue.NudmSDMUri == "" {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ghislainbourgeois, please let me know what you think about my previous comment. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what the impact would be in removing it from the other places. I assume the URI of the UDM is saved in the UE context as a form of cache to make things faster, but I definitely do not have the whole context here.
I think I could propose a separate PR with this change, and it would make it safer to test.
If you have any input on running multiple UDMs, that would also be interesting, as I think this cache per UE would only really be useful in the case of multiple UDM instances.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, opening another PR for the removal of NudmSDMUri
would be fine.
For the deployment of multiple UDMs, I think @thakurajayL would be the best person to with it.
Because the AMF is keeping a cache of all the UEs it sees and saves the NFs URI in it, when the UDM is restarted with a new IP for any reason, the first connection by a previously seen user would get rejected.
It is easy to test the previous error by deploying the core and gnbsim, configuring a subscriber and executing a succesful simulation. After that, deleting the UDM pod in Kubernetes and wait for the pod to be restarted properly. Run a new simulation, and it will fail. Running the simulation again after a failure will work, because it will have cleaned the cache.
This change makes it so that we always ask the NRF for the URI to use.