-
Notifications
You must be signed in to change notification settings - Fork 555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JunOS failed commits causes lingering config sessions that are not cleared #1445
Comments
Hey @indy-independence - are you using any of the How are you using NAPALM? I'm asking as I'm not able to reproduce: if there's a syntax error, |
Hey, no I'm not using any optional arguments currently but I did try using config_private to test out the workaround thing I'm using NAPALM via nornir, so this should be the code calling load_replace_candidate etc I think: https://github.com/nornir-automation/nornir_napalm/blob/master/nornir_napalm/plugins/tasks/napalm_configure.py |
Yes, this should be rather fixed in there (i.e., if loading config fails, discard the config instead of trying to commit). |
@mirceaulinic No I think there is a misunderstanding, it does not try to commit since an exception is raised. |
@mirceaulinic I will look into this. @indy-independence and I have been messaging in Slack so I want to look into what is going on a bit more. |
I think the general good approach is to always discard on load failure, regardless on the driver. To me, the implementation you've linked is far from sufficient / correct (regardless on what driver we're referring to). Note, EOS also throws an error when loading an incorrect change:
So the right approach is:
And that should stay true for any NAPALM driver. In code, your implementation should catch the load_replace / load_merge error and call |
Yeah, it looks like EOS does a discard_config() whereas Junos doesn't (after a failed commit_config). @indy-independence pointed this out to me so that is what I wanted to look into. I only looked into it very quickly so I might be wrong on the above, but that is what I wanted to dig into more. |
I tried applying this patch to my napalm code and this seems to result in the same behavior as the napalm EOS driver:
I no longer have those lingering sessions, and I don't have to do any junos driver-specific stuff in the nornir code. For me it makes more sense to change the junos driver here rather than changing the eos driver and the nornir code since I don't see why you would want to keep those broken syntax error config candidates lingering around blocking other changes. |
Yes, I see where you're coming from now... I just wasn't aware that the EOS driver does that. And by the looks of it, it might be the only driver if I'm not mistaken. I've just always assumed that the discard is always done at the framework level. For example, in Salt - as that's what I'm most familiar with - this and many other corner cases like this are being handled gracefully (if curious, see code here). So I'm a bit surprised this is being signaled only now in Nornir (I've somewhat assumed Nornir does all the checks and graceful discard, commit etc.). Seems like I've lived on false assumptions. 😄 Anyway, that said, I'm not at all opposed to doing what you're suggesting, but then let's do it for all the drivers consistently. |
Yes that makes sense, I did a PR with the patch for the junos driver #1448 |
I don't really have strong opinions on whether the discard_config() should be in the Nornir-napalm plugin or in NAPALM itself (as both of you mentioned earlier we should just make the behavior consistent). @mirceaulinic Do you have preferences on where this discard_config() should be? Is there ever a case on a failed commit_config() where you would still want the bad candidate config to be staged? The only real scenario I can see would be for debugging what went wrong with the candidate config. |
Hi We (SURF NL) are still having similar issues on CNaaS-NMS version: 1.4.0b3. But only on Virtual Chassis'. So maybe we could re-open this issue? |
@makzdot You probably should just open a new issue and potentially reference this issue. Note, there was a pull-request that fixed this issue for the Junos driver almost two years ago. So the issue you are experiencing might not be the same as this. |
JunOS failed commits causes lingering config sessions that are not cleared
When doing load_replace_candidate and the candidate configuration has a syntax error, napalm will raise a ReplaceConfigException as expected. However, the failed candidate configuration is not cleared from the router and instead lingers causing future commits to fail. The next attempt at a commit will result in napalm.base.exceptions.LockError and you have to manually log in to the network device and run "request system logout pid ". This behavior differs from the EOS driver in NAPALM for example, which seems to run "discard_config()" after a failed configuration. I think the same should be done in the junos driver, doing a rollback 0 should accomplish the same thing as I understand it. I don't see any reason to lock any future attempts to commit configuration because of a past syntax error. There is a sorf-of workaround possible by changing the configuration to private instead of exclusive, which will allow future commits but still not clear the failed sessions so you will probably run in to an issue with too many sessions after a while.
Did you follow the steps from https://github.com/napalm-automation/napalm#faq
(Place an
x
between the square brackets where applicable)Setup
napalm version
(Paste verbatim output from
pip freeze | grep napalm
between quotes below)Network operating system version
(Paste verbatim output from
show version
- or equivalent - between quotes below)Steps to Reproduce the Issue
Error Traceback
Commit with syntax error, expected output:
Following commit, fails for strange reasons:
The text was updated successfully, but these errors were encountered: