JunOS failed commits causes lingering config sessions that are not cleared #1445

Closed
1 task done
indy-independence opened this issue May 12, 2021 · 13 comments

Comments

@indy-independence
Contributor

JunOS failed commits causes lingering config sessions that are not cleared

When calling load_replace_candidate with a candidate configuration that contains a syntax error, napalm raises a ReplaceConfigException as expected. However, the failed candidate configuration is not cleared from the router; it lingers and causes future commits to fail. The next commit attempt results in napalm.base.exceptions.LockError, and you have to manually log in to the network device and run "request system logout pid ".

This behavior differs from, for example, the EOS driver in NAPALM, which seems to run "discard_config()" after a failed configuration. I think the junos driver should do the same; as I understand it, doing a rollback 0 should accomplish the same thing. I don't see any reason to block future attempts to commit configuration because of a past syntax error.

There is a sort-of workaround: changing the configuration mode to private instead of exclusive allows future commits, but it still does not clear the failed sessions, so you will probably run into an issue with too many sessions after a while.

Did you follow the steps from https://github.com/napalm-automation/napalm#faq

(Place an x between the square brackets where applicable)

  • [x] Yes
  • [ ] No

Setup

napalm version

(Paste verbatim output from pip freeze | grep napalm between quotes below)

napalm==3.2.0
nornir-napalm==0.1.1
junos-eznc==2.5.4

Network operating system version

(Paste verbatim output from show version - or equivalent - between quotes below)

Junos: 20.4R1-S1.2

Steps to Reproduce the Issue

Error Traceback

Commit with syntax error, expected output:

Traceback (most recent call last):
  File "/opt/cnaas/venv/lib/python3.7/site-packages/jnpr/junos/device.py", line 837, in execute
    filter_xml=kvargs.get("filter_xml"),
  File "/opt/cnaas/venv/lib/python3.7/site-packages/jnpr/junos/decorators.py", line 165, in wrapper
    raise ex
  File "/opt/cnaas/venv/lib/python3.7/site-packages/jnpr/junos/decorators.py", line 117, in wrapper
    rsp = function(self, *args, **kwargs)
  File "/opt/cnaas/venv/lib/python3.7/site-packages/jnpr/junos/device.py", line 1440, in _rpc_reply
    return self._conn.rpc(rpc_cmd_e, filter_xml)._NCElement__doc
  File "/opt/cnaas/venv/lib/python3.7/site-packages/ncclient/manager.py", line 292, in rpc
    huge_tree=self._huge_tree).request(*args, **kwds)
  File "/opt/cnaas/venv/lib/python3.7/site-packages/ncclient/operations/rpc.py", line 506, in request
    return self._request(node)
  File "/opt/cnaas/venv/lib/python3.7/site-packages/ncclient/operations/rpc.py", line 365, in _request
    raise RPCError(to_ele(self._reply._raw), errs=errors)
ncclient.operations.rpc.RPCError: error: syntax error
error: error recovery ignores input until this point

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/cnaas/venv/lib/python3.7/site-packages/jnpr/junos/utils/config.py", line 491, in try_load
    rpc_contents, ignore_warning=ignore_warning, **rpc_xattrs
  File "/opt/cnaas/venv/lib/python3.7/site-packages/jnpr/junos/rpcmeta.py", line 288, in load_config
    return self._junos.execute(rpc, ignore_warning=ignore_warning)
  File "/opt/cnaas/venv/lib/python3.7/site-packages/jnpr/junos/decorators.py", line 76, in wrapper
    return function(*args, **kwargs)
  File "/opt/cnaas/venv/lib/python3.7/site-packages/jnpr/junos/decorators.py", line 31, in wrapper
    return function(*args, **kwargs)
  File "/opt/cnaas/venv/lib/python3.7/site-packages/jnpr/junos/device.py", line 854, in execute
    raise EzErrors.RpcError(cmd=rpc_cmd_e, rsp=rsp, errs=ex)
jnpr.junos.exception.RpcError: RpcError(severity: error, bad_element: syntaxerror, message: error: syntax error
error: error recovery ignores input until this point)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/cnaas/venv/lib/python3.7/site-packages/napalm/junos/junos.py", line 253, in _load_candidate
    ignore_warning=self.ignore_warning,
  File "/opt/cnaas/venv/lib/python3.7/site-packages/jnpr/junos/utils/config.py", line 579, in load
    return try_load(rpc_contents, rpc_xattrs, ignore_warning=ignore_warning)
  File "/opt/cnaas/venv/lib/python3.7/site-packages/jnpr/junos/utils/config.py", line 496, in try_load
    raise ConfigLoadError(cmd=err.cmd, rsp=err.rsp, errs=err.errs)
jnpr.junos.exception.ConfigLoadError: ConfigLoadError(severity: error, bad_element: syntaxerror, message: error: syntax error
error: error recovery ignores input until this point)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/cnaas/venv/lib/python3.7/site-packages/nornir/core/task.py", line 98, in start
    r = self.task(self, **self.params)
  File "/opt/cnaas/venv/lib/python3.7/site-packages/nornir_napalm/plugins/tasks/napalm_configure.py", line 32, in napalm_configure
    device.load_replace_candidate(filename=filename, config=configuration)
  File "/opt/cnaas/venv/lib/python3.7/site-packages/napalm/junos/junos.py", line 264, in load_replace_candidate
    self._load_candidate(filename, config, True)
  File "/opt/cnaas/venv/lib/python3.7/site-packages/napalm/junos/junos.py", line 257, in _load_candidate
    raise ReplaceConfigException(e.errs)
napalm.base.exceptions.ReplaceConfigException: [{'severity': 'error', 'source': None, 'edit_path': None, 'bad_element': 'syntaxerror', 'message': 'syntax error'}, {'severity': 'error', 'source': None, 'edit_path': None, 'bad_element': '}', 'message': 'error recovery ignores input until this point'}]

The following commit attempt then fails for strange reasons:

Traceback (most recent call last):
  File "/opt/cnaas/venv/lib/python3.7/site-packages/jnpr/junos/device.py", line 837, in execute
    filter_xml=kvargs.get("filter_xml"),
  File "/opt/cnaas/venv/lib/python3.7/site-packages/jnpr/junos/decorators.py", line 165, in wrapper
    raise ex
  File "/opt/cnaas/venv/lib/python3.7/site-packages/jnpr/junos/decorators.py", line 117, in wrapper
    rsp = function(self, *args, **kwargs)
  File "/opt/cnaas/venv/lib/python3.7/site-packages/jnpr/junos/device.py", line 1440, in _rpc_reply
    return self._conn.rpc(rpc_cmd_e, filter_xml)._NCElement__doc
  File "/opt/cnaas/venv/lib/python3.7/site-packages/ncclient/manager.py", line 292, in rpc
    huge_tree=self._huge_tree).request(*args, **kwds)
  File "/opt/cnaas/venv/lib/python3.7/site-packages/ncclient/operations/rpc.py", line 506, in request
    return self._request(node)
  File "/opt/cnaas/venv/lib/python3.7/site-packages/ncclient/operations/rpc.py", line 367, in _request
    raise self._reply.error
ncclient.operations.rpc.RPCError: 
configuration database locked by:
  admin terminal  (pid 32260) on since 2021-05-06 11:04:06 UTC, idle 00:04:48
      exclusive 


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/cnaas/venv/lib/python3.7/site-packages/jnpr/junos/utils/config.py", line 603, in lock
    self.rpc.lock_configuration()
  File "/opt/cnaas/venv/lib/python3.7/site-packages/jnpr/junos/rpcmeta.py", line 364, in _exec_rpc
    return self._junos.execute(rpc, **dec_args)
  File "/opt/cnaas/venv/lib/python3.7/site-packages/jnpr/junos/decorators.py", line 76, in wrapper
    return function(*args, **kwargs)
  File "/opt/cnaas/venv/lib/python3.7/site-packages/jnpr/junos/decorators.py", line 31, in wrapper
    return function(*args, **kwargs)
  File "/opt/cnaas/venv/lib/python3.7/site-packages/jnpr/junos/device.py", line 854, in execute
    raise EzErrors.RpcError(cmd=rpc_cmd_e, rsp=rsp, errs=ex)
jnpr.junos.exception.RpcError: RpcError(severity: error, bad_element: None, message: configuration database locked by:
  admin terminal  (pid 32260) on since 2021-05-06 11:04:06 UTC, idle 00:04:48
      exclusive)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/cnaas/venv/lib/python3.7/site-packages/napalm/junos/junos.py", line 150, in _lock
    self.device.cu.lock()
  File "/opt/cnaas/venv/lib/python3.7/site-packages/jnpr/junos/utils/config.py", line 606, in lock
    raise LockError(rsp=err.rsp)
jnpr.junos.exception.LockError: LockError(severity: error, bad_element: None, message: configuration database locked by:
  admin terminal  (pid 32260) on since 2021-05-06 11:04:06 UTC, idle 00:04:48
      exclusive)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/cnaas/venv/lib/python3.7/site-packages/nornir/core/task.py", line 98, in start
    r = self.task(self, **self.params)
  File "/opt/cnaas/venv/lib/python3.7/site-packages/nornir_napalm/plugins/tasks/napalm_configure.py", line 32, in napalm_configure
    device.load_replace_candidate(filename=filename, config=configuration)
  File "/opt/cnaas/venv/lib/python3.7/site-packages/napalm/junos/junos.py", line 264, in load_replace_candidate
    self._load_candidate(filename, config, True)
  File "/opt/cnaas/venv/lib/python3.7/site-packages/napalm/junos/junos.py", line 234, in _load_candidate
    self._lock()
  File "/opt/cnaas/venv/lib/python3.7/site-packages/napalm/junos/junos.py", line 153, in _lock
    raise LockError(str(jle))
napalm.base.exceptions.LockError: LockError(severity: error, bad_element: None, message: configuration database locked by:
  admin terminal  (pid 32260) on since 2021-05-06 11:04:06 UTC, idle 00:04:48
      exclusive)
@mirceaulinic
Member

Hey @indy-independence - are you using any of the lock_disable, config_lock or config_private optional arguments by chance?

How are you using NAPALM? I'm asking as I'm not able to reproduce: if there's a syntax error, load_replace_candidate / load_merge_candidate will error and will tell you about it, e.g., [{'severity': 'error', 'source': None, 'edit_path': None, 'bad_element': 'fake', 'message': 'syntax error'}] so you don't even get to the commit step normally. Could you clarify?
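For readers unfamiliar with the three options named above, here is a minimal sketch of how they would be handed to the Junos driver through optional_args. The helper name junos_connection_kwargs, the hostname and the credentials are hypothetical; the defaults of False reflect my understanding of the driver, not something stated in this thread.

```python
def junos_connection_kwargs(hostname, username, password, *,
                            config_private=False,
                            config_lock=False,
                            lock_disable=False):
    """Build keyword arguments for a NAPALM Junos driver instance.

    Hypothetical helper for illustration only. The three flags are the
    optional arguments named in this thread; the False defaults are an
    assumption about the driver's behavior.
    """
    return {
        "hostname": hostname,
        "username": username,
        "password": password,
        "optional_args": {
            "config_private": config_private,  # use "configure private"
            "config_lock": config_lock,        # take the lock when opening
            "lock_disable": lock_disable,      # never lock from NAPALM
        },
    }


# Usage (needs napalm and a reachable device, so it is left commented out):
# from napalm import get_network_driver
# driver = get_network_driver("junos")
# device = driver(**junos_connection_kwargs(
#     "r1.example.net", "admin", "secret", config_private=True))
# device.open()
```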

@indy-independence
Contributor Author

Hey, no I'm not using any optional arguments currently but I did try using config_private to test out the workaround thing

I'm using NAPALM via nornir, so this should be the code calling load_replace_candidate etc I think: https://github.com/nornir-automation/nornir_napalm/blob/master/nornir_napalm/plugins/tasks/napalm_configure.py
Indeed looking at the stack trace it seems like it never reaches the commit_config() call

@mirceaulinic
Member

Yes, this should rather be fixed in there (i.e., if loading the config fails, discard the config instead of trying to commit).

@indy-independence
Contributor Author

indy-independence commented May 18, 2021

@mirceaulinic No, I think there is a misunderstanding: it does not try to commit, since an exception is raised.
Also the main point here is that I think napalm should handle all drivers in the same way, it seems wrong that when using napalm with junos driver you have to manually run discard config somewhere but for napalm with eos driver you don't need to have that code. I want napalm to be the abstraction layer so that I don't have to handle device specific stuff in my own code

@ktbyers ktbyers reopened this May 18, 2021
@ktbyers
Contributor

ktbyers commented May 18, 2021

@mirceaulinic I will look into this. @indy-independence and I have been messaging in Slack so I want to look into what is going on a bit more.

@mirceaulinic
Member

mirceaulinic commented May 18, 2021

Also the main point here is that I think napalm should handle all drivers in the same way, it seems wrong that when using napalm with junos driver you have to manually run discard config somewhere but for napalm with eos driver you don't need to have that code.

I think the generally good approach is to always discard on load failure, regardless of the driver. To me, the implementation you've linked is far from sufficient / correct (regardless of which driver we're referring to).

Note, EOS also throws an error when loading an incorrect change:

Error [1002]: CLI command 3 of 3 'fake' failed: invalid command [Invalid input (at token 0: 'fake')]

So the right approach is:

  1. Try to load the config.
  2. a. If it fails, discard the config (and unlock).
     b. If it doesn't fail, proceed with commit.

And that should stay true for any NAPALM driver.

In code, your implementation should catch the load_replace / load_merge error and call discard_config when it bombs out. That's it.
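
A minimal sketch of that caller-side pattern, assuming a device object that exposes the standard NAPALM methods load_replace_candidate, load_merge_candidate, compare_config, commit_config and discard_config. The function name load_and_commit is hypothetical, and the broad except is a simplification: NAPALM raises ReplaceConfigException or MergeConfigException on a load failure, which could be caught specifically instead.

```python
def load_and_commit(device, config, replace=True):
    """Load a candidate config, commit it on success, discard it on failure.

    Hypothetical helper; `device` is assumed to be an opened NAPALM driver.
    Returns the diff reported by compare_config().
    """
    load = device.load_replace_candidate if replace else device.load_merge_candidate
    try:
        load(config=config)
    except Exception:
        # Loading failed (e.g. a syntax error): drop the broken candidate
        # so it does not linger on the device, then re-raise for the caller.
        device.discard_config()
        raise

    diff = device.compare_config()
    if diff:
        device.commit_config()
    else:
        # Nothing to change; still clean up the candidate.
        device.discard_config()
    return diff
```

With this wrapper the caller still sees the original exception, but the candidate is discarded before the exception propagates.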

@ktbyers
Contributor

ktbyers commented May 18, 2021

Yeah, it looks like EOS does a discard_config() whereas Junos doesn't (after a failed commit_config). @indy-independence pointed this out to me so that is what I wanted to look into.

I only looked into it very quickly so I might be wrong on the above, but that is what I wanted to dig into more.

@indy-independence
Contributor Author

I tried applying this patch to my napalm code and this seems to result in the same behavior as the napalm EOS driver:

diff --git a/napalm/junos/junos.py b/napalm/junos/junos.py
index 5867fd90..128749b0 100644
--- a/napalm/junos/junos.py
+++ b/napalm/junos/junos.py
@@ -267,6 +267,7 @@ class JunOSDriver(NetworkDriver):
                 ignore_warning=self.ignore_warning,
             )
         except ConfigLoadError as e:
+            self.discard_config()
             if self.config_replace:
                 raise ReplaceConfigException(e.errs)
             else:

I no longer have those lingering sessions, and I don't have to do any junos driver-specific stuff in the nornir code. For me it makes more sense to change the junos driver here rather than changing the eos driver and the nornir code since I don't see why you would want to keep those broken syntax error config candidates lingering around blocking other changes.

@mirceaulinic
Member

mirceaulinic commented May 19, 2021

Yes, I see where you're coming from now... I just wasn't aware that the EOS driver does that. And by the looks of it, it might be the only driver if I'm not mistaken.

I've just always assumed that the discard is always done at the framework level. For example, in Salt - as that's what I'm most familiar with - this and many other corner cases like this are being handled gracefully (if curious, see code here). So I'm a bit surprised this is being signaled only now in Nornir (I've somewhat assumed Nornir does all the checks and graceful discard, commit etc.). Seems like I've lived on false assumptions. 😄

Anyway, that said, I'm not at all opposed to doing what you're suggesting, but then let's do it for all the drivers consistently.

@indy-independence
Contributor Author

Yes that makes sense, I did a PR with the patch for the junos driver #1448

@ktbyers
Contributor

ktbyers commented May 19, 2021

I don't really have strong opinions on whether the discard_config() should be in the Nornir-napalm plugin or in NAPALM itself (as both of you mentioned earlier we should just make the behavior consistent).

@mirceaulinic Do you have preferences on where this discard_config() should be? Is there ever a case on a failed commit_config() where you would still want the bad candidate config to be staged? The only real scenario I can see would be for debugging what went wrong with the candidate config.

@makzdot

makzdot commented Feb 15, 2023

Hi, we (SURF NL) are still having similar issues on CNaaS-NMS version 1.4.0b3, but only on Virtual Chassis. So maybe we could re-open this issue?

@ktbyers
Contributor

ktbyers commented Feb 15, 2023

@makzdot You probably should just open a new issue and potentially reference this issue. Note, there was a pull-request that fixed this issue for the Junos driver almost two years ago.

So the issue you are experiencing might not be the same as this.
