Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Base client hidden exceptions processing #688

Open
KonsKo opened this issue Sep 25, 2024 · 2 comments
Open

Base client hidden exceptions processing #688

KonsKo opened this issue Sep 25, 2024 · 2 comments
Assignees
Labels
enhancement New feature or request Infrastructure The framework reworking and extensions

Comments

@KonsKo
Copy link
Contributor

KonsKo commented Sep 25, 2024

Current implementation of the framework.client.Client has a hidden error(perhaps this is architecture feature). But it is not easy to debug with such realization.

Let's look at a situation: we have a test that work with a client, for my case it is ratecheck (t_frang.test_connection_rate_burst.FrangTcpVsBoth.test_burst).

Now, I already knew, that the client did not exist on my machine (was not installed).

But when I run the test with low details first times I received something like:

test_burst (t_frang.test_connection_rate_burst.FrangTcpVsBoth.test_burst) ... ERROR

======================================================================
ERROR: test_burst (t_frang.test_connection_rate_burst.FrangTcpVsBoth.test_burst)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/workspace/Tests_on_bare_metal/tempesta-test/t_frang/test_connection_rate_burst.py", line 262, in test_burst
    tls_reset_conn_n, tls_total_conn_n = parse_out(client)
                                         ^^^^^^^^^^^^^^^^^
  File "/home/jenkins/workspace/Tests_on_bare_metal/tempesta-test/t_frang/test_connection_rate_burst.py", line 26, in parse_out
    reset_conn_n = int(rx.findall(client.stdout.decode())[0])
                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range

----------------------------------------------------------------------
Ran 1 test in 1.239s

FAILED (errors=1) 

The real problem is hidden.

I can set more detailed logs and I may see an error, but the error displayed in the middle of log output and I have to search for it, and it is just a message (it is colored now, but we added logs processing recently).

Real problem here:

def _run_client(client, resq: multiprocessing.Queue):

def _run_client(client, resq: multiprocessing.Queue):
    try:
        res = remote.client.run_cmd(client.cmd, timeout=(client.duration + 5))
    except remote.CmdError as e:
        res = (e.stdout, e.stderr)
        client.returncode = e.returncode
    resq.put(res)
    tf_cfg.dbg(3, "\tClient exit")

run_cmd raises an exception if an error was caught, but here it become hidden.

We need to explicit (re)raise an exception in such cases to provide an user correct information.

I have not updated the code right away, because the client works with multiprocessing and we need to think about the best way how to implement it, otherwise side effects may occur.

@RomanBelozerov RomanBelozerov added enhancement New feature or request Infrastructure The framework reworking and extensions labels Sep 27, 2024
@KonsKo
Copy link
Contributor Author

KonsKo commented Sep 30, 2024

One more similar case:

def force_stop(self):

If an exception is caught, it is being hidden (all information is provided, but the situation is not considered as an error)

Example (there is an exception but test is considered as successfull):

test_load_distribution (t_sched.test_h2_ratio_weight.RatioH2.test_load_distribution) ... 24-09-27 21:15:35.321801 CRITICAL  | Exception in stopping         tf_cfg.py:245
                                  process: ("Cmd                                
                                  '/home/jenkins/workspace/Tests_o              
                                  n_bare_metal/tempesta/scripts/te              
                                  mpesta.sh --stop' got timed out               
                                  (timeout value is 30)",                       
                                  b'Stopping                                    
                                  Tempesta...\nnet.tempesta.state               
                                  = stop\n...unload Tempesta                    
                                  modules\nUn-loading Tempesta                  
                                  kernel modules...\ndone\n',                   
                                  b'Unit irqbalance.service could               
                                  not be found.\n', -9), type:                  
                                  <class                                        
                                  'helpers.remote.CmdError'>,                   
                                  traceback:                                    
                                  Traceback (most recent call                   
                                  last):                                        
                                    File                                        
                                  "/home/jenkins/workspace/Tests_o              
                                  n_bare_metal/tempesta-test/helpe              
                                  rs/remote.py", line 106, in                   
                                  run_cmd                                       
                                      stdout, stderr =                          
                                  p.communicate(timeout=timeout)                
                                                       ^^^^^^^^^^^              
                                  ^^^^^^^^^^^^^^^^^^^                           
                                    File                                        
                                  "/usr/lib/python3.11/subprocess.              
                                  py", line 1209, in communicate                
                                      stdout, stderr =                          
                                  self._communicate(input,                      
                                  endtime, timeout)                             
                                                       ^^^^^^^^^^^              
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^               
                                    File                                        
                                  "/usr/lib/python3.11/subprocess.              
                                  py", line 2116, in _communicate               
                                      self._check_timeout(endtime,              
                                  orig_timeout, stdout, stderr)                 
                                    File                                        
                                  "/usr/lib/python3.11/subprocess.              
                                  py", line 1253, in                            
                                  _check_timeout                                
                                      raise TimeoutExpired(                     
                                  subprocess.TimeoutExpired:                    
                                  Command                                       
                                  '/home/jenkins/workspace/Tests_o              
                                  n_bare_metal/tempesta/scripts/te              
                                  mpesta.sh --stop' timed out                   
                                  after 30 seconds                              
                                                                                
                                  During handling of the above                  
                                  exception, another exception                  
                                  occurred:                                     
                                                                                
                                  Traceback (most recent call                   
                                  last):                                        
                                    File                                        
                                  "/home/jenkins/workspace/Tests_o              
                                  n_bare_metal/tempesta-test/helpe              
                                  rs/stateful.py", line 68, in                  
                                  force_stop                                    
                                      stop_proc()                               
                                    File                                        
                                  "/home/jenkins/workspace/Tests_o              
                                  n_bare_metal/tempesta-test/helpe              
                                  rs/control.py", line 355, in                  
                                  stop_tempesta                                 
                                      self.node.run_cmd(cmd,                    
                                  timeout=30,                                   
                                  err_msg=(self.err_msg % "stop"))              
                                    File                                        
                                  "/home/jenkins/workspace/Tests_o              
                                  n_bare_metal/tempesta-test/helpe              
                                  rs/remote.py", line 114, in                   
                                  run_cmd                                       
                                      raise CmdError(                           
                                  helpers.remote.CmdError: ("Cmd                
                                  '/home/jenkins/workspace/Tests_o              
                                  n_bare_metal/tempesta/scripts/te              
                                  mpesta.sh --stop' got timed out               
                                  (timeout value is 30)",                       
                                  b'Stopping                                    
                                  Tempesta...\nnet.tempesta.state               
                                  = stop\n...unload Tempesta                    
                                  modules\nUn-loading Tempesta                  
                                  kernel modules...\ndone\n',                   
                                  b'Unit irqbalance.service could               
                                  not be found.\n', -9)                         
                                                                                
ok.         **<------ Success**

@KonsKo
Copy link
Contributor Author

KonsKo commented Sep 30, 2024

Fails on a teardown or a cleanup should be considered as fail.

One more example:

test_wait_for_connections_negative (selftests.test_lxc_server.TestLxcServer.test_wait_for_connections_negative) ... 24-09-30 23:20:32.577133 CRITICAL  | Exception in stopping         tf_cfg.py:245
                                  process: ("Cmd 'lxc stop                      
                                  tempesta-site-stage' got timed                
                                  out (timeout value is 10)", b'',              
                                  b'', -9), type: <class                        
                                  'helpers.remote.CmdError'>,                   
                                  traceback:                                    
                                  Traceback (most recent call                   
                                  last):                                        
                                    File                                        
                                  "/home/jenkins/workspace/Tests_o              
                                  n_bare_metal/tempesta-test/helpe              
                                  rs/remote.py", line 106, in                   
                                  run_cmd                                       
                                      stdout, stderr =                          
                                  p.communicate(timeout=timeout)                
                                                       ^^^^^^^^^^^              
                                  ^^^^^^^^^^^^^^^^^^^                           
                                    File                                        
                                  "/usr/lib/python3.11/subprocess.              
                                  py", line 1209, in communicate                
                                      stdout, stderr =                          
                                  self._communicate(input,                      
                                  endtime, timeout)                             
                                                       ^^^^^^^^^^^              
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^               
                                    File                                        
                                  "/usr/lib/python3.11/subprocess.              
                                  py", line 2116, in _communicate               
                                      self._check_timeout(endtime,              
                                  orig_timeout, stdout, stderr)                 
                                    File                                        
                                  "/usr/lib/python3.11/subprocess.              
                                  py", line 1253, in                            
                                  _check_timeout                                
                                      raise TimeoutExpired(                     
                                  subprocess.TimeoutExpired:                    
                                  Command 'lxc stop                             
                                  tempesta-site-stage' timed out                
                                  after 10 seconds                              
                                                                                
                                  During handling of the above                  
                                  exception, another exception                  
                                  occurred:                                     
                                                                                
                                  Traceback (most recent call                   
                                  last):                                        
                                    File                                        
                                  "/home/jenkins/workspace/Tests_o              
                                  n_bare_metal/tempesta-test/helpe              
                                  rs/stateful.py", line 68, in                  
                                  force_stop                                    
                                      stop_proc()                               
                                    File                                        
                                  "/home/jenkins/workspace/Tests_o              
                                  n_bare_metal/tempesta-test/frame              
                                  work/lxc_server.py", line 153,                
                                  in stop_server                                
                                      self.node.run_cmd(self._cons              
                                  truct_cmd(["stop",                            
                                  self.container_name]))                        
                                    File                                        
                                  "/home/jenkins/workspace/Tests_o              
                                  n_bare_metal/tempesta-test/helpe              
                                  rs/remote.py", line 114, in                   
                                  run_cmd                                       
                                      raise CmdError(                           
                                  helpers.remote.CmdError: ("Cmd                
                                  'lxc stop tempesta-site-stage'                
                                  got timed out (timeout value is               
                                  10)", b'', b'', -9)                           
                                                                                
ok         **<------ Success**

@RomanBelozerov RomanBelozerov added this to the 0.9 (LA) for tests milestone Oct 22, 2024
@RomanBelozerov RomanBelozerov self-assigned this Oct 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Infrastructure The framework reworking and extensions
Projects
None yet
Development

No branches or pull requests

2 participants