Sometimes results of some tests disappear if an error occurred #115

ImeevMA · 2018-08-17T08:23:44Z

Error:

[001] box/errinj.test.lua                                             
[001] Worker "001_box" received the following error; stopping...
[001] Traceback (most recent call last):
[001]   File "/home/mimeev/Work/radiant/test-run/lib/worker.py", line 266, in run_task
[001]     task, self.server, self.inspector)
[001]   File "/home/mimeev/Work/radiant/test-run/lib/test_suite.py", line 193, in run_test
[001]     short_status = test.run(server)
[001]   File "/home/mimeev/Work/radiant/test-run/lib/test.py", line 166, in run
[001]     self.execute(server)
[001]   File "/home/mimeev/Work/radiant/test-run/lib/tarantool_server.py", line 147, in execute
[001]     crash_occured = save_join(lua, timeout=self.TIMEOUT)
[001]   File "/home/mimeev/Work/radiant/test-run/lib/tarantool_server.py", line 50, in save_join
[001]     green_obj.join(timeout=timeout)
[001]   File "/home/mimeev/.local/lib/python2.7/site-packages/gevent/greenlet.py", line 483, in join
[001]     result = self.parent.switch()
[001]   File "/home/mimeev/.local/lib/python2.7/site-packages/gevent/hub.py", line 538, in switch
[001]     return greenlet.switch(self)
[001] TarantoolStartError
[001] 
[001] 
[001] 
[001] [Instance "proxy" returns with non-zero exit code: 1]
[001] 
[001] Last 15 lines of Tarantool Log file [Instance "proxy"][/home/mimeev/Work/radiant/test/var/001_box/proxy.log]:
[001] 2018-08-17 11:15:09.911 [27927] main/106/console/127.0.0.1:38426 I> started
[001] 2018-08-17 11:15:09.911 [27927] main C> entering the event loop
[001] 2018-08-17 11:15:10.252 [27927] main C> got signal 15 - Terminated
[001] Starting instance proxy...
[001] Run console at unix/:/home/mimeev/Work/radiant/test/var/001_box/proxy.control
[001] tcp_server: remove dead UNIX socket: /home/mimeev/Work/radiant/test/var/001_box/proxy.control
[001] started
[001] 2018-08-17 11:15:11.619 [28413] main/101/proxy C> Tarantool 1.9.1-70-gc1a16b2
[001] 2018-08-17 11:15:11.619 [28413] main/101/proxy C> log level 5
[001] 2018-08-17 11:15:11.619 [28413] main/101/proxy I> mapping 117440512 bytes for memtx tuple arena...
[001] 2018-08-17 11:15:11.619 [28413] main/101/proxy I> mapping 134217728 bytes for vinyl tuple arena...
[001] 2018-08-17 11:15:11.625 [28413] iproto/101/main evio.cc:351 E> binary: failed to bind on 127.0.0.1:38194: bind, called on fd 29, aka 0.0.0.0:0
[001] 2018-08-17 11:15:11.625 [28413] main/101/proxy evio.cc:356 !> SystemError binary: failed to bind, called on fd -1: Address already in use
[001] 2018-08-17 11:15:11.625 [28413] main/101/proxy F> can't initialize storage: binary: failed to bind, called on fd -1
[001] 2018-08-17 11:15:11.625 [28413] main/101/proxy F> can't initialize storage: binary: failed to bind, called on fd -1

How to reproduce:

TEST_RUN_TESTS=$(for i in $(seq 1 100); do echo -n 'box/errinj. '; done) make test

Possible result (should pass 100):

---------------------------------------------------------------------------------
Statistics:
* pass: 96
Built target test

OS:

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:  Ubuntu 16.04.5 LTS
Release:  16.04
Codename: xenial

Totktonada · 2018-08-17T12:31:32Z

The 'Address already in use' error is likely due to the way test-run assigns free ports to tarantool servers. It checks for a free port, then passes it to tarantool via an environment variable, so there is a lag between the check and the actual bind. The error occurs rarely, and I know of no way to mitigate it, because we have to save the port in the tarantool_server class in order to connect to the server and pass it the lines of a test. Maybe we can read the port that tarantool binds from the logs and set the Python variable then.
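The lag between the check and the bind can be illustrated with plain sockets. This is a minimal sketch, not test-run's actual code: the function names `find_free_port_racy` and `bind_and_report` are hypothetical, and the "competitor" socket only simulates another process taking the port inside the lag window.

```python
import socket


def find_free_port_racy(host="127.0.0.1"):
    """Check-then-use: ask the OS for a free port, then release it.

    The port is known to be free only at the moment of the check;
    another process may bind it before the caller does.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind((host, 0))
        return probe.getsockname()[1]


def bind_and_report(host="127.0.0.1"):
    """Bind first, then report the port that was actually bound."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))
    sock.listen(1)
    return sock, sock.getsockname()[1]


# Simulate the race: a competitor grabs the port inside the lag window.
port = find_free_port_racy()
competitor = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
competitor.bind(("127.0.0.1", port))
competitor.listen(1)

late = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    late.bind(("127.0.0.1", port))
    race_lost = False
except OSError as exc:
    # This is the same class of failure as 'Address already in use'.
    race_lost = True
    print("bind failed:", exc)
finally:
    late.close()
    competitor.close()
```

`bind_and_report` shows the direction of the last idea in the comment: let the server bind first and learn the port afterwards, so there is no window between the check and the bind.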

Anyway, that is not the subject of this issue. The problem here is that test-run does not count the test in this case. It should be counted as failed.

It allows mitigating the 'Address already in use' error when binding the admin (console) port. See [1] for more information (see the comment itself; the issue is about another problem).

[1]: #115 (comment)

The problem is that the frontend waits indefinitely for a storage server that has not been initialized because of an error like the following.

```
Start failed: builtin/box/console.lua:560: failed to create server localhost:57988: Address already in use
```

The problem is described in [1] (see the comment itself; the issue is about another problem).

The gist of the fix is to use unix sockets for the admin console of non-default servers (storages). The 'use_unix_sockets' option affects only the console (admin) socket and does nothing with the binary (listen) port of a storage.

It is critical in the benchmarking test suite.

TBD: update test-run from master after test-run's PR #123.

[1]: tarantool/test-run#115 (comment)

It allows mitigating the 'Address already in use' error when binding the admin (console) port. See [1] for more information (see the comment itself; the issue is about another problem).

Unified boolean suite.ini options parsing.

[1]: #115 (comment)

The problem is that the frontend waits indefinitely for a storage server that has not been initialized because of an error like the following.

```
Start failed: builtin/box/console.lua:560: failed to create server localhost:57988: Address already in use
```

The problem is described in [1] (see the comment itself; the issue is about another problem).

The gist of the fix is to use unix sockets for the admin console of non-default servers (storages), which is why the test-run update is needed. The 'use_unix_sockets' option affects only the console (admin) socket and does nothing with the binary (listen) port of a storage.

It is critical in the benchmarking test suite.

[1]: tarantool/test-run#115 (comment)

* Allow to use unix socket for non-default servers (tarantool/test-run#115)
* Fix inspector error reporting for app tests
* Add new config param 'show_reproduce_content' (tarantool/test-run#113)
* Print last 15 lines of .result for hung tests (tarantool/test-run#107)

Totktonada · 2018-10-29T16:02:22Z

Raw idea: choose ports via the coordinator: the main process.

Totktonada · 2019-02-18T12:46:34Z

Now we have the use_unix_sockets_iproto option. Setting both use_unix_sockets and use_unix_sockets_iproto will likely help with the 'address already in use' issue. But this issue is about the handling of TarantoolStartError (for default and non-default servers).
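As an illustration of setting both options, here is a minimal sketch that reads them as booleans from a suite.ini-like text with Python's standard `configparser`. The `[default]` section name, the `core = tarantool` line in this fragment, and the helper `read_bool_options` are assumptions made for the example; this is not test-run's own parsing code.

```python
import configparser

# Hypothetical suite.ini content; only the two option names come from
# the discussion above, the rest is assumed for illustration.
SUITE_INI = """
[default]
core = tarantool
use_unix_sockets = True
use_unix_sockets_iproto = True
"""


def read_bool_options(text, names, section="default"):
    """Parse the given option names as booleans, defaulting to False."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        name: parser.getboolean(section, name, fallback=False)
        for name in names
    }


options = read_bool_options(
    SUITE_INI, ["use_unix_sockets", "use_unix_sockets_iproto"]
)
print(options)
```

With both options enabled, the console and binary sockets would be unix sockets rather than TCP ports, which is why the comment expects the port collisions to become less likely.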

Totktonada · 2019-02-18T12:50:47Z

Another case:

TEST_RUN_TESTS=$(for i in $(seq 1 500); do echo -n "tarantoolctl.test.lua "; done) make test

[008] Worker "008_app-tap" received the following error; stopping...
[008] Traceback (most recent call last):
[008]   File "/home/alex/projects/tarantool-meta/r/tarantool/test-run/lib/worker.py", line 291, in run_task
[008]     task, self.server, self.inspector)
[008]   File "/home/alex/projects/tarantool-meta/r/tarantool/test-run/lib/test_suite.py", line 207, in run_test
[008]     short_status = test.run(server)
[008]   File "/home/alex/projects/tarantool-meta/r/tarantool/test-run/lib/test.py", line 181, in run
[008]     self.execute(server)
[008]   File "/home/alex/projects/tarantool-meta/r/tarantool/test-run/lib/app_server.py", line 43, in execute
[008]     tarantool.join()
[008]   File "src/gevent/greenlet.py", line 661, in gevent._greenlet.Greenlet.join
[008]   File "src/gevent/greenlet.py", line 687, in gevent._greenlet.Greenlet.join
[008]   File "src/gevent/greenlet.py", line 676, in gevent._greenlet.Greenlet.join
[008]   File "src/gevent/_greenlet_primitives.py", line 59, in gevent.__greenlet_primitives.SwitchOutGreenletWithLoop.switch
[008]   File "src/gevent/_greenlet_primitives.py", line 63, in gevent.__greenlet_primitives.SwitchOutGreenletWithLoop.switch
[008]   File "src/gevent/__greenlet_primitives.pxd", line 35, in gevent.__greenlet_primitives._greenlet_switch
[008] TarantoolStartError
[008] 
[008] Exception:

Statistics:
* pass: 499
Built target test

echo $?
0

cat test/var/008_app-tap/remote.log
<...>
2019-02-18 15:23:22.934 [27133] main/101/remote tarantoolctl:541 E> Start failed: builtin/box/console.lua:560: failed to create server localhost:44152: Address already in use

Totktonada · 2019-02-18T16:35:19Z

Moved out 'address already in use' part into #141.

Totktonada · 2019-04-17T00:31:50Z

The problem should be fixed by 7176ced for 'core = tarantool' test suites. I'll propose the fix for 'core = app' tests soon.

Totktonada · 2019-04-17T02:11:31Z

The case with tarantoolctl.test.lua is kind of specific: we check whether a task was reported using its ID (a test name + a configuration). When the same test runs many times, test-run is not able to detect that the test was run fewer times than it should have been. It should not affect regular testing, so this part of the issue is not important.
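Why repeated runs hide a lost result can be seen in a small sketch. It assumes, for illustration only, that a task ID is a (test name, configuration) tuple and that reported IDs are compared as a set; the helper names are hypothetical and this is not test-run's actual bookkeeping.

```python
from collections import Counter


def missing_by_id(scheduled, reported):
    """Compare unique task IDs: repeated runs collapse into one entry."""
    return set(scheduled) - set(reported)


def missing_by_count(scheduled, reported):
    """Compare per-ID run counts: a lost repeat stays visible."""
    return Counter(scheduled) - Counter(reported)


task_id = ("tarantoolctl.test.lua", None)  # (test name, configuration)
scheduled = [task_id] * 500
reported = [task_id] * 499  # one run was never reported

print(missing_by_id(scheduled, reported))
print(missing_by_count(scheduled, reported))
```

The set-based check finds nothing missing because the single ID was reported at least once, while the count-based check still shows one lost run. In regular testing each ID is scheduled once, so both checks agree.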

This commit handles the case when test-run finds that some of the scheduled tasks were not reported as finished by workers. The old behaviour led to a zero exit code when the --force option was not set (this looks like an unintended mistake). Now this situation causes a testing failure (non-zero exit code) regardless of whether the --force option is set.

Always show the list of failed tasks, except when the failure is caused by a keyboard interruption.

While we are here, clarified and changed related test-run messages.

All of this concerns a situation that should never occur. See #115 for such cases.

When test_run:cmd('start server foo') is called from an app test and tarantool fails to start, the following exception is shown:

 | [010] Worker "010_app-tap" received the following error; stopping...
 | [010] Traceback (most recent call last):
 | [010]   File "/home/alex/projects/tarantool-meta/tarantool/test-run/lib/worker.py", line 294, in run_task
 | [010]     task, self.server, self.inspector)
 | [010]   File "/home/alex/projects/tarantool-meta/tarantool/test-run/lib/test_suite.py", line 208, in run_test
 | [010]     short_status = test.run(server)
 | [010]   File "/home/alex/projects/tarantool-meta/tarantool/test-run/lib/test.py", line 180, in run
 | [010]     self.execute(server)
 | [010]   File "/home/alex/projects/tarantool-meta/tarantool/test-run/lib/app_server.py", line 43, in execute
 | [010]     tarantool.join()

The commit handles this case and the exception should not be shown anymore.

This commit follows the similar one for tarantool server: 7176ced ('Fix reporting of non-default server fail at start').

The reporting of such situations is not ideal for now (we show output and logs for a default server, but don't do that for a non-default one), but this will be fixed in the scope of #159.

Fixes #115.
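The general shape of such handling might look like the sketch below: the start error is caught around the test body and turned into a failed status, so the test is still counted. The `TarantoolStartError` class here is a local stand-in for the exception named in the traceback, and `run_test` and `failing_execute` are hypothetical names; the real change in test-run may be structured differently.

```python
class TarantoolStartError(Exception):
    """Stand-in for the exception named in the traceback above."""


def run_test(execute):
    """Run one test body and always return a short status."""
    try:
        execute()
    except TarantoolStartError as exc:
        # A server that failed to start fails the test instead of
        # stopping the worker, so the test still shows up in statistics.
        print("[ fail ] server did not start:", exc)
        return "fail"
    return "pass"


def failing_execute():
    """Simulate a test whose non-default server cannot start."""
    raise TarantoolStartError("Address already in use")


statuses = [run_test(lambda: None), run_test(failing_execute)]
stats = {status: statuses.count(status) for status in set(statuses)}
print(stats)
```

In this sketch both runs end up in the statistics, one as pass and one as fail, instead of the failing one disappearing from the totals.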

This update contains changes from the 0.6.5 release (cited below) and the usage of yaml.safe_load() instead of yaml.load() in tarantool-python tests (this doesn't affect test-run behaviour).

The reasons to update it here are, first, to keep things in sync and, second, to eliminate usage of yaml.load() without an explicit loader everywhere it is possible. The latter is because it was banned in recent versions of pyyaml in Gentoo Linux; see [1].

[1]: https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=79ba924d94cb0cf8559565178414c2a1d687b90c

# tarantool-python 0.6.5

## Breaking changes

This release should not break existing code.

## New features

* Added MeshConnection that allows to switch between nodes from a user provided list if a current node is down using round-robin strategy (#106).
* Added connection_timeout parameter to Connection (#115).

## Bugfixes

* Fixed auto-reconnection in Connection.
* Eliminated deprecation warnings on Python 3 (#114).
* Added TCP_NODELAY back (it was removed in 0.6.4) (#127).

https://github.com/tarantool/tarantool-python/releases/tag/0.6.5

This update contains changes from the 0.6.5 release (cited below) and the usage of yaml.safe_load() instead of yaml.load() in tarantool-python tests (this doesn't affect test-run behaviour).

The reasons to update it here are, first, to keep things in sync and, second, to eliminate usage of yaml.load() without an explicit loader everywhere it is possible. The latter is because it was banned in recent versions of pyyaml in Gentoo Linux; see [1]. There was also a related change, 38400e9 ('Update pyyaml version'), where yaml.load() was replaced with yaml.safe_load() within test-run itself.

[1]: https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=79ba924d94cb0cf8559565178414c2a1d687b90c

# tarantool-python 0.6.5

## Breaking changes

This release should not break existing code.

## New features

* Added MeshConnection that allows to switch between nodes from a user provided list if a current node is down using round-robin strategy (#106).
* Added connection_timeout parameter to Connection (#115).

## Bugfixes

* Fixed auto-reconnection in Connection.
* Eliminated deprecation warnings on Python 3 (#114).
* Added TCP_NODELAY back (it was removed in 0.6.4) (#127).

https://github.com/tarantool/tarantool-python/releases/tag/0.6.5

Totktonada added the bug label Sep 3, 2018

Totktonada mentioned this issue Sep 5, 2018

Fix flaky fails with hang tests tarantool/graphql.0#213

Merged

Totktonada mentioned this issue Feb 18, 2019

Fix 'Address already in use' error with TCP sockets #141

Closed

Totktonada mentioned this issue Feb 18, 2019

test: fix 'address already in use' flaky fails tarantool/tarantool#4008

Closed

Totktonada mentioned this issue Apr 17, 2019

Fail testing when test-run fails internally #161

Merged

Totktonada mentioned this issue Apr 17, 2019

Catch non-default server start fail in app server #162

Merged

Totktonada mentioned this issue Apr 29, 2019

Update tarantool-python submodule #165

Merged

Totktonada closed this as completed in #162 Apr 30, 2019
