Sometimes results of some tests disappear if an error occurred #115

Closed
ImeevMA opened this issue Aug 17, 2018 · 7 comments · Fixed by #162
Labels
bug Something isn't working

Comments

@ImeevMA
Contributor

ImeevMA commented Aug 17, 2018

Error:

[001] box/errinj.test.lua                                             
[001] Worker "001_box" received the following error; stopping...
[001] Traceback (most recent call last):
[001]   File "/home/mimeev/Work/radiant/test-run/lib/worker.py", line 266, in run_task
[001]     task, self.server, self.inspector)
[001]   File "/home/mimeev/Work/radiant/test-run/lib/test_suite.py", line 193, in run_test
[001]     short_status = test.run(server)
[001]   File "/home/mimeev/Work/radiant/test-run/lib/test.py", line 166, in run
[001]     self.execute(server)
[001]   File "/home/mimeev/Work/radiant/test-run/lib/tarantool_server.py", line 147, in execute
[001]     crash_occured = save_join(lua, timeout=self.TIMEOUT)
[001]   File "/home/mimeev/Work/radiant/test-run/lib/tarantool_server.py", line 50, in save_join
[001]     green_obj.join(timeout=timeout)
[001]   File "/home/mimeev/.local/lib/python2.7/site-packages/gevent/greenlet.py", line 483, in join
[001]     result = self.parent.switch()
[001]   File "/home/mimeev/.local/lib/python2.7/site-packages/gevent/hub.py", line 538, in switch
[001]     return greenlet.switch(self)
[001] TarantoolStartError
[001] 
[001] 
[001] 
[001] [Instance "proxy" returns with non-zero exit code: 1]
[001] 
[001] Last 15 lines of Tarantool Log file [Instance "proxy"][/home/mimeev/Work/radiant/test/var/001_box/proxy.log]:
[001] 2018-08-17 11:15:09.911 [27927] main/106/console/127.0.0.1:38426 I> started
[001] 2018-08-17 11:15:09.911 [27927] main C> entering the event loop
[001] 2018-08-17 11:15:10.252 [27927] main C> got signal 15 - Terminated
[001] Starting instance proxy...
[001] Run console at unix/:/home/mimeev/Work/radiant/test/var/001_box/proxy.control
[001] tcp_server: remove dead UNIX socket: /home/mimeev/Work/radiant/test/var/001_box/proxy.control
[001] started
[001] 2018-08-17 11:15:11.619 [28413] main/101/proxy C> Tarantool 1.9.1-70-gc1a16b2
[001] 2018-08-17 11:15:11.619 [28413] main/101/proxy C> log level 5
[001] 2018-08-17 11:15:11.619 [28413] main/101/proxy I> mapping 117440512 bytes for memtx tuple arena...
[001] 2018-08-17 11:15:11.619 [28413] main/101/proxy I> mapping 134217728 bytes for vinyl tuple arena...
[001] 2018-08-17 11:15:11.625 [28413] iproto/101/main evio.cc:351 E> binary: failed to bind on 127.0.0.1:38194: bind, called on fd 29, aka 0.0.0.0:0
[001] 2018-08-17 11:15:11.625 [28413] main/101/proxy evio.cc:356 !> SystemError binary: failed to bind, called on fd -1: Address already in use
[001] 2018-08-17 11:15:11.625 [28413] main/101/proxy F> can't initialize storage: binary: failed to bind, called on fd -1
[001] 2018-08-17 11:15:11.625 [28413] main/101/proxy F> can't initialize storage: binary: failed to bind, called on fd -1

How to reproduce:

TEST_RUN_TESTS=$(for i in $(seq 1 100); do echo -n 'box/errinj. '; done) make test

Possible result (should pass 100):

---------------------------------------------------------------------------------
Statistics:
* pass: 96
Built target test

OS:

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:  Ubuntu 16.04.5 LTS
Release:  16.04
Codename: xenial
@Totktonada
Member

The 'Address already in use' error is likely due to the way test-run assigns free ports to tarantool servers. It checks for a free port, then passes it to tarantool via an environment variable, so there is a lag between the check and the actual bind. It happens rarely and I know of no way to mitigate it, because we have to save the port in the tarantool_server class in order to connect to the server and pass it the lines of a test. Maybe we could read the port that tarantool binds from its logs and set the Python variable then.
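The lag between the check and the bind can be reproduced in isolation. The following is a minimal sketch, not test-run's actual code: `find_free_port` and `try_bind` are hypothetical helpers, and the second bind plays the role of a tarantool instance that arrives after another process has taken the port.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Check step: ask the OS for a free port, then release it again.

    The port is only known to be free at the moment of the check.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, 0))  # port 0 lets the OS pick a free port
        return sock.getsockname()[1]
    finally:
        sock.close()  # the port is released here; the race window opens


def try_bind(port, host="127.0.0.1"):
    """Bind step: return a bound socket, or None if the bind fails."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError:
        sock.close()
        return None
    return sock
```

If any other process binds the port between `find_free_port()` and the later `try_bind(port)`, the later bind fails, which is the situation the log above shows.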

Anyway, that is not the subject of this issue. The problem here is that test-run does not count the test in this case. It should be counted as failed.
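The expected accounting could look roughly like the sketch below. It is an illustration only: `TarantoolStartError` is redefined locally as a stand-in for the exception in the traceback, and `run_task` and `run_all` are hypothetical names, not test-run functions.

```python
from collections import Counter


class TarantoolStartError(Exception):
    """Local stand-in for the exception seen in the traceback."""


def run_task(task):
    """Hypothetical test body: returns 'pass' or raises on a start failure."""
    if task.endswith("#bad"):
        raise TarantoolStartError(task)
    return "pass"


def run_all(tasks):
    """Run every task and keep a result for each one, even on a crash."""
    stats = Counter()
    for task in tasks:
        try:
            status = run_task(task)
        except TarantoolStartError:
            # Do not let the exception swallow the result: count it as failed.
            status = "fail"
        stats[status] += 1
    return stats
```

With this shape the totals always add up to the number of scheduled tasks, so a run of 100 tests cannot end up reporting only 96 results.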

Totktonada added the bug label Sep 3, 2018
Totktonada added a commit that referenced this issue Sep 5, 2018
This allows mitigating the 'Address already in use' error while binding
the admin (console) port. See [1] for more information (see the comment
itself; the issue is about another problem).

[1]: #115 (comment)
Totktonada added a commit to tarantool/graphql.0 that referenced this issue Sep 5, 2018
The problem is that the frontend waits indefinitely for a storage server
that has not been initialized due to an error like the following.

```
Start failed: builtin/box/console.lua:560: failed to create server
localhost:57988: Address already in use
```

The problem is described in [1] (see the comment itself; the issue is
about another problem).

The gist of the fix is using unix sockets for the admin console of
non-default servers (storages). The option 'use_unix_sockets' affects
only the console (admin) socket and does nothing with the binary
(listen) port of a storage. It is critical in the benchmarking test
suite.

TBD: update test-run from master after test-run's PR #123.

[1]: tarantool/test-run#115 (comment)
Totktonada added a commit that referenced this issue Sep 5, 2018
This allows mitigating the 'Address already in use' error while binding
the admin (console) port. See [1] for more information (see the comment
itself; the issue is about another problem).

Unified boolean suite.ini options parsing.

[1]: #115 (comment)
Totktonada added a commit to tarantool/graphql.0 that referenced this issue Sep 5, 2018
The problem is that the frontend waits indefinitely for a storage server
that has not been initialized due to an error like the following.

```
Start failed: builtin/box/console.lua:560: failed to create server
localhost:57988: Address already in use
```

The problem is described in [1] (see the comment itself; the issue is
about another problem).

The gist of the fix is using unix sockets for the admin console of
non-default servers (storages), which is why the test-run update is
needed. The option 'use_unix_sockets' affects only the console (admin)
socket and does nothing with the binary (listen) port of a storage. It
is critical in the benchmarking test suite.

[1]: tarantool/test-run#115 (comment)
sergw pushed a commit to tarantool/tarantool that referenced this issue Sep 11, 2018
* Allow to use unix socket for non-default servers
  (tarantool/test-run#115)
* Fix inspector error reporting for app tests
* Add new config param 'show_reproduce_content'
  (tarantool/test-run#113)
* Print last 15 lines of .result for hung tests
  (tarantool/test-run#107)
@Totktonada
Member

Raw idea: choose ports via the coordinator (the main process).
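One possible reading of this idea, sketched under loud assumptions: the main process owns the port space and gives each worker its own disjoint range, so two workers can never pick the same port. `PortCoordinator`, its parameters and the default numbers are all hypothetical and do not exist in test-run. This would not protect against ports taken by unrelated processes.

```python
class PortCoordinator:
    """Hands out disjoint port ranges so that workers never share a port."""

    def __init__(self, first_port=3000, ports_per_worker=100, last_port=65535):
        self.first_port = first_port
        self.ports_per_worker = ports_per_worker
        self.last_port = last_port

    def range_for(self, worker_id):
        """Return the port range reserved for a worker (IDs start at 1)."""
        if worker_id < 1:
            raise ValueError("worker IDs start at 1")
        start = self.first_port + (worker_id - 1) * self.ports_per_worker
        stop = start + self.ports_per_worker
        if stop - 1 > self.last_port:
            raise ValueError("no ports left for worker %d" % worker_id)
        return range(start, stop)
```

A worker would then search for a free port only inside `coordinator.range_for(worker_id)`, which removes collisions between parallel workers but keeps the check-then-bind lag for everything else on the host.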

@Totktonada
Member

Now we have the use_unix_sockets_iproto option. Setting both use_unix_sockets and use_unix_sockets_iproto will likely help with the 'Address already in use' issue. But this issue is about the handling of TarantoolStartError (for default and non-default servers).
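For illustration, here is how the two options might appear in a suite.ini and be read as booleans. The option names come from this thread; the `[default]` section layout, the `core` line and the `read_bool_option` helper are assumptions made for the sketch, and the parsing uses Python 3's standard `configparser` rather than test-run's own code.

```python
import configparser

# Assumed suite.ini content; only the two option names come from the thread.
SUITE_INI = """
[default]
core = tarantool
use_unix_sockets = True
use_unix_sockets_iproto = True
"""


def read_bool_option(parser, name, default=False):
    """Parse any boolean suite.ini option in one uniform way."""
    return parser.getboolean("default", name, fallback=default)


parser = configparser.ConfigParser()
parser.read_string(SUITE_INI)
options = {
    name: read_bool_option(parser, name)
    for name in ("use_unix_sockets", "use_unix_sockets_iproto")
}
```

An option that is absent from the file falls back to `False` here, which is a choice made for the sketch rather than documented test-run behaviour.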

@Totktonada
Member

Another case:

TEST_RUN_TESTS=$(for i in $(seq 1 500); do echo -n "tarantoolctl.test.lua "; done) make test
[008] Worker "008_app-tap" received the following error; stopping...
[008] Traceback (most recent call last):
[008]   File "/home/alex/projects/tarantool-meta/r/tarantool/test-run/lib/worker.py", line 291, in run_task
[008]     task, self.server, self.inspector)
[008]   File "/home/alex/projects/tarantool-meta/r/tarantool/test-run/lib/test_suite.py", line 207, in run_test
[008]     short_status = test.run(server)
[008]   File "/home/alex/projects/tarantool-meta/r/tarantool/test-run/lib/test.py", line 181, in run
[008]     self.execute(server)
[008]   File "/home/alex/projects/tarantool-meta/r/tarantool/test-run/lib/app_server.py", line 43, in execute
[008]     tarantool.join()
[008]   File "src/gevent/greenlet.py", line 661, in gevent._greenlet.Greenlet.join
[008]   File "src/gevent/greenlet.py", line 687, in gevent._greenlet.Greenlet.join
[008]   File "src/gevent/greenlet.py", line 676, in gevent._greenlet.Greenlet.join
[008]   File "src/gevent/_greenlet_primitives.py", line 59, in gevent.__greenlet_primitives.SwitchOutGreenletWithLoop.switch
[008]   File "src/gevent/_greenlet_primitives.py", line 63, in gevent.__greenlet_primitives.SwitchOutGreenletWithLoop.switch
[008]   File "src/gevent/__greenlet_primitives.pxd", line 35, in gevent.__greenlet_primitives._greenlet_switch
[008] TarantoolStartError
[008] 
[008] Exception: 
Statistics:
* pass: 499
Built target test
echo $?
0
cat test/var/008_app-tap/remote.log
<...>
2019-02-18 15:23:22.934 [27133] main/101/remote tarantoolctl:541 E> Start failed: builtin/box/console.lua:560: failed to create server localhost:44152: Address already in use

@Totktonada
Member

Moved the 'address already in use' part out into #141.

@Totktonada
Member

The problem should be fixed by 7176ced for 'core = tarantool' test suites. I'll propose a fix for 'core = app' tests soon.

@Totktonada
Member

The case with tarantoolctl.test.lua is kind of specific: we check whether a task was reported using its ID (a test name + a configuration). When the same test runs many times, test-run is not able to detect that the test was run fewer times than it should have been. It should not affect regular testing, so this part of the issue is not important.
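The blind spot of an ID-based check can be shown with a small sketch. The task ID shape (a test name plus a configuration) follows the comment above; `missing_by_id` and `missing_by_count` are hypothetical helpers, and the count-based variant is only one way such a check could be made stricter, not something test-run is known to do.

```python
from collections import Counter


def missing_by_id(scheduled, reported):
    """ID-based check: a task counts as done if its ID was reported at all."""
    return sorted(set(scheduled) - set(reported))


def missing_by_count(scheduled, reported):
    """Count-based check: every scheduled run needs its own report."""
    lacking = Counter(scheduled) - Counter(reported)  # keeps positive counts
    return dict(lacking)


task_id = ("tarantoolctl.test.lua", "default")  # (test name, configuration)
scheduled = [task_id] * 500
reported = [task_id] * 499
```

Here `missing_by_id(scheduled, reported)` is empty because the single ID was seen, while `missing_by_count(scheduled, reported)` shows one run without a report.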

Totktonada added a commit that referenced this issue Apr 17, 2019
This commit handles the case when test-run finds that some of the
scheduled tasks were not reported as finished by workers. The old
behaviour led to a zero exit code when the --force option was not set
(which looks like an unintended mistake).

Now this situation causes a testing failure (non-zero exit code)
regardless of whether the --force option is set.

The list of failed tasks is now always shown, except when the failure
is caused by a keyboard interruption.

While we are here, related test-run messages were clarified and changed.

All of this concerns a situation that should never occur. See #115 for
such cases.
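The rule described in this commit message can be sketched as follows. This is an interpretation rather than the commit's code: `summarize` and its arguments are hypothetical, and the handling of an interrupted run (non-zero exit code, no list) is an assumption drawn from the wording above.

```python
def summarize(scheduled_ids, reported_ids, failed_ids, interrupted=False):
    """Return (exit_code, messages) for a finished test run."""
    reported = set(reported_ids)
    not_reported = [t for t in scheduled_ids if t not in reported]
    messages = []
    # The list is shown unless the run was stopped from the keyboard.
    if not_reported and not interrupted:
        messages.append("Tasks without a reported result:")
        messages.extend("  " + str(t) for t in not_reported)
    # Unreported tasks make the run fail, independently of any --force flag.
    code = 1 if (failed_ids or not_reported) else 0
    return code, messages
```

For example, `summarize(["a", "b", "c"], ["a", "c"], [])` yields exit code 1 and names task `b`, even though no task was explicitly reported as failed.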
Totktonada added a commit that referenced this issue Apr 17, 2019
When test_run:cmd('start server foo') is called from an app test and
tarantool fails to start, the following exception is shown:

 | [010] Worker "010_app-tap" received the following error; stopping...
 | [010] Traceback (most recent call last):
 | [010]   File "/home/alex/projects/tarantool-meta/tarantool/test-run/lib/worker.py", line 294, in run_task
 | [010]     task, self.server, self.inspector)
 | [010]   File "/home/alex/projects/tarantool-meta/tarantool/test-run/lib/test_suite.py", line 208, in run_test
 | [010]     short_status = test.run(server)
 | [010]   File "/home/alex/projects/tarantool-meta/tarantool/test-run/lib/test.py", line 180, in run
 | [010]     self.execute(server)
 | [010]   File "/home/alex/projects/tarantool-meta/tarantool/test-run/lib/app_server.py", line 43, in execute
 | [010]     tarantool.join()

The commit handles this case and the exception should not be shown
anymore.

This commit follows the similar one for tarantool server: 7176ced ('Fix
reporting of non-default server fail at start ').

The reporting of such situations is not ideal for now (we show output
and logs for a default server, but not for a non-default one), but this
will be fixed in the scope of #159.

Fixes #115.
Totktonada added a commit that referenced this issue Apr 29, 2019
This update contains the changes from the 0.6.5 release (cited below)
and the use of yaml.safe_load() instead of yaml.load() in
tarantool-python tests (this doesn't affect test-run behaviour).

It is updated here, first, to keep things in sync and, second, to
eliminate the use of yaml.load() without an explicit loader wherever
possible. The latter is because it was banned in recent versions of
pyyaml in Gentoo Linux; see [1].

[1]: https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=79ba924d94cb0cf8559565178414c2a1d687b90c

# tarantool-python 0.6.5

## Breaking changes

This release should not break existing code.

## New features

* Added MeshConnection that allows to switch between nodes from a user
  provided list if a current node is down using round-robin strategy (#106).
* Added connection_timeout parameter to Connection (#115).

## Bugfixes

* Fixed auto-reconnection in Connection.
* Eliminated deprecation warnings on Python 3 (#114).
* Added TCP_NODELAY back (it was removed in 0.6.4) (#127).

https://github.com/tarantool/tarantool-python/releases/tag/0.6.5
Totktonada added a commit that referenced this issue Apr 30, 2019
This update contains the changes from the 0.6.5 release (cited below)
and the use of yaml.safe_load() instead of yaml.load() in
tarantool-python tests (this doesn't affect test-run behaviour).

It is updated here, first, to keep things in sync and, second, to
eliminate the use of yaml.load() without an explicit loader wherever
possible. The latter is because it was banned in recent versions of
pyyaml in Gentoo Linux; see [1].

There was also a related change, 38400e9 ('Update pyyaml version'),
where yaml.load() was replaced with yaml.safe_load() within test-run
itself.

[1]: https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=79ba924d94cb0cf8559565178414c2a1d687b90c

# tarantool-python 0.6.5

## Breaking changes

This release should not break existing code.

## New features

* Added MeshConnection that allows to switch between nodes from a user
  provided list if a current node is down using round-robin strategy (#106).
* Added connection_timeout parameter to Connection (#115).

## Bugfixes

* Fixed auto-reconnection in Connection.
* Eliminated deprecation warnings on Python 3 (#114).
* Added TCP_NODELAY back (it was removed in 0.6.4) (#127).

https://github.com/tarantool/tarantool-python/releases/tag/0.6.5