Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

acceptance: TestDockerCLI/test_missing_log_output.tcl failure #48413

Closed
petermattis opened this issue May 4, 2020 · 11 comments · Fixed by #48431 or #48787
Closed

acceptance: TestDockerCLI/test_missing_log_output.tcl failure #48413

petermattis opened this issue May 4, 2020 · 11 comments · Fixed by #48431 or #48787
Assignees
Labels
C-investigation Further steps needed to qualify. C-label will change.

Comments

@petermattis
Copy link
Collaborator

Seen on CI of #48145: https://teamcity.cockroachdb.com/viewLog.html?buildId=1920211&buildTypeId=Cockroach_UnitTests

I believe the failure is unrelated to that PR and I can't reproduce it locally. The expect output is https://teamcity.cockroachdb.com/repository/download/Cockroach_UnitTests_Acceptance/1920202:id/acceptance/TestDockerCLI/test_missing_log_output.tcl/runMode%3Ddocker/a8c65a4c/logs/console-output.log

.200504 21:34:19.536416276 EXPECT TEST: START TEST: Check that a broken stderr prints a message to the log files.
...
Terminated
child process exited abnormally
    while executing
"system "grep -F 'log: exiting because of error: write /dev/stderr: broken pipe' logs/db/logs/cockroach.log""
    (file "/go/src/github.com/cockroachdb/cockroach/cli/interactive_tests/test_missing_log_output.tcl" line 26)
:/# non-zero exit code: 1

I'm not quite clear as to what happened. Looks like the grep failed to find the desired string in the log.

@blathers-crl
Copy link

blathers-crl bot commented May 4, 2020

Hi @petermattis, please add a C-ategory label to your issue. Check out the label system docs.

While you're here, please consider adding an A- label to help keep our repository tidy.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@petermattis petermattis added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-cli C-investigation Further steps needed to qualify. C-label will change. and removed C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. labels May 4, 2020
@knz
Copy link
Contributor

knz commented May 5, 2020

The error is real and says that the cockroach process has terminated abruptly upon receiving SIGHUP, instead of tolerating the signal and having the log subsystem report (in the log file) that its output stream has become unavailable.

This means there is a regression in the code where the signal is not caught any more.

@knz
Copy link
Contributor

knz commented May 5, 2020

An alternative explanation is that the crdb process is not logging anything at that point. Investigating more.

@petermattis
Copy link
Collaborator Author

@knz This test failed again in roughly the same spot: https://teamcity.cockroachdb.com/viewLog.html?buildId=1923980&buildTypeId=Cockroach_UnitTests

:/# tail -f logs/db/logs/cockroach.log
sql:                 postgresql://root@7d88089bf3bc:26257?sslmode=disable
RPC client flags:    /cockroach/cockroach <client cmd> --host=7d88089bf3bc:26257 --insecure
logs:                /home/roach/logs/db/logs
temp dir:            /home/roach/logs/db/cockroach-temp035214900
external I/O path:   /home/roach/logs/db/extern
store[0]:            path=/home/roach/logs/db
storage engine:      pebble
status:              restarted pre-existing node
clusterID:           ceb2797b-860a-49d7-b959-3a76c196787a
nodeID:              1

.200506 15:24:02.616003736 EXPECT TEST: TIMEOUT WAITING FOR "log: exiting because of error: write /dev/stderr: broken pipe"
non-zero exit code: 1

The above indicates that #48431 is present in the failed build. The timestamps seem to indicate that we waited 30s for the tail. It is curious that db/logs/cockroach.log seems to end before some of the log messages that are printed to stderr.

Note that this CI ran on #48145. I'm not sure why running on this test on Pebble has any effect, but want to bring it to your attention in case it does.

@petermattis
Copy link
Collaborator Author

@knz This is flaking very frequently now. Any guesses as to why making Pebble the default storage engine would have an effect? Any guesses as to what is going on?

craig bot pushed a commit that referenced this issue May 12, 2020
48624: sql: fix bug where show create could show partitions of dropped indexes r=rohany a=rohany

Release note (bug fix): Fix a bug where the `SHOW CREATE` statement
would sometimes show a partitioning step for an index that has been
dropped.

48744: sql: allow global access for pg_database table r=rafiss a=rafiss

In Postgres, this table is world-readable, so we should match.

fixes #48726

Release note (sql change): The pg_database table in pg_catalog
no longer require privileges on any database in order for the
data to be visible.

48753: interactive_tests: skip test_missing_log_output.tcl r=otan a=otan

Touches #48413 - this has flaked quite a few times for me in master / on bors.

Release note: None

Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
Co-authored-by: Oliver Tan <otan@cockroachlabs.com>
@knz
Copy link
Contributor

knz commented May 13, 2020

my guess is that the way the process is handling signals has changed. I don't know yet what is going on.

@knz
Copy link
Contributor

knz commented May 13, 2020

fwiw I am able to reproduce this readily now. It's clearly a regression.

@knz
Copy link
Contributor

knz commented May 13, 2020

ok I found it 🤦

@knz
Copy link
Contributor

knz commented May 13, 2020

actually there are two compound problems so I need to investigate more

@knz
Copy link
Contributor

knz commented May 13, 2020

ok found it too 🤦 x 2

@petermattis
Copy link
Collaborator Author

🍿

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-investigation Further steps needed to qualify. C-label will change.
Projects
None yet
2 participants