Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/tests: TestRandomSyntaxFunctions failed #69731

Closed
cockroach-teamcity opened this issue Sep 2, 2021 · 18 comments · Fixed by #69963
Closed

sql/tests: TestRandomSyntaxFunctions failed #69731

cockroach-teamcity opened this issue Sep 2, 2021 · 18 comments · Fixed by #69963
Assignees
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.

Comments

@cockroach-teamcity
Copy link
Member

sql/tests.TestRandomSyntaxFunctions failed with artifacts on master @ 63679e7bf4a0244c6e46381751543cfe3f32388c:

Random syntax error:

    rsg_test.go:776: Crash detected: server panic: statement exec timeout

Query:

        SELECT crdb_internal.reset_sql_stats() ;
Reproduce

To reproduce, try:

make stressrace TESTS=TestRandomSyntaxFunctions PKG=./pkg/sql/tests TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

Same failure on other branches

Internal log

mjibson marked as alumn{us/a}; resolving to rafiss instead

/cc @cockroachdb/sql-experience rafiss

This test on roachdash | Improve this report!

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Sep 2, 2021
@otan
Copy link
Contributor

otan commented Sep 2, 2021

@Azhng know anything about this?

@Azhng
Copy link
Contributor

Azhng commented Sep 2, 2021

Seems like duplicated for #69541. Spent some time investigating earlier this week, had difficult time reproducing it. Any thoughts on what I should be trying next ?

@rafiss
Copy link
Collaborator

rafiss commented Sep 2, 2021

fyi the reason it duplicated is because the issue title was changed. if the title were sql/tests: TestRandomSyntaxFunctions failed [reset_sql_stats] then i think it won't duplicate the issue.

@rafiss
Copy link
Collaborator

rafiss commented Sep 2, 2021

there's a pretty huge goroutine dump in https://teamcity.cockroachdb.com/repository/download/Cockroach_Nightlies_RandomSyntaxTests/3397108:id/raw.0.json.txt.tgz!/artifacts/full_output.txt -- were you able to find anything helpful in there?

@rafiss
Copy link
Collaborator

rafiss commented Sep 2, 2021

this part seems related, but i can't tell if this indicates a problem or not


        goroutine 1887035 [select]:
        github.com/cockroachdb/cockroach/pkg/sql.(*ieResultChannel).firstResult(0xc010e4dc00, 0x6a4ec48, 0xc016880cf0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        	/go/src/github.com/cockroachdb/cockroach/pkg/sql/internal_result_channel.go:114 +0x114
        github.com/cockroachdb/cockroach/pkg/sql.(*InternalExecutor).execInternal(0xc01a388480, 0x6a4ec48, 0xc016880cf0, 0x535b9ff, 0xf, 0xc010e4dc00, 0x0, 0x533455b, 0x4, 0x0, ...)
        	/go/src/github.com/cockroachdb/cockroach/pkg/sql/internal.go:795 +0xacf
        github.com/cockroachdb/cockroach/pkg/sql.(*InternalExecutor).ExecEx(0xc01a388480, 0x6a4ec48, 0xc0167fe930, 0x535b9ff, 0xf, 0x0, 0x533455b, 0x4, 0x0, 0x0, ...)
        	/go/src/github.com/cockroachdb/cockroach/pkg/sql/internal.go:533 +0x195
        github.com/cockroachdb/cockroach/pkg/sql/sqlstats/persistedsqlstats.(*Controller).ResetClusterSQLStats.func1(0x5397479, 0x1b, 0xc0167fe930, 0x0)
        	/go/src/github.com/cockroachdb/cockroach/pkg/sql/sqlstats/persistedsqlstats/controller.go:69 +0x192
        github.com/cockroachdb/cockroach/pkg/sql/sqlstats/persistedsqlstats.(*Controller).ResetClusterSQLStats(0xc01b57cf90, 0x6a4ec48, 0xc0167fe930, 0x1893f85, 0x4b8a1e0)
        	/go/src/github.com/cockroachdb/cockroach/pkg/sql/sqlstats/persistedsqlstats/controller.go:82 +0xb3
        github.com/cockroachdb/cockroach/pkg/sql/sem/builtins.glob..func305(0xc0167ee700, 0x92860d0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        	/go/src/github.com/cockroachdb/cockroach/pkg/sql/sem/builtins/builtins.go:5997 +0x62
        github.com/cockroachdb/cockroach/pkg/sql/sem/tree.(*FuncExpr).Eval(0xc0165a5c80, 0xc0167ee700, 0x1, 0xc007efca60, 0xc002d793e0, 0x1)
        	/go/src/github.com/cockroachdb/cockroach/pkg/sql/sem/tree/eval.go:4203 +0xbc
        github.com/cockroachdb/cockroach/pkg/sql.(*valuesNode).startExec(0xc00feb3a90, 0x6a4ec48, 0xc0167fe930, 0xc0167ee700, 0xc011fdb8d0, 0x0, 0x0)
        	/go/src/github.com/cockroachdb/cockroach/pkg/sql/values.go:156 +0x1fc
        github.com/cockroachdb/cockroach/pkg/sql.startExec.func2(0x533af0b, 0x6, 0x6a355d8, 0xc00feb3a90, 0x0, 0x0)
        	/go/src/github.com/cockroachdb/cockroach/pkg/sql/plan.go:509 +0xcb

i'd start by making sure you are using the internal executor correctly in that part of the code

@rafiss
Copy link
Collaborator

rafiss commented Sep 2, 2021

Hm it's doing a TRUNCATE - i guess it's possible that there'd be a lot of contention on these stats tables during this test, so maybe the TRUNCATE is being retried endlessly?

@Azhng
Copy link
Contributor

Azhng commented Sep 2, 2021

The code that uses internal executor is here

Effectively, it issues two queries:

TRUNCATE system.statement_statistics;
TRUNCATE system.transaction_statistics;

IIUC, these two queries would create two schema change jobs that drops the indexes and cleanup afterwards.

@Azhng
Copy link
Contributor

Azhng commented Sep 2, 2021

Ah yes, the flusher is also part of the stack trace. So it seems like it is possible that the stats flush is preventing the table from being dropped.

        	/go/src/github.com/cockroachdb/cockroach/pkg/sql/sqlstats/persistedsqlstats/flush.go:270 +0x515
        github.com/cockroachdb/cockroach/pkg/sql/sqlstats/persistedsqlstats.(*PersistedSQLStats).doFlushSingleTxnStats.func1.1(0x6a36830, 0xc010d10840, 0xc0138d7760, 0x4a47320, 0x1, 0xc00224bf60)
        	/go/src/github.com/cockroachdb/cockroach/pkg/sql/sqlstats/persistedsqlstats/flush.go:87 +0x94
        github.com/cockroachdb/cockroach/pkg/sql/sqlstats/persistedsqlstats.(*PersistedSQLStats).doInsertElseDoUpdate(0xc015cbfc20, 0x6a36830, 0xc010d10840, 0xc0138d7760, 0xc0214dd788, 0xc0214dd740, 0xc0214dd6f8, 0xc019af8a20, 0xc001aaf0e0)
        	/go/src/github.com/cockroachdb/cockroach/pkg/sql/sqlstats/persistedsqlstats/flush.go:204 +0x48
        github.com/cockroachdb/cockroach/pkg/sql/sqlstats/persistedsqlstats.(*PersistedSQLStats).doFlushSingleTxnStats.func1(0x6a36830, 0xc010d10840, 0xc0138d7760, 0x0, 0x0)
        	/go/src/github.com/cockroachdb/cockroach/pkg/sql/sqlstats/persistedsqlstats/flush.go:115 +0x30c
        github.com/cockroachdb/cockroach/pkg/kv.(*DB).Txn.func1(0x6a36830, 0xc010d10840, 0xc0138d7760, 0xc001aaf0e0, 0x3ff0000000000000)
        	/go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:839 +0x43
        github.com/cockroachdb/cockroach/pkg/kv.(*Txn).exec(0xc0138d7760, 0x6a36830, 0xc010d10840, 0xc0214ddb60, 0xc0138d7760, 0x14e27018d70)
        	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn.go:879 +0xdc
        github.com/cockroachdb/cockroach/pkg/kv.(*DB).Txn(0xc01dc32e80, 0x6a36830, 0xc010d10840, 0xc0214ddbb0, 0xc0214ddbe0, 0x967546)
        	/go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:838 +0x105
        github.com/cockroachdb/cockroach/pkg/sql/sqlstats/persistedsqlstats.(*PersistedSQLStats).doFlushSingleTxnStats(0xc015cbfc20, 0x6a36830, 0xc010d10840, 0xc001aaeea0, 0xed8bd1169, 0x0)
        	/go/src/github.com/cockroachdb/cockroach/pkg/sql/sqlstats/persistedsqlstats/flush.go:79 +0x85
        github.com/cockroachdb/cockroach/pkg/sql/sqlstats/persistedsqlstats.(*PersistedSQLStats).Flush.func2.1(0xc015cbfc20, 0x266a465c)
        	/go/src/github.com/cockroachdb/cockroach/pkg/sql/sqlstats/persistedsqlstats/flush.go:46 +0x45
        github.com/cockroachdb/cockroach/pkg/sql/sqlstats/persistedsqlstats.(*PersistedSQLStats).doFlush(0xc015cbfc20, 0x6a36830, 0xc010d10840, 0xc0214ddd08, 0x53db1a5, 0x26)
        	/go/src/github.com/cockroachdb/cockroach/pkg/sql/sqlstats/persistedsqlstats/flush.go:73 +0x10a
        github.com/cockroachdb/cockroach/pkg/sql/sqlstats/persistedsqlstats.(*PersistedSQLStats).Flush.func2(0x6a36830, 0xc010d10840, 0xc001aaeea0, 0x0, 0x0)
        	/go/src/github.com/cockroachdb/cockroach/pkg/sql/sqlstats/persistedsqlstats/flush.go:45 +0x9c
        github.com/cockroachdb/cockroach/pkg/sql/sqlstats/sslocal.(*SQLStats).IterateTransactionStats(0xc01a09d500, 0x6a36830, 0xc010d10840, 0xc007b60432, 0xc0214dde08, 0x0, 0x0)
        	/go/src/github.com/cockroachdb/cockroach/pkg/sql/sqlstats/sslocal/sslocal_provider.go:160 +0x8a
        github.com/cockroachdb/cockroach/pkg/sql/sqlstats/persistedsqlstats.(*PersistedSQLStats).Flush(0xc015cbfc20, 0x6a36830, 0xc010d10840)
        	/go/src/github.com/cockroachdb/cockroach/pkg/sql/sqlstats/persistedsqlstats/flush.go:44 +0x29c
        github.com/cockroachdb/cockroach/pkg/sql/sqlstats/persistedsqlstats.(*PersistedSQLStats).startSQLStatsFlushLoop.func1(0x6a36830, 0xc010d10840)
        	/go/src/github.com/cockroachdb/cockroach/pkg/sql/sqlstats/persistedsqlstats/provider.go:152 +0x2ac
        github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2(0xc01dc32180, 0x6a36830, 0xc010d10840, 0x0, 0x0, 0xc0247bd788)
        	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:442 +0xf3
        created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx
        	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:434 +0x22b

@Azhng
Copy link
Contributor

Azhng commented Sep 2, 2021

Hmm let me write a script to prove my hypothesis.

@Azhng
Copy link
Contributor

Azhng commented Sep 2, 2021

Hmm seems like the flush is not that's blocking the table drop. I had the following script spamming unique statement fingerprints and called SELECT crdb_internal.reset_sql_stats() during the flush. The builtin finished within about 100ms.

#!/usr/bin/env bash

UNTIL=$1
START=1

set -eu

str='SELECT 1'

echo "Spamming stmt fingerprints"

for (( i = $START; i <= $UNTIL; i++)); do
  echo -n -e "$i/$UNTIL\r"
  str="$str,1"
  $HOME/src/github.com/cockroach/cockroach sql --insecure -e "$str" > /dev/null
done;

echo ""
echo "Done."

@Azhng
Copy link
Contributor

Azhng commented Sep 2, 2021

The only other piece of code that touches those two tables are SQLStatsCompactor. However, it's not present in the stack trace at all :/

@cockroach-teamcity
Copy link
Member Author

sql/tests.TestRandomSyntaxFunctions failed with artifacts on master @ 42e5f9492d0d8d93638241303bca984fe78baae3:

Random syntax error:

    rsg_test.go:776: Crash detected: server panic: statement exec timeout

Query:

        SELECT crdb_internal.reset_sql_stats() ;
Reproduce

To reproduce, try:

make stressrace TESTS=TestRandomSyntaxFunctions PKG=./pkg/sql/tests TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

Same failure on other branches

Internal log

mjibson marked as alumn{us/a}; resolving to rafiss instead

/cc @cockroachdb/sql-experience rafiss

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

sql/tests.TestRandomSyntaxFunctions failed with artifacts on master @ 44c95054cfe0108df872d2cda7952fc92259426b:

Random syntax error:

    rsg_test.go:776: Crash detected: server panic: statement exec timeout

Query:

        SELECT crdb_internal.reset_sql_stats() ;
Reproduce

To reproduce, try:

make stressrace TESTS=TestRandomSyntaxFunctions PKG=./pkg/sql/tests TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

Same failure on other branches

Internal log

mjibson marked as alumn{us/a}; resolving to rafiss instead

/cc @cockroachdb/sql-experience rafiss

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

sql/tests.TestRandomSyntaxFunctions failed with artifacts on master @ e369d86b3bac2b9f40d760c3e6e49f55b7e15abe:

Random syntax error:

    rsg_test.go:776: Crash detected: server panic: statement exec timeout

Query:

        SELECT crdb_internal.reset_sql_stats() ;
Reproduce

To reproduce, try:

make stressrace TESTS=TestRandomSyntaxFunctions PKG=./pkg/sql/tests TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

Same failure on other branches

Internal log

mjibson marked as alumn{us/a}; resolving to rafiss instead

/cc @cockroachdb/sql-experience rafiss

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

sql/tests.TestRandomSyntaxFunctions failed with artifacts on master @ 642e44afbe6098f25022618be76c6f7b6b97df45:

Random syntax error:

    rsg_test.go:776: Crash detected: server panic: statement exec timeout

Query:

        SELECT crdb_internal.reset_sql_stats() ;
Reproduce

To reproduce, try:

make stressrace TESTS=TestRandomSyntaxFunctions PKG=./pkg/sql/tests TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

Same failure on other branches

Internal log

mjibson marked as alumn{us/a}; resolving to rafiss instead

/cc @cockroachdb/sql-experience rafiss

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

sql/tests.TestRandomSyntaxFunctions failed with artifacts on master @ 2a7ef521a38d923cca9ec96e229237bbedc0c5fa:

Random syntax error:

    rsg_test.go:776: Crash detected: server panic: statement exec timeout

Query:

        SELECT crdb_internal.reset_sql_stats() ;
Reproduce

To reproduce, try:

make stressrace TESTS=TestRandomSyntaxFunctions PKG=./pkg/sql/tests TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

Same failure on other branches

Internal log

mjibson marked as alumn{us/a}; resolving to rafiss instead

/cc @cockroachdb/sql-experience rafiss

This test on roachdash | Improve this report!

@rafiss
Copy link
Collaborator

rafiss commented Sep 9, 2021

@Azhng This is failing pretty reliably, so I think we'll need to do something about it. If you are confident there isn't a major bug with the function, then can you update the randomized test so that it doesn't try using this function?

Azhng added a commit to Azhng/cockroach that referenced this issue Sep 9, 2021
Previously, crdb_internal.reset_sql_stats() causes timeout
in TestRandomSyntaxFunctions. This is very unlikely due to
implementation of the function, and it is likely caused
by contentions.
This commit skip the tests for crdb_internal.reset_sql_stats()
to prevent nightly failures.

Related cockroachdb#69731

Release justification: Non-production code changes

Release note: None
@rafiss rafiss linked a pull request Sep 9, 2021 that will close this issue
craig bot pushed a commit that referenced this issue Sep 10, 2021
69806: kv/kvserver: use generalized engine keys in debug printing r=AlexTalks a=AlexTalks

Previously the CLI debug command only printed `MVCCKey`s, resulting in
the debug printer to error on key decoding whenever a `LockTableKey` was
encountered.  By switching to the more generalized `EngineKey` (and
utilizing the existing MVCC key formatting whenever the key has an MVCC
version), we can increase visibility into our debug logs while investigating
issues.

Related to #69414

Release justification: Non-production bug fix
Release note: None

69944: stats: add a histogram version number r=rytaft a=rytaft

This commit adds a histogram version number to the `HistogramData`
proto. This will allow us to identify what logic was used to construct
a particular histogram and possibly debug future issues.

Release note: None

Release justification: Low risk, high benefit change to existing
functionality.

69963: sql: skip reset sql stats in TestRandomSyntaxFunctions r=maryliag,rafiss a=Azhng

Previously, crdb_internal.reset_sql_stats() causes timeout
in TestRandomSyntaxFunctions. This is very unlikely due to
implementation of the function, and it is likely caused
by contentions.
This commit skip the tests for crdb_internal.reset_sql_stats()
to prevent nightly failures.

Related #69731

Release justification: Non-production code changes

Release note: None

69967: vendor: bump Pebble to 6c12d67b83e6 r=jbowens a=jbowens

```
6c12d67 internal/metamorphic: randomize FormatMajorVersion
e82fb10 db: randomize format major version in unit tests
535b8d6 db: add FormatUpgrade event to EventListener
53dda0f db: introduce format major version
8ec1a49 vfs/atomicfs: add ReadMarker
daf93f0 sstable: Free cache value when checksum type is corrupt
d89613d metamorphic: randomly use disk for tests
e3b6bec metamorphic: transform percentage of SINGLEDEL ops to DELETE ops
41239f8 db: add test demonstrating current SINGLEDEL behavior
```

Release note: none

Release justification: non-production code changes, and bug fix
necessary for a release blocker.

69974: backupccl: set sqlstats testing knobs for scheduled job test r=maryliag,miretskiy a=Azhng

Previsouly, backupccl tests did not set sql stats AOST testing knob
to override the AOST behavior. This causes sql stats error
stack trace to show up in backupccl tests.
This commit added sql stats testing knobs for backupccl test
helpers to mitigate this.

Release justification: Non-production code changes

Release note: None

Co-authored-by: Alex Sarkesian <sarkesian@cockroachlabs.com>
Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
Co-authored-by: Azhng <archer.xn@gmail.com>
Co-authored-by: Jackson Owens <jackson@cockroachlabs.com>
@cockroach-teamcity
Copy link
Member Author

sql/tests.TestRandomSyntaxFunctions failed with artifacts on master @ 62c5be51bbb3beeb51638032871365585d69f75d:

Random syntax error:

    rsg_test.go:776: Crash detected: server panic: statement exec timeout

Query:

        SELECT crdb_internal.reset_sql_stats() ;
Reproduce

To reproduce, try:

make stressrace TESTS=TestRandomSyntaxFunctions PKG=./pkg/sql/tests TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

Same failure on other branches

Internal log

mjibson marked as alumn{us/a}; resolving to rafiss instead

/cc @cockroachdb/sql-experience rafiss

This test on roachdash | Improve this report!

@craig craig bot closed this as completed in #69963 Sep 10, 2021
rafiss pushed a commit to rafiss/cockroach that referenced this issue Sep 10, 2021
Previously, crdb_internal.reset_sql_stats() causes timeout
in TestRandomSyntaxFunctions. This is very unlikely due to
implementation of the function, and it is likely caused
by contentions.
This commit skip the tests for crdb_internal.reset_sql_stats()
to prevent nightly failures.

Related cockroachdb#69731

Release justification: Non-production code changes

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants