
roachtest: error while running du #53663

Closed
knz opened this issue Aug 31, 2020 · 4 comments · Fixed by #61650
Labels
A-testing Testing tools and infrastructure C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered).

Comments

@knz
Contributor

knz commented Aug 31, 2020

I see most roachtests report the following error in their logs:

```
19:04:28 cluster.go:382: > /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod ssh teamcity-2231078-1598711360-73-n9cpu1 -- /bin/bash -c 'du -c /mnt/data1 > diskusage.txt'
teamcity-2231078-1598711360-73-n9cpu1: /bin/bash -c 'du -c /mnt/da...
   1: du: cannot read directory '/mnt/data1/lost+found': Permission denied
COMMAND_PROBLEM: exit status 1
   2: du: cannot read directory '/mnt/data1/lost+found': Permission denied
COMMAND_PROBLEM: exit status 1
   3: du: cannot read directory '/mnt/data1/lost+found': Permission denied
COMMAND_PROBLEM: exit status 1
   4: du: cannot read directory '/mnt/data1/lost+found': Permission denied
COMMAND_PROBLEM: exit status 1
   5: du: cannot read directory '/mnt/data1/lost+found': Permission denied
COMMAND_PROBLEM: exit status 1
   6: du: cannot read directory '/mnt/data1/lost+found': Permission denied
COMMAND_PROBLEM: exit status 1
   7: du: cannot read directory '/mnt/data1/lost+found': Permission denied
COMMAND_PROBLEM: exit status 1
   8: du: cannot read directory '/mnt/data1/lost+found': Permission denied
COMMAND_PROBLEM: exit status 1
   9: du: cannot read directory '/mnt/data1/lost+found': Permission denied
COMMAND_PROBLEM: exit status 1
Error: COMMAND_PROBLEM: exit status 1
(1) COMMAND_PROBLEM
Wraps: (2) Node 1. Command with error:
  | ```
  | /bin/bash -c 'du -c /mnt/data1 > diskusage.txt'
  | ```
Wraps: (3) exit status 1
Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
```

@jlinder can you have a look?

@blathers-crl

blathers-crl bot commented Aug 31, 2020

Hi @knz, please add a C-ategory label to your issue. Check out the label system docs.

While you're here, please consider adding an A- label to help keep our repository tidy.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@knz knz added A-testing Testing tools and infrastructure C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). labels Aug 31, 2020
@kenliu

kenliu commented Mar 5, 2021

Moved this back to Triage. Could whoever is on dev-inf support next take a look at this and determine if it was a transient issue? If so please close this out.

@rickystewart rickystewart self-assigned this Mar 8, 2021
@rickystewart
Collaborator

Can you provide at least one link to a roachtest demonstrating the failure? I can't find one, which of course is interfering with my ability to debug :)

@rickystewart
Collaborator

Poked around on a roachtest machine and found the following:

```
ricky@teamcity-2755481-1615231777-01-n4cpu16-0001:~$ ls -lah /mnt/data1
total 164K
drwxrwxrwx 5 root   root   4.0K Mar  8 19:31 .
drwxr-xr-x 3 root   root   4.0K Mar  8 19:30 ..
drwxrwxr-x 4 ubuntu ubuntu 132K Mar  8 19:46 cockroach
drwxrwxrwx 2 root   root   4.0K Mar  8 19:30 cores
drwx------ 2 root   root    16K Mar  8 19:30 lost+found
-rw-r--r-- 1 root   root      0 Mar  8 19:30 .roachprod-initialized
ricky@teamcity-2755481-1615231777-01-n4cpu16-0001:~$ ls -lah /mnt/data1/lost+found/
ls: cannot open directory '/mnt/data1/lost+found/': Permission denied
```

So that's pretty unambiguous: `lost+found` is owned by `root` with mode `drwx------`, so `du` can't read it. Presumably we just need to pass `--exclude lost+found` when we call `du`.
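A minimal sketch of that change, assuming roachtest builds the remote command as a plain string before running it over `roachprod ssh`. The helper `diskUsageCmd` is hypothetical and is not the actual roachtest code. GNU `du` accepts `--exclude=PATTERN`, and an unanchored pattern such as `lost+found` should match that directory under `/mnt/data1`.

```go
package main

import "fmt"

// diskUsageCmd is a hypothetical helper that builds the shell command
// roachtest runs on each node to record disk usage. The
// --exclude=lost+found flag keeps GNU du from descending into the
// root-only lost+found directory, which otherwise makes du print
// "Permission denied" and exit with status 1.
func diskUsageCmd(dir, outFile string) string {
	return fmt.Sprintf("du -c --exclude=lost+found %s > %s", dir, outFile)
}

func main() {
	// Prints: du -c --exclude=lost+found /mnt/data1 > diskusage.txt
	fmt.Println(diskUsageCmd("/mnt/data1", "diskusage.txt"))
}
```

Excluding by name rather than by full path keeps the command independent of which store directory (`/mnt/data1`, `/mnt/data2`, ...) it is pointed at.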

craig bot pushed a commit that referenced this issue Mar 15, 2021
61406: sql: validate against array type usages when dropping enum values r=ajwerner a=arulajmani

Previously, when dropping an enum value, we weren't validating if the
enum value was used in a column of array type. This patch fixes the bug.

Fixes #60004

Release justification: bug fix to new functionality
Release note: None

61650: roachtest: exclude `lost+found` directory r=knz a=rickystewart

This directory is created by the filesystem and unowned file chunks are
put there by `fsck`. The directory and its contents aren't readable by
anyone except `root`, so this can cause the `du -c /mnt/data1` that
`roachtest` performs to fail -- add an `--exclude` to handle this.

We already ignore this directory in other contexts (for example, see
`pkg/storage/mvcc.go`).

Fixes #53663.

Release justification: Non-production code change
Release note: None

61788: importccl: unskip userfile benchmark r=pbardea a=adityamaru

I've run this ~20 times and it averages ~13s to run. I suspect the fixes to linked issues mentioned in #59126 might have mitigated this. Going to unskip due to lack of reproducibility.

Fixes: #59126

Release note: None

61828: contention: store contention events on non-SQL keys r=yuzefovich a=yuzefovich

Previously, whenever we tried to add a contention event on a non-SQL
key, decoding the tableID/indexID pair would fail and the event was
dropped. This commit extends the contention registry
to additionally store information about contention on non-SQL keys. That
information is stored in two levels:
- on the top level, all `SingleNonSQLKeyContention` objects are ordered
  by their keys
- on the bottom level, all `SingleTxnContention` objects are ordered by the
  number of times that transaction was observed to contend with other
  transactions.
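A simplified sketch of that two-level ordering. The types `keyContention` and `txnContention` are hypothetical stand-ins for the `SingleNonSQLKeyContention` and `SingleTxnContention` protobuf messages, transaction IDs are reduced to strings, and this is not the registry's actual code.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// txnContention is a hypothetical stand-in for SingleTxnContention: a
// contending transaction and how many times it was observed.
type txnContention struct {
	TxnID string
	Count int
}

// keyContention is a hypothetical stand-in for SingleNonSQLKeyContention.
type keyContention struct {
	Key  []byte
	Txns []txnContention // bottom level: sorted by Count, descending
}

// registry keeps keyContention entries sorted by Key (top level).
type registry struct {
	keys []keyContention
}

// add records that txnID contended on key, preserving both orderings.
func (r *registry) add(key []byte, txnID string) {
	// Top level: binary search for the key's position in the sorted slice.
	i := sort.Search(len(r.keys), func(i int) bool {
		return bytes.Compare(r.keys[i].Key, key) >= 0
	})
	if i == len(r.keys) || !bytes.Equal(r.keys[i].Key, key) {
		// Insert a new entry at position i, shifting the tail right.
		r.keys = append(r.keys, keyContention{})
		copy(r.keys[i+1:], r.keys[i:])
		r.keys[i] = keyContention{Key: append([]byte(nil), key...)}
	}
	kc := &r.keys[i]

	// Bottom level: bump the transaction's count, then re-sort by count.
	found := false
	for j := range kc.Txns {
		if kc.Txns[j].TxnID == txnID {
			kc.Txns[j].Count++
			found = true
			break
		}
	}
	if !found {
		kc.Txns = append(kc.Txns, txnContention{TxnID: txnID, Count: 1})
	}
	sort.SliceStable(kc.Txns, func(a, b int) bool {
		return kc.Txns[a].Count > kc.Txns[b].Count
	})
}

func main() {
	var r registry
	r.add([]byte("b"), "txn1")
	r.add([]byte("a"), "txn1")
	r.add([]byte("b"), "txn2")
	r.add([]byte("b"), "txn2")
	for _, kc := range r.keys {
		fmt.Printf("%s:", kc.Key)
		for _, t := range kc.Txns {
			fmt.Printf(" %s=%d", t.TxnID, t.Count)
		}
		fmt.Println()
	}
	// Output:
	// a: txn1=1
	// b: txn2=2 txn1=1
}
```

A sorted slice with binary search keeps the sketch short; a real registry might prefer a balanced tree or bounded cache if the number of keys grows large.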

The `SingleTxnContention` protobuf message is moved out of
`SingleKeyContention` and reused for non-SQL keys. This commit also
updates the status server API response. I assume that no changes are
needed with regard to backward compatibility, since the original
version was merged just a few weeks ago and no beta has been
released since then.

Fixes: #60669.

Release note (sql change): CockroachDB now also stores the information
about contention on non-SQL keys.

61862: cliccl: add `load show backups` to display backup collection r=pbardea a=Elliebababa

Previously, users could list backups created by `BACKUP INTO`
with `SHOW BACKUP IN` in a SQL session. However, this listing
can also be done offline, without a running cluster.

This PR extends `load show` with a `backups` subcommand,
which lists the backups in a backup collection created by
`BACKUP INTO`. Like the other `load show` subcommands, it
lets users list backups without running `SHOW BACKUP IN`
in a SQL session.

See #61131 and #61829 for the other `load show` subcommands.

Release note (cli change): Added `load show backups` to
display a backup collection. Previously, users could list backups
created by `BACKUP INTO` via `SHOW BACKUP IN` in a SQL
session. However, this listing can also be done offline, without a
running cluster. Now, users can list the backups in a collection
with `cockroach load show backups <collection_url>`.

61877: bench/ddl_analysis: fix test for real r=ajwerner a=ajwerner

See individual commits; the last one is the critical one.

Fixes #61856.

61937: colexec: make vectorized stats concurrency safe r=yuzefovich a=yuzefovich

**colflow: clean up vectorized stats for rowexec processors**

Previously, the wrapped row-execution KV reading processors
implemented the `execinfra.KVReader` interface, but they were never used
as such; only the ColBatchScans were used to retrieve the KV stats.
This is because the row-execution processors report their execution
stats themselves, and we don't want to duplicate that info.
This commit moves the `KVReader` interface into the `colexecop` package,
and now only the ColBatchScans implement it. This allowed for some cleanup
around the vectorized stats code, but the main reason for this change
is that the follow-up commit modifies the interface's contract to
require safety under concurrent usage, and I didn't want to change the
row-execution processors for that since the relevant methods never get
called anyway.

Additionally, this commit makes the zigzagJoiners and invertedJoiners
emit rows-read and bytes-read stats to complete the metrics picture.

Release note: None

**colexec: make vectorized stats concurrency safe**

Previously, the collection of vectorized stats was not synchronized with
the operators themselves. Namely, it was possible to call methods like
`GetBytesRead` on the ColBatchScans and Inboxes from another goroutine
(the root materializer or the outbox) rather than only from the
operator's main goroutine. This is now fixed by putting mutexes in place
and updating the `colexecop.KVReader` interface to require
concurrency-safe implementations.

Fixes: #61899.

Release note: None

Co-authored-by: arulajmani <arulajmani@gmail.com>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
Co-authored-by: Aditya Maru <adityamaru@gmail.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: elliebababa <ellie24.huang@gmail.com>
Co-authored-by: Andrew Werner <awerner32@gmail.com>
@craig craig bot closed this as completed in eebeeb1 Mar 15, 2021
tbg pushed a commit to tbg/cockroach that referenced this issue Jun 24, 2021