Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZTS: Test case failures #9769

Merged
merged 1 commit into from
Dec 26, 2019
Merged

ZTS: Test case failures #9769

merged 1 commit into from
Dec 26, 2019

Conversation

behlendorf
Copy link
Contributor

@behlendorf behlendorf commented Dec 24, 2019

Motivation and Context

Resolve several ZTS failures now being regularly observed by the CI
after changing the test instance type.

Description

  • devices/devices_001_pos, devices/devices_002_neg - When only NVMe
    block devices exist no valid major/minor are found. Update the
    create_dev_file_linux function to handle this case.

  • large_dnode_008_pos - Force a pool sync before invoking zdb to
    ensure the updated dnode blocks have been persistented to disk.

  • refreserv_raidz - Wait for the /dev/zvol links to be both created
    and removed, this is important because the same device volume
    names are being used repeatedly.

  • btree_test - Add missing .gitignore file for btree_test binary.

How Has This Been Tested?

Locally verified using the ZTS. Pending full CI results to confirm
these changes do resolve the issues in the CI environment.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

@behlendorf behlendorf added Component: Test Suite Indicates an issue with the test framework or a test case Status: Code Review Needed Ready for review and testing labels Dec 24, 2019
@BrainSlayer
Copy link
Contributor

since the testcase failures where introduced by 9fb2771 and are currently still failing with this PR. dont you think that changing testcases are just hiding a problem? according to the logs the cause of the errors are a segfault in libzpool

@codecov
Copy link

codecov bot commented Dec 24, 2019

Codecov Report

Merging #9769 into master will increase coverage by <1%.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff            @@
##           master    #9769    +/-   ##
========================================
+ Coverage      80%      80%   +<1%     
========================================
  Files         385      385            
  Lines      121470   121470            
========================================
+ Hits        96756    96874   +118     
+ Misses      24714    24596   -118
Flag Coverage Δ
#kernel 80% <ø> (ø) ⬆️
#user 67% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c587b2c...6346659. Read the comment docs.

@behlendorf
Copy link
Contributor Author

These ZTS failures don't coincide with a regression in the upstream code, but instead with an upgrade to the CI test environment which was made yesterday. Specifically the ec2 instance types were changed to types using newer generation cpus and NVMe based devices. This resulted in some existing tests consistently failing and some occasional failures becoming very likely. I've reverted the NVMe portion of that upgrade for now, and if that's not sufficient I'll revert back to the previous generation cpus until we update the failing tests.

since the testcase failures where introduced by 9fb2771... according to the logs the cause of the errors are a segfault in libzpool

Can you point me at the relevant logs. Looking at the recent testing results on the master branch I have't seen any evidence of this.

@BrainSlayer
Copy link
Contributor

in the zstd pull request you will find it
here is one sample http://build.zfsonlinux.org/builders/Debian%2010%20x86_64%20%28TEST%29/builds/1266/steps/shell_9/logs/summary

but you can also find this segfault on the test round of the most recent commit on the zfs master
like here
http://build.zfsonlinux.org/builders/Fedora%2030%20x86_64%20%28TEST%29/builds/2985/steps/shell_9/logs/summary

* large_dnode_008_pos - Force a pool sync before invoking zdb to
  ensure the updated dnode blocks have been persistented to disk.

* refreserv_raidz - Wait for the /dev/zvol links to be both created
  and removed, this is important because the same device volume
  names are being used repeatedly.

* btree_test - Add missing .gitignore file for btree_test binary.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
@PrivatePuffin
Copy link
Contributor

@BrainSlayer I dont get it...
I dont see the error you are talking about in the log you gave us.

@BrainSlayer
Copy link
Contributor

BrainSlayer commented Dec 25, 2019

@Ornias1993 grep for segfault

22:54:37.67 [ 2139.712371] loop: module loaded
22:54:37.67 [ 2490.496004] btree_test[23007]: segfault at 0 ip 00007fedd9942ac6 sp 00007fff4dc452d0 error 4 in libzpool.so.2.0.0[7fedd990f000+210000]
22:54:37.67 [ 2490.508160] Code: 89 d3 48 83 ec 58 64 48 8b 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8b 47 28 48 c7 45 a0 00 00 00 00 0f 29 45 90 48 85 c0 74 61 <48> 39 02 74 5c e8 d0 f4 ff ff 48 8d 5d 90 4c 89

@PrivatePuffin
Copy link
Contributor

PrivatePuffin commented Dec 25, 2019

@BrainSlayer
Those are all issues that have been known for ages. Thats why they are listed as "known issues"

cache/cache_010_neg (run as root) [00:00] [FAIL]

Tests with results other than PASS that are expected:
    FAIL cache/cache_010_neg (Known issue)

These segfaults are not what @behlendorf is fixing here, this just fixes the new test failures that are related to the buildbot failures.

Copy link
Contributor

@PrivatePuffin PrivatePuffin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code is clean and results are green.
LGTM

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Dec 26, 2019
@behlendorf
Copy link
Contributor Author

22:54:37.67 [ 2490.496004] btree_test[23007]: segfault at 0 ip 00007fedd9942ac6 sp 00007fff4dc452d0 error 4 in libzpool.so.2.0.0[7fedd990f000+210000]

This was actually intentionally added recently by commit 523fc80, but I can definitely see how the segfault is misleading. The btree_test includes some negative test cases which intentionally trigger some ASSERTs in user space to verify several btree invariants are working as intended.

@behlendorf behlendorf merged commit 80bde2c into openzfs:master Dec 26, 2019
@BrainSlayer
Copy link
Contributor

i understand. yes this was indeed missleading. i was looking for the cause of the segfault but was not able to find anything

allanjude pushed a commit to allanjude/zfs that referenced this pull request Dec 28, 2019
* large_dnode_008_pos - Force a pool sync before invoking zdb to
  ensure the updated dnode blocks have been persisted to disk.

* refreserv_raidz - Wait for the /dev/zvol links to be both created
  and removed, this is important because the same device volume
  names are being used repeatedly.

* btree_test - Add missing .gitignore file for btree_test binary.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#9769
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Jan 2, 2020
* large_dnode_008_pos - Force a pool sync before invoking zdb to
  ensure the updated dnode blocks have been persisted to disk.

* refreserv_raidz - Wait for the /dev/zvol links to be both created
  and removed, this is important because the same device volume
  names are being used repeatedly.

* btree_test - Add missing .gitignore file for btree_test binary.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#9769
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Jan 7, 2020
* large_dnode_008_pos - Force a pool sync before invoking zdb to
  ensure the updated dnode blocks have been persisted to disk.

* refreserv_raidz - Wait for the /dev/zvol links to be both created
  and removed, this is important because the same device volume
  names are being used repeatedly.

* btree_test - Add missing .gitignore file for btree_test binary.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#9769
tonyhutter pushed a commit that referenced this pull request Jan 23, 2020
* large_dnode_008_pos - Force a pool sync before invoking zdb to
  ensure the updated dnode blocks have been persisted to disk.

* refreserv_raidz - Wait for the /dev/zvol links to be both created
  and removed, this is important because the same device volume
  names are being used repeatedly.

* btree_test - Add missing .gitignore file for btree_test binary.

Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9769
@behlendorf behlendorf deleted the zts-fixes-2 branch April 19, 2021 19:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Test Suite Indicates an issue with the test framework or a test case Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants