Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix error handling in receive_writer_thread() #10320

Merged
merged 1 commit into from
May 15, 2020

Conversation

ahrens
Copy link
Member

@ahrens ahrens commented May 12, 2020

Motivation and Context

If receive_writer_thread() gets an error from receive_process_record(),
it should be saved in rwa->err so that we will stop processing records,
and the main thread will notice that the receive has failed.

When an error is first encountered, this happens correctly. However, if
there are more records to dequeue, the next time through the loop we
will reset rwa->err to zero, allowing us to try to process the
following record (2 after the failed record). Depending on what types
of records remain, we may incorrectly complete the receive
"successfully", but without actually having processed all the records.

Description

The fix is to only set rwa->err if we got a non-zero error.

This bug was introduced by #10099 "Improve zfs receive performance by
batching writes".

How Has This Been Tested?

Do a receive that should fail due to an incorrectly-constructed send stream.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the contributing document.
  • I have added tests to cover my changes.
  • I have run the ZFS Test Suite with this change applied.
  • All commit messages are properly formatted and contain Signed-off-by.

@ahrens ahrens requested a review from pcd1193182 May 12, 2020 17:58
@ahrens ahrens added Status: Code Review Needed Ready for review and testing Type: Defect Incorrect behavior (e.g. crash, hang) labels May 12, 2020
module/zfs/dmu_recv.c Show resolved Hide resolved
If `receive_writer_thread()` gets an error from `receive_process_record()`,
it should be saved in `rwa->err` so that we will stop processing records,
and the main thread will notice that the receive has failed.

When an error is first encountered, this happens correctly.  However, if
there are more records to dequeue, the next time through the loop we
will reset `rwa->err` to zero, allowing us to try to process the
following record (2 after the failed record).  Depending on what types
of records remain, we may incorrectly complete the receive
"successfully", but without actually having processed all the records.

The fix is to only set `rwa->err` if we got a *non-zero* error.

This bug was introduced by openzfs#10099 "Improve zfs receive performance by
batching writes".

Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels May 15, 2020
@behlendorf behlendorf merged commit 1b9cd1a into openzfs:master May 15, 2020
ahrens added a commit to ahrens/zfs that referenced this pull request May 15, 2020
If `receive_writer_thread()` gets an error from `receive_process_record()`,
it should be saved in `rwa->err` so that we will stop processing records,
and the main thread will notice that the receive has failed.

When an error is first encountered, this happens correctly.  However, if
there are more records to dequeue, the next time through the loop we
will reset `rwa->err` to zero, allowing us to try to process the
following record (2 after the failed record).  Depending on what types
of records remain, we may incorrectly complete the receive
"successfully", but without actually having processed all the records.

The fix is to only set `rwa->err` if we got a *non-zero* error.

This bug was introduced by openzfs#10099 "Improve zfs receive performance by
batching writes".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes openzfs#10320
as-com pushed a commit to as-com/zfs that referenced this pull request Jun 20, 2020
If `receive_writer_thread()` gets an error from `receive_process_record()`,
it should be saved in `rwa->err` so that we will stop processing records,
and the main thread will notice that the receive has failed.

When an error is first encountered, this happens correctly.  However, if
there are more records to dequeue, the next time through the loop we
will reset `rwa->err` to zero, allowing us to try to process the
following record (2 after the failed record).  Depending on what types
of records remain, we may incorrectly complete the receive
"successfully", but without actually having processed all the records.

The fix is to only set `rwa->err` if we got a *non-zero* error.

This bug was introduced by openzfs#10099 "Improve zfs receive performance by
batching writes".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes openzfs#10320 
(cherry picked from commit 1b9cd1a)
as-com pushed a commit to as-com/zfs that referenced this pull request Jun 20, 2020
If `receive_writer_thread()` gets an error from `receive_process_record()`,
it should be saved in `rwa->err` so that we will stop processing records,
and the main thread will notice that the receive has failed.

When an error is first encountered, this happens correctly.  However, if
there are more records to dequeue, the next time through the loop we
will reset `rwa->err` to zero, allowing us to try to process the
following record (2 after the failed record).  Depending on what types
of records remain, we may incorrectly complete the receive
"successfully", but without actually having processed all the records.

The fix is to only set `rwa->err` if we got a *non-zero* error.

This bug was introduced by openzfs#10099 "Improve zfs receive performance by
batching writes".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes openzfs#10320 
(cherry picked from commit 1b9cd1a)
jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
If `receive_writer_thread()` gets an error from `receive_process_record()`,
it should be saved in `rwa->err` so that we will stop processing records,
and the main thread will notice that the receive has failed.

When an error is first encountered, this happens correctly.  However, if
there are more records to dequeue, the next time through the loop we
will reset `rwa->err` to zero, allowing us to try to process the
following record (2 after the failed record).  Depending on what types
of records remain, we may incorrectly complete the receive
"successfully", but without actually having processed all the records.

The fix is to only set `rwa->err` if we got a *non-zero* error.

This bug was introduced by openzfs#10099 "Improve zfs receive performance by
batching writes".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes openzfs#10320
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested) Type: Defect Incorrect behavior (e.g. crash, hang)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants