"Backend was closed" error during backup image transfer #23

Closed
barpavel opened this issue Jan 17, 2022 · 1 comment · Fixed by #22
barpavel commented Jan 17, 2022

When executing an image download as part of a backup (this happens both during a full backup and during a separate ./backup_vm.py ... download ... command), the following error is received:

pbar@engine1 ~/dev/git/python-ovirt-engine-sdk4/examples (main) $ ./backup_vm.py -c engine1 download --backup-dir /tmp/pavel --backup-uuid 8b7a17db-dcee-45eb-b5e6-0dc22c4e5bc3 9ecd1ff4-e045-4c11-91f7-cc6b3551b527
[   0.0 ] Downloading VM '9ecd1ff4-e045-4c11-91f7-cc6b3551b527' disks
[   0.4 ] Downloading full backup for disk 'f626c162-7ab6-4acf-9c3c-69eebab8b990'
[   1.6 ] Image transfer '891b75ac-f8d7-4850-815f-30371546b44c' is ready
[ 100.00% ] 1.00 GiB, 0.17 seconds, 5.79 GiB/s                                 
[   1.8 ] Finalizing image transfer
[   4.8 ] Download completed successfully
Traceback (most recent call last):
  File "./backup_vm.py", line 521, in <module>
    main()
  File "./backup_vm.py", line 144, in main
    args.command(args)
  File "./backup_vm.py", line 218, in cmd_download
    connection, args.backup_uuid, args, incremental=args.incremental)
  File "./backup_vm.py", line 403, in download_backup
    incremental=has_incremental)
  File "./backup_vm.py", line 448, in download_disk
    **extra_args)
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/client/_api.py", line 186, in download
    name="download")
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/io.py", line 69, in copy
    log.debug("Executor failed")
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/io.py", line 193, in __exit__
    self.stop()
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/io.py", line 170, in stop
    raise self._errors[0]
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/io.py", line 235, in _run
    handler = self._handler_factory()
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/io.py", line 264, in __init__
    self._dst = dst_factory()
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/backends/nbd.py", line 77, in clone
    self._client.address,
  File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/backends/nbd.py", line 228, in __getattr__
    raise IOError("Backend was closed")
OSError: Backend was closed

Scenario: cold backup, either:

  1. Full backup (start + download + stop).
  2. Separate steps - start, then download.

The flow actually works, and the output image is created:
-rw-r--r--. 1 pbar pbar 196624 Jan 17 10:53 f626c162-7ab6-4acf-9c3c-69eebab8b990.202201171053.full.qcow2
It works fine for VMs with a single disk, but when a VM has several disks, only the first image is copied: the exception breaks the loop that should have continued with the remaining disks.
Logs are attached.
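
For context, a minimal self-contained sketch of the kind of per-disk loop that gets aborted. The names download_backup and download_disk appear in the traceback above, but the simplified signatures and the stub body here are purely illustrative, not the actual backup_vm.py code:

def download_disk(disk):
    # Stub: downloading an empty disk can end with "Backend was closed".
    raise OSError("Backend was closed")

def download_backup(disks):
    for disk in disks:
        # An unhandled exception here aborts the loop, so the remaining
        # disks are never downloaded.
        download_disk(disk)

download_backup(["disk-1", "disk-2", "disk-3"])  # stops after disk-1
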
After a Google Chat discussion in the RHV Storage room, here are a few conclusions from Vojtech & Nir:

Vojtech Juranek
It looks like we are calling a client which was already closed, so it is not an issue in your environment, and I don't think this is a known issue. Is it reproducible?
Nir Soffer
No need for daemon logs, this is an issue on the client side. For some reason we try to clone the nbd client at the end of the transfer, after it was closed. Cloning the connection is done when starting a transfer; I'm not sure why it happens at the end.
Nir Soffer
This is likely a result of https://github.com/oVirt/ovirt-imageio/commit/4df9f1c9feb9405ba7c389f28b65180db057f2da
It reveals an issue in the ovirt_imageio._internal.io module; previously it was hidden.
Vojtech Juranek
What is the size of the downloaded image? Is it something very small?
Pavel Bar
Yes, probably very small or even empty. Actually I just added new disks (that support incremental backup), attached them to the VM, and ran cold backup flows on the VM.
Nir Soffer
Based on the download rate (5.79 GiB/s), the disk is empty.
Vojtech Juranek
It looks like a concurrency issue: one worker finished the download before another worker was created. Right now I don't see any other way how the dst handler could be created after the download was finished and the client was closed.
Nir Soffer
Right, this is a known issue.
Nir Soffer
When the download is very small (e.g. there is nothing to copy if the image is completely empty), the dst connection used for checking the source may be closed before we clone it. The issue is known but was harmless before, please file a bug.
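
To make the race described above concrete, here is a toy sketch (illustrative only; the real logic lives in ovirt_imageio._internal.io and the nbd backend):

class ToyBackend:
    """Toy stand-in for the nbd backend shared by io.copy() workers."""

    def __init__(self):
        self.closed = False

    def clone(self):
        if self.closed:
            # Mirrors the nbd backend raising after it was closed.
            raise IOError("Backend was closed")
        return ToyBackend()

    def close(self):
        self.closed = True

dst = ToyBackend()

# Worker 1 is handed the original backend. With an empty image there is
# nothing to copy, so it finishes (and closes the shared backend) right away.
dst.close()

# Worker 2 is created afterwards; its dst_factory tries to clone the
# now-closed backend -- this is the OSError seen in the traceback.
dst.clone()
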

engine.log
engine-ovirt-imageio.log
example.log
vdsm.log
vdsm-ovirt-imageio.log

@barpavel barpavel changed the title from "Backend was closed" during backup image trnsfer phase. to "Backend was closed" error during backup image transfer on Jan 17, 2022
nirs added a commit to nirs/ovirt-imageio that referenced this issue Jan 17, 2022
When using a thread pool we try to reuse the connection backend by
passing the connection backend to the first worker instead of cloning
it. This introduces some issues:

- The backend is closed twice: once by the worker, and again by the
  connection. This is safe since closing a backend twice does nothing,
  but it is dirty, and it will break if we try to use the backend after
  a copy.

- With very small images, the first worker may close the original
  backend before another worker tries to clone it. This is unlikely to
  happen with real images but can break the tests.

Fix the issues by introducing a backend wrapper, overriding close() to
disconnect from the wrapped backend instead of closing it. This ensures
that dst is never closed before it is cloned, and that it is closed
exactly once.

Another approach to fix this is to close the initial backend used for
preparing a copy, pass the backend open arguments to io.copy(), and open
new backends in the workers without all the hacks. That requires a lot
of work, so I'm starting with this minimal change.

Fixes oVirt#23

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
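
A rough sketch of the wrapper idea from the commit message above; the class name and details are assumptions for illustration, not the actual ovirt-imageio code:

class BackendWrapper:
    """Delegates to a wrapped backend; close() only detaches from it.

    The wrapped backend stays open so other workers can still clone it,
    and it is closed exactly once by its real owner (the connection).
    """

    def __init__(self, backend):
        self._backend = backend

    def close(self):
        # Disconnect from the wrapped backend instead of closing it.
        self._backend = None

    def __getattr__(self, name):
        if self._backend is None:
            raise IOError("Backend was closed")
        return getattr(self._backend, name)
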
@nirs nirs self-assigned this Jan 17, 2022
@nirs nirs added the bug Issue is a bug or fix for a bug label Jan 17, 2022
nirs commented Jan 17, 2022

This is an old issue that was hidden by the nbd backend, which could be
closed silently multiple times without any side effect.

In 4df9f1c we changed the nbd backend to behave like the file backend:
you can close it multiple times, but accessing the backend after closing
raises IOError.
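
A minimal conceptual sketch of that contract (not the actual nbd backend code): closing is idempotent, while any delegated access after close() raises IOError:

class NbdLikeBackend:
    def __init__(self, client):
        self._client = client

    def close(self):
        # Closing more than once is allowed; only the first call disconnects.
        if self._client is not None:
            self._client.close()
            self._client = None

    def __getattr__(self, name):
        # Per 4df9f1c, accessing the backend after close raises IOError,
        # while closing twice stays a no-op.
        if self._client is None:
            raise IOError("Backend was closed")
        return getattr(self._client, name)
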

vjuranek pushed a commit that referenced this issue Jan 18, 2022