Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multipath-tools 0.10.1 #17

Draft
wants to merge 25 commits into
base: master
Choose a base branch
from
Draft

multipath-tools 0.10.1 #17

wants to merge 25 commits into from

Conversation

mwilck
Copy link
Collaborator

@mwilck mwilck commented Nov 14, 2024

multipath-tools 0.10.1, 2024/11

This is the first bug fix release on the stable-0.10.y branch. It contains
bug fixes from 0.11.0, and some CI-related fixes.

Bug fixes

  • Fixed the problem that multipathd wouldn't start on systems with certain types
    of device mapper devices, in particular devices with multiple DM targets.
    The problem was introduced in 0.10.0.
    Fixes #102.
  • Fixed a corner case in the udev rules which could cause a device not to be
    activated during boot if a cold plug uevent is processed for a previously
    not configured multipath map while this map was suspended. This problem existed
    since 0.9.8.
  • Fixed the problem that devices with no_path_retry fail and no setting
    for dev_loss_tmo might get the dev_loss_tmo set to 0, causing the
    device to be deleted immediately in the event of a transport disruption.
    This bug was introduced in 0.9.6.
  • Fixed the problem that, if there were multiple maps with deferred failback
    (failback value > 0 in multipath.conf), some maps might fail back later
    than configured. The problem existed since 0.9.6.
  • Removed a warning message that multipathd would print if systemd's
    WATCHDOG_USEC environment variable had the value "0", which means that the
    watchdog is simply disabled. This (minor) problem existed since 0.4.9.
  • Fixed a memory leak in the nvme foreign library. The bug existed since
    0.7.8.
  • Fixed a problem in the marginal path detection algorithm that could cause
    the io error check for a recently failed path to be delayed. This bug
    existed since 0.7.4.

mwilck and others added 18 commits November 14, 2024 17:42
Signed-off-by: Martin Wilck <mwilck@suse.com>
We can't use "container:" any more because upload-artifact@v4 doesn't
work on Debian Jessie.

Signed-off-by: Martin Wilck <mwilck@suse.com>
Running "make" in a container and "make clean" outside doesn't
work (access right issues). So just use separate jobs.

Signed-off-by: Martin Wilck <mwilck@suse.com>
Avoid "text file busy" error on GitHub.

dmevents-test: Text file busy
Makefile:74: recipe for target 'dmevents.out' failed

Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
The ABI should never change on a stable branch. This workflow
asserts that.

Signed-off-by: Martin Wilck <mwilck@suse.com>
dm_get_maps() traverses the entire list of dm maps. We shouldn't
give up just because probing a single map failed.

Fixes: bf3a4ad ("libmultipath: simplify dm_get_maps()")
Signed-off-by: Martin Wilck <mwilck@suse.com>
We import DM_COLDPLUG_SUSPENDED in all code flows below mpath_coldplug_end.
Clarify this in the code.

Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
Since b22c273 ("11-dm-mpath.rules: Don't force activation while device is
suspended"), we've handled the case where a device is suspended while
an uevent is processed (e.g. because multipathd is reloading the
map again at the same time). But we were missing the case where
The device had never been initialized before. If .MPATH_DEVICE_READY_OLD
was empty, we'd jump to scan_import without setting MPATH_DEVICE_READY
to 0. This can cause a device not to be fully activated at boot time,
because in follow-up uevents we'd assume that the device had already
been set up.

Treat the case in which an uevent is processed for a previously not
fully set-up, suspended device like other situations where we set
MPATH_DEVICE_READY to 0.

Fixes: b22c273 ("11-dm-mpath.rules: Don't force activation while device is
suspended")
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
Our code is always setting MPATH_UNCHANGED and DM_ACTIVATION in
pairs. While DM_ACTIVATION is a global DM property, MPATH_UNCHANGED
is owned by us. Just set MPATH_UNCHANGED, and adapt DM_ACTIVATION
when necessary just in one place.

Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
We set .DM_NOSCAN above where we set DM_UDEV_DISABLE_OTHER_RULES_FLAG, too. If
the latter isn't set, .DM_NOSCAN can't be set. Skip the redundant test.

Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
When multipath reloads a device or fails or restores a path, the udev
rules disable LVM scanning, but since .DM_NOSCAN isn't set, blkid is
still run on the device. When multipath devices that are set to
queue_if_no_path lose all their paths at close to the same time, udev
workers can hang trying to run blkid. The blkid results shouldn't
change when multipathd is adding, removing, failing or reinstating
paths, aside from avoiding hanging udev processes, we're skipping
unnecessary work.

Hence, set .DM_NOSCAN if MPATH_UNCHANGED is set, to avoid blkid from
being called in 13-dm.rules.

Suggested-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
If pp->dev_loss is DEV_LOSS_TMO_UNSET and min_dev_loss is 0 (which is
the case if no_path_retry is NO_PATH_RETRY_FAIL or NO_PATH_RETRY_UNDEF),
we will set pp->dev_loss to 0, which is wrong. Fix it.

Fixes: 058b5f5 ("libmultipath: fix dev_loss_tmo even if not set in configuration")
Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
If reload_and_sync_map() removes the multipath device,
deferred_failback_tick() needs to decrement the counter so that it
doesn't skip the following device.

Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Reported by coverity: "i--" may cause an underflow, which will again
cause an overflow when the loop continues. Use a signed int for
loops like this to make coverity happy.

Signed-off-by: Martin Wilck <mwilck@suse.com>
Avoid the following error with clang 19:

msort.c:268:27: error: cast from '__compar_fn_t' (aka 'int (*)(const void *, const void *)') to '__compar_d_fn_t' (aka 'int (*)(const void *, const void *, void *)') converts to incompatible function type [-Werror,-Wcast-function-type-mismatch]
  268 |         return msort_r (b, n, s, (__compar_d_fn_t)cmp, NULL);
      |                                  ^~~~~~~~~~~~~~~~~~~~

Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
@mwilck
Copy link
Collaborator Author

mwilck commented Nov 14, 2024

Just a demo for @bmarzins and @cvaroqui.

In reality, this would be a PR against a stable-0.10.y branch on opensvc/multipath-tools. The failing "check-abi" workflow wouldn't be run in that case.

@bmarzins
Copy link

Looks good to me.

@bmarzins
Copy link

Actually, I do have some thoughts. First, I'm not sure we need to update News.md for the stable branch releases. You're certainly welcome to, and I can see how it would save distro maintainers some time if they wanted an overview of the changes. But I think simply bumping the version number for each pull request should be all that's really necessary, if you want to skip some extra work.

Also, looking at this release makes it obvious that it's not always going to be easy to come up with a good "Fixes" commit trailer for some of the fixes. Without these, it won't be as easy as I hoped for people to be able to check if fix is applicable to a previous stable branch. Although I expect that if a fix is only applicable in the latest stable branch, then it should be pretty easy to come up
with a commit to use for a "Fixes:" trailer.

@mwilck
Copy link
Collaborator Author

mwilck commented Nov 19, 2024

Updating NEWS.md is a minor effort, because I just copy/paste the relevant parts from the major release. The fact that I did it here doesn't mean I am committing myself to do it always in the future, but I think it wouldn't cost me much time.

About the "Fixes" trailers, I am not sure I understand your remark. Did you expect me to add "Fixes:" trailers to the bug fix commits that I'd identified? I opted to cherry-pick the 0.11.0 commits unchanged. IMO, if I add "Fixes:" here, I should do it in 0.11.0 as well (meaning that I need to rebase my "queue" branch, but I guess I'll have to do that anyway when I include the currently pending fixes).

In the future, we should agree to add "Fixes:" trailers to bug fix patches sent to dm-devel in the first place, but we haven't done so in the past (at least these trailers haven't been mandatory, we've been adding them rather sporadically).

Or were you referring to the other commits (like the github workflow commits) that don't fix actual bugs at all?

@bmarzins
Copy link

bmarzins commented Nov 19, 2024

About the "Fixes" trailes, I am not sure I understand your remark. Did you expect me to add "Fixes:" trailers to the bug fix commits that I'd identified? I opted to cherry-pick the 0.11.0 commits unchanged. IMO, if I add "Fixes:" here, I should do it in 0.11.0 as well (meaning that I need to rebase my "queue" branch, but I guess I'll have to do that anyway when I include the currently pending fixes).

No. I was expecting just what you did. We shouldn't be unnecessarily mucking with commit messages for the stable branch. I was just noticing that a number of the actual bug fixes didn't include "Fixes" trailers, and for some of them it's not even obvious what you would put there. They are (or at least could be) fixing issues that were introduced in multiple commits or issues that have existed long enough that the code has been refactored multiple times since the bug was introduced. The code may have originally been correct, and then you would need to track down the code change that caused it to stop being the correct thing to do.

I was hoping that downstream users who wanted to stick with a stable branch that we are no longer maintaining could just look through all of subsequent stable branches and pull out the fixes applicable to their branch. The easiest way to do that would be to look at the Fixes trailers, and see if their branch includes the referenced commit.

But like I said, it's not always obvious what you would put there. I agree that we should probably make a better effort to label bugfixes with a Fixes trailer when they are initially committed, but it can probably be a best effort kind of thing. If the bug has existed for years and the code has been tweaked and refactored multiple times since the initial bug was introduced, I don't think we should require git spelunking to find the initial breaking commit. But in that case, the bug has been around for years so finding the initial commit isn't that important. I suppose we could helpfully add something
like, for instance:

Fixes: pre-0.6.4

To indicate that the bug was been there since at least 0.6.4. For commits that fix bugs introduced in multiple commits, we can probably just get by with a best effort to find the oldest breaking commit and mention that there are others.
For example, with your commit c62254b ("11-dm-mpath.rules.in: clarify DM_ACTIVATION logic") we could probably just go with something like:

Fixes: 20cd0df96 ("multipath/kpartx rules: avoid superfluous scanning") and others

Or were you refering to the other commits (like the github workflow commits) that don't fix actual bugs at all?

Nope. Things like those don't need any trailers.

@mwilck
Copy link
Collaborator Author

mwilck commented Nov 21, 2024

@bmarzins discussion about future improvements and fixes tags aside, am I assuming correctly that you would ack this PR if I made it on opensvc?

I was hoping that downstream users who wanted to stick with a stable branch that we are no longer maintaining could just look through all of subsequent stable branches and pull out the fixes applicable to their branch.

Note that since 0.10.0, I have tried to point out in NEWS.md in which previous version a given bug had been introduced. I did not put the commit IDs there though because NEWS.md should also be readable by non-technical users.

I haven't spent this effort for earlier releases, but still the "Bug fixes" section in NEWS.md should help downstream users.

@bmarzins
Copy link

@bmarzins discussion about future improvements and fixes tags aside, am I assuming correctly that you would ack this PR if I made it on opensvc?

Yeah. My thoughts were just thoughts, not objections to the PR.

@mwilck
Copy link
Collaborator Author

mwilck commented Nov 21, 2024

I've pushed the bugfix part of my systemd watchdog patch. I hope this is fine for you. The 0.11.0 solution is not suitable for the stable branch IMO.

WATCHDOG_USEC may be set to 0, which means that the watchdog
is disabled in systemd.

Fixes: 9366cfb ("multipathd: Implement systemd watchdog integration")
Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
On systems with LVM volumes, "multipath -ll" will spit out lots of messages
like

    libmp_mapinfo: map vg-lv0 has multiple targets

which is irritating. Reduce the log level of these messages to 3, as they
are harmless most of the time.

This is a backport of e8949c2 ("libmultipath: reduce log level of
libmp_mapinfo() messages") from the master branch. We can't apply exactly
the same fix because the stable branch is missing 8c772d3 ("libmultipath:
check DM UUID earlier in libmp_mapinfo__").

Signed-off-by: Martin Wilck <mwilck@suse.com>
@mwilck mwilck force-pushed the stable-0.10.y branch 2 times, most recently from de4efb7 to b2642d2 Compare November 25, 2024 17:07
bmarzins and others added 3 commits January 17, 2025 22:36
_find_controllers() needs to free the udev device if it doesn't get
added to a path. Otherwise it can leak memory whenever check_foreign()
is called, causing multipathd's memory usage to continually grow.

Fixes: 7b47762 ("libmultipath: nvme: fix path detection for kernel 4.16")
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
(cherry picked from commit 5a3d334)
In function io_err_stat_handle_pathfail(), path->io_err_dis_reinstate_time
is set to 0 to enqueue path to io error check as soon as possible. But
multipathd can not do it within marginal_path_err_recheck_gap_time seconds
after power-on, because curr_time is less than
marginal_path_err_recheck_gap_time.

To handle the early marginal path, we can enqueue path when
io_err_dis_reinstate_time is 0.

Signed-off-by: chenrenhui <chenrenhui1@huawei.com>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>

>

(cherry picked from commit a1e3cf2)
NEWS.md Fixed Show fixed Hide fixed

This comment has been minimized.

This comment has been minimized.

Signed-off-by: Martin Wilck <mwilck@suse.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants