rbd: check lock ownership in tcmu_rbd_get_lock_tag #452

Open
wants to merge 1 commit into main

Conversation

tangwenji
Contributor

The tcmu service restarts abnormally while the lock is held. The initiator then sends an RTPG command and the reported ALUA state is AO, but because the current client is not the owner of the lock, I/O cannot be executed.
The owner of the lock is still the client from before the tcmu service restarted, and that owner is also stored in the rbd metadata.

Signed-off-by: tangwenji <tang.wenji@zte.com.cn>
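
Conceptually the change checks lock ownership before trusting the stored tag. The following is only a rough sketch of that idea in librbd terms, not the actual diff; the function name and the "lock_tag" metadata key are illustrative.

/*
 * Sketch only: check exclusive-lock ownership with librbd before trusting
 * the tag stored in the image metadata. Not the real tcmu_rbd_get_lock_tag().
 */
#include <errno.h>
#include <stddef.h>
#include <rbd/librbd.h>

static int sketch_get_lock_tag(rbd_image_t image, char *tag, size_t tag_len)
{
        int is_owner = 0;
        int ret;

        /* Ask librbd whether this client currently holds the exclusive lock. */
        ret = rbd_is_exclusive_lock_owner(image, &is_owner);
        if (ret < 0)
                return ret;

        /*
         * The stored tag may still name the client that held the lock before
         * the restart. Only report it if we really are the lock owner.
         */
        if (!is_owner)
                return -ENOENT;

        return rbd_metadata_get(image, "lock_tag", tag, &tag_len);
}
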
@mikechristie
Collaborator

Hey,

Good to have you back working on runner!

I am not sure I understood the bug.

Back story on why we have that code:

The reason for that callout is that ESX can send an RTPG to any path to the device. It then expects that the data returned for all ports is current and the same through all paths. In newer versions of SPC (spc4r37) there is a note about this, but ESX already behaved this way.

Windows has a similar issue where it might send an RTPG through different paths, and if the results do not match you end up with an inconsistent multipath device state.

Linux always sends an RTPG through each path and only uses the info for the local port, so it does not have a similar issue.

So the problem with the patch is:

  1. If tcmu_rbd_has_lock were to fail because the network to the ceph cluster is down, we could still be holding the lock but would incorrectly report standby on this path. If an RTPG is sent through other paths running on other nodes that can connect to the cluster, they report this path as active. The initiator may then not know to send an STPG, because it is using the other paths' RTPG info, which reports this path as active.

  2. I did not get the part about "because the current client is not the owner of the lock, I/O cannot be executed."

Do you mean we had the lock, it got taken away, and now we are blacklisted so rbd/ceph requests will fail? If so, tcmu_rbd_get_lock_tag should have gotten a -ESHUTDOWN error from one of the rbd/ceph calls indicating it is blacklisted, and it should then have returned TCMU_STS_FENCED. tcmur_device_get_lock_tag should then reopen the device to clear the blacklisting, retry the get_lock_tag_id operation, and it should work.

Is that not working?
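
For reference, that recovery flow is roughly the following (a simplified sketch, not the actual tcmu-runner code; the TCMU_STS_* values and the helpers declared here are stand-ins):

/*
 * Simplified sketch of the fencing/retry flow described above.
 */
#include <errno.h>

enum { TCMU_STS_OK = 0, TCMU_STS_FENCED, TCMU_STS_HW_ERR };

struct dev;                                                  /* opaque device stand-in */
int  rbd_read_lock_tag(struct dev *d, unsigned short *tag);  /* rbd/ceph metadata read */
void reopen_dev(struct dev *d);                              /* close + reopen the image */

/* Handler callout: map a blacklist error to "fenced". */
static int handler_get_lock_tag(struct dev *d, unsigned short *tag)
{
        int ret = rbd_read_lock_tag(d, tag);

        if (ret == -ESHUTDOWN)          /* our client was blacklisted */
                return TCMU_STS_FENCED;
        return ret < 0 ? TCMU_STS_HW_ERR : TCMU_STS_OK;
}

/* Core: on TCMU_STS_FENCED, reopen to clear the blacklisting and retry. */
static int core_get_lock_tag(struct dev *d, unsigned short *tag)
{
        int ret = handler_get_lock_tag(d, tag);

        if (ret == TCMU_STS_FENCED) {
                reopen_dev(d);
                ret = handler_get_lock_tag(d, tag);
        }
        return ret;
}
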

The owner of the lock is still the client from before the tcmu service restarted, and that owner is also stored in the rbd metadata.

Ok, so I might be misunderstanding the problem. The above comment sounds like there is an issue where the client has the lock (the tag in the metadata indicates that client), runner restarts due to a crash, and on startup we no longer have the lock, but the metadata is invalid because it still references the old client. We then incorrectly report that we have the lock, because the now stale metadata tag info still points to our target port group, so no STPG is ever sent to this path to activate it?

I see that bug in the code. If that is your bug, how about something like the attached patch, which detects the problem and cleans it up.

runner-reinit-lock-after-crash.txt

@tangwenji
Contributor Author

Ok, so I might be misunderstanding the problem. The above comment sounds like there is an issue where the client has the lock (the tag in the metadata indicates that client), runner restarts due to a crash, and on startup we no longer have the lock, but the metadata is invalid because it still references the old client. We then incorrectly report that we have the lock, because the now stale metadata tag info still points to our target port group, so no STPG is ever sent to this path to activate it?

Yes, that is what I mean.

I see that bug in the code. If that is your bug, how about something like the attached patch, which detects the problem and cleans it up.

You mean that after applying my patch, your patch has a bug?

@tangwenji
Contributor Author

I think that when tcmu starts, no locks should be implicitly acquired, regardless of whether a lock was held before. The lock should only be obtained explicitly, by the initiator issuing an STPG command.

@tangwenji
Contributor Author

In an HA setup with two tcmu nodes, can the ALUA group id for the same image not be set to the same value on both nodes?

@mikechristie
Collaborator

mikechristie commented Aug 17, 2018

You mean that after applying my patch, your patch has a bug?

I meant I see the bug you were describing.

I think that when tcmu starts, no locks should be implicitly acquired, regardless of whether a lock was held before. The lock should only be obtained explicitly, by the initiator issuing an STPG command.

The problem is that if tcmu-runner crashes, the metadata and rbd/ceph still think there is a lock held (rbd_lock_get_owners will return the holder). So you need to either re-acquire the lock or clean it up so the other nodes do not misreport the state.

My patch re-acquires it. To clean it up instead, we would still need to grab the lock before touching the metadata, because we must hold the lock while accessing the metadata to prevent races where other nodes might be trying to grab the lock and update it.
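
For illustration, a rough sketch of the re-acquire approach using librbd's lock calls (the helper name is hypothetical, the real patch also has to deal with the tag metadata, and a fuller version may need rbd_lock_break() for the dead holder):

/*
 * Rough sketch: after a runner restart, re-acquire the exclusive lock if
 * we do not hold it, so the stale state can be cleaned up under the lock.
 */
#include <errno.h>
#include <stddef.h>
#include <rbd/librbd.h>

static int reinit_lock_after_crash(rbd_image_t image)
{
        rbd_lock_mode_t mode;
        char *owners[1];
        size_t num_owners = 1;
        int is_owner = 0;
        int ret;

        ret = rbd_is_exclusive_lock_owner(image, &is_owner);
        if (ret < 0)
                return ret;
        if (is_owner)
                return 0;       /* we already hold the lock, nothing to do */

        /* After a crash, rbd may still list the old (dead) client as holder. */
        ret = rbd_lock_get_owners(image, &mode, owners, &num_owners);
        if (ret < 0 && ret != -ENOENT)
                return ret;
        if (ret == 0 && num_owners)
                rbd_lock_get_owners_cleanup(owners, num_owners);

        /*
         * Grab the lock again. Holding it is also what lets us safely
         * update the tag metadata without racing other nodes.
         */
        return rbd_lock_acquire(image, RBD_LOCK_MODE_EXCLUSIVE);
}
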

In an HA setup with two tcmu nodes, can the ALUA group id for the same image not be set to the same value on both nodes?

I did not understand the question. If you have 2 nodes, each node has 1 iSCSI target port group with 1 enabled portal in the group. image1234 is exported through both nodes; node2 has group id 2 and node3 has group id 3.

Does that answer your question? I attached a targetcli config .json file here:

saveconfig.json.txt

[edited the node numbering]

You would run this on node2 and then run a similar config file on node3 (that file would have the tpg local to that node enabled instead).

Here is what it looks like from targetcli ls on the node where the portal 192.168.32.101 is enabled.

[root@rh1 target]# targetcli ls
o- / ..................................................................... [...]
o- backstores .......................................................... [...]
| o- block .............................................. [Storage Objects: 0]
| o- fileio ............................................. [Storage Objects: 0]
| o- pscsi .............................................. [Storage Objects: 0]
| o- ramdisk ............................................ [Storage Objects: 0]
| o- user:fbo ........................................... [Storage Objects: 0]
| o- user:file .......................................... [Storage Objects: 0]
| o- user:glfs .......................................... [Storage Objects: 0]
| o- user:qcow .......................................... [Storage Objects: 0]
| o- user:rbd ........................................... [Storage Objects: 1]
| | o- testtcmulock ................... [rbd/testtcmulock (10.0GiB) activated]
| | o- alua ............................................... [ALUA Groups: 3]
| | o- default_tg_pt_gp ................... [ALUA state: Active/optimized]
| | o- tcmu2 ....................................... [ALUA state: Standby]
| | o- tcmu3 ....................................... [ALUA state: Standby]
| o- user:zbc ........................................... [Storage Objects: 0]
o- iscsi ........................................................ [Targets: 1]
| o- iqn.1999-09.com.tcmu:alua ..................................... [TPGs: 2]
| o- tpg1 .............................................. [gen-acls, no-auth]
| | o- acls ...................................................... [ACLs: 0]
| | o- luns ...................................................... [LUNs: 1]
| | | o- lun0 .................................. [user/testtcmulock (tcmu2)]
| | o- portals ................................................ [Portals: 1]
| | o- 192.168.32.101:3260 .......................................... [OK]
| o- tpg2 ....................................................... [disabled]
| o- acls ...................................................... [ACLs: 0]
| o- luns ...................................................... [LUNs: 1]
| | o- lun0 .................................. [user/testtcmulock (tcmu3)]
| o- portals ................................................ [Portals: 1]
| o- 20.15.0.200:3260 ............................................. [OK]

@tangwenji
Contributor Author

I understand now, thanks!

@mikechristie
Collaborator

@tangwenji

Are you ok with my patch? Does it fix the issue you hit?

Do you use explicit failover? I think for the next release I am going to mark it experimental because our QE team did not get to fully test it.

@tangwenji
Contributor Author

@mikechristie
Your patch fixes the issue, thanks!

I use explicit failover, but it may have some bugs.

@mikechristie
Collaborator

Just a status on this.

I found some bugs in my patch and am fixing them.

There also seems to be a bug for the case where tcmu-runner is stopped gracefully with

systemctl stop tcmu-runner
systemctl start tcmu-runner

This will start the paths in Standby. The problem is if Linux initiators/multipath had last seen the state where path1 == AO and path2 == Standby, and IO was in flight during the tcmu-runner stop. On startup the IO will be failed with an error indicating the path is in standby, and that failure is passed from the SCSI layer to the multipath layer. If multipathd tests the path before dm-multipath has tried to activate a new path (run scsi_dh_activate to send an STPG), then the test IO will succeed and the original path will just be used again without an STPG being sent. The IO will again be failed with a standby error, and we might repeat this until the initiator is rebooted.

A similar error can happen on Windows 2016.

Updating my patch to handle both the unclean and clean tcmu-runner shutdown cases.

@lxbsz lxbsz changed the base branch from master to main August 10, 2022 00:23