Poor RBD performance as LIO-TCMU iSCSI target #359

Open
DongyuanPan opened this issue Jan 23, 2018 · 23 comments

@DongyuanPan

Hi~ I am a senior university student and I've been learning Ceph and iSCSI recently.

I'm using fio to test the performance of RBD, but I see a performance degradation when exporting RBDs through LIO-TCMU.

My test compares three cases: the performance of the RBD exported as a target through LIO-TCMU, the performance of the RBD itself (no iSCSI or LIO-TCMU), and the performance of the RBD exported as a target through TGT.

Details about the test environment:

  • Single node test "cluster" (osd pool default size = 1) with Ceph (version 12.2.2)
  • CentOS 7.4 (3.10.0-693.11.6.el7.x86_64)
  • fio-2.99, tcmu-runner-1.3.0-rc4
  • 16 OSDs with osd_objectstore = bluestore
  • rbd default features = 3

I use targetcli (or tgtadm) to create the target device and log in to it from the initiator. Then I use fio to test the device.
1) The performance of the RBD itself (no iSCSI or LIO-TCMU)
rbd create image-10 --size 102400 (rbd default features = 3)
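
For reference, this feature mask is normally controlled by the "rbd default features" option in ceph.conf; a minimal sketch, assuming it is placed in the [client] section:

[client]
rbd default features = 3    # 3 = layering (1) + striping (2)
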
fio test config

[global]
#logging
#write_iops_log=write_iops_log
#write_bw_log=write_bw_log
#write_lat_log=write_lat_log
ioengine=rbd
clientname=admin
pool=rbd
rbdname=image-10
rw=randwrite
bs=4k
numjobs=4
buffered=0
runtime=180
group_reporting=1

[rbd_iodepth32]
iodepth=128
#write_iops_log=write_rbd_default_feature_one
#log_avg_msec=1000

performance: 35-40 K IOPS

2) The performance of the RBD as a target using TGT
Create the LUN:
tgtadm --lld iscsi --mode logicalunit --op new --tid 1 --lun 1 --backing-store rbd/image-10 --bstype rbd
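
The target itself is not shown above; a minimal sketch of how it would typically have been created and opened up with tgtadm beforehand (the tid, IQN, and open ACL here are assumptions):

tgtadm --lld iscsi --mode target --op new --tid 1 --targetname iqn.2018-01.com.example02:iscsi
tgtadm --lld iscsi --mode target --op bind --tid 1 --initiator-address ALL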

Initiator login:
iscsiadm -m node --targetname iqn.2018-01.com.example02:iscsi -p 192.168.x.x:3260 -l

The LUN appears on the initiator as /dev/sdw.

fio test

[global]
bs=4k
ioengine=libaio
iodepth=128
direct=1
#sync=1
runtime=30
size=60G
buffered=0
#directory=/mnt
numjobs=4
filename=/dev/sdw
group_reporting=1

[rand-write]
time_based
write_iops_log=write_tgt_default_feature_three
log_avg_msec=1000
rw=randwrite
#stonewall

performance: 18-20K IOPS

3) The performance of the RBD as a target using LIO-TCMU
I use targetcli to create the LUN, with the TPG's default_cmdsn_depth set to 512 (a rough sketch of the setup follows the initiator settings below).
Initiator side settings:
node.session.cmds_max = 2048
node.session.queue_depth = 1024
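
A rough sketch of that setup; the backstore name, cfgstring, and IQN are assumptions, and the exact user:rbd create syntax can vary between targetcli versions:

targetcli /backstores/user:rbd create name=image-10 size=100G cfgstring=rbd/image-10
targetcli /iscsi create iqn.2018-01.com.example02:iscsi
targetcli /iscsi/iqn.2018-01.com.example02:iscsi/tpg1/luns create /backstores/user:rbd/image-10
targetcli /iscsi/iqn.2018-01.com.example02:iscsi/tpg1 set attribute default_cmdsn_depth=512

The node.session.* values above would go into /etc/iscsi/iscsid.conf (or be applied per node with iscsiadm -m node ... -o update) before logging in from the initiator.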

fio test config
[global]
bs=4k
ioengine=libaio
iodepth=128
direct=1
#sync=1
runtime=180
size=50G
buffered=0
#directory=/mnt
numjobs=4
filename=/dev/sdv
group_reporting=1

[rand-write]
time_based
write_iops_log=write_tgt_default_feature_three
log_avg_msec=1000
rw=randwrite
#stonewall

/dev/sdv is backed by image-10.

performance: 7K IOPS

I found reports of a similar issue, but I still haven't found the cause:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-October/044021.html
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-December/045347.html

Thanks for any help anyone can provide!

@mikechristie
Collaborator

We are just starting to investigate performance.

One known issue is that for LIO and open-iscsi you need to have node.session.cmds_max match the LIO default_cmdsn_depth setting. If they are not the same, then there seems to be a bug on the initiator side where IOs are requeued and do not get retried quickly like normal.
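
As a rough example of lining the two up (the IQN, portal, and the value 512 here are placeholders):

# target side: set the TPG queue depth
targetcli /iscsi/<target_iqn>/tpg1 set attribute default_cmdsn_depth=512
# initiator side: make the session queue match, then log out and back in
iscsiadm -m node -T <target_iqn> -p <portal> -o update -n node.session.cmds_max -v 512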

There is another issue for latency/IOPS-type tests where one command slows down others. The attached patch

runner-dont-wait.txt

is a hack around it, but it needs work because it can cause extra context switches.

For target_core_user there are other issues like its memory allocation in the main path, but you might not be hitting that with the fio arguments you are using.

@DongyuanPan
Author

Thank you~
@mikechristie

I retested with tcmu-runner-1.3.0 after setting node.session.cmds_max to match the LIO default_cmdsn_depth. There is some improvement in performance (18.8K IOPS), and it is now the same as using TGT.
Is this level of performance (the same as TGT) normal for tcmu-runner-1.3.0 without further optimization?

There is another issue for latency/IOPS-type tests where one command slows down others. The attached patch runner-dont-wait.txt is a hack around it, but it needs work because it can cause extra context switches.

If I test with the patch, the performance (32K IOPS) approaches that of the RBD itself.
But is the patch only meant for testing?
The wakeup argument is determined by aio_track->tracked_aio_ops. Must AIO be tracked? What might happen if I do not track AIO? Can this parameter be specified by the user?

@mikechristie
Collaborator

Thanks for testing.

If I test with the patch, the performance (32K IOPS) approaches that of the RBD itself. But is the patch only meant for testing?

Yeah, the patch needs some cleanup, because of what you notice below.

The wakeup argument is determined by aio_track->tracked_aio_ops. Must AIO be tracked? What might happen if I do not track AIO? Can this parameter be specified by the user?

It is used during failover/failback and recovery to make sure IOs are not being executed in the handler modules (handler_rbd, handler_glfs, etc) when we execute a callout like lock() or (re)open().

So, ideally, we would address these issues:

  1. In aio_command_finish we do not want to batch commands like we do today. We can either completely drop the batching like in the patch attached in the previous comment, or we can try to add some setting to try and limit how long we wait before calling tcmulib_processing_complete. For example we could do something like:

if (!wakeup && current_batch_wait > batch_timeout)
        tcmulib_processing_complete(dev);

  2. We would like to remove the track_lock from the main IO path, but we still need a way to make sure IO is not running on the handler when we do the lock/open callouts. We can maybe replace the aio_wait_for_empty_queue calls with tcmu_flush_device calls.

@MIZZ122

MIZZ122 commented Jan 25, 2018

@Github641234230 I noticed that your kernel version is 3.10.0-693.11.6.el7.x86_64.
Did you add any patches to your kernel?
Are you going to do HA?
My kernel is 3.10.0-693.11.6.el7.x86_64, with tcmu-runner-1.3.0-rc4.
I get an IO error when I modify the kernel parameter enable = 1.

@MIZZ122

MIZZ122 commented Jan 25, 2018

@mikechristie If our product can only use CentOS 7.4 (3.10.0-693.11.6.el7.x86_64) and I want to do HA, what should I do? Which patch can I use?

I've tried using targetcli to export the RBDs on all gateway nodes. On the iSCSI client side, I use dm-multipath to discover them and it works well (both active/active and active/passive). Is there any problem using this method for HA?
And issue #356 says Active/Active is not supported. I am very confused.

@mikechristie
Collaborator

@MIZZ122

For upstream tcmu-runner/ceph-iscsi-cli HA support you have to use RHEL 7.5 beta or newer kernel or this kernel:

https://github.com/ceph/ceph-client.

HA is only supported with active/passive. You must use the settings here

http://docs.ceph.com/docs/master/rbd/iscsi-initiators/

Just because dm-multipath lets you set up active/active does not mean it is safe. You can end up with data corruption. Use the settings in the docs (a rough sketch of the multipath configuration is shown below).
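
For reference, the linked docs configure dm-multipath for failover (active/passive); a rough sketch of that kind of multipath.conf entry follows, but take the exact values from the docs:

devices {
        device {
                vendor                 "LIO-ORG"
                hardware_handler       "1 alua"
                path_grouping_policy   "failover"
                path_checker           "tur"
                prio                   "alua"
        }
}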

If you are doing single node (non HA) then you can do active/active across multiple portals on that one node.

@mikechristie
Collaborator

@MIZZ122 if you have other questions about active/active, can you open a new issue or discuss them in the existing issue for active/active? This issue is for perf only.

@dillaman
Collaborator

@mikechristie Any update on this issue?

@mikechristie
Collaborator

@lxbsz was testing it out for gluster with the perf team. lxbsz, did it help, did you make the changes I requested, and were they needed, or was it ok to just always complete right away?

It looks like you probably got busy with resize so I can do the changes. Are you guys working with the perf team still, so we can get them tested?

@lxbsz
Collaborator

lxbsz commented Mar 16, 2018

@mikechristie Yes, we and the perf team are testing this together.

The environment is a PostgreSQL database running on a Gluster block volume in a CNS environment.

1. By changing node.session.cmds_max to match the LIO default_cmdsn_depth:
the performance improved only slightly, about 5%.

2. By applying https://github.com/open-iscsi/tcmu-runner/files/1654757/runner-dont-wait.txt:
the performance improved about 10%.

3. By changing default_cmdsn_depth to 64 (see the example command after this list):
the performance improved about 27%.
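
As a rough example, the depth change in item 3 can be applied per TPG with targetcli (the IQN here is a placeholder):

targetcli /iscsi/<target_iqn>/tpg1 set attribute default_cmdsn_depth=64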

So we are preparing to do more testing on this later. These days we are busy with the RHGS release.

@mikechristie
Collaborator

Ok, assume this is back on me.

@lxbsz
Collaborator

lxbsz commented Mar 16, 2018

We will test combinations of these changes later, once we have enough time.

@serjponomarev

Can I use this patch (https://github.com/open-iscsi/tcmu-runner/files/1654757/runner-dont-wait.txt) in a production ESXi environment?
If it's not recommended, how can I help you investigate the performance issue so it can be fixed?
I have all the needed hardware.

@mikechristie
Collaborator

It is perfectly safe crash-wise, but it might cause other regressions. If you can test, I can give you a patch later this week that makes the behavior configurable, so we can try to figure out whether there is some balance between the two extreme settings (with and without the patch), or whether it needs to be configurable per workload type.

@serjponomarev

OK, I am waiting for the patch and instructions on how to test it (ceph, tcmu-runner, fio).

@DongyuanPan
Author

DongyuanPan commented Apr 4, 2018

In my Ceph RBD test environment, TGT performance is better than LIO-TCMU.
So I created an IBLOCK backstore from a /dev/sda block device with targetcli, in order to test LIO performance without tcmu/tcmu-runner (a rough sketch of the commands is shown below).
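
A rough sketch of that IBLOCK export; the backstore name and IQN here are assumptions:

targetcli /backstores/block create name=ssd0 dev=/dev/sda
targetcli /iscsi create iqn.2018-04.com.example:iblock-test
targetcli /iscsi/iqn.2018-04.com.example:iblock-test/tpg1/luns create /backstores/block/ssd0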

4K rand_write
LIO+SSD DISK -> IOPS=48.9k, BW=191MiB/s
TGT+SSD DISK -> IOPS=49.2k, BW=192MiB/s

4K rand_read
LIO+SSD DISK -> IOPS=44.9k, BW=175MiB/s
TGT+SSD DISK -> IOPS=46.5k, BW=182MiB/s

64K write
LIO+SSD DISK ->IOPS=6221, BW=389MiB/s
TGT+SSD DISK -> IOPS=9100, BW=569MiB/s

64K read
LIO+SSD DISK ->IOPS=8389, BW=524MiB/s
TGT+SSD DISK ->IOPS=19.3k, BW=1208MiB/s

The performance of TGT is better than LIO's, which is strange.
Thanks for any help anyone can provide!

@wwba

wwba commented Apr 9, 2018

@mikechristie
In my Ceph cluster, the throughput of the SCSI disks is much lower than the RBDs'.
I run the LIO iSCSI gateway in a VM with kernel version 4.16.0-0.rc6. In the VM, I compared the performance of tcmu-runner with KRBD using fio (sync=1, -ioengine=psync -bs=4M -numjobs=10); an invocation along those lines is shown below.
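
Presumably each run looked something like the following (the device path, job name, and runtime are assumptions):

fio -name=seq-write -filename=/dev/sdX -rw=write -bs=4M -numjobs=10 -ioengine=psync -sync=1 -direct=1 -runtime=60 -group_reporting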

4M seq write & one LIO gw for a RBD
KRBD
BW=409MiB, avg lat = 97ms

LIO + TCMU
BW=131MiB, avg lat = 305ms

TGT+rbd_bs
BW=362MiB, avg lat = 110ms

4M seq read & one LIO gw for a RBD
KRBD
BW=1571MiB, avg lat = 25ms

LIO + TCMU
BW=256MiB, avg lat = 155ms

TGT+rbd_bs
BW=1556MiB, avg lat = 26ms

4M seq write & one LIO gw for four RBDs
KRBD
BW=205MiB, avg lat = 190ms

LIO + TCMU
BW=42MiB, avg lat = 921ms

TGT+rbd_bs
BW=193MiB, avg lat = 206ms

4M seq read & one LIO gw for four RBDs
KRBD
BW=416MiB, avg lat = 96ms

LIO + TCMU
BW=148MiB, avg lat = 270ms

TGT+rbd_bs
BW=397MiB, avg lat = 100ms

I get poor throughput for the SCSI disk when using TCMU. Does this have something to do with what you said earlier, that for target_core_user there are other issues like its memory allocation in the main path?

@shadowlinyf

@mikechristie Has the runner-dont-wait.txt patch already been merged to 1.4RC1?

@mikechristie
Collaborator

Yes.

@shadowlinyf

@mikechristie I am having a performance issue with an EC RBD as the backend store. I am using 1.4RC1. KRBD sequential write speed is about 600MB/s; TCMU+RBD sequential write speed is around 30MB/s.

@NUABO

NUABO commented Sep 18, 2018

Hi @shadowlinyf, did you test it again afterwards? Is TCMU still performing very poorly?

@Allenscript

Now I am seeing the same kind of performance: fio against the RBD directly gives about 500MB/s, but through TCMU with a user:rbd backstore the fio result is about 15MB/s, which is far too poor. My environment: kernel 5.0.4, tcmu-runner latest release 1.4.1, Ceph 12.2.11.

@deng-ruixuan

Now I am seeing the same kind of performance: fio against the RBD directly gives about 500MB/s, but through TCMU with a user:rbd backstore the fio result is about 15MB/s, which is far too poor. My environment: kernel 5.0.4, tcmu-runner latest release 1.4.1, Ceph 12.2.11.

I hit the same kind of performance issue and seem to have solved my problem, although other performance issues remain.
You can try using gwcli to set the following parameters for the disk:
/disks> reconfigure blockpool/image01 hw_max_sectors 8192
/disks> reconfigure blockpool/image01 max_data_area_mb 128
After setting these, the performance of TCMU can approach the performance of librbd in HDD scenarios.
