NBD integration into controller and replica #1109
base: master
This pull request is now in conflict. Could you fix it @Toutou98? 🙏
This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.
What we can do for the feature later
cc @shuo-wu @PhanLe1010 @c3y1huang @mantissahz @Vicente-Cheng @WebberHuang1118 @innobead
This PR was closed because it has been stalled for 10 days with no activity.
Reopening. I am actively investigating this PR.
Which issues this PR references
Issues: longhorn/longhorn#6590 (comment)
longhorn/longhorn#5002 (comment)
longhorn/longhorn#5374 (comment)
What this PR does
This PR integrates the Network Block Device (NBD) protocol as an option for both kernel-to-controller communication (as an alternative to iSCSI) and controller-to-replica communication (as an alternative to the custom Longhorn engine protocol). NBD is a widely used standard and is available prebuilt in most Linux distributions. Its current version also supports multiple concurrent connections per device, which this PR takes advantage of.
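For readers unfamiliar with the protocol: in NBD's transmission phase, every command starts with a fixed 28-byte big-endian request header, and the server answers each command with a 16-byte simple reply header that echoes the client's handle. The Go sketch below encodes and decodes these headers using the magic numbers and command codes from the NBD protocol specification. It is an illustration only, not code from this PR; the type and function names (Request, EncodeRequest, DecodeRequest, EncodeSimpleReply) are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Constants from the NBD protocol specification (transmission phase).
const (
	nbdRequestMagic     uint32 = 0x25609513
	nbdSimpleReplyMagic uint32 = 0x67446698

	nbdCmdRead  uint16 = 0
	nbdCmdWrite uint16 = 1
	nbdCmdDisc  uint16 = 2
	nbdCmdFlush uint16 = 3

	requestHeaderSize = 28 // magic(4) flags(2) type(2) handle(8) offset(8) length(4)
	replyHeaderSize   = 16 // magic(4) error(4) handle(8)
)

// Request is the fixed-size header the client sends for every command.
type Request struct {
	Flags  uint16
	Type   uint16
	Handle uint64 // opaque value the server echoes back in its reply
	Offset uint64
	Length uint32
}

// EncodeRequest serializes a request header in network (big-endian) byte order.
func EncodeRequest(r Request) []byte {
	b := make([]byte, requestHeaderSize)
	binary.BigEndian.PutUint32(b[0:4], nbdRequestMagic)
	binary.BigEndian.PutUint16(b[4:6], r.Flags)
	binary.BigEndian.PutUint16(b[6:8], r.Type)
	binary.BigEndian.PutUint64(b[8:16], r.Handle)
	binary.BigEndian.PutUint64(b[16:24], r.Offset)
	binary.BigEndian.PutUint32(b[24:28], r.Length)
	return b
}

// DecodeRequest parses a request header and validates its magic number.
func DecodeRequest(b []byte) (Request, error) {
	if len(b) < requestHeaderSize {
		return Request{}, errors.New("short request header")
	}
	if binary.BigEndian.Uint32(b[0:4]) != nbdRequestMagic {
		return Request{}, errors.New("bad request magic")
	}
	return Request{
		Flags:  binary.BigEndian.Uint16(b[4:6]),
		Type:   binary.BigEndian.Uint16(b[6:8]),
		Handle: binary.BigEndian.Uint64(b[8:16]),
		Offset: binary.BigEndian.Uint64(b[16:24]),
		Length: binary.BigEndian.Uint32(b[24:28]),
	}, nil
}

// EncodeSimpleReply builds a reply header; for a successful READ the server
// sends Length bytes of data right after it.
func EncodeSimpleReply(handle uint64, errno uint32) []byte {
	b := make([]byte, replyHeaderSize)
	binary.BigEndian.PutUint32(b[0:4], nbdSimpleReplyMagic)
	binary.BigEndian.PutUint32(b[4:8], errno)
	binary.BigEndian.PutUint64(b[8:16], handle)
	return b
}

func main() {
	req := Request{Type: nbdCmdRead, Handle: 42, Offset: 4096, Length: 512}
	wire := EncodeRequest(req)
	back, err := DecodeRequest(wire)
	fmt.Println(len(wire), back == req, err) // prints "28 true <nil>"
}
```

Because every reply carries the handle of the request it answers, a client can keep several requests in flight on one connection and match replies as they arrive.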
The benefit is increased performance, especially at the frontend. In my tests on a fairly recent PC hardware configuration, I found tgtd to be a major bottleneck: it limited frontend (kernel-to-controller) read/write IOPS to about 50k, whereas NBD with multiple concurrent connections raised this figure almost 10-fold. Backend (controller-to-replica) performance also improves, although there is probably still a lot of room for improvement inside the controller. Test setup and performance results are given below.
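One simple way to use several NBD connections at once is to fan requests out over a fixed pool, with each connection owned by a single goroutine. The sketch below is a hypothetical illustration of that pattern, not the PR's dispatching logic: the conn type and dispatch function are invented for the example, conn only records which request handles it served, and for simplicity each connection processes one request at a time, although NBD itself allows multiple in-flight requests per connection.

```go
package main

import (
	"fmt"
	"sync"
)

// conn stands in for one NBD socket (hypothetical type); it only records
// the request handles it served.
type conn struct {
	id     int
	served []uint64
}

// dispatch fans requests out over numConns connections. Each connection is
// owned by exactly one goroutine, so requests on one socket are serialized
// while different sockets proceed in parallel.
func dispatch(handles []uint64, numConns int) []*conn {
	conns := make([]*conn, numConns)
	for i := range conns {
		conns[i] = &conn{id: i}
	}

	work := make(chan uint64)
	var wg sync.WaitGroup
	for _, c := range conns {
		wg.Add(1)
		go func(c *conn) {
			defer wg.Done()
			for h := range work {
				// A real client would send the request on its socket here
				// and wait for the matching reply.
				c.served = append(c.served, h)
			}
		}(c)
	}

	for _, h := range handles {
		work <- h
	}
	close(work) // lets every worker's range loop terminate
	wg.Wait()   // after this, reading c.served from this goroutine is safe
	return conns
}

func main() {
	handles := make([]uint64, 1000)
	for i := range handles {
		handles[i] = uint64(i)
	}
	conns := dispatch(handles, 16)
	total := 0
	for _, c := range conns {
		total += len(c.served)
	}
	fmt.Println(len(conns), total) // prints "16 1000"
}
```

A shared work channel naturally sends more requests to whichever connection is free first, which is one reason adding connections can raise throughput when a single socket is the bottleneck.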
This PR includes all code changes, including additional engine binary parameters that enable NBD and control the number of concurrent connections. It also includes some necessary additions to the container image it produces.
Test setup and results
Hardware configuration:
Three series of tests were conducted, using fio to load the Longhorn device:
The table below shows the results; the best NBD results were achieved with 16 connections.
fio configuration file:
Notes for reviewer
To replicate the results, libnbd-dev/libnbd-devel (v1.13.1) and nbd-client must be installed on the host system. The nbd kernel module must also be available in the kernel (it is available on Ubuntu 22.04).
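Before running the setup, it can help to confirm that the nbd module is loaded. The Go sketch below is an illustrative helper, not part of the PR: the moduleLoaded function is hypothetical, and it simply looks for the module name in /proc/modules. Note that a driver built directly into the kernel does not appear in /proc/modules, so a missing entry is not conclusive on its own.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// moduleLoaded reports whether a module called name appears in the contents
// of /proc/modules. Each line there starts with the module name, followed by
// its size, reference count, dependencies, state and load address.
func moduleLoaded(procModules, name string) bool {
	sc := bufio.NewScanner(strings.NewReader(procModules))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) > 0 && fields[0] == name {
			return true
		}
	}
	return false
}

func main() {
	data, err := os.ReadFile("/proc/modules")
	if err != nil {
		fmt.Println("cannot read /proc/modules:", err)
		return
	}
	if moduleLoaded(string(data), "nbd") {
		fmt.Println("nbd module is loaded")
	} else {
		fmt.Println("nbd module not loaded; try: sudo modprobe nbd")
	}
}
```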
I have modified the launch-simple-longhorn script to use NBD at both the frontend and the backend, with a configurable number of parallel connections. The command used to run the engine with 16 frontend connections and 16 backend connections is:
Additional information or context
Frontend implementation details:
Backend implementation details:
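As a hedged illustration of the kind of work a replica-side NBD backend performs (not code from this PR), the Go sketch below applies already-decoded READ, WRITE and FLUSH requests to a backing store and returns the errno to put in the reply header. The Backend interface, the handle function and the in-memory memDisk type are hypothetical; command codes and errno values (EIO = 5, EINVAL = 22) follow the NBD protocol specification. A real replica would use its data file, which as an *os.File already satisfies io.ReaderAt, io.WriterAt and Sync.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// Command types from the NBD protocol specification.
const (
	cmdRead  uint16 = 0
	cmdWrite uint16 = 1
	cmdFlush uint16 = 3
)

// Reply errno values defined by the NBD specification (subset).
const (
	errOK    uint32 = 0
	errIO    uint32 = 5  // EIO
	errInval uint32 = 22 // EINVAL
)

// Backend is whatever stores the replica's data (hypothetical interface).
type Backend interface {
	io.ReaderAt
	io.WriterAt
}

// handle applies one decoded request to b and returns the reply errno plus,
// for reads, the data to send after the reply header.
func handle(b Backend, typ uint16, offset uint64, length uint32, payload []byte) (uint32, []byte) {
	switch typ {
	case cmdRead:
		buf := make([]byte, length)
		// Per io.ReaderAt, a short read always comes with an error; only n matters here.
		if n, _ := b.ReadAt(buf, int64(offset)); n != len(buf) {
			return errIO, nil
		}
		return errOK, buf
	case cmdWrite:
		if uint32(len(payload)) != length {
			return errInval, nil
		}
		if _, err := b.WriteAt(payload, int64(offset)); err != nil {
			return errIO, nil
		}
		return errOK, nil
	case cmdFlush:
		// Persist data if the backend supports it (e.g. *os.File.Sync).
		if s, ok := b.(interface{ Sync() error }); ok {
			if err := s.Sync(); err != nil {
				return errIO, nil
			}
		}
		return errOK, nil
	default:
		return errInval, nil
	}
}

// memDisk is an in-memory stand-in for a replica data file, used only to
// make the sketch runnable.
type memDisk []byte

func (m memDisk) ReadAt(p []byte, off int64) (int, error) {
	if off < 0 || off >= int64(len(m)) {
		return 0, io.EOF
	}
	n := copy(p, m[off:])
	if n < len(p) {
		return n, io.EOF
	}
	return n, nil
}

func (m memDisk) WriteAt(p []byte, off int64) (int, error) {
	if off < 0 || off+int64(len(p)) > int64(len(m)) {
		return 0, errors.New("write out of range")
	}
	return copy(m[off:], p), nil
}

func main() {
	disk := make(memDisk, 4096)
	code, _ := handle(disk, cmdWrite, 512, 5, []byte("hello"))
	fmt.Println(code) // prints "0"
	code, data := handle(disk, cmdRead, 512, 5, nil)
	fmt.Println(code, string(data)) // prints "0 hello"
}
```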