Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disk mapping broken on DXP6800 Pro #9

Closed
b-reich opened this issue Jun 15, 2024 · 21 comments
Closed

Disk mapping broken on DXP6800 Pro #9

b-reich opened this issue Jun 15, 2024 · 21 comments

Comments

@b-reich
Copy link
Contributor

b-reich commented Jun 15, 2024

I tried the hctl and serial mapping methods

# led-disk mapping
mapping_method=hctl  # hctl, serial
led_map=(disk1 disk2 disk3 disk4 disk5 disk6 disk7 disk8)
# hctl, $> lsblk -S -o name,hctl
hctl_map=("0:0:0:0" "1:0:0:0" "2:0:0:0" "3:0:0:0" "4:0:0:0" "5:0:0:0" "6:0:0:0" "7:0:0:0")
# serial number, $> lsblk -S -o name,serial
serial_map=("ZR5DG9XX" "ZR5F8FXX" "ZR5EW2XX" "ZR5F8AX" "ZR5DH0XX" "ZR5F8GNXX" "placeholder6" "placeholder7")

hctl: changes by everytime I reboot or add/remove a disk
serial: Disk6 is off all the time, and I think there is bug when one disk is missing. I test it more later.

running on Debian 12, Linux 6.1.0-21 using dkms

@miskcoo
Copy link
Owner

miskcoo commented Jun 15, 2024

Can you check that whether disk6 appears in /sys/class/leds? If not, the issue is in the module.

Regarding the script, it is unfortunate to hear that the hctl mapping does not work on your model. For the disk6 issue, it may be because the script detects and creates the disk map at startup, so it can only handle disk removal. I may rewrite it to better handle the disk insertion event when I find some time.

If the issue lies in the problem mentioned above, a workaround could be adding a udev rule to restart the systemd service / the script when a disk removal or insertion event occurs.

@b-reich
Copy link
Contributor Author

b-reich commented Jun 15, 2024

maybe mapping by/dev/disk/by-path/ is a thing?

@b-reich
Copy link
Contributor Author

b-reich commented Jun 15, 2024

@miskcoo the kernel modules itself works.
grafik

Disk6 doenst work only with serial
and hctl is totaly random everytime

@miskcoo
Copy link
Owner

miskcoo commented Jun 15, 2024

maybe mapping by/dev/disk/by-path/ is a thing?

That seems work and is equivalent to HTCL in my model. How about yours? Would it be stable? I guess using the alphabetical order of these paths would be a possible solution.

# /dev/disks/by-path
lrwxrwxrwx 1 root root   9 Jun  9 16:41 pci-0000:00:17.0-ata-1 -> ../../sda
lrwxrwxrwx 1 root root   9 Jun  9 16:41 pci-0000:00:17.0-ata-1.0 -> ../../sda
lrwxrwxrwx 1 root root   9 Jun 10 18:59 pci-0000:00:17.0-ata-2 -> ../../sdc
lrwxrwxrwx 1 root root   9 Jun 10 18:59 pci-0000:00:17.0-ata-2.0 -> ../../sdc
lrwxrwxrwx 1 root root   9 Jun  9 16:20 pci-0000:01:00.0-ata-1 -> ../../sdd
lrwxrwxrwx 1 root root   9 Jun  9 16:20 pci-0000:01:00.0-ata-1.0 -> ../../sdd
lrwxrwxrwx 1 root root   9 Jun 15 15:23 pci-0000:01:00.0-ata-2 -> ../../sdb
lrwxrwxrwx 1 root root   9 Jun 15 15:23 pci-0000:01:00.0-ata-2.0 -> ../../sdb

# output of lsblk
NAME HCTL
sda  0:0:0:0
sdb  3:0:0:0
sdc  1:0:0:0
sdd  2:0:0:0

@miskcoo
Copy link
Owner

miskcoo commented Jun 15, 2024

For the disk6 issue in the serial mapping mode, can you apply this patch and run the script ugreen-diskiomon manually to observe the output?

diff --git a/scripts/ugreen-diskiomon b/scripts/ugreen-diskiomon
index 1c00117..76152ec 100755
--- a/scripts/ugreen-diskiomon
+++ b/scripts/ugreen-diskiomon
@@ -43,6 +43,7 @@ done <<< "$(lsblk -S -o name,${mapping_method} | tail -n +2)"
 for i in "${!led_map[@]}"; do
     led=${led_map[i]} 
     if [[ -d /sys/class/leds/$led ]]; then
+        echo "set ${led} to the oneshot mode"
         echo oneshot > /sys/class/leds/$led/trigger
         echo 1 > /sys/class/leds/$led/invert
         echo 100 > /sys/class/leds/$led/delay_on
@@ -57,6 +58,7 @@ for i in "${!led_map[@]}"; do
             devices[$led]=${dev}
         else
             # turn off the led if no disk installed on this slot
+            echo "${dev} not found, turn off ${led}"
             echo 0 > /sys/class/leds/$led/brightness
             echo none > /sys/class/leds/$led/trigger
         fi

@b-reich
Copy link
Contributor Author

b-reich commented Jun 15, 2024

root@r5d4:~# ls -lah /dev/disk/by-path/
total 0
drwxr-xr-x 2 root root 440 Jun 14 21:13 .
drwxr-xr-x 8 root root 160 Jun 14 21:02 ..
lrwxrwxrwx 1 root root   9 Jun 14 21:13 pci-0000:00:17.0-ata-1 -> ../../sdc
lrwxrwxrwx 1 root root   9 Jun 14 21:13 pci-0000:00:17.0-ata-1.0 -> ../../sdc
lrwxrwxrwx 1 root root   9 Jun 14 21:13 pci-0000:00:17.0-ata-2 -> ../../sda
lrwxrwxrwx 1 root root   9 Jun 14 21:13 pci-0000:00:17.0-ata-2.0 -> ../../sda
lrwxrwxrwx 1 root root  13 Jun 14 21:13 pci-0000:01:00.0-nvme-1 -> ../../nvme0n1
lrwxrwxrwx 1 root root  15 Jun 14 21:13 pci-0000:01:00.0-nvme-1-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root  15 Jun 14 21:13 pci-0000:01:00.0-nvme-1-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root  15 Jun 14 21:13 pci-0000:01:00.0-nvme-1-part3 -> ../../nvme0n1p3
lrwxrwxrwx 1 root root  13 Jun 14 21:13 pci-0000:56:00.0-nvme-1 -> ../../nvme1n1
lrwxrwxrwx 1 root root  15 Jun 14 21:13 pci-0000:56:00.0-nvme-1-part1 -> ../../nvme1n1p1
lrwxrwxrwx 1 root root  15 Jun 14 21:13 pci-0000:56:00.0-nvme-1-part2 -> ../../nvme1n1p2
lrwxrwxrwx 1 root root  15 Jun 14 21:13 pci-0000:56:00.0-nvme-1-part3 -> ../../nvme1n1p3
lrwxrwxrwx 1 root root   9 Jun 14 21:13 pci-0000:59:00.0-ata-1 -> ../../sdf
lrwxrwxrwx 1 root root   9 Jun 14 21:13 pci-0000:59:00.0-ata-1.0 -> ../../sdf
lrwxrwxrwx 1 root root   9 Jun 14 21:13 pci-0000:59:00.0-ata-2 -> ../../sdb
lrwxrwxrwx 1 root root   9 Jun 14 21:13 pci-0000:59:00.0-ata-2.0 -> ../../sdb
lrwxrwxrwx 1 root root   9 Jun 14 21:13 pci-0000:59:00.0-ata-3 -> ../../sdd
lrwxrwxrwx 1 root root   9 Jun 14 21:13 pci-0000:59:00.0-ata-3.0 -> ../../sdd
lrwxrwxrwx 1 root root   9 Jun 14 21:13 pci-0000:59:00.0-ata-4 -> ../../sde
lrwxrwxrwx 1 root root   9 Jun 14 21:13 pci-0000:59:00.0-ata-4.0 -> ../../sde


root@r5d4:~# lsblk -S -o name,hctl
NAME HCTL
sda  1:0:0:0
sdb  3:0:0:0
sdc  0:0:0:0
sdd  4:0:0:0
sde  5:0:0:0
sdf  2:0:0:0

@miskcoo
Copy link
Owner

miskcoo commented Jun 15, 2024

It seems they are also the same?

@b-reich
Copy link
Contributor Author

b-reich commented Jun 15, 2024

not sure, if i understand the arch wiki correct
https://wiki.archlinux.org/title/persistent_block_device_naming#by-id_and_by-path
it should be the correct mapping between the leds and disks.

I have some spare time later, I can give it a try

@meyergru
Copy link

meyergru commented Jun 15, 2024

I do not see anything that has not been said before. The mapping of sdX to physical devices (as denoted by /dev/disk/by-path or HCTL) is indeed random - that is why we do not rely on sdX enumeration for locating the disks.

In your cited example, we have:

0:0:0:0 = pci-0000:00:17.0-ata-1 = sdc = disk1
1:0:0:0 = pci-0000:00:17.0-ata-2 = sda = disk2
2:0:0:0 = pci-0000:59:00.0-ata-1 = sdf = disk3
3:0:0:0 = pci-0000:59:00.0-ata-2 = sdb = disk4
4:0:0:0 = pci-0000:59:00.0-ata-3 = sdd = disk5
5:0:0:0 = pci-0000:59:00.0-ata-4 = sde = disk6

So what is wrong, actually with using HCTL for location?

What I care more about is what happens when you remove a disk in the middle, say, disk4.
If in that case, the list above was missing 3:0:0:0, everything is fine. If only 5:0:0:0 was missing, we would be back to square one.

@yooooki
Copy link
Contributor

yooooki commented Jun 15, 2024

Your serial of disk 6 seems 1 letter longer than other disks. Is that a typo in your post or in your script?

@b-reich
Copy link
Contributor Author

b-reich commented Jun 15, 2024

Your serial of disk 6 seems 1 letter longer than other disks. Is that a typo in your post or in your script?

u are right...
there was a 5 from placeholder5 left
damit

@b-reich
Copy link
Contributor Author

b-reich commented Jun 15, 2024

I do not see anything that has not been said before. The mapping of sdX to physical devices (as denoted by /dev/disk/by-path or HCTL) is indeed random - that is why we do not rely on sdX enumeration for locating the disks.

In your cited example, we have:

0:0:0:0 = pci-0000:00:17.0-ata-1 = sdc = disk1
1:0:0:0 = pci-0000:00:17.0-ata-2 = sda = disk2
2:0:0:0 = pci-0000:59:00.0-ata-1 = sdf = disk3
3:0:0:0 = pci-0000:59:00.0-ata-2 = sdb = disk4
4:0:0:0 = pci-0000:59:00.0-ata-3 = sdd = disk5
5:0:0:0 = pci-0000:59:00.0-ata-4 = sde = disk6

So what is wrong, actually with using HCTL for location?

What I care more about is what happens when you remove a disk in the middle, say, disk4. If in that case, the list above was missing 3:0:0:0, everything is fine. If only 5:0:0:0 was missing, we would be back to square one.

Disk6 is actual sda at the moment.
sda == Disk6
sdb == Disk2
sdc == Disk5
sdd == Disk3
sde == Disk4
sdf == Disk1

serial seems to work now

@meyergru
Copy link

meyergru commented Jun 15, 2024

As I said: sdX is random with every reboot - what would be more interesting is if the slot numbers did not correlate with the HCTL order or are - as you claim - random, too (which I doubt).

Iff that is the case, we would have to reorder the HCTL values in hctl_map to fix it - BUT I have just verified for my DXP8800, the HCTL order does in fact correlate with the order of the disk slots and the by-path correlates with HCTL:

# /dev/disk/by-path
lrwxrwxrwx 1 root root  9 Jun 15 14:32 pci-0000:59:00.0-ata-1 -> ../../sda
lrwxrwxrwx 1 root root  9 Jun 15 14:32 pci-0000:59:00.0-ata-2 -> ../../sdc
lrwxrwxrwx 1 root root  9 Jun 15 14:32 pci-0000:59:00.0-ata-3 -> ../../sdb
lrwxrwxrwx 1 root root  9 Jun 15 14:32 pci-0000:59:00.0-ata-4 -> ../../sdd
lrwxrwxrwx 1 root root  9 Jun 15 14:32 pci-0000:5a:00.0-ata-1 -> ../../sde
lrwxrwxrwx 1 root root  9 Jun 15 14:32 pci-0000:5a:00.0-ata-2 -> ../../sdf
lrwxrwxrwx 1 root root  9 Jun 15 14:32 pci-0000:5a:00.0-ata-3 -> ../../sdg
lrwxrwxrwx 1 root root  9 Jun 15 14:32 pci-0000:5a:00.0-ata-4 -> ../../sdh

# lsblk -S -x hctl -o name,hctl,serial
NAME HCTL       SERIAL
sda  0:0:0:0    XXKERENY
sdc  1:0:0:0    XXKG2BXY
sdb  2:0:0:0    XXGMU62K
sdd  3:0:0:0    XXKJEZBY
sde  4:0:0:0    XXKJHBYY
sdf  5:0:0:0    XXGT2Z9L
sdg  6:0:0:0    XXKH3SGZ
sdh  7:0:0:0    XXJDB19L

As you can see, sdb and sdc are out of order (i.e. they can be random), but HCTL is not - plus, I have verified the slot order by looking at each serial. I think @miskcoo did the same for his DXP4800.

In order to check if the DXP6800 is special, please show that HCTL order does not fit your slot order by showing us the output of "lsblk -S -x hctl -o name,hctl,serial" (we assume that the verified slot order of your serial numbers in the initial post is 1-6).

@meyergru
Copy link

meyergru commented Jun 15, 2024

@miskcoo: You should probably change the comment:

# serial number, $> lsblk -S -o name,serial

to:

# serial number, $> lsblk -S -x hctl -o hctl,serial,name

Otherwise, people will just execute that and list the serial numbers in the random sdX order (which will be wrong in almost all cases).

@b-reich
Copy link
Contributor Author

b-reich commented Jun 15, 2024

I am fine with the serial implementation at the moment.
I am currently trying to find out how UGOS solved it. Running it in a VM.

Also taking an eye on /dev/disk/by-path. Just because it's out of order doesn't mean it's random. But admittedly it looks very much like it. And unfortunately probably true.

Thanks for your work and support! :)

@b-reich
Copy link
Contributor Author

b-reich commented Jun 15, 2024

I can check hctl after a fresh reboot. Running badblocks at the moment (24h left)

@b-reich
Copy link
Contributor Author

b-reich commented Jun 16, 2024

hctl seems to work, maybe the readme should include a step to configure the mapping?

#1st reboot
root@r5d4:~# lsblk -S -x hctl -o hctl,serial,name
HCTL       SERIAL   NAME
0:0:0:0    ZR5DH0xx sda
1:0:0:0    ZR5F8Gxx sdb
2:0:0:0    ZR5DG9xx sdf
3:0:0:0    ZR5F8Fxx sdc
4:0:0:0    ZR5EW2xx sdd
5:0:0:0    ZR5F8Axx sde

#2nd reboot 
HCTL       SERIAL   NAME
0:0:0:0    ZR5DH0xx sdc
1:0:0:0    ZR5F8Gxx sdb
2:0:0:0    ZR5DG9xx sda
3:0:0:0    ZR5F8Fxx sde
4:0:0:0    ZR5EW2xx sdd
5:0:0:0    ZR5F8Axx sdf

Correct order is

ZR5DG9xx 
ZR5F8Fxx 
ZR5EW2xx
ZR5F8Axx 
ZR5DH0xx
ZR5F8Gxx

Think we can close this issue.

@meyergru
Copy link

meyergru commented Jun 16, 2024

O.K., so HCTL is predictable, but in the case of your DXP6800, the mapping between HCTL and slots is:

0:0:0:0 -> disk5
1:0:0:0 -> disk6
2:0:0:0 -> disk1
3:0:0:0 -> disk2
4:0:0:0 -> disk3
5:0:0:0 -> disk4

Which is unexpected and makes HCTL mapping for that device useless as-is.

Whereas with the DXP4800 it is (expected):

0:0:0:0 -> disk1
1:0:0:0 -> disk2
2:0:0:0 -> disk3
3:0:0:0 -> disk4

and with the DXP8800 it is:

0:0:0:0 -> disk1
1:0:0:0 -> disk2
2:0:0:0 -> disk3
3:0:0:0 -> disk4
4:0:0:0 -> disk5
5:0:0:0 -> disk6
6:0:0:0 -> disk7
7:0:0:0 -> disk8

So, the issue can be closed, but:

  1. Documentation should be changed
  2. HCTL order must be made model-dependent

@b-reich
Copy link
Contributor Author

b-reich commented Jun 16, 2024

Maybe, my dropbear-initramfs mess this up (my root volume is encrypted).
But thats just a random idea.

@meyergru
Copy link

Nope, most probably not. This has to do with how the SATA controllers are connected to the PCIe bus and then to the disk slots. There are up to 4 SATA channels on each controller chip, but yours is not in this configuration:

first controller -> disk1-4
second controller -> disk5-6

but:

first controller -> disk5-6
second controller -> disk1-4

which results in the HCTL order we see.

For the DXP8800, the configuration is:

first controller -> disk1-4
second controller -> disk5-8

@miskcoo
Copy link
Owner

miskcoo commented Jun 17, 2024

Interesting. I think I should mention this in the document or script.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants