DS918+ with Sabrent NT-SS5G - Random Crash, DSM unresponsive, Unsafe Shutdown #96

Open
dedura opened this issue Jan 11, 2023 · 38 comments

@dedura

dedura commented Jan 11, 2023

Description of the problem

Hi,
Since mid-December, I have been experiencing random crashes and disconnects on my DS918+ with my Sabrent NT-SS5G adapter, using the latest driver (v. 1.3.3.0-10). DSM itself becomes totally unresponsive and won't let me stop or restart the driver in Package Center; after 1-2 minutes, the whole NAS suddenly crashes and restarts. After booting, the NAS reports that the system was shut down unsafely and starts Data Scrubbing.
This happens every 3-4 days. I tried both the rear and front USB ports of the NAS, but the issue remained.

Description of your products

NAS: Synology DS918+
DSM: 7.1.1-42962 Update 3
Adapter: SABRENT NT-SS5G
Driver: 1.3.3.0-10 DSM-7.x (reuploaded)
RAM: 16GB
Other USB Port used for: (UPS) CP1500EPFCLCD - Cyber Power System, Inc.

Description of your environment

Connection: From "DS918+" to PC's NIC "Marvell® AQtion AQC107 10Gb Ethernet"
PC Motherboard: ASUS ROG MAXIMUS XII FORMULA Z490
PC OS: Windows 11 Pro 22H2
Ethernet Driver version: 3.1.7.0
Cable: VENTION 1m CAT 8 Ethernet Cable
Connection used for: SMB, WinNUT-2.0 (UPS)

The adapter was working fine without any issues before December. Could this have been caused by the latest DSM update (Update 3)?
I hope you can help fix this.
Thank you!

@bb-qq
Owner

bb-qq commented Jan 28, 2023

Do you have any other USB devices connected, and what are the results of lsusb -a?

@dedura
Author

dedura commented Jan 28, 2023

Hi,
I am now using my previous 2.5G CLUB 3D CAC-1420 Adapter with the driver "r8152, 2.16.3-3 DSM7.x (reuploaded)", which works fine without any issues.

Only the Ethernet Adapter and the UPS are connected, nothing else.
Please see below the output of lsusb:

|__usb1 1d6b:0002:0404 09 2.00 480MBit/s 0mA 1IF (Linux 4.4.180+ xhci-hcd xHCI Host Controller 0000:00:15.0) hub
  |__1-3 0764:0501:0001 00 2.00 12MBit/s 2mA 1IF (CPS CP1500EPFCLCD CRXLW2000395)
  |__1-4 f400:f400:0100 00 2.00 480MBit/s 200mA 1IF (Synology DiskStation 7F008AFA20E41640)
|__usb2 1d6b:0003:0404 09 3.00 5000MBit/s 0mA 1IF (Linux 4.4.180+ xhci-hcd xHCI Host Controller 0000:00:15.0) hub
  |__2-2 0bda:8156:3000 00 3.20 5000MBit/s 512mA 1IF (Realtek USB 10/100/1G/2.5G LAN 000000001)

@bb-qq
Owner

bb-qq commented Jan 29, 2023

Hmmm, from the symptoms it looks like a problem with the NT-SS5G itself. You might want to connect it to your PC to see whether it has any stability issues there.

Or, if you can return the NT-SS5G, you could try the QNA-UC5G1T. I am also using a DS918+, and this device runs stably.

@dedura
Author

dedura commented Jan 29, 2023

Thank you, I followed your advice and ordered the QNA-UC5G1T. Will provide feedback in the next couple of days after testing.

@dedura
Author

dedura commented Feb 2, 2023

So, I have returned the NT-SS5G and got the QNA-UC5G1T. It has now been running fine for 24 hours without crashing. I will monitor it for at least a week and update you again.
I have noticed that my maximum speed is 355-360 MB/s (SMB). If you are using a Windows PC, could you share the network adapter settings of your NIC in Device Manager? I could possibly tweak mine a little to get the full speed.

@dedura
Author

dedura commented Feb 2, 2023

Providing iperf3 output:
(Only getting a max of 355-360 MB/s (SMB) as mentioned above)
OS: Windows 11 Pro 22H2

iperf3 -c 192.168.xx.xx -P 2
Connecting to host 192.168.xx.xx, port 5201
[ 4] local 192.168.yy.yy port 61286 connected to 192.168.xx.xx port 5201
[ 6] local 192.168.yy.yy port 61287 connected to 192.168.xx.xx port 5201
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 186 MBytes 1.56 Gbits/sec
[ 6] 0.00-1.00 sec 186 MBytes 1.56 Gbits/sec
[SUM] 0.00-1.00 sec 372 MBytes 3.12 Gbits/sec

[ 4] 1.00-2.00 sec 201 MBytes 1.69 Gbits/sec
[ 6] 1.00-2.00 sec 200 MBytes 1.68 Gbits/sec
[SUM] 1.00-2.00 sec 401 MBytes 3.37 Gbits/sec

[ 4] 2.00-3.00 sec 194 MBytes 1.63 Gbits/sec
[ 6] 2.00-3.00 sec 190 MBytes 1.60 Gbits/sec
[SUM] 2.00-3.00 sec 384 MBytes 3.22 Gbits/sec

[ 4] 3.00-4.00 sec 208 MBytes 1.74 Gbits/sec
[ 6] 3.00-4.00 sec 206 MBytes 1.73 Gbits/sec
[SUM] 3.00-4.00 sec 414 MBytes 3.47 Gbits/sec

[ 4] 4.00-5.00 sec 171 MBytes 1.43 Gbits/sec
[ 6] 4.00-5.00 sec 170 MBytes 1.43 Gbits/sec
[SUM] 4.00-5.00 sec 340 MBytes 2.86 Gbits/sec

[ 4] 5.00-6.00 sec 205 MBytes 1.72 Gbits/sec
[ 6] 5.00-6.00 sec 204 MBytes 1.71 Gbits/sec
[SUM] 5.00-6.00 sec 409 MBytes 3.43 Gbits/sec

[ 4] 6.00-7.00 sec 195 MBytes 1.64 Gbits/sec
[ 6] 6.00-7.00 sec 194 MBytes 1.63 Gbits/sec
[SUM] 6.00-7.00 sec 389 MBytes 3.27 Gbits/sec

[ 4] 7.00-8.00 sec 203 MBytes 1.70 Gbits/sec
[ 6] 7.00-8.00 sec 202 MBytes 1.70 Gbits/sec
[SUM] 7.00-8.00 sec 406 MBytes 3.40 Gbits/sec

[ 4] 8.00-9.00 sec 194 MBytes 1.62 Gbits/sec
[ 6] 8.00-9.00 sec 192 MBytes 1.61 Gbits/sec
[SUM] 8.00-9.00 sec 386 MBytes 3.23 Gbits/sec

[ 4] 9.00-10.00 sec 209 MBytes 1.75 Gbits/sec
[ 6] 9.00-10.00 sec 208 MBytes 1.75 Gbits/sec
[SUM] 9.00-10.00 sec 417 MBytes 3.50 Gbits/sec

[ ID] Interval Transfer Bandwidth
[ 4] 0.00-10.00 sec 1.92 GBytes 1.65 Gbits/sec sender
[ 4] 0.00-10.00 sec 1.92 GBytes 1.65 Gbits/sec receiver
[ 6] 0.00-10.00 sec 1.91 GBytes 1.64 Gbits/sec sender
[ 6] 0.00-10.00 sec 1.91 GBytes 1.64 Gbits/sec receiver
[SUM] 0.00-10.00 sec 3.83 GBytes 3.29 Gbits/sec sender
[SUM] 0.00-10.00 sec 3.83 GBytes 3.29 Gbits/sec receiver
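For comparison with the SMB figure: iperf3 reports bits per second, while the 355-360 MB/s copy speed is in bytes per second. A minimal sketch of the conversion, assuming decimal units (1 MB = 10^6 bytes, 1 Gbit = 10^9 bits) and ignoring protocol overhead; the function names are illustrative only.

```shell
#!/bin/sh
# Rate conversions between iperf3 units (Gbit/s) and file-copy units (MB/s).
# Assumes decimal prefixes: 1 Gbit = 10^9 bits, 1 MB = 10^6 bytes, 8 bits per byte.

gbit_to_mbyte() {
    awk -v g="$1" 'BEGIN { printf "%.2f\n", g * 1000 / 8 }'
}

mbyte_to_gbit() {
    awk -v m="$1" 'BEGIN { printf "%.2f\n", m * 8 / 1000 }'
}

gbit_to_mbyte 3.29   # [SUM] receiver average from the run above
mbyte_to_gbit 360    # upper end of the observed SMB speed
gbit_to_mbyte 5      # nominal 5GbE line rate
```

With these inputs, the iperf3 aggregate of 3.29 Gbit/s works out to about 411 MB/s and 360 MB/s to about 2.88 Gbit/s, both below the roughly 625 MB/s nominal 5GbE line rate.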

@dedura
Author

dedura commented Feb 5, 2023

Update: Since my last post, it has disconnected 4 times, and each time I had to manually stop the driver and start it again.
The good news: It didn't freeze, crash or restart my NAS.
Have you encountered this problem?
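Until the root cause is found, the manual stop/start could in principle be automated. The following is a minimal sketch, not a tested tool: it assumes the package is named aqc111 (matching /var/packages/aqc111 mentioned in this thread), that DSM's synopkg command accepts stop and start for it, and that ping supports -c and -W. PEER_IP, its placeholder address, check_link and the log tag are hypothetical; the script could be run as root from DSM's Task Scheduler. It only works around the disconnects and presumably would not help with full NAS crashes.

```shell
#!/bin/sh
# Hypothetical link watchdog for the aqc111 package (sketch, not a tested tool).
# Assumptions: package name "aqc111", DSM's synopkg supports stop/start,
# and ping supports -c (count) and -W (timeout in seconds).

PEER_IP="${PEER_IP:-192.168.0.10}"   # hypothetical address of the PC on the 5GbE link

# Both commands can be overridden, e.g. to test the logic away from the NAS.
PROBE_CMD="${PROBE_CMD:-ping -c 3 -W 2 $PEER_IP}"
RESTART_CMD="${RESTART_CMD:-synopkg stop aqc111 && synopkg start aqc111}"

# check_link: returns 0 if the peer answers; otherwise logs, reloads the driver, returns 1.
check_link() {
    if sh -c "$PROBE_CMD" >/dev/null 2>&1; then
        return 0
    fi
    logger -t aqc111-watchdog "peer unreachable, reloading driver" 2>/dev/null || true
    sh -c "$RESTART_CMD"
    return 1
}

# Example: schedule a script that calls check_link every few minutes as root.
# check_link
```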

@jaqb

jaqb commented Mar 9, 2023

I'm experiencing the same issue with my DS920+. I have also returned the NT-SS5G and got the QNA-UC5G1T. I even got the recommended SABRENT hub with a power adapter, but the issue still persists. One time my NAS restarted by itself, which was bad, but usually it just loses connection and I need to restart the driver. Most of the time I can restart the driver, but sometimes it's just impossible.

@dedura
Author

dedura commented Mar 9, 2023

I have installed an older driver version, "1.3.3.0-8 DSM-7.x". It has been working completely fine without a single crash or reboot since 25th February. See if that works for you.

@jaqb

jaqb commented Mar 9, 2023

Thanks. I have downgraded to 1.3.3.0-8. I roughly know how to make the driver crash, so I'll test it out.

@jaqb

jaqb commented Mar 10, 2023

Nope, already had 2 improper shutdowns. Downgrading does not fix the issue for me.

@dedura
Author

dedura commented Mar 15, 2023

Same here: the whole system just crashed, rebooted and started Data Scrubbing. I have now gone back to the 2.5G adapter.

@dedura
Author

dedura commented Mar 16, 2023

OK, now my 2.5G adapter crashes too with the latest "r8152" driver. As I mentioned in my initial post, I believe something got messed up after DSM Update 3.

@jaqb

jaqb commented Mar 16, 2023

I would love to hear from @bb-qq regarding this issue. Is there a way I can help pinpoint the problem?

@bb-qq
Owner

bb-qq commented Mar 18, 2023

I am wondering how much traffic is flowing through the adapter before it becomes unstable. Heat might be causing the problem.

If you plugged that adapter into a Windows PC and kept the same amount of traffic flowing through it, would it work stably for an extended period of time?

@bb-qq
Owner

bb-qq commented Mar 18, 2023

I am also curious as to how much memory you have in your NAS.

The versions of the driver discussed in this thread include changes in kernel parameters related to memory, so it is possible that those changes are causing the problem.

@dedura
Author

dedura commented Mar 18, 2023

I have 16GB of memory installed (2x 8GB) from Crucial.
Traffic does not seem to be the issue for me, as the driver randomly crashes even when transferring a few photos or documents. It also crashes randomly when I open Surveillance Station on my PC or back up using Synology Drive.
I have tried the adapter on Windows 10 & 11 and copied multiple GB-sized files without any issues; it didn't crash.

The changes in Kernel Parameters could be true as the issue started with the DSM Update 3. Is there a fix for it?

@jaqb

jaqb commented Mar 18, 2023

I've got 20GB of RAM (4+16). I also don't think it's about the amount of traffic or the temperature, but I can't be 100% sure. For me, crashes happen when I do something with WebDAV and Plex, like streaming from the WebDAV server, but sometimes also just when refreshing the metadata in Plex. The only thing I can say about the temperature is that one time when it crashed, I touched the casing of the QNA-UC5G1T and it was just barely warm. Is there a way to check the internal temperature of the QNA-UC5G1T? I do have both "Low Power 5G" and "Thermal throttling" set to ON to keep the temperature in check.

@bb-qq
Owner

bb-qq commented Mar 21, 2023

> The changes in Kernel Parameters could be true as the issue started with the DSM Update 3. Is there a fix for it?

I was referring to the changes on the driver's side. (#96 (comment))
I don't know the details of the changes on the DSM side.

> Is there a way to check the internal temperature of QNA-UC5G1T ?

As far as I know, there is no way to know the internal temperature. The only measure I can think of is to place it in a well-ventilated area and see the difference.
(I saw a post once that said removing the case and installing a fan stabilized it, but I think it would be risky to go that far.)

@bb-qq
Owner

bb-qq commented Mar 21, 2023

I don't have any ideas for investigating the cause, but since your NAS seems to have plenty of memory, could you try doubling the value of target_value in /var/packages/aqc111/scripts/apply-memory-setting? It is unlikely to improve the situation, though.

@dedura
Author

dedura commented Mar 21, 2023

Thanks for your reply @bb-qq
I have now doubled the target value and restarted the NAS. Will test it out and provide feedback.

root@:/var/packages/aqc111/scripts# cat apply-memory-setting
#!/bin/sh

set -eu

target_value=524288
current_value=`sysctl -n vm.min_free_kbytes`
if [ "${current_value}" -lt "${target_value}" ]
then
sysctl -w vm.min_free_kbytes=${target_value}
fi
root@:/var/packages/aqc111/scripts# vim apply-memory-setting
root@:/var/packages/aqc111/scripts# cat apply-memory-setting
#!/bin/sh

set -eu

target_value=1048576
current_value=`sysctl -n vm.min_free_kbytes`
if [ "${current_value}" -lt "${target_value}" ]
then
sysctl -w vm.min_free_kbytes=${target_value}
fi
root@:/var/packages/aqc111/scripts#
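For scale: vm.min_free_kbytes is expressed in KiB, so the original target_value of 524288 asks the kernel to keep 512 MiB free and the doubled 1048576 keeps 1 GiB free. A small sketch that shows the reserve in MiB and as a share of MemTotal; the helper names are illustrative, and MemTotal is slightly less than the installed RAM.

```shell
#!/bin/sh
# Put vm.min_free_kbytes (a value in KiB) into perspective.

# kib_to_mib VALUE_KIB -> whole MiB
kib_to_mib() {
    echo $(( $1 / 1024 ))
}

# reserve_percent MIN_FREE_KIB MEMTOTAL_KIB -> integer percent, rounded down
reserve_percent() {
    echo $(( $1 * 100 / $2 ))
}

# On a live Linux system, read the current setting and MemTotal from /proc.
if [ -r /proc/sys/vm/min_free_kbytes ] && [ -r /proc/meminfo ]; then
    min_free=$(cat /proc/sys/vm/min_free_kbytes)
    mem_total=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
    echo "min_free_kbytes = ${min_free} KiB ($(kib_to_mib "$min_free") MiB)," \
        "$(reserve_percent "$min_free" "$mem_total")% of MemTotal"
fi
```

For example, reserve_percent 1048576 16777216 prints 6 and reserve_percent 1048576 8388608 prints 12, i.e. the doubled value holds back roughly 6% of a 16 GiB system or 12% of an 8 GiB one.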

@dedura
Author

dedura commented Mar 22, 2023

Hi @bb-qq - The whole NAS crashed this morning. I turned the PC on and opened a file (an Excel spreadsheet) via SMB; the adapter itself was cold, not even slightly warm, yet the whole NAS crashed and rebooted. Upon boot, it started data scrubbing on the volume.
I also want to mention that I ran the Memory Test via Synology Assistant last night and it passed without any errors.
I have no idea what else I can do to troubleshoot.

Since you have the same Synology model, have you not encountered any of these issues yourself?
Do you mind me asking what your specs are, i.e. memory (official/unofficial), DSM version, the NIC on your PC and its driver version?
I'm not sure whether my PC's NIC driver might be causing these crashes.
I am using the latest driver from Marvell (v3.1.7.0)

@bb-qq
Owner

bb-qq commented Mar 22, 2023

> Since you have the same Synology model, have you not encountered any of these issues yourself?

A few times a year, when I did not have low power mode enabled on the device, it stopped responding and I had to reload the driver. However, I have never experienced a NAS crash.

> Do you mind me asking what your specs are, i.e. Memory (Official/Unofficial), DSM version, NIC on the PC and the driver version of that.

My environment is as follows:

  • Memory: Unofficial
$ sudo dmidecode --type memory
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.

Handle 0x0023, DMI type 16, 23 bytes
Physical Memory Array
        Location: System Board Or Motherboard
        Use: System Memory
        Error Correction Type: None
        Maximum Capacity: 16 GB
        Error Information Handle: No Error
        Number Of Devices: 2

Handle 0x0024, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x0023
        Error Information Handle: No Error
        Total Width: 8 bits
        Data Width: 8 bits
        Size: 8192 MB
        Form Factor: SODIMM
        Set: None
        Locator: ChannelA-DIMM0
        Bank Locator: BANK 0
        Type: DDR3
        Type Detail: Synchronous
        Speed: 1600 MT/s
        Manufacturer: Samsung
        Serial Number: 35701618
        Asset Tag: 9876543210
        Part Number: M471B1G73BH0-YK0
        Rank: Unknown
        Configured Memory Speed: 1600 MT/s
        Minimum Voltage: Unknown
        Maximum Voltage: Unknown
        Configured Voltage: Unknown

Handle 0x0025, DMI type 17, 40 bytes
Memory Device
        Array Handle: 0x0023
        Error Information Handle: No Error
        Total Width: 8 bits
        Data Width: 8 bits
        Size: 8192 MB
        Form Factor: SODIMM
        Set: None
        Locator: ChannelB-DIMM0
        Bank Locator: BANK 1
        Type: DDR3
        Type Detail: Synchronous
        Speed: 1600 MT/s
        Manufacturer: Samsung
        Serial Number: 35701618
        Asset Tag: 9876543210
        Part Number: M471B1G73BH0-YK0
        Rank: Unknown
        Configured Memory Speed: 1600 MT/s
        Minimum Voltage: Unknown
        Maximum Voltage: Unknown
        Configured Voltage: Unknown
  • DSM version: 7.1.1-42962 Update 4
$ cat /etc/VERSION
majorversion="7"
minorversion="1"
major="7"
minor="1"
micro="1"
productversion="7.1.1"
buildphase="GM"
buildnumber="42962"
smallfixnumber="4"
nano="4"
base="42962"
builddate="2023/02/01"
buildtime="20:01:57"
  • QNA-UC5G1T FW version: 3.1.6 (latest FW on the QNAP website)
  • Connected USB port: front port with a stock cable
  • PC NIC: AQN-107 (direct connection)
  • PC NIC Driver: 2.2.3.0
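To compare memory configurations across the machines in this thread, the per-slot Size lines from dmidecode can be totalled. A sketch assuming the "Size: 8192 MB" format shown above (newer dmidecode versions may print GB instead, which it also handles); total_dimm_mb is a hypothetical helper name.

```shell
#!/bin/sh
# Sum installed DIMM sizes from `dmidecode --type memory` output on stdin.
# Handles "Size: 8192 MB" and "Size: 8 GB"; empty slots ("No Module Installed") are skipped.
total_dimm_mb() {
    awk '
        $1 == "Size:" && $2 ~ /^[0-9]+$/ {
            if ($3 == "MB") total += $2
            else if ($3 == "GB") total += $2 * 1024
            n++
        }
        END { printf "%d MB in %d module(s)\n", total, n }
    '
}

# Example (requires root): sudo dmidecode --type memory | total_dimm_mb
```

Fed the dmidecode output above, it would report 16384 MB in 2 module(s).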

@dedura
Author

dedura commented Mar 22, 2023

Thank you - the specs look nearly identical to mine.
The last option I could try is to update to the DSM 7.2 BETA version and see if that makes any difference.
It would be great if you could provide an updated driver that works with the 7.2 Beta.
Thanks

@bb-qq
Owner

bb-qq commented Mar 23, 2023

I created drivers for the DSM 7.2 BETA, but I think it is unlikely that the DSM update will improve the symptoms.
https://github.com/bb-qq/aqc111/releases/tag/1.3.3.0-11

I wish I could at least find the cause of the reboot...

@dedura
Author

dedura commented Mar 23, 2023

Thank you @bb-qq , appreciated.
I have also ordered 2x 4GB of memory, which is the maximum supported memory for the Intel Celeron J3455 according to Intel's website.
Some users claim it won't utilise anything above 8GB, or that the system crashes if it tries, so I'll find out whether this makes any difference. If you require any system outputs/logs from me, please let me know.

@jaqb

jaqb commented Mar 23, 2023

bb-qq already said he also has 2x8GB, so I don't think that's it. I'm currently testing something and it's looking good; I'm going to stay with 1.3.3.0-10 while I test it. Btw, how full is your system partition (/dev/md0)? Check with df -h.

@dedura
Author

dedura commented Mar 23, 2023

@jaqb - Here you go. Looking forward to hearing about your test results.
Does this look right?

root@:~# df -h /dev/md0
Filesystem Size Used Avail Use% Mounted on
/dev/md0 2.3G 1.9G 365M 84% /
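Since the DSM system partition is small (2.3G here) and jaqb reports weird issues once it reached 100%, a periodic check may be useful. A sketch built on POSIX df -P, whose fifth column is the capacity percentage; the 90% threshold and the function names are assumptions.

```shell
#!/bin/sh
# Warn when the filesystem holding a path passes a usage threshold.

# usage_percent PATH -> integer percent used, from column 5 of POSIX `df -P`
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# check_usage PATH THRESHOLD -> prints a status line; returns 1 at or above THRESHOLD
check_usage() {
    used=$(usage_percent "$1")
    if [ "$used" -ge "$2" ]; then
        echo "WARNING: $1 is ${used}% full (threshold ${2}%)"
        return 1
    fi
    echo "OK: $1 is ${used}% full"
}

# Example: on DSM the root filesystem is /dev/md0, as in the df output above.
# check_usage / 90
```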

@dedura
Author

dedura commented Mar 25, 2023

@jaqb - Just wondering, do you use your M.2 SSD as a cache or a volume?
I had mine set up as a volume (volume2) for over a year, and the aqc111 driver was installed on that volume. Upon checking the log files (/var/log/messages), I found quite a few error messages related to volume2.

synostgvolume[840]: fs_btrfs_metadata_usage_query.c:137 Failed to check the btrfs metadata usage of volume [/volume2].

The above message is repeated multiple times.
I have now removed volume2 and am using the SSD as a normal cache.
I also replaced the 16GB RAM with 2x 4GB.
So far it runs stably; even booting/restarting the NAS is much faster than before.
I will test and provide feedback.
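To see how often the btrfs message quoted above appears, and for which volumes, the log can be summarised per volume. A sketch using grep, sed and uniq; it matches the message text quoted in this comment and defaults to /var/log/messages, and count_btrfs_errors is a hypothetical name.

```shell
#!/bin/sh
# Count the btrfs metadata-usage errors quoted above, grouped by volume.
# count_btrfs_errors [LOGFILE]  (defaults to /var/log/messages)
count_btrfs_errors() {
    log="${1:-/var/log/messages}"
    # Keep only matching lines, extract the volume name between [ and ], then tally.
    grep 'Failed to check the btrfs metadata usage of volume' "$log" |
        sed -n 's/.*of volume \[\([^]]*\)].*/\1/p' |
        sort | uniq -c
}

# Example (as root): count_btrfs_errors
```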

@jaqb

jaqb commented Mar 25, 2023

84% used seems about right. I'm at 82% now, but I was at 100% a couple of days ago, which caused a lot of weird issues. I had to delete a bunch of logs to get it this low.

I use two M.2 SSDs as a read-write cache.

@jaqb

jaqb commented Apr 4, 2023

@dedura So I don't have any crashes anymore. At first, I lowered the MTU (jumbo frame) to 5000 on the Synology. This fixed the driver crashes for me, but the speed to my PC was worse than before. Then I noticed that I can set my PC's network adapter jumbo frame to 4088 bytes, so I matched that on the Synology too, and now I get good transfer speeds. (The Synology is set to 4000.)

@bb-qq Are your PC and Synology both set to MTU 9000, and you don't experience any driver crashes?

@dedura
Author

dedura commented Apr 5, 2023

@jaqb - No crashes or freezing for me for two weeks now, since replacing the RAM with 2x 4GB, even though the 16GB passed the memory test.
MTU on PC: 9014, on Synology: 9000. Stable, no issues.
It looks like our fixes were entirely different, but I'm happy it works now.
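The two working setups line up if the Windows "Jumbo Packet" value counts the 14-byte Ethernet header while the Synology MTU does not (9014 = 9000 + 14). Under that assumption, a PC value of 4088 corresponds to an MTU of 4074, so jaqb's Synology setting of 4000 fits within it. A sketch of the arithmetic, which also gives the largest ping payload that fits unfragmented (MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header), usable with a don't-fragment ping such as ping -M do -s on Linux iputils or ping -f -l on Windows; the helper names are illustrative.

```shell
#!/bin/sh
# MTU helper arithmetic. Assumption: the Windows driver's "Jumbo Packet" value
# counts the 14-byte Ethernet header, while a Linux/DSM MTU does not.
ETH_HDR=14      # destination MAC (6) + source MAC (6) + EtherType (2)
IPV4_ICMP=28    # 20-byte IPv4 header + 8-byte ICMP echo header

jumbo_to_mtu() { echo $(( $1 - ETH_HDR )); }
mtu_to_jumbo() { echo $(( $1 + ETH_HDR )); }

# Largest ping payload that fits in one unfragmented IPv4 packet at this MTU.
max_ping_payload() { echo $(( $1 - IPV4_ICMP )); }

# mtu_fits NAS_MTU WINDOWS_JUMBO -> succeeds if the NAS MTU does not exceed the PC's MTU
mtu_fits() { [ "$1" -le "$(jumbo_to_mtu "$2")" ]; }

# The two configurations reported in this thread.
for pair in "9000 9014" "4000 4088"; do
    set -- $pair
    if mtu_fits "$1" "$2"; then verdict=fits; else verdict="too large"; fi
    echo "NAS MTU $1 vs Jumbo Packet $2 (MTU $(jumbo_to_mtu "$2")): $verdict;" \
        "max DF ping payload $(max_ping_payload "$1")"
done
```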

@bb-qq
Owner

bb-qq commented Apr 22, 2023

I also have the MTU set to 9000 on my PC and NAS and have never experienced a crash.

@jaqb

jaqb commented Aug 10, 2023

I have found a way to crash the driver.

I mounted a folder using NFS on Windows. Then I played two 4K movie/show remuxes (i.e. high bitrate) at the same time using mpv and started seeking forward through the videos. This crashed the aqc111 driver every time for me.

Hopefully you will be able to reproduce this.
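A rough way to approximate this load from a Linux client with the share mounted (for example under WSL2) is two concurrent readers that each jump to random offsets in a large file. This is only a sketch: it does not reproduce mpv's actual access pattern, and seek_read, the seeds and the /mnt/nas paths are hypothetical.

```shell
#!/bin/sh
# Rough load generator: random 1 MiB reads from a large file, imitating seeking.
# seek_read FILE JUMPS [SEED] -> reads JUMPS randomly chosen 1 MiB chunks of FILE
seek_read() {
    file=$1 jumps=$2 seed=${3:-1}
    size=$(wc -c < "$file")
    chunks=$(( $size / 1048576 ))
    [ "$chunks" -gt 0 ] || return 1
    # awk emits JUMPS random chunk indices in [0, chunks - 1].
    awk -v n="$jumps" -v max="$chunks" -v seed="$seed" \
        'BEGIN { srand(seed); for (i = 0; i < n; i++) print int(rand() * max) }' |
    while read -r skip; do
        # Read one 1 MiB block starting at chunk "skip" and discard it.
        dd if="$file" of=/dev/null bs=1048576 skip="$skip" count=1 2>/dev/null
    done
}

# Example with two concurrent "viewers" (hypothetical paths on an NFS mount):
# seek_read /mnt/nas/movie1.mkv 500 1 &
# seek_read /mnt/nas/movie2.mkv 500 2 &
# wait
```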

@jaqb

jaqb commented Nov 16, 2023

@bb-qq Have you tried reproducing this issue?

@bb-qq
Owner

bb-qq commented Nov 19, 2023

I didn't know how to use NFS on Windows, so I mounted the share via CIFS (SMB) and put load on it, but the symptoms did not reproduce; the connection stayed up.

I also ran iperf and CrystalDiskMark under load for an extended period of time and could not reproduce the problem.

> One time my NAS restarted by itself, so that was bad. But usually it just loses connection and I need to restart the driver. Most of the time I can restart the driver but sometimes it's just impossible to do this.

I am concerned about this symptom. While driver instability is often reported depending on the environment, reports of the NAS itself crashing are rare. As the posts in this thread indicate, those were usually due to hardware-related issues such as RAM or SSD.

@jaqb

jaqb commented Dec 2, 2023

@bb-qq To enable NFS on Windows, go to "Turn Windows features on or off" in Windows settings and check "Services for NFS". Then enable NFS for the shared folder on the Synology (for your Windows PC) and access the NFS share on Windows using the full path, e.g. \\SYNOLOGYNAS\volume1\sharedfolder
I think replicating this is the key to resolving the crash issues in your awesome driver. Hopefully you will be able to replicate it now.

@bb-qq
Owner

bb-qq commented Jun 15, 2024

I have tried applying the load both the way you described and using WSL2, but the problem did not reproduce in my environment.

I still think it has to do with a hardware problem, as described in one of the previous comments.
If the same problem occurs even after removing the SSD or going back to the factory-installed RAM, I don't know what to do...

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants