u/Professional_Worry_1

r/zfs

USB Enclosure Running ZFS - My problems and solution!

For the past few years, my homelab server has been rock solid. It's a mini PC with a TB4 eGPU and a 9-bay TerraMaster USB 3.2 Gen 2 enclosure with 12 TB disks in RAIDZ2, running Debian.

My first and only real issue was sporadic I/O and reset errors on the USB drives/enclosure under heavy read/write load. I'd usually see these errors during a scrub, and they would put my pool in a degraded state. I went through customer support emails, many USB cables, and power supplies. The solution has been to disable the UAS driver and use the old-school BOT (Bulk-Only Transport) usb-storage driver. It has been fantastic, with no issues at all serving production applications and media needs.
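The post doesn't say exactly how UAS was turned off. One common way (an assumption here, not necessarily what I did originally) is the `u` quirk flag, IGNORE_UAS, which stops the uas driver from binding to that USB ID so usb-storage takes over. The `0bda:9201` ID is the enclosure's bridge ID used in the GRUB line further down; check yours with `lsusb`. Set (or append to) this line in `/etc/default/grub`, run `sudo update-grub`, and reboot:

```shell
# /etc/default/grub (fragment; grub-mkconfig sources this file as shell)
# 'u' = IGNORE_UAS: don't bind the uas driver to this VID:PID,
# so the enclosure falls back to usb-storage (Bulk-Only Transport).
GRUB_CMDLINE_LINUX="usb-storage.quirks=0bda:9201:u"
```

After rebooting, `lsusb -t` should list the enclosure's interface with `Driver=usb-storage` instead of `Driver=uas`.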

UAS/UASP (USB Attached SCSI) is the newer USB storage protocol that lets a drive behave more like a real SATA/SCSI device over USB: it supports command queuing with multiple commands in flight, better full-duplex behavior, and lower CPU overhead, which usually means better random I/O and better performance in multi-drive enclosures. usb-storage is the older BOT mode: it's simpler and often more stable with flaky USB bridges, but it handles one command at a time, so it can become a bigger bottleneck when ZFS is reading and writing many blocks across drives. For my ZFS/DAS use, UAS is “better” when the enclosure/controller is stable, because it allows higher throughput and responsiveness, but usb-storage can be the safer choice if UAS causes resets, disconnects, or I/O timeouts.
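One way to see the "one command at a time" difference on a running system is each SCSI disk's `queue_depth` in sysfs: a disk behind usb-storage typically reports 1, while a disk behind uas typically reports a larger value. This is a sketch, not something from my original setup; `report_queue_depth` is a hypothetical helper, and it takes an optional sysfs root so it can be tried against a fake directory tree. `lsusb -t` shows which driver (`uas` or `usb-storage`) is actually bound.

```shell
#!/bin/sh
# report_queue_depth [SYSFS_ROOT]
# Hypothetical helper: print the SCSI queue depth of every sd* disk.
# SYSFS_ROOT defaults to /sys; pass another directory to test against a fake tree.
report_queue_depth() {
    root=${1:-/sys}
    for dev in "$root"/block/sd*; do
        qd_file="$dev/device/queue_depth"
        # Skips the unexpanded glob (no sd* disks) and disks without the attribute.
        [ -r "$qd_file" ] || continue
        printf '%s queue_depth=%s\n' "${dev##*/}" "$(cat "$qd_file")"
    done
}
```

Running `report_queue_depth` once with UAS disabled and once with it enabled is a quick way to confirm which mode the enclosure is really in.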

This week, I started planning to use ZFS's expand feature, followed by numerous scrubs and then a rewrite. So far, so good, but as usual it's taking a lot of time. That got me thinking about the newer, more efficient UAS driver I haven't been using. In theory, it would be much better suited to all of this ZFS rework while I keep using the storage at the same time. So I switched back to UAS and started trying to Linux-quirk my way out of this USB issue.

There isn't much info out there beyond "just disable UAS and use usb-storage," but that's not what I want to do at all. Since adding the following GRUB kernel parameters, it has been stable so far and has given me better parallel I/O speeds!

>GRUB_CMDLINE_LINUX_DEFAULT="quiet splash usbcore.autosuspend=-1 pci=noaer nvidia-drm.modeset=1 iommu=pt"
>GRUB_CMDLINE_LINUX="usb-storage.quirks=0bda:9201:rg"
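Edits to `/etc/default/grub` only take effect after `sudo update-grub` and a reboot, so it's worth confirming the running kernel actually received the quirk. A minimal sketch, assuming the entry appears as its own space-separated token on the kernel command line; `has_quirk` is a hypothetical helper that reads `/proc/cmdline` by default:

```shell
#!/bin/sh
# has_quirk ENTRY [CMDLINE_FILE]
# Hypothetical helper: succeed if "usb-storage.quirks=ENTRY" appears as a whole
# token on the kernel command line (CMDLINE_FILE defaults to /proc/cmdline).
# Note: only matches when the parameter holds exactly this one entry.
has_quirk() {
    entry=$1
    file=${2:-/proc/cmdline}
    # One token per line, then a fixed-string, whole-line match.
    tr ' ' '\n' < "$file" | grep -qxF "usb-storage.quirks=$entry"
}
```

For example, `has_quirk 0bda:9201:rg && echo "quirk active"`.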

The g flag in the usb-storage quirk is what solved my issue. (The usb-storage.quirks parameter is also read by the uas driver, which is why it works while running UAS.) I also enabled IOMMU passthrough (iommu=pt), and I'm adding the r flag (IGNORE_RESIDUE), which tells the kernel to ignore the residue values this USB-to-SATA bridge reports, since some bridges report bogus ones. Hopefully it isn't ignoring anything meaningful. The flags and their uses are documented under usb-storage.quirks in the Linux kernel's kernel-parameters documentation.
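Each quirks entry has the form `VID:PID:FLAGS` (the kernel parameter accepts a comma-separated list of these). The hypothetical helper below splits one entry and names its flag letters. It deliberately maps only a few letters (`g`, `m`, `r`, `s`, `u`), using the names from the kernel's usb-storage.quirks documentation, and marks anything else as not mapped rather than guessing.

```shell
#!/bin/sh
# decode_quirks ENTRY
# Hypothetical helper: split one usb-storage.quirks entry (VID:PID:FLAGS)
# and print a name for each flag letter. Only a handful of letters are mapped.
decode_quirks() {
    vid=${1%%:*}
    rest=${1#*:}
    pid=${rest%%:*}
    case $rest in
        *:*) flags=${rest#*:} ;;
        *)   flags= ;;          # no FLAGS field at all
    esac
    printf '%s:%s' "$vid" "$pid"
    while [ -n "$flags" ]; do
        letter=${flags%"${flags#?}"}   # first character
        flags=${flags#?}               # drop it
        case $letter in
            g) name=MAX_SECTORS_240 ;;
            m) name=MAX_SECTORS_64 ;;
            r) name=IGNORE_RESIDUE ;;
            s) name=SINGLE_LUN ;;
            u) name=IGNORE_UAS ;;
            *) name="$letter(not-mapped)" ;;
        esac
        printf ' %s' "$name"
    done
    printf '\n'
}
```

For the entry in my GRUB line, `decode_quirks 0bda:9201:rg` prints `0bda:9201 IGNORE_RESIDUE MAX_SECTORS_240`.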

The g flag (MAX_SECTORS_240) caps each I/O command at 240 sectors of 512 bytes (120 KiB), which isn't ideal, but it still gives me huge improvements over usb-storage.
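If the cap is active, the disk's `max_hw_sectors_kb` in sysfs should read 120 (240 sectors of 512 bytes). That exact value is an assumption worth confirming on your own system; `check_sector_cap` below is a hypothetical helper with an optional sysfs root so it can be tested against a fake tree.

```shell
#!/bin/sh
# check_sector_cap DISK [SYSFS_ROOT]
# Hypothetical helper: compare a disk's max_hw_sectors_kb with the cap the
# 'g' quirk is expected to set (240 sectors * 512 bytes = 120 KiB).
# SYSFS_ROOT defaults to /sys; pass another directory to test against a fake tree.
check_sector_cap() {
    disk=$1
    root=${2:-/sys}
    expected_kb=$((240 * 512 / 1024))
    f="$root/block/$disk/queue/max_hw_sectors_kb"
    if [ ! -r "$f" ]; then
        echo "$disk: cannot read $f" >&2
        return 2
    fi
    actual_kb=$(cat "$f")
    if [ "$actual_kb" -eq "$expected_kb" ]; then
        echo "$disk: capped at $actual_kb KiB per command"
    else
        echo "$disk: max_hw_sectors_kb=$actual_kb (expected $expected_kb for the 240-sector cap)"
        return 1
    fi
}
```

For example, `check_sector_cap sdc` on the live system, with `sdc` standing in for one of the enclosure's disks.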

Anyway, I hope this helps someone in a similar situation. I'll also post the I/O and USB errors I was getting below. I'd love to hear the elite folks' thoughts on what I found and whether there are any other solutions to this. Just know I've tried quite a bit!

[  363.554936] sd 2:0:0:0: [sdc] tag#26 uas_eh_abort_handler 0 uas-tag 2 inflight: CMD IN
[  363.554949] sd 2:0:0:0: [sdc] tag#26 CDB: Read(16) 88 00 00 00 00 00 83 68 64 00 00 00 01 b0 00 00
[  363.555103] sd 2:0:0:0: [sdc] tag#9 uas_eh_abort_handler 0 uas-tag 3 inflight: CMD IN
[  363.555109] sd 2:0:0:0: [sdc] tag#9 CDB: Read(16) 88 00 00 00 00 00 83 68 63 b8 00 00 00 48 00 00
[  363.555193] sd 2:0:0:0: [sdc] tag#6 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD IN
[  363.555198] sd 2:0:0:0: [sdc] tag#6 CDB: Read(16) 88 00 00 00 00 00 83 68 62 e0 00 00 00 d8 00 00
[  363.555281] usb 4-1.1.3: stat urb: no pending cmd for uas-tag 1
[  363.590943] scsi host2: uas_eh_device_reset_handler start
[  363.667172] usb 4-1.1.3: reset SuperSpeed USB device number 8 using xhci_hcd
[  363.807651] scsi host2: uas_eh_device_reset_handler success
[  374.010991] sd 2:0:0:0: [sdc] tag#26 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD
[  374.011005] sd 2:0:0:0: [sdc] tag#26 CDB: Test Unit Ready 00 00 00 00 00 00
[  374.011012] scsi host2: uas_eh_device_reset_handler start
[  374.087185] usb 4-1.1.3: reset SuperSpeed USB device number 8 using xhci_hcd
[  374.226044] scsi host2: uas_eh_device_reset_handler success
[  374.226053] sd 2:0:0:0: Device offlined - not ready after error recovery
[  374.226067] sd 2:0:0:0: Device offlined - not ready after error recovery
[  374.226071] sd 2:0:0:0: Device offlined - not ready after error recovery
[  374.226083] sd 2:0:0:0: [sdc] tag#6 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=41s
[  374.226089] sd 2:0:0:0: [sdc] tag#6 CDB: Read(16) 88 00 00 00 00 00 83 68 62 e0 00 00 00 d8 00 00
[  374.226092] I/O error, dev sdc, sector 2204656352 op 0x0:(READ) flags 0x0 phys_seg 27 prio class 2
[  374.226127] zio pool=tank vdev=/dev/disk/by-id/usb-ST12000N_T001--0:0-part1 error=5 type=1 offset=1128783003648 size=110592 flags=2148533424
[  374.226145] sd 2:0:0:0: [sdc] tag#9 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=41s
[  374.226149] sd 2:0:0:0: [sdc] tag#9 CDB: Read(16) 88 00 00 00 00 00 83 68 63 b8 00 00 00 48 00 00
[  374.226152] I/O error, dev sdc, sector 2204656568 op 0x0:(READ) flags 0x0 phys_seg 9 prio class 2
[  374.226177] zio pool=tank vdev=/dev/disk/by-id/usb-ST12000N_T001--0:0-part1 error=5 type=1 offset=1128783114240 size=36864 flags=3145904
[  374.226190] sd 2:0:0:0: [sdc] tag#26 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=41s
[  374.226194] sd 2:0:0:0: [sdc] tag#26 CDB: Read(16) 88 00 00 00 00 00 83 68 64 00 00 00 01 b0 00 00
[  374.226196] I/O error, dev sdc, sector 2204656640 op 0x0:(READ) flags 0x0 phys_seg 54 prio class 2
[  374.226220] zio pool=tank vdev=/dev/disk/by-id/usb-ST12000N_T001--0:0-part1 error=5 type=1 offset=1128783151104 size=221184 flags=2148533424
[  374.226284] sd 2:0:0:0: rejecting I/O to offline device
[  374.226284] I/O error, dev sdc, sector 2204657072 op 0x0:(READ) flags 0x4000 phys_seg 128 prio class 2
[  374.226297] I/O error, dev sdc, sector 2204659088 op 0x0:(READ) flags 0x4000 phys_seg 128 prio class 2
[  374.226326] I/O error, dev sdc, sector 2204658096 op 0x0:(READ) flags 0x0 phys_seg 124 prio class 2
[  374.226339] I/O error, dev sdc, sector 2204660112 op 0x0:(READ) flags 0x0 phys_seg 124 prio class 2
[  374.226358] zio pool=tank vdev=/dev/disk/by-id/usb-ST12000N_T001--0:0-part1 error=5 type=1 offset=1128783372288 size=1032192 flags=2148533424
[  374.226364] I/O error, dev sdc, sector 2204661104 op 0x0:(READ) flags 0x4000 phys_seg 128 prio class 2
[  374.226373] zio pool=tank vdev=/dev/disk/by-id/usb-ST12000N_T001--0:0-part1 error=5 type=1 offset=1128784404480 size=1032192 flags=2148533424
[  374.226374] I/O error, dev sdc, sector 2204662128 op 0x0:(READ) flags 0x0 phys_seg 124 prio class 2
[  374.226378] zio pool=tank vdev=/dev/disk/by-id/usb-ST12000N_T001--0:0-part1 error=5 type=1 offset=1128785436672 size=1032192 flags=2148533424
[  374.226430] I/O error, dev sdc, sector 2576 op 0x0:(READ) flags 0x0 phys_seg 2 prio class 2
[  374.226459] zio pool=tank vdev=/dev/disk/by-id/usb-ST12000N_T001--0:0-part1 error=5 type=1 offset=270336 size=8192 flags=1245377
[  374.226476] zio pool=tank vdev=/dev/disk/by-id/usb-ST12000N_T001--0:0-part1 error=5 type=1 offset=12000127623168 size=8192 flags=1245377
[  374.226490] zio pool=tank vdev=/dev/disk/by-id/usb-ST12000N_T001--0:0-part1 error=5 type=1 offset=12000127885312 size=8192 flags=1245377
[  374.226556] zio pool=tank vdev=/dev/disk/by-id/usb-ST12000N_T001--0:0-part1 error=5 type=1 offset=1128786468864 size=1032192 flags=2148533424
[  374.226695] zio pool=tank vdev=/dev/disk/by-id/usb-ST12000N_T001--0:0-part1 error=5 type=1 offset=1128787501056 size=1032192 flags=2148533424
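To read the log: each `CDB: Read(16)` line is a SCSI READ(16) command whose bytes 2 to 9 hold the starting LBA and bytes 10 to 13 the transfer length in logical blocks, both big-endian. The hypothetical helper below decodes them, assuming 512-byte logical sectors (which is consistent with the `zio ... size=` values in this log).

```shell
#!/bin/sh
# decode_read16 "B0 B1 ... B15"
# Hypothetical helper: decode a SCSI Read(16) CDB as printed by the kernel
# (16 hex bytes separated by spaces). Bytes 2-9 hold the starting LBA and
# bytes 10-13 the transfer length in logical blocks, both big-endian.
# Assumes 512-byte logical blocks when converting to bytes.
decode_read16() {
    # Intentional word splitting: each hex byte becomes $1..$16.
    # shellcheck disable=SC2086
    set -- $1
    if [ "$#" -ne 16 ] || [ "$1" != "88" ]; then
        echo "not a Read(16) CDB" >&2
        return 1
    fi
    lba=$((0x$3$4$5$6$7$8$9${10}))
    blocks=$((0x${11}${12}${13}${14}))
    printf 'lba=%d sectors=%d bytes=%d\n' "$lba" "$blocks" "$((blocks * 512))"
}
```

For example, the tag#26 CDB decodes to `lba=2204656640 sectors=432 bytes=221184`, matching the `sector 2204656640` I/O error and the `size=221184` zio error for that request. The three failed reads (216, 72, and 432 sectors) are also contiguous, and the 432-sector one is larger than the 240-sector limit that the g flag imposes.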
reddit.com
u/Professional_Worry_1 — 8 days ago