u/__cplusplus2

▲ 7 r/zfs

ZFS L2ARC collapsed from 700GB to 360GB, high l2_abort_lowmem

The zfs pool:

                                                            capacity     operations     bandwidth 
pool                                                      alloc   free   read  write   read  write
--------------------------------------------------------  -----  -----  -----  -----  -----  -----
Exos                                                      23.7T  41.8T     21     40  1.22M  3.39M
  raidz1-0                                                23.7T  41.8T     21     40  1.22M  3.39M
    ata-ST24000NM001H-3KS113_ZYD5B1W9                         -      -      7     13   414K  1.13M
    ata-ST24000NM001H-3KS113_ZYD5AV8T                         -      -      7     13   420K  1.13M
    ata-ST24000NM001H-3KS113_ZYD5B1DT                         -      -      7     13   419K  1.13M
cache                                                         -      -      -      -      -      -
  nvme-Lexar_SSD_NM1090_PRO_2TB_QC6614R000385P3200-part1   364G   590G     21     10  2.66M  1.16M
--------------------------------------------------------  -----  -----  -----  -----  -----  -----

kernel: 6.19.13+deb14-amd64

installed package:

zfsutils-linux:
  Installed: 2.4.1-1
  Candidate: 2.4.1-1
  Version table:
     2.4.2-2 100
        100 http://deb.debian.org/debian sid/contrib amd64 Packages
 *** 2.4.1-1 500
        500 http://deb.debian.org/debian testing/contrib amd64 Packages
        100 /var/lib/dpkg/status

System Memory: 128GB

I capped my ARC at 8GB because the rest of the system memory is needed elsewhere. That worked fine at first, until the L2ARC reached about 700GB and simply stopped taking in more data. While researching the issue I suspected that l2arc_meta_percent was limiting me, but raising it to 50 changed nothing.
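For anyone checking the l2arc_meta_percent theory, here is a rough sketch of the arithmetic. It assumes the header cap is simply zfs_arc_max * l2arc_meta_percent / 100 and that the default percentage is 33. Both are assumptions taken from how the parameter is described, not something verified against the 2.4.1 source. The function name is made up for illustration.

```python
# Rough L2ARC header budget.
# ASSUMPTION: the cap is zfs_arc_max * l2arc_meta_percent / 100.
GIB = 1024 ** 3


def l2_header_budget(arc_max_bytes: int, meta_percent: int) -> int:
    """Bytes of ARC that L2ARC headers may occupy under the assumed formula."""
    return arc_max_bytes * meta_percent // 100


arc_max = 8589934592  # zfs_arc_max from the post (8 GiB)

# 33 is ASSUMED to be the default; 50 is the value set in the post.
for percent in (33, 50):
    budget = l2_header_budget(arc_max, percent)
    print(f"l2arc_meta_percent={percent}: {budget / GIB:.2f} GiB of headers")
```

Under these assumptions the budget is already larger than the roughly 2GB of l2_hdr_size mentioned later in the post, which would fit with the change to 50 having no visible effect.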

Then one day I ran a heavily non-sequential read/write workload overnight. When I woke up, my L2ARC had drained to only 424G and the header size (l2_hdr_size) had dropped from around 2GB to 822MiB. I also noticed l2_abort_lowmem = 254671. I had never seen this counter above zero before, and it hasn't increased since (1 day). Note that nothing RAM-heavy was running at the time; I still had a good 60GB free (not counting the Linux page cache).

The L2ARC has kept shrinking since then and is currently at 364G.

The only non-default settings are:

options zfs zfs_arc_max=8589934592
options zfs l2arc_write_max=16777216
options zfs l2arc_meta_percent=50
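
Options in a modprobe config only apply when the module is loaded, so it can be worth confirming what the running module actually uses. A minimal sketch, assuming the usual /sys/module/zfs/parameters directory is present (read_zfs_params is a made-up helper name):

```python
from pathlib import Path


def read_zfs_params(names, base="/sys/module/zfs/parameters"):
    """Return {name: value or None} for each module parameter file under base."""
    values = {}
    for name in names:
        path = Path(base) / name
        # None means the file is missing (module not loaded or unknown name).
        values[name] = path.read_text().strip() if path.is_file() else None
    return values


wanted = ["zfs_arc_max", "l2arc_write_max", "l2arc_meta_percent"]
for name, value in read_zfs_params(wanted).items():
    shown = value if value is not None else "(not available)"
    print(f"{name} = {shown}")
```

If the printed values differ from the three options above, the config was likely not picked up at module load.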

The evicted data wasn't cold or stale: before the event my L2ARC hit rate was well above 90%, around 95%. Now it's nowhere near that, and I see my HDDs being utilized far more. Here is a link to arcstats from just after the anomaly: https://pastebin.com/raw/cxnY82e3, and another from a day later: https://pastebin.com/raw/79Dfx7hu
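
For comparing the two arcstats dumps, here is a minimal sketch of how the L2ARC hit rate can be derived from the l2_hits and l2_misses counters. The counters are cumulative, so the rate for the period between two snapshots comes from their differences. The sample numbers below are made up for illustration and are not taken from the pastebin links; the function names are hypothetical.

```python
def parse_arcstats(text):
    """Parse 'name type data' rows from arcstats text into {name: int}."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly three fields with a numeric value last;
        # this skips the kstat header line and the column-title line.
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def l2_hit_rate(stats):
    """Cumulative L2ARC hit rate in percent, or None if there were no lookups."""
    hits = stats.get("l2_hits", 0)
    total = hits + stats.get("l2_misses", 0)
    return 100.0 * hits / total if total else None


def l2_hit_rate_between(old, new):
    """Hit rate in percent for the interval between two snapshots."""
    delta = {key: new.get(key, 0) - old.get(key, 0)
             for key in ("l2_hits", "l2_misses")}
    return l2_hit_rate(delta)


# Made-up sample snapshots, NOT the values from the post.
before = parse_arcstats("name type data\nl2_hits 4 950\nl2_misses 4 50\n")
after = parse_arcstats("name type data\nl2_hits 4 1550\nl2_misses 4 450\n")
print(f"cumulative before: {l2_hit_rate(before):.1f}%")
print(f"interval between snapshots: {l2_hit_rate_between(before, after):.1f}%")
```

On a live system the input text would come from /proc/spl/kstat/zfs/arcstats; the same parser also exposes l2_hdr_size and l2_abort_lowmem for tracking over time.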

u/__cplusplus2 — 2 days ago