u/krismatu

Suboptimal allocator behavior under heavy load with a somewhat asymmetric device setup

Greetings, everybody.
I'm having some trouble with my volume, which consists of 3x NVMe and 3x HDD.

Device label                Device     State   Size   Used  Use%  Leaving
bhdd.seaJ6ER (device 24):   sdc4       rw     15.8T   174G    1%
bhdd.tosh21F0 (device 13):  sda4       rw     10.5T  3.21T   30%    4.25M
bhdd.tosh4310 (device 14):  sdb4       rw     10.5T  2.96T   28%
bnvme.970evo (device 5):    nvme2n1p6  rw     62.8G  61.8G   97%    27.8G
bnvme.990pro (device 23):   nvme1n1p6  rw      387G   217G   57%     212G
bnvme.sn720 (device 11):    nvme0n1p6  rw     74.0G  72.8G   97%    38.2G

nvme1 is substantially bigger (and faster) than the other two NVMe devices. sdc is somewhat bigger than the other two HDDs and was added recently, so it is barely filled.

Now, under heavy load, iostat shows:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.4%    0.0%    1.0%   38.4%    0.0%   58.2%

     r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz Device
   35.33      4.4M    37.53  51.5%    1.45   126.4k nvme0n1
   16.80      1.0M     0.00   0.0%    0.79    63.6k nvme1n1
   21.33      3.3M    33.40  61.0%    3.52   158.0k nvme2n1
  163.00     15.2M   259.80  61.4%  229.71    95.7k sda
  211.27     28.1M   582.67  73.4%  171.60   136.3k sdb
   90.87      5.5M    27.60  23.3%   27.99    61.8k sdc

     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz Device
   30.60      2.9M     5.20  14.5%    0.59    98.7k nvme0n1
   82.00     44.0M    67.80  45.3%    1.82   549.8k nvme1n1
   23.67      2.3M     4.87  17.1%    2.46    98.7k nvme2n1
   26.73     43.4M   119.40  81.7%  324.65     1.6M sda
    5.20      2.5M    35.93  87.4%  381.90   498.4k sdb
    3.80    136.5k     1.93  33.7%    4.18    35.9k sdc

     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz Device
    6.93      2.7M     3.87  35.8%    0.55   398.8k nvme0n1
    3.80      5.2M     1.40  26.9%    0.81     1.4M nvme1n1
    8.27      2.1M     0.00   0.0%    2.26   256.0k nvme2n1
    0.00      0.0k     0.00   0.0%    0.00     0.0k sda
    0.00      0.0k     0.00   0.0%    0.00     0.0k sdb
    0.00      0.0k     0.00   0.0%    0.00     0.0k sdc

     f/s f_await  aqu-sz  %util Device
    3.40    0.25    0.07   0.6% nvme0n1
    3.40    2.16    0.17   2.9% nvme1n1
    3.40    2.08    0.16   2.7% nvme2n1
    3.33  114.66   46.50  85.9% sda
    3.33   90.46   38.54  85.3% sdb
    3.33    3.84    2.57   6.7% sdc

As you can see, sdc is barely used (6.7% utilization), while sda and sdb both sit at about 85%. The heavy load on sda/sdb is therefore the bottleneck.
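
To put rough numbers on the imbalance, here is a minimal Python sketch that compares each HDD's share of free space with its share of write throughput, using only the figures pasted above. The "writes proportional to free space" baseline is just an assumption for comparison, not a claim about how the bcachefs allocator actually decides. Also note that iostat reports whole disks, so the throughput includes traffic from the other volume on the same hardware.

```python
# Figures copied from the usage table and the iostat write section above.
# Assumption: sizes are treated as binary units (1T = 1024G, 1M = 1024k);
# only the ratios matter here, so the exact unit convention is not critical.
G_PER_T = 1024.0
K_PER_M = 1024.0

hdds = {
    # name: (size in G, used in G, write throughput in kB/s)
    "sda": (10.5 * G_PER_T, 3.21 * G_PER_T, 43.4 * K_PER_M),
    "sdb": (10.5 * G_PER_T, 2.96 * G_PER_T, 2.5 * K_PER_M),
    "sdc": (15.8 * G_PER_T, 174.0, 136.5),
}


def shares(values):
    """Normalize a dict of non-negative numbers so the values sum to 1."""
    total = sum(values.values())
    return {name: value / total for name, value in values.items()}


# Share of the total free HDD space held by each device.
free_share = shares({n: size - used for n, (size, used, _) in hdds.items()})
# Share of the total observed HDD write throughput hitting each device.
write_share = shares({n: wkbs for n, (_, _, wkbs) in hdds.items()})

for name in hdds:
    print(f"{name}: free space {free_share[name]:6.1%}   "
          f"writes {write_share[name]:6.1%}")
```

With these inputs, sdc holds roughly half of the free HDD space but receives well under 1% of the HDD write throughput, while sda takes the vast majority of it.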

The same hardware is also used, through different partitions, for a second bcachefs volume that is being read from at the same time. One is simply the data volume; the other, the one I'm giving details for here, is the backup volume. The data volume is set up somewhat differently but is relatively similar to the backup volume.

Is this something that can be optimized by tuning parameters, or does it require patching the source code?

Any suggestions are welcome.

u/krismatu — 9 days ago