u/Bassmaster187

Intel Arc Pro B70 performance and stability

Are there any other B70 users out there who can share their experiences?

I ran some tests, and this is what I got:

Vulkan on llama.cpp is faster than SYCL, for both prompt processing (pp512) and token generation (tg128):

C:\Users\mail>"C:\Program Files\llama.cpp-vulcan\llama-bench.exe" -hf unsloth/Qwen3.6-27B-GGUF:Q4_K_M

| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen35 27B Q4_K - Medium | 15.65 GiB | 26.90 B | Vulkan | 99 | pp512 | 700.44 ± 13.44 |
| qwen35 27B Q4_K - Medium | 15.65 GiB | 26.90 B | Vulkan | 99 | tg128 | 27.22 ± 0.07 |

build: 99d4026b1 (9286)

C:\Users\mail>"C:\Program Files\llama.cpp\llama-bench.exe" -hf unsloth/Qwen3.6-27B-GGUF:Q4_K_M

| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen35 27B Q4_K - Medium | 15.65 GiB | 26.90 B | SYCL | 99 | pp512 | 315.00 ± 2.66 |
| qwen35 27B Q4_K - Medium | 15.65 GiB | 26.90 B | SYCL | 99 | tg128 | 21.93 ± 0.37 |

build: 47c0eda9d (9279)
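
To put a number on the gap, here is a quick sketch that computes the Vulkan/SYCL ratio from the mean t/s values in the two tables above. The dictionary layout and the `speedup` helper are just for illustration (not part of llama.cpp), and the two runs used different builds (9286 vs 9279), so treat the ratios as rough:

```python
# Mean t/s values copied from the two llama-bench tables above
# (Qwen3.6-27B, Q4_K_M, ngl 99). The +/- spread is ignored here.
results = {
    "pp512": {"Vulkan": 700.44, "SYCL": 315.00},
    "tg128": {"Vulkan": 27.22, "SYCL": 21.93},
}


def speedup(test: str) -> float:
    """Return how many times faster Vulkan is than SYCL for one test."""
    r = results[test]
    return r["Vulkan"] / r["SYCL"]


for test in results:
    print(f"{test}: Vulkan is {speedup(test):.2f}x SYCL")
```

So on this card Vulkan is roughly 2.2x faster at prompt processing and about 1.24x faster at token generation.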

Qwen3.5-35B-A3B with SYCL is very unstable. llama-bench crashes with an out-of-resources error before it reports any results:

C:\Users\mail>"C:\Program Files\llama.cpp\llama-bench.exe" -hf unsloth/Qwen3.5-35B-A3B-GGUF:Q4_K_M

load_backend: loaded RPC backend from C:\Program Files\llama.cpp\ggml-rpc.dll

load_backend: loaded SYCL backend from C:\Program Files\llama.cpp\ggml-sycl.dll

load_backend: loaded CPU backend from C:\Program Files\llama.cpp\ggml-cpu-zen4.dll

| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |

level_zero backend failed with error: 40 (UR_RESULT_ERROR_OUT_OF_RESOURCES)

Exception caught at file:D:\a\llama.cpp\llama.cpp\ggml\src\ggml-sycl\ggml-sycl.cpp, line:2954, func:operator()

SYCL error: CHECK_TRY_ERROR(op(ctx, src0, src1, dst, src0_dd_i, src1_ddf_i, src1_ddq_i, dst_dd_i, dev[i].row_low, dev[i].row_high, src1_ncols, src1_padded_col_size, stream)): Exception caught in this line of code.

in function ggml_sycl_op_mul_mat at D:\a\llama.cpp\llama.cpp\ggml\src\ggml-sycl\ggml-sycl.cpp:2954

D:\a\llama.cpp\llama.cpp\ggml\src\ggml-sycl\..\ggml-sycl\common.hpp:143: SYCL error

With Vulkan, the same model runs and reaches about 102 t/s for token generation:

C:\Users\mail>"C:\Program Files\llama.cpp-vulcan\llama-bench.exe" -hf unsloth/Qwen3.5-35B-A3B-GGUF:Q4_K_M

| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen35moe 35B.A3B Q4_K - Medium | 20.49 GiB | 34.66 B | Vulkan | 99 | pp512 | 1940.93 ± 91.59 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.49 GiB | 34.66 B | Vulkan | 99 | tg128 | 102.15 ± 0.70 |

build: 99d4026b1 (9286)

I haven't tested vLLM, LM Studio, or anything else. Does anybody have any tricks to make it run faster or better?
