Intel Arc Pro B70 performance and stability
Are there any other B70 users out there who can share their experiences?
I ran some tests and this is what I got:
Vulkan on llama.cpp is faster than SYCL, for both prompt processing and token generation:
C:\Users\mail>"C:\Program Files\llama.cpp-vulcan\llama-bench.exe" -hf unsloth/Qwen3.6-27B-GGUF:Q4_K_M
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen35 27B Q4_K - Medium | 15.65 GiB | 26.90 B | Vulkan | 99 | pp512 | 700.44 ± 13.44 |
| qwen35 27B Q4_K - Medium | 15.65 GiB | 26.90 B | Vulkan | 99 | tg128 | 27.22 ± 0.07 |
build: 99d4026b1 (9286)
C:\Users\mail>"C:\Program Files\llama.cpp\llama-bench.exe" -hf unsloth/Qwen3.6-27B-GGUF:Q4_K_M
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen35 27B Q4_K - Medium | 15.65 GiB | 26.90 B | SYCL | 99 | pp512 | 315.00 ± 2.66 |
| qwen35 27B Q4_K - Medium | 15.65 GiB | 26.90 B | SYCL | 99 | tg128 | 21.93 ± 0.37 |
build: 47c0eda9d (9279)
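To put a number on the gap, here is a minimal Python sketch that works out the Vulkan/SYCL ratio from the two tables above. The values are copied from the llama-bench output; the names `vulkan`, `sycl` and `speedup` are only for illustration.

```python
# Mean t/s values copied from the llama-bench tables above (27B, Q4_K_M).
vulkan = {"pp512": 700.44, "tg128": 27.22}
sycl = {"pp512": 315.00, "tg128": 21.93}


def speedup(fast, slow):
    """Return the per-test ratio fast/slow, rounded to two decimals."""
    return {test: round(fast[test] / slow[test], 2) for test in fast}


ratios = speedup(vulkan, sycl)
print(ratios)  # {'pp512': 2.22, 'tg128': 1.24}
```

So Vulkan is roughly 2.2x faster at prompt processing and about 1.24x faster at token generation here. Note that the two runs use different builds (9286 vs 9279), so part of the gap could come from that rather than from the backend alone.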
Qwen3.5-35B-A3B with SYCL is very unstable; llama-bench crashes with an out-of-resources error before it produces any result:
C:\Users\mail>"C:\Program Files\llama.cpp\llama-bench.exe" -hf unsloth/Qwen3.5-35B-A3B-GGUF:Q4_K_M
load_backend: loaded RPC backend from C:\Program Files\llama.cpp\ggml-rpc.dll
load_backend: loaded SYCL backend from C:\Program Files\llama.cpp\ggml-sycl.dll
load_backend: loaded CPU backend from C:\Program Files\llama.cpp\ggml-cpu-zen4.dll
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
level_zero backend failed with error: 40 (UR_RESULT_ERROR_OUT_OF_RESOURCES)
Exception caught at file:D:\a\llama.cpp\llama.cpp\ggml\src\ggml-sycl\ggml-sycl.cpp, line:2954, func:operator()
SYCL error: CHECK_TRY_ERROR(op(ctx, src0, src1, dst, src0_dd_i, src1_ddf_i, src1_ddq_i, dst_dd_i, dev[i].row_low, dev[i].row_high, src1_ncols, src1_padded_col_size, stream)): Exception caught in this line of code.
in function ggml_sycl_op_mul_mat at D:\a\llama.cpp\llama.cpp\ggml\src\ggml-sycl\ggml-sycl.cpp:2954
D:\a\llama.cpp\llama.cpp\ggml\src\ggml-sycl\..\ggml-sycl\common.hpp:143: SYCL error
With Vulkan the same model runs fine and reaches about 102 t/s in token generation:
C:\Users\mail>"C:\Program Files\llama.cpp-vulcan\llama-bench.exe" -hf unsloth/Qwen3.5-35B-A3B-GGUF:Q4_K_M
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen35moe 35B.A3B Q4_K - Medium | 20.49 GiB | 34.66 B | Vulkan | 99 | pp512 | 1940.93 ± 91.59 |
| qwen35moe 35B.A3B Q4_K - Medium | 20.49 GiB | 34.66 B | Vulkan | 99 | tg128 | 102.15 ± 0.70 |
build: 99d4026b1 (9286)
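If anyone wants to compare more runs, the rows llama-bench prints can be pulled apart with a few lines of Python. This is only a sketch that assumes the seven-column table layout shown above; `parse_bench_row` is a made-up helper name, not something that ships with llama.cpp.

```python
def parse_bench_row(line):
    """Parse one llama-bench markdown table row into a dict.

    Assumes the 7-column layout shown above. Returns None for the
    header row, the separator row, and any non-table line.
    """
    stripped = line.strip()
    if not stripped.startswith("|"):
        return None
    cells = [c.strip() for c in stripped.strip("|").split("|")]
    # Data rows have 7 cells and a "mean ± stddev" value in the last one.
    if len(cells) != 7 or "±" not in cells[6]:
        return None
    mean, stddev = (float(x) for x in cells[6].split("±"))
    return {
        "model": cells[0],
        "backend": cells[3],
        "test": cells[5],
        "tps": mean,
        "stddev": stddev,
    }


row = "| qwen35moe 35B.A3B Q4_K - Medium | 20.49 GiB | 34.66 B | Vulkan | 99 | pp512 | 1940.93 ± 91.59 |"
print(parse_bench_row(row))
```

Feeding it the full console output line by line and keeping the non-None results gives one record per test, which makes it easier to line up backends side by side.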
I haven't tested vLLM, LM Studio or anything else. Does anybody have any tricks to make it run faster or better?