u/thekhronosgroup — reddlx

New Tutorial: Advanced Vulkan Compute -- The Power of Parallelism

"Unlock the GPU as a general-purpose engine, not just a rasterizer."

This series takes you past `vkCmdDispatch` and into how compute actually executes on real hardware — occupancy, latency hiding, the Vulkan memory model, and subgroup operations that let invocations talk to each other without touching global memory.

* Vulkan 1.4 scalar layouts, shared memory (LDS), and memory consistency deep-dives

* Subgroup partitioning and non-uniform indexing — the "hidden power" most tutorials skip

* Run OpenCL kernels on top of Vulkan for a heterogeneous compute ecosystem

* Indirect dispatch, GPU-driven pipelines, and async compute orchestration

* Cooperative matrices, performance auditing, and AI-assisted compute diagnostics

* Dedicated coverage of mobile and embedded compute constraints

https://docs.vulkan.org/tutorial/latest/Advanced_Vulkan_Compute/introduction.html
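
To make the subgroup point concrete, here is a minimal CPU-side sketch in Python. It is not code from the tutorial. It models how a shuffle-down tree lets the invocations of one subgroup sum their values in log2(N) steps by exchanging register values instead of going through global memory. The function name and the power-of-two subgroup size are assumptions for illustration; in a real shader the same result would typically come from a built-in such as GLSL's `subgroupAdd()`.

```python
def subgroup_reduce_add(lanes):
    """Sum one subgroup's values with a shuffle-down tree (CPU model).

    `lanes` holds one value per invocation; its length must be a power of two.
    Returns (total, steps), where steps is the number of shuffle rounds.
    """
    n = len(lanes)
    if n == 0 or n & (n - 1):
        raise ValueError("subgroup size must be a power of two")

    vals = list(lanes)
    steps = 0
    offset = n // 2
    while offset > 0:
        # Each lane i adds the value held by lane i + offset in the previous
        # round. Lanes past the end contribute nothing. This models a
        # register-to-register shuffle with no memory traffic.
        vals = [
            vals[i] + (vals[i + offset] if i + offset < n else 0)
            for i in range(n)
        ]
        offset //= 2
        steps += 1

    # After the last round, lane 0 holds the sum of the whole subgroup.
    return vals[0], steps
```

For an assumed subgroup of 32 invocations this takes 5 rounds, compared with 31 dependent additions for a single invocation summing the same values one by one.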

u/thekhronosgroup — 18 hours ago

▲ 19 r/vulkan

Streamlining Resource Binding with End-to-End Support for Vulkan Descriptor Heaps

This blog from NVIDIA describes the new descriptor heap feature in Vulkan, which refactors resource binding from the ground up. It addresses long-standing user feedback by streamlining binding and bringing it closer to how it works in Direct3D 12 (D3D12). The post highlights what descriptor heaps add, how they compare to descriptor sets, and how to get started.

https://developer.nvidia.com/blog/streamlining-resource-binding-with-end-to-end-support-for-vulkan-descriptor-heaps/

u/thekhronosgroup — 8 days ago

▲ 23 r/GraphicsProgramming+1 crossposts

Shader Ecosystem Survey (due July 10)

The Khronos Group is running a survey on the broader shader ecosystem — languages, tools, pain points, and where standardization is needed most. If you write shading code in any language (Slang, HLSL, GLSL, WGSL, etc.), your input helps shape where Khronos focuses next.

Results will be shared at the SIGGRAPH Real-Time Shading BOF this year.

Deadline: July 10

Take the survey: https://www.surveymonkey.com/r/LKWFQ3M

It's anonymous, takes a few minutes, and covers everything from debugging/profiling pain points to what you think the future of shader standards should look like.

u/thekhronosgroup — 11 days ago

▲ 13 r/webgl+1 crossposts

Call for Participation: WebGL+WebGPU BOF at SIGGRAPH 2026

Khronos will host a WebGL+WebGPU BOF at SIGGRAPH 2026 in Los Angeles on Wednesday, July 22 at 9:00 AM. (Full schedule coming soon.) Do you have a product or demo based on these APIs that you would like to share? If so, please email events@khronosgroup.org and we'll get you on the agenda.

u/thekhronosgroup — 1 month ago

▲ 39 r/vulkan

New Sample: Shader Execution Reordering (SER)

Khronos Group's Vulkan Working Group has published a new sample demonstrating Shader Execution Reordering (SER) via the VK_EXT_ray_tracing_invocation_reorder extension.

Ray tracing workloads suffer when adjacent rays hit different materials, invoking different shaders and scattering memory access across geometry, textures, and acceleration structures. SER tackles this head-on by separating ray traversal from shader invocation, giving the GPU an opportunity to reorder threads for better coherency before execution begins.

The new sample features an interactive scene with three material types (diffuse, refraction, and emissive) specifically designed to maximize divergence, with a live toggle to compare SER on vs. off. Real-world path tracing workloads have seen 11-24% gains, with synthetic high-divergence scenarios showing 40-50% improvement.
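
The coherency argument can be illustrated with a small CPU-side model in Python. This is a sketch under stated assumptions, not the sample's code: it assumes a subgroup width of 32, treats each ray's hit material as its coherence hint, and models reordering as a plain sort. All function names are hypothetical. It counts how many distinct hit shaders each subgroup would have to run before and after reordering.

```python
import random

SUBGROUP_SIZE = 32  # assumed subgroup width; real hardware varies


def shader_runs(material_ids, subgroup_size=SUBGROUP_SIZE):
    """Total number of distinct hit shaders executed, summed over subgroups.

    A subgroup whose invocations hit k different materials has to run
    k shaders one after another, so lower is better.
    """
    total = 0
    for start in range(0, len(material_ids), subgroup_size):
        total += len(set(material_ids[start:start + subgroup_size]))
    return total


def reorder_by_hint(material_ids):
    """Model of reordering: group invocations that share a coherence hint."""
    return sorted(material_ids)


def compare(num_rays=1024, num_materials=3, seed=0):
    """Return (runs_without_reorder, runs_with_reorder) for random hits."""
    rng = random.Random(seed)
    hits = [rng.randrange(num_materials) for _ in range(num_rays)]
    return shader_runs(hits), shader_runs(reorder_by_hint(hits))
```

With three materials, the reordered case can need at most two extra shader runs beyond one per subgroup, because only the subgroups that straddle a material boundary diverge. Real hardware reorders within implementation-defined limits, so this model shows only the direction of the effect, not its size.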

Key topics covered in the sample:

* Hit objects and the reorderThreadEXT() / ReorderThread() pattern
* Coherence hints to guide reordering by material or instance
* Minimizing live state across reorder calls for maximum benefit
* Device capability detection and backward compatibility

Shaders are authored in Slang by default (GLSL reference files included), compiling to SPIR-V via the Slang compiler.

Explore the sample: https://github.com/KhronosGroup/Vulkan-Samples/tree/main/samples/extensions/ray_tracing_invocation_reorder

u/thekhronosgroup — 1 month ago

▲ 24 r/vulkan

New Vulkan sample: Rasterization Order Attachment Access

This sample demonstrates VK_EXT_rasterization_order_attachment_access, which enables framebuffer attachment reads from one fragment to the next in rasterization order, without requiring explicit synchronization or subpass self-dependencies. Techniques like programmable blending become far more practical.

The sample pairs the extension with VK_KHR_dynamic_rendering and VK_KHR_dynamic_rendering_local_read to show how to implement framebuffer fetch with guaranteed fragment ordering using modern Vulkan patterns.
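
A short Python model shows why guaranteed fragment ordering matters for programmable blending. It is an illustrative sketch, not the sample's shader code: the blend function, the single-channel color, and all names are assumptions. Each fragment reads the current attachment value and writes a blended result, so applying the same fragments in a different order changes the final pixel.

```python
def blend_over(dst, src_color, src_alpha):
    """Custom 'over' blend: the kind of math a shader could run after
    reading the current attachment value (dst)."""
    return src_color * src_alpha + dst * (1.0 - src_alpha)


def shade_pixel(fragments, background=0.0):
    """Apply fragments to one pixel in the order given.

    `fragments` is a list of (color, alpha) pairs, with one channel for brevity.
    The list order stands in for rasterization order.
    """
    value = background
    for color, alpha in fragments:
        value = blend_over(value, color, alpha)
    return value


# Two overlapping half-transparent fragments: white, then black.
in_order = shade_pixel([(1.0, 0.5), (0.0, 0.5)])
swapped = shade_pixel([(0.0, 0.5), (1.0, 0.5)])
```

In this model `in_order` is 0.25 and `swapped` is 0.5, so without an ordering guarantee the result of a read-modify-write blend would not be well defined.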

Explore the sample: https://github.com/KhronosGroup/Vulkan-Samples/tree/main/samples/extensions/rasterization_order_attachment_access

u/thekhronosgroup — 2 months ago

▲ 21 r/vulkan

Tile Based Rendering Best Practices

Are you optimizing Vulkan applications for mobile or tile-based GPUs? The Vulkan Guide has a dedicated section on Tile Based Rendering (TBR) best practices worth bookmarking.

Unlike traditional immediate-mode GPUs, tile-based architectures process the framebuffer in small screen regions, keeping work on fast on-chip memory before writing results to main system memory. For Vulkan developers, this means memory bandwidth is often the dominant performance factor.

Key takeaways from the guide:

* Use load/store ops intentionally. Render pass attachment configuration is your primary tool for controlling bandwidth. Setting the right loadOp and storeOp values tells the driver whether to clear, load, or discard data, directly impacting whether data has to travel off-chip.
* Keep depth and stencil transient. If depth and stencil buffers are not needed after a render pass, mark them with LOAD_OP_CLEAR and STORE_OP_DONT_CARE. This allows the driver to keep them entirely on-chip and avoid the cost of writing them back to external memory.
* Favor compact pixel formats. On-chip tile memory is fixed in size. Smaller bit-depth formats allow the hardware to fit more data per tile, reducing spills to external memory and improving efficiency.
* Optimize for the binning pass. Tilers process geometry twice: once to bin triangles into tiles, then again to shade pixels. Separating vertex positions from other attributes like UVs and normals lets the GPU read only what it needs during binning, cutting unnecessary bandwidth.
* Provide clear intent to the driver. Because Vulkan abstracts hardware details like tile size, the best way to optimize is to use correct render pass configurations and memory flags so the driver can make informed decisions on your behalf.
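
The bandwidth reasoning behind the first, second, and fourth takeaways can be put into rough numbers. The Python sketch below is a deliberately simplified model, not something taken from the guide: it ignores framebuffer compression, caches, and partial tiles, and the vertex layout (3-float position, 3-float normal, 2-float UV) and all function names are assumptions for illustration.

```python
def attachment_bytes(width, height, bytes_per_pixel):
    """Size of one full-screen attachment in bytes."""
    return width * height * bytes_per_pixel


def depth_traffic(width, height, bytes_per_pixel=4, load=False, store=False):
    """Off-chip bytes moved per render pass for a depth attachment.

    load=True models VK_ATTACHMENT_LOAD_OP_LOAD (tiles filled from memory).
    store=True models VK_ATTACHMENT_STORE_OP_STORE (tiles written back).
    CLEAR and DONT_CARE are modeled as causing no external traffic.
    """
    size = attachment_bytes(width, height, bytes_per_pixel)
    return size * (int(load) + int(store))


# Assumed vertex layout, 4 bytes per float.
POSITION_BYTES = 3 * 4                 # position only
INTERLEAVED_BYTES = (3 + 3 + 2) * 4    # position + normal + UV in one stream


def binning_read_bytes(vertex_count, stride_bytes):
    """Bytes fetched by the binning pass if it reads one stride per vertex."""
    return vertex_count * stride_bytes
```

Under these assumptions, a 1920x1080 32-bit depth buffer that is both loaded and stored moves 16,588,800 bytes per pass, while the CLEAR plus DONT_CARE configuration moves none. For 100,000 vertices, a position-only stream gives the binning pass 1,200,000 bytes to read, compared with 3,200,000 bytes for the interleaved layout.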

The full guide covers these topics in depth, including guidance on MSAA, transient attachments, and dynamic rendering:

https://docs.vulkan.org/guide/latest/tile_based_rendering_best_practices.html

u/thekhronosgroup — 2 months ago

▲ 34 r/vulkan

Khronos has released the Vulkan SC SDK

The Vulkan SC Working Group and RasterGrid have released the Vulkan SC SDK, bringing together the full set of tools for safety-critical Vulkan development into a single installer for Linux and Windows. What's included: Loader, Validation Layers, Device Simulation Layer, Pipeline Cache Compiler integration, CMake support, the VulkanSC-cube sample, and more. The goal was to reduce setup friction significantly: automatic environment configuration, better CMake integration, and automatic PCC discovery are all part of it.

Feedback and contributions are very much encouraged. Drop thoughts in the GitHub Discussions thread or on Discord -- the Working Group is actively listening.

https://khr.io/1o0

u/thekhronosgroup — 2 months ago

▲ 9 r/HPC

The IWOMP 2026 Call for Papers is open.

The 22nd International Workshop on OpenMP takes place October 7-9, 2026 at TU Wien in Vienna, Austria. The theme this year is "OpenMP: Adaptability for Heterogeneous Multi-Device Systems."

Topics of interest include accelerated computing and offloading, performance portability, machine learning with OpenMP, runtime environments, tasking, vectorization, memory management, and more.

Submissions are limited to 12 pages (excluding references). Accepted papers will be published in Springer's Lecture Notes in Computer Science (LNCS) series.

Submission deadline: May 29, 2026 (AoE)

Learn more and submit: https://www.iwomp.org/call-for-papers/

u/thekhronosgroup — 2 months ago