
r/VoxelGameDev

Customizable Terrain Builder
Built with help from Claude. I mainly used Three.js to create a randomized 3D polygon mesh using random divisions on a square grid. Slapped basic slope-based lighting on top for a better view. Source code on GitHub: https://github.com/Hegho/Terrain-Creator
Greedy Meshing with Vertex Pulling is SLOWER to render?
I finally got down to implementing vertex pulling.
My current (pre-VP) setup uploads already greedy-meshed chunks to the GPU and stores vertex data as follows:
struct PackedVoxelVertex {
    uint16_t x, y, z; // local pos within chunk, only using first 9 bits in each to represent positions from 0 to 256.
    uint16_t w;       // only using first 8 bits for normal and block type, the rest is padding.
};
I know the layout is far from ideal and I am wasting some memory here, but I thought it was temporary anyway as I would eventually switch to vertex pulling. The actual upload code in OpenGL looks like this:
GLuint VAO, VBO, IBO;
void setupStandardGreedy(const std::vector<PackedVoxelVertex>& verts, const std::vector<uint32_t>& indices) {
    glGenVertexArrays(1, &VAO);
    glGenBuffers(1, &VBO);
    glGenBuffers(1, &IBO);
    glBindVertexArray(VAO);
    glBindBuffer(GL_ARRAY_BUFFER, VBO);
    glBufferData(GL_ARRAY_BUFFER, verts.size() * sizeof(PackedVoxelVertex), verts.data(), GL_STATIC_DRAW);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, IBO);
    glBufferData(GL_ELEMENT_ARRAY_BUFFER, indices.size() * sizeof(uint32_t), indices.data(), GL_STATIC_DRAW);
    glEnableVertexAttribArray(0);
    glVertexAttribIPointer(0, 4, GL_UNSIGNED_SHORT, sizeof(PackedVoxelVertex), (void*)0);
    glBindVertexArray(0);
}
void drawStandardGreedy(GLsizei indexCount) {
    glBindVertexArray(VAO);
    glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, 0);
    glBindVertexArray(0);
}
and the vertex shader reading the data just unpacks it:
#version 430 core
layout (location = 0) in uvec4 aVertex; // VAO
layout (location = 0) out vec3 WorldPos;
layout (location = 1) flat out uint vNormalDir;
layout (location = 2) flat out uint vBlockType;
layout (std140, row_major) uniform SceneData {
    mat4 view;
    mat4 proj;
    vec4 lodColor;
    vec4 cameraPos;
    vec4 normalBlendParams;
    vec4 chunkOrigin;
};
void main() {
    vec3 localPos = vec3(float(aVertex.r), float(aVertex.g), float(aVertex.b));
    vec3 worldPos = chunkOrigin.xyz + localPos * chunkOrigin.w;
    uint normalDir = aVertex.a & 0x7u;
    uint blockType = (aVertex.a >> 3u) & 0x1Fu;
    gl_Position = vec4(worldPos, 1.0) * view * proj;
    WorldPos = worldPos;
    vNormalDir = normalDir;
    vBlockType = blockType;
}
Now I read about vertex pulling and decided to try and adapt my code to it. Most examples online were about drawing a single side of a voxel, not a greedy-meshed face, so I had to adapt. In the end, instead of sending 4 vertices of 8 bytes each (32 bytes) per face, I started sending just 8 bytes:
struct PackedVPFace {
    uint8_t posX;      // local X (0-255)
    uint8_t posY;      // local Y (0-255)
    uint8_t posZ;      // local Z (0-255)
    uint8_t dimW;      // width-1 (0-255)
    uint8_t dimH;      // height-1 (0-255)
    uint8_t normalDir; // normal dir (0-5)
    uint8_t blockType;
    uint8_t padding;
};
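As a rough check on the sizes involved, the sketch below restates both layouts in C++ and computes the bytes each path stores per face. `packFace` and the `k...` constants are hypothetical names, not from the original code. The bit order is inferred from how the vertex-pulling shader decodes `face.x` and `face.y`, and uploading the packed value unchanged assumes a little-endian host.

```cpp
#include <cstddef>
#include <cstdint>

// Pre-VP vertex layout: four 16-bit fields, 8 bytes per vertex.
struct PackedVoxelVertex {
    uint16_t x, y, z;
    uint16_t w;
};

// Vertex-pulling face layout: eight 8-bit fields, 8 bytes per face.
struct PackedVPFace {
    uint8_t posX, posY, posZ;
    uint8_t dimW, dimH;
    uint8_t normalDir;
    uint8_t blockType;
    uint8_t padding;
};

static_assert(sizeof(PackedVoxelVertex) == 8, "expected 8 bytes per vertex");
static_assert(sizeof(PackedVPFace) == 8, "expected 8 bytes per face");

// Bytes stored per quad in each path (hypothetical constant names).
constexpr std::size_t kVertexBytesPerFace = 4 * sizeof(PackedVoxelVertex); // 32
constexpr std::size_t kPulledBytesPerFace = sizeof(PackedVPFace);          // 8
constexpr std::size_t kIndexBytesPerFace  = 6 * sizeof(uint32_t);          // 24

// Hypothetical helper: builds the 64-bit word so that the low 32 bits match
// what the shader reads as face.x and the high 32 bits match face.y.
// Assumes a little-endian host when the result is uploaded as raw bytes.
inline uint64_t packFace(const PackedVPFace& f) {
    return  static_cast<uint64_t>(f.posX)
         | (static_cast<uint64_t>(f.posY)      << 8)
         | (static_cast<uint64_t>(f.posZ)      << 16)
         | (static_cast<uint64_t>(f.dimW)      << 24)
         | (static_cast<uint64_t>(f.dimH)      << 32)
         | (static_cast<uint64_t>(f.normalDir) << 40)
         | (static_cast<uint64_t>(f.blockType) << 48);
}
```

Both paths shown in this post also draw with six 32-bit indices per face, so that part of the per-face data is the same in each.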
The process of uploading that data to the GPU is as follows:
GLuint dummyVAO;
GLuint SSBO;
GLuint sharedIBO;
void setupGreedyVP(const std::vector<uint64_t>& faces) {
    glGenVertexArrays(1, &dummyVAO);
    glGenBuffers(1, &SSBO);
    glBindBuffer(GL_SHADER_STORAGE_BUFFER, SSBO);
    glBufferData(GL_SHADER_STORAGE_BUFFER, faces.size() * sizeof(uint64_t), faces.data(), GL_STATIC_DRAW);
    glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);
    glGenBuffers(1, &sharedIBO);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, sharedIBO);
    std::vector<uint32_t> vpIB;
    vpIB.reserve(faces.size() * 6);
    for (uint32_t f = 0; f < faces.size(); ++f) {
        uint32_t v = f * 4;
        vpIB.push_back(v); vpIB.push_back(v+2);
        vpIB.push_back(v+1); vpIB.push_back(v+1);
        vpIB.push_back(v+2); vpIB.push_back(v+3);
    }
    glBufferData(GL_ELEMENT_ARRAY_BUFFER, vpIB.size() * sizeof(uint32_t), vpIB.data(), GL_STATIC_DRAW);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0);
}
void drawGreedyVP(GLsizei faceCount) {
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, SSBO);
    glBindVertexArray(dummyVAO);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, sharedIBO);
    glDrawElements(GL_TRIANGLES, faceCount * 6, GL_UNSIGNED_INT, 0);
    glBindVertexArray(0);
}
And the process of recreating vertices on the GPU is:
#version 430 core
layout (std430, binding = 0) buffer FaceBuffer { uvec2 faces[]; };
layout (location = 0) out vec3 WorldPos;
layout (location = 1) flat out uint vNormalDir;
layout (location = 2) flat out uint vBlockType;
layout (std140, row_major) uniform SceneData {
    mat4 view;
    mat4 proj;
    vec4 lodColor;
    vec4 cameraPos;
    vec4 normalBlendParams;
    vec4 chunkOrigin;
};
// BASE_OFFSETS is indexed by normalDir
const vec3 BASE_OFFSETS[6] = vec3[](
    vec3(0,1,0), vec3(0,0,0), vec3(1,0,0),
    vec3(0,0,0), vec3(0,0,1), vec3(0,0,0)
);
// U_SCALES and V_SCALES are indexed by i=normalDir*4+corner
const vec3 U_SCALES[24] = vec3[](
    vec3(0,0,0), vec3(0,0,0), vec3(1,0,0), vec3(1,0,0), // 0 (+Y)
    vec3(0,0,0), vec3(0,0,0), vec3(1,0,0), vec3(1,0,0), // 1 (-Y)
    vec3(0,0,1), vec3(0,0,1), vec3(0,0,0), vec3(0,0,0), // 2 (+X)
    vec3(0,0,0), vec3(0,0,0), vec3(0,0,1), vec3(0,0,1), // 3 (-X)
    vec3(0,0,0), vec3(1,0,0), vec3(0,0,0), vec3(1,0,0), // 4 (+Z)
    vec3(1,0,0), vec3(0,0,0), vec3(1,0,0), vec3(0,0,0)  // 5 (-Z)
);
const vec3 V_SCALES[24] = vec3[](
    vec3(0,0,1), vec3(0,0,0), vec3(0,0,1), vec3(0,0,0), // 0 (+Y)
    vec3(0,0,1), vec3(0,0,0), vec3(0,0,1), vec3(0,0,0), // 1 (-Y)
    vec3(0,0,0), vec3(0,1,0), vec3(0,0,0), vec3(0,1,0), // 2 (+X)
    vec3(0,0,0), vec3(0,1,0), vec3(0,0,0), vec3(0,1,0), // 3 (-X)
    vec3(0,1,0), vec3(0,1,0), vec3(0,0,0), vec3(0,0,0), // 4 (+Z)
    vec3(0,1,0), vec3(0,1,0), vec3(0,0,0), vec3(0,0,0)  // 5 (-Z)
);
void main() {
    uint faceIndex = uint(gl_VertexID) >> 2u; // four vertex IDs per face
    uint corner = uint(gl_VertexID) & 3u;
    uvec2 face = faces[faceIndex];
    vec4 p0 = unpackUnorm4x8(face.x) * 255.0; // posX, posY, posZ, dimW
    vec3 localPos = p0.xyz;
    float W = p0.w + 1.0;
    float H = float(face.y & 0xFFu) + 1.0;
    uint normalDir = (face.y >> 8u) & 0x7u;
    uint blockType = (face.y >> 16u) & 0x1Fu;
    int lutIndex = (int(normalDir) << 2) + int(corner);
    vec3 worldPos = chunkOrigin.xyz
        + (localPos + BASE_OFFSETS[normalDir] + U_SCALES[lutIndex] * W + V_SCALES[lutIndex] * H) * chunkOrigin.w;
    gl_Position = vec4(worldPos, 1.0) * view * proj;
    WorldPos = worldPos;
    vNormalDir = normalDir;
    vBlockType = blockType;
}
I went through several iterations of the shader code, initially involving a shitton of branching and eventually arriving at this layout, which abuses unpackUnorm4x8 and vector multiplication. Given that the fragment shader is identical, this is as good as I got (I could get rid of the base offset LUT and send 9 bits per position axis instead, but that would need extra ALU work, so I am not sure it would make any meaningful difference).
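One way to sanity-check lookup tables like these is to mirror the shader's arithmetic on the CPU and assert on the corners it produces. The sketch below transcribes the three tables with integer components and rebuilds a corner in chunk-local space, before the `chunkOrigin` transform is applied. `Vec3i` and `faceCorner` are hypothetical names introduced for illustration only.

```cpp
#include <cstdint>

struct Vec3i { int x, y, z; };

// Tables transcribed from the shader. kBaseOffsets is indexed by normalDir,
// the two scale tables by normalDir * 4 + corner.
constexpr Vec3i kBaseOffsets[6] = {
    {0,1,0}, {0,0,0}, {1,0,0},
    {0,0,0}, {0,0,1}, {0,0,0},
};
constexpr Vec3i kUScales[24] = {
    {0,0,0}, {0,0,0}, {1,0,0}, {1,0,0}, // 0 (+Y)
    {0,0,0}, {0,0,0}, {1,0,0}, {1,0,0}, // 1 (-Y)
    {0,0,1}, {0,0,1}, {0,0,0}, {0,0,0}, // 2 (+X)
    {0,0,0}, {0,0,0}, {0,0,1}, {0,0,1}, // 3 (-X)
    {0,0,0}, {1,0,0}, {0,0,0}, {1,0,0}, // 4 (+Z)
    {1,0,0}, {0,0,0}, {1,0,0}, {0,0,0}, // 5 (-Z)
};
constexpr Vec3i kVScales[24] = {
    {0,0,1}, {0,0,0}, {0,0,1}, {0,0,0}, // 0 (+Y)
    {0,0,1}, {0,0,0}, {0,0,1}, {0,0,0}, // 1 (-Y)
    {0,0,0}, {0,1,0}, {0,0,0}, {0,1,0}, // 2 (+X)
    {0,0,0}, {0,1,0}, {0,0,0}, {0,1,0}, // 3 (-X)
    {0,1,0}, {0,1,0}, {0,0,0}, {0,0,0}, // 4 (+Z)
    {0,1,0}, {0,1,0}, {0,0,0}, {0,0,0}, // 5 (-Z)
};

// lo and hi stand for face.x and face.y in the shader.
// Precondition: the encoded normalDir is in 0..5 and corner is in 0..3.
inline Vec3i faceCorner(uint32_t lo, uint32_t hi, uint32_t corner) {
    const int px = static_cast<int>(lo & 0xFFu);
    const int py = static_cast<int>((lo >> 8) & 0xFFu);
    const int pz = static_cast<int>((lo >> 16) & 0xFFu);
    const int w  = static_cast<int>((lo >> 24) & 0xFFu) + 1; // stored as width-1
    const int h  = static_cast<int>(hi & 0xFFu) + 1;         // stored as height-1
    const uint32_t normalDir = (hi >> 8) & 0x7u;
    const uint32_t lut = (normalDir << 2) + corner;
    const Vec3i b = kBaseOffsets[normalDir];
    const Vec3i u = kUScales[lut];
    const Vec3i v = kVScales[lut];
    return { px + b.x + u.x * w + v.x * h,
             py + b.y + u.y * w + v.y * h,
             pz + b.z + u.z * w + v.z * h };
}
```

For a +Y face at local position (1, 2, 3) with W = 2 and H = 3, the four corners come out at y = 3, spanning x from 1 to 3 and z from 3 to 6.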
I benchmarked both methods, switching between them at runtime, to compare FPS on three devices: high-end, mid-range and low-end. Same scene, same resolution, same everything, just different code executing to send the vertices to the GPU and read them there. My engine is GPU bound, so any changes in FPS are equivalent to changes in GPU times. Results are as follows:
- On my high end machine (4090) vertex pulling gave about 5-7% improvement in raw FPS, giving me 1690 FPS instead of 1610. Not that I needed it, but just to note that the algorithm did work on some hardware.
- On a 3060, the difference was within noise (1-2%), it was not obvious whether vertex pulling was winning or not.
- On an integrated GPU (i3-10110U's UHD Graphics) vertex pulling resulted in ~15% REDUCTION in raw FPS compared to just sending vertices directly.
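To put the 4090 numbers in frame-time terms, which can be easier to reason about than FPS at very high frame rates, here is a small arithmetic sketch. `frameTimeMs` and the constant names are hypothetical; only the two FPS figures come from the measurements above.

```cpp
// Convert frames per second to milliseconds per frame.
constexpr double frameTimeMs(double fps) { return 1000.0 / fps; }

constexpr double kBaselineMs = frameTimeMs(1610.0);      // about 0.621 ms
constexpr double kPulledMs   = frameTimeMs(1690.0);      // about 0.592 ms
constexpr double kSavedMs    = kBaselineMs - kPulledMs;  // about 0.029 ms per frame
```

So the gain on the 4090 works out to roughly 0.03 ms per frame.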
I always hear vertex pulling mentioned as an optimization, and it makes sense on paper - I am sending 1/4th of the data per face, and even with 2.7x more instructions, I should be saving in total, but as measurements show, this is clearly not the case.
Can someone explain to me what might be going on here, and whether I can do something to make VP perform better on lower-end hardware?
🌿Tropical Lake House model 🌿 -Made in #magicavoxel Free Download from my Gumroad.
Godot opti voxel V2 (GPU)
Hey everyone! About a month ago, I posted on this sub to share the very first version of my Godot voxel library:
https://www.reddit.com/r/VoxelGameDev/comments/1spww15/yet_another_voxel_library_on_godot/
Since then, I've continued working on it here: https://github.com/aobayama-gaming/opti-voxel
Today, I just wanted to share a new milestone: I've ported my mesher over to the GPU! (I'm currently using an optimized Surface Nets implementation). To do this, I used native Godot capabilities, specifically GDExtension and hacking my way through the RenderingServer and compute shaders.
I'm still experiencing some stutter, and I'm not entirely happy with the CPU-GPU data transfer overhead (even though I'm using highly compressed datatypes).
The short video I'm sharing tonight shows the mesher pushed to its absolute limits (using 5GB of VRAM and processing batches of 1024 chunks) generating mountains and caverns.
I hope looking at my code helps some of you wrap your heads around Godot's compute shader pipeline and memory passing, which can be pretty confusing and poorly documented!
How to balance shader work vs bandwidth?
I am working on a small scale voxel engine and currently just trying to push rendering distance to its absolute limits.
One of the optimisations I often hear about is reducing the amount of data sent to the GPU. So I reduced my vertex size 7x, down to 4 bytes (32 bits) per vertex, by storing local chunk coordinates instead of global float coordinates, packing the normal vector into the first 3 bits of a byte (as it can only ever have 6 values) and using the rest of that byte for block type.
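A minimal sketch of the kind of packing described above. The exact bit layout is an assumption, since the post only states the 3-bit normal and that the rest of that byte holds the block type: here each local coordinate gets 8 bits, followed by 3 bits of normal index and 5 bits of block type. `VoxelVertex`, `packVertex` and `unpackVertex` are hypothetical names.

```cpp
#include <cstdint>

// Unpacked form of one vertex (hypothetical type for illustration).
struct VoxelVertex {
    uint8_t x, y, z;    // local chunk coordinates, 0-255 (assumed 8 bits each)
    uint8_t normal;     // face direction index, 0-5, fits in 3 bits
    uint8_t blockType;  // 0-31, fits in the remaining 5 bits
};

// Assumed layout: bits 0-7 x, 8-15 y, 16-23 z, 24-26 normal, 27-31 block type.
inline uint32_t packVertex(const VoxelVertex& v) {
    return  static_cast<uint32_t>(v.x)
         | (static_cast<uint32_t>(v.y) << 8)
         | (static_cast<uint32_t>(v.z) << 16)
         | ((static_cast<uint32_t>(v.normal) & 0x7u) << 24)
         | ((static_cast<uint32_t>(v.blockType) & 0x1Fu) << 27);
}

// Inverse of packVertex; a vertex shader would do the same shifts and masks.
inline VoxelVertex unpackVertex(uint32_t p) {
    VoxelVertex v;
    v.x         = static_cast<uint8_t>(p & 0xFFu);
    v.y         = static_cast<uint8_t>((p >> 8) & 0xFFu);
    v.z         = static_cast<uint8_t>((p >> 16) & 0xFFu);
    v.normal    = static_cast<uint8_t>((p >> 24) & 0x7u);
    v.blockType = static_cast<uint8_t>((p >> 27) & 0x1Fu);
    return v;
}
```

The decode side is five shifts and masks per vertex, which is the shader work being weighed against the 28 bytes saved.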
But the work I had to do in a shader to decode those values ended up resulting in (slightly but still) worse performance than when sending all the data raw, at least on my high end GPU.
Is there a rule of thumb somewhere about how much to send vs what to delegate to a shader? Is less bandwidth always better, or does it only start to become an issue once you reach a certain amount of data sent? Is this balance any different on lower-end GPUs, and will I feel the optimisation if I benchmark on a different machine?
Sorry if the question is stupid, I’m just a beginner.
Procedural tree collapse
Procedurally collapsing structures is a good way to stylistically control chopping down trees, something that was always a bit underwhelming for me in Minecraft. I think this is a good compromise. It's neither limited to trees nor hardcoded: any detached set of blocks can tumble if the center of mass is over the "contact point".
How to combat extreme Moire pattern when generating terrain with extremely small voxels?
This is on a 4K screen. MSAA helps a bit, and using LOD chunks with larger voxels helps further, but if I decrease the LOD distance to the point where the Moire disappears, the pop-in of LODs becomes obvious. Any other solutions I am not thinking of?
Chunk seams in a cross-platform Rust voxel mesher are humbling
First serious voxel mesher I’ve worked on.
The mesher is Rust and builds to browser (wasm), iOS (xcframework), and native desktop.
The thing that surprised me most: chunk seams in Dual Contouring / MDC are way nastier than the papers make them look.
Tiny QEF differences across chunk boundaries were enough to create little cracks/open edges.
After digging through a bunch of voxel projects (fast-surface-nets-rs, Veloren, godot_voxel, etc.), I kept seeing the same pattern:
- padded SDF
- halo Hermite data
- single-owner boundary edges
So the Rust port is converging on the same setup with a 73³ Hermite-data cache for baked chunks.
Also lost an embarrassing amount of time because my wireframe debug overlay was gated on brush authority instead of actual terrain rendering, so it hid almost every important LOD/padding chunk I needed to inspect :)
Debug visualization bugs are evil
WebGPU Hexagonal Prism Voxel Test
I tried making a procedural generation game using WebGPU, without much prior experience. This is what I got. Probably not very good, but I was trying my best.
What do you think of using hexagonal prisms instead of cubes?
I got my highest number of wishlists from a static image and a £50 campaign
Hey folks - creator of Luminids (www.luminids.com) here.
Wanted to share a good result and how I did it: 500 wishlists from a £50 campaign on Reddit.
I've experimented a lot with short form media and content across all the major social platforms - this so far is the best ROI for me. Shockingly, it wasn't a video; it was a static image - an infographic I made of the game, with some variable text adds (A/B stuff on reddit) being shown.
In a world where everyone is so desperate to create videos and reels, it seems people are exhausted with so much video content, and surprisingly a simple static image has been my best performer.
I know it sounds like common sense, but I had assumed static images would perform the worst and didn't even experiment with them. Hope this helps folks out there pushing their game!
Noob question, in a Voxel + SDF in subpixel space, how would you reduce the cost of grazing angles?
Legend:
- Blue: less expensive, raymarch converges faster
- Red: more expensive, raymarch converges slower
Just found a very simple almost stupid optimisation in HC's voxel renderer
On trace alone, we go from 25 to 36, a 44% increase! Not sure if 44% is reasonable in every scenario, but that's still good to take!
I found an opportunistic way to optimize the raymarched renderer that was fairly simple. I never tested whether a ray exited the voxel domain when it entered one!
So basically, when a ray entered the voxel bounds and wasn't touching any shape, it would eventually leave the bounds, but would continue evaluating the SDF function for... nothing.
That's crazy, thinking this simple optimization was there all this time 🤯
(blue screenshots are render cost per pixel, blue fast, red slow)
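A minimal sketch of the kind of exit check described above, written in C++ rather than shader code. The scene (a single sphere inside a unit-cube domain), the function names, and the step and distance limits are all assumptions made for illustration, not HC's actual renderer.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 pointOnRay(Vec3 origin, Vec3 dir, float t) {
    return { origin.x + dir.x * t, origin.y + dir.y * t, origin.z + dir.z * t };
}

// Hypothetical scene: a sphere of radius 0.2 centred in the unit cube.
inline float sceneSdf(Vec3 p) {
    const float dx = p.x - 0.5f, dy = p.y - 0.5f, dz = p.z - 0.5f;
    return std::sqrt(dx * dx + dy * dy + dz * dz) - 0.2f;
}

// The voxel domain is assumed to be the unit cube [0, 1]^3.
inline bool insideDomain(Vec3 p) {
    return p.x >= 0.0f && p.x <= 1.0f &&
           p.y >= 0.0f && p.y <= 1.0f &&
           p.z >= 0.0f && p.z <= 1.0f;
}

struct TraceResult {
    bool hit;
    int sdfEvals; // number of SDF evaluations spent on this ray
};

// Sphere-traces from a point on the domain boundary. With stopOutsideDomain
// set, the loop ends as soon as the sample point leaves the domain instead of
// marching on until the step or distance limit is reached.
inline TraceResult trace(Vec3 origin, Vec3 dir, bool stopOutsideDomain) {
    const int maxSteps = 128;
    const float maxDistance = 100.0f;
    const float hitEpsilon = 1e-3f;
    float t = 0.0f;
    int evals = 0;
    for (int i = 0; i < maxSteps && t < maxDistance; ++i) {
        const Vec3 p = pointOnRay(origin, dir, t);
        if (stopOutsideDomain && !insideDomain(p)) break; // the added test
        const float d = sceneSdf(p);
        ++evals;
        if (d < hitEpsilon) return { true, evals };
        t += d;
    }
    return { false, evals };
}
```

In this toy setup, a ray that enters the cube but misses the sphere stops after a handful of evaluations when the test is enabled, while the untested version keeps evaluating the SDF until it reaches `maxDistance`.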
Potentially a voxel game about biomes and terraforming
WGPU+WGSL, pretty usual tech stack :)
Grid 1024x256x1024, two triangles in the scene (I'm lying, there's more because of UI)
Nothing special, compared to other people in this subreddit, but I'm pretty happy with the results so far. Happy enough to try to make a full-fledged game from this.
just a fun interaction between trees growing and a very strong magnet pulling them apart in my cellular automata voxel engine
Voxel Vendredi 15 May 2026
This is the place to show off and discuss your voxel game and tools. Shameless plugs, links to your game, progress updates, screenshots, videos, art, assets, promotion, tech, findings and recommendations etc. are all welcome.
- Voxel Vendredi is a discussion thread starting every Friday - 'vendredi' in French - and running over the weekend. The thread is automatically posted by the mods every Friday at 00:00 GMT.
- Previous Voxel Vendredis
Voxels, Cars and Excavators in Javascript
Buildings and roads generated with OpenStreetMap data.
Marching cube meshes generated in Javascript, rendered with BabylonJS.
C# Voxel Engine with Vulkan: Mantle
I have now been tinkering on this voxel engine of mine named Mantle for a good 8 weeks, and it's finally getting somewhere.
I love to mod games myself, so after creating several modding tools for different games, it's now time to create an engine that is built around moddability.
While I use C#, mods won't be written in it. I simply went with C# as it is the language I am best at and have the most experience with, and since my engine is compiled with Native AOT, the performance is also very close to C++.
To modify a game made in Mantle you can either edit the config files, which are just plain YAML, or add your own game logic in Lua. All changes take effect immediately if the engine is run with the `-debug` flag. The same goes for any modification of any asset.
So working on a game's data, or by extension a mod's data, never requires an engine restart.
I also just finished the first handful of UiElements for my custom UI framework, which integrates directly into my localization system, supporting dynamic values, different plural forms, and reactive, allocation-free queries to the game state.
With those two systems combined it is possible to create different UI-Screens in a simple .yaml file.
While there is plenty of work to be done it's really nice to see it finally coming together.
What do you think of my approach to moddability as a feature, and what would you do differently?
Current Capabilities:
- Bindless textures
- Raycasting for scene interaction
- Custom Ui Framework
- Hot reloading of any asset / game data
- Zero-Allocation Data Binding for localisation
The current engine viewport with some custom ui elements
The definition of the UI Elements visible in the screenshot of the viewport
The localisation used in the UI Elements visible in the screenshot of the viewport