# GPU Mesh Voxelizer Part 5: Filling the inner voxels

In this article, we’ll fill the inside voxels of our voxelized mesh. Doing this will complete our GPU-based mesh voxelizer. To do this, we’ll implement the Möller-Trumbore algorithm for ray/triangle intersection. If you haven’t seen the previous parts, you can read them here: part1, part2, part3, and part4.

## How to fill the inside voxels

I’ll describe the algorithm before we begin. For every voxel, we cast a ray in any direction to find the first triangle intersection. If the dot product of the ray and the triangle normal is greater than zero, it implies that we’re exiting the mesh. That means we must’ve started inside the mesh. I’m not sure if this algorithm covers every possible edge case. However, from what I can tell, it seems to work well enough. Initially, I used a different method, but I had to swap it out at the last minute when I discovered voxels that were mistaken to be inside the mesh.

Now that we understand the goal, how can we achieve it? Well, we can’t perform raycasts on the GPU yet, so let’s start there. Let’s implement the Möller-Trumbore algorithm.

## The Möller-Trumbore algorithm

What’s the Möller-Trumbore algorithm? It’s a fast way to compute ray/triangle intersection, invented by Tomas Möller and Ben Trumbore. I won’t explain all the math behind it, but I’ll leave some resources at the end of this article if you’re interested. Since I’m not going to explain the math, I’ll go ahead and drop the function here.

```
// VoxelizeMesh.compute
bool IntersectsRayTriangle(float3 rayOrigin, float3 rayDirection, Triangle tri, out float hitDistance, out float3 hitNormal)
{
const float epsilon = 0.0000001;
float3 ab = tri.b - tri.a;
float3 ac = tri.c - tri.a;
float3 h = cross(rayDirection, ac);
float a = dot(ab, h);
if (a > -epsilon && a < epsilon) return false;
float f = 1.0 / a;
float3 s = rayOrigin - tri.a;
float u = f * dot(s, h);
if (u < 0.0 || u > 1.0) return false;
float3 q = cross(s, ab);
float v = f * dot(rayDirection, q);
if (v < 0.0 || u + v > 1.0) return false;
float t = f * dot(ac, q);
float3 bc = tri.c - tri.b;
hitNormal = cross(ab, bc);
hitDistance = t;
return t > epsilon;
}
```

To use this function, we’ll write a new kernel in the compute shader. That’s because I decided to break up the process into two steps. First, we calculate the shell of the mesh, and then we fill the volume. Breaking it up made it easier to write and debug, but it would be interesting to combine them and test the performance difference. So let’s get to it.

## The Fill Volume kernel

Let’s add the new kernel to the `VoxelizeMesh.compute`

file. By the way, if you don’t want to follow along, I’ll link the project at the end of this article.

```
#pragma kernel VoxelizeMesh
```**#pragma kernel FillVolume**
...
**[numthreads(4,4,4)]
void FillVolume(uint3 id : SV_DispatchThreadID)
{
if (id.x >= _GridWidth || id.y >= _GridHeight || id.z >= _GridDepth)
return;
const int voxelIndex = id.x + _GridWidth * (id.y + _GridHeight * id.z);
if (_Voxels[voxelIndex].isSolid > 0.0)
return;
}**
...

Here’s the boilerplate out of the way. We added a new kernel, and in that kernel, we exit early if we’re outside the bounds of the voxel grid or if the voxel is already solid. So now we have to start raycasting against triangles. Typically we would do a broad phase first to rule out as many triangles as possible quickly, then raycast against what’s left. However, for simplicity, I’m raycasting against every triangle for each voxel. So that means there’s plenty of room for optimization. Luckily GPUs are very fast, which means we can get away with this for now. So, let’s iterate over every triangle and find the closest hit.

```
...
const float cellSize = _CellHalfSize * 2.0;
float3 rayOrigin = mul(_WorldToLocalMatrix, float4(
id.x * cellSize + _CellHalfSize + _BoundsMin.x,
id.y * cellSize + _CellHalfSize + _BoundsMin.y,
id.z * cellSize + _CellHalfSize + _BoundsMin.z, 1.0));
float3 rayDirection = float3(0, 0, 1);
float minDistance = 10000000.0;
float3 minDistanceNormal;
bool didIntersect = false;
for (int i = 0; i < _TriangleCount; i += 3)
{
Triangle tri;
tri.a = _MeshVertices[_MeshTriangleIndices[i]];
tri.b = _MeshVertices[_MeshTriangleIndices[i + 1]];
tri.c = _MeshVertices[_MeshTriangleIndices[i + 2]];
float3 outNormal;
float distance;
if (IntersectsRayTriangle(rayOrigin, rayDirection, tri, distance, outNormal))
{
if (distance < minDistance)
{
minDistance = distance;
minDistanceNormal = outNormal;
didIntersect = true;
}
}
}
...
```

We start by calculating the origin of the ray. Remember, we’re running one thread per voxel. With one ray per thread, each ray is at the center of a voxel. Then, we continue by choosing a ray direction and initializing our minimum distance. By the way, it doesn’t matter what direction you pick because when you’re inside the mesh, every direction will lead out before going back in. Then we check every triangle for a ray intersection and store the normal of the closest hit. Then let’s finish by checking the dot product of the normal and the ray. If the dot product is greater than 0, then our ray points in more-or-less the same direction as the triangle normal, which implies it’s exiting the mesh. If the ray is leaving the mesh, then we can assume this voxel is inside the mesh.

```
...
if (didIntersect && dot(minDistanceNormal, rayDirection) > 0.0)
_Voxels[voxelIndex].isSolid = 1.0;
}
```

Here’s the entire function, so it’s easier to read.

```
[numthreads(4,4,4)]
void FillVolume(uint3 id : SV_DispatchThreadID)
{
if (id.x >= _GridWidth || id.y >= _GridHeight || id.z >= _GridDepth)
return;
const int voxelIndex = id.x + _GridWidth * (id.y + _GridHeight * id.z);
if (_Voxels[voxelIndex].isSolid > 0.0)
return;
const float cellSize = _CellHalfSize * 2.0;
float3 rayOrigin = mul(_WorldToLocalMatrix,
float4(id.x * cellSize + _CellHalfSize + _BoundsMin.x,
id.y * cellSize + _CellHalfSize + _BoundsMin.y,
id.z * cellSize + _CellHalfSize + _BoundsMin.z, 1.0));
float3 rayDirection = float3(0, 0, 1);
float minDistance = 10000000.0;
float3 minDistanceNormal;
bool didIntersect = false;
for (int i = 0; i < _TriangleCount; i += 3)
{
Triangle tri;
tri.a = _MeshVertices[_MeshTriangleIndices[i]];
tri.b = _MeshVertices[_MeshTriangleIndices[i + 1]];
tri.c = _MeshVertices[_MeshTriangleIndices[i + 2]];
float3 outNormal;
float distance;
if (IntersectsRayTriangle(rayOrigin, rayDirection, tri, distance, outNormal))
{
if (distance < minDistance)
{
minDistance = distance;
minDistanceNormal = outNormal;
didIntersect = true;
}
}
}
if (didIntersect && dot(minDistanceNormal, rayDirection) > 0.0)
_Voxels[voxelIndex].isSolid = 1.0;
}
```

That wraps up the GPU side of things. All we have to do is call our kernel from our C# script. Over in `VoxelizedMesh.cs`

add this block to the end of `VoxelizeMeshWithGPU()`

, after dispatching the first kernel.

```
var volumeKernel = _voxelizeComputeShader.FindKernel("FillVolume");
_voxelizeComputeShader.SetBuffer(volumeKernel, Voxels, _voxelsBuffer);
_voxelizeComputeShader.SetBuffer(volumeKernel, "_MeshVertices",
_meshVerticesBuffer);
_voxelizeComputeShader.SetBuffer(volumeKernel, "_MeshTriangleIndices",
_meshTrianglesBuffer);
_voxelizeComputeShader.GetKernelThreadGroupSizes(voxelizeKernel,
out xGroupSize, out yGroupSize, out zGroupSize);
_voxelizeComputeShader.Dispatch(volumeKernel,
Mathf.CeilToInt(xGridSize / (float) xGroupSize),
Mathf.CeilToInt(yGridSize / (float) yGroupSize),
Mathf.CeilToInt(zGridSize / (float) zGroupSize));
```

If you’ve followed along this far, then this won’t be surprising in any way. We get the kernel, set the buffers, get the group sizes, and dispatch the kernel. By the way, you might notice I didn’t set the variables that aren’t buffers. That’s because non-buffer variables are set per compute shader, whereas buffers are set per kernel. Since our new kernel is inside the same compute shader as the previous one, we don’t have to set those variables a second time.

## Wrapping Up

Barring any bugs I haven’t encountered, that concludes this GPU voxelizer. I think it’s worth mentioning that it took about an afternoon of work to implement a naive CPU voxelizer, and five weeks to implement this one. On the other hand, the GPU voxelizer can run in real-time, depending on your machine and the voxel size, and that’s neat. Next we’ll explore what kind of tools we can build using this voxelizer.

**Explore the complete project ****here**** on GitHub. If you’d like more detail about the Möller-Trumbore algorithm, you can read about it ****here****. If you want to see what we can do with this voxelizer, ****join my mailing list**** to be notified when the next part is released.**