Sand Balls Mechanics Implementation: The Best Way To Deform A Mesh In Unity
In this post you will learn how to deform a mesh in Unity using a real-world example. We will apply different techniques and measure each approach with performance tests to find the fastest one. As a sample, we will look at mesh deformation in the Sand Balls game and reimplement its mechanics.
The game has a very simple concept: you need to clear the way for the balls.
I had an idea of how it might be implemented. The goal of this post is not only to share my implementation but also to find out how naive an implementation can be while still holding a stable frame rate on a mobile device. What is the simplest implementation the original developers could get away with?
This post consists of two parts. In Part 1 we will look at three ways to deform a mesh, from the most naive to a more complex and hopefully more performant one, and then compare them with performance tests:
- Naive single-threaded implementation
- Jobified naive implementation
- Compute shader with AsyncGPUReadback
- Performance Testing
In Part 2 we will check out the MeshData API, which allows working with a mesh inside a job. The most interesting point for me is whether it outperforms the compute shader solution in our case.
Setup
First, let’s set up the scene. We need a sphere with Rigidbody and SphereCollider components, then two planes: one for the background and one for the sand (I used different colors for them). The front plane must have a MeshCollider component with the Convex property turned off, so that the ball can fall through once we deform it. Finally, we add two invisible colliders the size of the plane, one in front of it and one behind it, at a distance slightly larger than the ball diameter, so the ball won’t fall off the plane’s mesh collider. This setup is good enough to achieve the desired behavior. You can take a look at the exact scene in the repository for this post on GitHub.
Also, let’s create a base abstract class for the front plane with common properties and an abstract method which we will implement using different approaches to modify the plane mesh. Components derived from this one should be added to the front plane that imitates sand.
    public abstract class DeformablePlane : MonoBehaviour
    {
        [SerializeField] protected float _radiusOfDeformation = 0.2f;
        [SerializeField] protected float _powerOfDeformation = 1f;

        public abstract void Deform(Vector3 positionToDeform);
    }
Now we need to detect an input event and determine where a player has pressed to remove sand. For that, we should create a new MonoBehaviour and add it to the Camera game object. It simply makes a raycast from the clicked point towards the plane that we would like to deform.
    using UnityEngine;

    [RequireComponent(typeof(Camera))]
    public class Deformer : MonoBehaviour
    {
        [SerializeField] private DeformablePlane _deformablePlane;
        // InputProviderType is not shown in this post
        [SerializeField] private InputProviderType _inputProviderType;

        private Camera _camera;

        private void Awake()
        {
            _camera = transform.GetComponent<Camera>();
        }

        private void Update()
        {
            // Deform for as long as the left mouse button is held down
            if (Input.GetMouseButton(0))
            {
                OnInputReceived(Input.mousePosition);
            }
        }

        private void OnInputReceived(Vector3 position)
        {
            var ray = _camera.ScreenPointToRay(position);
            DeformMesh(ray);
        }

        private void DeformMesh(Ray ray)
        {
            // Nothing to deform if the ray does not hit any collider
            if (!Physics.Raycast(ray, out var hit))
            {
                return;
            }

            _deformablePlane.Deform(hit.point);
        }
    }
The Naive Implementation
The simplest solution is to deform the mesh right when the Deform() method is called. For every vertex we check whether its position is inside the radius of the clicked point and, if so, push it along the negative Vector3.up direction (in the plane's local space) to move the vertex behind the ground plane and imitate sand being removed.

    using UnityEngine;

    [RequireComponent(typeof(MeshFilter), typeof(MeshCollider))]
    public class DeformableMeshPlane : DeformablePlane
    {
        private Mesh _mesh;
        private MeshCollider _collider;
        private Vector3[] _vertices;

        private void Awake()
        {
            var meshFilter = GetComponent<MeshFilter>();
            _collider = GetComponent<MeshCollider>();
            _mesh = meshFilter.mesh;
            // Cache the vertices once; Mesh.vertices returns a copy on every access
            _vertices = _mesh.vertices;
        }

        public override void Deform(Vector3 positionToDeform)
        {
            // Vertices are stored in local space, so convert the hit point first
            positionToDeform = transform.InverseTransformPoint(positionToDeform);
            var somethingDeformed = false;
            for (var i = 0; i < _vertices.Length; i++)
            {
                // dist is a squared distance, so _radiusOfDeformation acts
                // as a threshold on the squared distance here
                var dist = (_vertices[i] - positionToDeform).sqrMagnitude;
                if (dist < _radiusOfDeformation)
                {
                    _vertices[i] -= Vector3.up * _powerOfDeformation;
                    somethingDeformed = true;
                }
            }

            if (!somethingDeformed)
            {
                return;
            }

            // Upload the modified vertices and refresh the collider
            _mesh.vertices = _vertices;
            _collider.sharedMesh = _mesh;
        }
    }
Adding The Job System
The first improvement is to jobify the vertex position check so that the array is iterated in parallel. For that, we schedule a new job when Deform() is invoked. After the job completes, we write the vertices back to the mesh and update the collider.
At first I came up with a job (the source is available on GitHub) that uses only one click position, but I quickly noticed that it skips some areas when sweeping rapidly over the screen. So I added a second version of this job that goes over a list of deformation points and checks for every vertex whether it is inside the radius of any point:

    using Unity.Collections;
    using Unity.Jobs;
    using UnityEngine;

    public struct MultipleDeformationPointsMeshDeformerJob : IJobParallelFor
    {
        [ReadOnly] private readonly float _radius;
        [ReadOnly] private readonly float _power;
        [ReadOnly] private NativeArray<Vector3> _deformationPoints;
        public NativeArray<Vector3> Vertices;

        public MultipleDeformationPointsMeshDeformerJob(
            float radius,
            float power,
            NativeArray<Vector3> vertices,
            NativeArray<Vector3> deformationPoints)
        {
            _radius = radius;
            _power = power;
            Vertices = vertices;
            _deformationPoints = deformationPoints;
        }

        // Called once per vertex; each call only writes to Vertices[index]
        public void Execute(int index)
        {
            var vertex = Vertices[index];
            foreach (var point in _deformationPoints)
            {
                // Squared distance compared against _radius, as in the naive version
                var dist = (vertex - point).sqrMagnitude;
                if (dist < _radius)
                {
                    vertex -= Vector3.up * _power;
                    Vertices[index] = vertex;
                }
            }
        }
    }
Instead of scheduling the job right away when Deform() is called we need to add a point to the list. Then schedule the job in Update() when we have any points in the list and no job is scheduled already:
    public override void Deform(Vector3 point)
    {
        _deformationPoints.Add(transform.InverseTransformPoint(point));
    }

    ...

    private void Update()
    {
        ScheduleJob();
    }

    private void ScheduleJob()
    {
        if (_scheduled || _deformationPoints.Length == 0)
        {
            return;
        }

        _scheduled = true;
        _job = new MultipleDeformationPointsMeshDeformerJob(
            _radiusOfDeformation,
            _powerOfDeformation,
            _vertices,
            _deformationPoints);
        // Process the vertices in batches of 64 per worker
        _handle = _job.Schedule(_vertices.Length, 64);
    }

Well, it looks better, but when the cursor moves fast and skips large areas between frames, there are still gaps between deformation points. This happens not because we skip inputs while a job is running, but because each job is scheduled with only one point, and on the next frame we schedule another job with a new input that is far away from the previous one. I have an idea of how to fix this, but first I will check the performance as is and see how the current solution works on mobile, as I don’t expect a finger to move as fast as a mouse on a PC can. If it is an issue, I will address it in the next part.
Once again I split the implementations into different files so it is easier to test different approaches and follow this post. Here is the full script, also available on GitHub as JobDeformableMeshPlane.cs:

    using Unity.Collections;
    using Unity.Jobs;
    using UnityEngine;

    [RequireComponent(typeof(MeshFilter), typeof(MeshCollider))]
    public class JobDeformableMeshPlane : DeformablePlane
    {
        private Mesh _mesh;
        private MeshCollider _collider;
        private NativeArray<Vector3> _vertices;
        private bool _scheduled;
        private MultipleDeformationPointsMeshDeformerJob _job;
        private JobHandle _handle;
        private NativeList<Vector3> _deformationPoints;

        public override void Deform(Vector3 point)
        {
            _deformationPoints.Add(transform.InverseTransformPoint(point));
        }

        private void Awake()
        {
            _mesh = GetComponent<MeshFilter>().mesh;
            _mesh.MarkDynamic();
            _collider = GetComponent<MeshCollider>();
            _vertices = new NativeArray<Vector3>(_mesh.vertices, Allocator.Persistent);
            _deformationPoints = new NativeList<Vector3>(Allocator.Persistent);
        }

        private void Update()
        {
            ScheduleJob();
        }

        private void LateUpdate()
        {
            CompleteJob();
        }

        private void OnDestroy()
        {
            // Make sure no job is still using the containers before disposing them
            _handle.Complete();
            _vertices.Dispose();
            _deformationPoints.Dispose();
        }

        private void ScheduleJob()
        {
            if (_scheduled || _deformationPoints.Length == 0)
            {
                return;
            }

            _scheduled = true;
            _job = new MultipleDeformationPointsMeshDeformerJob(
                _radiusOfDeformation,
                _powerOfDeformation,
                _vertices,
                _deformationPoints);
            _handle = _job.Schedule(_vertices.Length, 64);
        }

        private void CompleteJob()
        {
            if (!_scheduled)
            {
                return;
            }

            // Block until the job is done, then push the result to the mesh and collider
            _handle.Complete();
            _job.Vertices.CopyTo(_vertices);
            _mesh.vertices = _vertices.ToArray();
            _collider.sharedMesh = _mesh;
            _deformationPoints.Clear();
            _scheduled = false;
        }
    }
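One possible way to close such gaps, not necessarily the fix I have in mind for the next part, is to fill the space between the previous and the current input position with intermediate deformation points. Below is a minimal, engine-agnostic sketch of that idea. `DeformationPath`, `Interpolate` and the `spacing` parameter are hypothetical names, and it uses `System.Numerics.Vector3` so that it runs outside Unity; `UnityEngine.Vector3` offers `Distance` and `Lerp` as well.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

// Hypothetical helper, not part of the original project.
public static class DeformationPath
{
    // Returns points along the segment from `previous` (exclusive) to
    // `current` (inclusive), with neighbours at most `spacing` apart.
    public static List<Vector3> Interpolate(Vector3 previous, Vector3 current, float spacing)
    {
        if (spacing <= 0f)
        {
            throw new ArgumentOutOfRangeException(nameof(spacing));
        }

        var distance = Vector3.Distance(previous, current);
        // Number of equal steps needed so that no step is longer than `spacing`.
        var steps = Math.Max(1, (int)Math.Ceiling(distance / spacing));
        var points = new List<Vector3>(steps);
        for (var i = 1; i <= steps; i++)
        {
            points.Add(Vector3.Lerp(previous, current, (float)i / steps));
        }

        return points;
    }
}
```

Each returned point could then be passed to Deform(). Choosing `spacing` no larger than the deformation radius should keep neighbouring dents overlapping, at the cost of more points per job.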
Compute Shader With AsyncGPUReadback
Using the job system to parallelize iteration over the vertices array helps. But it is still done on the CPU, while we have a GPU, a piece of hardware designed exactly for highly parallel work. Since during iteration we only write to the element with the current index, our data is ideal for processing in a compute shader. The compute shader is pretty straightforward. It still checks the position of the vertex with a given index and pushes it behind the plane if it is inside the radius of any clicked point.

    #pragma kernel CSMain

    // Must match the layout of the C# VertexData struct (32 bytes per vertex)
    struct VertexData
    {
        float3 position;
        float3 normal;
        float2 uv;
    };

    RWStructuredBuffer<VertexData> vertexBuffer;
    float3 _DeformPosition;
    float3 _DeformPositions[30];
    int _DeformPositionsCount;
    float _Radius;
    float _Force;

    [numthreads(32,1,1)]
    void CSMain(uint3 id : SV_DispatchThreadID)
    {
        float3 pos = vertexBuffer[id.x].position;
        for (int i = 0; i < _DeformPositionsCount; i++)
        {
            const float distance = length(pos - _DeformPositions[i]);
            // step(distance, _Radius) is 1 when distance <= _Radius, otherwise 0
            pos.y -= _Force * step(distance, _Radius);
        }

        vertexBuffer[id.x].position.y = pos.y;
    }
The MonoBehaviour script becomes a little bit more complicated compared to previous implementations. Let’s start with the initialization:
    ...
    // Serialized field to set compute shader via the inspector
    [SerializeField] private ComputeShader _computeShader;

    // Fields to cache data
    private Mesh _mesh;
    private ComputeBuffer _computeBuffer;
    private int _kernel;
    private int _dispatchCount;
    private NativeArray<VertexData> _vertexData;
    private AsyncGPUReadbackRequest _request;
    private bool _isDispatched;
    private MeshCollider _meshCollider;
    private readonly List<Vector4> _deformationPoints = new List<Vector4>(10);

    // Cache properties ids, so the hash is calculated only once instead of each time we access it
    private readonly int _deformationPointsPropertyId = Shader.PropertyToID("_DeformPositions");
    private readonly int _deformationPointsCountPropertyId = Shader.PropertyToID("_DeformPositionsCount");

    // Setup all the necessary data
    private void Awake()
    {
        if (!SystemInfo.supportsAsyncGPUReadback)
        {
            gameObject.SetActive(false);
            return;
        }

        var meshFilter = GetComponent<MeshFilter>();
        _meshCollider = GetComponent<MeshCollider>();
        _mesh = meshFilter.mesh;
        SetKernel();
        CreateVertexData();
        SetMeshVertexBufferParams();
        _computeBuffer = CreateComputeBuffer();
        SetComputeShaderValues();
    }

    // Find and cache the kernel index given the name we used in the compute shader.
    // It is used to dispatch computation later
    private void SetKernel()
    {
        _kernel = _computeShader.FindKernel("CSMain");
        _computeShader.GetKernelThreadGroupSizes(_kernel, out var threadX, out _, out _);
        // Divide as floats so CeilToInt actually rounds up to cover every vertex
        _dispatchCount = Mathf.CeilToInt(_mesh.vertexCount / (float)threadX);
    }

    // Create a NativeArray<VertexData> using a mesh data (vertices, normals, UVs)
    // It will be passed into the compute shader, modified there and then set back into the mesh
    private void CreateVertexData()
    {
        // Mesh.vertices, normals and uv return a copy on every access, so read them once
        var vertices = _mesh.vertices;
        var normals = _mesh.normals;
        var uvs = _mesh.uv;
        _vertexData = new NativeArray<VertexData>(vertices.Length, Allocator.Temp);
        for (var i = 0; i < vertices.Length; ++i)
        {
            var v = new VertexData
            {
                Position = vertices[i],
                Normal = normals[i],
                Uv = uvs[i]
            };
            _vertexData[i] = v;
        }
    }

    // Create a descriptor so we can use _mesh.SetVertexBufferData<VertexData>(...) to update the mesh
    private void SetMeshVertexBufferParams()
    {
        var layout = new[]
        {
            new VertexAttributeDescriptor(VertexAttribute.Position,
                _mesh.GetVertexAttributeFormat(VertexAttribute.Position), 3),
            new VertexAttributeDescriptor(VertexAttribute.Normal,
                _mesh.GetVertexAttributeFormat(VertexAttribute.Normal), 3),
            new VertexAttributeDescriptor(VertexAttribute.TexCoord0,
                _mesh.GetVertexAttributeFormat(VertexAttribute.TexCoord0), 2),
        };
        _mesh.SetVertexBufferParams(_mesh.vertexCount, layout);
    }

    private void SetComputeShaderValues()
    {
        _computeShader.SetBuffer(_kernel, "vertexBuffer", _computeBuffer);
        _computeShader.SetFloat("_Force", _powerOfDeformation);
        _computeShader.SetFloat("_Radius", _radiusOfDeformation);
    }

    private ComputeBuffer CreateComputeBuffer()
    {
        // Stride of 32 bytes: float3 position + float3 normal + float2 uv
        var computeBuffer = new ComputeBuffer(_mesh.vertexCount, 32);
        if (_vertexData.IsCreated)
        {
            computeBuffer.SetData(_vertexData);
        }

        return computeBuffer;
    }
    ...
    }
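Two numbers in the initialization are worth spelling out: how many thread groups are needed to cover every vertex, and where the buffer stride of 32 bytes comes from. The sketch below shows both using integer arithmetic only. `DispatchMath`, `GroupCount` and `VertexStride` are hypothetical names introduced for illustration and are not part of the project.

```csharp
using System;

// Hypothetical helper, not part of the original project.
public static class DispatchMath
{
    // Bytes per vertex: float3 position + float3 normal + float2 uv.
    public const int VertexStride = (3 + 3 + 2) * sizeof(float);

    // Smallest number of thread groups whose threads cover every element,
    // i.e. the ceiling of elementCount / threadsPerGroup.
    public static int GroupCount(int elementCount, int threadsPerGroup)
    {
        if (elementCount < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(elementCount));
        }

        if (threadsPerGroup <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(threadsPerGroup));
        }

        return (elementCount + threadsPerGroup - 1) / threadsPerGroup;
    }
}
```

With 32 threads per group, as declared by `[numthreads(32,1,1)]` in the shader, 64 vertices need 2 groups and 65 vertices need 3. Whenever the vertex count is not a multiple of the group size, the last group contains threads whose index is past the end of the buffer, so a bounds check inside the kernel is a common precaution.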
Now, similarly to how we used jobs, we prepare the data and request an async operation in Update(), and try to complete it in LateUpdate():

    private void Update()
    {
        Dispatch();
    }

    private void LateUpdate()
    {
        GatherResult();
    }

    private void Dispatch()
    {
        if (_deformationPoints.Count == 0)
        {
            return;
        }

        if (_isDispatched)
        {
            return;
        }

        _isDispatched = true;
        _computeShader.SetVectorArray(_deformationPointsPropertyId, _deformationPoints.ToArray());
        _computeShader.SetInt(_deformationPointsCountPropertyId, _deformationPoints.Count);
        _computeShader.Dispatch(_kernel, _dispatchCount, 1, 1);
        _deformationPoints.Clear();
        // Dispatch a request
        _request = AsyncGPUReadback.Request(_computeBuffer);
    }

    private void GatherResult()
    {
        if (!_isDispatched || !_request.done)
        {
            return;
        }

        _isDispatched = false;
        // On a failed readback skip this result; the next Dispatch() issues a new request
        if (_request.hasError)
        {
            return;
        }

        _vertexData = _request.GetData<VertexData>();
        _mesh.MarkDynamic();
        _mesh.SetVertexBufferData(_vertexData, 0, 0, _vertexData.Length);
        _meshCollider.sharedMesh = _mesh;
        // Note: this extra request is overwritten by the next Dispatch() and its result is never read
        _request = AsyncGPUReadback.Request(_computeBuffer);
    }

As usual, here is the full script, which is also available on GitHub as ComputeShaderAsyncGpuReadbackDeformablePlane.cs:

    using System.Collections.Generic;
    using Unity.Collections;
    using UnityEngine;
    using UnityEngine.Rendering;

    [RequireComponent(typeof(MeshFilter), typeof(MeshCollider))]
    public class ComputeShaderAsyncGpuReadbackDeformablePlane : DeformablePlane
    {
        [SerializeField] private ComputeShader _computeShader;

        private Mesh _mesh;
        private ComputeBuffer _computeBuffer;
        private int _kernel;
        private int _dispatchCount;
        private NativeArray<VertexData> _vertexData;
        private AsyncGPUReadbackRequest _request;
        private bool _isDispatched;
        private MeshCollider _meshCollider;
        private readonly List<Vector4> _deformationPoints = new List<Vector4>(30);
        private readonly int _deformationPointsPropertyId = Shader.PropertyToID("_DeformPositions");
        private readonly int _deformationPointsCountPropertyId = Shader.PropertyToID("_DeformPositionsCount");

        public override void Deform(Vector3 positionToDeform)
        {
            var point = transform.InverseTransformPoint(positionToDeform);
            _deformationPoints.Add(point);
        }

        private void Awake()
        {
            if (!SystemInfo.supportsAsyncGPUReadback)
            {
                gameObject.SetActive(false);
                return;
            }

            var meshFilter = GetComponent<MeshFilter>();
            _meshCollider = GetComponent<MeshCollider>();
            _mesh = meshFilter.mesh;
            SetKernel();
            CreateVertexData();
            SetMeshVertexBufferParams();
            _computeBuffer = CreateComputeBuffer();
            SetComputeShaderValues();
        }

        private void Update()
        {
            Dispatch();
        }

        private void LateUpdate()
        {
            GatherResult();
        }

        private void SetKernel()
        {
            _kernel = _computeShader.FindKernel("CSMain");
            _computeShader.GetKernelThreadGroupSizes(_kernel, out var threadX, out _, out _);
            // Divide as floats so CeilToInt actually rounds up to cover every vertex
            _dispatchCount = Mathf.CeilToInt(_mesh.vertexCount / (float)threadX);
        }

        private void CreateVertexData()
        {
            // Mesh.vertices, normals and uv return a copy on every access, so read them once
            var vertices = _mesh.vertices;
            var normals = _mesh.normals;
            var uvs = _mesh.uv;
            _vertexData = new NativeArray<VertexData>(vertices.Length, Allocator.Temp);
            for (var i = 0; i < vertices.Length; ++i)
            {
                var v = new VertexData
                {
                    Position = vertices[i],
                    Normal = normals[i],
                    Uv = uvs[i]
                };
                _vertexData[i] = v;
            }
        }

        private void SetMeshVertexBufferParams()
        {
            var layout = new[]
            {
                new VertexAttributeDescriptor(VertexAttribute.Position,
                    _mesh.GetVertexAttributeFormat(VertexAttribute.Position), 3),
                new VertexAttributeDescriptor(VertexAttribute.Normal,
                    _mesh.GetVertexAttributeFormat(VertexAttribute.Normal), 3),
                new VertexAttributeDescriptor(VertexAttribute.TexCoord0,
                    _mesh.GetVertexAttributeFormat(VertexAttribute.TexCoord0), 2),
            };
            _mesh.SetVertexBufferParams(_mesh.vertexCount, layout);
        }

        private void SetComputeShaderValues()
        {
            _computeShader.SetBuffer(_kernel, "vertexBuffer", _computeBuffer);
            _computeShader.SetFloat("_Force", _powerOfDeformation);
            _computeShader.SetFloat("_Radius", _radiusOfDeformation);
        }

        private ComputeBuffer CreateComputeBuffer()
        {
            // Stride of 32 bytes: float3 position + float3 normal + float2 uv
            var computeBuffer = new ComputeBuffer(_mesh.vertexCount, 32);
            if (_vertexData.IsCreated)
            {
                computeBuffer.SetData(_vertexData);
            }

            return computeBuffer;
        }

        private void Dispatch()
        {
            if (_deformationPoints.Count == 0)
            {
                return;
            }

            if (_isDispatched)
            {
                return;
            }

            _computeShader.SetVectorArray(_deformationPointsPropertyId, _deformationPoints.ToArray());
            _computeShader.SetInt(_deformationPointsCountPropertyId, _deformationPoints.Count);
            _computeShader.Dispatch(_kernel, _dispatchCount, 1, 1);
            _deformationPoints.Clear();
            _isDispatched = true;
            _request = AsyncGPUReadback.Request(_computeBuffer);
        }

        private void GatherResult()
        {
            if (!_isDispatched || !_request.done)
            {
                return;
            }

            _isDispatched = false;
            // On a failed readback skip this result; the next Dispatch() issues a new request
            if (_request.hasError)
            {
                return;
            }

            _vertexData = _request.GetData<VertexData>();
            _mesh.MarkDynamic();
            _mesh.SetVertexBufferData(_vertexData, 0, 0, _vertexData.Length);
            _meshCollider.sharedMesh = _mesh;
            // Note: this extra request is overwritten by the next Dispatch() and its result is never read
            _request = AsyncGPUReadback.Request(_computeBuffer);
        }

        private void CleanUp()
        {
            _computeBuffer?.Release();
        }

        private void OnDestroy()
        {
            CleanUp();
        }
    }
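One detail to keep in mind: the shader declares `_DeformPositions` with 30 entries, while the list on the C# side can grow without limit if many points arrive while a readback is in flight. A cautious option, sketched below under that assumption, is to send at most 30 points per dispatch and keep the remainder for the next one. `DeformationBatch`, `TakeNext` and `MaxPointsPerDispatch` are hypothetical names, and the sketch uses `System.Numerics.Vector4` so that it runs outside Unity.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

// Hypothetical helper, not part of the original project.
public static class DeformationBatch
{
    // Assumed to match the size of the _DeformPositions array in the compute shader.
    public const int MaxPointsPerDispatch = 30;

    // Removes up to MaxPointsPerDispatch points from the front of `pending`
    // and returns them; whatever is left waits for the next dispatch.
    public static Vector4[] TakeNext(List<Vector4> pending)
    {
        var count = Math.Min(pending.Count, MaxPointsPerDispatch);
        var batch = pending.GetRange(0, count).ToArray();
        pending.RemoveRange(0, count);
        return batch;
    }
}
```

In Dispatch(), the returned array and its length would take the place of `_deformationPoints.ToArray()` and `_deformationPoints.Count`, and the call to Clear() would no longer be needed because TakeNext already removes what was sent.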
Performance Tests
For performance testing, I am using the Performance Testing package by Unity. I also made a post on how to set it up in your project.
For this post, I have created a bunch of performance tests to measure different aspects of mesh deformation. All these tests are available on GitHub: PerformanceTests. Here we will take a look at DeformationPerformanceTests.cs where the approaches described above are compared.
Disclaimer: Unfortunately, the Performance Testing package in Unity 2022/2021/2020 doesn’t produce a test report after running on a target platform (at least for me). I created a sample project and reproduced the issue to report a bug, and it turned out that it works correctly only in Unity 2019. Luckily, I found a way to get it working when running tests on a device via the Test Runner window in the editor and updated my post about setting up the performance testing package, so check it out if you have a similar issue.
As a target device I used OnePlus 9 (Snapdragon 888).
To measure performance I use the following approach from DeformationTestHelper.cs, which measures how much time each frame takes:

    using (Measure.Frames().Scope())
    {
        while (pos.x < max.x)
        {
            while (pos.y < max.y)
            {
                plane.Deform(pos);
                // Wait one frame so every deformation step is measured separately
                yield return null;
                pos = new Vector3(pos.x, pos.y + 1f, pos.z);
            }

            // Move to the next column and start from the bottom again
            pos = new Vector3(pos.x + 1f, min.y, pos.z);
        }
    }
The whole plane is deformed by one step per frame. And here is how it is executed:
So here are the results of running performance tests on my Android device:
| Method | Median | Dev | StdDev |
|---------------------------------------------------|---------:|---------:|---------:|
| DeformableMeshPlane_ComputeShader_PerformanceTest | 8.27 ms | 0.43 ms | 3.53 ms |
| DeformableMeshPlane_Naive_PerformanceTest | 8.27 ms | 0.39 ms | 3.25 ms |
| DeformableMeshPlane_NaiveJob_PerformanceTest | 8.27 ms | 0.19 ms | 1.58 ms |
Since my device has a 120Hz refresh rate and VSync is forced even if you turn it off in Quality Settings or via code, frame time will always include WaitForTargetFPS. As I don’t use custom profile tags in these tests, this wait is included in the total frame time. Anyway, it shows that all approaches fit into the frame budget needed to run the game at 120 fps, which is more than enough on mobile.
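For reference, the frame budget follows directly from the refresh rate. At 120 Hz each frame has

$$ t_{\text{frame}} = \frac{1000\ \text{ms}}{120} \approx 8.33\ \text{ms} $$

available, which is close to the 8.27 ms median reported for all three approaches. That is consistent with the frame time being dominated by the wait for VSync rather than by the deformation itself.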
I still want to compare how these approaches perform relative to each other. For that, we can add custom profile tags and measure them in performance tests, or run the same tests on a PC with VSync turned off. As this post is getting big already, for now I chose the second option and in the next part will elaborate on this where a newer approach with MeshData is added to the test.
PC benchmarks are performed on the following configuration using a standalone build:
- Intel Core i7-8750H CPU 2.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
- NVIDIA GeForce GTX 1070
So results are the following:
| Method | Median | Dev | StdDev |
|---------------------------------------------------|---------:|---------:|---------:|
| DeformableMeshPlane_ComputeShader_PerformanceTest | 1.37 ms | 3.91 ms | 5.66 ms |
| DeformableMeshPlane_Naive_PerformanceTest | 15.37 ms | 0.11 ms | 1.65 ms |
| DeformableMeshPlane_NaiveJob_PerformanceTest | 6.76 ms | 0.88 ms | 5.97 ms |
The compute shader is a clear winner here. However, the frame time over the test looks like this:
Since the deformation is calculated on the GPU while the result is awaited on the CPU, there are always three frames that take around 1 ms each, followed by one frame in which Mesh.Bake happens as the deformed mesh is applied to the mesh collider. This operation takes around 10 ms. In the end, the median is 1.37 ms, so the compute shader looks way better. But to be fair, given the current resolution of the mesh, all approaches are fine in terms of calculating the deformation. Baking the mesh is what takes the most time and can cause frame drops, as it exceeds the 8 ms frame budget even on a PC, as well as on mobile. We will address mesh baking in the next part, where I will try to optimize the game and its mechanics further based on the best approach to calculating vertex deformation.
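To see why the median hides the baking spikes, it helps to run the numbers on an idealized version of the pattern described above: three frames of about 1 ms followed by one frame of about 10 ms. The sketch below is only an illustration with rounded values, not the measured data, and `FrameStats` is a hypothetical helper.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of the original project.
public static class FrameStats
{
    // Middle value of the sorted samples (average of the two middle values
    // when the number of samples is even).
    public static double Median(double[] samples)
    {
        if (samples.Length == 0)
        {
            throw new ArgumentException("At least one sample is required.", nameof(samples));
        }

        var sorted = samples.OrderBy(x => x).ToArray();
        var mid = sorted.Length / 2;
        return sorted.Length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // Arithmetic mean of the samples.
    public static double Mean(double[] samples)
    {
        return samples.Average();
    }
}
```

For the idealized pattern `{ 1, 1, 1, 10 }` the median is 1 ms while the mean is 3.25 ms, so a low median can coexist with every fourth frame being well over budget. Looking at the maximum or a high percentile next to the median gives a better picture of such spikes.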
Conclusion
The result looks like this:
My tests show that the compute shader solution is the fastest here. However, playing on mobile at 120 fps looks good even with the naive implementation, where everything is calculated on the main thread, while the main contributor to the frame time is mesh baking for the mesh collider. This definitely should be addressed, as even rare frames going over the frame budget due to the Bake call may disrupt the player’s experience. And of course, the referenced game has more balls and bigger levels (so more vertices to take into account when calculating a deformation and baking a mesh). In the next part we will look at how performance scales with mesh resolution and how the algorithm can be improved to work decently on mobile with any number of vertices on a level (to a certain extent).
I’d be curious what a standard off thread implementation might look like, with a threadsafe list of deformation points, like a concurrent queue, and then locking to update the mesh. Probably slower than the GPU, but it might help with the lag you were mentioning. (not that it seemed to be a real problem once you got to the phone)
Also, you could lock the z axis of motion on the ball’s rigidbody to get rid of two colliders in front/back.
Lastly, total nit, but you should seal your classes for slightly better perf.
Hey Josh, thanks for the suggestions! I definitely should try locking the Z axis; it will probably squeeze out more performance given there are more balls in a real game. Fortunately, the setup allows me to compare it pretty easily to the current baseline. Going to address it in the next part.
Talking about multi-threading, of course, it should be profiled for this exact case, but looking at general tests in different blogs I see that Burst-compiled jobs run faster than .NET threads. And the lag itself is caused by baking on the main thread, while the engine allows baking a mesh inside a job: https://docs.unity3d.com/ScriptReference/Physics.BakeMesh.html, so using this approach and splitting the plane into a bunch of smaller meshes could make the process lightning fast.
Oh nice, I’ve not come across BakeMesh before. I have a spot where I could use that!
Awesome, would be great if you share your use case for mesh baking and how your performance improved using jobs after you tried this.
Great post! Thanks!
I can’t understand why sqrMagnitude is used for calculating distance between the vertex and the clicked point? Shouldn’t it be
if (dist < _radius * _radius)
instead of
if (dist < _radius)
in that case?
Thanks, great question!
It is a small optimization since magnitude contains a square root operation which is quite expensive. Especially compared to no operation at all 🙂
Here we don’t need to know the exact distance to each vertex (magnitude), we just need to compare the distance to the threshold. So a squared distance (sqrMagnitude) can be used.
I named it _radius since it is a pretty clear name and allows us to directly modify the actual deformation radius via the inspector without knowing unimportant details like if it is compared as a squared distance or not.
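To make the difference discussed in this thread concrete, here is a small engine-agnostic sketch. It assumes the goal is a radius expressed in the same units as vertex positions; `RadiusCheck`, `IsInside` and `EffectiveRadius` are hypothetical names, and `System.Numerics.Vector3` stands in for Unity's `Vector3` so the example runs outside Unity.

```csharp
using System;
using System.Numerics;

// Hypothetical helper, not part of the original project.
public static class RadiusCheck
{
    // True when `vertex` lies strictly within `radius` of `center`.
    // Squaring the radius avoids the square root while keeping the same result
    // as comparing the plain distance against `radius`.
    public static bool IsInside(Vector3 vertex, Vector3 center, float radius)
    {
        return Vector3.DistanceSquared(vertex, center) < radius * radius;
    }

    // The distance that a check of the form `squaredDistance < threshold` selects.
    public static float EffectiveRadius(float threshold)
    {
        return (float)Math.Sqrt(threshold);
    }
}
```

With the default value of 0.2, comparing the squared distance directly against the field corresponds to an effective radius of about 0.447 units, whereas `IsInside` treats 0.2 as the radius itself. Both variants avoid the square root per vertex; they differ only in what the inspector value means.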