Setting up Unity LipSync for an RPG

Feb 14, 2019

Synchronising our characters' mouths with the phonemes they pronounce to make realistic dialogues

Animating every dialogue sequence by hand can prove an expensive and laborious endeavor for an indie game, so we decided to tackle the issue from a different perspective.

In this tutorial, we will discuss how we implemented a systemic LipSync Pro setup in Ravensword Legacy, a premium mobile game in development in collaboration with Crescent Moon Games.

Once the characters for this game were created, it was time to bring them to life, so they could talk to each other.

After some research, we found a Unity plugin called LipSync Pro. This is an extraordinary tool that makes it easy to add keyframes to audio clips, allowing 3D characters to synchronise their mouth movements with speech. It also supports additional blend shapes, such as blinking and yawning, as well as expressions such as anger and happiness.

Without further ado, let's see how to implement this Unity plugin to make our characters' dialogues look realistic!


The core of spoken languages

A phoneme is one of the minimal units of sound that distinguish one word from another in a particular language; English, for example, has 44 phonemes. Similarly to VRChat's system, LipSync uses phonemes to choose between the different mouth shapes that represent a specific sound.

This way, we can assign each keyframe in the Audio Clip to a phoneme, and the mouth will adapt accordingly.

English Phonemes

For this kind of work, game developers usually group phonemes together by mouth shape. For example, the k in key sounds the same as the c in car, so a single phoneme covers both sounds; the same goes for m, b, and p, and so on.

LipSync Phonemes List

This is the simplified list that LipSync asks us to fill in to work its magic. You don't need to fill in all of them; in fact, we're using just three blend shapes (A/I, E, O) plus the resting one.
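The grouping above boils down to a lookup table: many phonemes collapse onto the few mouth shapes (visemes) we actually model. A minimal Python sketch of the idea (the phoneme keys and viseme names here are illustrative, not LipSync's internal identifiers; mapping closed-lip sounds like m/b/p to the resting pose is our own simplification for a three-shape setup):

```python
# Many phonemes share one viseme: /k/ and hard /c/ look the same on the
# lips, so they collapse into a single mouth shape, and so on.
PHONEME_TO_VISEME = {
    "k": "A/I", "c": "A/I", "a": "A/I", "i": "A/I",
    "e": "E",
    "o": "O", "u": "O",
    "m": "Rest", "b": "Rest", "p": "Rest",  # closed-lip sounds -> resting pose
}

def viseme_for(phoneme: str) -> str:
    """Return the mouth shape for a phoneme, defaulting to the resting pose."""
    return PHONEME_TO_VISEME.get(phoneme, "Rest")
```

With only three shapes plus rest, the table stays tiny, which is exactly what made the later keyframing step fast.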

Adapting to the new needs

We proceeded to modify the models: we opened their mouths and added the inside of the mouth (commonly called the mouthbag), the tongue, and the teeth. We also updated the textures so the teeth, tongue, and mouthbag were textured.

Head model with tongue and teeth

After this, we duplicated the resting pose three times and modified each copy for the A, E/I, and O phonemes. As the game is low poly, with pixel post-processing and a limited colour palette (sometimes even as low as 8 bits!), too much fidelity and/or fluidity would make it look uncanny.

4 heads with different mouth positions

These heads were then exported as a single head with four blend shapes, using the modified mouths as targets for those blend shapes.

Hero 3D model complete assembly

Then we repeated this process for a couple of NPCs and the other 8 head variations of our hero. Once that was done, we headed to Unity and imported the new heads, replacing the old ones. We also imported a character voice line from one of our favourite video games for testing purposes.

Setup of the system

We created a LipSync info .asset file from that Audio Clip via LipSync's Clip Editor (the shortcut is Ctrl + Alt + A) and started adding phonemes that matched what the line was saying.

Having only 3 phonemes really sped up this process; otherwise it would have been too tedious. Once that was done, we saved the LipSync info .asset file in the same folder as the Audio Clip.

LipSync Clip Editor

LipSync program

Each of these black markers means that the mouth will change to the specified phoneme at the specified time.
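Conceptually, those markers form a sorted timeline, and at any playback time the most recent marker decides which phoneme is active. A rough Python sketch of that lookup (this is our own illustration of the concept, not LipSync's actual data format; the times and phonemes are made up):

```python
import bisect

# Markers as (time_in_seconds, phoneme), sorted by time.
markers = [(0.0, "Rest"), (0.4, "A/I"), (0.7, "O"), (1.1, "E"), (1.5, "Rest")]
times = [t for t, _ in markers]

def active_phoneme(t: float) -> str:
    """Return the phoneme of the most recent marker at playback time t."""
    i = bisect.bisect_right(times, t) - 1  # index of last marker at or before t
    return markers[max(i, 0)][1]
```

In practice the plugin also blends between markers, but the core idea is this "last marker wins" lookup.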

After completing this task, we returned to the character head prefab. We added the LipSync script and designated the head mesh as the primary mesh and the teeth as the secondary mesh.

This arrangement ensures that the head's blend shapes will also influence the teeth. Additionally, we assigned the character's Audio Output as the source of the line's sound and placed it in the corresponding slot.

LipSync Script

We specified which blend shapes were to be assigned to which phonemes so that LipSync knew what blend shape it had to change every time the time slider passed through a phoneme marker.
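What these settings describe is simple: when the playhead crosses a marker, the blend shape bound to that phoneme goes to full weight and the others return to zero (in Unity this corresponds to SkinnedMeshRenderer.SetBlendShapeWeight calls on the head mesh). A plain Python sketch of that weight update, with illustrative shape names:

```python
# The three phoneme blend shapes; the resting pose is all weights at 0.
PHONEME_BLENDSHAPES = ["A/I", "E", "O"]

def update_weights(active: str) -> dict:
    """Snap the active phoneme's blend shape to 100 and the rest to 0."""
    return {name: (100.0 if name == active else 0.0)
            for name in PHONEME_BLENDSHAPES}
```

A snappy, unsmoothed switch like this suits the game's low-poly, pixelated look; a higher-fidelity project would interpolate the weights instead.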

LipSync Settings


And there we have the final result: our character's mouth moves according to the phonemes he pronounces. This is a fairly simple method to follow and use, applicable to a multitude of personal 3D projects.

If this has been useful to you, pass the article on to other game developers so they can get to know the Unity LipSync plugin and learn how to use it in a very simple way.

Polygonal Mind
Creative Development 3D Studio

Since 2015 creating cool experiences, games and avatars on digital platforms

Creating a VRChat avatar with blend shapes visemes
Add dynamic bones to your 3D character in Unity
Creating eye-tracking with Blender’s CATS plugin