
Setting up Unity LipSync for an RPG game
Synchronising our characters' mouths with the phonemes they pronounce to create realistic dialogue
Animating every dialogue sequence in an indie game can prove an expensive and laborious endeavour. Therefore, we decided to tackle the issue from a different perspective.
In this tutorial, we will walk through implementing LipSync Pro systemically in Ravensword Legacy, a premium mobile game being developed in collaboration with Crescent Moon Games.
Once the characters for this game were created, it was time to bring them to life, so they could talk to each other.
After some research, we found a Unity plugin called LipSync Pro. This is an extraordinary tool that makes it easy to add keyframes to audio clips, allowing 3D characters to synchronise their mouth movements with speech. It also offers other blend shapes, such as blinking and yawning, as well as expressions such as anger and happiness.
Without further ado, let's see how to implement this Unity plugin to make our characters' dialogues look realistic!
The core of spoken languages
A phoneme is one of the minimal units of sound that distinguish one word from another in a particular language; English, for example, has 44 phonemes. Similarly to VRChat's system, LipSync uses phonemes to choose between the different mouth shapes that represent a specific sound.
This way, we can assign each keyframe in the Audio Clip to one phoneme, and the mouth will adapt accordingly.

For this kind of work, game developers usually group phonemes together. For example, the k in key sounds the same as the c in car, so only one mouth shape is needed for that sound. The same goes for m, b, and p, and so on.

This is the simplified list that LipSync asks us to fill in to work its magic. You don't need to fill in all of them; in fact, we are using just three blend shapes (A/I, E, O) plus the resting one.
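To make the idea concrete, here is a minimal C# sketch of what such a phoneme-to-blend-shape mapping could look like. This is purely illustrative: LipSync Pro manages this mapping for you through its inspector, and the names used here (Mouth_AI, Mouth_Rest, and so on) are our own placeholders.

```csharp
using System.Collections.Generic;

// Illustrative only: LipSync Pro manages this mapping for you in its inspector.
public static class PhonemeMap
{
    // Grouped phonemes share one mouth shape, so three blend shapes
    // plus a resting pose can cover the whole English phoneme set.
    private static readonly Dictionary<string, string> PhonemeToBlendShape =
        new Dictionary<string, string>
        {
            { "AI", "Mouth_AI" }, // open mouth: "car", "key"
            { "E",  "Mouth_E"  }, // spread mouth: "bed"
            { "O",  "Mouth_O"  }, // rounded mouth: "go"
        };

    public static string ShapeFor(string phoneme)
    {
        // Any phoneme we haven't mapped falls back to the resting pose.
        return PhonemeToBlendShape.TryGetValue(phoneme, out var shape)
            ? shape
            : "Mouth_Rest";
    }
}
```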
Adapting to the new needs
We proceeded to modify the models: we opened their mouths and added the inside of the mouth (commonly called the mouthbag), the tongue, and the teeth. We also had to update the textures so that the teeth, tongue, and mouthbag were properly textured.

After this, we duplicated the resting pose three times and modified each copy for the A, E/I, and O phonemes. As the game is low poly, uses pixel post-processing, and has a limited colour palette (sometimes even as low as 8 bits!), too much fidelity or fluidity would make it look uncanny.

The poses for each character were then exported as a single head with four blend shapes, using the modified mouths as the targets for those blend shapes.

Then we repeated this process for a couple of NPCs and for the other eight head variations of our hero. Once that was done, we headed into Unity and imported the new heads, replacing the old ones. We also imported one character line from one of our favourite video games for testing purposes.
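Before going any further, it is worth sanity-checking that the blend shapes survived the export. A small script like the one below, using only standard Unity API, will log every blend shape on the head mesh; the component layout of your prefab may differ from what it assumes.

```csharp
using UnityEngine;

// Logs every blend shape on the head mesh so we can confirm that
// the phoneme shapes were imported correctly.
public class BlendShapeLister : MonoBehaviour
{
    void Start()
    {
        var headRenderer = GetComponentInChildren<SkinnedMeshRenderer>();
        var mesh = headRenderer.sharedMesh;

        for (int i = 0; i < mesh.blendShapeCount; i++)
        {
            Debug.Log($"Blend shape {i}: {mesh.GetBlendShapeName(i)}");
        }
    }
}
```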
Setup of the system
We created a LipSync info .asset file from that Audio Clip via LipSync's Clip Editor (the shortcut is Ctrl + Alt + A) and started adding the phonemes that matched what the line was saying.
Having only three phonemes really sped up this process; otherwise it would have been too tedious. Once that was done, we saved the LipSync info .asset file in the same folder as the Audio Clip.

The LipSync Clip Editor. Each of these black markers means that the mouth will change to the specified phoneme at the specified time.
After completing this task, we returned to the character head prefab. We added the LipSync script and designated the head mesh as the primary mesh, with the teeth as the secondary mesh.
This arrangement ensures that the head's blend shapes also influence the teeth. Additionally, we assigned the character's Audio Source to the corresponding slot so that the line's sound plays through it.
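At runtime, a line can then be triggered from code. The sketch below assumes LipSync Pro's documented API, namely the RogoDigital.Lipsync namespace, the LipSync component, and its Play(LipSyncData) method; if your version of the plugin differs, adjust accordingly.

```csharp
using RogoDigital.Lipsync; // LipSync Pro's namespace
using UnityEngine;

// Plays a baked LipSync clip on demand, e.g. when a dialogue line starts.
public class DialogueLinePlayer : MonoBehaviour
{
    [SerializeField] private LipSync lipSync;       // the component on the head prefab
    [SerializeField] private LipSyncData testLine;  // the .asset we authored earlier

    public void Speak()
    {
        // Play() steps through the phoneme markers and plays the audio
        // through the Audio Source assigned on the LipSync component.
        lipSync.Play(testLine);
    }
}
```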


We specified which blend shapes were assigned to which phonemes, so that LipSync knew which blend shape to apply every time the time slider passed through a phoneme marker.
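If a mapping ever looks off, you can preview a pose by driving its blend shape by hand with standard Unity API (weights range from 0 to 100). The little helper below is just a debugging aid we sketched for illustration, not part of LipSync Pro:

```csharp
using UnityEngine;

// Drives a single blend shape manually so each phoneme pose can be
// eyeballed in the Scene view before wiring it up in LipSync's inspector.
public class PhonemePreview : MonoBehaviour
{
    [SerializeField] private SkinnedMeshRenderer head;
    [SerializeField] private int blendShapeIndex;   // e.g. the index of the "O" shape
    [Range(0f, 100f)]
    [SerializeField] private float weight = 100f;   // Unity blend shape weights run 0-100

    void Update()
    {
        head.SetBlendShapeWeight(blendShapeIndex, weight);
    }
}
```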

Conclusion
And with that, we have the final result: our character's mouth moves according to the phonemes he pronounces. This is a fairly simple method to follow and use, applicable to a multitude of personal 3D projects.
If this has been useful to you, pass this article on to other game developers so they can get to know the Unity LipSync plugin and learn how to use it in a very simple way.

