Feb 14, 2019

Setting up Unity LipSync for an RPG

Synchronising our characters' mouths with the phonemes they pronounce to create realistic dialogue

The mission

Animating every dialogue sequence in an indie game can prove both expensive and laborious. So we decided to tackle the issue from a different perspective.

In this tutorial, we will discuss the implementation of systemic LipSync Pro in Ravensword Legacy, a premium mobile game being developed in collaboration with Crescent Moon Games.

Once the characters for this game were created, it was time to bring them to life, so they could talk to each other.

After some research, we found a Unity plugin called LipSync Pro. This is an extraordinary tool that makes it easy to add keyframes to audio clips, allowing 3D characters to synchronise their mouth movements with speech. It also offers other blend shapes, such as blinking and yawning, as well as expressions such as anger and happiness.

Without further ado, let's see how to implement this Unity plugin to make our characters' dialogues look realistic!


The core of spoken languages

A phoneme is one of the minimal units of sound that distinguish one word from another in a particular language. English, for example, has 44 phonemes. Similarly to VRChat's system, LipSync uses phonemes to choose between the different mouth shapes that represent a specific sound.

This way, we can assign each keyframe in the Audio Clip to one phoneme, and the mouth will adapt accordingly.

English phonemes

For this type of work, game developers typically group phonemes together. For example, the "k" in "key" sounds the same as the "c" in "car", so only one phoneme is needed for that sound. The same goes for "m", "b" and "p", which share the same closed-lip mouth shape, and so on.

LipSync phonemes list

This is the simplified list that LipSync asks us to fill in to work its magic. You don't need to fill them all; in fact, we're using just 3 blend shapes (A/I, E, O) plus the resting one.
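To make the grouping concrete, here is a small sketch (in Python purely for illustration; the game itself runs in Unity) of how sounds could collapse into our three mouth shapes plus the resting one. The groupings below are our own rough simplification, not LipSync Pro's official phoneme set.

```python
# Rough, illustrative grouping of sounds into the few mouth shapes
# (visemes) we actually model. Not LipSync Pro's official list.
VISEME_GROUPS = {
    "A/I":  ["a", "i", "k", "c", "g"],   # open-mouth sounds
    "E":    ["e", "s", "t", "d"],        # mid-open sounds
    "O":    ["o", "u", "w"],             # rounded sounds
    "rest": ["m", "b", "p"],             # closed lips
}

# Invert it so we can look up a mouth shape from a sound.
SOUND_TO_VISEME = {
    sound: viseme
    for viseme, sounds in VISEME_GROUPS.items()
    for sound in sounds
}

# The "k" in "key" and the "c" in "car" map to the same shape.
print(SOUND_TO_VISEME["k"], SOUND_TO_VISEME["c"])
```

This is why only a handful of blend shapes is enough: many distinct sounds end up sharing one entry in the inverted table.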

Adapting to the new needs

We proceeded to modify the models by opening their mouths and adding the inside of the mouth (commonly called the mouthbag), the tongue and the teeth. We also had to modify the textures to cover these new parts.

Head model with tongue and teeth

After this, we duplicated the resting pose and modified it three times, once for each of the A/I, E and O phonemes. Since the game is low poly, with pixel post-processing and a limited colour palette (sometimes even as low as 8 bits!), too much fidelity and/or fluidity would make it look uncanny.

4 heads with different mouth positions

These heads were then exported as a single head mesh with 4 blend shapes, using the modified mouths as the targets for those blend shapes.
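A blend shape is essentially a set of per-vertex offsets from the base mesh; the weight slides the vertices toward the target pose. The sketch below (Python with toy vertex data, just to show the maths; a real head has thousands of vertices) mirrors Unity's 0-100 weight convention:

```python
# Toy vertex data: each vertex is (x, y, z). A real head has thousands.
base     = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]    # resting-pose mouth verts
target_a = [(0.0, -0.2, 0.0), (1.0, -0.1, 0.0)]  # the same verts in the "A/I" head

# The exporter bakes the per-vertex offsets (deltas) into the blend shape.
deltas = [tuple(t - b for t, b in zip(tv, bv))
          for tv, bv in zip(target_a, base)]

def apply_blendshape(weight):
    """weight in 0..100, matching Unity's SkinnedMeshRenderer convention:
    0 = resting pose, 100 = fully the target shape."""
    k = weight / 100.0
    return [tuple(b + k * d for b, d in zip(bv, dv))
            for bv, dv in zip(base, deltas)]

print(apply_blendshape(50.0))  # mouth halfway toward the "A/I" pose
```

This is why exporting the modified heads as blend-shape targets works: only the deltas are stored, and any in-between mouth position comes for free by scaling them.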

Hero 3D model complete assembly

Then we repeated this process for a couple of NPCs and the other 8 head variations of our hero. Once that was done, we headed to Unity and imported the new heads, replacing the old ones. We also imported a character line from one of our favourite video games for testing purposes.

Setup of the system

We created a LipSync info .asset file from that Audio Clip via LipSync's Clip Editor (shortcut: "Ctrl + Alt + A") and started adding the phonemes that matched what the line was saying.

Having only 3 phonemes really sped up this process; otherwise, it would have been too tedious. After that was done, we saved the LipSync info .asset file in the same folder as the Audio Clip.

LipSync Clip Editor

LipSync program

Each of these black markers signifies that the mouth will change to the specified phoneme at the specified time.
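Conceptually, each marker is just a (time, phoneme) pair, and playback asks which marker was passed most recently. A minimal sketch of that lookup (Python for illustration; the marker times below are made up):

```python
import bisect

# Hypothetical marker list: (time in seconds, phoneme), sorted by time.
markers = [(0.00, "rest"), (0.25, "A/I"), (0.50, "E"),
           (0.90, "O"), (1.30, "rest")]
times = [t for t, _ in markers]

def phoneme_at(t):
    """Return the phoneme active at playback time t:
    the last marker at or before t."""
    i = bisect.bisect_right(times, t) - 1
    return markers[max(i, 0)][1]

print(phoneme_at(0.6))  # the 0.50 "E" marker is the latest one passed
```

In the Clip Editor you only place the markers; LipSync does this lookup (and the smoothing between markers) for you at runtime.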

After completing this task, we returned to the character head prefab. We added the LipSync script and designated the head mesh as the primary mesh and the teeth as the secondary mesh.

This arrangement ensures that the head's blend shapes will also influence the teeth. Additionally, we assigned the character's Audio Output as the source of the line's sound and placed it in the corresponding slot.

LipSync script

We specified which blend shapes were assigned to which phonemes, so that LipSync knew which blend shape to change every time the time slider passed a phoneme marker.
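Under the hood, the effect is roughly a crossfade of blend-shape weights between consecutive markers. This sketch (Python, purely illustrative; LipSync Pro's real blending curves are configurable and more refined) shows the idea, again using Unity's 0-100 weight range:

```python
def blend_weights(markers, t):
    """Linearly crossfade between the two markers surrounding time t.
    markers: sorted list of (time, phoneme).
    Returns {blend shape name: weight in 0..100}."""
    for (t0, p0), (t1, p1) in zip(markers, markers[1:]):
        if t0 <= t <= t1:
            alpha = (t - t0) / (t1 - t0)         # 0 at t0, 1 at t1
            weights = {p0: (1 - alpha) * 100.0, p1: alpha * 100.0}
            weights.pop("rest", None)            # rest = every shape at weight 0
            return weights
    return {}                                    # outside the marked range

markers = [(0.0, "rest"), (0.5, "A/I"), (1.0, "E")]
print(blend_weights(markers, 0.75))  # halfway between "A/I" and "E"
```

Because the resting pose is simply "all blend shapes at zero", it never needs a shape of its own; fading it out is the same as fading the neighbouring phoneme in.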

LipSync settings


And with that, we have the final result: our character's mouth moves according to the phonemes it pronounces. This is a fairly simple method to follow and use, applicable to a multitude of personal 3D projects.

If this has been useful for you, pass this article on to other game developers so they can get to know the Unity LipSync plugin and learn how to use it in a very simple way.

Polygonal Mind
Creative Development 3D Studio

Since 2015 creating cool experiences, games and avatars on digital platforms
