Feb 14, 2019

Setting up Unity LipSync for an RPG

Synchronising our characters' mouths with the phonemes they pronounce to create realistic dialogue

The mission

Animating every dialogue sequence in an indie game can prove both expensive and laborious. So we decided to tackle the issue from a different perspective.

In this tutorial, we will discuss the implementation of systemic LipSync Pro in Ravensword Legacy, a premium mobile game being developed in collaboration with Crescent Moon Games.

Once the characters for this game were created, it was time to bring them to life, so they could talk to each other.

After some research, we found a Unity plugin called LipSync Pro. This is an extraordinary tool that makes it easy to add keyframes to audio clips, allowing 3D characters to synchronise their mouth movements with speech. It also offers other blend shapes, such as blinking and yawning, as well as expressions such as anger and happiness.

Without further ado, let's see how to implement this Unity plugin to make our characters' dialogues look realistic!


The core of spoken languages

A phoneme is one of the minimal units of sound that distinguish one word from another in a particular language. English, for example, has 44 phonemes. Similarly to VRChat's system, LipSync uses phonemes to choose between the different mouth shapes that represent a specific sound.

This way, we can assign each keyframe in the Audio Clip to one phoneme, and the mouth will adapt accordingly.

English phonemes

For this type of work, game developers typically group phonemes together. For example, the "k" in "key" sounds the same as the "c" in "car", so only one phoneme is needed for that sound. The same goes for "m", "b" and "p", which share the same closed-lip mouth shape, and so on.

LipSync phonemes list

This is the simplified list that LipSync asks us to fill in to work its magic. You don't need to fill them all; in fact, we're using just 3 blend shapes (A/I, E, O) plus the resting one.
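To make the grouping concrete, here is a small sketch (in Python purely for illustration; the game itself runs in Unity) of how sounds could collapse into our three mouth shapes plus the resting one. The groupings below are our own rough simplification, not LipSync Pro's official phoneme set.

```python
# Rough, illustrative grouping of sounds into the few mouth shapes
# (visemes) we actually model. Not LipSync Pro's official list.
VISEME_GROUPS = {
    "A/I":  ["a", "i", "k", "c", "g"],   # open-mouth sounds
    "E":    ["e", "s", "t", "d"],        # mid-open sounds
    "O":    ["o", "u", "w"],             # rounded sounds
    "rest": ["m", "b", "p"],             # closed lips
}

# Invert it so we can look up a mouth shape from a sound.
SOUND_TO_VISEME = {
    sound: viseme
    for viseme, sounds in VISEME_GROUPS.items()
    for sound in sounds
}

# The "k" in "key" and the "c" in "car" map to the same shape.
print(SOUND_TO_VISEME["k"], SOUND_TO_VISEME["c"])
```

This is why only a handful of blend shapes is enough: many distinct sounds end up sharing one entry in the inverted table.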

Adapting to the new needs

We proceeded to modify the models by opening their mouths and adding the inside of the mouth (commonly called the mouthbag), the tongue and the teeth. We also had to modify the textures to cover these new parts.

Head model with tongue and teeth

After this, we duplicated the resting pose and modified it three times, once for each of the A/I, E and O phonemes. Since the game is low poly, with pixel post-processing and a limited colour palette (sometimes even as low as 8 bits!), too much fidelity and/or fluidity would make it look uncanny.

4 heads with different mouth positions

These heads were then exported as a single head mesh with 4 blend shapes, using the modified mouths as the targets for those blend shapes.
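A blend shape is essentially a set of per-vertex offsets from the base mesh; the weight slides the vertices toward the target pose. The sketch below (Python with toy vertex data, just to show the maths; a real head has thousands of vertices) mirrors Unity's 0-100 weight convention:

```python
# Toy vertex data: each vertex is (x, y, z). A real head has thousands.
base     = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]    # resting-pose mouth verts
target_a = [(0.0, -0.2, 0.0), (1.0, -0.1, 0.0)]  # the same verts in the "A/I" head

# The exporter bakes the per-vertex offsets (deltas) into the blend shape.
deltas = [tuple(t - b for t, b in zip(tv, bv))
          for tv, bv in zip(target_a, base)]

def apply_blendshape(weight):
    """weight in 0..100, matching Unity's SkinnedMeshRenderer convention:
    0 = resting pose, 100 = fully the target shape."""
    k = weight / 100.0
    return [tuple(b + k * d for b, d in zip(bv, dv))
            for bv, dv in zip(base, deltas)]

print(apply_blendshape(50.0))  # mouth halfway toward the "A/I" pose
```

This is why exporting the modified heads as blend-shape targets works: only the deltas are stored, and any in-between mouth position comes for free by scaling them.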

Hero 3D model complete assembly

Then we repeated this process for a couple of NPCs and the other 8 head variations of our hero. Once that was done, we headed to Unity and imported the new heads, replacing the old ones. We also imported a character line from one of our favourite video games for testing purposes.

Setup of the system

We created a LipSync info .asset file from that Audio Clip via LipSync's Clip Editor (shortcut: "Ctrl + Alt + A") and started adding the phonemes that matched what the line was saying.

Having only 3 phonemes really sped up this process; otherwise, it would have been too tedious. After that was done, we saved the LipSync info .asset file in the same folder as the Audio Clip.

LipSync Clip Editor

LipSync program

Each of these black markers signifies that the mouth will change to the specified phoneme at the specified time.
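Conceptually, each marker is just a (time, phoneme) pair, and playback asks which marker was passed most recently. A minimal sketch of that lookup (Python for illustration; the marker times below are made up):

```python
import bisect

# Hypothetical marker list: (time in seconds, phoneme), sorted by time.
markers = [(0.00, "rest"), (0.25, "A/I"), (0.50, "E"),
           (0.90, "O"), (1.30, "rest")]
times = [t for t, _ in markers]

def phoneme_at(t):
    """Return the phoneme active at playback time t:
    the last marker at or before t."""
    i = bisect.bisect_right(times, t) - 1
    return markers[max(i, 0)][1]

print(phoneme_at(0.6))  # the 0.50 "E" marker is the latest one passed
```

In the Clip Editor you only place the markers; LipSync does this lookup (and the smoothing between markers) for you at runtime.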

After completing this task, we returned to the character head prefab. We added the LipSync script and designated the head mesh as the primary mesh and the teeth as the secondary mesh.

This arrangement ensures that the head's blend shapes will also influence the teeth. Additionally, we assigned the character's Audio Output as the source of the line's sound and placed it in the corresponding slot.

LipSync script

We specified which blend shapes were assigned to which phonemes, so that LipSync knew which blend shape to change every time the time slider passed a phoneme marker.
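Under the hood, the effect is roughly a crossfade of blend-shape weights between consecutive markers. This sketch (Python, purely illustrative; LipSync Pro's real blending curves are configurable and more refined) shows the idea, again using Unity's 0-100 weight range:

```python
def blend_weights(markers, t):
    """Linearly crossfade between the two markers surrounding time t.
    markers: sorted list of (time, phoneme).
    Returns {blend shape name: weight in 0..100}."""
    for (t0, p0), (t1, p1) in zip(markers, markers[1:]):
        if t0 <= t <= t1:
            alpha = (t - t0) / (t1 - t0)         # 0 at t0, 1 at t1
            weights = {p0: (1 - alpha) * 100.0, p1: alpha * 100.0}
            weights.pop("rest", None)            # rest = every shape at weight 0
            return weights
    return {}                                    # outside the marked range

markers = [(0.0, "rest"), (0.5, "A/I"), (1.0, "E")]
print(blend_weights(markers, 0.75))  # halfway between "A/I" and "E"
```

Because the resting pose is simply "all blend shapes at zero", it never needs a shape of its own; fading it out is the same as fading the neighbouring phoneme in.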

LipSync settings


And with that, we have the final result: our character's mouth moves according to the phonemes it pronounces. This is a fairly simple method to follow and use, applicable to a multitude of personal 3D projects.

If this has been useful for you, pass this article on to other game developers so they can get to know the Unity LipSync plugin and learn how to use it in a very simple way.

Polygonal Mind
Creative Development 3D Studio

Since 2015 creating cool experiences, games and avatars on digital platforms
