Listening behavior generation



Lip syncing compared with real videos



Left: Real video (ground-truth)
Right: Generated video

The synthesized lip syncing closely matches the real video.



Fast rendering on CPU: maintaining quality & speed



Left: GPU (P40) rendering at 250 fps
Middle: raw CPU rendering at 6 fps
Right: optimized CPU rendering at 250 fps (Intel Xeon E5)

Speech to gesture generation



Left: Generated body gestures based on input speech
Middle: Rendered video
Right: Real video (ground-truth)


Digital customer service