Applications

Listening behavior generation

Lip syncing, compared with real videos

Left: Real video (ground-truth)
Right: Generated video

The synthesized lip movements closely match those in the real video.

Fast rendering on CPU: maintaining both quality and speed

Left: GPU (P40) rendering at 250 fps

Middle: raw CPU rendering at 6 fps

Right: optimized CPU rendering at 250 fps (Intel Xeon E5)

Speech to gesture generation

Left: generated body gestures based on input speech; Middle: rendered video; Right: ground-truth video

Digital customer service