Listening behavior generation
Lip syncing compared with real videos
Left: Real video (ground-truth)
Right: Generated video
The synthesized lip sync closely matches the real video.
Fast rendering on CPU: maintaining both quality and speed
Left: GPU (P40) rendering at 250 fps
Middle: unoptimized CPU rendering at 6 fps
Right: optimized CPU rendering at 250 fps (Intel Xeon E5)
Speech-to-gesture generation
Left: body gestures generated from input speech
Middle: rendered video
Right: ground-truth video