Free-View Expressive Talking Head Video Editing

Yuantian Huang, Satoshi Iizuka, Kazuhiro Fukui

ICASSP 2023

teaser image

Abstract:

We present a novel framework for talking head video editing that lets users freely edit head pose, emotion, and eye blink while maintaining audio-visual synchronization. Unlike previous approaches, which mainly focus on generating a talking head video, our model edits the talking heads of an input video and restores them to full frames, which supports a broader range of applications. The framework consists of two parts: (a) a reconstruction-based generator that produces talking heads fitting the original frame while following freely controllable attributes, including head pose, emotion, and eye blink; and (b) a multiple-attribute discriminator that enforces attribute-visual synchronization. We additionally introduce attention modules and a perceptual loss to improve the overall generation quality. We compare our framework against existing approaches using quantitative metrics and qualitative comparisons.





Online Demo:

Hosted on Hugging Face.

Model Architecture:

The encoders embed the inputs jointly and feed them into the generator. During training, the input audio Mel spectrogram, head pose, emotion, and eye blink are extracted from the target frames. A pre-trained multi-attribute discriminator then computes a set of synchronization losses between the generated frames and the input attributes to enforce attribute-visual synchronization.
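
To illustrate how a set of per-attribute synchronization losses might be combined, here is a minimal sketch in plain Python. It is not the authors' implementation: this page does not state the form of the loss, so the sketch assumes a cosine-similarity loss of the kind often used with pre-trained synchronization discriminators. The names `ATTRIBUTES`, `sync_loss`, and `total_sync_loss`, the embedding vectors, and the uniform weights are all hypothetical.

```python
import math

# Hypothetical attribute names; the page lists audio, head pose, emotion, eye blink.
ATTRIBUTES = ("audio", "head_pose", "emotion", "eye_blink")


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def sync_loss(visual_emb, attr_emb, eps=1e-7):
    """Assumed loss form: map the cosine similarity from [-1, 1] to a
    probability in (0, 1) and penalize its negative log."""
    p = (cosine_similarity(visual_emb, attr_emb) + 1.0) / 2.0
    p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
    return -math.log(p)


def total_sync_loss(visual_embs, attr_embs, weights=None):
    """Weighted sum of one synchronization loss per attribute.

    visual_embs / attr_embs: dicts mapping attribute name -> embedding, as a
    pre-trained discriminator might produce for the generated frames and for
    the input attributes. Uniform weights are an assumption of this sketch.
    """
    weights = weights or {name: 1.0 for name in ATTRIBUTES}
    per_attr = {
        name: sync_loss(visual_embs[name], attr_embs[name])
        for name in ATTRIBUTES
    }
    total = sum(weights[name] * per_attr[name] for name in ATTRIBUTES)
    return total, per_attr


# Toy usage: every attribute is aligned except emotion, which is orthogonal.
visual = {name: [1.0, 0.0] for name in ATTRIBUTES}
target = {name: [1.0, 0.0] for name in ATTRIBUTES}
target["emotion"] = [0.0, 1.0]
total, per_attr = total_sync_loss(visual, target)
```

In this toy case the mismatched emotion embedding dominates the total, which is the behavior a synchronization loss is meant to have: the generator is penalized only for the attributes its output fails to reflect.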



Publication:

Y. Huang, S. Iizuka and K. Fukui, "Free-View Expressive Talking Head Video Editing," ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 2023, pp. 1-5, doi: 10.1109/ICASSP49357.2023.10095745.
@INPROCEEDINGS{Huang2023FETE,
author={Huang, Yuantian and Iizuka, Satoshi and Fukui, Kazuhiro},
booktitle={ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
title={Free-View Expressive Talking Head Video Editing},
year={2023},
pages={1-5},
doi={10.1109/ICASSP49357.2023.10095745},
url={https://ieeexplore.ieee.org/abstract/document/10095745},
month={June},
}