Diffusion-based Semantic Image Synthesis from Sparse Layouts

Yuantian Huang, Satoshi Iizuka, Kazuhiro Fukui

CGI 2023

[Teaser image]

Abstract:

We present an efficient framework that utilizes diffusion models to generate landscape images from sparse semantic layouts. Previous approaches generate images from dense semantic label maps, so the quality of the results depends heavily on the accuracy of the input layouts; however, creating detailed and accurate semantic layouts is not trivial in practice. To address this challenge, we carefully design a random masking process that effectively simulates real user input during the model training phase, making the framework more practical for real-world applications. Our framework leverages the Semantic Diffusion Model (SDM) as a generator to create full landscape images from the sparse label maps produced by the random masking process, complementing the missing semantic information based on the learned image structure. Furthermore, through a model distillation process we achieve inference speed comparable to GAN-based models while preserving generation quality. After training with the well-designed random masking process, our framework generates high-quality landscape images from sparse and intuitive inputs, which is useful for practical applications. Experiments show that our proposed method outperforms existing approaches both quantitatively and qualitatively.
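To make the random masking idea concrete, the following is a minimal sketch of how a dense semantic label map could be reduced to a sparse, scribble-like layout during training. This is an illustrative assumption, not the paper's actual implementation: the function name, the `UNKNOWN` label id, and the stroke parameters are all hypothetical.

```python
import numpy as np

UNKNOWN = 255  # assumed "unknown"/ignore label id (hypothetical choice)

def random_sparse_mask(label_map, strokes_per_class=3, radius=5, rng=None):
    """Simulate sparse user scribbles from a dense semantic label map.

    For each class present in the map, keep a few small disk-shaped
    regions ("strokes") at random locations of that class; everything
    else is set to UNKNOWN, yielding a sparse layout for training.
    """
    rng = np.random.default_rng(rng)
    h, w = label_map.shape
    sparse = np.full_like(label_map, UNKNOWN)
    ys, xs = np.mgrid[0:h, 0:w]
    for cls in np.unique(label_map):
        coords = np.argwhere(label_map == cls)
        n = min(strokes_per_class, len(coords))
        picks = coords[rng.choice(len(coords), size=n, replace=False)]
        for y, x in picks:
            # Keep a small disk around the sampled point, restricted to
            # pixels that actually belong to this class.
            disk = (ys - y) ** 2 + (xs - x) ** 2 <= radius ** 2
            sparse[disk & (label_map == cls)] = cls
    return sparse
```

During training, the generator would then be conditioned on `sparse` instead of `label_map`, so it learns to fill in the regions marked `UNKNOWN` from the surrounding context.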


URL | PDF | Slides | Code



Publication:

@inproceedings{Huang2023Sparse,
author={Huang, Yuantian and Iizuka, Satoshi and Fukui, Kazuhiro},
title={Diffusion-based Semantic Image Synthesis from Sparse Layouts},
booktitle={Computer Graphics International Conference},
pages={441--454},
year={2023},
organization={Springer},
url={https://link.springer.com/chapter/10.1007/978-3-031-50072-5_35},
}