MY ALT TEXT CapHuman: Capture Your Moments in Parallel Universes

1ReLER, CCAI, Zhejiang University 2Huawei Technologies Ltd.
arXiv 2024

*Corresponding Author
MY ALT TEXT

Given only one reference facial photograph, our CapHuman can generate photo-realistic specific individual portraits with content-rich representations and diverse head positions, poses, facial expressions, and illuminations in different contexts.

Generalized ID Preservation (5~10s) 🚀, 3D-consistent Head Control 👴, Plug-and-Play 😍, One Reference Image 1⃣️

Abstract

We concentrate on a novel human-centric image synthesis task, that is, given only one reference facial photograph, it is expected to generate specific individual images with diverse head positions, poses, and facial expressions in different contexts. To accomplish this goal, we argue that our generative model should be capable of the following favorable characteristics: (1) a strong visual and semantic understanding of our world and human society for basic object and human image generation. (2) generalizable identity preservation ability. (3) flexible and fine-grained head control. Recently, large pre-trained text-to-image diffusion models have shown remarkable results, serving as a powerful generative foundation. As a basis, we aim to unleash the above two capabilities of the pre-trained model. In this work, we present a new framework named CapHuman. We embrace the ``encode then learn to align" paradigm, which enables generalizable identity preservation for new individuals without cumbersome tuning at inference. CapHuman encodes identity features and then learns to align them into the latent space. Moreover, we introduce the 3D facial prior to equip our model with control over the human head in a flexible and 3D-consistent manner. Extensive qualitative and quantitative analyses demonstrate our CapHuman can produce well-identity-preserved, photo-realistic, and high-fidelity portraits with content-rich representations and various head renditions, superior to established baselines.

Gallery

Method

MY ALT TEXT

Our CapHuman stands upon the pre-trained T2I diffusion model. a) We embrace the "encode then learn to align" paradigm for generalizable identity preservation. b) The introduction of the 3D parametric face model enables flexible and fine-grained head control. c) We learn a CapFace module to equip the pre-trained T2I diffusion model with the above capabilities.

More Stylization


Head Control

MY ALT TEXT

Head position, pose control

MY ALT TEXT

Facial expression control

MY ALT TEXT

Illumination control


Multi-Human Image Generation

MY ALT TEXT

Simultaneous Head and Body Control

MY ALT TEXT

Photo ID Generation

MY ALT TEXT

BibTeX


      @article{,
      author={Liang, Chao and Ma, Fan and Zhu, Linchao and Deng, Yingying and Yang, Yi},
      journal={}, 
      title={CapHuman: Capture Your Moments in Parallel Universes}, 
      year={2024},
      volume={},
      number={},
      pages={},
      doi={}}