High-Fidelity GAN Inversion for Image Attribute Editing

Tengfei Wang
HKUST
Yong Zhang
Tencent AI Lab
Yanbo Fan
Tencent AI Lab
Jue Wang
Tencent AI Lab
Qifeng Chen
HKUST


Abstract

We present a novel high-fidelity generative adversarial network (GAN) inversion framework that enables attribute editing while preserving image-specific details (e.g., background, appearance, and illumination). We first analyze the challenges of high-fidelity GAN inversion from the perspective of lossy data compression. With a low bit-rate latent code, previous works struggle to preserve high-fidelity details in reconstructed and edited images. Increasing the size of the latent code can improve the accuracy of GAN inversion, but at the cost of inferior editability. To improve image fidelity without compromising editability, we propose a distortion consultation approach that employs a distortion map as a reference for high-fidelity reconstruction. In distortion consultation inversion (DCI), the distortion map is first projected to a high-rate latent map, which then complements the basic low-rate latent code with more details via consultation fusion. To achieve high-fidelity editing, we propose an adaptive distortion alignment (ADA) module with a self-supervised training scheme, which bridges the gap between the edited and inverted images. Extensive experiments in the face and car domains show a clear improvement in both inversion and editing quality.

Results on High-Fidelity Image Editing

Original image (left) and edited image (right).

Results on High-Fidelity Video Editing (+ Smile)

Approach

Overview of our high-fidelity image inversion and editing framework. The basic encoder E₀ infers a low-rate latent code W corresponding to a low-fidelity reconstruction image X̂_o. The distortion map Δ contains the lost high-frequency, image-specific details needed to improve reconstruction fidelity. The red dotted boxes indicate editing along a given semantic direction. To achieve high-fidelity image editing, we propose a distortion consultation branch to facilitate generation. In distortion consultation, Δ is first aligned with the low-fidelity edited image by ADA and then embedded into a high-rate latent map C via the consultation encoder E_c. The latent code W and the latent map C are combined via consultation fusion (see details in the right part) across the layers of G₀ to generate the final edited image.
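The data flow in this overview can be illustrated with a minimal NumPy sketch. It is not the released implementation. It assumes that the distortion map Δ is the pixel-wise residual between the source image and the low-fidelity reconstruction, and that consultation fusion acts as a gate-and-shift modulation of generator features. The function names, array shapes, and stand-in values are hypothetical and chosen only for illustration.

```python
import numpy as np


def distortion_map(x, x_hat_o):
    # Assumption: the distortion map is the residual X - X_hat_o,
    # i.e. the detail lost by the low-rate reconstruction.
    return x - x_hat_o


def consultation_fusion(feat, gate, shift):
    # Assumption: fusion scales generator features with a gate and adds
    # a shift; in the framework both would be derived from the latent map C.
    return gate * feat + shift


rng = np.random.default_rng(0)

x = rng.random((3, 8, 8))        # stand-in source image, channels x H x W
x_hat_o = 0.9 * x                # stand-in low-fidelity reconstruction
delta = distortion_map(x, x_hat_o)

feat = rng.standard_normal((16, 8, 8))  # stand-in generator feature map
gate = np.ones((16, 1, 1))              # identity gate, broadcast over H x W
shift = np.zeros((16, 8, 8))            # zero shift
fused = consultation_fusion(feat, gate, shift)
```

With an identity gate and a zero shift, the fused features equal the original generator features, so under these assumptions the consultation branch only changes the output when the latent map carries information about the lost details.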

More Results

BibTeX

             
@inproceedings{wang2021HFGI,
  title={High-Fidelity GAN Inversion for Image Attribute Editing},
  author={Wang, Tengfei and Zhang, Yong and Fan, Yanbo and Wang, Jue and Chen, Qifeng},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022}
}
