High-Fidelity GAN Inversion for Image Attribute Editing

Want to play with your photos without cloning the code? Try our ONLINE DEMO for fun!

Abstract

We present a novel high-fidelity generative adversarial network (GAN) inversion framework that enables attribute editing with image-specific details well-preserved (e.g., background, appearance, and illumination). We first analyze the challenges of high-fidelity GAN inversion from the perspective of lossy data compression. With a low bit-rate latent code, previous works have difficulties in preserving high-fidelity details in reconstructed and edited images. Increasing the size of a latent code can improve the accuracy of GAN inversion but at the cost of inferior editability. To improve image fidelity without compromising editability, we propose a distortion consultation approach that employs a distortion map as a reference for high-fidelity reconstruction. In the distortion consultation inversion (DCI), the distortion map is first projected to a high-rate latent map, which then complements the basic low-rate latent code with more details via consultation fusion. To achieve high-fidelity editing, we propose an adaptive distortion alignment (ADA) module with a self-supervised training scheme, which bridges the gap between the edited and inversion images. Extensive experiments in the face and car domains show a clear improvement in both inversion and editing quality.

Results on High-Fidelity Image Editing


Original image (left) and edited image (right).


Results on High-Fidelity Video Editing (+ Smile)

Approach

Overview of our high-fidelity image inversion and editing framework. The basic encoder E_0 infers a low-rate latent code W, which corresponds to a low-fidelity reconstructed image \hat{X}_o. The distortion map Δ contains the high-frequency, image-specific details lost in that reconstruction and is used to improve reconstruction fidelity. The red dotted boxes indicate the editing operation along a given semantic direction. To achieve high-fidelity image editing, we propose the distortion consultation branch to facilitate the generation. In the distortion consultation, Δ is first aligned with the low-fidelity edited image by ADA and then embedded into a high-rate latent map C via the consultation encoder E_c. The latent code W and the latent map C are combined via consultation fusion (detailed in the right part of the figure) across the layers of G_0 to generate the final edited image.
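The data flow described above can be sketched with toy stand-ins. This is only an illustration under strong assumptions, not the released implementation: the networks are replaced by small linear maps or identity functions, the distortion map is taken to be the pixel-wise residual between the input and its low-fidelity reconstruction, and consultation fusion is replaced by a simple addition in image space (the actual fusion operates across generator layers). All function names, `SIZE`, and `LATENT_DIM` are hypothetical.

```python
import numpy as np

SIZE = 8          # toy image side length (assumption)
LATENT_DIM = 4    # toy size of the low-rate latent code (assumption)

# Random linear map shared by the toy encoder and generator (hypothetical).
_A = np.random.default_rng(0).normal(size=(LATENT_DIM, SIZE * SIZE)) / SIZE

def basic_encoder(x):
    """Stand-in for E_0: image -> low-rate latent code W."""
    return _A @ x.ravel()

def generator(w, c=None):
    """Stand-in for G_0: latent code (and optional latent map C) -> image."""
    img = (_A.T @ w).reshape(SIZE, SIZE)
    if c is not None:
        # Placeholder for consultation fusion; plain addition is an assumption.
        img = img + c
    return img

def ada(delta, edited_lowfi):
    """Stand-in for ADA: identity here; the real module aligns delta with the edit."""
    return delta

def consultation_encoder(delta):
    """Stand-in for E_c: identity here; the real encoder outputs a latent map C."""
    return delta

def invert_and_edit(x, direction, strength):
    w = basic_encoder(x)                    # low-rate latent code W
    x_hat_o = generator(w)                  # low-fidelity reconstruction
    delta = x - x_hat_o                     # distortion map (assumed residual)
    w_edit = w + strength * direction       # move along a semantic direction
    x_edit_lowfi = generator(w_edit)        # low-fidelity edited image
    c = consultation_encoder(ada(delta, x_edit_lowfi))
    return x_hat_o, generator(w_edit, c)    # final image uses both W and C
```

In this toy setting, calling `invert_and_edit` with `strength=0.0` returns the input image unchanged as the second output, because the identity placeholders add the residual straight back; the first output is the lossy reconstruction from the low-rate code alone.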


More Results


BibTeX

             
@inproceedings{wang2021HFGI,
  title={High-Fidelity GAN Inversion for Image Attribute Editing},
  author={Wang, Tengfei and Zhang, Yong and Fan, Yanbo and Wang, Jue and Chen, Qifeng},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022}
}