High-Fidelity GAN Inversion for Image Attribute Editing

Abstract

We present a novel high-fidelity generative adversarial network (GAN) inversion framework that enables attribute editing with image-specific details well-preserved (e.g., background, appearance and illumination). We first formulate GAN inversion as a lossy data compression problem and carefully discuss the Rate-Distortion-Edit trade-off. Due to this trade-off, previous works fail to achieve high-fidelity reconstruction while keeping compelling editing ability with a low bit-rate latent code only. In this work, we propose a distortion consultation approach that employs the distortion map as a reference for reconstruction. In the distortion consultation inversion (DCI), the distortion map is first projected to a high-rate latent map, which then complements the basic low-rate latent code with (lost) details via consultation fusion. To achieve high-fidelity editing, we propose an adaptive distortion alignment (ADA) module with a self-supervised training scheme. Extensive experiments in the face and car domains show a clear improvement in terms of both inversion and editing quality.

Results on High-Fidelity Image Editing


Original image (left) and edited image (right).


Results on High-Fidelity Video Editing (+ Smile)

Approach

Overview of our high-fidelity image inversion and editing framework. The basic encoder E0 infers a low-rate latent code W, which yields a low-fidelity reconstruction X̂o. The distortion map Δ contains the high-frequency, image-specific details lost in this reconstruction and is used to improve reconstruction fidelity. The red dotted boxes indicate editing along a given semantic direction. To achieve high-fidelity image editing, we propose a distortion consultation branch that assists generation. In this branch, Δ is first aligned with the low-fidelity edited image by ADA and then embedded into a high-rate latent map C by the consultation encoder Ec. The latent code W and the latent map C are combined via consultation fusion (detailed in the right part of the figure) across the layers of G0 to generate the final edited image.
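The data flow described above can be traced with a small toy sketch. This is not the released implementation: every network is replaced by a hypothetical stand-in (a fixed random linear map for E0, its pseudo-inverse for G0, identity functions for ADA and Ec, and a plain addition for consultation fusion), and the distortion map is assumed to be the difference between the input and its low-fidelity reconstruction. It only illustrates the order of operations and why a residual branch can restore details that the low-rate code loses.

```python
import numpy as np

rng = np.random.default_rng(0)
SIZE = 8          # toy image side length (assumption)
LATENT_DIM = 4    # size of the low-rate latent code W (assumption)

# Hypothetical stand-ins for trained networks: fixed linear maps.
A = rng.standard_normal((LATENT_DIM, SIZE * SIZE))
A_PINV = np.linalg.pinv(A)

def basic_encoder(x):
    """E0 stand-in: image -> low-rate latent code W."""
    return A @ x.ravel()

def generator(w, c=None):
    """G0 stand-in: latent code (and optional latent map C) -> image.

    The real consultation fusion acts across generator layers; here it is
    reduced to adding C to the output, purely for illustration.
    """
    base = (A_PINV @ w).reshape(SIZE, SIZE)
    return base if c is None else base + c

def ada(delta, edited_lowfi):
    """ADA stand-in: identity (the real module aligns delta to the edit)."""
    return delta

def consultation_encoder(delta):
    """Ec stand-in: identity (the real encoder outputs a latent map C)."""
    return delta

def edit_image(x, direction, alpha):
    w = basic_encoder(x)                      # low-rate code W
    x_hat_o = generator(w)                    # low-fidelity reconstruction
    delta = x - x_hat_o                       # distortion map (assumed definition)
    w_edit = w + alpha * direction            # move along a semantic direction
    x_edit_lowfi = generator(w_edit)          # low-fidelity edited image
    c = consultation_encoder(ada(delta, x_edit_lowfi))  # high-rate latent map C
    return generator(w_edit, c), x_hat_o      # final edited image, plain inversion

x = rng.standard_normal((SIZE, SIZE))
restored, x_hat_o = edit_image(x, np.zeros(LATENT_DIM), alpha=0.0)
edited, _ = edit_image(x, np.eye(LATENT_DIM)[0], alpha=2.0)
print("low-rate only error:", np.abs(x - x_hat_o).max())
print("with consultation error:", np.abs(x - restored).max())
```

With a zero edit, the toy fusion returns the input up to floating-point error, while the low-rate path alone does not. With a non-zero edit, the identity ADA stand-in simply carries the unaligned residual over, which is the case the actual ADA module is meant to handle.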


More Results


BibTeX

                        
@article{wang2021HFGI,
      author = {Tengfei Wang and Yong Zhang and Yanbo Fan and Jue Wang and Qifeng Chen},
      title = {High-Fidelity GAN Inversion for Image Attribute Editing}, 
      journal = {arxiv:2109.06590},  
      year = {2021}
}