Image-to-image translation using an offset-based multi-scale codes GAN encoder

Abstract

Despite the remarkable achievements of generative adversarial networks (GANs) in high-quality image synthesis, applying pre-trained GAN models to image-to-image translation remains challenging. Previous approaches typically map the conditional image into the latent space of a GAN either by per-image optimization or by learning a GAN encoder. However, neither method performs image-to-image translation tasks ideally. In this work, we propose a novel learning-based framework that performs common image-to-image translation tasks with high quality in real time on top of pre-trained GANs. Specifically, to mitigate the semantic misalignment between conditional and synthesized images, we propose an offset-based image synthesis method that allows our encoder to predict the latent codes over multiple forward passes rather than a single one. During these passes, the final latent codes are adjusted continuously according to the semantic difference between the conditional image and the current synthesized image. To further reduce the loss of detail during encoding, we extract multiple latent codes at multiple scales from the input, instead of a single code, to synthesize the image. Moreover, we propose an optional multiple-feature-map fusion module that combines our encoder with different generators to implement our multiple-latent-code strategy. Finally, we analyze the performance of our framework and demonstrate its effectiveness by comparing it with state-of-the-art works on super-resolution and conditional face synthesis tasks.
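
To make the offset-based, multi-pass encoding idea above concrete, the following is a minimal, self-contained PyTorch sketch. It is an illustrative assumption of how such a loop could look, not the authors' implementation: all module designs, names (ToyEncoder, ToyGenerator, encode_with_offsets), tensor shapes, and the number of passes and scales are hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the offset-based, multi-scale encoding loop described
# in the abstract. Module designs, names, sizes, and pass counts are assumed.

N_PASSES = 5     # number of refinement forward passes (assumed)
N_SCALES = 3     # codes are predicted at several feature scales (assumed)
CODE_DIM = 512   # per-code dimensionality (assumed)


class ToyEncoder(nn.Module):
    """Maps the (conditional image, current synthesis) pair to one latent
    code per scale; with an all-zero synthesis it yields the initial estimate."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # One head per scale, loosely mimicking feature-pyramid extraction.
        self.heads = nn.ModuleList(nn.Linear(32, CODE_DIM) for _ in range(N_SCALES))

    def forward(self, cond, synth):
        feat = self.backbone(torch.cat([cond, synth], dim=1))
        return torch.stack([h(feat) for h in self.heads], dim=1)  # (B, S, D)


class ToyGenerator(nn.Module):
    """Stand-in for a pre-trained GAN generator driven by multi-scale codes."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(N_SCALES * CODE_DIM, 3 * 64 * 64)

    def forward(self, codes):
        return self.fc(codes.flatten(1)).view(-1, 3, 64, 64)


def encode_with_offsets(encoder, generator, cond_img, n_passes=N_PASSES):
    """Refine the latent codes over several passes: each pass synthesizes an
    image from the current codes, then adds an offset predicted from the
    (conditional, synthesized) pair."""
    codes = encoder(cond_img, torch.zeros_like(cond_img))    # initial estimate
    for _ in range(n_passes):
        synth = generator(codes)                  # current synthesized image
        codes = codes + encoder(cond_img, synth)  # offset update toward alignment
    return codes


if __name__ == "__main__":
    enc, gen = ToyEncoder(), ToyGenerator()
    cond = torch.randn(1, 3, 64, 64)   # e.g. an upsampled low-resolution input
    final_codes = encode_with_offsets(enc, gen, cond)
    print(final_codes.shape)           # torch.Size([1, 3, 512])
```

The property mirrored from the abstract is that the encoder is queried repeatedly and each query produces an additive offset, so the codes are adjusted toward semantic alignment with the conditional image; the per-scale heads stand in for extracting multiple codes at multiple scales rather than a single code.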

Data Availability Statement

The datasets generated or analyzed during this study are available in the CelebA-HQ repository, http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html, and the LSUN repository, https://www.yf.io/p/lsun.

Acknowledgements

The authors are very indebted to the anonymous referees for their critical comments and suggestions for the improvement of this paper. This work was supported by the National Key Research and Development Program of China (2021YFA1000102), in part by grants from the National Natural Science Foundation of China (Nos. 61673396, 61976245), and by the Natural Science Foundation of Shandong Province (No. ZR2022MF260).

Author information

Corresponding author

Correspondence to Mingwen Shao.

Ethics declarations

Conflict of interest

The authors have no financial or proprietary interests in any material discussed in this article.

Ethical approval

This work is original research that has not been published previously and is not under consideration for publication elsewhere.

Human and animal rights

This article does not include any studies involving human participants or animals.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Guo, Z., Shao, M. & Li, S. Image-to-image translation using an offset-based multi-scale codes GAN encoder. Vis Comput 40, 699–715 (2024). https://doi.org/10.1007/s00371-023-02810-4
