Image-to-image translation using an offset-based multi-scale codes GAN encoder

Abstract

Despite the remarkable achievements of generative adversarial networks (GANs) in high-quality image synthesis, applying pre-trained GAN models to image-to-image translation remains challenging. Previous approaches typically map the conditional image into the latent space of a GAN either by per-image optimization or by learning a GAN encoder. However, neither method performs image-to-image translation tasks ideally. In this work, we propose a novel learning-based framework that performs common image-to-image translation tasks with high quality in real time on top of pre-trained GANs. Specifically, to mitigate the semantic misalignment between conditional and synthesized images, we propose an offset-based image synthesis method that allows our encoder to predict the latent codes over multiple forward passes rather than a single one. During these passes, the final latent codes are adjusted continuously according to the semantic difference between the conditional image and the current synthesized image. To further reduce the loss of detail during encoding, we extract multiple latent codes at multiple scales from the input, instead of a single code, to synthesize the image. Moreover, we propose an optional multiple-feature-map fusion module that combines our encoder with different generators to implement our multiple-latent-code strategy. Finally, we analyze the performance of our framework and demonstrate its effectiveness by comparing it with state-of-the-art works on super-resolution and conditional face synthesis tasks.
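
To make the offset-based, multi-pass encoding idea above concrete, the following is a minimal, self-contained PyTorch sketch. It is an illustrative assumption of how such a loop could look, not the authors' implementation: all module designs, names (ToyEncoder, ToyGenerator, encode_with_offsets), tensor shapes, and the number of passes and scales are hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the offset-based, multi-scale encoding loop described
# in the abstract. Module designs, names, sizes, and pass counts are assumed.

N_PASSES = 5     # number of refinement forward passes (assumed)
N_SCALES = 3     # codes are predicted at several feature scales (assumed)
CODE_DIM = 512   # per-code dimensionality (assumed)


class ToyEncoder(nn.Module):
    """Maps the (conditional image, current synthesis) pair to one latent
    code per scale; with an all-zero synthesis it yields the initial estimate."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # One head per scale, loosely mimicking feature-pyramid extraction.
        self.heads = nn.ModuleList(nn.Linear(32, CODE_DIM) for _ in range(N_SCALES))

    def forward(self, cond, synth):
        feat = self.backbone(torch.cat([cond, synth], dim=1))
        return torch.stack([h(feat) for h in self.heads], dim=1)  # (B, S, D)


class ToyGenerator(nn.Module):
    """Stand-in for a pre-trained GAN generator driven by multi-scale codes."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(N_SCALES * CODE_DIM, 3 * 64 * 64)

    def forward(self, codes):
        return self.fc(codes.flatten(1)).view(-1, 3, 64, 64)


def encode_with_offsets(encoder, generator, cond_img, n_passes=N_PASSES):
    """Refine the latent codes over several passes: each pass synthesizes an
    image from the current codes, then adds an offset predicted from the
    (conditional, synthesized) pair."""
    codes = encoder(cond_img, torch.zeros_like(cond_img))    # initial estimate
    for _ in range(n_passes):
        synth = generator(codes)                  # current synthesized image
        codes = codes + encoder(cond_img, synth)  # offset update toward alignment
    return codes


if __name__ == "__main__":
    enc, gen = ToyEncoder(), ToyGenerator()
    cond = torch.randn(1, 3, 64, 64)   # e.g. an upsampled low-resolution input
    final_codes = encode_with_offsets(enc, gen, cond)
    print(final_codes.shape)           # torch.Size([1, 3, 512])
```

The property mirrored from the abstract is that the encoder is queried repeatedly and each query produces an additive offset, so the codes are adjusted toward semantic alignment with the conditional image; the per-scale heads stand in for extracting multiple codes at multiple scales rather than a single code.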

Data Availability Statement

The datasets generated or analyzed during this study are available in the CelebA-HQ repository, http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html, and the LSUN repository, https://www.yf.io/p/lsun.

Acknowledgements

The authors are very indebted to the anonymous referees for their critical comments and suggestions for the improvement of this paper. This work was supported by the National Key Research and Development Program of China (2021YFA1000102), in part by grants from the National Natural Science Foundation of China (Nos. 61673396, 61976245), and by the Natural Science Foundation of Shandong Province (No. ZR2022MF260).

Author information

Corresponding author

Correspondence to Mingwen Shao.

Ethics declarations

Conflict of interest

The authors have no financial or proprietary interests in any material discussed in this article.

Ethical approval

This work is original research that has not been published previously and is not under consideration for publication elsewhere.

Human and animal rights

This article does not include any studies involving human participants or animals.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Guo, Z., Shao, M. & Li, S. Image-to-image translation using an offset-based multi-scale codes GAN encoder. Vis Comput 40, 699–715 (2024). https://doi.org/10.1007/s00371-023-02810-4
