Figure 5: Comparison of height maps generated by different
models. The left image shows the conversion result using
the pix2pix model, the middle image shows the result from
the original pix2pix-turbo model, and the right image
displays the result from the improved pix2pix-turbo model
(Photo/Picture credit: Original).
Additionally, the dataset includes many examples
where the source and target images are only weakly
correlated. For instance, in the Ceramic category, a
significant portion of the materials are tiles: even
when the base color image carries a complex pattern,
the height map may consist of nothing more than
uniform square segments. Such examples give the
model no meaningful variation patterns to learn from,
which degrades the quality of the final generated
maps.
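One way to screen out such weakly correlated pairs is to measure the statistical dependence between the base color image and its height map before training. The following is a minimal sketch, not the paper's actual pipeline; `pair_correlation`, `filter_pairs`, and the threshold value are illustrative assumptions:

```python
import numpy as np

def pair_correlation(basecolor: np.ndarray, height: np.ndarray) -> float:
    """Pearson correlation between a grayscale base color image and its
    height map. Both inputs are 2-D float arrays of the same shape."""
    b = basecolor.ravel() - basecolor.mean()
    h = height.ravel() - height.mean()
    denom = np.sqrt((b * b).sum() * (h * h).sum())
    if denom == 0.0:
        # A constant map (e.g. a flat tile height map) carries no
        # usable variation, so treat it as uncorrelated.
        return 0.0
    return float((b * h).sum() / denom)

def filter_pairs(pairs, threshold=0.2):
    """Keep only training pairs whose absolute correlation exceeds the
    (hypothetical) threshold."""
    return [(b, h) for b, h in pairs if abs(pair_correlation(b, h)) > threshold]
```

A tile-like pair, whose height map is nearly constant, would score close to zero and be dropped, while a pair whose height variation tracks the base color pattern would be kept.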
4 CONCLUSIONS
This paper improves the pix2pix-turbo model for
material generation by incorporating multi-layer
LoRA and by introducing a classifier that combines
domain-specific expert models with a general expert
model. The improved model achieved a minimum
MSE of 1181.41 and a maximum SSIM of 0.614 on
the MatSynth dataset's height maps, a reduction in
MSE of 1,443.53 and an increase in SSIM of 0.13
compared to the original model. These results
demonstrate that the improved model has a strong
capability for PBR material image conversion,
enabling the rapid generation of high-quality PBR
material maps from input base color images.
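The reported MSE and SSIM can be illustrated with a compact numpy sketch. This is a simplified single-window SSIM over the whole image, not the sliding-window variant typically used for benchmark numbers, so it is an approximation of the evaluation rather than the paper's exact metric code:

```python
import numpy as np

def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error between two images on a 0-255 intensity scale."""
    return float(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2))

def global_ssim(a: np.ndarray, b: np.ndarray, data_range: float = 255.0) -> float:
    """SSIM computed once over the whole image (no sliding window).
    Uses the standard stabilising constants C1, C2."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return float(((2 * mu_a * mu_b + c1) * (2 * cov + c2)) /
                 ((mu_a**2 + mu_b**2 + c1) * (var_a + var_b + c2)))
```

Identical images give an MSE of 0 and an SSIM of 1; lower MSE and higher SSIM indicate a generated height map closer to the ground truth.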
The experimental results show that the modified
pix2pix-turbo model for PBR material conversion
outperforms both the original model and the standard
pix2pix model. Additionally, the CLIP text-image
alignment tool shows that more precise textual input
leads to better material generation results. However,
CLIP may not fully understand certain terms: for
example, images generated with the prompts "rough"
and "smooth" for roughness maps do not match the
quality achieved for height and normal maps.
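At its core, CLIP's text-image alignment score is the cosine similarity between L2-normalised text and image embeddings. The sketch below assumes the embeddings have already been produced by the model; the vectors and the `best_prompt` helper are placeholders for illustration:

```python
import numpy as np

def alignment_score(text_emb: np.ndarray, image_emb: np.ndarray) -> float:
    """Cosine similarity between L2-normalised text and image embeddings,
    the quantity CLIP uses to rank text-image alignment."""
    t = text_emb / np.linalg.norm(text_emb)
    v = image_emb / np.linalg.norm(image_emb)
    return float(t @ v)

def best_prompt(image_emb: np.ndarray, prompt_embs) -> int:
    """Index of the prompt whose embedding aligns best with the image."""
    scores = [alignment_score(t, image_emb) for t in prompt_embs]
    return int(np.argmax(scores))
```

If CLIP's embeddings for "rough" and "smooth" lie close together, their alignment scores discriminate poorly between roughness maps, which is consistent with the weaker results observed for that map type.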
Future research could focus on the semantic
aspects of images, using other models to evaluate
height, normal, and roughness features in specific
areas. Combining models like pix2pix-turbo with
advanced image semantic understanding could
enhance the realism and accuracy of PBR material
conversion, addressing the model's current
limitations in roughness map conversion.
REFERENCES
Guo, Y., Smith, C., Hašan, M., Sunkavalli, K. and Zhao, S.,
2020. MaterialGAN: Reflectance capture using a
generative SVBRDF model. arXiv preprint
arXiv:2010.00114.
Hu, Y., Hašan, M., Guerrero, P., Rushmeier, H. and
Deschaintre, V., 2022. Controlling material
appearance by examples. Computer Graphics Forum,
41(4), pp.117-128.
Isola, P., Zhu, J. Y., Zhou, T. and Efros, A. A., 2017. Image-
to-image translation with conditional adversarial
networks. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, pp.1125-
1134.
Martin, R., Roullier, A., Rouffet, R., Kaiser, A. and
Boubekeur, T., 2022. MaterIA: Single image high-
resolution material capture in the wild. Computer
Graphics Forum, 41(2), pp.163-177.
Parmar, G., Park, T., Narasimhan, S. and Zhu, J. Y., 2024.
One-step image translation with text-to-image models.
arXiv preprint arXiv:2403.12036.
Pharr, M. and Humphreys, G., 2004. Physically based
rendering: From theory to implementation. Morgan
Kaufmann.
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G.,
Agarwal, S. and Sutskever, I., 2021. Learning
transferable visual models from natural language
supervision. In International Conference on Machine
Learning, pp.8748-8763. PMLR.
Siddiqui, Y., Monnier, T., Kokkinos, F., Kariya, M.,
Kleiman, Y., Garreau, E. and Novotny, D., 2024. Meta
3D AssetGen: Text-to-mesh generation with high-
quality geometry, texture, and PBR materials. arXiv
preprint arXiv:2407.02445.
Vecchio, G., 2024. StableMaterials: Enhancing diversity in
material generation via semi-supervised learning.
arXiv preprint arXiv:2406.09293.
Vecchio, G. and Deschaintre, V., 2024. MatSynth: A
modern PBR materials dataset. In Proceedings of the
IEEE/CVF Conference on Computer Vision and
Pattern Recognition, pp.22109-22118.
Vecchio, G., Martin, R., Roullier, A., Kaiser, A., Rouffet,
R., Deschaintre, V. and Boubekeur, T., 2023.
ControlMat: A controlled generative approach to
material capture. ACM Transactions on Graphics.