Enhanced Multi-Attribute Fashion Image Editing Using Vision Transformer-Guided Diffusion Models

Zechen Zhao

doi:10.5220/0012938500004508

2024

Abstract

This research explores the use of off-the-shelf diffusion models for fashion image generation, aiming to advance attribute-specific image manipulation without manual masking or dataset-specific model training. The study presents a method that combines a multi-attribute classifier with an attention-pooling mechanism, building on the flexibility and generative capabilities of diffusion models pretrained on large visual datasets such as ImageNet. This combination directs the diffusion process and enables targeted modification of multiple fashion attributes within a single framework. The classifier's design, based on the Vision Transformer (ViT) architecture, improves attribute manipulation so that the generated fashion images are more realistic and diverse. Experimental validation shows that the proposed method outperforms existing generative models in producing high-quality fashion images that accurately represent the desired attributes. The results show significant improvements in image quality, attribute integrity, and editing flexibility.
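
The abstract does not give the classifier's architecture in detail, so the following is only a minimal sketch of what attention pooling with one output head per attribute might look like. It assumes a sequence of ViT-style token embeddings, one learned query vector per attribute, and one linear head per attribute. The function names, the attribute name `sleeve`, and all shapes are hypothetical and not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_pool(tokens, query):
    """Pool token embeddings into one vector using a single query.

    Each token is scored by its scaled dot product with the query;
    the pooled vector is the softmax-weighted average of the tokens.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(tok, query) / scale for tok in tokens])
    dim = len(tokens[0])
    pooled = [sum(w * tok[d] for w, tok in zip(weights, tokens))
              for d in range(dim)]
    return pooled, weights


def multi_attribute_logits(tokens, queries, heads):
    """Compute class logits for several attributes from shared tokens.

    queries: attribute name -> query vector (assumed learned elsewhere)
    heads:   attribute name -> weight matrix, one row per class
    """
    logits = {}
    for name, query in queries.items():
        pooled, _ = attention_pool(tokens, query)
        logits[name] = [dot(row, pooled) for row in heads[name]]
    return logits


# Toy usage: three 2-dimensional tokens and one hypothetical attribute.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = {"sleeve": [1.0, 0.0]}
heads = {"sleeve": [[1.0, 0.0], [0.0, 1.0]]}
print(multi_attribute_logits(tokens, queries, heads))
```

In a guided-diffusion setting, the gradient of such logits with respect to the noisy image would typically steer each denoising step toward the requested attribute values; that step is omitted here because it requires an automatic-differentiation library.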

Paper Citation

in Harvard Style

Zhao Z. (2024). Enhanced Multi-Attribute Fashion Image Editing Using Vision Transformer-Guided Diffusion Models. In Proceedings of the 1st International Conference on Engineering Management, Information Technology and Intelligence - Volume 1: EMITI; ISBN 978-989-758-713-9, SciTePress, pages 380-385. DOI: 10.5220/0012938500004508

in Bibtex Style

@conference{emiti24,
author={Zechen Zhao},
title={Enhanced Multi-Attribute Fashion Image Editing Using Vision Transformer-Guided Diffusion Models},
booktitle={Proceedings of the 1st International Conference on Engineering Management, Information Technology and Intelligence - Volume 1: EMITI},
year={2024},
pages={380-385},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012938500004508},
isbn={978-989-758-713-9},
}

in EndNote Style

TY - CONF

JO - Proceedings of the 1st International Conference on Engineering Management, Information Technology and Intelligence - Volume 1: EMITI
TI - Enhanced Multi-Attribute Fashion Image Editing Using Vision Transformer-Guided Diffusion Models
SN - 978-989-758-713-9
AU - Zhao Z.
PY - 2024
SP - 380
EP - 385
DO - 10.5220/0012938500004508
PB - SciTePress