Enhancing Sketch Animation: Text-to-Video Diffusion Models with Temporal Consistency and Rigidity Constraints

Gaurav Rai; Ojaswa Sharma

doi:10.5220/0013304800003912

2025

Abstract

Animating hand-drawn sketches with traditional tools is challenging and complex. Sketches provide a visual basis for explanations, and animating them helps convey dynamic, real-time scenarios. We propose an approach for animating a given input sketch based on a descriptive text prompt. Our method uses a parametric representation of the sketch’s strokes. Previous methods struggle to estimate smooth and accurate motion and often fail to preserve the sketch’s topology. In contrast, we leverage a pre-trained text-to-video diffusion model with a Score Distillation Sampling (SDS) loss to guide the motion of the sketch’s strokes. We introduce length-area (LA) regularization to ensure temporal consistency by accurately estimating the smooth displacement of control points across the frame sequence. Additionally, to preserve shape and avoid topology changes, we apply a shape-preserving As-Rigid-As-Possible (ARAP) loss to maintain sketch rigidity. Our method outperforms state-of-the-art approaches in both quantitative and qualitative evaluations. Project page: https://graphics-research-group.github.io/ESA/

Paper Citation

in Harvard Style

Rai G. and Sharma O. (2025). Enhancing Sketch Animation: Text-to-Video Diffusion Models with Temporal Consistency and Rigidity Constraints. In Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 1: GRAPP; ISBN 978-989-758-728-3, SciTePress, pages 151-160. DOI: 10.5220/0013304800003912

in Bibtex Style

@conference{grapp25,
author={Gaurav Rai and Ojaswa Sharma},
title={Enhancing Sketch Animation: Text-to-Video Diffusion Models with Temporal Consistency and Rigidity Constraints},
booktitle={Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 1: GRAPP},
year={2025},
pages={151-160},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013304800003912},
isbn={978-989-758-728-3},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 1: GRAPP
TI - Enhancing Sketch Animation: Text-to-Video Diffusion Models with Temporal Consistency and Rigidity Constraints
SN - 978-989-758-728-3
AU - Rai G.
AU - Sharma O.
PY - 2025
SP - 151
EP - 160
DO - 10.5220/0013304800003912
PB - SciTePress