Abstract
Per-garment virtual try-on methods collect garment-specific datasets and train networks tailored to each garment to achieve superior results. However, these approaches often struggle with loose-fitting garments due to two key limitations: (1) They rely on human body semantic maps to align garments with the body, but these maps become unreliable when body contours are obscured by loose-fitting garments, resulting in degraded outcomes; (2) They train garment synthesis networks on a per-frame basis without utilizing temporal information, leading to noticeable jittering artifacts. To address the first limitation, we propose a two-stage approach for robust semantic map estimation. First, we extract a garment-invariant representation from the raw input image. This representation is then passed through an auxiliary network to estimate the semantic map. This enhances the robustness of semantic map estimation under loose-fitting garments during garment-specific dataset generation. To address the second limitation, we introduce a recurrent garment synthesis framework that incorporates temporal dependencies to improve frame-to-frame coherence while maintaining real-time performance. We conducted qualitative and quantitative evaluations to demonstrate that our method outperforms existing approaches in both image quality and temporal coherence. Ablation studies further validate the effectiveness of the garment-invariant representation and the recurrent synthesis framework.
Method
We propose a garment-invariant representation and a BodyMap network to enable generation of per-garment datasets for loose-fitting garments.
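To make the two-stage estimation concrete, the following is a minimal PyTorch-style sketch. The layer choices, channel counts, and the names BodyMapNet and estimate_semantic_map are illustrative placeholders rather than the exact architecture from the paper; the garment-invariant encoder is assumed to produce an image-like map.

import torch
import torch.nn as nn

class BodyMapNet(nn.Module):
    """Hypothetical auxiliary network: predicts a human body semantic map
    from a garment-invariant representation (sketch only)."""
    def __init__(self, in_ch=3, num_classes=20):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, num_classes, 1),
        )

    def forward(self, x):
        return self.net(x)  # per-pixel class logits

def estimate_semantic_map(image, invariant_encoder, body_map_net):
    # Stage 1: extract a representation insensitive to the garment silhouette.
    invariant = invariant_encoder(image)
    # Stage 2: estimate the body semantic map from that representation.
    logits = body_map_net(invariant)
    return logits.argmax(dim=1)  # (B, H, W) label map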
We train a recurrent garment synthesis network to learn the mapping from pose sequences to garment sequences.
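The sketch below shows one plausible shape for such a network in PyTorch: a convolutional encoder over the per-frame pose map, a ConvLSTM cell that carries state across frames, and a decoder that emits the garment appearance. ConvLSTMCell and RecurrentGarmentSynth are hypothetical names, and the layer sizes are placeholders, not the networks used in the paper.

import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell: LSTM gates computed with convolutions,
    so the hidden state is a feature map rather than a vector."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state=None):
        if state is None:
            zeros = x.new_zeros(x.shape[0], self.hid_ch, *x.shape[2:])
            state = (zeros, zeros)
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)

class RecurrentGarmentSynth(nn.Module):
    """Hypothetical recurrent synthesis network: pose map in, garment
    RGBA out, with a ConvLSTM state carried between frames."""
    def __init__(self, pose_ch=25, feat_ch=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(pose_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True))
        self.rnn = ConvLSTMCell(feat_ch, feat_ch)
        self.decoder = nn.Conv2d(feat_ch, 4, 3, padding=1)  # RGB + mask logits

    def forward(self, pose, state=None):
        h, state = self.rnn(self.encoder(pose), state)
        return self.decoder(h), state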
Our trained networks enable real-time and temporally consistent virtual try-on for loose-fitting garments.
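Continuing the hypothetical sketch above, inference remains purely per-frame: the only extra cost over a feed-forward network is the recurrent state passed from one frame to the next, which is what keeps consecutive outputs coherent.

import torch

@torch.no_grad()
def try_on_stream(pose_frames, synth_net):
    # synth_net: a RecurrentGarmentSynth instance from the sketch above.
    state = None
    for pose in pose_frames:                  # pose: (1, pose_ch, H, W)
        rgba, state = synth_net(pose, state)  # state links consecutive frames
        garment, mask = rgba[:, :3], torch.sigmoid(rgba[:, 3:])
        yield garment, mask                   # composite over the user image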
Results
Qualitative comparison against existing image-based methods.
Our network, trained on a single human body, generalizes well to unseen body shapes.
We ablate the ConvLSTM module to validate its effectiveness in maintaining temporal consistency.
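In terms of the hypothetical sketch above, this ablation amounts to swapping the ConvLSTM cell for a stateless convolution, so each frame is synthesized independently of its predecessors.

import torch
import torch.nn as nn

class StatelessCell(nn.Module):
    """Drop-in replacement for ConvLSTMCell in the ablated variant:
    a plain convolution that carries no state between frames."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, hid_ch, k, padding=k // 2)

    def forward(self, x, state=None):
        return torch.relu(self.conv(x)), None  # no temporal dependency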
Video Presentation
BibTeX
@article{wu2025real,
  title={Real-Time Per-Garment Virtual Try-On with Temporal Consistency for Loose-Fitting Garments},
  author={Wu, Zaiqiang and Shen, I-Chao and Igarashi, Takeo},
  journal={arXiv preprint arXiv:2506.12348},
  year={2025}
}