Beyond Deterministic Alignment: Probabilistic Vision-Language Representations for Pedestrian Re-Identification

Authors

  • Hiroshi Tanaka, Graduate School of Engineering, The University of Tokyo, Bunkyo-ku, Tokyo 113-8656, Japan
  • Yuki Nakamura, Graduate School of Engineering, The University of Tokyo, Bunkyo-ku, Tokyo 113-8656, Japan
  • Kenji Sato, Graduate School of Engineering, The University of Tokyo, Bunkyo-ku, Tokyo 113-8656, Japan
  • Ayumi Kobayashi, Graduate School of Engineering, The University of Tokyo, Bunkyo-ku, Tokyo 113-8656, Japan

DOI:

https://doi.org/10.71465/mrcis210

Keywords:

Pedestrian re-identification, uncertainty modeling, probabilistic learning, vision-language models, autonomous driving

Abstract

Uncertainty estimation plays an important role in perception tasks for autonomous systems. Motivated by CLIP-based cross-modal uncertainty modeling, this study formulates pedestrian representations as probabilistic embeddings rather than fixed feature vectors. A variational learning strategy is employed to model both data-dependent and model-related uncertainty across the visual and textual modalities. The approach is evaluated on three pedestrian re-identification benchmarks containing over 95,000 identities and 780,000 images. Performance is compared with deterministic baselines, including ResNet-based ReID, Transformer-based ReID, and CLIP-derived embedding models. Results show average gains of 3.1% in rank-1 accuracy and 3.7% in mAP, along with a reduction of expected calibration error by 18%–24% in occlusion-heavy scenarios.
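The core idea summarized in the abstract, representing each pedestrian as a Gaussian embedding and training it variationally with the reparameterization trick, can be sketched as follows. This is a minimal illustrative sketch only: the layer names, dimensions, and standard-normal prior are assumptions for exposition, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def probabilistic_embed(x, W_mu, W_logvar):
    """Map a deterministic feature x to a Gaussian embedding N(mu, diag(sigma^2)).

    Returns a sampled embedding z (via the reparameterization trick) and the
    KL divergence to a standard-normal prior, a common variational regularizer.
    """
    mu = x @ W_mu                      # mean head
    logvar = x @ W_logvar              # log-variance head (data-dependent uncertainty)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps  # z = mu + sigma * eps, eps ~ N(0, I)
    # KL[ N(mu, sigma^2) || N(0, I) ], elementwise closed form, summed per sample
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)
    return z, kl

# Hypothetical dimensions: a 512-d backbone feature mapped to a 128-d embedding
d_in, d_out = 512, 128
W_mu = rng.standard_normal((d_in, d_out)) * 0.01
W_logvar = rng.standard_normal((d_in, d_out)) * 0.01

x = rng.standard_normal((4, d_in))     # a batch of 4 image (or text) features
z, kl = probabilistic_embed(x, W_mu, W_logvar)
print(z.shape, kl.shape)               # (4, 128) (4,)
```

In a full system, one such head per modality would produce distributions for images and text, and a retrieval or alignment loss on sampled embeddings would be combined with the KL term; larger predicted variances on occluded pedestrians are what drive the calibration improvements the abstract reports.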

Published

2026-01-31