Beyond Deterministic Alignment: Probabilistic Vision-Language Representations for Pedestrian Re-Identification
DOI:
https://doi.org/10.71465/mrcis210
Keywords:
Pedestrian re-identification, uncertainty modeling, probabilistic learning, vision language models, autonomous driving
Abstract
Uncertainty estimation plays an important role in perception tasks for autonomous systems. Motivated by CLIP-based uncertainty modal modeling, this study formulates pedestrian representations as probabilistic embeddings rather than fixed feature vectors. A variational learning strategy is employed to model both data-dependent and model-related uncertainty across visual and textual modalities. The approach is evaluated on three pedestrian re-identification benchmarks containing over 95,000 identities and 780,000 images. Performance is compared with deterministic baselines, including ResNet-based ReID, Transformer-based ReID, and CLIP-derived embedding models. Results show average gains of 3.1% in rank-1 accuracy and 3.7% in mAP, along with an 18%–24% reduction in expected calibration error in occlusion-heavy scenarios.
License
Copyright (c) 2026 Hiroshi Tanaka, Yuki Nakamura, Kenji Sato, Ayumi Kobayashi (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.
All articles published in the Multidisciplinary Research in Computing Information Systems are licensed under an open-access model. Authors retain full copyright and grant the journal the right of first publication. The content can be freely accessed, distributed, and reused for non-commercial purposes, provided proper citation is given to the original work.
