Quantifying the Gaps Between Translation and Native Perception in Training for Multimodal, Multilingual Retrieval

Published in Empirical Methods in Natural Language Processing (EMNLP) (Short), 2024

We quantify performance gaps between training on captions that come from native German perception and captions that have been either machine-translated or human-translated from English into German. To address these gaps, we further propose and evaluate caption augmentation strategies.