11573/1744349 - 2025 -
Right Answer, Wrong Score: Uncovering the Inconsistencies of LLM Evaluation in Multiple-Choice Question Answering Molfese, Francesco Maria; Moroni, Luca; Gioffré, Luca; Scirè, Alessandro; Conia, Simone; Navigli, Roberto - 04b Conference paper in volume
conference: Association for Computational Linguistics (Vienna, Austria)
book: Findings of the Association for Computational Linguistics: ACL 2025, Vienna, Austria, July 27–August 1st, 2025 - (979-8-89176-256-5)