This repository contains the "Idioms in Context" dataset used in our ACL 2024 paper: The Fine-Tuning Paradox: Boosting Translation Quality Without Sacrificing LLM Abilities.
The dataset consists of idiomatic expressions in context, paired with human-written translations. It covers two language pairs (English-German and English-Russian) and three translation directions:
- English → German
- German → English
- Russian → English
The dataset is designed to evaluate how well large language models and machine translation systems handle idiomatic expressions, which are challenging to translate because their meanings are non-literal.
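The file layout of the dataset is not described here, so the following Python sketch only illustrates one possible way to organize examples by the three directions listed above before evaluating a system. The `IdiomExample` record, its field names, the `group_by_direction` helper, and the two-letter language codes are all assumptions made for illustration, not the dataset's actual schema.

```python
from dataclasses import dataclass

# The three translation directions listed above, as (source, target) pairs.
# The two-letter codes are an illustrative choice, not the dataset's own labels.
DIRECTIONS = [("en", "de"), ("de", "en"), ("ru", "en")]


@dataclass(frozen=True)
class IdiomExample:
    """Hypothetical record layout; the field names are assumptions."""
    source_lang: str
    target_lang: str
    source_text: str  # sentence containing the idiom in context
    reference: str    # human-written translation


def group_by_direction(examples):
    """Bucket examples by direction, rejecting directions not listed above."""
    groups = {direction: [] for direction in DIRECTIONS}
    for ex in examples:
        key = (ex.source_lang, ex.target_lang)
        if key not in groups:
            raise ValueError(f"unsupported direction: {key[0]} -> {key[1]}")
        groups[key].append(ex)
    return groups
```

Grouping first makes it straightforward to report scores per direction rather than as a single pooled number. Note that the sketch rejects English → Russian, since that direction is not among the three listed.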
If you use this dataset in your work, please cite our paper:
@misc{stap2024-idioms,
title={The Fine-Tuning Paradox: Boosting Translation Quality Without Sacrificing LLM Abilities},
author={David Stap and Eva Hasler and Bill Byrne and Christof Monz and Ke Tran},
year={2024},
eprint={2405.20089},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2405.20089},
}
See CONTRIBUTING for more information.
This dataset is licensed under the CC-BY-NC-4.0 License.