Word Alignment Quality in the IBM 2 Mixture Model
Jorge Civera, Alfons Juan
2008
Abstract
Finite mixture modelling is a standard pattern recognition technique. In statistical machine translation (SMT), however, its use is still being explored. The mixture approach offers two main advantages: first, the flexibility to find an appropriate trade-off between model complexity and the amount of available training data; and second, the capability to learn specific probability distributions that better fit subsets of the training data. The latter advantage is especially important in SMT, since it is widely accepted that most state-of-the-art translation models are applicable mainly to restricted semantic domains. In this work, we revisit the mixture extension of the well-known M2 translation model (IBM Model 2). The M2 mixture model is evaluated on a large-scale word alignment task, obtaining encouraging results that support the applicability of finite mixture modelling in SMT.
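The mixture idea can be illustrated with a minimal sketch. It assumes the textbook IBM Model 2 factorisation p(f|e) = ∏_j Σ_i a(i|j,l,m) t(f_j|e_i), with a NULL word at position i = 0 and the length term left out. A T-component mixture is then p(f|e) = Σ_k π_k p_k(f|e), where each component has its own lexicon and alignment tables. Several details are illustrative assumptions and are not taken from the paper: the names `IBM2Component`, `mixture_log_prob` and `mixture_align`, the `<null>` token, the 1e-9 floor for unseen word pairs, the uniform fallback for missing alignment entries, and the decoding rule (Viterbi alignment from the component with the highest posterior). EM training of the parameters is omitted.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

NULL = "<null>"  # hypothetical empty-word token occupying target position i = 0


@dataclass
class IBM2Component:
    """One mixture component: IBM Model 2 with its own lexicon and alignment tables."""
    lex: Dict[Tuple[str, str], float]              # t(f | e), keyed by (f, e)
    align: Dict[Tuple[int, int, int, int], float]  # a(i | j, l, m), keyed by (i, j, l, m)

    def t(self, f: str, e: str) -> float:
        # Assumed small floor so unseen pairs never give log(0).
        return self.lex.get((f, e), 1e-9)

    def a(self, i: int, j: int, l: int, m: int) -> float:
        # Assumed uniform fallback over the l + 1 positions (NULL included).
        return self.align.get((i, j, l, m), 1.0 / (l + 1))

    def log_prob(self, f: List[str], e: List[str]) -> float:
        """log p(f | e) under IBM Model 2, ignoring the length probability."""
        src = [NULL] + e
        l, m = len(e), len(f)
        return sum(
            math.log(sum(self.a(i, j, l, m) * self.t(fj, ei) for i, ei in enumerate(src)))
            for j, fj in enumerate(f, start=1)
        )

    def viterbi(self, f: List[str], e: List[str]) -> List[int]:
        """Best target position for each source word (0 means NULL)."""
        src = [NULL] + e
        l, m = len(e), len(f)
        return [
            max(range(l + 1), key=lambda i: self.a(i, j, l, m) * self.t(fj, src[i]))
            for j, fj in enumerate(f, start=1)
        ]


def _joint_logs(weights: Sequence[float], comps: Sequence[IBM2Component], f, e):
    # log(pi_k) + log p_k(f | e) for every component k
    return [math.log(w) + c.log_prob(f, e) for w, c in zip(weights, comps)]


def mixture_log_prob(weights, comps, f, e) -> float:
    """log sum_k pi_k p_k(f | e), computed with the log-sum-exp trick."""
    logs = _joint_logs(weights, comps, f, e)
    mx = max(logs)
    return mx + math.log(sum(math.exp(x - mx) for x in logs))


def mixture_align(weights, comps, f, e) -> Tuple[int, List[int]]:
    """Pick the most probable component for (f, e), then return its Viterbi alignment."""
    logs = _joint_logs(weights, comps, f, e)
    best = max(range(len(comps)), key=lambda k: logs[k])
    return best, comps[best].viterbi(f, e)


if __name__ == "__main__":
    e, f = ["the", "house"], ["la", "casa"]
    c0 = IBM2Component(lex={("la", "the"): 0.9, ("casa", "house"): 0.9}, align={})
    c1 = IBM2Component(lex={("la", "the"): 0.2, ("casa", "house"): 0.2}, align={})
    k, al = mixture_align([0.5, 0.5], [c0, c1], f, e)
    print(k, al)  # → 0 [1, 2]
```

Giving each component separate lexicon and alignment tables is what would let components specialise on subsets of the training data. With T = 1 the sketch reduces to plain IBM Model 2.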
Paper Citation
in Harvard Style
Civera J. and Juan A. (2008). Word Alignment Quality in the IBM 2 Mixture Model. In Proceedings of the 8th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2008) ISBN 978-989-8111-42-5, pages 93-102. DOI: 10.5220/0001739700930102
in Bibtex Style
@conference{pris08,
author={Jorge Civera and Alfons Juan},
title={Word Alignment Quality in the IBM 2 Mixture Model},
booktitle={Proceedings of the 8th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2008)},
year={2008},
pages={93-102},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001739700930102},
isbn={978-989-8111-42-5},
}
in EndNote Style
TY  - CONF 
JO  - Proceedings of the 8th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2008)
TI  - Word Alignment Quality in the IBM 2 Mixture Model
SN  - 978-989-8111-42-5
AU  - Civera J. 
AU  - Juan A. 
PY  - 2008
SP  - 93
EP  - 102
DO  - 10.5220/0001739700930102