Moses
Introduction
Moses is a statistical machine translation decoder. For more information, please refer to the official website: http://www.statmt.org/moses/
Install
For more information about how to install Moses, please go to Install Moses.
Input Format
Basic Input
Phrase Translation Table
The phrase translation table provides the phrase translation probabilities for a source and target language pair. An example file is as follows:
    der ||| the ||| 0.3
    das ist ||| this is ||| 0.8
The first column is the source-language phrase (German), the second column is the target-language phrase (English), and the last column is the probability of this translation pair. The columns are separated by '|||'.
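To make the column layout concrete, here is a minimal Python sketch that reads lines in the three-column format described above. The function name `parse_phrase_table` is hypothetical and not part of Moses; the sketch assumes exactly one score per line, as in the example file.

```python
from typing import List, Tuple


def parse_phrase_table(text: str) -> List[Tuple[str, str, float]]:
    """Parse lines of the form 'source ||| target ||| probability'."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumption: exactly three columns separated by '|||'
        source, target, prob = [field.strip() for field in line.split("|||")]
        entries.append((source, target, float(prob)))
    return entries


example = "der ||| the ||| 0.3\ndas ist ||| this is ||| 0.8\n"
table = parse_phrase_table(example)
for source, target, prob in table:
    print(source, "->", target, prob)
```

A real phrase table may carry several scores per entry, in which case the last column would need to be split further.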
Language Model
Moses accepts language models generated by three toolkits:
- SRI language modeling toolkit (SRILM)
- IRST language modeling toolkit (IRSTLM)
- RandLM language modeling toolkit (RandLM)
On VERBS, SRILM is the only one of these toolkits installed. For more information about how to run SRILM, please read the Run section.
Advanced Input
Factored Model
A Moses model can be trained not only on the surface word but also on additional information, e.g., POS tags and semantic roles. To use this additional information, the phrase table has to be modified so that each word carries its extra factors.
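As an illustration of what a factored entry can look like: in Moses' factored format, the factors of a token are joined by a single vertical bar (for example word|POS). The Python sketch below splits such tokens apart. The example line and its tag names are hypothetical, not taken from a real phrase table, and `split_factors` is an illustrative helper rather than a Moses function.

```python
from typing import List


def split_factors(phrase: str) -> List[List[str]]:
    """Split each whitespace-separated token into its '|'-joined factors."""
    return [token.split("|") for token in phrase.split()]


# Hypothetical factored phrase-table line: surface word plus POS tag
line = "das|ART ist|VFIN ||| this|DT is|VBZ ||| 0.8"

# Split on ' ||| ' (with spaces) so the single '|' inside tokens is left alone
source, target, prob = [field.strip() for field in line.split(" ||| ")]

source_factors = split_factors(source)
target_factors = split_factors(target)
print(source_factors)
print(target_factors)
```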
Parameters
It is recommended to create an .ini file to store all of your settings. Here is a simple example of an .ini file:
    #########################
    ### MOSES CONFIG FILE ###
    #########################

    # input factors
    [input-factors]
    0

    # mapping steps, either (T) translation or (G) generation
    [mapping]
    T 0

    # translation tables: source-factors, target-factors, number of scores, file
    [ttable-file]
    0 0 1 phrase-table

    # language models: type(srilm/irstlm), factors, order, file
    [lmodel-file]
    0 0 3 ../lm/europarl.srilm.gz

    # limit on how many phrase translations e for each phrase f are loaded
    [ttable-limit]
    10

    # distortion (reordering) weight
    [weight-d]
    1

    # language model weights
    [weight-l]
    1

    # translation model weights
    [weight-t]
    1

    # word penalty
    [weight-w]
    0
- input-factors
  - The factors of the input that are used. A single 0 means only the surface word is used, i.e., no factored model.
- mapping
  - The mapping steps: T is a translation step and G is a generation step. The number that follows is the index of the table used in that step.
- ttable-file
  - Gives the source factors, the target factors, the number of scores, and the path to the translation table file.
- lmodel-file
  - Gives the language model type (0: SRILM, 1: IRSTLM), the factor the model applies to, the order (n-gram) of the model, and the path to the language model file.
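The section layout described above (a name in square brackets followed by one or more value lines, with # starting a comment) can be read with a few lines of Python. This is only a sketch for inspecting a config file, not how Moses itself loads it; `parse_moses_ini` is a hypothetical name, and the sample text is a shortened copy of the example config.

```python
from typing import Dict, List


def parse_moses_ini(text: str) -> Dict[str, List[str]]:
    """Collect the value lines that follow each [section] header."""
    sections: Dict[str, List[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


config_text = """
# mapping steps, either (T) translation or (G) generation
[mapping]
T 0

[ttable-file]
0 0 1 phrase-table

[lmodel-file]
0 0 3 ../lm/europarl.srilm.gz
"""

config = parse_moses_ini(config_text)
# Fields of lmodel-file: type, factor, order, file
lm_type, lm_factor, lm_order, lm_path = config["lmodel-file"][0].split()
print(config["mapping"], lm_order, lm_path)
```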
Run
Before running Moses, you have to build your own language model file with SRILM or IRSTLM:
srilm/i686/ngram-count -order 5 -interpolate -kndiscount -text working-dir/lm/europarl.lowercased -lm working-dir/lm/europarl.lm
where -text sets the path to the file of target-language sentences, and -lm indicates where to store your language model file.
Then you can run Moses with:
moses/moses -f config.ini < inputfile > outputfile
where config.ini is the configuration file described in the Parameters section, inputfile contains the source-language sentences you want to translate, and outputfile is the file in which all of the translated results are stored.
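If the two steps are driven from a script, the commands above can be assembled as argument lists. The sketch below uses only the flags and paths shown in this section; the helper names (`ngram_count_command`, `moses_command`, `translate`) are hypothetical, and `translate` assumes the Moses binary is reachable at the given relative path.

```python
import shlex
import subprocess
from typing import List


def ngram_count_command(text_path: str, lm_path: str, order: int = 5,
                        binary: str = "srilm/i686/ngram-count") -> List[str]:
    """Build the SRILM command that trains the language model."""
    return [binary, "-order", str(order), "-interpolate", "-kndiscount",
            "-text", text_path, "-lm", lm_path]


def moses_command(config_path: str, binary: str = "moses/moses") -> List[str]:
    """Build the Moses decoder command for a given config file."""
    return [binary, "-f", config_path]


def translate(config_path: str, input_path: str, output_path: str) -> None:
    """Run Moses, feeding input_path on stdin and writing stdout to output_path."""
    with open(input_path) as src, open(output_path, "w") as dst:
        subprocess.run(moses_command(config_path), stdin=src, stdout=dst, check=True)


lm_cmd = ngram_count_command("working-dir/lm/europarl.lowercased",
                             "working-dir/lm/europarl.lm")
decode_cmd = moses_command("config.ini")
print(shlex.join(lm_cmd))
print(shlex.join(decode_cmd))
```

Calling `translate("config.ini", "inputfile", "outputfile")` would then correspond to the shell redirection shown above.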
Path Information on Verbs
- SRILM
  - /home/verbs/shared/stages/tools/srilm/
- Moses
  - /home/verbs/shared/stages/tools/moses/