Moses
Latest revision as of 12:18, 1 October 2010
Introduction
Moses is a statistical machine translation decoder. For more information, please refer to the official web site: http://www.statmt.org/moses/
Install
For more information about how to install Moses, please go to Install Moses.
Input Format
Basic Input
Phrase Translation Table
The phrase translation table provides the phrase translation probabilities between the source and target languages. An example file is as follows:
der ||| the ||| 0.3
das ist ||| this is ||| 0.8
The first column is the source-language phrase (German), the second column is the target-language phrase (English), and the last column is the probability of this translation pair. The columns are separated by '|||'.
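The three-column format described above can be read with a few lines of Python. This is only a minimal sketch: the function name `parse_phrase_table_line` is hypothetical, and it assumes every line has exactly three '|||'-separated fields with whitespace-separated scores in the last one.

```python
def parse_phrase_table_line(line):
    """Split one phrase-table line into (source, target, scores)."""
    fields = [field.strip() for field in line.split("|||")]
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    source, target, scores = fields
    # The last field may hold one or more scores separated by spaces.
    return source, target, [float(score) for score in scores.split()]


example = """der ||| the ||| 0.3
das ist ||| this is ||| 0.8"""

table = [parse_phrase_table_line(line) for line in example.splitlines()]
for source, target, scores in table:
    print(source, "->", target, scores)
```

A check like this can be useful for catching malformed lines before handing a hand-edited table to the decoder.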
Language Model
Moses accepts language models generated by three toolkits:
- SRI language modeling toolkit (SRILM)
- IRST language modeling toolkit (IRSTLM)
- RandLM language modeling toolkit (RandLM)
On VERBS, SRILM is the only toolkit installed. For more information about how to run SRILM, please read the Run section (Moses#Run).
Advanced Input
Factored Model
A Moses model can be trained not only on the surface words but also on additional information, e.g., POS tags and semantic roles. To use this additional information, the phrase table should be modified as follows:
unsoliden ||| unsound|jj ,|, ||| 1
unsoliden ||| unsound|jj ||| 0.5
unten ||| at|in ||| 0.2
In the second column, each word carries not only its surface form but also its POS tag. The two fields are separated by a single |. Any further factors are appended in the same way, each separated by another |.
For language modeling, the surface words and each additional factor are modeled separately. For instance, if you want to use POS and semantic role as factors, you have to train three language models, i.e., a surface language model, a POS language model, and a semantic role language model.
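Because each factor gets its own language model, it can help to split a factored phrase into one token stream per factor. The sketch below illustrates this for the target side of a phrase-table line; the helper names `split_factors` and `factor_stream` are hypothetical, and it assumes every token carries the same number of |-separated factors.

```python
def split_factors(phrase, separator="|"):
    """Turn 'unsound|jj ,|,' into [('unsound', 'jj'), (',', ',')]."""
    return [tuple(token.split(separator)) for token in phrase.split()]


def factor_stream(phrase, index):
    """Return the tokens of one factor (0 = surface, 1 = POS, ...) as a string."""
    return " ".join(token[index] for token in split_factors(phrase))


line = "unsoliden ||| unsound|jj ,|, ||| 1"
target = line.split("|||")[1].strip()

surface = factor_stream(target, 0)  # text for the surface language model
pos = factor_stream(target, 1)      # text for the POS language model
print(surface)
print(pos)
```

Applied to a whole factored corpus, the same split would yield one plain-text file per factor, each of which could then be passed to the language model toolkit separately.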
Parameters
It is recommended to create an .ini file to store all of your settings. Here is a simple example of such an .ini file:
#########################
### MOSES CONFIG FILE ###
#########################

# input factors
[input-factors]
0

# mapping steps, either (T) translation or (G) generation
[mapping]
T 0

# translation tables: source-factors, target-factors, number of scores, file
[ttable-file]
0 0 1 phrase-table

# language models: type(srilm/irstlm), factors, order, file
[lmodel-file]
0 0 3 ../lm/europarl.srilm.gz

# limit on how many phrase translations e for each phrase f are loaded
[ttable-limit]
10

# distortion (reordering) weight
[weight-d]
1

# language model weights
[weight-l]
1

# translation model weights
[weight-t]
1

# word penalty
[weight-w]
0
- input-factors
- The factors present in the input; 0 means that only the surface word is used (no factored model)
- mapping
- The decoding steps to apply, either translation (T) or generation (G), each followed by the index of the table it uses
- ttable-file
- The source factors, the target factors, the number of scores, and the path to the translation table file
- lmodel-file
- The type of the LM (0: SRILM, 1: IRSTLM), the factor it applies to, the order (n-gram) of the LM, and the path to the language model file
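The layout of this configuration file (a bracketed section name followed by one or more value lines) is simple enough to inspect with a short script. The following is a sketch only: `parse_moses_ini` is a hypothetical helper, not part of Moses, and it assumes the file follows the layout of the example on this page, with `#` starting a comment line.

```python
def parse_moses_ini(text):
    """Map each [section] to a list of its value lines, split on whitespace."""
    config = {}
    section = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            config[section] = []
        elif section is not None:
            config[section].append(line.split())
    return config


example = """
# mapping steps, either (T) translation or (G) generation
[mapping]
T 0

# language models: type(srilm/irstlm), factors, order, file
[lmodel-file]
0 0 3 ../lm/europarl.srilm.gz
"""

config = parse_moses_ini(example)
lm_type, lm_factor, lm_order, lm_path = config["lmodel-file"][0]
print("LM type:", lm_type, "factor:", lm_factor, "order:", lm_order, "file:", lm_path)
```

Reading the file this way makes it easy to check, for example, that the n-gram order in lmodel-file matches the order the language model was actually trained with.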
Run
Before running Moses, you have to build your own language model file with SRILM or IRSTLM:
srilm/i686/ngram-count -order 5 -interpolate -kndiscount -text working-dir/lm/europarl.lowercased -lm working-dir/lm/europarl.lm
where -text sets the path to the file containing the original target-language sentences, and -lm gives the location where the language model file will be stored.
Then you can run Moses. Simply enter:
moses/moses -f config.ini < inputfile > outputfile
where config.ini is the configuration .ini file described in the Parameters section, inputfile contains the source-language sentences you want to translate, and outputfile is the name of the file in which the translated results are stored.
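If the decoder needs to be called from a script rather than the shell, the same invocation can be wrapped in Python. This is a minimal sketch under stated assumptions: the helper names `build_moses_command` and `run_moses` are hypothetical, and the default binary path `moses/moses` is simply the relative path used in the command above, so it has to be adjusted to your installation.

```python
import subprocess


def build_moses_command(config_path, binary="moses/moses"):
    """Return the argument list equivalent to 'moses/moses -f config.ini'."""
    return [binary, "-f", config_path]


def run_moses(config_path, input_path, output_path, binary="moses/moses"):
    """Feed input_path to the decoder on stdin and write stdout to output_path."""
    with open(input_path, "rb") as source, open(output_path, "wb") as target:
        subprocess.run(
            build_moses_command(config_path, binary),
            stdin=source,
            stdout=target,
            check=True,  # raise an error if the decoder exits with a failure code
        )


# Show the command that would be executed; call
# run_moses("config.ini", "inputfile", "outputfile") to actually decode.
print(" ".join(build_moses_command("config.ini")))
```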
Path Information on Verbs
- SRILM
- /home/verbs/shared/stages/tools/srilm/
- Moses
- /home/verbs/shared/stages/tools/moses/