Moses

From CompSemWiki

Introduction

Moses is a statistical machine translation (SMT) decoder. For more information, see the official website: http://www.statmt.org/moses/

Install

For instructions on installing MGIZA++, see the Install Moses page.

Input Format

Basic Input

Phrase Translation Table

The phrase translation table lists source/target phrase pairs together with their translation probabilities. An example file:

der ||| the ||| 0.3
das ist ||| this is ||| 0.8

The first field is the source phrase (German), the second field is the target phrase (English), and the last field is the probability of this phrase pair. Fields are separated by '|||'.
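To illustrate the format, the Python sketch below splits one phrase table line on '|||' into source phrase, target phrase, and scores. `PhrasePair` and `parse_phrase_table_line` are hypothetical helpers, not part of Moses; the sketch assumes the third field holds whitespace-separated numeric scores and simply ignores any further fields a real table may contain.

```python
from typing import NamedTuple, Tuple


class PhrasePair(NamedTuple):
    source: str
    target: str
    scores: Tuple[float, ...]


def parse_phrase_table_line(line: str) -> PhrasePair:
    """Split one 'source ||| target ||| scores' line into its fields."""
    fields = [field.strip() for field in line.rstrip("\n").split("|||")]
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 '|||'-separated fields: {line!r}")
    # Only the first three fields are used; later fields (if any) are ignored.
    source, target, score_field = fields[0], fields[1], fields[2]
    scores = tuple(float(s) for s in score_field.split())
    return PhrasePair(source, target, scores)


table = [
    "der ||| the ||| 0.3",
    "das ist ||| this is ||| 0.8",
]
for entry in map(parse_phrase_table_line, table):
    print(entry.source, "->", entry.target, entry.scores)
```

Splitting on the full '|||' string (rather than on '|') keeps factored target phrases such as "unsound|jj" intact.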

Language Model

Moses accepts language models generated by three toolkits:

  • SRI language modeling toolkit (SRILM)
  • IRST language modeling toolkit (IRSTLM)
  • RandLM language modeling toolkit (RandLM)

On VERBS, SRILM is the only one of these toolkits installed. For instructions on running SRILM, see the Run section of this page.

Advanced Input

Factored Model

A factored Moses model can use not only surface words but also additional word-level information (factors), e.g., part-of-speech (POS) tags and semantic roles. To use these factors, the phrase table must be modified as follows:

unsoliden ||| unsound|jj ,|, ||| 1
unsoliden ||| unsound|jj ||| 0.5
unten ||| at|in ||| 0.2

In the second field, each target word carries not only its surface form but also its POS tag, separated by a single '|' (e.g., unsound|jj). Further factors, if any, are concatenated to the word in the same way.
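A minimal sketch of how a factored target phrase decomposes, assuming words are separated by spaces and factors by '|' as in the example above (`split_factored_phrase` is a hypothetical helper, not a Moses function):

```python
def split_factored_phrase(phrase: str, separator: str = "|") -> list:
    """Split 'w1|f1 w2|f2 ...' into [[w1, f1], [w2, f2], ...]."""
    return [token.split(separator) for token in phrase.split()]


def factor_sequence(phrase: str, index: int) -> list:
    """Return the index-th factor of every word (0 = surface form, 1 = POS here)."""
    return [word[index] for word in split_factored_phrase(phrase)]


target = "unsound|jj ,|,"
print(split_factored_phrase(target))  # [['unsound', 'jj'], [',', ',']]
print(factor_sequence(target, 1))     # ['jj', ',']
```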

In the language model, the surface words and each additional factor are modeled separately. For instance, to use POS tags and semantic roles as factors, you have to train three language models: a surface-word language model, a POS language model, and a semantic-role language model.
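The per-factor training text can be derived from a factored corpus by keeping one factor per word. The sketch below uses hypothetical helpers (not Moses or SRILM tools) and assumes every word carries the same number of '|'-separated factors; each resulting file could then be passed to a language modeling toolkit as ordinary text. The file naming scheme `prefix.factor<i>.txt` is likewise an assumption.

```python
import tempfile
from pathlib import Path


def factor_streams(factored_lines, num_factors):
    """Return one list of sentences per factor index.

    streams[i][k] is sentence k with every word replaced by its i-th factor.
    """
    streams = [[] for _ in range(num_factors)]
    for line in factored_lines:
        words = [token.split("|") for token in line.split()]
        for word in words:
            if len(word) != num_factors:
                raise ValueError(f"expected {num_factors} factors in {'|'.join(word)!r}")
        for i in range(num_factors):
            streams[i].append(" ".join(word[i] for word in words))
    return streams


def write_streams(streams, out_dir, prefix="corpus"):
    """Write each factor stream to out_dir/prefix.factor<i>.txt (hypothetical naming)."""
    paths = []
    for i, sentences in enumerate(streams):
        path = Path(out_dir) / f"{prefix}.factor{i}.txt"
        path.write_text("\n".join(sentences) + "\n", encoding="utf-8")
        paths.append(path)
    return paths


corpus = ["unsound|jj ,|,", "at|in"]
streams = factor_streams(corpus, num_factors=2)
print(streams)  # [['unsound ,', 'at'], ['jj ,', 'in']]

with tempfile.TemporaryDirectory() as tmp:
    for path in write_streams(streams, tmp):
        print(path.name, path.read_text(encoding="utf-8").splitlines())
```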

Parameters

It is recommended to store all of your settings in an .ini file. Here is a simple example:

#########################
### MOSES CONFIG FILE ###
#########################
 
# input factors
[input-factors]
0
 
# mapping steps, either (T) translation or (G) generation
[mapping]
T 0
 
# translation tables: source-factors, target-factors, number of scores, file
[ttable-file]
0 0 1 phrase-table
 
# language models: type(srilm/irstlm), factors, order, file
[lmodel-file]
0 0 3 ../lm/europarl.srilm.gz
 
# limit on how many phrase translations e for each phrase f are loaded
[ttable-limit]
10
 
# distortion (reordering) weight
[weight-d]
1
 
# language model weights
[weight-l]
1
 
# translation model weights
[weight-t]
1
 
# word penalty
[weight-w]
0

  • input-factors
    • Which factors of the input are used; 0 means the surface word only (no additional factors)
  • mapping
    • The decoding steps: T for a translation step or G for a generation step, followed by the index of the table to use (T 0 uses translation table 0)
  • ttable-file
    • The source factor(s), the target factor(s), the number of scores per phrase pair, and the path to the phrase translation table file
  • lmodel-file
    • The LM type (0: SRILM, 1: IRSTLM), the factor the LM is trained on, the LM order (n-gram size), and the path to the language model file
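Because each section holds bare values rather than key=value pairs, Python's configparser does not read this format directly. The sketch below is a hypothetical reader (not part of Moses) that collects the value lines under each [section] and splits an [lmodel-file] entry into the four fields listed above; it assumes one value per line and '#' comment lines, as in the example file.

```python
def parse_moses_ini(text):
    """Collect the value lines listed under each [section] of a Moses-style .ini file."""
    config = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            config.setdefault(section, [])
        elif section is None:
            raise ValueError(f"value outside any section: {line!r}")
        else:
            config[section].append(line)
    return config


def parse_lmodel_entry(value):
    """Split an [lmodel-file] line into type, factor, order, and file path."""
    lm_type, factor, order, path = value.split()
    return {"type": int(lm_type), "factor": int(factor), "order": int(order), "path": path}


SAMPLE = """
# input factors
[input-factors]
0

[mapping]
T 0

[ttable-file]
0 0 1 phrase-table

[lmodel-file]
0 0 3 ../lm/europarl.srilm.gz
"""

cfg = parse_moses_ini(SAMPLE)
print(cfg["mapping"])  # ['T 0']
print(parse_lmodel_entry(cfg["lmodel-file"][0]))
# {'type': 0, 'factor': 0, 'order': 3, 'path': '../lm/europarl.srilm.gz'}
```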

Run

Before running Moses, you have to build a language model file with SRILM or IRSTLM, for example with SRILM:

srilm/i686/ngram-count -order 5 -interpolate -kndiscount -text working-dir/lm/europarl.lowercased -lm working-dir/lm/europarl.lm

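where -order 5 builds a 5-gram model, -interpolate -kndiscount apply interpolated modified Kneser-Ney smoothing, -text sets the path to the target-language training text (one sentence per line), and -lm sets where the language model file is written. The order and file path given under [lmodel-file] in the .ini file should match the model built here.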
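To make the counting step concrete, the sketch below counts all n-grams up to a given order after padding each sentence with <s> and </s>, and derives unsmoothed maximum-likelihood probabilities. It only approximates what ngram-count does before smoothing: it applies no Kneser-Ney discounting, and SRILM's exact treatment of the boundary tokens may differ. The function names are hypothetical.

```python
from collections import Counter


def count_ngrams(sentences, order):
    """Count every n-gram of length 1..order in whitespace-tokenized sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for n in range(1, order + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def mle_probability(counts, ngram):
    """Unsmoothed P(last word | preceding words) = c(ngram) / c(context); needs len(ngram) >= 2."""
    context = ngram[:-1]
    if counts[context] == 0:
        return 0.0
    return counts[ngram] / counts[context]


counts = count_ngrams(["this is a house", "this is small"], order=3)
print(counts[("this", "is")])                        # 2
print(mle_probability(counts, ("this", "is", "a")))  # 0.5
```

An unseen n-gram gets probability 0 here, which is exactly the problem that smoothing options such as -kndiscount address.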

Then run Moses:

moses/moses -f config.ini < inputfile > outputfile

where config.ini is the configuration file described in the Parameters section, inputfile contains the source-language sentences to translate (one sentence per line), and outputfile is the file that receives the translations.
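If you drive Moses from a script, a sketch like the following could wrap the command above. It assumes that Moses reads one sentence per line on standard input and writes exactly one translation per line on standard output; the function names are hypothetical, and the default paths are simply those of the example command.

```python
import subprocess


def moses_command(moses_bin="moses/moses", config="config.ini"):
    """Build the argument list for: moses/moses -f config.ini"""
    return [moses_bin, "-f", config]


def translate(sentences, moses_bin="moses/moses", config="config.ini"):
    """Send sentences to Moses on stdin, one per line, and return its stdout lines.

    Assumes Moses prints exactly one translation per input line on stdout.
    """
    result = subprocess.run(
        moses_command(moses_bin, config),
        input="\n".join(sentences) + "\n",
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Moses exits with an error
    )
    translations = result.stdout.splitlines()
    if len(translations) != len(sentences):
        raise RuntimeError("number of translations does not match number of inputs")
    return translations
```

With a working Moses binary and configuration, translate(["das ist"]) would return a one-element list containing the translation of that sentence.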

Path Information on Verbs

  • SRILM
    • /home/verbs/shared/stages/tools/srilm/
  • Moses
    • /home/verbs/shared/stages/tools/moses/

Reference

http://www.statmt.org/moses/