MGIZA++
Reference
http://geek.kyloo.net/software/doku.php/mgiza:overview
Latest revision as of 10:51, 22 September 2010
Introduction
MGIZA++ is a word-alignment tool based on the well-known word-alignment software GIZA++. Because GIZA++ runs as a single process and its training is time-consuming, MGIZA++ modifies the structure of GIZA++ to support a multi-threaded architecture.
Supported Word Alignment Models
- IBM Model 1
- IBM Model 2
- IBM Model 3
- IBM Model 4
- IBM Model 5
- IBM Model 6
- Hidden Markov Model
Install
For more information about how to install MGIZA++, please go to Install MGIZA++.
Input File Format
The input format of MGIZA++ is almost the same as that of GIZA++; see the GIZA++#Input_Format section for details.
The only difference between the two tools concerns the cooccurrence file: MGIZA++ needs a cooccurrence file for processing. To generate it, run the command
giza-pp/snt2cooc.out vocFile1 vocFile2 snt12
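The command above can be wrapped in a small helper that checks the inputs before running. This is only a sketch: the function name make_cooc, the SNT2COOC variable, and the output file name are hypothetical, and it assumes that snt2cooc.out writes the cooccurrence data to standard output, which should be confirmed against the installed version.

```shell
# Path to the snt2cooc binary; the default mirrors the command above.
SNT2COOC="${SNT2COOC:-giza-pp/snt2cooc.out}"

# make_cooc VOC1 VOC2 SNT OUT  (hypothetical helper, not part of MGIZA++)
# Checks that the three input files are readable, then runs snt2cooc and
# redirects its standard output (assumed to be the cooccurrence data) to OUT.
make_cooc() {
    for f in "$1" "$2" "$3"; do
        if [ ! -r "$f" ]; then
            echo "missing input file: $f" >&2
            return 1
        fi
    done
    "$SNT2COOC" "$1" "$2" "$3" > "$4"
}
```

For example, make_cooc vocFile1 vocFile2 snt12 snt12.cooc would write the result to snt12.cooc, a file name chosen here purely for illustration.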
Parameters
The parameters for MGIZA++ are also almost the same as those of GIZA++; refer to the GIZA++#Parameters section.
Because MGIZA++ parallelizes GIZA++ training, the only additional parameter is the number of CPUs to use during processing. Specify it with -ncpu NUM.
Run
The configuration file is the same as for GIZA++. Please refer to GIZA++#Run.
mgiza-pp/mgiza configure.gizacfg -ncpu NUM
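One way to choose NUM is to ask the system how many processors are online. The sketch below does this with getconf; the helper names detect_cpus and run_mgiza and the MGIZA variable are hypothetical, and using every available core is an assumption rather than a recommendation from the MGIZA++ documentation.

```shell
# Path to the mgiza binary; the default mirrors the command above.
MGIZA="${MGIZA:-mgiza-pp/mgiza}"

# Print the number of online processors, falling back to 1 when
# getconf is unavailable or returns something that is not a number.
detect_cpus() {
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    case "$n" in
        ''|*[!0-9]*) n=1 ;;
    esac
    echo "$n"
}

# run_mgiza CONFIG [NUM]  (hypothetical helper, not part of MGIZA++)
# Runs mgiza with the given configuration file; NUM defaults to the
# detected processor count.
run_mgiza() {
    "$MGIZA" "$1" -ncpu "${2:-$(detect_cpus)}"
}
```

Calling run_mgiza configure.gizacfg uses the detected count, while run_mgiza configure.gizacfg 4 pins the run to 4 CPUs, which may be preferable on a shared machine.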
Path Information on Verbs
- MGIZA++
  - /home/verbs/shared/stages/tools/mgiza-pp/