We have been developing a statistical machine translation (SMT) system for
speech-to-speech translation. Our research currently focuses on the
text-to-text translation task, but we plan to add speech-to-speech
translation to our research topics soon. We are interested in building
translation models, decoding word graphs, and combining the statistical
machine translation system with a speech recognizer.
Statistical Machine Translation
Input: The SMT system takes a foreign-language sentence as input.
Output: The SMT system generates a native-language sentence that is a
translation of the input.
Language Model is a model that provides the probability of an arbitrary
word sequence.
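The page does not say which language model the system uses. As one illustration, the sketch below is a bigram model with add-one (Laplace) smoothing, an assumption chosen for brevity; the class name `BigramLM` and the `<s>`/`</s>` sentence markers are hypothetical. It scores a word sequence as the product of P(w_i | w_{i-1}), accumulated in log space.

```python
import math
from collections import Counter


class BigramLM:
    """Hypothetical bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        sentences = list(sentences)      # allow any iterable; we read it twice
        self.contexts = Counter()        # counts of words used as a history
        self.bigrams = Counter()         # counts of (previous word, word) pairs
        for sent in sentences:
            words = ["<s>"] + sent.split() + ["</s>"]
            self.contexts.update(words[:-1])
            self.bigrams.update(zip(words[:-1], words[1:]))
        # every word that can be predicted, including the end marker
        self.vocab = {w for s in sentences for w in s.split()} | {"</s>"}

    def prob(self, prev, word):
        """P(word | prev) with add-one smoothing."""
        return (self.bigrams[(prev, word)] + 1) / (self.contexts[prev] + len(self.vocab))

    def log_prob(self, sentence):
        """Log probability of a whole word sequence."""
        words = ["<s>"] + sentence.split() + ["</s>"]
        return sum(math.log(self.prob(p, w)) for p, w in zip(words[:-1], words[1:]))
```

Smoothing keeps unseen word pairs from receiving zero probability, so any word sequence gets a score, which is what the definition above requires.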
Translation Model is a model that provides the probabilities of possible
translation pairs.
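A common way to store the translation model in phrase-based SMT is a phrase table that maps each source phrase to candidate target phrases with probabilities. The minimal sketch below assumes that representation; the toy German-English entries, `PHRASE_TABLE`, and `translation_options` are invented for illustration. The function collects the translation options for every span of an input sentence, which is the information a decoder later draws on.

```python
from typing import Dict, List, Tuple

# Hypothetical toy phrase table: source phrase -> [(target phrase, P(target | source))].
# The entries are invented for illustration only.
PhraseTable = Dict[Tuple[str, ...], List[Tuple[str, float]]]

PHRASE_TABLE: PhraseTable = {
    ("das",): [("the", 0.7), ("that", 0.3)],
    ("haus",): [("house", 0.8), ("home", 0.2)],
    ("das", "haus"): [("the house", 0.9)],
    ("ist",): [("is", 1.0)],
    ("klein",): [("small", 0.6), ("little", 0.4)],
}


def translation_options(source: List[str], table: PhraseTable, max_len: int = 3):
    """Return {(start, end): [(target, prob), ...]} for every source span found in the table."""
    options = {}
    for start in range(len(source)):
        # consider spans [start, end) up to max_len words long
        for end in range(start + 1, min(start + max_len, len(source)) + 1):
            entries = table.get(tuple(source[start:end]))
            if entries:
                options[(start, end)] = entries
    return options
```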
Decoding Algorithm is a graph search algorithm that finds the best path
on a word graph.
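As a sketch of best-path search, the code below assumes the word graph is a directed acyclic graph whose nodes are numbered in topological order and whose edges carry a word and a log probability. This representation and the names `Edge` and `best_path` are hypothetical. Under that assumption a single left-to-right dynamic-programming (Viterbi) pass finds the highest-scoring path.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical word-graph edge: (from_node, to_node, word, log_prob).
# Nodes 0..num_nodes-1 are assumed to be in topological order.
Edge = Tuple[int, int, str, float]


def best_path(num_nodes: int, edges: List[Edge], start: int, goal: int):
    """Viterbi search for the highest-scoring path in a DAG word graph."""
    best = [-math.inf] * num_nodes
    back: Dict[int, Tuple[int, str]] = {}   # node -> (predecessor, word on the edge)
    best[start] = 0.0
    outgoing: Dict[int, List[Edge]] = {}
    for e in edges:
        outgoing.setdefault(e[0], []).append(e)
    for u in range(num_nodes):              # visit nodes in topological order
        if best[u] == -math.inf:
            continue                        # node not reachable from start
        for _, v, word, lp in outgoing.get(u, []):
            if best[u] + lp > best[v]:
                best[v] = best[u] + lp
                back[v] = (u, word)
    if best[goal] == -math.inf:
        return [], -math.inf                # no path from start to goal
    # follow back-pointers from the goal to recover the word sequence
    words, node = [], goal
    while node != start:
        node, word = back[node]
        words.append(word)
    return list(reversed(words)), best[goal]
```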
Decoding Process
A decoder is the core component of the SMT system. The decoder retrieves
possible partial translations from the translation model, then selects and
rearranges them to build the best translation.
Initialize: create a small partial model for caching, and precompute the
future cost (an estimate of the cost of translating the parts of the input
that are not yet covered).
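Future costs are commonly precomputed per source span by dynamic programming, as in phrase-based decoders such as Pharaoh and Moses. The sketch below assumes costs are negative log probabilities (lower is better) and that the cost of each span's best translation option is already known; `future_cost_table` is a hypothetical name. A span's estimate is the cheaper of its own best option and any split into two adjacent sub-spans.

```python
import math
from typing import Dict, Tuple

Span = Tuple[int, int]  # source span [start, end)


def future_cost_table(n: int, option_cost: Dict[Span, float]) -> Dict[Span, float]:
    """Estimate the cheapest cost of translating every span [i, j) of an n-word input.

    option_cost maps a span to the cost (e.g. -log probability) of its best
    translation option; spans without any option are simply absent.
    """
    cost: Dict[Span, float] = {}
    for length in range(1, n + 1):          # shorter spans first
        for i in range(n - length + 1):
            j = i + length
            best = option_cost.get((i, j), math.inf)
            # a span may also be covered by two adjacent, cheaper sub-spans
            for k in range(i + 1, j):
                best = min(best, cost[(i, k)] + cost[(k, j)])
            cost[(i, j)] = best
    return cost
```

During decoding, a hypothesis' future cost can then be looked up as the sum of the table entries for its contiguous uncovered spans, so no search is repeated per hypothesis.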
Hypothesis is a partial translation generated by applying a series of
translation options.
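One possible way to represent a hypothesis is sketched below, assuming each hypothesis records which source words are covered, the target words produced so far, an accumulated log-probability score, and a back-pointer to the hypothesis it was expanded from. The `Hypothesis` class and its `extend` method are hypothetical names; `extend` applies a single translation option and refuses options that overlap already-translated words.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Hypothesis:
    """Hypothetical partial translation built from a series of translation options."""
    coverage: Tuple[bool, ...]              # which source words are already translated
    output: Tuple[str, ...]                 # target words produced so far
    score: float                            # accumulated log probability
    prev: Optional["Hypothesis"] = None     # back-pointer to the hypothesis it extends

    def extend(self, start: int, end: int, target: str, log_prob: float) -> "Hypothesis":
        """Apply one translation option covering source span [start, end)."""
        if any(self.coverage[start:end]):
            raise ValueError("span overlaps already-translated words")
        cov = list(self.coverage)
        cov[start:end] = [True] * (end - start)
        return Hypothesis(tuple(cov), self.output + tuple(target.split()),
                          self.score + log_prob, self)

    def is_complete(self) -> bool:
        """True once every source word has been translated."""
        return all(self.coverage)
```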
The decoding process iterates over two tasks: choosing a hypothesis and
expanding it. The process terminates when no hypothesis remains to be
expanded.
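The loop below is a minimal sketch of this choose-and-expand iteration. It assumes best-first selection from a priority queue, translation options given per source span as a mapping from span to a list of (target, probability) pairs, and a caller-supplied bigram LM function `lm(prev, word)` returning a log probability; `decode` and these parameter shapes are hypothetical. Because options may be applied to any uncovered span in any order, the decoder can rearrange partial translations. It does no pruning or hypothesis recombination, so it explores every hypothesis and is only practical for toy inputs; practical decoders bound the search with beams or stacks.

```python
import heapq
import itertools
import math
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]
Options = Dict[Span, List[Tuple[str, float]]]


def decode(n: int, options: Options, lm: Callable[[str, str], float]):
    """Best-first decoding sketch over an n-word input.

    A hypothesis is (coverage, last_word, output); its score is the sum of
    translation-option and language-model log probabilities.
    """
    counter = itertools.count()             # tie-breaker so heap entries never compare hypotheses
    start = (tuple([False] * n), "<s>", ())
    queue = [(0.0, next(counter), start)]   # min-heap keyed on negated score
    best_score, best_output = -math.inf, None
    while queue:                            # stop when no hypothesis remains to expand
        neg_score, _, (coverage, last, output) = heapq.heappop(queue)   # choose
        score = -neg_score
        if all(coverage):                   # complete: add end-of-sentence LM score
            final = score + lm(last, "</s>")
            if final > best_score:
                best_score, best_output = final, " ".join(output)
            continue
        for (i, j), entries in options.items():                         # expand
            if any(coverage[i:j]):
                continue                    # option overlaps translated words
            cov = coverage[:i] + (True,) * (j - i) + coverage[j:]
            for target, p in entries:
                s = score + math.log(p)
                prev = last
                for w in target.split():    # score the new words with the LM
                    s += lm(prev, w)
                    prev = w
                new_output = output + tuple(target.split())
                heapq.heappush(queue, (-s, next(counter), (cov, prev, new_output)))
    return best_output, best_score
```

For example, with options `{(0, 1): [("house", 0.8)], (1, 2): [("the", 0.9)]}` and an `lm` returning -0.1 for `<s> the`, -0.2 for `the house`, -0.1 for `house </s>`, and -5.0 otherwise, the decoder translates span (1, 2) first and returns `"the house"`.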
Speech-to-Speech Machine Translation
Speech-to-speech machine translation can be achieved by cascading three
independent components: an automatic speech recognizer (ASR), the SMT system,
and a text-to-speech (TTS) system.
That is, the output of the ASR becomes the input to the SMT system, and the
output of the SMT system becomes the input to the TTS system.
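The cascade can be sketched as plain function composition. The component types below are hypothetical placeholders, assuming the ASR maps audio bytes to source-language text, the SMT system maps text to text, and the TTS system maps text back to audio bytes; the actual interfaces of the components are not described here.

```python
from typing import Callable

# Hypothetical component interfaces (not the real systems' APIs).
Recognizer = Callable[[bytes], str]     # audio -> source-language text
Translator = Callable[[str], str]       # source-language text -> target-language text
Synthesizer = Callable[[str], bytes]    # target-language text -> audio


def speech_to_speech(audio: bytes, asr: Recognizer, smt: Translator,
                     tts: Synthesizer) -> bytes:
    """Cascade: the ASR output feeds the SMT system, whose output feeds the TTS system."""
    source_text = asr(audio)
    target_text = smt(source_text)
    return tts(target_text)
```

Because each stage sees only the previous stage's single best output, recognition errors pass straight into translation, which is one motivation for the joint model mentioned next.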
We currently use the cascading approach, but we are interested in a joint
model that combines the ASR and the SMT decoder.
Demo Video
You can download and play the demo video of POSSMT/KE (POStech
Statistical Machine Translation) from here
You can download and play the demo video of POSSMT/KJ (POStech
Statistical Machine Translation) from here
You can download and play the demo video of POSSLT (POStech Spoken
Language Translation) from here