English Pages 131 [132] Year 2023
Japanese Technology Reviews Editor in Chief Toshiaki Ikoma, University of Tokyo
Section Editors Section A: Electronics Toshiaki Ikoma, University of Tokyo
Section B: Computers and Communications Kazumoto Iinuma, NEC Corporation, Kawasaki and Tadao Saito, University of Tokyo
Section C: New Materials Hiroaki Yanagida, University of Tokyo and Noboru Ichinose, Waseda University, Tokyo
Section D: Manufacturing Engineering Fumio Harashima, University of Tokyo
Section E: Biotechnology Isao Karube, University of Tokyo Reiko Kuroda, University of Tokyo GENERAL INFORMATION Aims and Scope Japanese Technology Reviews is a series of tracts which examines the status and future prospects for Japanese technology.
Automatic Speech Translation Fundamental Technology for Future Cross-Language Communications
Section B: Computers and Communications Volume 10
Machine Vision: A Practical Technology for Advanced Image Processing Masakazu Ejiri Volume 15
Cryptography and Security Edited by Shigeo Tsujii Volume 16
VLSI Neural Network Systems Yuzo Hirai Volume 28
Automatic Speech Translation: Fundamental Technology for Future Cross-Language Communications Akira Kurematsu and Tsuyoshi Morimoto
Automatic Speech Translation Fundamental Technology for Future Cross-Language Communications
by Akira Kurematsu University of Electro-Communications, Tokyo, Japan
and Tsuyoshi Morimoto ATR Interpreting Telecommunications Laboratories, Kyoto, Japan
CRC Press Taylor & Francis Group Boca Raton London New York CRC Press is an imprint of the Taylor & Francis Group, an informa business
First published 1996 by OPA (Overseas Publishers Association) Published 2021 by CRC Press Taylor & Francis Group 6000 Broken Sound Parkway NW, Suite 300 Boca Raton, FL 33487-2742 © 1996 by Taylor & Francis Group, LLC CRC Press is an imprint of Taylor & Francis Group, an Informa business No claim to original U.S. Government works ISBN 13:978-2-919875-02-3 (pbk) This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace |
Introduction to Speech Translation

apply to the capability of machine translation in the speech translation system. The automatic speech translation system will receive feedback from the speaker and hearer as they work to understand the dialogue, which is difficult to obtain with a machine translation system. The hearer will be able to ask questions when he/she cannot understand the meaning of the synthesized output speech.

Handling spoken language is very different from handling written text. Although the sentences in spoken dialogues are usually short and the sentence structure is not particularly complex, spoken dialogues include elliptic and anaphoric expressions. They also tend to include many syntactically ill-formed expressions. In addition to this ungrammatical input, a spoken language translation system must tackle the errors and ambiguities caused by the speech recognition system, which are difficult to detect. To tolerate the inevitable recognition errors and ambiguities, the language parsing process that follows speech recognition must be robust and efficient; that is, it must be capable of selecting the optimal result among multiple candidates.

One basic requirement for a spoken language translation system is real-time operation. The automatic speech translation system simply cannot take more than several seconds to output interpreted speech. The hearer is waiting for the output from the system, and a reply to the previous utterance is expected without delay. This means that high-speed processing is required of both speech recognition and language translation. An efficient algorithm is crucial if an exponential increase in computation time is to be avoided.

Although the ultimate goal is an automatic speech interpretation system that can be used for universal dialogue in an unlimited domain, the present technology limits us to specific, task-oriented domains.
Only by sharply constraining the specific applications can we make early progress toward an automatic speech interpretation system. Step-by-step upgrading of the technology is a reasonable strategy. The automatic speech interpretation system will be used by monolingual users; that is, the speaker will not know the target language and the hearer will not understand the source language. This situation also imposes stringent requirements on the accuracy of interpretation to prevent misunderstanding. Clarification of the speech recognition and language translation results will be needed. The ease of use, or "user friendliness", of the system incorporated into the intelligent human interface is very important.
1.4. History in Brief

A brief history of research into automatic speech translation systems throughout the world is given below.

In 1983, NEC Research Laboratories in Japan demonstrated a laboratory model of an automatic speech translation system at Telecom-83 in Geneva. It was the first trial of real-time bi-directional speech translation among selected languages, namely Japanese, English, and Spanish. Although it was a small and quite limited demonstration of a conversation about map inquiries, it attracted the attention of the audience and of people in related fields. Dr. Kouji Kobayashi, then president of NEC Co., advocated research into automatic telephone interpretation for future world-wide telephone communication across different languages.

In Japan, the general outlook for the research and development of an automatic telephone translation system was reported in 1986 under the sponsorship of the Ministry of Posts and Telecommunications. The report described the necessity for long-term research and development in various related fields such as speech recognition, machine translation, speech synthesis, artificial intelligence and computer science. ATR Interpreting Telephony Research Laboratories was established in 1986. Since then, the research and development of automatic speech translation has been extended to several institutes around the world, inspired by the research activities of ATR in Japan.

Speech translation between English and French based on the phrase-book approach, i.e., stored set phrases, was conducted at British Telecom Research Laboratories in 1987 [Stentiford-87]. The system was based on a set of more than 400 common business phrases stored in memory. A phrase-book message retrieved through crude keyword speech recognition is sent to a distant computer system, which translates it with reference to the phrase-book and outputs the result through a speech synthesizer. The advantage of the phrase-book approach is the
good quality of translation. However, the disadvantage of this approach is the inflexibility of the expressions.

The Toshiba Research and Development Center reported the results of an experiment in real-time bi-directional machine translation between Japanese and English, using a keyboard conversation in the manner of Unix Talk [1988]. It was shown that frequent ellipses and unknown words caused syntax errors and that there was some meta-dialogue, i.e., dialogue about the previous interchanges. The results of the experiment gave an insight into the desirable characteristics of an interactive dialogue translation system, although speech input and output were not included in the experiment.

At Carnegie Mellon University (CMU), a speech translation system that can deal with spoken utterances input via a microphone was developed in 1988 [Tomita-88]. The domain was a simple doctor-patient conversation. For the linguistic part, the Universal Parser Architecture, based on a knowledge-based approach, was used. Domain-specific knowledge was efficiently and dynamically utilized at run time. Noisy phoneme sequences of spoken utterances were handled by extending the run-time generalized LR parser. Syntactic parsing results in Interlingua, a language-independent but domain-specific representation of meaning. The target language is generated from an Interlingua representation by mapping it into a frame structure of the target language and then generating a sentence. Although the domain was limited and small in scale, the system hinted at the possibilities of a speech translation system.

At ATR Interpreting Telephony Laboratories, an experimental speech translation system, which translates spoken Japanese into English (SL-TRANS), was developed in 1989 [Kurematsu-92(a) and (b)] [Morimoto-92]. It aimed at total speech translation with a constrained speaking style of phrase separation.
The task domain was inquiries about international conference registration, with a vocabulary of about 500 words. After that, many improvements in the mechanism and efficiency were made. Details will be described in the following chapter of this book.

There were several new approaches to speech translation at CMU. Direct memory access translation for speech input was proposed. The phoneme-based Direct Memory Access Translation System (DMTRANS) integrates phonological and contextual knowledge in speech understanding in a massively parallel activation-marker-passing network. A new model of speech-to-speech dialog translation (ΦDMDIALOG) was proposed at CMU [Kitano-91]. ΦDMDIALOG provides a scheme of nearly concurrent parsing and generation. Several types of markers are passed in a memory network, which represents knowledge from the morphophonetic level to the discourse level, as well as world knowledge. Whether these memory network models can be expanded to larger systems will require further study.

CMU also developed another speech-to-speech translation system (JANUS) [Waibel-91]. The JANUS system was implemented for the Conference Registration Task with a vocabulary of about 500 words, between English and both Japanese and German. For acoustic modelling, an LVQ algorithm with context-dependent phonemes was used for speaker-independent recognition. The search module of the recognizer built a sorted list of sentence hypotheses. The N-best algorithm was implemented with a list of 100 hypotheses. Trigram language models were introduced using word-class-specific equivalence classes (digits, names, towns, languages etc.). As for the MT components, the Generalized LR-parser-based syntactic approach was used, following Tomita's system, and the most useful analysis from the module was mapped onto a common meaning representation. Improvements in robustness and extendability were undertaken continuously.

The NEC Research Laboratory developed a two-way speech translation system (INTERTALKER) in 1991 [Hatazaki-92]. INTERTALKER can deal with speech translation between Japanese and English with a vocabulary size of about 500 words for a sight-seeing guide domain. Translation among multiple languages is available through the use of language-independent interlingua expressions. The acceptable expressions are fixed in form and the sentence form is restricted. A detailed description will be given in chapter 5 of this book.

At ATR, an improved version of SL-TRANS, called ASURA, was developed in 1993.
It integrated the technologies of speech recognition, language translation and speech synthesis [Morimoto-93]. ASURA is a Japanese-to-English/German speech translation system. The sentence uttered by the Japanese speaker is recognized by a speaker-adaptive continuous speech recognition system (ATREUS) [Nagai-93]. The system incorporated an analysis of the expressions of linguistic intention of utterances. The task domain was inquiries and explanations regarding an international conference. In applying this translation
method to a goal-oriented dialogue corpus, the experimental system proved effective at translating dialogues with ordinary conversational expressions from Japanese to English and German.

An effort was made to build a restricted-domain spoken language translation system (VEST) at AT&T Bell Laboratories in 1991 [Roe-91]. VEST is a bi-directional spoken language translation system for English and Spanish. It recognizes several speakers and is limited to a few hundred words. The key new idea is that speech recognition and language analysis are tightly coupled by using the same language model, an augmented phrase structure grammar, for both. A prototype of the speech translation system was developed on a small scale.

SRI investigated a prototype speech translation system (SLT) [Rayner-93]. The SLT system can translate queries from spoken English to spoken Swedish in the domain of air travel information systems. The speech recognizer used is a fast version of SRI's DECIPHER speaker-independent continuous speech recognition system [Murveit-91]. It uses context-dependent phonetic-based Hidden Markov Models (HMMs). Text processing for language is performed by the SRI Core Language Engine (CLE), a general natural-language processing system [Alshawi-92]. The English grammar is a large general-purpose feature grammar, which has been augmented with a small number of domain-specific rules. The Swedish grammar has been adapted fairly directly from the English one. Each CLE grammar associates surface strings with representations in Quasi-Logical Form.
CHAPTER 2
Speech Recognition

2.1. Introduction

Speech recognition refers to the automatic recognition and understanding of a human utterance by machine. More specifically, speech recognition is the task of converting the acoustical speech signal of an utterance to a linguistic sequence of symbols. Speech recognition plays the role of front-end processing in automatic speech translation. Because it converts the speech signal to text on which all later processing depends, it is the most important component in automatic speech translation. Speech translation requires speaker-independent recognition of continuously uttered speech. Recently, great progress has been made in speech recognition, owing to advances in related technologies. Three major contributions are computing technologies, the availability of extensive speech databases, and the creation of powerful algorithms based on a statistical approach. In this chapter, the essentials of speech recognition, especially continuous speech recognition, will be presented.
2.2. Fundamental Concepts of Speech Recognition

2.2.1. The Concept of Speech Recognition

Speech recognition is the process of converting a continuous speech signal to a discrete symbol sequence. In other words, speech recognition is expressed as the process of searching for the most probable candidate among many hypotheses of uttered speech. In continuous speech recognition, the number of hypotheses is enormously large, and it is difficult to check all of them. Typical continuous speech recognition consists of five major steps: signal analysis, acoustical speech modelling, language modelling, search and acoustical pattern matching, and language processing. The general principle of continuous speech recognition is described in Figure 2.1.

Figure 2.1. General principle of continuous speech recognition. (Blocks: Input Speech, Acoustic Analysis, Feature Parameters, Search Process, Recognition Results. The Acoustic Model supplies similarity and the Language Model supplies constraint to the Search Process.)

Speech analysis extracts important features from speech waveforms. Acoustical speech modelling is the process of expressing basic speech units such as phonemes. Language modelling represents the language characteristics needed to recognize the spoken language. Search and acoustical pattern matching is used to search the candidates and select the most probable one. Language processing deals with the language constraints of syntax, semantics and pragmatics.

2.2.2. The Problem of Speech Recognition

In speech translation, large-vocabulary speech recognition of continuous utterances is essential. The vocabulary size that will be required is at least several thousand words, even for a goal-oriented dialogue in a limited domain. For the recognition of large-vocabulary continuous speech, the
problem of attaining high recognition performance must be solved in order to lessen the burden of language processing. Recognition of continuous speech separated into phrases is the first practical step towards the ultimate, fully continuous speech recognition system. The recognition of phrases or sentences based on phoneme recognition is investigated. The problems of how the speech recognition system is to be used in the field are referred to as pattern recognition problems. The principal concern of the speech recognizer is the method of understanding the input message conveyed by the human voice. To overcome these problems, the following key aspects of the speech recognizer must be addressed:

1. Input speech is pronounced continuously and contextual variations are large in speech signal characteristics.
2. Much variance exists in the idiosyncrasies of individuals' speech characteristics.
3. The environment of the speech input will vary in terms of microphone characteristics, environmental noise, and communication line characteristics such as telephone circuits.
4. Ambiguities inherently exist in spoken languages.
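The search view of recognition described in Section 2.2.1 can be illustrated with a minimal sketch. It assumes that each hypothesis already carries a log-probability from the acoustic model (similarity) and one from the language model (constraint), and it simply picks the hypothesis with the best combined score. The function name `best_hypothesis`, the field names, the `lm_weight` parameter and the scores are all hypothetical and chosen for illustration; a real recognizer searches a far larger hypothesis space and cannot enumerate it like this.

```python
def best_hypothesis(hypotheses, lm_weight=1.0):
    """Return the hypothesis with the highest combined log score.

    Each hypothesis is a dict with an acoustic log-probability and a
    language-model log-probability (illustrative field names).
    """
    def score(h):
        # Acoustic similarity plus weighted language-model constraint.
        return h["acoustic_logp"] + lm_weight * h["language_logp"]

    return max(hypotheses, key=score)


# Two made-up candidates: the second fits the audio slightly better,
# but the language model strongly prefers the first.
candidates = [
    {"words": "recognize speech", "acoustic_logp": -12.0, "language_logp": -3.0},
    {"words": "wreck a nice beach", "acoustic_logp": -11.5, "language_logp": -7.0},
]

print(best_hypothesis(candidates)["words"])
```

With the language model switched off (`lm_weight=0.0`) the acoustically closer but less plausible candidate wins, which is one way to see why the language constraint matters.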
2.3. Speech Pattern Representation

2.3.1. Characteristics of the Japanese Speech Signal

Generally, speech recognition performance is not greatly influenced by the difference between phonetic and linguistic characteristics. However, it is said that performance differs fairly substantially between languages because of the variety of phonetic and linguistic phenomena. Before describing the method of speech recognition, the characteristics of the speech signal in Japanese are described in this section.

An example of a Japanese speech waveform (/soredewadozo/) is shown in Figure 2.2. The periodic characteristics can be seen in the vowel parts of /o/, /e/ or /a/. In the fricative part such as /s/, the waveform shows a noise-like, irregular form. In the plosive part such as /d/, the plosion can be seen. It is common in speech recognition to treat the speech signal on the basis of spectral features.

Figure 2.2. Example of Japanese speech waveform /soredewadozo/ (horizontal axis: time).

Characteristics of normal Japanese speech from the standpoint of speech recognition can be described as follows:

1. The number of phonemes is small compared to the European languages. There are five vowels, /a/, /i/, /u/, /e/, /o/. The total number of phonemes is around 20, and they are categorized as unvoiced plosives, voiced plosives, unvoiced fricatives, voiced fricatives, nasals and semivowels.
2. The number of syllables is small. There are about 100 syllables in Japanese. In Japanese, there is a strong phonotactic constraint that consonants basically do not concatenate. Therefore, the Japanese syllable is composed of a sequence of consonants and vowels, or only vowels. Consonants do not appear at the end of the syllable.
3. There are peculiar sounds, e.g. double consonants such as /kitta/, nasal /N/, and a long vowel such as /kiita/. These sounds have to be discriminated with reference to the duration of the phoneme.
4. An accent produced by a difference in pitch frequency appears. For example, /hashi/ and /hashi/ spoken with different pitch accents have to be discriminated because they have different meanings.
5. The phenomenon of sound change by liaison appears when words are concatenated. For example, if /kara/ (vacant) and /hako/ (box) are concatenated, the result is pronounced /karabako/ (empty box).
6. Devocalization happens under certain conditions, e.g. /h(i)kari/. The vowels /i/ or /u/ followed by unvoiced consonants are usually devoiced.
Although the small number of phonemes in Japanese helps to discriminate between phonemes, it also causes greater variety in the acoustic characteristics. Furthermore, the many homonyms in Japanese make it more difficult to understand the meaning of the recognized result. These kinds of language-specific knowledge about the characteristics of speech will be important in developing a speech recognition system for Japanese.

2.3.2. Representation of Acoustical Patterns of Speech

Although a speech signal is non-stationary, it can be regarded as stationary over a short time period. This is because the movement of the organ of articulation is gradual. Spectrum analysis is performed over a certain duration, for instance, 15 or 30 ms. The short-time speech information is called a frame. By shifting the frame we derive the time sequence of feature parameters. A window function is applied to the speech waveform.

2.3.3. Signal Processing and Speech Analysis Methods

The signal processing and speech analysis methods aimed at speech recognition follow several steps, as described below:

1. Sampling. An analog speech signal picked up by a microphone is sampled and converted to a digital signal by an AD converter. The sampling frequency is 12 kHz and the speech is quantized to 16 bits.
2. Higher frequency enhancement. Since the energy of the speech signal {x_t} (t indicates time) is weighted towards lower frequencies, the higher frequencies are enhanced to increase the accuracy of speech analysis by use of a linear differential filter. The formula is as follows:

\[ \hat{x}_t = x_t - a\,x_{t-1} \tag{2.1} \]

The coefficient a is taken as 0.98.

3. Data window. Spectrum analysis of the speech signal is calculated by extracting the short-time signal (20 ms). It is multiplied by a window function. That is,

\[ x_t = h_t\,\hat{x}_t \tag{2.2} \]

where x_t now denotes the windowed signal. The window function is the Hamming window {h_t}, which is expressed as follows:

\[ h_t = 0.54 - 0.46 \cos\left(\frac{2\pi t}{N}\right), \qquad t = 1, 2, \ldots, N \tag{2.3} \]

N is the window length. A speech signal is analyzed at each frame, where the window is shifted at short intervals, for instance, 5 ms.
4. Auto-correlation analysis. Before linear predictive coding (LPC) is performed, the auto-correlation functions are calculated. From the speech signal {x_1, x_2, ..., x_N} the auto-correlation function is

\[ v_\tau = \frac{1}{N} \sum_{t=1}^{N-\tau} x_t\,x_{t+\tau} \tag{2.4} \]

5. LPC analysis. Linear predictive coding (LPC) is based on the model that the speech signal x_t is predicted from the past p signals as indicated in the following:

\[ x_t = -a_1 x_{t-1} - a_2 x_{t-2} - a_3 x_{t-3} - \cdots - a_p x_{t-p} \tag{2.5} \]

The coefficients {a_i} are determined to minimize the mean square error. The value of p is taken as 16. The solution for {a_i} is obtained by solving the equation called the Yule-Walker equation:

\[ \begin{bmatrix} v_0 & v_1 & v_2 & \cdots & v_{p-1} \\ v_1 & v_0 & v_1 & \cdots & v_{p-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ v_{p-1} & v_{p-2} & v_{p-3} & \cdots & v_0 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix} = - \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_p \end{bmatrix} \tag{2.6} \]

\[ \sigma^2 = \sum_{i=0}^{p} a_i v_i \qquad (a_0 = 1) \tag{2.7} \]

This matrix is called the Toeplitz matrix. There is a recursive procedure, the Levinson-Durbin algorithm or the Saito-Itakura algorithm, to solve the equation.

6. LPC cepstrum. Cepstrum is the inverse Fourier transform of the log spectrum.

\[ \log f(\omega) = \sum_{n=-\infty}^{\infty} c_n e^{-jn\omega} \tag{2.8} \]
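The analysis steps 2 to 5 above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the first sample is passed through unchanged by the pre-emphasis filter (the text does not say how the boundary is handled), the Hamming window uses the argument 2πt/N as reconstructed from Eq. (2.3), and the Levinson-Durbin recursion is written for the sign convention of Eq. (2.5), with a_0 = 1.

```python
import math


def pre_emphasis(x, a=0.98):
    """Eq. (2.1): x_hat[t] = x[t] - a * x[t-1]. Assumes x is non-empty;
    the first sample is kept as is (boundary handling is an assumption)."""
    return [x[0]] + [x[t] - a * x[t - 1] for t in range(1, len(x))]


def hamming(n):
    """Eq. (2.3): h_t = 0.54 - 0.46 cos(2 pi t / N) for t = 1..N."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * t / n) for t in range(1, n + 1)]


def frames(x, frame_len, shift):
    """Eq. (2.2): cut x into frames of frame_len samples, advancing by
    shift samples, and multiply each frame by the Hamming window."""
    h = hamming(frame_len)
    return [
        [h[i] * x[start + i] for i in range(frame_len)]
        for start in range(0, len(x) - frame_len + 1, shift)
    ]


def autocorrelation(x, p):
    """Eq. (2.4): v_tau = (1/N) * sum_t x[t] * x[t + tau], tau = 0..p."""
    n = len(x)
    return [sum(x[t] * x[t + tau] for t in range(n - tau)) / n for tau in range(p + 1)]


def levinson_durbin(v, p):
    """Solve the Yule-Walker equation (2.6) recursively.

    Returns (a, sigma2) where a[0] = 1, a[1..p] are the LPC coefficients
    and sigma2 is the residual energy of Eq. (2.7).
    """
    a = [1.0] + [0.0] * p
    err = v[0]
    for i in range(1, p + 1):
        acc = v[i] + sum(a[j] * v[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]  # update lower-order coefficients
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # remaining prediction error
    return a, err
```

At the 12 kHz sampling rate given in step 1, a 20 ms frame is 240 samples and a 5 ms shift is 60 samples, so a call such as `frames(pre_emphasis(signal), 240, 60)` followed by `levinson_durbin(autocorrelation(frame, 16), 16)` on each frame would mirror the sequence of steps in the text.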
When the power spectrum f(ω) is of the all-pole type, the following relationship between the spectrum prediction coefficient {

INFORM in :PHASE : English :STATUS :COMPLEMENT
in = [[IFT INFORM] ?rest ]
if type of Input.Obje.Pred is :action then set ?output to [[IFT REQUEST] ?rest ]
out = ?output
end
(c) Rule-3 Transfer rule for IFT
Figure 3.12 Transfer rules. (From reference [Suzuki-92].)

case role constituents. The second spells out the description of a sentence according to the rules of the target language writing style and morphology.

3.5.1. Language Generation Based on Feature Structure

In accordance with the grammatical framework of analysis and transfer, language generation based on the unification algorithm is carried out. Language generation requires linguistic knowledge of both specific expressions, such as idioms, and general grammatical constructions. Surface strings should be efficiently produced by applying this knowledge. A language-generation method for feature-structure-based unification grammar has been proposed [Kikui-92]. Linguistic generation rules can be defined for a comparatively large unit, because the variety of sentences to be generated is not very large. A generation rule is described in the phrase description, which is composed of syntactic phrase structure, syntactic and semantic constraints, and application constraints. An outline of the generation process is shown in Figure 3.13.

Figure 3.13 English generation by phrase description. (Panels: Input Feature Structure; Generation Knowledge; Generation; Output (English Syntactic Structure).)

A set of trees annotated with feature structures is employed to represent generation knowledge. Each tree represents a fragment of a syntactic structure, and is jointly represented with a semantic structure. Idiomatic constructions can be described by making a tree that contains lexical specifications and is linked with a specific semantic structure. Generation is executed by successively activating phrase descriptions that can subsume the whole semantics of the input feature structure. The generation system is based on Semantic Head Driven Generation [Shieber-89], which is an efficient algorithm for the unification-based formalism. A multiple index network of feature structures is used to efficiently select relevant generation knowledge from the knowledge base.
3.5.2. Knowledge Representation of Phrase Description

The generation process will be explained using the feature structure in Figure 3.14 as an example. This feature structure is composed of an intention part, the IFT (illocutionary force type), which expresses a request from a speaker to a hearer, and a propositional content part, which represents the content of the input sentence. The simplest way to implement the generation process is to link the input feature structure and the language expression as a pair. Phrase descriptions are prepared for this purpose. An example of phrase descriptions is shown in Figure 3.15. The generation system generates word sequences that correspond to the input semantic structure by selecting phrase descriptions. For example, when the input shown in Figure 3.14 is given, the output sentence shown in No. 1 of Figure 3.15 is generated.

[[reln REQUEST]                      : (Intention)
 [action [[reln SEND]                : (Content of Request)
          [agen *HEARER*]
          [obje *PAPERS*]
          [recp *SPEAKER*]]]]
Figure 3.14 Example of simplified feature structure after transfer process.
Number | Semantic Feature Structure | Language Expression
1 | [[reln REQUEST] [action [[reln SEND] [agen *HEARER*] [obje *PAPERS*] [recp *SPEAKER*]]]] | Please send me the papers.
2 | [[reln REQUEST] [action [[reln SEND] [agen *HEARER*] [obje *ANNOUNCEMENT*] [recp *SPEAKER*]]]] | Please send me the announcement.
50 | <Feature structure to express "gratitude"> | Thank you.
51 | <Feature structure to express "greeting"> | How do you do?

Figure 3.15 Example of phrase description.
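The pairing of a feature structure with a language expression, as in Figure 3.15, can be sketched as a simple table lookup. The representation below (feature structures as frozensets of feature-value pairs, so that they can serve as dictionary keys regardless of feature order) and the names `fs`, `PHRASE_DESCRIPTIONS` and `generate` are assumptions made for illustration only; entries 50 and 51 are omitted because the figure does not spell out their feature structures.

```python
def fs(*pairs):
    """Build a hashable, order-insensitive feature structure from
    (feature, value) pairs. Nested structures are built the same way."""
    return frozenset(pairs)


PAPERS_REQUEST = fs(
    ("reln", "REQUEST"),
    ("action", fs(("reln", "SEND"), ("agen", "*HEARER*"),
                  ("obje", "*PAPERS*"), ("recp", "*SPEAKER*"))),
)

ANNOUNCEMENT_REQUEST = fs(
    ("reln", "REQUEST"),
    ("action", fs(("reln", "SEND"), ("agen", "*HEARER*"),
                  ("obje", "*ANNOUNCEMENT*"), ("recp", "*SPEAKER*"))),
)

# Phrase descriptions No. 1 and No. 2 of Figure 3.15 as direct pairs.
PHRASE_DESCRIPTIONS = {
    PAPERS_REQUEST: "Please send me the papers.",
    ANNOUNCEMENT_REQUEST: "Please send me the announcement.",
}


def generate(feature_structure):
    """Return the stored expression, or None if no pair matches exactly."""
    return PHRASE_DESCRIPTIONS.get(feature_structure)


print(generate(PAPERS_REQUEST))
```

Because every input needs its own entry, the two requests above share nothing even though they differ only in the object. That redundancy is what a representation with shared parts and variables is meant to remove.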
(a) (a-1) [[sem [[reln REQUEST] [action ?X]]] [syn [[cat S-TOP]]]]
      (a-2) [[sem ?X] [syn [[cat VP] [vform BASE]]]]

(b) [[sem [[reln SEND] [agen ?X] [recp ?Y] [obje ?Z]]] [syn [[cat VP] [vform BASE]]]]
      send
      (b-2) [[sem ?Y] [syn [[cat NP] [case ACC]]]]
      (b-3) [[sem ?Z] [syn [[cat NP] [case ACC]]]]

(c) [[sem *SPEAKER*] [syn [[cat NP] [case ACC]]]]

(d) [[sem *PAPERS*] [syn [[cat NP] [case ACC]]]]
Figure 3.16 Phrase description of tree expression.

The phrase description (PD) is expressed as a set of tree forms as shown in Figure 3.16. For efficiency, the common parts are extracted and marked as variables, indicated by "?". The root indicates the semantic feature and the leaves indicate the description patterns. By applying the same phrase description in various situations, an unlimited variety of sentences can be generated. The PD is annotated with feature structures. The PD consists of a structure definition part and a feature structure annotation part. The structure definition part defines the structure of a tree expressed by a list in which the first element corresponds to a mother node and the rest of the elements to daughters. Each daughter may be a tree or a simple node. The annotation part specifies the feature structure symbol of each element. A description of a feature structure contains tags or variables. Each node should have a semantic and a syntactic feature structure. The semantic feature on the root node of a PD represents the semantics of the PD.

3.5.3. Generation Algorithm

Input to the generation process is a feature structure. The task of the generation system is to generate a syntax tree corresponding to the input semantic feature structure. The generation algorithm proceeds in the following steps.

1. Step 1: The initial node is constructed by combining the input feature structure and the initial syntactic feature.
2. Step 2: The PDs whose semantic structures subsume the semantic structure of the expanding node are activated. A tree whose root node satisfies the constraints of the initial node is picked from the PDs. The root node of the selected PD and the initial node are unified by copying this tree.
3. Step 3: If all leaf nodes are lexically identified, the process terminates.
4. Step 4: For an unlexicalized leaf node, a tree whose root node satisfies the constraints of that leaf node is selected from the PDs. As in step 2, the leaf node and the selected PD are unified by copying this tree.
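The four steps can be sketched with a much-simplified stand-in for unification. In the sketch below, which is an illustration under stated assumptions rather than the system's actual algorithm, feature structures are Python dicts, variables are strings beginning with "?", a PD is a dict with a semantic pattern, a syntactic category and a list of daughters, and "subsumption" is reduced to one-way pattern matching. Typed and disjunctive feature structures, syntactic features other than the category, and the index network are all left out. The four PDs follow Figure 3.16, with the lexical leaves "Please", "me" and "the papers" taken from Figures 3.18 and 3.19; all function and variable names are hypothetical.

```python
def match(pattern, value, bindings):
    """Return extended bindings if pattern subsumes value, else None."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings and bindings[pattern] != value:
            return None
        return {**bindings, pattern: value}
    if isinstance(pattern, dict) and isinstance(value, dict):
        for key, sub in pattern.items():
            if key not in value:
                return None
            bindings = match(sub, value[key], bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == value else None


def substitute(pattern, bindings):
    """Replace variables in pattern by their bound values."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        return bindings.get(pattern, pattern)
    if isinstance(pattern, dict):
        return {k: substitute(v, bindings) for k, v in pattern.items()}
    return pattern


def generate(sem, cat, pds):
    """Expand a node (sem, cat) into a list of words, or None on failure."""
    for pd in pds:                                   # step 2 / step 4: select a PD
        if pd["cat"] != cat:
            continue
        bindings = match(pd["sem"], sem, {})
        if bindings is None:
            continue
        words = []
        for daughter in pd["daughters"]:
            if isinstance(daughter, str):            # lexicalized leaf (step 3)
                words.append(daughter)
                continue
            d_cat, d_sem = daughter                  # unlexicalized leaf (step 4)
            sub = generate(substitute(d_sem, bindings), d_cat, pds)
            if sub is None:
                break
            words.extend(sub)
        else:
            return words
    return None


PDS = [
    {"sem": {"reln": "REQUEST", "action": "?X"}, "cat": "S-TOP",
     "daughters": ["Please", ("VP", "?X")]},
    {"sem": {"reln": "SEND", "agen": "?X", "recp": "?Y", "obje": "?Z"}, "cat": "VP",
     "daughters": ["send", ("NP", "?Y"), ("NP", "?Z")]},
    {"sem": "*SPEAKER*", "cat": "NP", "daughters": ["me"]},
    {"sem": "*PAPERS*", "cat": "NP", "daughters": ["the papers"]},
]

REQUEST = {"reln": "REQUEST",
           "action": {"reln": "SEND", "agen": "*HEARER*",
                      "obje": "*PAPERS*", "recp": "*SPEAKER*"}}

# Step 1: the initial node pairs the input semantics with category S-TOP.
print(" ".join(generate(REQUEST, "S-TOP", PDS)))
```

Reading the leaves from left to right gives the word sequence. Adding one more NP description for *ANNOUNCEMENT* would be enough to cover the second request of Figure 3.15, which is the economy that shared trees with variables provide over one pair per sentence.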
The steps described above are explained here in more detail, taking the input feature structure shown in Figure 3.14 as an example. The initial node shown in Figure 3.17 is obtained by combining the input shown in Figure 3.14 and the syntactic category (S-TOP). The phrase description (1-1) in Figure 3.15 is selected and the result of unification is obtained as shown in Figure 3.18. After the unification, the feature of the variable (?X) of the nodes (1-1) and (1-2) is converted to the action feature of the input semantic structure. Since a leaf node of the obtained tree is not lexicalized, the process proceeds to step 4. By selecting a tree (2) in PD, this node is unified with the selected root node. In the same manner, PDs of (3) and (4) are applied to the nodes (2-2) and (2-3). The final sentence structure is obtained as in Figure 3.19. By traversing the tree from left to right, the sentence "Please send me the papers" is generated.

[[sem [[reln REQUEST]
       [action [[reln SEND] [agen *HEARER*] [recp *SPEAKER*] [obje *PAPERS*]]]]]    (Input Feature Structure)
 [syn [[cat S-TOP]]]]

Figure 3.17. Feature structure of initial node.

[[syn [[cat S-TOP]]]
 [sem [[reln REQUEST]
       [action ?X [[reln SEND] [agen *HEARER*] [recp *SPEAKER*] [obje *PAPERS*]]]]]]
    Please
    [[syn [[cat VP]]]
     [sem [[reln SEND] [agen *HEARER*] [recp *SPEAKER*] [obje *PAPERS*]]]]

Figure 3.18. Syntax structure immediately after applying tree of Figure 3.16(a).

[[syn [[cat S-TOP]]]
    Please
    send
    [[syn [[cat NP]]]    me
    [[syn [[cat NP]]]    the papers

Figure 3.19. Completed syntactic structure (partially shown).

3.5.4. Towards Efficient Generation

Feature-structure-directed generation is useful for a bidirectional grammar, because the analysis and generation modules can share the same grammar. An appropriate auxiliary phrase is added if necessary. Typed feature structures are utilized to describe the control of the generation process in a declarative way. The disjunctive feature structure is introduced to avoid the inefficiency of making multiple copies of the phrase structure when the generation process encounters multiple rule candidates.
3.6. Contextual Processing based on Dialogue Interpretation
Dialogue interpretation using context-sensitive processing is effective in interpreting spoken language. Selecting the correct speech recognition hypothesis for robust spoken language processing is a significant problem in a speech translation system. Disambiguating ambiguous sentences and predicting the next utterance are likewise important issues in dialogue processing. The plan recognition model for dialogue understanding assumes that an utterance is an action in the conversation.
3.6.1. Plan Recognition
In devising a language translation system for spoken dialogue, one of the problems to be solved is how to adequately translate the underlying meaning of the source utterance, that is, the speaker's intention, into the target language. In spoken dialogue, smooth communication depends on understanding the speaker's underlying meaning. Considerable research has focused on a plan recognition model for resolving elliptical phrases or choosing an appropriate translated word [Iida-92]. The model consists of plans, objects and inference rules. Four kinds of plans covering task-oriented knowledge and pragmatics are used in the model: a domain plan, which manages the structure of domain-dependent action hierarchies; a dialogue plan, which manages global changes of topic in a domain; a communication plan, which represents the sequence of information exchange; and an interaction plan, which manages demand-response pairs in the dialogue. The plan recognizer assumes a goal, matches the analyzed utterance against the various plans, and infers a chaining path from the input to the goal. An example of dialogue structure is shown in Figure 3.20.

[Figure 3.20 (diagram): four plan layers built over a seven-utterance dialogue. Dialogue-Plan (global structure of dialogue): Open-Dialogue Unit, Contents, Close-Dialogue Unit. Domain-Plan (domain-specific actions and objects): Make-Registration, Get Form, Fill-Out Form, Return Form. Communication-Plan (dialogue development, information exchange actions): Achieve Know, Execute Domain-Plan, Introduce Object-Plan, Get Value Unit, Request Action Unit. Interaction-Plan (utterance turn-taking): Request Action, Accept Action, Ask Value, Inform Value, each linked to utterances 1 to 7.]

Dialogue example:
Utterance 1 (customer): Will you send me a registration form?
Utterance 2 (secretariat): All right.
Utterance 3 (secretariat): Will you give me your name and address?
Utterance 4 (customer): (My) name is Mayumi Suzuki, and (my) address is...
Utterance 5 (customer): When is the deadline?
Utterance 6 (secretariat): December 1.
Utterance 7 (secretariat): Please return the form as soon as possible.

Figure 3.20. Process of construction of dialogue structure. (From reference [Iida-92].)

3.6.2. Dialogue Interpretation
In the dialogue analysis of spoken language, the communicative act type concerns not only the intentional part of an utterance but also the property of the topic of its propositional contents [Yamaoka-91]. For this purpose, a set of communicative act types, each of which indicates a particular type of speech act, can be defined. The set of communicative act types shown in Table 3.6 was defined through a linguistic and pragmatic analysis of a dialogue corpus in the conference registration domain. Five types of illocutionary force for modal expressions are defined: INFORM (representative, declarative, expressive), ASK (interrogative), CONFIRM (interrogative), REQUEST (directive, imperative), and OFFER (commissive). Three classes of topic property are defined: ACTION, OBJECT (which has a property value), and STATEMENT. A set of utterance pairs in cooperative dialogues can be defined as shown in Table 3.7.

Table 3.6. Typical communicative act types of Japanese. (From reference [Yamaoka-91].)

1. Demand Class
Types: ASK-ACTION, CONFIRM-ACTION, REQUEST-ACTION, OFFER-ACTION, ASK-VALUE, CONFIRM-VALUE, ASK-STATEMENT, CONFIRM-STATEMENT, GREETING-OPEN, GREETING-CLOSE
Typical surface forms: "ACT-wa WH-desu-ka?" "ACT-suru-no-desu-ka?" "ACT-shite-kudasai." "ACT-shi-masu." "OBJ-wa WH-desu-ka?" "OBJ-wo onegai-shi-masu." "OBJ-wa VAL-desu-ka?" "STA-wa WH-desu-ka?" "STA-desu-ka?" "Moshimoshi." "Sayonara."

2. Response Class
Types: INFORM-ACTION, INFORM-VALUE, INFORM-STATEMENT, AFFIRMATIVE, NEGATIVE, ACCEPT-ACTION, REJECT-ACTION, ACCEPT-OFFER, REJECT-OFFER, GREETING-OPEN, GREETING-CLOSE
Typical surface forms: "ACT-shite-kudasai." "OBJ-wa VAL-desu." "STA-desu-(ga)" "Hai.", "Soudesu." "Iie." "Wakarimashita." "ACT-deki-masen." "Arigatou-gozaimasu." "(Iie) kekkou-desu." "Hai." "Shitsurei-shi-masu."

3. Confirm Class
Types: CONFIRMATION
Typical surface forms: "Wakarimashita."

(ACT denotes phrase on ACTION, OBJ on OBJECT, STA on STATEMENT, WH on interrogative, respectively.)

Table 3.7. Typical utterance pairs of Japanese dialogue. (From reference [Yamaoka-91].)

Demand Class          Response Class
ASK-ACTION            INFORM-ACTION
CONFIRM-ACTION        AFFIRMATIVE/NEGATIVE, INFORM-ACTION
REQUEST-ACTION        ACCEPT-ACTION, REJECT-ACTION
OFFER-ACTION          ACCEPT-OFFER, REJECT-OFFER
ASK-VALUE             INFORM-VALUE
CONFIRM-VALUE         AFFIRMATIVE/NEGATIVE, INFORM-VALUE
ASK-STATEMENT         INFORM-STATEMENT
CONFIRM-STATEMENT     AFFIRMATIVE/NEGATIVE, INFORM-STATEMENT
GREETING-CLOSE        GREETING-CLOSE

Two sets of pragmatic knowledge concerning the usage of isolated utterances are described separately. One is the surface form used to express the speaker's intention. The other is the propositional contents in terms of a predicate and its case values. An outline of the communicative act type analysis concerning the property of a topic is shown in Figure 3.21. In order to infer communicative act types from an utterance, a rewriting system that can control the application of rules is employed as an inference engine.

[Figure 3.21 (block diagram). Components shown: Semantic Structure with Pragmatics, Intention Part Analysis, Frozen Expression, Proposition Contents Analysis, Communicative Act, Rules for Intention Part, Rewriting Engine, Rewriting Environment, Rules for Proposition Contents.]

Figure 3.21. Configuration of communicative act type analysis system. (From reference [Yamaoka-91].)

Prediction of the next utterance has also been investigated using a dialogue understanding model in the framework of plan recognition [Yamaoka-91]. The various plans are used to predict expressions about the communicative act type of the utterance. By referring to the goal list, which contains the incomplete plans regarded as possibilities and expectations for future goals, the next utterance can be predicted within the ongoing dialogue. The next utterance is predicted in terms of two types of information: the communicative act type, and the constituents that make up the propositional contents. The capability of selecting the correct surface form of the next utterance, in particular as regards expressions of the speaker's intention, has been confirmed [Yamaoka-91].

3.6.3. Contextual Processing
A dialogue model, together with a broad account of the dialogue process, is useful for language translation. A computational model for contextual processing that uses constraints on the dialogue participants' mental states has been studied. Shared goals and mutual beliefs between dialogue participants are taken from the context of the task-oriented dialogue. Communication acts performed by dialogue participants can be interpreted on the basis of such contextual information [Dohsaka-90]. Referents of omitted pronouns in Japanese dialogue are identified through the interpretation of pragmatic constraints on the use of linguistic expressions in context. Honorific relationships, the speaker's point of view and the speaker's range of information are exploited by the model. The interpretation mechanism is regarded as an integration of constraint satisfaction and objective inference.
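The utterance pairs of Table 3.7 suggest a simple way of using the predicted communicative act type to choose among speech recognition hypotheses. The following Python sketch is an assumption-laden illustration, not the plan-based predictor of [Yamaoka-91]: the dictionary encodes one reading of Table 3.7 (with AFFIRMATIVE and NEGATIVE listed separately), and the function names and the example hypotheses are hypothetical.

```python
# Demand type -> expected response types, following one reading of Table 3.7.
UTTERANCE_PAIRS = {
    "ASK-ACTION": ["INFORM-ACTION"],
    "CONFIRM-ACTION": ["AFFIRMATIVE", "NEGATIVE", "INFORM-ACTION"],
    "REQUEST-ACTION": ["ACCEPT-ACTION", "REJECT-ACTION"],
    "OFFER-ACTION": ["ACCEPT-OFFER", "REJECT-OFFER"],
    "ASK-VALUE": ["INFORM-VALUE"],
    "CONFIRM-VALUE": ["AFFIRMATIVE", "NEGATIVE", "INFORM-VALUE"],
    "ASK-STATEMENT": ["INFORM-STATEMENT"],
    "CONFIRM-STATEMENT": ["AFFIRMATIVE", "NEGATIVE", "INFORM-STATEMENT"],
    "GREETING-CLOSE": ["GREETING-CLOSE"],
}


def predict_next(previous_act):
    """Return the communicative act types expected after previous_act."""
    return UTTERANCE_PAIRS.get(previous_act, [])


def rerank(hypotheses, previous_act):
    """Move hypotheses whose act type is expected to the front.

    hypotheses is a list of (text, act_type) pairs in recognizer order;
    sorted() is stable, so the original order is kept within each group.
    """
    expected = set(predict_next(previous_act))
    return sorted(hypotheses, key=lambda h: h[1] not in expected)


# Hypothetical recognition hypotheses following "When is the deadline?"
hypotheses = [
    ("Will you send it?", "REQUEST-ACTION"),
    ("December 1.", "INFORM-VALUE"),
]
ranked = rerank(hypotheses, "ASK-VALUE")
```

In this toy setting the hypothesis typed INFORM-VALUE is promoted because the preceding utterance was typed ASK-VALUE. A fuller model would also use the goal list and the predicted propositional contents described above.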
3.7. New approach for Language Translation
Conventional approaches to language translation mostly aim at treating syntactic and semantic information on the basis of formal linguistic grammars. However, dialogue utterances involve various kinds of intention expressions and are often fragmentary. A new approach to language translation is emerging to overcome the shortcomings of conventional rule-based translation systems. Traditional language translation systems rely on the extensive use of rules or lexical expressions. They face the following difficulties: (1) improvement in translation quality is slow, and (2) processing time sometimes increases intolerably. The new approach makes use of the increasing availability of large-scale corpora and bulk processing power.

3.7.1. Example-based Language Translation
Example-based language translation has been proposed to overcome the difficulties that have arisen in the traditional approach [Nagao-84]. It is usually called example-based machine translation (EBMT). In EBMT, a database of translation examples drawn from bilingual text is prepared. Examples whose source side is most similar to the input phrase or sentence are retrieved from the example database, and the translation output is obtained from the retrieved examples. This framework uses best matching between the input and the stored examples, and selects the most plausible target expression from many candidates. For this purpose, the semantic distance between the input and each example must be calculated. The measure of semantic distance is based on a thesaurus, which is a hierarchical organization of word concepts. The distance between words is derived from the distance between the corresponding concepts, which in turn is determined by their locations in the thesaurus hierarchy.
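The thesaurus-based distance can be illustrated with a small Python sketch. Everything concrete here is an assumption made for illustration: the toy thesaurus, the example database, and in particular the distance formula (one minus the shared path length divided by the longer path length), which is only one plausible way of turning positions in a hierarchy into a number and is not taken from the cited work.

```python
# Hypothetical toy thesaurus: word -> path of concepts from the root downwards.
THESAURUS = {
    "conference": ("abstract", "event", "meeting"),
    "workshop": ("abstract", "event", "meeting"),
    "party": ("abstract", "event", "social"),
    "hotel": ("concrete", "place", "building"),
}


def concept_distance(word1, word2):
    """Distance in [0, 1] based on the most specific common concept.

    0.0 means both words fall under the same leaf concept;
    1.0 means they share no concept at all.
    """
    path1, path2 = THESAURUS[word1], THESAURUS[word2]
    common = 0
    for a, b in zip(path1, path2):
        if a != b:
            break
        common += 1
    return 1.0 - common / max(len(path1), len(path2))


# Hypothetical example database: (source word, target-language pattern).
EXAMPLES = [
    ("conference", "pattern learned from a 'conference' example"),
    ("hotel", "pattern learned from a 'hotel' example"),
]


def best_example(input_word):
    """Retrieve the example whose source word is semantically closest."""
    return min(EXAMPLES, key=lambda ex: concept_distance(input_word, ex[0]))
```

With this data, `best_example("workshop")` retrieves the "conference" example, since both words sit under the same concept, while "party" is closer to "conference" than to "hotel" because it shares two of three levels with the former and none with the latter.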
Best matching based on semantic distance is useful for selecting the target words for function words (e.g., Japanese particles and English prepositions) [Sumita-91]. Sentence and clause type patterns play an important role in intuitive translation. As an extension of EBMT, a method called Transfer-Driven
[Figure 5.3 (diagram). Sites shown: Germany (Munich), Siemens AG and Karlsruhe University; USA (Pittsburgh), Carnegie Mellon University. Link labels shown: German-Japanese, German-English.]

Figure 5.3. General configuration of international joint experiment.
Karlsruhe University in Germany. The general configuration of the international experiment is shown in Figure 5.3. The ASURA system was interconnected with English and German speech translation systems over international telecommunication channels. Figure 5.4 shows the configuration of the joint experiment. A DSP-based front-end processor calculates the output probabilities of the hidden states of the HMnet for each frame. Each party shared equal responsibility; each site developed units for own-language speech recognition, language translation from source language to target language, and speech synthesis. The configuration of the experimental setup between Japan and the USA is shown in Figure 5.5. The Japanese speech synthesizer, ATR ν-Talk, was used as the Japanese speech output system. The experiment was held on January 28, 1993. In addition to speech translation, a video-conference system was used so that each speaker could see what his or her partner was doing at the other end. Several dialogues on the conference registration task were uttered in the experiment. Figure 5.6 shows a picture of the joint experiment. As a whole, the experiment was successful and received favourable comments from all over the world.

[Figure 5.4 (block diagram). Japanese side: Input Speech, Speech Recognition, Language Translation, Communication Control, Speech Synthesis, Japanese Output Speech. The Communication Channel connects it to the German Speech Translation System (SIEMENS/KU), which has German Speech Input and German Speech Output.]

Figure 5.4. General configuration of international joint experiment of telephone interpretation.
[Figure 5.5 (block diagram). ATR site: Japanese Speech Input, Japanese Speech Recognition, Japanese to English Language Translation, Japanese Speech Synthesis, Japanese Speech Output, Communication Control. CMU site: English Speech Input, English Speech Recognition, English to Japanese Language Translation, English Speech Synthesis, English Speech Output, Communication Control. The two sites are linked by the International Communication Network. Sample utterances shown: "moshimoshi", "hello", "how are you?", "konnichiwa".]

Figure 5.5. Configuration of experimental system setup between Japan and USA. (From reference [Yato-93].)
Figure 5.6. Picture of joint experiment of automatic speech translation.
5.3. INTERTALKER
5.3.1. Overview
INTERTALKER is an experimental automatic interpretation system developed at NEC Research Laboratories in 1991. It recognizes naturally spoken speech and performs bidirectional speech translation between Japanese and English. A general view of INTERTALKER is shown in Figure 5.7 [Hatazaki-92b]. Its distinctive characteristics are as follows:
(1) The domain is task-oriented dialogue on sight-seeing guidance, with a vocabulary of about 500 words.
(2) Speaker-independent continuous speech recognition is accomplished.
(3) Speech recognition and language translation are tightly integrated using a conceptual representation.
(4) A language-independent representation of the sentence makes it easy to extend the system to translation into multiple languages from one source language: for instance, from Japanese to English, French and Spanish.
(5) A new Japanese speech synthesis method is introduced, significantly improving the clarity and intelligibility of synthesized sentences.
Detailed explanations are given in the following sections.

[Figure 5.7 (block diagram): Speech Input, Japanese/English Speech Recognition, Conceptual Representation, then English, French, Spanish and Japanese Generation (text), each followed by the corresponding Speech Synthesis, and Speech Output.]

Figure 5.7. General outline of automatic interpretation system: INTERTALKER. (From reference [Hatazaki-92b].)

5.3.2. Speech Recognition
In speech recognition, an input utterance is recognized and a conceptual representation is obtained as the result. The system performs speaker-independent continuous speech recognition controlled by a finite state network grammar. The Japanese speech recognition system uses demi-syllables as speech units. The demi-syllable unit is robust against contextual variations caused by co-articulation, since it contains the transitional part of speech within the unit. The number of demi-syllable units in Japanese is 241. Each unit is modelled by a Gaussian mixture density HMM. Speaker-independent HMMs are trained on task-independent training data, consisting of 250 phonetically balanced word utterances spoken by 100 speakers. The English recognition system uses 354 diphone units; the other processes are the same as for Japanese. A word dictionary describes each word as a demi-syllable sequence, and the finite state grammar network and the demi-syllable HMMs are compiled into a single network. Phonological variations and word juncture models are automatically expanded in the network. Figure 5.8 shows the process involved in speech recognition and language translation.
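The compilation of the word dictionary and the grammar network into a single network can be pictured with the following Python sketch. It is a simplified illustration under stated assumptions: the grammar is a list of word-labelled arcs, each word arc is replaced by a chain of unit-labelled arcs, and the unit labels in the toy lexicon are placeholders rather than actual demi-syllable units. Attaching an HMM to each unit arc, phonological variations and word juncture models are left out.

```python
def compile_network(grammar_arcs, lexicon):
    """Expand word-level arcs into unit-level arcs.

    grammar_arcs: list of (source_state, word, target_state)
    lexicon:      dict mapping each word to a non-empty list of unit labels
    Returns a list of (source_state, unit, target_state) arcs.
    """
    unit_arcs = []
    fresh = 0  # counter used to name new intermediate states
    for source, word, target in grammar_arcs:
        units = lexicon[word]
        previous = source
        for index, unit in enumerate(units):
            if index == len(units) - 1:
                following = target  # last unit ends at the word's target state
            else:
                following = f"{word}#{fresh}"
                fresh += 1
            unit_arcs.append((previous, unit, following))
            previous = following
    return unit_arcs


# Toy grammar accepting either of two words between states 0 and 1.
grammar = [(0, "hai", 1), (0, "iie", 1)]

# Placeholder unit sequences (hypothetical labels, for illustration only).
lexicon = {"hai": ["ha", "ai"], "iie": ["i", "ie"]}

network = compile_network(grammar, lexicon)
```

Each word arc between states 0 and 1 becomes a two-arc chain through a new intermediate state, so the search can run over a single unit-level network while the word-level grammar still constrains the paths.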
[Figure 5.8 (block diagram). Speech recognition: demi-syllable based speech recognition and conceptual representation generation, using the network grammar, task knowledge and the conceptual relationship table. Text generation from the conceptual representation: conceptual wording, syntactic selection, word ordering and morphological generation, using the target language dictionary. The generated text is passed to speech synthesis to produce the speech output.]

Figure 5.8. Integration using conceptual representation. (From reference [Hatazaki-92b].)

A best path for the input utterance is searched for in the finite state network. Semantic and task knowledge is incorporated in the network grammar, and a conceptual representation is obtained as the recognition result. A bundle search, a fast frame-synchronous search algorithm, is introduced for the network search to reduce computation time. In the bundle search, the word-level search is undertaken only once for the most likely occurrence. Figure 5.9 shows an example of conceptual representation in relation to the network grammar. The best path obtained in the network is an arc sequence from the start node of the network. Both the conceptual primitives of key words and their semantic dependency relationships are extracted from the conceptual relationship table. The conceptual representation, which is an acyclic graph, is built up by using the conceptual primitives and their relationships. Pragmatic information is attached to the nodes in the conceptual representation.

[Figure 5.9 (diagram): a network grammar and the conceptual representation derived from it.]

Figure 5.9. Generation of conceptual representation from network grammar. (From reference [Hatazaki-92b].)

5.3.3. Language Translation
Based on the conceptual representation, a sentence is generated directly in each language. The conceptual representation is a language-independent expression of a sentence that was developed for the text-to-text machine translation system PIVOT [Okumura-91]. It makes multi-lingual translation straightforward. The conceptual representation is transformed into a syntactic dependency structure using the target language dictionary. The steps of generation are as follows:
1. Conceptual wording: The target sentence structure is determined pragmatically and syntactically. Suitable expressions of clause or sentence, as well as the subject and predicate, are selected.
2. Syntactic selection: Syntactic information is given to the nodes in the semantic structure and the syntactic structure is generated. The morphological information for surface cases and modalities is produced.
3. Word ordering: Word-order properties for the syntactic structure are determined.
4. Morphological generation: The nodes in the syntactic structure are arranged according to the word-order properties. Surface morphemes for each node are generated and combined into words.
5.3.4. Speech Synthesis
Translated text is converted to speech using a rule-based speech synthesizer. A Japanese text-to-speech system with high intelligibility is used. First, accentuation and pause insertion rules are applied to a sentence. Then, synthesized speech is generated using a pitch-controlled residual waveform excitation technique [Iwata-90].

5.3.5. Performance
A real-time experiment on speech interpretation has been conducted. Two kinds of tasks are implemented in the system: concert ticket reservation and tour guidance. The concert ticket reservation task has a 500-word vocabulary and a word perplexity of 5.5. Performance has been evaluated on the tour guide task. For 30 sentences spoken by ten talkers, the correct sentence recognition rate is 83% and the word accuracy is 95.5%. The sentence-level translation accuracy is 93%. When the key words in a sentence are recognized correctly, the conceptual representation is obtained in spite of some errors in the unimportant parts.
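The text does not state how the recognition figures were computed. As a hedged illustration, the Python sketch below uses a commonly used convention, which is an assumption here: the sentence recognition rate is the fraction of utterances recognized without any error, and the word accuracy is one minus the number of substitutions, deletions and insertions divided by the number of reference words. The test utterances are invented.

```python
def edit_distance(reference, hypothesis):
    """Minimum number of word substitutions, deletions and insertions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, 1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, 1):
            current.append(min(
                previous[j] + 1,                           # deletion
                current[j - 1] + 1,                        # insertion
                previous[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        previous = current
    return previous[-1]


def evaluate(pairs):
    """pairs: list of (reference_words, recognized_words).

    Returns (sentence_recognition_rate, word_accuracy).
    """
    total_words = sum(len(ref) for ref, _ in pairs)
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    correct_sentences = sum(ref == hyp for ref, hyp in pairs)
    return correct_sentences / len(pairs), 1.0 - errors / total_words


# Invented test utterances: the second contains one substitution error.
pairs = [
    ("please send me the form".split(), "please send me the form".split()),
    ("when is the deadline".split(), "when is a deadline".split()),
]
sentence_rate, word_accuracy = evaluate(pairs)
```

The second utterance counts as a sentence error even though only one of its four words is wrong, which is why a sentence rate can lie well below the word accuracy, as in the figures quoted above.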
CHAPTER 6
Future Directions

6.1. Introduction
The important issue in speech translation is to develop key technologies for the translation of spontaneously spoken utterances. Such utterances include wide varieties of speech and language phenomena. Characteristics of spoken dialogue in terms of language and speech phenomena are shown in Figure 6.1. In spontaneously spoken speech, various phenomena such as strong co-articulation, variations between individual speakers, and collapsed or missing phones appear quite often. Moreover, prosody plays an important role in conveying extra-linguistic information such as the speaker's intention and emotional expression. As for spoken language issues, fragmentary and strongly context-dependent utterances, inversions, repetitions or re-phrasings, and ungrammatical expressions appear frequently.
[Figure 6.1 (diagram relating speech and language phenomena). Speech labels shown: Clear Pronunciation Speech, Spontaneous Speech, Non-constraint Speech, Free Utterance, Dialect Speech. Language labels shown: Simple Dialogue, Spontaneous Dialogue, Grammatically Well-formed Sentence, Grammatically Ill-formed Sentence, Illocutional Expression, Idiomatic Expression, Emotional Expression.]

Figure 6.1. Characteristics of spoken dialogue.
The perspective of advanced speech translation incorporating various information and knowledge of context, situation and prosody is shown in Figure 6.2.

[Figure 6.2 (block diagram). Processing chain: Speech Input, Speech Recognition, Language Analysis, Language Translation, Language Generation, Speech Synthesis, Speech Output. Information sources shown: Semantics/Pragmatics, Dialogue Context, Intention, Prosody, Dialogue Situation, Emotion, Psychological State.]

Figure 6.2. Advanced spoken dialogue translation.
6.2. Future Directions of Speech Translation
6.2.1. Recognition of Spontaneous Speech
The technology of speech recognition must be adequate to deal with both acoustic and linguistic variations. In the acoustic field, much effort should be directed at developing more precise and stronger context-dependent phoneme models to cover wide acoustic variations. Linguistic constraints must be utilized extensively. Speaker-independent recognition of a large vocabulary should be extended. Dynamic speaker adaptation will be developed to eliminate the undesirable necessity of uttering training words in advance. In addition, the problem of how to define and manage a language model of spontaneous speech should be investigated. Since unnecessary or unimportant words such as "Uh" or "Eh" are frequently inserted in spontaneous utterances, the treatment of colloquial phenomena such as inversion and insertion of words should be studied. A management mechanism for the language model, which will interact with a higher-level language processing component and restructure the model dynamically according to shifts in the intentions or topics of a dialogue, will have to be investigated.

6.2.2. Prosody Extraction and Control in Speech Processing
Prosody, such as intonation, power and duration, carries important information in spontaneous speech. It helps not only to resolve ambiguities in sentence meanings, but also to extract extra-linguistic information such as the speaker's attitude, intention and emotion. Prosodic control of the spoken style will also be introduced in speech synthesis.

6.2.3. Translation of Colloquial Utterance
Robust translation will be required in spoken language translation. A new paradigm of language translation, called "example-based translation", is promising for spontaneous utterances. It translates an input by using a set of translation examples, each of which is very similar to a portion of the input. The example-based paradigm is quite attractive as a way of establishing robust translation. The translation mechanism retrieves similar examples from a database. A large database of examples, each pairing a source text with its translation, is set up as knowledge. Such examples are extracted from a large bilingual corpus. An efficient way to proceed will be to integrate a rule-based approach and an example-based approach. The dialogue situation at the time of utterance should be taken into consideration in order to translate context-dependent expressions properly.

6.2.4. Integrated Control of Speech and Language Processing
Integrated control of speech and language processing will be important in spontaneous speech translation. Appropriate information necessary for a language model in speech recognition should be provided at the syntactic, semantic and pragmatic levels. Speech information such as prosody should be provided to language processing as an important cue for understanding spoken language.
The situation and environment of the dialogue should be recognized and maintained properly. Such situational information will include the environment, such as the domain and subject of the dialogue; the intentional and mental status of the participants; and the dialogue progression status, such as the topic or focus. Such information will be referred to by both the speech processing and the language processing units.

6.2.5. Mechanism of Spontaneous Speech Interpretation
The mechanism for spontaneous speech processing must be consistent with a mechanism that handles associative knowledge such as translation usage examples and word co-occurrence information [Iida-93]. The suitability of applying several types of knowledge, such as pragmatic knowledge, metaphorical associative knowledge, and associative knowledge based on social behaviour, cannot be determined without grammatical and semantic structural information. Accordingly, the respectively required types of information are positioned as keys to resolving local subproblems, which are all processed in parallel. The global problem of interpretation will be solved by solving these subproblems, which include the speaker's intention and mental states, the effects on the outer world, and problems of pragmatic interpretation. The final solution will be the one that has the lowest number of contradictions.
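The selection criterion stated at the end of Section 6.2.5 can be illustrated by a toy Python sketch. It is only a schematic reading of that criterion: the candidate interpretations, the constraints and the function names are all hypothetical, and the constraints are checked one after another here rather than in parallel.

```python
def count_contradictions(interpretation, constraints):
    """Number of constraints that the interpretation violates."""
    return sum(1 for check in constraints if not check(interpretation))


def select_interpretation(candidates, constraints):
    """Return the candidate with the fewest contradictions."""
    return min(candidates, key=lambda c: count_contradictions(c, constraints))


# Hypothetical candidate interpretations of one utterance.
candidates = [
    {"act": "ASK-VALUE", "topic": "deadline", "speaker": "customer"},
    {"act": "INFORM-VALUE", "topic": "deadline", "speaker": "customer"},
    {"act": "ASK-VALUE", "topic": "address", "speaker": "secretariat"},
]

# Hypothetical constraints, each standing for the result of one local subproblem.
constraints = [
    lambda c: c["act"].startswith("ASK"),   # the surface form is interrogative
    lambda c: c["topic"] == "deadline",     # the topic currently in focus
    lambda c: c["speaker"] == "customer",   # who holds the turn
]

best = select_interpretation(candidates, constraints)
```

In this example the first candidate violates no constraint and is selected, while the others violate one and two constraints respectively.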
6.3. International Cooperation
Interest in speech translation is growing all over the world. The research project at ATR Interpreting Telephony Research Laboratories has greatly stimulated further research on speech translation. At ATR, the extended project on speech translation is being continued at a newly established research organization (ATR Interpreting Telecommunications Research Laboratories) [Morimoto-93]. Extensive efforts in research on spontaneous speech translation have been undertaken by various research groups, for example CMU, AT&T Bell Labs, and SRI. In Germany, the national project called VERBMOBIL was launched in 1993. The long-term vision behind VERBMOBIL is a portable translation device that you can carry to a meeting with speakers of other languages [Wahlster-93]. An automatic speech translation project has also started recently in Korea. International cooperation is essential in the development of speech translation systems, since speech processing and language translation require deep insight into each native language. Resource sharing in speech and language databases will be effective in enhancing the smooth development of speech translation systems.
References

Abbreviations Used
ACL: Association for Computational Linguistics
COLING: International Conference on Computational Linguistics
EUROSPEECH: European Conference on Speech Communication and Technology
IEICE: Institute of Electronics, Information and Communication Engineers
SST: Symposium in Speech Technology
ICASSP: International Conference on Acoustics, Speech, and Signal Processing
ICSLP: International Conference on Spoken Language Processing
TMI: International Conference on Theoretical and Methodological Issues in Machine Translation

[Abe-90] M. Abe, S. Nakamura, K. Shikano, H. Kuwabara, "Voice Conversion Through Vector Quantization", Jour. of Acoust. Soc. of Japan, Vol. E-11, No. 2, 71-76 (1990)
[Abe-91] M. Abe, K. Shikano, H. Kuwabara, "Statistical Analysis of Bilingual Speaker's Speech for Cross-Language Voice Conversion", Jour. of Acoust. Soc. of America, Vol. 90, No. 1, 76-82 (1991)
[Ashawi-92] H. Alshawi (ed.), The Core Language Engine, MIT Press (1992)
[ATR-93] ATR International (ed.), Automatic Telephone Interpretation, Tokyo: Ohm Co. (1993)
[Dohsaka-90] K. Dohsaka, "Identifying the Referents of Zero-Pronouns in Japanese Based on Pragmatic Constraint Interpretation", Proc. of ECAI-90, 240-245 (1990)
[Ehara-91] T. Ehara, T. Morimoto, "Contents and Structure of the ATR Bilingual Database of Spoken Dialogues", Proc. Int. Joint Conf. ACH and ALLC, 131-136 (1991)
[Furuse-92] O. Furuse, H. Iida, "An Example-Based Method for Transfer-Driven Machine Translation", Proc. of TMI-92, 139-150 (1992)
[Gunji-87] T. Gunji, Japanese Phrase Structure Grammar: A Unification-Based Approach, Dordrecht: Reidel (1987)
[Hanazawa-90] T. Hanazawa, K. Kita, T. Kawabata, S. Nakamura, K. Shikano, "ATR HMM-LR Continuous Speech Recognition System", Proc. ICASSP-90, 53-56 (1990)
[Hatazaki-92a] K. Hatazaki, K. Yoshida, A. Okumura, Y. Mitome, T. Watanabe, M. Fujimoto, K. Narita, Proc. of 44th Convention Inf. Proc. Society of Japan, Vol. 3, 219-220 (1992)
[Hatazaki-92b] K. Hatazaki, J. Noguchi, A. Okumura, K. Yoshida, T. Watanabe, "INTERTALKER: An Experimental Automatic Interpretation System Using Conceptual Representation", Proc. ICSLP-92, 393-396 (1992)
[Hattori-92] H. Hattori, S. Sagayama, "Vector Field Smoothing Principle for Speaker Adaptation", Proc. ICSLP-92, 381-384 (1992)
[Hayes-86] P. Hayes, A. Hauptmann, J. Carbonell, M. Tomita, "Parsing Spoken Language: A Semantic Case-frame Approach", Proc. COLING-86, 587-592 (1986)
[Iida-92] H. Iida, H. Arita, "Natural Language Understanding on a Four-layer Plan Recognition Model", Jour. of Information Processing, Vol. 15, No. 1, 60-71 (1992)
[Iida-89] H. Iida, K. Kogure, T. Aizawa, "An Experimental Spoken Natural Dialogue Translation System Using a Lexicon-Driven Grammar", Proc. of European Conf. on Speech Technology and Communications 89 (1989)
[Iida-93] H. Iida, "Prospects for Advanced Spoken Dialogue Processing", IEICE Trans. Inf. & Syst., 108-114 (1993)
[Iwahashi-92a] N. Iwahashi, N. Kaiki, Y. Sagisaka, "Concatenative Speech Synthesis by Minimum Distortion Criteria", Proc. ICASSP-92, 65-68 (1992)
[Iwahashi-92b] N. Iwahashi, Y. Sagisaka, "Speech Segment Network Approach for an Optimal Synthesis Unit Set", Proc. ICSLP-92, 479-482 (1992)
[Iwata-90] K. Iwata, Y. Mitome, J. Kametani, M. Akamatsu, S. Tomotake, K. Ozawa, T. Watanabe, "A Rule-based Speech Synthesizer Using Pitch Controlled Residual Wave Excitation Method", Proc. ICSLP-90, 6.6.1-6.6.4 (1990)
[Kaiki-92] N. Kaiki, Y. Sagisaka, "Pause Characteristics and Local Phrase-Dependency Structure in Japanese", Proc. ICSLP-92, 357-360 (1992)
[Kay-80] M. Kay, "Algorithm Schemata and Data Structures in Syntactic Processing", Tech. Report CSL-80-12, Xerox PARC (1980)
[Kikui-92] G. Kikui, "Feature Structure Based Semantic Head Driven Generation", Proc. of COLING-92, 32-38 (1992)
[Kindaichi-81] H. Kindaichi, K. Akinaga, Japanese Accent Dictionary, Meikai Nihongo Akusento Jiten, Sanseido (1981)
[Kita-89] K. Kita, T. Kawabata, "HMM Continuous Speech Recognition Using Predictive LR Parsing", Proc. ICASSP-89, 703-706 (1989)
[Kita-91] K. Kita, T. Takezawa, T. Morimoto, "Continuous Speech Recognition Using Two-Level LR Parsing", Trans. IEICE, Vol. 74-E, No. 7, 1806-1810 (1991)
[Kitano-89] H. Kitano, H. Tomabechi, T. Mitamura, H. Iida, Proc. of EUROSPEECH-89, 198-201 (1989)
[Kitano-91] H. Kitano, "ΦDM-Dialogue: An Experimental Speech-to-Speech Dialog Translation System", IEEE Computer, 31-50 (1991)
[Kogure-90] K. Kogure, H. Iida, T. Hasegawa, K. Ogura, "NADINE: An Experimental Dialogue Translation System from Japanese to English", Proc. Info Japan-90, 57-64 (1990)
[Kosaka-92] T. Kosaka, S. Sagayama, "An Algorithm for Automatic HMM Structure Generation in Speech Recognition", Proc. SST-92, 104-109 (1992)
[Kosaka-93] T. Kosaka, J. Takami, S. Sagayama, "Rapid Speaker Adaptation Using Speaker Mixture Allophone Models Applied to Speaker-Independent Speech Recognition", Proc. ICASSP-93, II 570-573 (1993)
[Kurematsu-90] A. Kurematsu, "ATR Speech Database: As a Tool of Speech Recognition and Analysis", Speech Communication, Vol. 9, No. 4, 357-363 (1990)
[Kurematsu-92a] A. Kurematsu, H. Iida, T. Morimoto, "Language Processing in Connection with Speech Translation at ATR Interpreting Telephony Research Laboratories", Speech Communication, Vol. 10, No. 1, 1-9 (1991)
[Kurematsu-92b] A. Kurematsu, "Future Perspective of Automatic Telephone Interpretation", IEICE Trans. Commun., E75-B, No. 1, 14-19 (1992)
[Minami-92] Y. Minami, T. Matsuoka, K. Shikano, "Evaluation of HMM by Training Using Speaker Independent Speech Database", Tech. Report of IEICE, SP91-113 (1992)
[Morimoto-92] T. Morimoto, M. Suzuki, T. Takezawa, G. Kikui, M. Nagata, M. Tomokiyo, "A Spoken Language Translation System: SL-TRANS2", Proc. of COLING-92, 1048-1052 (1992)
[Morimoto-93a] T. Morimoto, A. Kurematsu, "Automatic Speech Translation at ATR", Proc. Fourth Machine Translation Summit, 83-96 (1993)
[Morimoto-93b] T. Morimoto, T. Takezawa, F. Yato, S. Sagayama, T. Tashiro, M. Nagata, A. Kurematsu, "ATR's Speech Translation System: ASURA", Proc. EUROSPEECH-93, 1291-1294 (1993)
[Murveit-90] H. Murveit, R. Moore, "Integrating Natural Language Constraints into HMM-based Speech Recognition", Proc. ICASSP-90, 573-576 (1990)
[Murveit-91] H. Murveit, J. Butzberger, M. Weintraub, "Speech Recognition in SRI's Resource Management and ATIS Systems", Proc. DARPA Workshop on Speech and Natural Language (1991)
[Nagai-91] A. Nagai, S. Sagayama, K. Kita, "Phoneme-context-dependent Parsing Algorithm for HMM-based Continuous Speech Recognition", Proc. EUROSPEECH-91 (1991)
[Nagai-92a] A. Nagai et al., "Hardware Implementation of Realtime 1000-word HMM-LR Continuous Speech Recognition", Proc. ICASSP-92, 1511-1514 (1992)
[Nagai-92b] A. Nagai, J. Takami, S. Sagayama, "The SSS-LR Continuous Speech Recognition System: Integrating SSS-Driven Allophone Models and a Phoneme-Context Dependent LR Parser", Proc. ICSLP-92, 1511-1514 (1992)
[Nagai-93] A. Nagai, Y. Yamaguchi, S. Sagayama, A. Kurematsu, "ATREUS: A Comparative Study of Continuous Speech Recognition System at ATR", Proc. ICASSP-93, 139-142 (1993)
[Nagao-84] M. Nagao, "Framework of a Machine Translation between Japanese and English by Analogy Principle", in Artificial and Human Intelligence, eds. A. Elithorn and R. Banerji, Amsterdam: North-Holland, 173-180 (1984)
[Nagata-90] M. Nagata, K. Kogure, "HPSG-Based Lattice Parser for Spoken Japanese in a Spoken Language Translation System", Proc. ECAI-90, 461-466, (1990)
[Nagata-92] M. Nagata, "An Empirical Study on Rule Granularity and Unification Interleaving Toward an Efficient Unification-Based Parsing System", Proc. COLING-92, 177-183 (1992)
[Nagata-93] M. Nagata, T. Morimoto, "A Unification-Based Japanese Parser for Speech-to-Speech Translation", IEICE Trans. Inf. & Syst., Vol. E76-D, No. 1, (1993)
[Nakamura-89] S. Nakamura, K. Shikano, "Spectrogram Normalization Using Fuzzy Vector Quantization", Jour. of Acoust. Soc. of Japan, Vol. 45, No. 2, 107-114, (1989)
[Okumura-91] K. Okumura, K. Muraki, S. Akamine, "Multi-lingual Sentence Generation from the PIVOT Interlingua", Proc. Machine Translation Summit, 67-71, (1991)
[Ohkura-92] K. Ohkura, M. Sugiyama, S. Sagayama, "Speaker Adaptation Based on Transfer Vector Field Smoothing with Continuous Mixture Density HMMs", Proc. ICSLP-92, 369-372, (1992)
[Pollard-87] C. Pollard, "An Information-Based Syntax and Semantics-Volume 1: Fundamentals", CSLI Lecture Note Number 13, CSLI, (1987)
[Rabiner-93] L. Rabiner, B. Juang, Fundamentals of Speech Recognition (New York: Prentice Hall) (1993)
[Rayner-93] M. Rayner, et al., "Spoken Language Translation with Mid-90's Technology: A Case Study", Proc. ICSLP-93, pp. 1299-1302 (1993)
[Roe-91] D. Roe, F. Pereira, R. Sproat, M. Riley, "Toward a Spoken Language Translator for Restricted-domain Context-free Languages", Proc. of EUROSPEECH-91, pp. 1063-1066, (1991)
[Sagayama-93] S. Sagayama, J. Takami, A. Nagai, H. Singer, K. Yamaguchi, K. Ohkura, K. Kita, A. Kurematsu, "ATREUS: a Speech Recognition Frontend for a Speech Translation System", Proc. EUROSPEECH-93, 1287-1290 (1993)
[Sagisaka-88] Y. Sagisaka, "Speech Synthesis by Rule Using an Optimal Selection of Non-Uniform Synthesis Units", Proc. ICASSP-88, 679-682, (1988)
[Sagisaka-89] Y. Sagisaka, "On the Unit Set Design for Speech Synthesis by Rule Using Non-Uniform Units", Jour. Acoust. Soc. Am., Suppl. Vol. 86, S79, FF24, Fall (1989)
[Sagisaka-90] Y. Sagisaka, "On the Prediction of Global F0 Shape for Japanese Text-to-Speech", Proc. ICASSP-90, (1990)
[Sagisaka-92] Y. Sagisaka, N. Kaiki, N. Iwahashi, K. Mimura, "ATR ν-Talk Speech Synthesis System", Proc. ICSLP-92, (1992)
[Sato-92] H. Sato, "Text-to-Speech Synthesis of Japanese", Speech Science and Technology (Ed. by Shuzo Saito), 158-178 (1992)
[Seneff-89] S. Seneff, "TINA: A Probabilistic Syntactic Parser for Speech Understanding Systems", Proc. ICASSP-89, 711-714, (1989)
[Shiever-89] S. Shieber et al., "Semantic-Head-Driven Generation", Computational Linguistics, Vol. 16, No. 1, 30-42, (1990)
[Shikano-88] K. Shikano, "Towards the overcome of individual variations on speech", ATR Journal No. 3, pp. 10-13, (1988)
[Stentiford-87] M. Stentiford, et al., "A Speech Driven Language Translation System", Proc. EUROSPEECH-87, pp. 418-421, (1987)
[Sumita-91] E. Sumita, H. Iida, "Experiments and Prospects of Example-Based Machine Translation", Proc. 29th Annual Meeting ACL, 185-192, (1991)
[Suzuki-92] M. Suzuki, "A Method of Utilizing Domain and Language Specific Constraints in Dialogue Translation", Proc. COLING-92, 756-762, (1992)
[Takami-92a] J. Takami, S. Sagayama, "A Successive State Splitting Algorithm for Efficient Allophone Modeling", Proc. ICASSP-92, pp. 1573-1576, (1992)
[Takami-92b] J. Takami, A. Nagai, S. Sagayama, "Speaker Adaptation of the SSS (Successive State Splitting)-based Hidden Markov Network for Continuous Speech Recognition", Proc. SST-92, pp. 437-442, (1992)
[Takeda-92] K. Takeda, K. Abe, Y. Sagisaka, "On the Basic Scheme and Algorithms in Non-Uniform Unit Speech Synthesis", Talking Machines, 93-105 (North-Holland, Elsevier Science Publishers) (1992)
[Tomabechi-92] H. Tomabechi, "Quasi-Destructive Graph Unification with Structure-Sharing", Proc. ACL-92, 440-446, (1992)
[Tomita-86] M. Tomita, Generalized LR Parsing. Kluwer Academic Publishers, (1991)
[Wahlster-93] W. Wahlster, "Verbmobil: Translation of Face-To-Face Dialogues", Proceedings of the Fourth Machine Translation Summit, pp. 127-135, (1993)
[Waibel-91] A. Waibel, et al., "JANUS: A Speech-to-Speech Translation System Using Connectionist and Symbolic Processing Strategies", ICASSP-91, pp. 793-796, (1991)
[Yamaoka-91] T. Yamaoka, H. Iida, "Dialogue Interpretation Model and Its Application to Next Utterance Prediction for Spoken Language Processing", Proc. EUROSPEECH-91, 849-852, (1991)
[Yato-93] F. Yato, T. Morimoto, Y. Yamazaki, A. Kurematsu, "Important Issue for Automatic Interpreting Telephone Technologies", Proc. of Int. Symp. on Spoken Dialogue 1993, pp. 235-238, (1993)
[Yoshida-89] K. Yoshida, T. Watanabe, S. Koga, "Large Vocabulary Word Recognition Based on Demi-Syllable Hidden Markov Model Using Same Amount of Training Data", Proc. ICASSP-89, pp. 1-4, (1989)
[Yoshimoto-88] K. Yoshimoto, "Identifying Zero Pronouns in Japanese Dialogues", Proc. COLING-88, (1988)
Index
activation-marker-passing 6
accent 12, 72
acoustic pattern 9, 16
algorithm 4, 9, 26
  active chart parsing 53
  beam search 25
  forward-backward 29
  generation 62
  HMM-LR 25
  Levinson-Durbin 14
  N-best 7
  trellis 29
  Viterbi 29
  VFS 38
allophone 16, 87
ambiguities 4, 11
amplitude 79
analysis 13
anaphoric 1
annotation 48
ASURA 7, 86, 88-90
ATR 5, 6
ATR ν-Talk 74
automatic speech translation 1, 2
auto-correlation 14
Baum-Welch 18
Bayes' rule 23
bilingual text 69
British Telecom 5
bundle search 95
Carnegie Mellon University (CMU) 6
cepstrum 14
clusters 15
code 15
codebook 81
codebook mapping 32
code vector 36
codeword 36
colloquial utterance 100
communication act 66
communication plan 66
communicative goal 56
computing 9
concepts (source language) 56
conceptual representation 96
consonant 12
contextual knowledge 56
contextual processing 68
context-dependent 7, 16
continuous speech recognition 10
  large vocabulary 22, 24
control 100
convert 9
Core Language Engine (CLE) 8
cross-language voice conversion 84
database 9
demi-syllable 16, 95
devocalization 12
dialogue
  interpretation 66
  plan 65
dialogue translation method 45
dialogue utterance 42
discourse structure 43, 54
discrete representation 15
distortion 77
DMDIALOG 7
DMTRANS 6
domain 4
domain specific 6, 8
domain plan 65
duration control 30
dynamic time warping (DTW) 34, 82
ellipses 65
elliptic 4
enhancement 13
entropy 77
erroneous spoken input 53
example-based language 69
EBMT (example-based machine translation) 69
feature parameter 16
feature structures 45, 57, 59-60, 62, 64
feature rewriting process 55
finite state network grammar 95
frame 13
French 5
fricative 11-12
fundamental frequency 79
fuzzy
  mapping 32
  membership function 32
  vector quantization (VQ) 30, 32
Gaussian distribution 30
Gaussian output 18, 20-21
German 7, 8
goal oriented dialogue 44
grammar
  ambiguity in 27
  case-frame based semantic 46
  context free (CFG) 24-26, 30
  head-driven phrase structure 48-49, 54
  inter-phrase 31
  intra-phrase 32
  Japanese phrase structure 49
  lexical-syntactic 46
  stochastic 46
  syntactic 46
  unification based 46, 50
  unification based spoken Japanese 53
Hamming window 13
HEAD feature 52
Hidden Markov Model (HMM) 8, 17
  discrete 18
  continuous-mixture 18
  continuous-mixture density 37
  perplexity 24
  phoneme models 24, 28
  network 19-21
  network (speaker adaptation of) 38
histogram 82
homonyms 12
honorific expressions 42
honorific relationships 68
human-to-human communication 2
idioms 60
illocutionary force type (IFT) 44, 67
indirect expressions 43
individuality control 80
intention 42
intention translation method 88
international joint experiment 91
INTERTALKER 7, 91
interpolation 36
JANUS 7
k-nearest neighbor 30
knowledge-based approach 6
knowledge representation 61
labelling 15
language generation 59
language translation 42
  problems in 42
lexical dictionaries 46
lexical entries 54
liaison 12
linear predictive coding 14
linguistic knowledge 59
log-power 18
LPC
  analysis 14
  cepstrum 14-15
  delta 18
LR Parsing
  generalized 6, 7, 24, 27
  table 26
  two-level 31-32
  stack operation 28
LVQ algorithm 7
machine translation 5, 70
mapping 6
mapping codebook 81
Markov 8
  source 77
  state 17
massively parallel 6
maximum likelihood values 20
memory access 6
meta-dialogue 5
model(s)
  acoustic 23
  language 23
  phoneme 24
  probabilistic 22
modelling 9-10
monolingual 4
mora 78
morphology 46
multiple codebook 18, 30
multiple operations 27
mutual beliefs 68
nasals 12
nearest-neighbor labelling 16
NEC 5, 7
negation 42
network 7
noise 11
non-terminal symbols 48
non-uniform speech 73
oriented knowledge 65
parser (feature structure) 51
parsing 46
particles 43
  case 43
path equations 49, 51
pattern recognition 10-11
performance 89, 90
performance score 40
perplexity 23-25
phoneme 7, 10, 12, 16
  perplexity 24
  parsing trees 29
phonetic 11
phonetic tagging 76
phrase book 5
phrase description 62
pitch 12
PIVOT 96
plan recognition 65
plosive 11-12
posteriori probability 23
pragmatic
  constraints 68
  information 54
  knowledge 67
presupposition 54
processing 9-10
propositional contents 67
prosody 73, 100
Quasi Logical 8
recognition 1
  results 30
reference speakers 37
regression model 79
rewriting environment 58
rewriting rule 26
Saito-Itakura algorithm 14
sampling 13
segmental distortion 77
segmental duration 78
semantic(s) 10
  case frame 53
  constraints 50
  distance 69
  feature 44, 62
  parallelism constraint 53
  transfer 44
semi vowel 12
sentence (Japanese) 57
shared goals 68
signal 9
similarities 83
simulated annealing 78
simulated telephone 42
SL-TRANS 6, 7
SLT 8
smoothing 36
source 56
speaker adaptive/adaptation 31-32
speaker dependent 31
speaker mixture 38
speaker-tied mixture weight training 39
spectral difference 75
spectrum analysis 13
spectrum mapping 34, 80
spectrum normalization 35
speech
  modelling 9
  segment network (SSN) 78
  synthesis 71
  translation 2, 6, 99
  unit(s) 16, 75
speech recognition 9
  speaker-independent 38
spoken dialogue 98
spoken language 4
spontaneously spoken speech 98, 101
SSS-LR 87
stack operations 27
SUBCAT feature 52
sub-phrases 43
successive state splitting 20
supervised speaker adaptation 35
Swedish 8
syllable 12, 16
syntax 1, 10, 22
  constraints 50
  errors 6
synthesis 1
synthesis unit entry (SUE) 74
table 26
  action table 26
  go to table 26
target 56
target language 4
task-oriented 4
task-oriented dialogue 68
TDMT 70
Telecom-83 5
telephone 5
template 15
Toeplitz matrix 14
thesaurus 69
Tomita 6, 7
Toshiba 6
transfer
  knowledge 70
  process 55
  rules 58
  vector 36
tree(s) 60
turn taking 43
units 16
Unix Talk 6
utterance 67
utterance generation 58-59
utterance transfer 55
variance/variations 11, 17
  context-dependent 19
vector 15
VEST 8
Viterbi alignment 30
vector field smoothing (VFS) 35, 88
vector quantization (VQ) 15, 30, 32 (see also fuzzy VQ)
verbs
  auxiliary 43-44
voice conversion 80
vowel 12, 16
waveform 11
window 13
word co-occurrence 101
Yule-Walker equation 14
zero pronoun 42, 50
  heuristics 54
  resolution of 53
Volume 1 (Manufacturing Engineering)
AUTOMOBILE ELECTRONICS by Shoichi Washino
Volume 2 (Electronics)
MMIC: MONOLITHIC MICROWAVE INTEGRATED CIRCUITS by Yasuo Mitsui
Volume 3 (Biotechnology)
PRODUCTION OF NUCLEOTIDES AND NUCLEOSIDES BY FERMENTATION by Sadao Teshiba and Akira Furuya
Volume 4 (Electronics)
BULK CRYSTAL GROWTH TECHNOLOGY by Shin-ichi Akai, Keiichiro Fujita, Masamichi Yokogawa, Mikio Morioka and Kazuhisa Matsumoto
Volume 5 (Biotechnology)
RECENT PROGRESS IN MICROBIAL PRODUCTION OF AMINO ACIDS by Hitoshi Enei, Kenzo Yokozeki and Kunihiko Akashi
Volume 6 (Manufacturing Engineering)
STEEL INDUSTRY I: MANUFACTURING SYSTEM by Tadao Kawaguchi and Kenji Sugiyama
Volume 7 (Manufacturing Engineering)
STEEL INDUSTRY II: CONTROL SYSTEM by Tadao Kawaguchi and Takatsugu Ueyama
Volume 8 (Electronics)
SEMICONDUCTOR HETEROSTRUCTURE DEVICES by Masayuki Abe and Naoki Yokoyama
Volume 9 (Manufacturing Engineering)
NETWORKING IN JAPANESE FACTORY AUTOMATION by Koichi Kishimoto, Hiroshi Tanaka, Yoshio Sashida and Yasuhisa Shiobara
Volume 10 (Computers and Communications)
MACHINE VISION: A PRACTICAL TECHNOLOGY FOR ADVANCED IMAGE PROCESSING by Masakazu Ejiri
Volume 11 (Electronics)
DEVELOPMENT OF OPTICAL FIBERS IN JAPAN by Hiroshi Murata
Volume 12 (Electronics)
HIGH-PERFORMANCE BiCMOS TECHNOLOGY AND ITS APPLICATIONS TO VLSIs by Ikuro Masuda and Hideo Maejima
Volume 13 (Electronics)
SEMICONDUCTOR DEVICES FOR ELECTRONIC TUNERS by Seiichi Watanabe
Volume 14 (Biotechnology)
RECENT ADVANCES IN JAPANESE BREWING TECHNOLOGY by Takashi Inoue, Jun-ichi Tanaka and Shunsuke Mitsui
Volume 15 (Computers and Communications)
CRYPTOGRAPHY AND SECURITY edited by Shigeo Tsujii
Volume 16 (Computers and Communications)
VLSI NEURAL NETWORK SYSTEMS by Yuzo Hirai
Volume 17 (Biotechnology)
ANTIBIOTICS I: β-LACTAMS AND OTHER ANTIMICROBIAL AGENTS by Isao Kawamoto and Masao Miyauchi
Volume 18 (Computers and Communications)
IMAGE PROCESSING: PROCESSORS AND APPLICATIONS TO RADARS AND FINGERPRINTS by Shin-ichi Hanaki
Volume 19 (Electronics)
AMORPHOUS SILICON SOLAR CELLS by Yukinori Kuwano
Volume 20 (Electronics)
HIGH DENSITY MAGNETIC RECORDING FOR HOME VTR: HEADS AND MEDIA by Kazunori Ozawa
Volume 21 (Biotechnology)
ANTIBIOTICS II: ANTIBIOTICS BY FERMENTATION by Sadao Teshiba, Mamoru Hasegawa, Takashi Suzuki, Yoshiharu Tsubota, Hidehi Takebe, Hideo Tanaka, Mitsuyasu Okabe and Rokuro Okamoto
Volume 22 (Biotechnology)
OLIGOSACCHARIDES: PRODUCTION, PROPERTIES, AND APPLICATIONS edited by Teruo Nakakuki
Volume 23 (Computers and Communications)
JAPANESE TELECOMMUNICATION NETWORK by Kimio Tazaki and Jun-ichi Mizusawa
Volume 24 (Biotechnology)
ADVANCES IN POLYMERIC SYSTEMS FOR DRUG DELIVERY by Teruo Okano, Nobuhiko Yui, Masayuki Yokoyama and Ryo Yoshida
Volume 25 (Biotechnology)
ON-LINE SENSORS FOR FOOD PROCESSING edited by Isao Karube
Volume 26 (Biotechnology)
HMG-CoA REDUCTASE INHIBITORS AS POTENT CHOLESTEROL LOWERING DRUGS by Yoshio Tsujita, Nobufusa Serizawa, Masahiko Hosobuchi, Toru Komai and Akira Terahara
Volume 27 (Manufacturing Engineering)
VISUAL OPTICS OF HEAD-UP DISPLAYS (HUDs) IN AUTOMOTIVE APPLICATIONS edited by Shigeru Okabayashi
Volume 28 (Computers and Communications)
AUTOMATIC SPEECH TRANSLATION: FUNDAMENTAL TECHNOLOGY FOR FUTURE CROSS-LANGUAGE COMMUNICATIONS by Akira Kurematsu and Tsuyoshi Morimoto
Volume 29 (Electronics)
TFT/LCD: LIQUID CRYSTAL DISPLAYS ADDRESSED BY THIN-FILM TRANSISTORS by Toshihisa Tsukada