
always acquires new knowledge from its experience and can handle future situations 
better. In this way, the system becomes increasingly robust to the unknown 
sentences it encounters in real tests.  
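The idea of a system that keeps learning as newly annotated sentences arrive can be illustrated with a minimal sketch. Purely as an assumption for illustration, it uses a multinomial naive Bayes classifier over bag-of-words features that maps a sentence to a semantic frame label. The class name `IncrementalNB`, the frame labels, and the example sentences are all hypothetical and do not represent the classifier used in this paper. Because naive Bayes is trained by counting, each new sentence can be folded in by updating counts rather than retraining from scratch.

```python
import math
from collections import Counter, defaultdict


class IncrementalNB:
    """Hypothetical multinomial naive Bayes that absorbs labelled sentences one at a time."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # Laplace smoothing constant
        self.label_counts = Counter()             # number of sentences seen per label
        self.word_counts = defaultdict(Counter)   # word frequencies per label
        self.total_words = Counter()              # total word tokens per label
        self.vocab = set()                        # every word seen in training

    def update(self, sentence, label):
        """Fold one annotated sentence into the model by updating its counts."""
        words = sentence.lower().split()
        self.label_counts[label] += 1
        self.word_counts[label].update(words)
        self.total_words[label] += len(words)
        self.vocab.update(words)

    def predict(self, sentence):
        """Return the label with the highest smoothed log posterior."""
        words = [w for w in sentence.lower().split() if w in self.vocab]  # skip unseen words
        n_sentences = sum(self.label_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / n_sentences)           # log prior
            denom = self.total_words[label] + self.alpha * v
            for w in words:                                 # log likelihood of each word
                score += math.log((self.word_counts[label][w] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    model = IncrementalNB()
    # Initial, automatically generated training data (hypothetical frames).
    model.update("where is the clinic", "ask_location")
    model.update("where is the market", "ask_location")
    model.update("who is the doctor", "ask_person")
    print(model.predict("who is the mayor"))                 # ask_person
    # A newly annotated sentence arrives; fold it in without retraining.
    model.update("can you take me to the doctor", "request_escort")
    print(model.predict("can you take me to the market"))    # request_escort
```

Any incrementally updatable learner would serve the same role; naive Bayes is chosen here only because its update step is a few counter increments.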
6   Conclusions 
The lack of well-annotated data remains one of the biggest problems for most 
training-based dialogue systems. 
In this paper, we explore an evolutionary language processing approach to building a 
natural language understanding system for the dialogue systems of a virtual human 
training project. The initial training data are generated automatically with a finite 
state machine. The language understanding component is first trained on these 
automatically generated data and is then improved as more data come in, as the 
experimental results confirm.  
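As a rough illustration of how a finite state machine can produce automatically annotated training data, the sketch below enumerates every path through a small hand-written FSM and pairs each generated sentence with the semantic frame attached to its final state. The states, vocabulary, frame labels, and the `max_len` bound are hypothetical assumptions; the paper's actual FSM and annotation scheme are not reproduced here.

```python
from typing import Dict, List, Tuple

# Hypothetical FSM: state -> list of (word emitted, next state).
TRANSITIONS: Dict[str, List[Tuple[str, str]]] = {
    "start": [("where", "q_loc"), ("who", "q_who")],
    "q_loc": [("is", "loc_is")],
    "loc_is": [("the", "loc_the")],
    "loc_the": [("clinic", "loc_end"), ("market", "loc_end")],
    "q_who": [("is", "who_is")],
    "who_is": [("the", "who_the")],
    "who_the": [("doctor", "who_end"), ("captain", "who_end")],
}
# Accepting states and the (hypothetical) frame label each one assigns.
FINAL_FRAMES: Dict[str, str] = {"loc_end": "ask_location", "who_end": "ask_person"}


def generate(state: str = "start", prefix: Tuple[str, ...] = (),
             max_len: int = 12) -> List[Tuple[str, str]]:
    """Depth-first enumeration of (sentence, frame) pairs accepted by the FSM.

    max_len bounds sentence length so that the search also terminates on cyclic FSMs.
    """
    pairs: List[Tuple[str, str]] = []
    if state in FINAL_FRAMES:
        pairs.append((" ".join(prefix), FINAL_FRAMES[state]))
    if len(prefix) >= max_len:
        return pairs
    for word, nxt in TRANSITIONS.get(state, []):
        pairs.extend(generate(nxt, prefix + (word,), max_len))
    return pairs


if __name__ == "__main__":
    for sentence, frame in generate():
        print(f"{frame}\t{sentence}")
```

Exhaustive enumeration yields one sentence per path, so a branch with many word alternatives produces many more examples than a sparse one. This is one way the label distribution of FSM-generated data can become unbalanced.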
The quality and composition of the training set affect the system's ability to process 
sentences. How to build a balanced training set with a single finite state machine 
remains one of the important problems for our future work. Ongoing research also 
includes improving the pruning approaches and finding new ways to integrate semantic 
knowledge into our classifier. 