Language and Interaction Research [LAIR]

Robust Multimodal Interpretation in Conversation Systems

2/1/2004 - 2010
Supported by National Science Foundation (Career Award)

Multimodal systems allow users to interact with computers through multiple modalities such as speech, gesture, and gaze. These systems are designed to support transparent, efficient, and natural means of human-computer interaction. Understanding what the user intends to communicate is one of the most significant challenges for multimodal systems. Despite recent progress in multimodal interpretation, these systems tend to fail when they encounter unexpected inputs (e.g., inputs that are outside of system knowledge) or unreliable inputs (e.g., inputs that are not correctly recognized). Variations in vocabulary and multimodal synchronization patterns, disfluencies in speech utterances, and ambiguities in gestures can seriously impair interpretation performance. This project seeks to improve the robustness of multimodal interpretation by adapting the system's interpretation capability over time through automated knowledge acquisition and by optimizing interpretation through probabilistic reasoning.
(Picture: Ph.D. student Zahar Prasov interacts with a system using speech and gesture)

Selected Papers:

Context-based Word Acquisition for Situated Dialogue in a Virtual World. S. Qu and J. Y. Chai. Journal of Artificial Intelligence Research, Volume 37, pp. 347-377, March 2010.
The Role of Interactivity in Human Machine Conversation for Automated Word Acquisition. S. Qu and J. Y. Chai. The 10th Annual SIGDIAL Meeting on Discourse and Dialogue, London, UK, September, 2009.
Incorporating Temporal and Semantic Information with Eye Gaze for Automatic Word Acquisition in Multimodal Conversational Systems. S. Qu and J. Chai. Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (EMNLP). Honolulu, October 2008.
Beyond Attention: The Role of Deictic Gesture in Intention Recognition in Multimodal Conversational Interfaces.
S. Qu and J. Chai.
ACM 12th International Conference on Intelligent User Interfaces (IUI).
Canary Islands, Jan 13-17, 2008.
What's in a Gaze? The Role of Eye-Gaze in Reference Resolution in Multimodal Conversational Interfaces.
Z. Prasov and J. Y. Chai.
ACM 12th International Conference on Intelligent User Interfaces (IUI).
Canary Islands, Jan 13-17, 2008.
Automated Vocabulary Acquisition and Interpretation in Multimodal Conversational Systems.
Y. Liu, J. Y. Chai, and R. Jin.
The 45th Annual Meeting of the Association for Computational Linguistics (ACL).
Prague, Czech Republic, June 23-30, 2007.

An Exploration of Eye Gaze in Spoken Language Processing for Multimodal Conversational Interfaces.
S. Qu and J. Y. Chai.
2007 Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-07).
Rochester, NY, April 2007.

Salience Modeling based on Non-verbal Modalities for Spoken Language Understanding.
S. Qu and J. Chai.
ACM 8th International Conference on Multimodal Interfaces (ICMI), pp. 193-200.
Banff, Canada, November 2-4, 2006.

Cognitive Principles in Robust Multimodal Interpretation.
J. Chai, Z. Prasov, and S. Qu.
Journal of Artificial Intelligence Research, Vol 27, pp. 55-83, 2006.

Linguistic Theories in Efficient Multimodal Reference Resolution: an Empirical Investigation.
J. Chai, Z. Prasov, J. Blaim, and R. Jin.
The 10th International Conference on Intelligent User Interfaces (IUI-05), pp. 43-50.
ACM, San Diego, CA, January 9-12, 2005.

Optimization in Multimodal Interpretation.
J. Chai, P. Hong, M. Zhou, and Z. Prasov.
The 42nd Annual Conference of the Association for Computational Linguistics (ACL), pp. 1-8.
Barcelona, Spain. July 22-24, 2004.

Available Data:

Interior Decoration Domain (Gesture)

Interior Decoration Domain (Gaze)

Treasure Hunting Domain