David Traum, Kallirroi Georgila, Ron Artstein, and Anton Leuski. Evaluating spoken dialogue processing for time-offset interaction. To appear in the Proceedings of the SIGDIAL 2015 Conference. Prague, Czech Republic, September 2015.
Abstract: This paper presents the first evaluation of a fully automated prototype system for time-offset interaction, that is, conversation between a live person and recordings of someone who is not temporally co-present. Speech recognition reaches word error rates as low as 5% with general-purpose language models and 19% with domain-specific models, and language understanding can identify appropriate direct responses to 60–66% of user utterances while keeping errors to 10–16% (the remainder being indirect or off-topic responses). This is sufficient to enable a natural flow and relatively open-ended conversations, with a collection of under 2000 recorded statements.
Ron Artstein, Anton Leuski, Heather Maio, Tomer Mor-Barak, Carla Gordon, and David Traum. How many utterances are needed to support time-offset interaction? In Proceedings of the Twenty-Eighth International Florida Artificial Intelligence Research Society Conference, pages 144–149. Hollywood, Florida, May 2015.
Abstract: Time-offset interaction is a new technology that enables conversational interaction with a person who is not present, using pre-recorded video statements. Statements were recorded by Pinchas Gutter, a Holocaust survivor, talking about his personal experiences before, during and after the Holocaust. Participants interacted with the statements through a “Wizard of Oz” system, where live operators select an appropriate reaction to each utterance in real time; unanswered questions were analyzed to identify gaps, and additional statements were recorded to fill the gaps. Even though participant questions were completely unconstrained, the recorded statements from the first round directly addressed at least 58% of the questions; this number rises to 95% with the second round of recording, when tested on newly elicited utterances. This demonstrates that it is feasible for a system to address unseen questions and sustain short conversations when the topic is well defined. The statements have been put into an automated system using existing language understanding technology, to create a preliminary working system of time-offset interaction, allowing a live conversation with a real human who is not present for the conversation in real time.
Simon S. Woo, Jelena Mirkovic, Ron Artstein, and Elsi Kaiser. Life-experience passwords (LEPs). In Who are you?! Adventures in Authentication: WAY Workshop. Menlo Park, California, July 2014.
Abstract: User-supplied textual passwords are extensively used today for user authentication. However, these passwords have serious deficiencies in the way they interact with humans’ natural ability to form memories. Strong passwords that are hard to crack are also often hard for humans to remember, while memorable passwords are easily brute-forced or guessed. We propose a novel password design – life-experience passwords (LEPs). We explain how to use users’ existing episodic memories about defining life events to create memorable and hard-to-guess passwords and discuss challenges involved in the design and use of LEPs.
Jonathan Gratch, Ron Artstein, Gale Lucas, Giota Stratou, Stefan Scherer, Angela Nazarian, Rachel Wood, Jill Boberg, David DeVault, Stacy Marsella, David Traum, Skip Rizzo, and Louis-Philippe Morency. The Distress Analysis Interview Corpus of Human and Computer Interviews. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014), pages 3123–3128. Reykjavik, Iceland, May 2014.
Abstract: The Distress Analysis Interview Corpus (DAIC) contains clinical interviews designed to support the diagnosis of psychological distress conditions such as anxiety, depression, and post-traumatic stress disorder. The interviews are conducted by humans, human controlled agents and autonomous agents, and the participants include both distressed and non-distressed individuals. Data collected include audio and video recordings and extensive questionnaire responses; parts of the corpus have been transcribed and annotated for a variety of verbal and non-verbal features. The corpus has been used to support the creation of an automated interviewer agent, and for research on the automatic identification of psychological distress.
David DeVault, Ron Artstein, Grace Benn, Teresa Dey, Ed Fast, Alesia Gainer, Kallirroi Georgila, Jon Gratch, Arno Hartholt, Margaux Lhommet, Gale Lucas, Stacy Marsella, Fabrizio Morbini, Angela Nazarian, Stefan Scherer, Giota Stratou, Apar Suri, David Traum, Rachel Wood, Yuyu Xu, Albert Rizzo, and Louis-Philippe Morency. SimSensei Kiosk: A virtual human interviewer for healthcare decision support. In Proceedings of the 13th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2014), pages 1061–1068. Paris, France, May 2014.
Abstract: We present SimSensei Kiosk, an implemented virtual human interviewer designed to create an engaging face-to-face interaction where the user feels comfortable talking and sharing information. SimSensei Kiosk is also designed to create interactional situations favorable to the automatic assessment of distress indicators, defined as verbal and nonverbal behaviors correlated with depression, anxiety or post-traumatic stress disorder (PTSD). In this paper, we summarize the design methodology, performed over the past two years, which is based on three main development cycles: (1) analysis of face-to-face human interactions to identify potential distress indicators, dialogue policies and virtual human gestures, (2) development and analysis of a Wizard-of-Oz prototype system in which two human operators decided the spoken and gestural responses, and (3) development of a fully automatic virtual interviewer able to engage users in 15–25 minute interactions. We show the potential of our fully automatic virtual human interviewer in a user study, and situate its performance in relation to the Wizard-of-Oz prototype.
Ron Artstein, David Traum, Oleg Alexander, Anton Leuski, Andrew Jones, Kallirroi Georgila, Paul Debevec, William Swartout, Heather Maio, and Stephen Smith. Time-offset interaction with a Holocaust survivor. In IUI '14: Proceedings of the 19th international conference on Intelligent User Interfaces, pages 163–168, Haifa, Israel, February 2014.
Abstract: Time-offset interaction is a new technology that allows for two-way communication with a person who is not available for conversation in real time: a large set of statements is prepared in advance, and users access these statements through natural conversation that mimics face-to-face interaction. Conversational reactions to user questions are retrieved through a statistical classifier, using technology that is similar to previous interactive systems with synthetic characters; however, all of the retrieved utterances are genuine statements by a real person. Recordings of answers, listening and idle behaviors, and blending techniques are used to create a persistent visual image of the person throughout the interaction. A proof-of-concept has been implemented using the likeness of Pinchas Gutter, a Holocaust survivor, enabling short conversations about his family, his religious views, and resistance. This proof-of-concept has been shown to dozens of people, from school children to Holocaust scholars, with many commenting on the impact of the experience and potential for this kind of interface.
William Swartout, Ron Artstein, Eric Forbell, Susan Foutz, H. Chad Lane, Belinda Lange, Jacquelyn Morie, Dan Noren, Skip Rizzo, and David Traum. Virtual humans for learning. AI Magazine 34(4): 13–30, 2013.
Abstract: Virtual humans are computer-generated characters designed to look and behave like real people. Studies have shown that virtual humans can mimic many of the social effects that one finds in human-human interactions, such as creating rapport, and people respond to virtual humans in ways that are similar to how they respond to real people. We believe that virtual humans represent a new metaphor for interacting with computers, one in which working with a computer becomes much like interacting with a person, and this can bring social elements to the interaction that are not easily supported with conventional interfaces. We present two systems that embody these ideas. The first, the Twins, are virtual docents in the Museum of Science, Boston, designed to engage visitors and raise their awareness and knowledge of science. The second, SimCoach, uses an empathetic virtual human to provide veterans and their families with information about PTSD and depression.
Lauren Faust and Ron Artstein. People hesitate more, talk less to virtual interviewers than to human interviewers. In Semdial 2013 DialDam: Proceedings of the 17th Workshop on the Semantics and Pragmatics of Dialogue, pages 35–43, Amsterdam, December 2013.
Abstract: In a series of screening interviews for psychological distress, conducted separately by a human interviewer and by an animated virtual character controlled by a human, participants talked substantially less and produced twice as many filled pauses when talking to the virtual character. This contrasts with earlier findings, where people were less disfluent when talking to a computer dialogue system. The results suggest that the characteristics of computer-directed speech vary depending on the type of dialogue system used.
David DeVault, Kallirroi Georgila, Ron Artstein, Fabrizio Morbini, David Traum, Stefan Scherer, Albert (Skip) Rizzo and Louis-Philippe Morency. Verbal indicators of psychological distress in interactive dialogue with a virtual human. In Proceedings of the SIGDIAL 2013 Conference: the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 193–202. Metz, France, August 2013.
Abstract: We explore the presence of indicators of psychological distress in the linguistic behavior of subjects in a corpus of semi-structured virtual human interviews. At the level of aggregate dialogue-level features, we identify several significant differences between subjects with depression and PTSD when compared to non-distressed subjects. At a more fine-grained level, we show that significant differences can also be found among features that represent subject behavior during specific moments in the dialogues. Finally, we present statistical classification results that suggest the potential for automatic assessment of psychological distress in individual interactions with a virtual human dialogue system.
Fabrizio Morbini, Kartik Audhkhasi, Kenji Sagae, Ron Artstein, Doğan Can, Panayiotis Georgiou, Shri Narayanan, Anton Leuski and David Traum. Which ASR should I choose for my dialogue system? In Proceedings of the SIGDIAL 2013 Conference: the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 394–403. Metz, France, August 2013.
Abstract: We present an analysis of several publicly available automatic speech recognizers (ASRs) in terms of their suitability for use in different types of dialogue systems. We focus in particular on cloud based ASRs that recently have become available to the community. We include features of ASR systems and desiderata and requirements for different dialogue systems, taking into account the dialogue genre, type of user, and other features. We then present speech recognition results for six different dialogue systems. The most interesting result is that different ASR systems perform best on different data sets. We also show that there is an improvement over a previous generation of recognizers on some of these data sets. We also investigate natural language understanding (NLU) on the ASR output, and explore the relationship between ASR and NLU performance.
Fabrizio Morbini, Kartik Audhkhasi, Ron Artstein, Maarten Van Segbroeck, Kenji Sagae, Panayiotis Georgiou, David R. Traum, and Shri Narayanan. A reranking approach for recognition and classification of speech input in conversational dialogue systems. In Fourth IEEE Workshop on Spoken Language Technology (SLT). Miami Beach, Florida, December 2012.
Abstract: We address the challenge of interpreting spoken input in a conversational dialogue system with an approach that aims to exploit the close relationship between the tasks of speech recognition and language understanding through joint modeling of these two tasks. Instead of using a standard pipeline approach where the output of a speech recognizer is the input of a language understanding module, we merge multiple speech recognition and utterance classification hypotheses into one list to be processed by a joint reranking model. We obtain substantially improved performance in language understanding in experiments with thousands of user utterances collected from a deployed spoken dialogue system.
Sunghyun Park, Gelareh Mohammadi, Ron Artstein, and Louis-Philippe Morency. Crowdsourcing micro-level multimedia annotations: The challenges of evaluation and interface. To appear in International ACM Workshop on Crowdsourcing for Multimedia (CrowdMM). Nara, Japan, October 2012.
Abstract: This paper presents a new evaluation procedure and tool for crowdsourcing micro-level multimedia annotations and shows that such annotations can achieve a quality comparable to that of expert annotations. We propose a new evaluation procedure, called MM-Eval (Micro-level Multimedia Evaluation), which compares fine time-aligned annotations using Krippendorff’s alpha metric, and we introduce two new metrics to evaluate the types of disagreement between coders. We also introduce OCTAB (Online Crowdsourcing Tool for Annotations of Behaviors), a web-based annotation tool that allows precise and convenient multimedia behavior annotations directly from the Amazon Mechanical Turk interface. With an experiment using the above tool and evaluation procedure, we show that a majority vote among annotations from three crowdsourced workers leads to a quality comparable to that of local expert annotations.
David Traum, Priti Aggarwal, Ron Artstein, Susan Foutz, Jillian Gerten, Athanasios Katsamanis, Anton Leuski, Dan Noren, and William Swartout. Ada and Grace: Direct interaction with museum visitors. In Intelligent Virtual Agents: 12th International Conference, IVA 2012, Santa Cruz, CA, USA, September 12–14, 2012 Proceedings (Lecture Notes in Artificial Intelligence 7502), pages 245–251. Springer, Heidelberg, September 2012.
Abstract: We report on our efforts to prepare Ada and Grace, virtual guides in the Museum of Science, Boston, to interact directly with museum visitors, including children. We outline the challenges in extending the exhibit to support this usage, mostly relating to the processing of speech from a broad population, especially child speech. We also present the summative evaluation, showing success in all the intended impacts of the exhibit: that children ages 7–14 will increase their awareness of, engagement in, interest in, positive attitude about, and knowledge of computer science and technology.
Xuchen Yao, Emma Tosch, Grace Chen, Elnaz Nouri, Ron Artstein, Anton Leuski, Kenji Sagae, and David Traum. Creating conversational characters using question generation tools. Dialogue and Discourse 3(2): 125–146, 2012.
Abstract: This article describes a new tool for extracting question-answer pairs from text articles, and reports three experiments which investigate how suitable this technique is for supplying knowledge to conversational characters. Experiment 1 demonstrates the feasibility of our method by creating characters for 14 distinct topics and evaluating them using hand-authored questions. Experiment 2 evaluates three of these characters using questions collected from naive participants, showing that the generated characters provide full or partial answers to about half of the questions asked. Experiment 3 adds automatically extracted knowledge to an existing, hand-authored character, demonstrating that augmented characters can answer questions about new topics but with some degradation of the ability to answer questions about topics that the original character was trained to answer. Overall, the results show that question generation is a promising method for creating or augmenting a question answering conversational character using an existing text.
William Yang Wang, Ron Artstein, Anton Leuski, and David Traum. Improving spoken dialogue understanding using phonetic mixture models. In Cross-Disciplinary Advances in Applied Natural Language Processing: Issues and Approaches, edited by Chutima Boonthum-Denecke, Philip M. McCarthy, and Travis A. Lamkin, chapter 15, pages 225–238. IGI Global, Hershey, Pennsylvania, 2012.
Abstract: Reasoning about sound similarities improves the performance of a Natural Language Understanding component that interprets speech recognizer output: the authors observed a 5% to 7% reduction in errors when they augmented the word strings with a phonetic representation, derived from the words by means of a dictionary. The best performance comes from mixture models incorporating both word and phone features. Since the phonetic representation is derived from a dictionary, the method can be applied easily without the need for integration with a specific speech recognizer. The method has similarities with autonomous (or bottom-up) psychological models of lexical access, where contextual information is not integrated at the stage of auditory perception but rather later.
Sin-Hwa Kang, Jonathan Gratch, Candy Sidner, Ron Artstein, Lixing Huang, and Louis-Philippe Morency. Towards building a virtual counselor: Modeling nonverbal behavior during intimate self-disclosure. In Eleventh International Conference on Autonomous Agents and Multiagent Systems (AAMAS). Valencia, Spain, June 2012.
Abstract: Nonverbal behavior is considered critical for indicating intimacy and is important when designing a social virtual agent such as a counselor. One key research question is how to properly express intimate self-disclosure. In this paper we present an extensive study of human nonverbal behavior during intimate self-disclosure. This is an important milestone in creating a virtual counselor. A study of video interactions between human participants demonstrated that people displayed more head tilts and pauses when they revealed highly intimate information about themselves; they presented more head nods and eye gazes during less intimate sharing. An implementation of these behaviors in a virtual agent suggests that people tend to perceive head tilts, pauses and gaze aversion by the agent as conveying intimate self-disclosure. These findings are important for future research with virtual counselors and other social agents.
Priti Aggarwal, Ron Artstein, Jillian Gerten, Athanasios Katsamanis, Shrikanth Narayanan, Angela Nazarian, and David Traum. The Twins corpus of museum visitor questions. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012), pages 2355–2361. Istanbul, Turkey, May 2012.
Abstract: The Twins corpus is a collection of utterances spoken in interactions with two virtual characters who serve as guides at the Museum of Science in Boston. The corpus contains about 200,000 spoken utterances from museum visitors (primarily children) as well as from trained handlers who work at the museum. In addition to speech recordings, the corpus contains the outputs of speech recognition performed at the time of utterance as well as the system interpretation of the utterances. Parts of the corpus have been manually transcribed and annotated for question interpretation. The corpus has been used for improving performance of the museum characters and for a variety of research projects, such as phonetic-based Natural Language Understanding, creation of conversational characters from text resources, dialogue policy learning, and research on patterns of user interaction. It has the potential to be used for research on children’s speech and on language used when talking to a virtual human.
Elnaz Nouri, Ron Artstein, Anton Leuski and David Traum. Augmenting Conversational Characters with Generated Question-Answer Pairs. In Question Generation: Papers from the AAAI Fall Symposium, pages 49–52. Arlington, Virginia, November 2011.
Abstract: We take a conversational character trained on a set of linked question-answer pairs authored by hand, and augment its training data by adding sets of question-answer pairs which are generated automatically from texts on different topics. The augmented characters can answer questions about the new topics, at the cost of some performance loss on questions about the topics that the original character was trained to answer.
Priti Aggarwal, Kevin Feeley, Fabrizio Morbini, Ron Artstein, Anton Leuski, David Traum, and Julia Kim. Interactive characters for cultural training of small military units. In Intelligent Virtual Agents: 11th International Conference, IVA 2011, Reykjavik, Iceland, September 15–17, 2011 Proceedings (Lecture Notes in Artificial Intelligence 6895), pages 426–427. Springer, Heidelberg, 2011. (Poster)
Abstract: CHAOS, the Combat Hunter Action and Observation Simulation, is an immersive simulation training environment which gives small military units the experience of interacting with local Afghan villagers during a patrol. It is a physical build-out of a housing compound in a mock Afghan village, with several life-size reactive and interactive animated Pashto-speaking virtual characters. The exercise requires an infantry squad to locate and interview a character named Omar, communicating through a live human interpreter and attending to proper protocol regarding Omar's family. Character animation and behavior are based on extensive interviews with Afghan experts to provide a realistic setting of the intended locale. The system combines virtual human technology, story engineering, and physical set building to provide a compelling training environment that can handle a full squad, requiring trainees to integrate tasks such as working with an interpreter, dealing with non-English speakers from another culture, and assessing information and disposition to make decisions in a mission context.
Sin-Hwa Kang, Candy Sidner, Jonathan Gratch, Ron Artstein, Lixing Huang, and Louis-Philippe Morency. Modeling nonverbal behavior of a virtual counselor during intimate self-disclosure. In Intelligent Virtual Agents: 11th International Conference, IVA 2011, Reykjavik, Iceland, September 15–17, 2011 Proceedings (Lecture Notes in Artificial Intelligence 6895), pages 455–457. Springer, Heidelberg, 2011. (Poster)
Abstract: Humans often share personal information with others in order to create social connections. Sharing personal information is especially important in counseling interactions. Research studying the relationship between intimate self-disclosure and human behavior critically informs the development of virtual agents that create rapport with human interaction partners. One significant example of this application is using virtual agents as counselors in psychotherapeutic situations. The capability of expressing different intimacy levels is key for a virtual counselor to reciprocally induce disclosure in clients. Nonverbal behavior is considered critical for indicating intimacy and is important when designing a social virtual agent such as a counselor. One key research question is how to properly express intimate self-disclosure. In this study, our main goal is to find what types of interviewees’ nonverbal behavior are associated with different intimacy levels of verbal self-disclosure. Thus, we investigated humans’ nonverbal behavior associated with self-disclosure in an interview setting (with intimate topics).
Ron Artstein, Michael Rushforth, Sudeep Gandhe, David Traum and Aram Donigian. Limits of Simple Dialogue Acts for Tactical Questioning Dialogues. Proceedings of the 7th IJCAI workshop on knowledge and reasoning in practical dialogue systems, pages 1–8. Barcelona, Spain, July 2011.
Abstract: A set of dialogue acts, generated automatically by applying a dialogue act scheme to a domain representation designed for easy scenario authoring, covers approximately 72%–76% of user utterances spoken in live interaction with a tactical questioning simulation trainer. The domain is represented as facts of the form <object, attribute, value> and conversational actions of the form <character, action>. User utterances from the corpus that fall outside the scope of the scheme include questions about temporal relations, relations between facts and relations between objects, questions about reason and evidence, assertions by the user, conditional offers, attempts to set the topic of conversation, and compound utterances. These utterance types constitute the limits of the simple dialogue act scheme.
Ron Artstein. Error Return Plots. Proceedings of the SIGDIAL 2011 Conference: the 12th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 319–324. Portland, Oregon, June 2011. (Poster)
Abstract: Error-return plots show the rate of error (misunderstanding) against the rate of non-return (non-understanding) for Natural Language Processing systems. They are a useful visual tool for judging system performance when other measures such as recall/precision and detection-error tradeoff are less informative, specifically when a system is judged on the correctness of its responses, but may elect not to return a response.
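One way to make the idea concrete: a minimal sketch, assuming the system attaches a confidence score to its top response and withholds the response whenever the score falls below a threshold. Sweeping that threshold yields one (non-return rate, error rate) point per setting, with both rates taken as fractions of all inputs. The function name `error_return_points` and the thresholding scheme are illustrative assumptions, not the paper's own code.

```python
from typing import List, Tuple


def error_return_points(scores: List[float], correct: List[bool]) -> List[Tuple[float, float]]:
    """Return (non_return_rate, error_rate) pairs, one per candidate threshold.

    Hypothetical helper. Assumes each input has a top-ranked response with a
    confidence score and a flag saying whether that response is correct. At
    threshold t the response is returned only if score >= t. Both rates are
    fractions of ALL inputs, so correct + errors + non-returns sum to 1.
    """
    if len(scores) != len(correct):
        raise ValueError("scores and correct must have the same length")
    n = len(scores)
    points = []
    # Every observed score is a candidate threshold; infinity adds the
    # "never answer" end of the curve.
    thresholds = sorted(set(scores)) + [float("inf")]
    for t in thresholds:
        returned = [c for s, c in zip(scores, correct) if s >= t]
        non_return = (n - len(returned)) / n
        errors = sum(1 for c in returned if not c) / n
        points.append((non_return, errors))
    return points


pts = error_return_points([0.9, 0.8, 0.4, 0.2], [True, True, False, False])
print(pts)  # [(0.0, 0.5), (0.25, 0.25), (0.5, 0.0), (0.75, 0.0), (1.0, 0.0)]
```

Plotting these pairs with non-return on the x-axis and error on the y-axis gives the trade-off curve; a system whose curve lies closer to the origin makes fewer errors for a given willingness to stay silent.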
Kallirroi Georgila, Ron Artstein, Angela Nazarian, Michael Rushforth, David Traum, and Katia Sycara. An annotation scheme for cross-cultural argumentation and persuasion dialogues. Proceedings of the SIGDIAL 2011 Conference: the 12th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 272–278. Portland, Oregon, June 2011. (Poster)
Abstract: We present a novel annotation scheme for cross-cultural argumentation and persuasion dialogues. This scheme is an adaptation of existing coding schemes on negotiation, following a review of literature on cross-cultural differences in negotiation styles. The scheme has been refined through application to coding both two-party and multi-party negotiation dialogues in three different domains, and is general enough to be applicable to different domains with minor or no modifications at all. Dialogues annotated with the scheme have been used to successfully learn culture-specific dialogue policies for argumentation and persuasion.
William Yang Wang, Ron Artstein, Anton Leuski, and David Traum. Improving spoken dialogue understanding using phonetic mixture models. Proceedings of the Twenty-Fourth International Florida Artificial Intelligence Research Society Conference, pages 329–334. Palm Beach, Florida, May 2011.
Abstract: Augmenting word tokens with a phonetic representation, derived from a dictionary, improves the performance of a Natural Language Understanding component that interprets speech recognizer output: we observed a 5% to 7% reduction in errors across a wide range of response return rates. The best performance comes from mixture models incorporating both word and phone features. Since the phonetic representation is derived from a dictionary, the method can be applied easily without the need for integration with a specific speech recognizer. The method has similarities with autonomous (or bottom-up) psychological models of lexical access, where contextual information is not integrated at the stage of auditory perception but rather later.
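The core idea, scoring recognizer output on sound as well as spelling, can be illustrated with a minimal sketch. It assumes a tiny hand-written pronunciation dictionary (a stand-in for a full lexicon such as CMUdict), phone bigrams as the phonetic features, and a simple linear interpolation of word and phone overlap. The names `PRON`, `phones`, and `mixture_score` are hypothetical, and the papers' actual classifier and mixture weighting may differ.

```python
from collections import Counter
from typing import Dict, List

# Toy pronunciation dictionary (ARPAbet, stress marks removed). A real system
# would use a full lexicon; these entries are illustrative only.
PRON: Dict[str, List[str]] = {
    "what": ["W", "AH", "T"],
    "is": ["IH", "Z"],
    "the": ["DH", "AH"],
    "weather": ["W", "EH", "DH", "ER"],
    "whether": ["W", "EH", "DH", "ER"],
    "like": ["L", "AY", "K"],
}


def phones(words: List[str]) -> List[str]:
    """Map words to one flat phone sequence; unknown words are skipped."""
    out: List[str] = []
    for w in words:
        out.extend(PRON.get(w, []))
    return out


def bigrams(seq: List[str]) -> Counter:
    """Bag of adjacent phone pairs."""
    return Counter(zip(seq, seq[1:]))


def overlap(a: Counter, b: Counter) -> float:
    """Dice-style overlap between two bags of features, in [0, 1]."""
    total = sum(a.values()) + sum(b.values())
    return 2 * sum((a & b).values()) / total if total else 0.0


def mixture_score(query: str, candidate: str, lam: float = 0.5) -> float:
    """Interpolate word-level and phone-bigram similarity (weight lam on words)."""
    qw, cw = query.split(), candidate.split()
    word_sim = overlap(Counter(qw), Counter(cw))
    phone_sim = overlap(bigrams(phones(qw)), bigrams(phones(cw)))
    return lam * word_sim + (1 - lam) * phone_sim


# A misrecognized homophone still matches the intended question on phones.
print(mixture_score("what is the whether like", "what is the weather like"))
```

Here the word-only similarity is 0.8 because "whether" and "weather" differ as strings, while the phone bigrams match exactly, so the mixed score rises to about 0.9. Because the phones come from a dictionary rather than from the recognizer, nothing in the sketch depends on a particular ASR engine.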
Grace Chen, Emma Tosch, Ron Artstein, Anton Leuski, and David Traum. Evaluating conversational characters created through question generation. Proceedings of the Twenty-Fourth International Florida Artificial Intelligence Research Society Conference, pages 343–344. Palm Beach, Florida, May 2011. (Poster)
Abstract: Question generation tools can be used to extract a question-answer database from text articles. We investigate how suitable this technique is for giving domain-specific knowledge to conversational characters. We tested these characters by collecting questions and answers from naive participants, running the questions through the character, and comparing the system responses to the participant answers. Characters gave a full or partial answer to 53% of the user questions which had an answer available in the source text, and 43% of all questions asked. Performance was better for questions asked after the user had read the source text, and also varied by question type: the best results were answers to who questions, while answers to yes/no questions were among the poorer performers. The results show that question generation is a promising method for creating a question answering conversational character from an existing text.
Julia Campbell, Mark Core, Ron Artstein, Lindsay Armstrong, Arno Hartholt, Cyrus Wilson, Kallirroi Georgila, Fabrizio Morbini, Edward Haynes, Dave Gomboc, Mike Birch, Jonathan Bobrow, H. Chad Lane, Jillian Gerten, Anton Leuski, David Traum, Matthew Trimmer, Rich DiNinni, Matthew Bosack, Timothy Jones, Richard E. Clark, and Kenneth A. Yates. Developing INOTS to support interpersonal skills practice. 2011 IEEE Aerospace Conference, Big Sky, Montana, March 2011.
Abstract: The Immersive Naval Officer Training System (INOTS) is a blended learning environment that merges traditional classroom instruction with a mixed reality training setting. INOTS supports the instruction, practice and assessment of interpersonal communication skills. The goal of INOTS is to provide a consistent training experience to supplement interpersonal skills instruction for Naval officer candidates without sacrificing trainee throughput and instructor control. We developed an instructional design from cognitive task analysis interviews with experts to serve as a framework for system development. We also leveraged commercial student response technology and research technologies including natural language recognition, virtual humans, realistic graphics, intelligent tutoring and automated instructor support tools. In this paper, we describe our methodologies for developing a blended learning environment, and our challenges adding mixed reality and virtual human technologies to a traditional classroom to support interpersonal skills training.
Antonio Roque, Kallirroi Georgila, Ron Artstein, Kenji Sagae, and David Traum. Natural language processing for joint fire observer training. 27th Army Science Conference, Orlando, Florida, December 2010.
Abstract: We describe recent research to enhance a training system which interprets Call for Fire (CFF) radio artillery requests. The research explores the feasibility of extending the system to also understand calls for Close Air Support (CAS). This work includes automated analysis of complex language behavior in CAS missions, evaluation of speech recognition performance, and simulation of speech recognition errors.
Abstract: The purpose of this study is to get a working definition that matches people’s intuitive notion of gossip and is sufficiently precise for computational implementation. We conducted two experiments investigating what type of conversations people intuitively understand and interpret as gossip, and whether they could identify three proposed constituents of gossip conversations: third person focus, pejorative evaluation and substantiating behavior. The results show that (1) conversations are very likely to be considered gossip if all elements are present, no intimate relationships exist between the participants, and the person in focus is unambiguous. (2) Conversations that have at most one gossip element are not considered gossip. (3) Conversations that lack one or two elements or have an ambiguous element lead to inconsistent judgments.
William Swartout, David Traum, Ron Artstein, Dan Noren, Paul Debevec, Kerry Bronnenkant, Josh Williams, Anton Leuski, Shrikanth Narayanan, Diane Piepol, Chad Lane, Jacquelyn Morie, Priti Aggarwal, Matt Liewer, Jen-Yuan Chiang, Jillian Gerten, Selina Chu, and Kyle White. Ada and Grace: Toward realistic and engaging virtual museum guides. In Intelligent Virtual Agents: 10th International Conference, IVA 2010, Philadelphia, PA, USA, September 20–22, 2010 Proceedings (Lecture Notes in Artificial Intelligence 6356), pages 286–300. Springer, Heidelberg, 2010.
Abstract: To increase the interest and engagement of middle school students in science and technology, the InterFaces project has created virtual museum guides that are in use at the Museum of Science, Boston. The characters use natural language interaction and have near photoreal appearance to increase engagement. The paper presents an evaluation of natural language performance and presents reports from museum staff on visitor reaction.
Xuchen Yao, Pravin Bhutada, Kallirroi Georgila, Kenji Sagae, Ron Artstein, and David Traum. Practical evaluation of speech recognizers for virtual human dialogue systems. LREC 2010, Valletta, Malta, May 2010. (Poster)
Abstract: We perform a large-scale evaluation of multiple off-the-shelf speech recognizers across diverse domains for virtual human dialogue systems. Our evaluation is aimed at speech recognition consumers and potential consumers with limited experience with readily available recognizers. We focus on practical factors to determine what levels of performance can be expected from different available recognizers in various projects featuring different types of conversational utterances. Our results show that there is no single recognizer that outperforms all other recognizers in all domains. The performance of each recognizer may vary significantly depending on the domain, the size and perplexity of the corpus, the out-of-vocabulary rate, and whether acoustic and language model adaptation has been used or not. We expect that our evaluation will prove useful to other speech recognition consumers, especially in the dialogue community, and will shed some light on the key problem in spoken dialogue systems of selecting the most suitable available speech recognition system for a particular application, and what impact training will have.
Michael Rushforth, Sudeep Gandhe, Ron Artstein, Antonio Roque, Sarrah Ali, Nicolle Whitman, and David Traum. Varying personality in spoken dialogue with a virtual human. In Intelligent Virtual Agents: 9th International Conference, IVA 2009, Amsterdam, The Netherlands, September 14–16, 2009 Proceedings (Lecture Notes in Artificial Intelligence 5773), pages 541–542. Springer, Heidelberg, 2009. (Poster)
Abstract: This poster reports the results of two experiments to test a personality framework for virtual characters. We use the Tactical Questioning dialogue system architecture (TACQ) as a testbed for this effort. Characters built using the TACQ architecture can be used by trainees to practice their questioning skills by engaging in a role-play with a virtual human. The architecture supports advanced behavior in a questioning setting, including deceptive behavior, simple negotiations about whether to answer, tracking subdialogues for offers/threats, grounding behavior, and maintenance of the affective state of the virtual human. Trainees can use different questioning tactics in their sessions. In order for the questioning training to be effective, trainees should have experience of interacting with virtual humans with different personalities, who react in different ways to the same questioning tactics.
Sudeep Gandhe, Nicolle Whitman, David Traum and Ron Artstein. An integrated authoring tool for tactical questioning dialogue systems. In 6th Workshop on Knowledge and Reasoning in Practical Dialogue Systems, Pasadena, California, July 2009.
Abstract: We present an integrated authoring tool for rapid prototyping of dialogue systems for virtual humans taking part in tactical questioning simulations. The tool helps domain experts, who may have little or no knowledge of linguistics or computer science, to build virtual characters that can play the role of the interviewee. Working in a top-down fashion, the authoring process begins with specifying a domain of knowledge for the character; the authoring tool generates all relevant dialogue acts and allows authors to assign the language that will be used to refer to the domain elements. The authoring tool can also be used to manipulate some aspects of the dialogue strategies employed by the virtual characters, and it also supports re-using some of the authored content across different characters.
Ron Artstein, Sudeep Gandhe, Michael Rushforth and David Traum. Viability of a Simple Dialogue Act Scheme for a Tactical Questioning Dialogue System. In DiaHolmia 2009: Proceedings of the 13th Workshop on the Semantics and Pragmatics of Dialogue, pages 43–50. Stockholm, Sweden, June 2009.
Abstract: User utterances in a spoken dialogue system for tactical questioning simulation were matched to a set of dialogue acts generated automatically from a representation of facts as <object, attribute, value> triples and actions as <character, action> pairs. The representation currently covers about 50% of user utterances, and we show that a few extensions can increase coverage to 80% or more. This demonstrates the viability of simple schemes for representing question-answering dialogues in implemented systems.
Ron Artstein, Sudeep Gandhe, Jillian Gerten, Anton Leuski and David Traum. Semi-formal evaluation of conversational characters. In Languages: From Formal to Natural. Essays Dedicated to Nissim Francez on the Occasion of His 65th Birthday (Lecture Notes in Computer Science 5533), edited by Orna Grumberg, Michael Kaminski, Shmuel Katz and Shuly Wintner, pages 22–35. Springer, Heidelberg, 2009.
Abstract: Conversational dialogue systems cannot be evaluated in a fully formal manner, because dialogue is heavily dependent on context and current dialogue theory is not precise enough to specify a target output ahead of time. Instead, we evaluate dialogue systems in a semi-formal manner, using human judges to rate the coherence of a conversational character and correlating these judgments with measures extracted from within the system. We present a series of three evaluations of a single conversational character over the course of a year, demonstrating how this kind of evaluation helps bring about an improvement in overall dialogue coherence.
Ron Artstein, Jacob Cannon, Sudeep Gandhe, Jillian Gerten, Joe Henderer, Anton Leuski and David Traum. Coherence of off-topic responses for a virtual character. 26th Army Science Conference, Orlando, Florida, December 2008.
Abstract: We demonstrate three classes of off-topic responses which allow a virtual question-answering character to handle cases where it does not understand the user’s input: ask for clarification, indicate misunderstanding, and move on with the conversation. While falling short of full dialogue management, a combination of such responses together with prompts to change the topic can improve overall dialogue coherence.
Sudeep Gandhe, David DeVault, Antonio Roque, Bilyana Martinovski, Ron Artstein, Anton Leuski, Jillian Gerten, and David Traum. From domain specification to virtual humans: An integrated approach to authoring tactical questioning characters. Interspeech 2008, Brisbane, Australia, September 2008.
Abstract: We present a new approach for rapidly developing dialogue capabilities for virtual humans. Starting from domain specification, an integrated authoring interface automatically generates dialogue acts with all possible contents. These dialogue acts are linked to example utterances in order to provide training data for natural language understanding and generation. The virtual human dialogue system contains a dialogue manager following the information-state approach, using finite-state machines and SCXML to manage local coherence, as well as explicit modeling of emotions and compliance level and a grounding component based on evidence of understanding. Using the authoring tools, we design and implement a version of the virtual human Hassan and compare to previous architectures for the character.
David DeVault, David Traum and Ron Artstein. Making grammar-based generation easier to deploy in dialogue systems. Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue, pages 198–207. Columbus, Ohio, June 2008.
Abstract: We present a development pipeline and associated algorithms designed to make grammar-based generation easier to deploy in implemented dialogue systems. Our approach realizes a practical trade-off between the capabilities of a system’s generation component and the authoring and maintenance burdens imposed on the generation content author for a deployed system. To evaluate our approach, we performed a human rating study with system builders who work on a common large-scale spoken dialogue system. Our results demonstrate the viability of our approach and illustrate authoring/performance trade-offs between hand-authored text, our grammar-based approach, and a competing shallow statistical NLG technique.
David DeVault, David Traum and Ron Artstein. Practical grammar-based NLG from examples. Proceedings of the Fifth International Natural Language Generation Conference, pages 77–85. Salt Fork, Ohio, June 2008.
Abstract: We present a technique that opens up grammar-based generation to a wider range of practical applications by dramatically reducing the development costs and linguistic expertise that are required. Our method infers the grammatical resources needed for generation from a set of declarative examples that link surface expressions directly to the application’s available semantic representations. The same examples further serve to optimize a run-time search strategy that generates the best output that can be found within an application-specific time frame. Our method offers substantially lower development costs than hand-crafted grammars for application-specific NLG, while maintaining high output quality and diversity.
Abstract: Arrau is a new corpus annotated for anaphoric relations, with information about agreement and explicit representation of multiple antecedents for ambiguous anaphoric expressions and discourse antecedents for expressions which refer to abstract entities such as events, actions and plans. The corpus contains texts from different genres: task-oriented dialogues from the Trains-91 and Trains-93 corpus, narratives from the English Pear Stories corpus, newspaper articles from the Wall Street Journal portion of the Penn Treebank, and mixed text from the Gnome corpus.
Ron Artstein, Sudeep Gandhe, Anton Leuski and David Traum. Field testing of an interactive question-answering character. Proceedings of the ELRA workshop on evaluation, pages 36–40. Marrakech, Morocco, May 2008.
Abstract: We tested a life-size embodied question-answering character at a convention where he responded to questions from the audience. The character’s responses were then rated for coherence. The ratings, combined with speech transcripts, speech recognition results and the character’s responses, allowed us to identify where the character needs to improve, namely in speech recognition and providing off-topic responses.
Ron Artstein and Massimo Poesio. Identifying reference to abstract objects in dialogue. brandial 2006 proceedings, Potsdam, Germany, September 2006.
Abstract: In two experiments, many annotators marked antecedents for discourse deixis as unconstrained regions of text. The experiments show that annotators do converge on the identity of these text regions, though much of what they do can be captured by a simple model. Demonstrative pronouns are more likely than definite descriptions to be marked with discourse antecedents. We suggest that our methodology is suitable for the systematic study of discourse deixis.
Massimo Poesio, Patrick Sturt, Ron Artstein, and Ruth Filik. Underspecification and Anaphora: Theoretical Issues and Preliminary Evidence. Discourse Processes 42(2): 157-175, 2006.
Distributed as Technical report CSM-438, University of Essex Department of Computer Science, October 2005.
Abstract: Much experimental work in psycholinguistics suggests that fully specified syntactic and semantic interpretations are obtained incrementally. The finding that interpretation takes place incrementally is very robust and underlies our own view of sentence processing as well; however, most of this work tests very simple interpretive judgments using materials with very clean-cut interpretations, which makes this view more questionable when applied to semantic interpretation. This article discusses a class of anaphoric expressions that do not appear to have a clear antecedent, using both corpus analysis and psychological experiments. We argue that these cases of anaphora are similar to cases of lexical polysemy, and propose an explicit semantic representation for such cases.
Ron Artstein and Massimo Poesio. Inter-coder agreement for computational linguistics (survey article). Computational Linguistics 34(4): 555-596, 2008.
Abstract: This article is a survey of methods for measuring agreement among corpus annotators. It exposes the mathematics and underlying assumptions of agreement coefficients, covering Krippendorff’s alpha as well as Scott’s pi and Cohen’s kappa; discusses the use of coefficients in several annotation tasks; and argues that weighted, alpha-like coefficients, traditionally less used than kappa-like measures in Computational Linguistics, may be more appropriate for many corpus annotation tasks – but that their use makes the interpretation of the value of the coefficient even harder.
Ron Artstein and Nissim Francez. Plurality and temporal modification. Linguistics and Philosophy 29(3): 251-276, 2006.
Abstract: A semantics with plural entities and plural times accounts for cumulative relations between plural arguments and temporal expressions. The semantics equips nominal, verbal and sentential meanings with temporal context variables and treats temporal modifiers as temporal generalized quantifiers; cumulative conjunction, however, takes place at types lower than generalized quantifiers. The mediation of temporal context variables allows cumulative relations to percolate between an argument in a main clause and one in a temporal clause, in apparent violation of locality restrictions. Plural times form a semilattice structure imposed on the set of intervals; no interaction is observed between this and the internal temporal structure of intervals.
Ron Artstein and Massimo Poesio. Bias decreases in proportion to the number of annotators. In Gerhard Jaeger, Paola Monachesi, Gerald Penn, James Rogers, and Shuly Wintner (eds.), Proceedings of FG-MoL 2005, pages 141-150. Edinburgh, August 2005.
Abstract: The effect of the individual biases of corpus annotators on the value of reliability coefficients is inversely proportional to the number of annotators (less one). As the number of annotators increases, the effect of their individual preferences becomes more similar to random noise. This suggests using multiple annotators as a means to control individual biases.
Abstract: We report the results of a preliminary study attempting to identify ambiguous expressions in spoken language dialogues. In this study we developed methods for marking explicit ambiguity, and generalized previous proposals by Passonneau concerning a distance metric for anaphora to be used with the α coefficient to allow for ambiguous annotations.
Massimo Poesio and Ron Artstein. The reliability of anaphoric annotation, reconsidered: Taking ambiguity into account. Proceedings of the Workshop on Frontiers in Corpus Annotation II: Pie in the Sky, pages 76-83. Ann Arbor, June 2005.
Abstract: We report the results of a study of the reliability of anaphoric annotation which (i) involved a substantial number of naive subjects, (ii) used Krippendorff’s α instead of κ to measure agreement, as recently proposed by Passonneau, and (iii) allowed annotators to mark anaphoric expressions as ambiguous.
Abstract: Quantificational arguments can take scope outside of temporal adjunct clauses, in an apparent violation of locality restrictions: the sentence “Few secretaries cried after each executive resigned” allows the quantificational NP “each executive” to take scope above “few secretaries”. I show how this scope relation is the result of local operations: the adjunct clause is a temporal generalized quantifier, which takes scope over the main clause (Pratt and Francez 2001), and within the adjunct clause, the quantificational argument takes scope above the implicit determiner which forms the temporal generalized quantifier. The paper explores various relations among quantificational arguments across clause boundaries, including temporal clauses that are modified internally by a temporal adverbial and temporal clauses with embedded sentential complements.
Abstract: Coordination of parts of words, as in “ortho and periodontists”, has to be interpreted at the level of the word parts, because this NP can felicitously describe a pair consisting of one orthodontist and one periodontist. This paper develops a theory of denotations for arbitrary word parts, in which the coordinate word parts denote their own sound, and the rest of the word is a function from sounds to word meanings. This yields the correct interpretation for number in coordinate constructions. The paper also explores phonological constraints on coordinate structures, and shows how certain ungrammatical structures that can be interpreted by the semantics are ruled out on phonological grounds.
Abstract: Intonational focus can be observed on parts of words that appear to lack intrinsic meaning, and triggers alternatives that are similar in form. In order to provide a unified treatment of focus above and below the word level (they do, after all, behave the same in most respects), I develop a theory of denotations for arbitrary word parts in which focused word parts denote their own sound and the unfocused parts are functions from sounds to word meanings. This allows focus theories to generalize below the word level; any differences with focus above the word level are located in the semantics of word parts. The paper also explores phonological constraints on focus placement, and shows that the focusability of a word part depends solely on its prosodic status, not on any semantic factors.
Abstract: Echo questions are interpreted through focus semantics. Echo questions must be entailed by previous discourse; focus is therefore not needed to mark givenness, and instead it is used to compute the question denotation: the questioned element, marked with a pitch accent, is a focus constituent, and the alternative set of the echo question is its question denotation, i.e. the set of possible answers. The focus strategy exempts echo questions from locality restrictions (“islands”), allows echo questions on parts of words, and allows second-order echo questions which denote sets of questions.
Ron Artstein. Person, animacy and null subjects. In Tina Cambier-Langeveld, Anikó Lipták, Michael Redford and Erik Jan van der Torre (eds.), Proceedings of Console VII, pp. 1-15. SOLE, Leiden, 1999.
Abstract: Licensing of null subjects can be contingent on person and animacy specification. For example, Hebrew allows null subjects if they are first or second person, but not if they are third person. This follows from a general typology that is based on the universal person/animacy hierarchy: if a subject of a certain person or animacy specification may be null, then every subject higher on the hierarchy may be null as well. The above typology, in turn, follows from the general way abstract hierarchies interact in the grammar: elements that appear on the high end of one hierarchy and the low end of another give rise to marked configurations. The mechanism of alignment in Optimality Theory gives a formalization of these universal properties of hierarchies.
Ron Artstein. The incompatibility of underspecification and markedness in Optimality Theory. In Ron Artstein and Madeline Holler (eds.), RuLing Papers 1: Working Papers from Rutgers University, pp. 7-13. Rutgers University Department of Linguistics, New Brunswick, NJ, 1998.
Abstract: Underspecification in the underlying representation cannot give rise to marked structure on the surface, because Optimality Theory grammars force an output to be equally or less marked than the input. Underspecification can still account for alternations involving unmarked structure, but it is only useful when such alternations exist along with forms that do not alternate. The evidence for the existence of such grammatical systems is not very convincing, casting doubt on the usefulness of underspecification in general.
Ron Artstein. Group events as means for representing collectivity. In Benjamin Bruening (ed.), MITWPL 31: Proceedings of the Eighth Student Conference in Linguistics, pp. 41-51. MIT Working Papers in Linguistics, Cambridge, MA, 1997.
Abstract: In this paper I argue in favor of the introduction of “group” events into a framework of event semantics; these mirror the “group” individuals introduced by Landman (1989), and give the domain of events a structure similar to that of the domain of individuals. Group events are used in order to capture collectivity effects that cannot be represented through the domain of individuals, as in the case of predicate conjunction. An attempt to extend the notion of group events and to use them for counting with adverbials such as “three times” proves at the very least troublesome.