The dataset contains video recognition/tracking annotations and semantic role annotations of the text.
In this study, a static 3D bedroom scene is shown to the user. The system verbally asks the user a list of questions about the bedroom, one at a time, and the user answers each question by speaking to the system.
In this domain, the user’s task is to find treasures hidden in a 3D castle. The user can walk around inside the castle and move objects. To find the treasures, the user needs to consult a remote “expert” (i.e., an artificial agent). The expert has some knowledge about the treasures but cannot see the castle, so the user must talk to the expert to get advice on finding them.
In this domain, users were asked to accomplish tasks in two scenarios. Each scenario put the user into a specific role (e.g., college student, professor, etc.), and the task had to be completed under a set of constraints (e.g., a furnishing budget, bed size, number of domestic products, etc.). Users interacted with the system through a touchscreen using speech and deictic gestures.
The data used in the EMNLP 2010 paper can be downloaded here.
Our annotated data is available here.