Sixth International Cognitive Vision Workshop (ICVW 2011)
Workshop Date: September 26, 2011
Location: San Francisco, CA, USA
Organizers:
- Barbara Caputo, Idiap Research Institute, Martigny, Switzerland
- Fiora Pirri, ALCOR LAB, Dipartimento di Informatica e Sistemistica, University of Rome La Sapienza, Rome, Italy
- Michael Zillich, Automation and Control Institute, Vienna University of Technology, Wien, Austria
Program
(9:00-9:10) Michael Zillich
Welcome
Session I: Categorization
(9:10-9:55) Mario Fritz
Towards recognition in the real world
(9:55-10:40) Michael Zillich
Situated 3D Vision in Robotic Systems
Coffee Break (10:40-11:10)
Session II: Attention
(11:10-11:55) Matthias Scheutz
Investigating Joint Attention Processes in Natural Language Human-Robot Interactions
(11:55-12:40) Laurent Itti
Biologically-inspired attention and scene understanding algorithms for mobile robots
(12:40-12:50) Concluding remarks
Motivation
Computer vision is gaining importance in the fields of artificial cognitive systems and robotics, due to the
progress achieved in recent years in object localization, categorization and scene analysis, as well as its low
cost and versatility. From robot localization to manipulation, the integration of state-of-the-art vision algorithms
into robotic systems is a success story. Still, research in the two fields remains largely separate. Vision has
traditionally been studied using a reductionist approach. This trend is even more pronounced now, as computer
vision moves towards internet vision, i.e. computer vision methods to browse, classify and understand images
found on the Web. We argue that issues such as multi-cue integration, embodied categorization and situated
attention should be studied in the context of systems. The goals of this workshop are to document the progress
of the relatively young field of cognitive computer vision and systems, to bring together researchers working
in or interested in this field, and to give them a platform to discuss the most recent advances and the
research challenges that are timely to address today.
Topics
The meeting will focus in particular on two main issues:
categorization and attention, and how they play out for a situated agent or for a web agent dealing with the
millions of images available on the Web.
- Categorization:
The capability to categorize objects on the basis of their visual appearance is one of the
crucial cognitive abilities that enable humans to understand the outside world and interact with it. Providing an
autonomous robot with the same capability is a major scientific challenge. The computer vision community has
achieved impressive results in this field recently, but these results are not easily exploited by the robotics
community. The current mainstream approach in computer vision performs categorization from collections of
static images, typically acquired on the Web. A very recent trend focuses on the ability to categorize 10,000
object categories, representing objects on the basis of only one type of visual feature. New methods are
needed to enable abstractions and effective categorization, taking into account the 3D structure of object
categories, their associated affordances and how embodiment, context and task affect modeling and learning
for an autonomous agent.
- Attention:
Some 30 years ago, attention was made necessary by the hard constraints of the limited computational
resources computer vision had to deal with. Nowadays, in the era of large databases, machine
learning and brute-force solutions, powerful statistical classifiers may yield impressive single-task performance.
The drawback is that they do not lead to representations suitable for reasoning and acting in the real world, as
required by an autonomous cognitive agent. Further, they ignore the fact that an autonomous agent's world is
dynamic, unpredictable, and not describable by tens of thousands of objects. Even the databases on which these
systems are tested need to be reconsidered to ensure that they are statistically valid for the tasks at hand
and are not biased. Still, results are impressive and only promise to become more so as computing power
increases and costs decline. Yet the goal for vision systems is really the same as it was 30 years ago. We
want systems to be robust to the whole variability in the visual world, to the way we view the world and to the
knowledge we have of the world. We want them to be flexible rather than single-task systems, and we want them to
do more than image classification: visual reasoning and problem solving remain important unsolved
problems. We want them to deal properly with the unexpected or with a scene not previously viewed. Attention is
that capacity that helps optimize the search processes inherent in perception, cognition and action, thus
reducing the computational load of an agent. The spectrum of attentive behavior is broad and goes far beyond
the simple region-of-interest functions most common in today's systems. When embodied in an agent, attention
controls active sensing and action and promises to enable the kind of flexible and robust systems that have
been the goal of computer vision since its earliest days.