Idiap Speaker Series


Archives

Tue, 27 Sep 2016
11:00:00
Prof. Mark Gales
from Cambridge University
Talk place: Idiap Research Institute

Deep Learning for Speech Processing: An NST Perspective

Abstract:
The Natural Speech Technology EPSRC Programme Grant was a 5-year collaboration between Edinburgh, Cambridge and Sheffield Universities, with the aim of improving core speech recognition and synthesis technology. During the lifetime of the project, dramatic changes took place in the underlying technology for speech processing with the introduction of deep learning. This has yielded significant performance improvements, as well as offering a very rich space of models to investigate. This talk discusses the general area of deep learning for speech processing, with a particular emphasis on sequence-to-sequence models: in speech recognition, waveform to text; and in synthesis, text to waveform. Both generative and discriminative sequence-to-sequence models are described, along with variants on the standard topologies and the implications for both training and inference. Rather than focusing on results for particular models, the talk aims to describe the connections and differences between sequence-to-sequence models and the underlying assumptions behind these models.

Short Biography:
Mark Gales studied for the B.A. in Electrical and Information Sciences at the University of Cambridge from 1985-88. Following graduation he worked as a consultant at Roke Manor Research Ltd. In 1991 he took up a position as a Research Associate in the Speech Vision and Robotics group in the Engineering Department at Cambridge University. In 1995 he completed his doctoral thesis, Model-Based Techniques for Robust Speech Recognition, supervised by Professor Steve Young. From 1995-1997 he was a Research Fellow at Emmanuel College Cambridge. He was then a Research Staff Member in the Speech group at the IBM T. J. Watson Research Center until 1999, when he returned to Cambridge University Engineering Department as a University Lecturer. He was appointed Reader in Information Engineering in 2004. He is currently a Professor of Information Engineering and a College Lecturer and Official Fellow of Emmanuel College. Mark Gales is a Fellow of the IEEE, a Senior Area Editor of IEEE/ACM Transactions on Audio, Speech and Language Processing for speech recognition and synthesis, and a member of the Speech and Language Processing Technical Committee (2015-2017, previously a member from 2001-2004). He was an associate editor for IEEE Signal Processing Letters from 2008-2011 and IEEE Transactions on Audio, Speech and Language Processing from 2009-2013. He is currently on the Editorial Board of Computer Speech and Language.
Mark Gales has been awarded a number of paper awards, including a 1997 IEEE Young Author Paper Award for his paper on Parallel Model Combination and a 2002 IEEE Paper Award for his paper on Semi-Tied Covariance Matrices.
               
 
Fri, 15 Jul 2016
11:00:00
Dr. Freek Stulp
from the German Aerospace Center (DLR) in Oberpfaffenhofen, Germany
Talk place: Idiap Research Institute

TALK - Robot Skill Learning: From Reinforcement Learning to Evolution Strategies

Abstract:
A popular approach to robot skill learning is to initialize a skill through imitation learning, and to then refine and improve the skill through reinforcement learning. In this presentation, I highlight three contributions to this approach:
1) Enabling skills to adapt to task variations by using multiple demonstrations for imitation learning;
2) Improving skills through reinforcement learning based on reward-weighted averaging and black-box optimization with evolution strategies;
3) Using covariance matrix adaptation to automatically tune exploration during reinforcement learning (a minimal sketch of this kind of update appears below).
Throughout the presentation I show several applications to challenging manipulation tasks on several humanoid robots.
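For readers unfamiliar with this family of methods, here is a minimal Python/NumPy sketch of a reward-weighted-averaging update with covariance matrix adaptation, in the spirit of points 2) and 3) above. The toy task, reward function and all parameter values are illustrative assumptions, not details from the talk:

import numpy as np

def reward_weighted_update(theta_mean, cov, rollout_reward, n_samples=20, h=10.0):
    """One policy-improvement step: sample perturbed parameters, weight the samples
    by (exponentiated) reward, and update both the mean and the exploration covariance."""
    # Exploration: sample candidate parameter vectors around the current mean.
    samples = np.random.multivariate_normal(theta_mean, cov, size=n_samples)
    rewards = np.array([rollout_reward(th) for th in samples])

    # Reward-weighted averaging: map rewards to [0, 1], exponentiate so that
    # better rollouts dominate, and normalise the weights.
    r = (rewards - rewards.min()) / (rewards.max() - rewards.min() + 1e-12)
    weights = np.exp(h * r)
    weights /= weights.sum()

    # New mean: reward-weighted average of the sampled parameters.
    new_mean = weights @ samples

    # Covariance matrix adaptation: reward-weighted spread of the samples around the
    # old mean, which grows or shrinks exploration per direction automatically.
    diffs = samples - theta_mean
    new_cov = (weights[:, None, None] * (diffs[:, :, None] * diffs[:, None, :])).sum(axis=0)
    new_cov += 1e-6 * np.eye(len(theta_mean))   # keep a small exploration floor
    return new_mean, new_cov

# Illustrative usage on a toy 5-dimensional "skill": reward is highest at theta = 1.
goal = np.ones(5)
reward = lambda th: -np.sum((th - goal) ** 2)
mean, cov = np.zeros(5), 0.5 * np.eye(5)
for _ in range(50):
    mean, cov = reward_weighted_update(mean, cov, reward)
print(np.round(mean, 2))   # should end up close to [1. 1. 1. 1. 1.]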


BIO

http://freekstulp.net/#Bio

Dr. Freek Stulp's research focuses on using machine learning and artificial intelligence to improve the robustness and adaptivity of planning and control for autonomous robots. One of his main research themes is enabling robots to autonomously acquire and refine skills through imitation and reinforcement learning. He received his doctorate degree in Computer Science from the Technische Universität München in 2007. He was awarded post-doctoral research fellowships from the
Japanese Society for the Promotion of Science and the German Research Foundation (DFG), to pursue his research at the Advanced Telecommunications Research Institute International (Kyoto) and the University of Southern California (Los Angeles). From 2011 to 2015 he
was an assistant professor at the École Nationale Supérieure de Techniques Avancées (ENSTA-ParisTech). Since March 2016 he has been the head of the new department of cognitive robotics at the German Aerospace Center (DLR) in Oberpfaffenhofen, Germany.
               
 
Fri, 15 Jul 2016
15:00:00
Dr. Freek Stulp
from the German Aerospace Center (DLR) in Oberpfaffenhofen, Germany
Talk place: Idiap Research Institute

TUTORIAL - Tutorial on Regression

Abstract:
Tutorial on Regression based on the article: 
Freek Stulp and Olivier Sigaud (2015). Many Regression Algorithms, One Unified Model - A Review. Neural Networks, 69:60-79.
Link: http://freekstulp.net/publications/pdfs/stulp15many.pdf
               
 
Thu, 7 Jul 2016
14:00:00
Harry Witchel* & Carina Westling#
from *Discipline Leader in Physiology, Brighton and Sussex Medical School --- #School of Media, Film and Music
Talk place: Idiap Research Institute

Eliciting and recognising complex emotions and mental states including engagement and boredom

Abstract:
Complex emotions are any emotional states other than Ekman’s six basic emotions: happiness, sadness, fear, anger, surprise and disgust. Complex emotions can include mixtures of the basic emotions (e.g. horror), emotions outside the basic emotions (e.g. musical “tension”), and emotions mixed with mental states that are not emotions (e.g. engagement and boredom). Eliciting and recognising complex emotions, and allowing systems to respond to them, will be useful for eLearning, human factors (including vigilance), and responsive systems including human-robot interaction.

In this talk we will present our work towards the elicitation and recognition of conscious or subconscious responses. Engineering and psychological solutions to non-invasively determine such mental states and complex emotions may use movement, posture, facial expression, physiology, and sound.  Furthermore, our team has shown that what people suppress is as revealing as what they do. We consider aspects of music listening, movie watching, game playing, quiz-taking, reading, and walking to untangle the complex emotions that can arise.  The mental states of engagement and boredom are considered in relation to fidgeting and to Non-Instrumental Movement Inhibition (NIMI), in order to clarify fundamental research problems and direct research design toward improved solutions.

SPEAKER BIOGRAPHIES

In 2016 Harry Witchel and Carina Westling published their ninth inter-disciplinary paper together, on Non-Instrumental Movement Inhibition.  It received significant international media attention, including an article about it in Scientific American.

Harry Witchel is Discipline Leader in Physiology at Brighton and Sussex Medical School at the University of Sussex.  His research interests are: Nonverbal Behaviour; Motion Capture; Gait in Multiple Sclerosis; Soundscape; Engagement; Psychobiology.  His laboratory uses wearable sensors, motion capture and time series analysis to determine the cognitive and behavioural correlates of engagement and disengagement in response to different psychologically relevant stimuli, especially music. He has performed experiments for many consultancy clients, including Honda, Nike, DHL and Tesco.  He also has an international track record of promoting public engagement with science including appearances on the Discovery Channel, BBC World Service Radio, and the Financial Times.  In 2004 he was awarded the national honour of the Charles Darwin Award lecture by the British Science Association.  In 2011 his book on music was published: “You Are What You Hear: How Music and Territory Change Who We Are” (Algora, New York).

Carina Westling researches live and mediated interaction design, and worked as a researching designer with Punchdrunk theatre company 2011-2014. She is the Creative Director of the Nimbus Group, who produce digital arts projects, including Giddy (2016), The Nimbus (2014), and 0-1 (2012). She is a contributing author to Digital Make-Believe, which was published in May 2016 (Springer, Berlin). Her research interests include interface design, interactive system narratives, audience research, spatial sound design, and nonverbal behaviour.
               
 
Wed, 22 Jun 2016
11:00:00
Asst Prof Gregoire Mariethoz
from University of Lausanne, Institute of Earth Surface Dynamics
Talk place: Idiap Research Institute

Training models with images: algorithms and applications

Abstract:
Multiple-point geostatistics (MPS) has received a lot of attention in the last decade for modeling complex spatial patterns. The underlying principle consists in representing spatial variability using training images. A common conception is that a training image can be seen as a prior for the desired spatial variability. As a result, a variety of algorithmic tools have been developed to generate stochastic realizations of spatial processes based on what can be seen broadly as texture generation algorithms.

While the initial applications of MPS were dedicated to the characterization of 3D subsurface structures and the study of geological/hydrogeological reservoirs, a new trend is to use MPS for the modeling of earth surface processes. In this domain, the availability of remote sensing data as a basis for constructing training images offers new possibilities for representing complexity with such non-parametric, data-driven approaches. Repeated satellite observations or climate model outputs, available at a daily frequency for periods of several years, provide the pattern repetition required to obtain robust statistics on high-order patterns that vary in both space and time.

This presentation will delineate recent results in this direction, including MPS applications to the stochastic downscaling of climate models, the completion of partially informed satellite images, the removal of noise in remote sensing data, and modeling of complex spatio-temporal phenomena such as precipitation.
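To make the idea of using a training image as a prior more concrete, the following heavily simplified Python sketch generates a stochastic realization by scanning a training image for patterns that match the already-simulated neighborhood of each cell, in the spirit of direct-sampling-style MPS algorithms. Grid sizes, neighborhood size, scan budget and threshold are illustrative assumptions, not parameters of the methods presented in the talk:

import numpy as np

rng = np.random.default_rng(0)

def mps_simulate(training_image, out_shape, n_neighbors=4, n_scan=200, threshold=0.1):
    """Fill a grid by multiple-point simulation: visit cells in random order and, for
    each one, copy the value of a training-image location whose neighborhood best
    matches the cell's already-simulated neighbors (its "data event")."""
    ti = training_image
    sim = np.full(out_shape, np.nan)
    cells = [(i, j) for i in range(out_shape[0]) for j in range(out_shape[1])]
    rng.shuffle(cells)
    for (i, j) in cells:
        known = np.argwhere(~np.isnan(sim))
        if len(known) == 0:
            # First cell: draw a value at random from the training image.
            sim[i, j] = ti[rng.integers(ti.shape[0]), rng.integers(ti.shape[1])]
            continue
        # Data event: the nearest already-simulated cells and their values.
        order = np.argsort(np.abs(known - [i, j]).sum(axis=1))[:n_neighbors]
        offsets = known[order] - [i, j]
        values = sim[known[order][:, 0], known[order][:, 1]]
        # Scan random training-image locations; keep the best-matching pattern,
        # stopping early once the mismatch drops below the acceptance threshold.
        best_val, best_err = values.mean(), np.inf
        for _ in range(n_scan):
            x, y = rng.integers(ti.shape[0]), rng.integers(ti.shape[1])
            xs, ys = x + offsets[:, 0], y + offsets[:, 1]
            if (xs < 0).any() or (ys < 0).any() or (xs >= ti.shape[0]).any() or (ys >= ti.shape[1]).any():
                continue
            err = np.mean(ti[xs, ys] != values)
            if err < best_err:
                best_val, best_err = ti[x, y], err
            if best_err <= threshold:
                break
        sim[i, j] = best_val
    return sim

# Toy usage: a random binary image stands in for a real training image
# (in practice this would be e.g. a categorical map derived from remote sensing).
ti = (rng.random((60, 60)) < 0.3).astype(float)
realization = mps_simulate(ti, (30, 30))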

Biography:
Grégoire Mariethoz was born in Neuchâtel (Switzerland) in 1978. He received an M.S. degree (2003), a MAS degree (2006) and a Ph.D. degree (2009) in hydrogeology from the University of Neuchâtel.

In 2009-2010 he worked as a postdoctoral researcher at Stanford University, then between 2010 and 2014 he was a Senior Lecturer at UNSW Australia. Since 2014 he has been an assistant professor at the University of Lausanne, Switzerland. His interests include the development of spatial statistics algorithms and their application in hydrology, hydrogeology and remote sensing.
               
 
Thu, 12 May 2016
10:30:00
Prof. Steve Renals
from University of Edinburgh, UK
Talk place: Idiap Research Institute

Adaptation of Neural Network Acoustic Models

Abstract:
Neural networks can learn invariances through many layers of non-linear transformations. Explicit adaptation to speaker or acoustic characteristics can further improve accuracy. A good adaptation technique should: (1) have a compact representation, allowing the speaker-dependent parameters to be estimated from small amounts of adaptation data and minimising storage requirements; (2) operate in an unsupervised fashion, without requiring labelled adaptation data; and (3) allow for both test-only adaptation and speaker-adaptive training.

In this talk I'll discuss some approaches to the adaptation of neural network acoustic models - for both speech recognition and speech synthesis - with a focus on some approaches that we have explored in the "Natural Speech Technology" programme: factorised i-vectors, LDA domain codes, learning hidden unit contributions (LHUC), and differentiable pooling.
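As a concrete illustration of one of these techniques, the following minimal NumPy sketch shows an LHUC-style layer: a small vector of per-hidden-unit amplitudes is re-estimated for each speaker while the speaker-independent weights stay frozen. Layer sizes, the ReLU non-linearity and the adaptation step are illustrative assumptions, not the NST implementation:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LHUCLayer:
    """A hidden layer whose per-unit amplitudes can be re-learned per speaker.

    The speaker-independent parameters (W, b) stay frozen at adaptation time; only the
    small LHUC vector 'a' (one scalar per hidden unit) is estimated from the speaker's
    adaptation data, which keeps the speaker-dependent footprint compact."""
    def __init__(self, n_in, n_hidden, rng):
        self.W = rng.normal(0.0, 0.1, size=(n_in, n_hidden))
        self.b = np.zeros(n_hidden)
        self.a = np.zeros(n_hidden)   # LHUC parameters, one per hidden unit

    def forward(self, x):
        h = np.maximum(0.0, x @ self.W + self.b)   # frozen speaker-independent layer (ReLU)
        r = 2.0 * sigmoid(self.a)                  # amplitudes constrained to (0, 2)
        return h * r                               # element-wise re-scaling

    def adapt(self, x, grad_out, lr=0.1):
        """One gradient step on 'a' only, given the gradient w.r.t. the layer output."""
        h = np.maximum(0.0, x @ self.W + self.b)
        r = 2.0 * sigmoid(self.a)
        # d(output)/d(a) = h * 2*sigmoid(a)*(1 - sigmoid(a)) = h * r * (1 - r/2)
        grad_a = np.sum(grad_out * h * r * (1.0 - r / 2.0), axis=0)
        self.a -= lr * grad_a

rng = np.random.default_rng(0)
layer = LHUCLayer(n_in=40, n_hidden=64, rng=rng)
x = rng.normal(size=(8, 40))                 # a small batch of adaptation frames
out = layer.forward(x)                       # unadapted output (all amplitudes = 1)
layer.adapt(x, grad_out=np.ones_like(out))   # placeholder gradient from the task loss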

------------------------------------------

Short Biography

Steve Renals is professor of Speech Technology and director of the Institute for Language, Cognition, and Communication in the School of Informatics, at the University of Edinburgh. Previously, he was director of the Centre for Speech Technology Research (CSTR).

He received a BSc in Chemistry from the University of Sheffield in 1986, an MSc in Artificial Intelligence from the University of Edinburgh in 1987, and a PhD in Speech Recognition and Neural Networks, also from Edinburgh, in 1990. From 1991-92 he was a postdoctoral fellow at the International Computer Science Institute (ICSI), Berkeley, and was then an EPSRC postdoctoral fellow in Information Engineering at the University of Cambridge (1992-94). From 1994-2003 he was lecturer, then reader, in Computer Science at the University of Sheffield, moving to Edinburgh in 2003.

He has over 200 publications in speech and language processing, and has led several large projects in the field, including EPSRC Programme Grant Natural Speech Technology and the AMI and AMIDA Integrated Projects. He is a senior area editor of the IEEE/ACM Transactions on Audio, Speech, and Language Processing and a member of the ISCA Advisory Council. He is a fellow of the IEEE, and a member of ISCA and of the ACM.
               
 
Mon, 21 Mar 2016
14:00:00
Prof. Réjean Plamondon
from Laboratoire Scribens, Département de Génie Électrique École Polytechnique de Montréal
Talk place: Idiap Research Institute

The Lognormality Principle

Abstract:
The Kinematic Theory of rapid human movements and its family of lognormal models provide analytical representations of pen tip strokes, often considered as the basic unit of handwriting. This paradigm has not only been experimentally confirmed in numerous predictive and physiologically significant tests, but has also been shown to be the ideal mathematical description of the impulse response of a neuromuscular system. This proof has led to the postulation of the LOGNORMALITY PRINCIPLE. In its simplest form, this fundamental premise states that the lognormality of the neuromuscular impulse responses is the result of an asymptotic convergence, a basic global feature reflecting the behaviour of individuals who are in perfect control of their movements. As a corollary, motor control learning in young children can be interpreted as a migration toward lognormality. For the larger part of their lives, healthy human adults take advantage of lognormality to control their movements. Finally, as aging and health issues intensify, a progressive departure from lognormality occurs. To illustrate this principle, we present various software tools and psychophysical tests used to investigate the fine motor control of subjects, with respect to these ideal lognormal behaviors, from childhood to old age. In this latter case, we focus particularly on investigations dealing with strokes, Parkinson's disease and Alzheimer's disease. We also show how lognormality can be exploited in many pattern recognition applications for the automatic generation of gestures, signatures, words and script-independent patterns, as well as CAPTCHA production, graffiti generation, anthropomorphic robot control and even speech modelling. Among other things, this lecture aims at elaborating a theoretical background for many handwriting applications, as well as providing some basic knowledge that could be integrated or taken into account in the development of new automatic pattern recognition systems to be used for e-Learning, e-Security and e-Health.
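For reference, the lognormal building block used by the Kinematic Theory to describe the speed profile of a single stroke can be written as follows (standard Sigma-Lognormal notation; quoted from the general literature rather than from this abstract):

% Speed profile of a single neuromuscular stroke (the lognormal building block):
% D is the stroke amplitude, t_0 the time occurrence of the motor command, and
% mu and sigma the log time delay and log response time of the neuromuscular system.
\[
  \lvert \vec{v}(t) \rvert
    = \frac{D}{\sigma \sqrt{2\pi}\,(t - t_0)}
      \exp\!\left( - \frac{\bigl( \ln(t - t_0) - \mu \bigr)^2}{2\sigma^2} \right),
  \qquad t > t_0 .
\]
% A complex movement is then modelled as a (vectorial) superposition of such strokes;
% the departures from this ideal shape are what the talk relates to ageing and health.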

Biographical notes
Réjean Plamondon is a Full Professor in the department of Electrical Engineering at École Polytechnique de Montréal and Head of Laboratoire Scribens at this institution. Throughout his career, he has been involved in many pattern recognition projects, particularly in the field of on-line and off-line handwriting analysis and processing. His main contribution has been the development of a kinematic theory of rapid human movements which can take into account, with the help of lognormal functions, the major psychophysical phenomena reported in studies dealing with rapid movement control. The theory has been found successful in describing the basic kinematic properties of velocity profiles as observed in finger, hand, arm, head and eye movements. Professor Plamondon has studied and analyzed these bio-signals extensively in order to develop creative and powerful methods and systems in various domains of engineering, publishing more than 300 papers on these topics. He is a Fellow of the Netherlands Institute for Advanced Study in the Humanities and Social Sciences (NIAS; 1989), of the International Association for Pattern Recognition (IAPR; 1994) and of the Institute of Electrical and Electronics Engineers (IEEE; 2000). He recently received the IAPR/ICDAR 2013 outstanding achievement award for “theoretical contributions to the understanding of human movement and its applications to signature verification, handwriting recognition, instruction, and health assessment, and for promoting on-line document processing in numerous multidisciplinary fields”.
               
 
Tue, 19 Jan 2016
11:00:00
Gareth Morlais
from Welsh Government, Cardiff, Wales
Talk place: Idiap Research Institute

How technology is opening up new potential for democracy, participation and collaboration

Abstract:
The barriers to production are being lowered so it's a good time to build platforms which make it as simple as possible for everyone to join in and help train and refine language technologies, share their stories and spread the word. Gareth draws on digital storytelling with the BBC, democratic activism via hyperlocal journalism and tools for citizenship to see if there's a new way to corral people's enthusiasm for languages to help build better, more relevant resources.
               
 
Mon, 14 Dec 2015
14:30:00
Dr. Baptiste Caramiaux
from Goldsmiths, University of London
Talk place: Idiap Research Institute

Probabilistic Models for Music Performance: Interaction, Creation, Cognition

Abstract:
Music performance is an epitome of complex and creative motor skills. It is indeed striking that musicians can continuously show more physical virtuosity in playing their instrument and more creativity in varying their interpretation. Technology-mediated music performance has naturally explored the potential of interfaces and interactions for enhancing musical expression. It is, however, a difficult (and ill-posed) problem, and musical interactive systems cannot yet challenge traditional instruments in terms of expressive control and skill learning.
I believe that an important aspect of the problem lies in the understanding of variability in the performer’s movements. I will start my talk by presenting the computational approach based on probabilistic models, particularly suited to handling uncertainty in motion data that stems from noise or intentional variations of the performers. I will then illustrate the potential of the approach in the design of expressive music interactions, through experiments with proofs of concept developed and evaluated in the lab, as well as real-world applications in artistic projects and in industrial products for consumer devices. Finally, I will present my upcoming EU-funded research project, which takes a more theoretical perspective by examining how this approach could potentially be used to infer an understanding of the cognitive processes underlying sensorimotor learning in music performance.

Bio
Baptiste Caramiaux is a Marie Skłodowska-Curie Research Fellow between McGill University (Montreal, Canada) and IRCAM (Paris, France). His current research focuses on the understanding of the cognitive processes of motor learning in musical performance and the computational modelling of these processes. Before that, he worked on gesture expressivity and the design of musical interactive systems through machine learning. He conducted academic research at Goldsmiths, University of London, and applied part of his academic research work to industrial products at Mogees Ltd. Baptiste holds a PhD in computer science from University Pierre et Marie Curie in Paris, carried out at IRCAM Centre Pompidou.
               
 
Thu, 3 Sep 2015
14:00:00
Prof. Frederic Fol Leymarie
from Goldsmiths, University of London
Talk place: Idiap Research Institute

Shape, Medialness and Applications

Abstract:

I will present on-going research in my group with a focus on shape understanding with applications to computer vision, robotics and the creative industries. I will principally discuss our recent work on building an algorithmic chain exploiting models of shape derived from the cognitive science literature but relating closely to well-known approaches in computer vision and computational geometry: that of medial descriptors of shape.

Recent relevant publications:

[1] Point-based medialness for 2D shape description and identification
P. Aparajeya and F. F. Leymarie
Multimedia Tools and Applications, May 2015
http://link.springer.com/article/10.1007%2Fs11042-015-2605-6

[2] Portrait drawing by Paul the robot
P. Tresset and F. F. Leymarie
Computers & Graphics, April 2013
Special Section on Expressive Graphics
http://www.sciencedirect.com/science/article/pii/S0097849313000149


Short bio:

Frederic Fol Leymarie has been a Professor of Computing at Goldsmiths, University of London, since late 2004. Previously he was the co-founder of the SHAPE Lab at Brown University (1999) and later its lab manager (2002-4) while a postdoctoral fellow. He completed his PhD thesis at Brown in 2002 on the topic of 3D Shape Representation by Shock Scaffolds. This work was supported in part by two (US) NSF grants Frederic co-wrote and one IBM Doctoral Fellowship (1999). Since joining Goldsmiths, Frederic has launched and directed the MSc Arts Computing (2004-7), as well as the MSc Computer Games Entertainment (since 2008) and the MA Computer Games Art and Design (starting in Sept. 2015), the latter two in collaboration with Prof. William Latham. More details on his publication record, research, other interests and professional activities can be found on his LinkedIn profile via: www.folleymarie.com
               
 
Mon, 8 Jun 2015
16:00:00
Prof. Fausto Giunchiglia
from University of Trento, Italy
Talk place: Idiap Research Institute

Discovering Life Patterns

Abstract:
The main goal of this proposal is to discover a person’s life patterns (e.g., where she goes, what she does, how she is and feels and whom she spends time with), namely those situations that repeat themselves, almost but not exactly identical, with regularity, and to exploit this knowledge for improving her quality of life.

The challenge is how to synchronize a sensor- and data-driven representation of the world, which is noisy, imprecise and agnostic of the user’s needs, with a knowledge-level representation of the world which should be: (i) general, by allowing for the representation and integration of different combinations of sensors and interesting aspects of the user’s life, and (ii) adaptive, by representing life happenings at the desired level of abstraction, capturing their progress, and adapting to changes in the life dynamics.

The solution exploits three main components: (i) a methodology and mechanisms for an incremental evolution of a knowledge level representation of the world (e.g., ontologies), (ii) an extension of deep learning to take into account and adapt to the constraints coming from the knowledge level and (iii) a Question Answering (Q/A) service which allows the user to interact with the computer according to her needs and terminology.


BIO: Fausto Giunchiglia is a professor of computer science at the University of Trento, an ECCAI fellow, and a member of  Academia Europaea. 

Fausto’s current main interest is in providing a theory, algorithms and systems for handling of highly heterogeneous big data in highly dynamic and unpredictable environments. The issues he is mainly interested in are (in decreasing order of importance) variety, veracity and vulnerability. His focus is on three types of data: open government data, enterprise data and personal data. 

Fausto has covered the full spectrum from theory to technology transfer and innovation. Some relevant roles: member of the Panel "Computer Science and Informatics" of the European Research Council (ERC), "ERC Advanced Grants" (2008 – present); Chair of the International Advisory Board of the Scottish Informatics and Computer Science Alliance (SICSA) of the 10 Scottish Universities. More than 40 invited talks at international events; chair of more than 10 international events; was/is editor or editorial board member of around 10 journals, among them: Journal of Autonomous Agents and Multi-agent Systems, Journal of Applied Non-Classical Logics, Journal of Software Tools for Technology Transfer, Journal of Artificial Intelligence Research. He held the following roles in scientific organizations: member of the IJCAI Board of Trustees (01-11), President of IJCAI (05-07), President of KR, Inc. (02-04), Advisory Board member of KR, Inc., Steering Committee of the CONTEXT conference. Fausto has coordinated and participated in various EC projects, among them: coordination of the FP7 FET IP Smart Society and of the FP7 FET IP Living Knowledge, and local coordinator of the FP7 IP Cubrik, Open Knowledge, and Knowledge Web.



               
 
Mon, 11 May 2015
11:00:00
Prof. Louis-Philippe Morency
from Language Technology Institute, Carnegie Mellon University
Talk place: Idiap Research Institute

Modeling Human Communication Dynamics

Abstract:

Human face-to-face communication is a little like a dance, in that participants continuously adjust their behaviors based on verbal and nonverbal cues from the social context. Today's computers and interactive devices are still lacking many of these human-like abilities to hold fluid and natural interactions. Leveraging recent advances in machine learning, audio-visual signal processing and computational linguistics, my research focuses on creating human-computer interaction (HCI) technologies able to analyze, recognize and predict subtle human communicative behaviors in social context. I formalize this new research endeavor with a Human Communication Dynamics framework, addressing four key computational challenges: behavioral dynamic, multimodal dynamic, interpersonal dynamic and societal dynamic. Central to this research effort is the introduction of new probabilistic models able to learn the temporal and fine-grained latent dependencies across behaviors, modalities and interlocutors. In this talk, I will present some of our recent achievements modeling multiple aspects of human communication dynamics, motivated by applications in healthcare (depression, PTSD, suicide, autism), education (learning analytics), business (negotiation, interpersonal skills) and social multimedia (opinion mining, social influence).


Bio:

Louis-Philippe Morency is an Assistant Professor in the Language Technology Institute at Carnegie Mellon University, where he leads the Multimodal Communication and Machine Learning Laboratory (MultiComp Lab). He received his Ph.D. and Master's degrees from the MIT Computer Science and Artificial Intelligence Laboratory. In 2008, Dr. Morency was selected as one of "AI's 10 to Watch" by IEEE Intelligent Systems. He has received 7 best paper awards at multiple ACM- and IEEE-sponsored conferences for his work on context-based gesture recognition, multimodal probabilistic fusion and computational models of human communication dynamics. For the past two years, Dr. Morency has been leading a DARPA-funded multi-institution effort called SimSensei, which was recently named one of the year’s top ten most promising digital initiatives by the NetExplo Forum, in partnership with UNESCO.
               
 
Fri, 24 Apr 2015
11:00:00
Prof. Vincent Lepetit
from TU Graz, Austria
Talk place: Idiap Research Institute

Robust image feature extraction learning and object registration

Abstract:
Extracting image features such as feature points or edges is a critical step of many Computer Vision systems, however this is still performed with carefully handcrafted methods. In this talk, I will first present a new Machine Learning-based approach to detecting local image features, with application to contour detection in natural images, but also biomedical and aerial images, and to feature point extraction under drastic weather and lighting changes. I will then show that it is also possible to learn efficient object description based on low-level features for scalable 3D object detection.

Bio: 
Dr. Vincent Lepetit is a Professor at the Institute for Computer Graphics and Vision, TU Graz, and a Visiting Professor at the Computer Vision Laboratory, EPFL. He received the PhD degree in Computer Vision in 2001 from the University of Nancy, France, after working in the ISA INRIA team. He then joined the Virtual Reality Lab at EPFL as a post-doctoral fellow and became a founding member of the Computer Vision Laboratory. He became a Professor at TU Graz in February 2014. His research interests include vision-based Augmented Reality, 3D camera tracking, Machine Learning, object recognition, and 3D reconstruction. He often serves as program committee member and area chair of major vision conferences (CVPR, ICCV, ECCV, ACCV, BMVC). He is an editor for the International Journal of Computer Vision (IJCV) and the Computer Vision and Image Understanding (CVIU) journal.

http://www.icg.tugraz.at/Members/lepetit/vincent-lepetits-homepage
               
 
Thu, 19 Feb 2015
11:00:00
Prof. Henning Mueller
from HES-SO Sierre, Switzerland
Talk place: Idiap Research Institute

Medical visual information retrieval: techniques & evaluation

Abstract:
Medical imaging has enormously increased in importance and volume in medical institutions, particularly 3D tomographic imaging. Via digital analysis, the knowledge stored in medical cases can be used to help decision-making for more than a single patient.

This presentation will highlight several challenges in medical image data processing starting with the VISCERAL EU project that evaluates segmentation, lesion detection and similar case retrieval on large amounts of medical 3D data using a cloud-based infrastructure for participants. The description of the MANY project highlights techniques for 3D texture analysis that can be used in a variety of contexts. Finally an overview of the radiology search system of the Khresmoi project will show a combination of the 3D data and the 3D analyses in a multi-modal environment.

Bio:
Henning Müller studied medical informatics at the University of Heidelberg, Germany, then worked at Daimler-Benz research in Portland, OR, USA. From 1998-2002 he worked on his PhD degree at the University of Geneva, Switzerland, with a research stay at Monash University, Melbourne, Australia. Since 2002 Henning has been working in medical informatics at the University Hospital of Geneva, where he habilitated in 2008 and was named titular professor in medicine in 2014. Since 2007 he has also been a full professor at the HES-SO Valais, and since 2011 he has been responsible for the eHealth unit of the school. Henning was coordinator of the Khresmoi EU project, is scientific coordinator of the VISCERAL EU project and initiator of the ImageCLEF benchmark. He has worked on several other EU projects that involve access to and analysis of medical data. He has authored over 400 scientific papers and is on the editorial board of several journals.
               
 
Thu, 5 Feb 2015
14:00:00
Prof. Yann Gousseau
from ENST Telecom Paris
Talk place: Idiap Research Institute

Video Inpainting of Complex Scenes

Abstract:
While image inpainting is a relatively mature subject whose numerical results are often visually striking, the automatic filling-in of video is still prone to yield incoherent results in many situations. Moreover, the subject is impaired by strong computational bottlenecks. In this talk, we present a patch-based approach to inpaint videos, relying on a global, multi-scale optimization heuristic. Contrary to previous approaches, the best patch candidates are selected using texture attributes that are built within a multi-scale video representation. We show that this rationale prevents the usual wash-out of textured and cluttered parts of video. Combined with an appropriate nearest neighbor search and a simple stabilization-like procedure, the resulting approach is able to successfully and automatically inpaint complex situations, including high resolution sequences with dynamic textures and multiple moving objects.
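As a rough illustration of the patch-selection step at the core of such methods, the following simplified Python sketch searches a spatio-temporal volume for the candidate patch that best matches the known pixels around a missing one, combining a pixel-wise distance with a crude texture attribute. Patch size, the texture descriptor, the random search and all constants are illustrative stand-ins, not the method described in the talk:

import numpy as np

def texture_attribute(patch):
    """A crude texture descriptor: mean absolute spatial gradients of the patch."""
    gy = np.abs(np.diff(patch, axis=1)).mean()
    gx = np.abs(np.diff(patch, axis=2)).mean()
    return np.array([gy, gx])

def best_patch(video, missing, center, size=(3, 7, 7), n_candidates=2000, alpha=0.5, rng=None):
    """Find the candidate spatio-temporal patch that best matches the known pixels
    around 'center', ranking candidates by SSD on valid pixels plus a texture-attribute
    distance."""
    if rng is None:
        rng = np.random.default_rng(0)
    dt, dy, dx = (s // 2 for s in size)
    T, H, W = video.shape

    def extract(t, y, x):
        sl = (slice(t - dt, t + dt + 1), slice(y - dy, y + dy + 1), slice(x - dx, x + dx + 1))
        return video[sl], missing[sl]

    target, target_missing = extract(*center)
    known = ~target_missing                      # target pixels that are already valid
    target_tex = texture_attribute(np.where(known, target, 0.0))

    best, best_cost = None, np.inf
    for _ in range(n_candidates):
        t = rng.integers(dt, T - dt)
        y = rng.integers(dy, H - dy)
        x = rng.integers(dx, W - dx)
        cand, cand_missing = extract(t, y, x)
        if cand_missing.any():                   # candidates must be fully known
            continue
        ssd = np.sum((cand[known] - target[known]) ** 2) / max(known.sum(), 1)
        tex = np.sum((texture_attribute(cand) - target_tex) ** 2)
        cost = ssd + alpha * tex                 # texture term steers choices in cluttered areas
        if cost < best_cost:
            best, best_cost = cand, cost
    return best

# Toy usage: a random video volume with a small missing cuboid.
rng = np.random.default_rng(1)
video = rng.random((10, 64, 64))
missing = np.zeros(video.shape, dtype=bool)
missing[4:6, 20:30, 20:30] = True                # True marks pixels to be inpainted
patch = best_patch(video, missing, center=(5, 25, 25), rng=rng)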

Bio: Yann Gousseau received the engineering degree from the École Centrale de Paris, France, in 1995, and the Ph.D. degree in applied mathematics from the University of Paris-Dauphine in 2000. He is currently a professor at Telecom ParisTech. His research interests include the mathematical modeling of natural images and textures, mono and multi-image restoration, computational photography, stochastic geometry, image analysis, computer vision
and image processing.
               
 
Thu, 8 Jan 2015
11:00:00
Dr. Mary Ellen Foster
from Interaction Lab, Heriot-Watt University Edinburgh, UK
Talk place: Idiap Research Institute

Trainable Interaction Models for Embodied Conversational Agents

Abstract:
Human communication is inherently multimodal: when we communicate with one another, we use a wide variety of channels, including speech, facial expressions, body postures, and gestures. An embodied conversational agent (ECA) is an interactive character -- virtual or physically embodied -- with a human-like appearance, which uses its face and body to communicate in a natural way. Giving such an agent the ability to understand and produce natural, multimodal communicative behaviour will allow humans to interact with such agents as naturally and freely as they interact with one another, enabling the agents to be used in applications as diverse as service robots, manufacturing, personal companions, automated customer support, and therapy.

To develop an agent capable of such natural, multimodal communication, we must first record and analyse how humans communicate with one another.  Based on that analysis, we then develop models of human multimodal interaction and integrate those models into the reasoning process of an ECA.  Finally, the models are tested and validated through human-agent interactions in a range of contexts.

In this talk, I will give three examples where the above steps have been followed to create interaction models for ECAs. First, I will describe how human-like referring expressions improve user satisfaction with a collaborative robot; then I show how data-driven generation of facial displays affects interactions with an animated virtual agent; finally, I describe how trained classifiers can be used to estimate engagement for customers of a robot bartender.

Bio: 

Mary Ellen Foster is a Research Fellow in the Interaction Lab at the School of Mathematical and Computer Sciences at Heriot-Watt University in Edinburgh, Scotland. She received her Ph.D. in Informatics from the University of Edinburgh, and has previously worked in the Robotics and Embedded Systems Group at the Technical University of Munich and in the School of Informatics at the University of Edinburgh. Her research interests include embodied communication, natural language generation, and multimodal dialogue systems. In particular, she is interested in designing, implementing, and evaluating practical artificial systems that support embodied interaction with human users, such as embodied conversational agents and human-robot dialogue systems. She has worked on European and national projects including COMIC, JAST, ECHOES, JAMES, and EMOTE.
               
 
Fri, 17 Oct 2014
11:00:00
Prof. Christian Wolf
from LIRIS team, INSA Lyon, France
Talk place: Idiap Research Institute

Pose estimation and gesture recognition using structured deep learning

Abstract:
In this talk I will address the problem of gesture recognition and pose estimation from videos, following two different strategies:
(i) estimation of articulated pose (full body or hand pose) alleviates subsequent recognition steps in many conditions and allows smooth interaction modes and tight coupling between object and manipulator; 
(ii) in situations of low image quality (e.g. large distances between hand and camera), obtaining an articulated pose is hard. Training a deep model directly on video data can give excellent results in these situations.

We tackle both cases by training deep architectures capable of learning discriminative intermediate representations. The main goal is to integrate structural information into the model in order to decrease the dependency on large amounts of training data. To achieve this, we propose an approach for hand pose estimation that requires very little labelled data. It leverages both unlabeled data and synthetic data produced by a rendering pipeline. The key to making it work is to integrate structural information not into the model architecture, which would slow down inference, but into the training objective. We show that adding unlabeled real-world samples significantly improves results compared to a purely supervised setting.

In the context of multi-modal gesture detection and recognition, we propose a deep recurrent architecture that iteratively learns and integrates discriminative data representations from individual channels (pose, video, audio), modeling complex cross-modality correlations and temporal dependencies. It is based on multi-scale and multi-modal deep learning. Each visual modality captures spatial information at a particular spatial scale (such as motion of the upper body or a hand), and the whole system operates at two temporal scales. Key to our technique is a training strategy which exploits i) careful initialization of individual modalities; and ii) gradual fusion of modalities from strongest to weakest cross-modality structure. 
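The following toy PyTorch sketch illustrates the gradual-fusion idea on placeholder data: per-modality encoders feed a shared fusion classifier, and modalities are switched on one at a time, from the presumed strongest to the weakest. Modalities, feature sizes, network sizes and the schedule are assumptions for illustration only, not the ChaLearn system itself:

import torch
import torch.nn as nn

def encoder(in_dim, hid=64):
    """A small per-modality feature extractor."""
    return nn.Sequential(nn.Linear(in_dim, hid), nn.ReLU(), nn.Linear(hid, hid), nn.ReLU())

class GestureNet(nn.Module):
    """Per-modality encoders feeding a shared fusion classifier."""
    def __init__(self, dims, n_classes, hid=64):
        super().__init__()
        self.encoders = nn.ModuleDict({name: encoder(d, hid) for name, d in dims.items()})
        self.fusion = nn.Linear(hid * len(dims), n_classes)

    def forward(self, inputs, active):
        feats = []
        for name in self.encoders:
            f = self.encoders[name](inputs[name])
            if name not in active:
                f = torch.zeros_like(f)   # inactive modalities contribute zeros for now
            feats.append(f)
        return self.fusion(torch.cat(feats, dim=1))

dims = {"pose": 36, "video": 128, "audio": 40}   # illustrative feature sizes
model = GestureNet(dims, n_classes=21)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Gradual fusion: start from the (presumed) strongest modality alone, then add the
# others one at a time, so cross-modality weights are learned on top of reasonable
# per-modality initializations.
schedule = [["pose"], ["pose", "video"], ["pose", "video", "audio"]]
for active in schedule:
    for _ in range(100):                          # illustrative inner loop on random data
        batch = {k: torch.randn(32, d) for k, d in dims.items()}
        labels = torch.randint(0, 21, (32,))
        loss = loss_fn(model(batch, active), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()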

We present experiments on the "ChaLearn 2014 Looking at People Challenge" gesture recognition track organized in conjunction with ECCV 2014, in which we placed 1st out of 17 teams. The objective of the challenge was to detect, localize and classify Italian conversational gestures from a large database of 13,858 gestures. The multimodal data included color video, range maps and a skeleton stream.

The talk will be preceded by a  brief introduction to the work done in my LIRIS team.
 
Site : http://liris.cnrs.fr/christian.wolf/research/gesturerec.html


Bio: Christian Wolf received his MSc in computer science from Vienna University of Technology in 2000, and his PhD in computer science from the National Institute of Applied Science (INSA de Lyon), France, in 2003. In 2012 he obtained the habilitation diploma, also from INSA de Lyon. From September 2004 to August 2005 he was an assistant professor at the Louis Pasteur University, Strasbourg, and a member of the Computer and Image Science and Remote Sensing Laboratory (LSIIT). Since September 2005 he has been an assistant professor at INSA de Lyon and a member of LIRIS, a laboratory of the CNRS, where he is interested in computer vision and machine learning, especially in structured models, deep learning, gesture and activity recognition, and computer vision for robotics.

               
 
Tue, 1 Jul 2014
11:00:00
Dr. Gabrielle Vail
from New College of Florida
Talk place: Idiap Research Institute

Fitting Ancient Texts into Modern Technology: The Maya Hieroglyphic Codices Database Project

Abstract:
The Maya hieroglyphic codices provide a rich dataset concerning astronomical beliefs, divinatory practices, and the ritual life of prehispanic Maya cultures inhabiting the Yucatán Peninsula in the years leading up to the Spanish conquest in the early sixteenth century. Structurally, the codices are organized in terms of almanacs and astronomical tables, both of which incorporate several types of data—calendrical, iconographic, and textual—that together allowed Maya scribes to encode complex relationships among deities, dates having ritual and/or celestial significance, and associated activities. In order to better understand these relationships, the Maya Hieroglyphic Codices Database project was initiated to develop sophisticated online research tools to aid in analysis of these manuscripts. Because the Maya scribes did not live in a culture that demanded strict adherence to paradigms that we take for granted when organizing information for electronic search and retrieval, this posed a significant challenge in efforts to discover how data contained in ancient manuscripts could be converted into data structures that facilitated computer searching and information retrieval. This presentation discusses the approaches taken by the author and the architect of the database project to find compromises that enable computer analysis of a set of texts created by scribes more than half a millennium ago, while avoiding the biases inherent in translating knowledge across spatial and cultural divides. The presentation will be made by Dr. Vail; the technical architect to the project, William Giltinan, will be available to answer questions at the conclusion of the lecture. 

Presenter bio:
Gabrielle Vail specializes in the study of Maya hieroglyphic texts, with an emphasis on prehispanic Maya ritual and religion as documented in screenfold manuscripts painted in the fourteenth and fifteenth centuries. Her research is highlighted in numerous print and online publications, as well as the online Maya Codices Database (www.mayacodices.org), a collaborative project undertaken with funding from the National Endowment for the Humanities. Dr. Vail has published ten books and edited journals, most recently Códice de Madrid (Universidad Mesoamericana, 2013) and Re-Creating Primordial Time: Foundation Rituals and Mythology in the Postclassic Maya Codices (University Press of Colorado, 2013; with Christine Hernández). Dr. Vail received her Ph.D. from Tulane University in 1996 and holds a research and faculty position at New College of Florida in Sarasota, where she teaches courses on a variety of subjects, including the decipherment of Maya hieroglyphic texts and the astronomy of prehispanic cultures of the Americas. 


Technical architect:
William Giltinan earned his bachelor’s degree in computer science from New College of Florida and a master’s degree in computer science and engineering from the University of Michigan. Following this, he spent more than a decade as a software engineer and entrepreneur in technology-driven enterprises. In 1992, he assumed the role of technical architect of the Maya Hieroglyphic Codices Database project and has continued in this capacity through the present. Mr. Giltinan returned to academia in 2003 to earn his Juris Doctorate and later his Master of Law degree in intellectual property law from the George Washington University Law School. He is a practicing intellectual property attorney and teaches patent law as an adjunct professor. 
               
 
Wed, 11 Jun 2014
11:00:00
Prof. Richard Bowden
from University of Surrey
Talk place: Idiap Research Institute

Recognising people, motion and actions in video

Abstract:
Learning to recognise the motion or actions of people in video has wide applications covering topics from sign or gesture recognition through to surveillance and HCI. This talk will discuss approaches to video mining, allowing the discovery of weakly supervised spatiotemporal signatures such as actions embedded in video or signs/facial motion weakly supervised by language. Whether the task is recognising an atomic action of an individual or their implied activity, the continuous multichannel nature of sign language recognition or the appearance of words on the lips, all approaches can be categorised at the most basic level as the learning and recognition of spatio-temporal patterns. However, in all cases, inaccuracies in labelling and the curse of dimensionality lead us to explore new learning approaches that can operate in a weakly supervised setting. This talk will discuss the adaptation of mining to the video domain and new approaches to learning spatiotemporal signatures covering a broad range of application areas such as facial feature extraction and regression, lip reading, activity recognition and sign and gesture recognition in both 2D and 3D.

Bio:
Prof Richard Bowden received a BSc degree in computer science from the University of London in 1993, an MSc degree with distinction from the University of Leeds in 1995, and a PhD degree in computer vision from Brunel University. He is currently Professor of computer vision and machine learning at the University of Surrey, United Kingdom, where he leads the Cognitive Vision Group within the Centre for Vision Speech and Signal Processing and was recently awarded a Royal Society Leverhulme Trust Senior Research Fellowship. He was a visiting research fellow at the University of Oxford from 2001 to 2004, working with Profs Zisserman and Brady. His research focuses on the use of computer vision to locate, track, and understand humans with specific examples in Sign and Gesture recognition, Activity and Action recognition, lip-reading and facial feature tracking. His research into tracking and artificial life received worldwide media coverage, appearing at the British Science Museum and the Minnesota Science Museum. He has published more than 140 peer-reviewed papers and has served as either program committee member or area chair for ICCV, CVPR and ECCV in addition to numerous international workshops and conferences. He was general chair for BMVC2012, track chair for ICPR2012 and is associate editor for the journal Image and Vision Computing and for IEEE Transactions on Pattern Analysis and Machine Intelligence. He was a member of the British Machine Vision Association (BMVA) executive committee and a company director for seven years. He is a member of the BMVA, a fellow of the Higher Education Academy, and a senior member of the IEEE. He has held over 20 research grants worth in excess of £5M and supervised over fifteen PhD students. His research has been recognised by prizes, plenary talks & media/press coverage including the Sullivan thesis prize in 2000 and many best paper awards.
               
 
Wed, 21 May 2014
15:00:00
Prof. Ricardo Baeza-Yates
from Yahoo! Labs
Talk place: Idiap Research Institute

The Web: Wisdom of Crowds or Wisdom of a Few?

Abstract:

The Web continues to grow and evolve very fast, changing our daily lives. This activity represents the collaborative work of the millions of institutions and people that contribute content to the Web, as well as the more than two billion people that use it. In this ocean of hyperlinked data there is explicit and implicit information and knowledge. But what is the Web like? What are people's activities? How is content generated? Web data mining is the main approach to answering these questions. Web data comes in three main flavors: content (text, images, etc.), structure (hyperlinks) and usage (navigation, queries, etc.), implying different techniques such as text, graph or log mining. Each case reflects the wisdom of some group of people that can be used to make the Web better; for example, user-generated tags in Web 2.0 sites. In this presentation we explore the wisdom of crowds in relation to several dimensions such as bias, privacy, scalability, and spam. We also cover related concepts such as the long tail of the special interests of people, or the digital desert, content that nobody sees.


Biography:

Ricardo Baeza-Yates is VP of Yahoo! Labs for Europe and Latin America, leading the labs at Barcelona, Spain and Santiago, Chile, since 2006. Between 2008 and 2012 he also oversaw the Haifa lab. He is also a part-time Professor at the Dept. of Information and Communication Technologies of the Universitat Pompeu Fabra in Barcelona, Spain. During 2005 he was an ICREA research professor at the same university. Until 2004 he was Professor and Director of the Center for Web Research at the Dept. of Computing Science of the University of Chile (on leave of absence until today). He obtained a Ph.D. from the University of Waterloo, Canada, in 1989. Before that he obtained two master's degrees (M.Sc. CS & M.Eng. EE) and the electrical engineering degree from the University of Chile in Santiago. He is co-author of the best-seller Modern Information Retrieval textbook, published in 1999 by Addison-Wesley with a second enlarged edition in 2011, that won the ASIST 2012 Book of the Year award. He is also co-author of the 2nd edition of the Handbook of Algorithms and Data Structures, Addison-Wesley, 1991; and co-editor of Information Retrieval: Algorithms and Data Structures, Prentice-Hall, 1992, among more than 500 other publications. From 2002 to 2004 he was elected to the board of governors of the IEEE Computer Society and in 2012 he was elected to the ACM Council. He has received the Organization of American States award for young researchers in exact sciences (1993), the Graham Medal for innovation in computing given by the University of Waterloo to distinguished ex-alumni (2007), the CLEI Latin American distinction for contributions to CS in the region (2009), and the National Award of the Chilean Association of Engineers (2010), among other distinctions. In 2003 he was the first computer scientist to be elected to the Chilean Academy of Sciences, and since 2010 he has been a founding member of the Chilean Academy of Engineering. In 2009 he was named ACM Fellow and in 2011 IEEE Fellow.
               
 
Tue, 15 Apr 2014
11:00:00
Prof. Mohamed Chetouani
from University Pierre and Marie Curie-Paris 6
Talk place: Idiap Research Institute

Interpersonal synchrony: social signal processing and social robotics for revealing social signatures

Abstract:
Social signal processing is an emerging research domain with rich and open fundamental and applied challenges. In this talk, I’ll focus on the development of social signal processing techniques for real applications in the field of psycho-pathology. I’ll overview recent research and investigation methods allowing neuroscience, psychology and developmental science to move from isolated-individual paradigms to interactive contexts by jointly analyzing behaviors and social signals of partners. From the concept of interpersonal synchrony, we’ll show how to address the complex problem of evaluating children with pervasive developmental disorders. These techniques are also demonstrated in the context of human-robot interaction by a new way of using robots in autism (moving from assistive devices to clinical investigation tools). I will finish by closing the loop between behaviors and physiological states by presenting new results on oxytocin and proxemics during early parent-infant interactions.


Prof. Mohamed Chetouani is the head of the IMI2S (Interaction, Multimodal Integration and Social Signal) research group at the Institute for Intelligent Systems and Robotics (CNRS UMR 7222), University Pierre and Marie Curie-Paris 6. He received the M.S. degree in Robotics and Intelligent Systems from the UPMC, Paris, in 2001. He received the PhD degree in Speech Signal Processing from the same university in 2004. In 2005, he was an invited Visiting Research Fellow at the Department of Computer Science and Mathematics of the University of Stirling (UK). Prof. Chetouani was also an invited researcher at the Signal Processing Group of Escola Universitaria Politecnica de Mataro, Barcelona (Spain). He is currently a Full Professor in Signal Processing, Pattern Recognition and Machine Learning at the UPMC. His research activities, carried out at the Institute for Intelligent Systems and Robotics, cover the areas of social signal processing and personal robotics through non-linear signal processing, feature extraction, pattern classification and machine learning. He is the head of the interdisciplinary research group IMI2S (Interaction, Multimodal Integration and Social Signal) gathering researchers from social signal processing, social robotics, psycho-pathology and neuroscience. This group develops models and methods for the analysis, recognition and prediction of social signals and behaviors with a life-span perspective, with particular attention to disorders (autism, Alzheimer's). He has published numerous research papers including some in high impact journals (PLoS One, Biology Letters, Pattern Recognition, IEEE Transactions on Audio, Speech and Language Processing). He is also the co-chairman of the French Working Group on Human-Robots/Systems Interaction (GDR Robotique CNRS) and a Deputy Coordinator of the Topic Group on Natural Interaction with Social Robots (euRobotics).
               
 
Fri, 14 Feb 2014
11:00:00
Dr. Ivan Laptev
from INRIA, Paris
Talk place: Idiap Research Institute

Recent trends and future challenges in action recognition

Abstract:
This talk will overview recent progress and open challenges in human action recognition. Specifically, I will focus on the three problems of (i) action representation in video, (ii) weakly-supervised action learning and (iii) ambiguity of action vocabulary. For the first problem, I will overview local feature methods providing state-of-the-art results on current action recognition benchmarks. Motivated by the difficulty of large-scale video annotation, I will next present our recent work on weakly-supervised action learning from video and corresponding video scripts. I will finish by highlighting limitations of the standard action classification paradigm and will show some of our work addressing this problem.

Short bio:
Ivan Laptev is a research director at INRIA Paris-Rocquencourt, France. He received his PhD degree in Computer Science from the Royal Institute of Technology (KTH) in 2004 and a Master of Science degree from the same institute in 1997. He was a research assistant at the Technical University of Munich (TUM) during 1998-1999. He joined INRIA as a postdoc in 2004 and became a full-time INRIA researcher in 2005. Ivan's main research interests include visual recognition of human actions, objects and interactions. He has published over 50 papers at international conferences and journals of computer vision and machine learning. He serves as an associate editor of the International Journal of Computer Vision and the Image and Vision Computing Journal; he was/is an area chair for CVPR 2010, ICCV 2011, ECCV 2012, CVPR 2013 and ECCV 2014; and he has co-organized several workshops and tutorials on human action recognition at major computer vision conferences. He has also co-organized a series of INRIA summer schools on computer vision and machine learning (2010-2013). Ivan was awarded an ERC Starting Grant in 2012.
               
 
Tue, 21 Jan 2014
11:00:00
Prof. Marc Langheinrich
from Università della Svizzera italiana (USI)
Talk place: Idiap Research Institute

Privacy & Trust Challenges in Open Public Display Networks

Abstract:
Future public displays have the potential to become much more than simple digital signage -- they can form the basis for a novel communication medium. By interconnecting displays and opening them up to applications and content from a wide range of sources, they can not only support individuals and their communities, but also increase their relevance and ultimately their economic benefits. Ultimately, open display networks could have the same impact on society as radio, television and the Internet. In this talk, I will briefly summarize this vision and its related challenges, in particular with respect to privacy and trust, and present the work that we did in this area in the context of a recently finished FET-Open project titled "PD-Net".

Bio:
Marc Langheinrich is an Associate Professor at the Università della Svizzera italiana (USI) in Lugano, Switzerland. Marc received his PhD (Dr. sc. ETH) on the topic of "Privacy in Ubiquitous Computing" from the ETH Zurich, Switzerland, in 2005. He has published extensively on both privacy and usability of ubiquitous and pervasive computing systems, and is a regular program committee member of various conferences and workshops in the areas of pervasive computing, security and privacy, and usability. Marc currently serves on the editorial board of IEEE Pervasive Computing Magazine and Elsevier's "Personal and Mobile Communications" Journal, and is a Steering Committee member of the UbiComp and IoT conference series.
               
 
Thu, 31 Oct 2013
11:00:00
Prof. Francis Quek
from Texas A&M University
Talk place: Idiap Research Institute

Interacting with the Embodied Mind

Abstract:
Humans do not think like computers. Our minds are ‘designed’ for us to function as embodied beings in the world in ways that are: 1. Physical-Spatial; 2. Temporal-Dynamic; 3. Social-Cultural; and 4. Affective-Emotional. These aspects of embodiment give us four lenses to understand the embodied mind and how computation/technology may support its function. I adopt a two-pronged approach to human-computer interaction research: first, harnessing technological means to contribute to the understanding of how embodiment ultimately ascends into mind, and second, informing the design and engineering of technologies that support and augment human higher psychological functions of learning, sensemaking, creating, and experiencing.

In line with the first approach, I shall first show how language, as a core human capacity, is rooted in human embodied function. We will see that mental imagery shapes multimodal (gesture, gaze, and speech) human discourse. In line with the second approach, I shall then present an assemblage of interactive projects that illustrate how our concept of human embodiment can inform technology design through the light of our four lenses. Projects cluster around three application domains, namely 1. Technology for special populations (e.g. mathematics instruction and reading for the blind, games for older adults); 2. Learning and Education (e.g. learning and knowledge discovery through device/display ecologies, creativity support for children); and 3. Experience (e.g. socially-based information access, experience of images, affective communication).

------

Francis Quek is currently a Professor of Visualization and a TAMU Chancellor’s Research Initiative hire at Texas A&M University. He was formerly a Professor of Computer Science, Director of the Center for Human-Computer Interaction, and Director of the Vision Interfaces and Systems Laboratory at Virginia Tech. He has previously been affiliated with Wright State University, the University of Illinois at Chicago, the University of Michigan, and Hewlett-Packard. Francis received both his B.S.E. summa cum laude (1984) and M.S.E. (1984) in electrical engineering from the University of Michigan. He completed his Ph.D. in Computer Science at the same university in 1990. Francis is a member of the IEEE and ACM.

He performs research in embodied interaction, embodied learning and sensemaking, interactive systems for special populations (individuals who are blind, children, older adults), systems to support learning and creativity in children, multimodal verbal/non-verbal interaction, multimodal meeting analysis, vision-based interaction, multimedia databases, medical imaging, assistive technology for the blind, human-computer interaction, computer vision, and computer graphics. He has published over 150 peer-reviewed journal and conference articles in human-computer interaction, computer vision, and medical imaging.

               
 
Thu, 19 Sep 2013
15:00:00
Nuria Oliver
from Telefonica Research, Barcelona, Spain
Talk place: Idiap Research Institute

The power of the cellphone: small devices for big impact

Abstract:
There are almost as many mobile phones in the world as humans. The mobile phone is the piece of technology with the highest level of adoption in human history. We carry them with us all through the day (and night, in many cases). As a result, mobile phones have become large-scale sensors of human activity, as well as our most personal devices.

In my talk, I will present some of the work that we are doing at Telefonica Research in the area of mobile computing, both in analyzing and understanding large-scale human behavioral data from mobile traces and in designing novel mobile systems in the areas of healthcare, education and information access.
               
 
Tue, 3 Sep 2013
14:00:00
Prof. Anil K. Jain
from Michigan State University
Talk place: Idiap Research Institute

Biometric Recognition: Sketch to Photo Matching, Tattoo Matching and Fingerprint Obfuscation

Abstract:
http://biometrics.cse.msu.edu
http://scholar.google.com/citations?user=g-_ZXGsAAAAJ&hl=en

If you are like many people, navigating the complexities of everyday life depends on an array of cards and passwords that confirm your identity. But lose a card, and your ATM will refuse to give you money. Forget a password, and your own computer may balk at your command. Allow your card or passwords to fall into the wrong hands, and what were intended to be security measures can become the tools of fraud or identity theft. Biometrics - the automated recognition of people via distinctive anatomical and behavioral traits - has the potential to overcome many of these problems.

Biometrics is not a new idea. Pioneering work by several British scholars, including Faulds, Galton and Henry in the late 19th century, established that fingerprints exhibit a unique pattern that persists over time. This set the stage for the development of the Automatic Fingerprint Identification Systems that are now used by law enforcement agencies worldwide. The success of fingerprints in law enforcement, coupled with growing concerns related to homeland security, financial fraud and identity theft, has generated renewed interest in the research and development of biometric systems. It is, therefore, not surprising to see biometrics permeating our society (laptops and mobile phones, border crossing, civil registration, and access to secure facilities). Despite these successful deployments, biometrics is not a panacea for human recognition. There are challenges related to data acquisition, image quality, robust matching, multibiometrics, biometric system security and user privacy. This talk will introduce three challenging problems of particular interest to law enforcement and border crossing agencies: (i) face sketch to photo matching, (ii) scars, marks & tattoos (SMT) matching and (iii) fingerprint obfuscation.

Short bio:
Anil K. Jain is a University Distinguished Professor in the Department of Computer Science at Michigan State University, where he conducts research in pattern recognition, computer vision and biometrics. He has received a Guggenheim fellowship, the Humboldt Research Award, a Fulbright fellowship, the IEEE Computer Society Technical Achievement Award, the W. Wallace McDowell Award, the IAPR King-Sun Fu Prize, and the ICDM Research Award for contributions to pattern recognition and biometrics. He served as the Editor-in-Chief of the IEEE Transactions on Pattern Analysis and Machine Intelligence and is a Fellow of ACM, IEEE, AAAS, IAPR and SPIE. Holder of eight patents in biometrics, he is the author of several books. ISI has designated him a highly cited author. He served as a member of the National Academies panels on Information Technology, Whither Biometrics and Improvised Explosive Devices (IED). He also served as a member of the Defense Science Board.

His H-index is 137 (Source: Google Scholar). 
               
 
Thu, 29 Aug 2013
11:00:00
Dr. Fernando De la Torre
from Robotics Institute, CMU
Talk place: Idiap Research Institute

Component Analysis for Human Sensing

Abstract:
Enabling computers to understand human behavior has the potential to revolutionize many areas that benefit society, such as clinical diagnosis, human-computer interaction, and social robotics. A critical element in the design of any behavioral sensing system is to find a good representation of the data for encoding, segmenting, classifying and predicting subtle human behavior. In this talk I will propose several extensions of Component Analysis (CA) techniques (e.g., kernel principal component analysis, support vector machines, spectral clustering) that are able to learn spatio-temporal representations or components useful in many human sensing tasks.
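To make the CA family mentioned above concrete, the following minimal sketch combines two of the named techniques: it projects per-frame behavioral descriptors with kernel PCA to learn a nonlinear low-dimensional representation, then groups frames into temporal segments with spectral clustering. It is an illustrative assumption-based example, not the speaker's method; the data shapes, parameters and synthetic input are hypothetical.

import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.cluster import SpectralClustering

def embed_and_segment(frame_features, n_components=10, n_segments=5, gamma=0.1):
    # Learn a nonlinear low-dimensional embedding of the per-frame descriptors ...
    kpca = KernelPCA(n_components=n_components, kernel="rbf", gamma=gamma)
    embedding = kpca.fit_transform(frame_features)
    # ... then cluster frames in that space to obtain a rough temporal segmentation.
    labels = SpectralClustering(n_clusters=n_segments,
                                affinity="nearest_neighbors").fit_predict(embedding)
    return embedding, labels

# Usage with synthetic stand-in data (200 frames of 64-dimensional descriptors):
X = np.random.rand(200, 64)
embedding, segments = embed_and_segment(X)
print(embedding.shape, np.bincount(segments))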

In the first part of the talk I will give an overview of several ongoing projects in the CMU Human Sensing Laboratory, including our current work on depression assessment from videos. In the second part, I will show how several extensions of CA methods outperform state-of-the-art algorithms in problems such as facial feature detection and tracking, temporal clustering of human behavior, early detection of activities, weakly-supervised visual labeling, and robust classification. The talk will be adaptive, and I will discuss the topics of major interest to the audience.

Biography:

Fernando De la Torre received his B.Sc. degree in Telecommunications (1994) and his M.Sc. (1996) and Ph.D. (2002) degrees in Electronic Engineering from La Salle School of Engineering at Ramon Llull University, Barcelona, Spain. In 2003 he joined the Robotics Institute at Carnegie Mellon University, and since 2010 he has been a Research Associate Professor. Dr. De la Torre's research interests include computer vision and machine learning, in particular face analysis, optimization and component analysis methods, and their applications to human sensing. He is an Associate Editor of IEEE PAMI and leads the Component Analysis Laboratory (http://ca.cs.cmu.edu) and the Human Sensing Laboratory (http://humansensing.cs.cmu.edu).