Idiap Speaker Series and public talks

The Idiap Speaker Series is a series of lectures by leading researchers in human and media computing and related areas. Invited speakers visit our Institute, discuss new trends in academia and industry, and interact with our community. The talks are open to the public. The Series is currently coordinated by Jean-Marc Odobez.
Idiap also hosts public talks by researchers visiting the Institute in other contexts, such as project meetings or collaborative work (e.g. short stays at Idiap).

 

NEXT TALK

No upcoming talks yet




PAST TALKS

Can Power-sharing Foster Peace? Evidence From Northern Ireland


Idiap Speaker Series
Date/time:

Nov 22, 2017 11:00 AM

Prof Dominic Rohner  

Abstract

In the absence of power-sharing, minority groups in opposition have powerful incentives to substitute the ballot with the bullet. In contrast, when power is shared among all major groups in society, the relative gains of sticking to electoral politics are larger for minority groups. After making the theoretical argument, we provide in the current paper an empirical analysis of the impact of power-sharing at the local level, making use of fine-grained data from Northern Ireland's 26 local district councils over the 1973-2001 period. We find that power-sharing has a sizable and robust conflict-reducing impact.


Biography

Holding a PhD in Economics from the University of Cambridge, Dominic Rohner is a Professor of Economics at the University of Lausanne. He is, among other roles, an Associate Editor of the Journal of the European Economic Association, and a Research Fellow of CEPR, CESifo, OxCarre and HiCN. His research focuses on political and development economics and has won several prizes, including the KfW Development Bank Excellence Award and the SNIS International Geneva Award. He currently holds a Starting Grant of the European Research Council (ERC) investigating “Policies for Peace”. He has papers published or forthcoming in several leading international journals, including, among others: American Economic Review, Econometrica, Journal of Political Economy, Quarterly Journal of Economics, and Review of Economic Studies.

Algorithms on manifolds: geometric means and recommender systems


Idiap Speaker Series
Date/time:

Sep 06, 2017 11:00 AM

Prof Bart Vandereycken  

Abstract

Much of the data in scientific computing and machine learning is highly structured. When this structure is given as a mathematically smooth manifold, it is usually advisable to explicitly exploit this property in theoretical analyses and numerical algorithms. I will illustrate this using two examples. In the first, the manifold is classical: the set of symmetric and positive definite matrices. The problem we consider is the computation of the geometric mean, also called the Karcher mean, which is a generalization of the arithmetic mean where we explicitly take into account that the data lives on a manifold. The application is denoising or interpolation of covariance matrices. The other example considers a non-standard manifold: the set of matrices of fixed rank. The application is now recommender systems (the Netflix problem) and the algorithm is low-rank matrix completion. I will show that one of the benefits of the manifold approach is that the generalisation to low-rank tensor completion is conceptually straightforward but also computationally efficient.
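
As a concrete illustration of the first example, the Karcher (geometric) mean of a set of symmetric positive definite matrices can be computed with a fixed-point iteration that alternates logarithmic and exponential maps on the manifold. The sketch below is a minimal NumPy/SciPy version under assumed settings (plain fixed-point iteration, toy data); it is not necessarily the algorithm discussed in the talk.

```python
# Minimal sketch: Karcher (geometric) mean of SPD matrices via a
# fixed-point iteration on the SPD manifold (illustrative only).
import numpy as np
from scipy.linalg import expm, inv, logm, sqrtm

def karcher_mean(mats, n_iter=50, tol=1e-10):
    X = sum(mats) / len(mats)                  # arithmetic mean as starting point
    for _ in range(n_iter):
        Xh = sqrtm(X)                          # X^{1/2}
        Xih = inv(Xh)                          # X^{-1/2}
        # average of the log maps of the data points at the current estimate
        S = sum(logm(Xih @ A @ Xih) for A in mats) / len(mats)
        X = Xh @ expm(S) @ Xh                  # exponential map back to the manifold
        if np.linalg.norm(S) < tol:
            break
    return X

# toy usage: mean of a few random covariance-like (SPD) matrices
rng = np.random.default_rng(0)
mats = []
for _ in range(5):
    B = rng.standard_normal((4, 4))
    mats.append(B @ B.T + 1e-3 * np.eye(4))
print(karcher_mean(mats))
```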


Biography

Bart Vandereycken is an assistant professor in the numerical analysis group at the Department of Mathematics of the University of Geneva. Prior to joining the University of Geneva, he was an instructor of mathematics at Princeton University and a postdoc at EPF Lausanne and ETH Zurich. He obtained his PhD at KU Leuven in December 2010. He was awarded the Alston S. Householder award for best PhD thesis in numerical linear algebra. For his research on Riemannian optimization for low-rank matrix equations, he received a Leslie Fox Prize in 2011 and a SIAM Outstanding Paper prize in 2012. His research is on large-scale and high-dimensional problems that are solved numerically using low-rank matrix and tensor techniques. Examples of such problems are the electronic Schrödinger equation, parametric partial differential equations, and low-rank matrix completion. In his work, he tends to focus on practical algorithms that can be formulated on Riemannian matrix manifolds and use techniques from numerical linear algebra. His other research interests include pseudospectra, matrix means, model-order reduction, and multilevel preconditioning.

Computational methods for fluorescence microscopy and quantitative bioimaging


Idiap Speaker Series
Date/time:

Aug 30, 2017 02:00 PM

Dr. Charles Kervrann, Senior Researcher

Abstract

During the past two decades, biological imaging has undergone a revolution in the development of new microscopy techniques that allow visualization of tissues, cells, proteins and macromolecular structures at all levels of resolution. Thanks to recent advances in optics, digital sensors and labeling probes, one can now visualize sub-cellular components and organelles at scales from a few dozen nanometers to several hundred nanometers. As a result, fluorescence microscopy and multimodal imaging have become the workhorse of modern biology. All these technological advances in microscopy have created new challenges for researchers in quantitative image processing and analysis. Therefore, dedicated efforts are necessary to develop and integrate cutting-edge approaches in image processing and optical technologies to push the limits of the instrumentation and to analyze the large amount of data being produced.

In this talk, we present image processing methods, mathematical models, and algorithms to build an integrated imaging approach that bridges the resolution gaps between the molecule and the whole cell, in space and time. The presented methods are dedicated to the analysis of proteins in motion inside the cell, with a special focus on Rab protein trafficking observed in time-lapse confocal microscopy or total internal reflection fluorescence microscopy. Nevertheless, the proposed image processing methods and algorithms are flexible in most cases, with a minimal number of control parameters to be tuned. They can be applied to a large range of problems in cell imaging and can be integrated into generic image-based workflows, including high-content screening applications.


Biography

Charles Kervrann received the M.Sc. (1992), the PhD (1995) and the HDR (2010) in Signal Processing and Telecommunications from the University of Rennes 1, France. From 1997 to 2010, he was a researcher at the INRA Applied Mathematics and Informatics Department (1997-2003) and he joined the VISTA Inria research group in 2003 (Rennes, France). In 2010, he was appointed to the rank of Research Director, Inria Research Centre in Rennes. He is currently the head of the Serpico (Space-timE RePresentation, Imaging and cellular dynamics of molecular COmplexes) research group. His work focuses on image sequence analysis, motion estimation, object detection, noise modeling for microscopy, and protein trafficking and dynamics modeling in cell biology. He is a member of the editorial board of IEEE Signal Processing Letters, a member of the IEEE BISP (Bio Imaging and Signal Processing) technical committee, and co-head of the IPDM-BioImage Informatics node of the French national infrastructure France-BioImaging.

Multilingual speech recognition in under-resourced environments


Idiap Speaker Series
Date/time:

Jun 02, 2017 11:00 AM

Prof. Marelie Davel  

Abstract

When speech processing systems are designed for use in multilingual environments, additional complexity is introduced. Identifying when language switching has occurred, predicting how cross-lingual terms will be pronounced, obtaining sufficient speech data from diverse language backgrounds: such factors all complicate the development of practical speech-oriented systems. In this talk, I will discuss our research group's experience in building speech recognition systems for the South African environment, one in which 11 official languages are recognised. I will also show how this relates to our participation in the BABEL project, a recent 5-year international collaborative project aimed at solving the spoken term detection task in under-resourced languages.


Biography

Marelie Davel is a research professor at North-West University, South Africa, and the director of the Multilingual Speech Technologies (MuST) research group. She has a specific interest in multilingual speech technology development in under-resourced environments and the data-driven modelling of human speech and language. She received her BSc degree (Computer Science & Mathematics) from Stellenbosch University, her MSc from the University of London, and her PhD (Electronic Engineering, 2005) from the University of Pretoria. She joined the South African CSIR in 1995 as an electronic engineer, later becoming a principal researcher and the research group leader of the Human Language Technologies (HLT) research group at the same institution. In 2002 she spent a year as a visiting scholar at Carnegie Mellon University’s Robust Speech group. She joined MuST in 2011 and became the group’s director in 2014. Recent MuST projects include the development of multilingual resources for Google, pronunciation modelling for the BABEL project, and the development of an automatic speech transcription platform for the South African government. She has published approximately 90 papers related to speech and language processing.

Charisma: Measurement and outcomes


Idiap Speaker Series
Date/time:

May 11, 2017 11:00 AM

Prof. John Antonakis  

Abstract

Charisma has been devilishly difficult to measure; there has also been a dearth of studies estimating the causal impact of charisma on outcomes. In this seminar I will use a new definition of charisma to demonstrate how it can be manipulated, and will also show the economic impact of charisma on worker productivity. Moreover, I will discuss how charisma can be coded from archival data, and demonstrate its utility for predicting a range of outcomes including winning the U.S. presidential election, the number of views of TED talks, and retweets of tweets.


Biography

John Antonakis is of Swiss, Greek, and South-African nationality. He is Professor of Organizational Behavior, and Director of the Ph.D. Program in Management in the Faculty of Business and Economics of the University of Lausanne, Switzerland. He received his Ph.D. from Walden University in Applied Management and Decision Sciences, specializing in the psychometrics of leadership. He was a postdoctoral fellow in the Department of Psychology at Yale University focusing on leader development and expertise. Professor Antonakis’ research is currently focused on charisma, predictors of leadership, and research methods. Professor Antonakis is Editor in Chief of The Leadership Quarterly. He has previously served as associate editor for The Leadership Quarterly and Organizational Research Methods, and is on the boards of several top academic journals including the Academy of Management Review and the Journal of Management. He is a fellow of the Society of Industrial and Organizational Psychology as well as the Association for Psychological Science. He has published in prestigious academic journals such as Science, Psychological Science, Academy of Management Journal, Intelligence, The Leadership Quarterly, Journal of Operations Management, Journal of Management, Harvard Business Review, Academy of Management Learning and Education, and Organizational Research Methods, among others. He has also published two books: The Nature of Leadership (two editions), and Being There Even When You Are Not: Leading Through Strategy, Structures, and Systems. He has been awarded or directed research funds totaling over Sfr. 2.3 million (about $2.45 million). He frequently consults for organizations on leadership and human resources issues, and provides talks, training, and workshops. His clients regularly include organizations in various business sectors including banks, manufacturing, high-tech, consulting, and finance as well as government organizations, NGOs, and athletics organizations. His research is regularly quoted in the international media and has been showcased on political and science-based TV shows. He engages a general audience in many science-based videos; for an example, see his TEDx talk on charisma: https://youtu.be/SEDvD1IICfE

Domain Adaptation for Visual Recognition: From Shallow to Deep


Public
Date/time:

Apr 24, 2017 11:00 AM

Mathieu Salzmann  

Abstract

In this talk, I will present our work on Domain Adaptation, which tackles scenarios where the training (source) and test (target) data have been acquired in different conditions. To address this, we have introduced learning algorithms that attempt to make the distributions of the source and target data as similar as possible. In particular, I will present a (shallow) transformation learning method, and discuss different measures that can be used to compare the source and target distributions. I will then turn to a Deep Learning approach, in which I will show that allowing the weights of the network to differ between the source and target samples yields better accuracy. I will show results on standard image recognition benchmarks, as well as on the task of leveraging synthetic data to train a classifier for real images.
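
One commonly used way to make "the distributions of the source and target data as similar as possible" is to penalise a kernel maximum mean discrepancy (MMD) between source and target features during training. The sketch below is a generic illustration under assumed names (`model.features` and `model.classify` are hypothetical), not necessarily the measure or architecture used in this work.

```python
# Sketch (assumed setup): domain adaptation by adding an RBF-kernel MMD
# penalty between source and target feature distributions.
import torch
import torch.nn.functional as F

def rbf_mmd(x, y, sigma=1.0):
    """Maximum mean discrepancy between two batches of feature vectors."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def adaptation_loss(model, x_src, y_src, x_tgt, lam=0.1):
    """Supervised loss on labelled source data plus MMD to unlabelled target data."""
    f_src = model.features(x_src)              # hypothetical feature extractor
    f_tgt = model.features(x_tgt)
    ce = F.cross_entropy(model.classify(f_src), y_src)
    return ce + lam * rbf_mmd(f_src, f_tgt)
```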


Deep Learning for Speech Processing: An NST Perspective


Idiap Speaker Series
Date/time:

Sep 27, 2016 11:00 AM

Prof. Mark Gales  

Abstract

The Natural Speech Technology EPSRC Programme Grant was a 5-year collaboration between Edinburgh, Cambridge and Sheffield Universities, with the aim of improving core speech recognition and synthesis technology. During the lifetime of the project, dramatic changes took place in the underlying technology for speech processing with the introduction of deep learning. This has yielded significant performance improvements, as well as offering a very rich space of models to investigate. This talk discusses the general area of deep learning for speech processing, with a particular emphasis on sequence-to-sequence models: in speech recognition, waveform to text; and in synthesis, text to waveform. Both generative and discriminative sequence-to-sequence models are described, along with variants on the standard topologies and the implications for both training and inference. Rather than focusing on results for particular models, the talk aims to describe the connections and differences between sequence-to-sequence models and the underlying assumptions for these models.


Biography

Mark Gales studied for the B.A. in Electrical and Information Sciences at the University of Cambridge from 1985-88. Following graduation he worked as a consultant at Roke Manor Research Ltd. In 1991 he took up a position as a Research Associate in the Speech Vision and Robotics group in the Engineering Department at Cambridge University. In 1995 he completed his doctoral thesis, Model-Based Techniques for Robust Speech Recognition, supervised by Professor Steve Young. From 1995-1997 he was a Research Fellow at Emmanuel College Cambridge. He was then a Research Staff Member in the Speech group at the IBM T.J. Watson Research Center until 1999, when he returned to Cambridge University Engineering Department as a University Lecturer. He was appointed Reader in Information Engineering in 2004. He is currently a Professor of Information Engineering and a College Lecturer and Official Fellow of Emmanuel College. Mark Gales is a Fellow of the IEEE, a Senior Area Editor of IEEE/ACM Transactions on Audio Speech and Language Processing for speech recognition and synthesis, and a member of the Speech and Language Processing Technical Committee (2015-2017, previously a member from 2001-2004). He was an associate editor for IEEE Signal Processing Letters from 2008-2011 and IEEE Transactions on Audio Speech and Language Processing from 2009-2013. He is currently on the Editorial Board of Computer Speech and Language. Mark Gales has been awarded a number of paper awards, including a 1997 IEEE Young Author Paper Award for his paper on Parallel Model Combination and a 2002 IEEE Paper Award for his paper on Semi-Tied Covariance Matrices.

TUTORIAL - Tutorial on Regression


Idiap Speaker Series
Date/time:

Jul 15, 2016 03:00 PM

Dr. Freek Stulp  

Abstract

Tutorial on Regression based on the article:

Freek Stulp and Olivier Sigaud (2015). Many Regression Algorithms, One Unified Model - A Review. Neural Networks, 69:60-79.

Link: http://freekstulp.net/publications/pdfs/stulp15many.pdf


Biography

http://freekstulp.net/#Bio Dr. Freek Stulp's research focuses on using machine learning and artificial intelligence to improve the robustness and adaptivity of planning and control for autonomous robots. One of his main research themes is enabling robots to autonomously acquire and refine skills through imitation and reinforcement learning. He received his doctoral degree in Computer Science from the Technische Universität München in 2007. He was awarded post-doctoral research fellowships from the Japanese Society for the Promotion of Science and the German Research Foundation (DFG) to pursue his research at the Advanced Telecommunications Research Institute International (Kyoto) and the University of Southern California (Los Angeles). From 2011 to 2015 he was an assistant professor at the École Nationale Supérieure de Techniques Avancées (ENSTA-ParisTech). Since March 2016 he has been the head of the new department of cognitive robotics at the German Aerospace Center (DLR) in Oberpfaffenhofen, Germany.

TALK - Robot Skill Learning: From Reinforcement Learning to Evolution Strategies


Idiap Speaker Series
Date/time:

Jul 15, 2016 11:00 AM

Dr. Freek Stulp  

Abstract

A popular approach to robot skill learning is to initialize a skill through imitation learning, and to then refine and improve the skill through reinforcement learning. In this presentation, I highlight three contributions to this approach:

1) Enabling skills to adapt to task variations by using multiple demonstrations for imitation learning,

2) Improving skills through reinforcement learning based on reward-weighted averaging and black-box optimization with evolution strategies (a generic sketch of this update follows below),

3) Using covariance matrix adaptation to automatically tune exploration during reinforcement learning.

Throughout the presentation I show several applications to challenging manipulation tasks on several humanoid robots.
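
The sketch below illustrates the reward-weighted averaging update referred to in the second contribution: candidate parameter vectors are sampled around the current policy, evaluated, and averaged with weights that grow exponentially with the obtained reward, while the sampling covariance is adapted from the same weighted samples. This is a generic PI^BB/PoWER-style illustration under assumed settings (the `rollout_reward` function is hypothetical), not Dr. Stulp's exact algorithm.

```python
# Generic sketch of one iteration of black-box policy improvement with
# reward-weighted averaging and covariance matrix adaptation.
import numpy as np

def improve(theta, cov, rollout_reward, n_samples=10, h=10.0, rng=None):
    rng = rng or np.random.default_rng()
    samples = rng.multivariate_normal(theta, cov, size=n_samples)   # exploration
    rewards = np.array([rollout_reward(s) for s in samples])
    # map rewards to non-negative weights (higher reward -> larger weight)
    z = (rewards - rewards.min()) / (rewards.max() - rewards.min() + 1e-12)
    w = np.exp(h * z)
    w /= w.sum()
    theta_new = w @ samples                                          # reward-weighted average
    diff = samples - theta_new
    cov_new = (w[:, None] * diff).T @ diff + 1e-6 * np.eye(len(theta))
    return theta_new, cov_new
```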


Biography

http://freekstulp.net/#Bio Dr. Freek Stulp's research focuses on using machine learning and artificial intelligence to improve the robustness and adaptivity of planning and control for autonomous robots. One of his main research themes is enabling robots to autonomously acquire and refine skills through imitation and reinforcement learning. He received his doctoral degree in Computer Science from the Technische Universität München in 2007. He was awarded post-doctoral research fellowships from the Japanese Society for the Promotion of Science and the German Research Foundation (DFG) to pursue his research at the Advanced Telecommunications Research Institute International (Kyoto) and the University of Southern California (Los Angeles). From 2011 to 2015 he was an assistant professor at the École Nationale Supérieure de Techniques Avancées (ENSTA-ParisTech). Since March 2016 he has been the head of the new department of cognitive robotics at the German Aerospace Center (DLR) in Oberpfaffenhofen, Germany.

Eliciting and recognising complex emotions and mental states including engagement and boredom


Idiap Speaker Series
Date/time:

Jul 07, 2016 02:00 PM

Harry Witchel* & Carina Westling#  

Abstract

Complex emotions are any emotional state except for Ekman's 6 basic emotions: happy, sad, fear, anger, surprise and disgust. Complex emotions can include mixtures of the basic emotions (e.g. horror), emotions outside the basic emotions (e.g. musical "tension"), and emotions mixed with mental states that are not emotions (e.g. engagement and boredom). Eliciting and recognising complex emotions, and allowing systems to respond to them, will be useful for eLearning, human factors (including vigilance), and responsive systems including human-robot interaction.

In this talk we will present our work towards the elicitation and recognition of conscious or subconscious responses. Engineering and psychological solutions to non-invasively determine such mental states and complex emotions may use movement, posture, facial expression, physiology, and sound. Furthermore, our team has shown that what people suppress is as revealing as what they do. We consider aspects of music listening, movie watching, game playing, quiz-taking, reading, and walking to untangle the complex emotions that can arise. The mental states of engagement and boredom are considered in relation to fidgeting and to Non-Instrumental Movement Inhibition (NIMI), in order to clarify fundamental research problems and direct research design toward improved solutions.


Biography

In 2016 Harry Witchel and Carina Westling published their ninth inter-disciplinary paper together, on Non-Instrumental Movement Inhibition. It received significant international media attention, including an article about it in Scientific American. Harry Witchel is Discipline Leader in Physiology at Brighton and Sussex Medical School at the University of Sussex. His research interests are: Nonverbal Behaviour; Motion Capture; Gait in Multiple Sclerosis; Soundscape; Engagement; Psychobiology. His laboratory uses wearable sensors, motion capture and time series analysis to determine the cognitive and behavioural correlates of engagement and disengagement in response to different psychologically relevant stimuli, especially music. He has performed experiments for many consultancy clients, including Honda, Nike, DHL and Tesco. He also has an international track record of promoting public engagement with science including appearances on the Discovery Channel, BBC World Service Radio, and the Financial Times. In 2004 he was awarded the national honour of the Charles Darwin Award lecture by the British Science Association. In 2011 his book on music was published: “You Are What You Hear: How Music and Territory Change Who We Are” (Algora, New York). Carina Westling researches live and mediated interaction design, and worked as a researching designer with Punchdrunk theatre company 2011-2014. She is the Creative Director of the Nimbus Group, who produce digital arts projects, including Giddy (2016), The Nimbus (2014), and 0-1 (2012). She is a contributing author to Digital Make-Believe, which was published in May 2016 (Springer, Berlin). Her research interests include interface design, interactive system narratives, audience research, spatial sound design, and nonverbal behaviour.

Training models with images: algorithms and applications


Idiap Speaker Series
Date/time:

Jun 22, 2016 11:00 AM

Asst Prof Gregoire Mariethoz  

Abstract

Multiple-point geostatistics (MPS) has received a lot of attention in the last decade for modeling complex spatial patterns. The underlying principle consists in representing spatial variability using training images. A common conception is that a training image can be seen as a prior for the desired spatial variability. As a result, a variety of algorithmic tools have been developed to generate stochastic realizations of spatial processes based on what can be seen broadly as texture generation algorithms.

While the initial applications of MPS were dedicated to the characterization of 3D subsurface structures and the study of geological/hydrogeological reservoirs, a new trend is to use MPS for the modeling of earth surface processes. In this domain, the availability of remote sensing data as a basis to construct training images offers new possibilities for representing complexity with such non-parametric, data-driven approaches. Repeated satellite observations or climate model outputs, available at a daily frequency for periods of several years, provide the pattern repetition required to obtain robust statistics on high-order patterns that vary in both space and time.

This presentation will delineate recent results in this direction, including MPS applications to the stochastic downscaling of climate models, the completion of partially informed satellite images, the removal of noise in remote sensing data, and modeling of complex spatio-temporal phenomena such as precipitation.


Biography

Grégoire Mariethoz was born in Neuchâtel (Switzerland) in 1978. He received a M.S. degree (2003), a MAS degree (2006) and a Ph.D. degree (2009) in hydrogeology from the University of Neuchâtel. In 2009-2010 he worked as a postdoctoral researcher at Stanford University, then between 2010 and 2014 he was a Senior Lecturer at UNSW Australia. Since 2014 he has been an Assistant Professor at the University of Lausanne, Switzerland. His interests include the development of spatial statistics algorithms and their application in hydrology, hydrogeology and remote sensing.

Adaptation of Neural Network Acoustic Models


Idiap Speaker Series
Date/time:

May 12, 2016 10:30 AM

Prof. Steve Renals  

Abstract

Neural networks can learn invariances through many layers of non-linear transformations. Explicit adaptation to speaker or acoustic characteristics can further improve accuracy. A good adaptation technique should: (1) have a compact representation that allows the speaker-dependent parameters to be estimated from small amounts of adaptation data and minimises storage requirements; (2) operate in an unsupervised fashion without requiring labelled adaptation data; and (3) allow for both test-only adaptation and speaker-adaptive training.

In this talk I'll discuss some approaches to the adaptation of neural network acoustic models - for both speech recognition and speech synthesis - with a focus on some approaches that we have explored in the "Natural Speech Technology" programme: factorised i-vectors, LDA domain codes, learning hidden unit contributions (LHUC), and differentiable pooling.
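
Of the approaches listed, learning hidden unit contributions (LHUC) is compact enough to sketch: each speaker gets a small vector of amplitude parameters that re-scales the hidden units of the speaker-independent network, and only those amplitudes are estimated on the adaptation data. The code below is a minimal, assumed PyTorch illustration, not the implementation behind the reported results.

```python
# Minimal sketch of LHUC-style speaker adaptation (assumed PyTorch setup).
import torch
import torch.nn as nn

class LHUC(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.a = nn.Parameter(torch.zeros(hidden_dim))   # per-speaker parameters

    def forward(self, h):
        # amplitude in (0, 2); equals 1.0 at initialisation (no change)
        return 2.0 * torch.sigmoid(self.a) * h

class AcousticModel(nn.Module):
    def __init__(self, in_dim=40, hidden_dim=512, out_dim=2000):
        super().__init__()
        self.layer1 = nn.Linear(in_dim, hidden_dim)
        self.lhuc = LHUC(hidden_dim)
        self.layer2 = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):
        return self.layer2(self.lhuc(torch.relu(self.layer1(x))))

model = AcousticModel()
for p in model.parameters():          # freeze the speaker-independent weights
    p.requires_grad = False
for p in model.lhuc.parameters():     # adapt only the LHUC amplitudes
    p.requires_grad = True
```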


Biography

Steve Renals is professor of Speech Technology and director of the Institute for Language, Cognition, and Communication in the School of Informatics, at the University of Edinburgh. Previously, he was director of the Centre for Speech Technology Research (CSTR). He received a BSc in Chemistry from the University of Sheffield in 1986, an MSc in Artificial Intelligence from the University of Edinburgh in 1987, and a PhD in Speech Recognition and Neural Networks, also from Edinburgh, in 1990. From 1991-92 he was a postdoctoral fellow at the International Computer Science Institute (ICSI), Berkeley, and was then an EPSRC postdoctoral fellow in Information Engineering at the University of Cambridge (1992-94). From 1994-2003 he was lecturer, then reader, in Computer Science at the University of Sheffield, moving to Edinburgh in 2003. He has over 200 publications in speech and language processing, and has led several large projects in the field, including EPSRC Programme Grant Natural Speech Technology and the AMI and AMIDA Integrated Projects. He is a senior area editor of the IEEE/ACM Transactions on Audio, Speech, and Language Processing and a member of the ISCA Advisory Council. He is a fellow of the IEEE, and a member of ISCA and of the ACM.

Securing Encrypted Biometric Authentication With Multi-Factor Liveness Detection And One Time Passwords


Public
Date/time:

May 04, 2016 02:00 PM

Kenneth Okereafor  

Abstract

Basic multi-biometric authentication systems were thought to have sealed the vulnerabilities and escape routes exploited by cyber criminals, but emerging attack patterns have proved otherwise. In spite of their benefits, multi-biometric systems also have peculiar challenges, especially circumvention of the security strategy. Circumvention refers to how susceptible the system or the presented biometric trait is to spoof attacks and identity fraud. Liveness detection has long been applied as an anti-spoofing mechanism to counter spoofing; however, the way it is typically applied has introduced new vulnerabilities. We have adopted the Structured Systems Analysis and Design Methodology (SSADM) to help us understand these weaknesses and propose a solution which integrates liveness detection to halt spoofing. In this seminar, we present a different approach to performing liveness detection in multi-biometric systems to significantly minimize the probability of circumvention and considerably strengthen the overall security strategy of the authentication process.


Biography

Kenneth Okereafor is a Ph.D student at the University of Azteca, Mexico. His doctoral research focuses on multi-biometric liveness detection. With over 18 years’ professional IT experience, he currently works with the Nigerian National Health Insurance Scheme (NHIS) as Assistant Director of Network Security and has facilitated several international presentations in cybersecurity. A multiple recipient of the United Nations Cybersecurity Scholarship award under the ITU Global Cybersecurity Agenda, Kenneth has a combined background in Electrical & Electronics Engineering and Computer Information Systems Security, with special interests in biometric security, electronic communications, and digital forensics. He is a certified Network Security Specialist.

The Lognormality Principle


Idiap Speaker Series
Date/time:

Mar 21, 2016 02:00 PM

Prof. Réjean Plamondon  

Abstract

The Kinematic Theory of rapid human movements and its family of lognormal models provide analytical representations of pen tip strokes, often considered as the basic unit of handwriting. This paradigm has not only been experimentally confirmed in numerous predictive and physiologically significant tests, but has also been shown to be the ideal mathematical description of the impulse response of a neuromuscular system. This proof has led to the postulation of the LOGNORMALITY PRINCIPLE. In its simplest form, this fundamental premise states that the lognormality of the neuromuscular impulse responses is the result of an asymptotic convergence, a basic global feature reflecting the behaviour of individuals who are in perfect control of their movements. As a corollary, motor control learning in young children can be interpreted as a migration toward lognormality. For the larger part of their lives, healthy human adults take advantage of lognormality to control their movements. Finally, as aging and health issues intensify, a progressive departure from lognormality occurs. To illustrate this principle, we present various software tools and psychophysical tests used to investigate the fine motor control of subjects, with respect to these ideal lognormal behaviors, from childhood to old age. In the latter case, we focus particularly on investigations dealing with brain strokes, Parkinson's and Alzheimer's diseases. We also show how lognormality can be exploited in many pattern recognition applications for the automatic generation of gestures, signatures, words and script-independent patterns as well as CAPTCHA production, graffiti generation, anthropomorphic robot control and even speech modelling. Among other things, this lecture aims at elaborating a theoretical background for many handwriting applications as well as providing some basic knowledge that could be integrated or taken care of in the development of new automatic pattern recognition systems to be used for e-Learning, e-Security and e-Health.
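
At the core of the theory is the lognormal speed profile of a single stroke: the magnitude of the pen-tip velocity follows a scaled, time-shifted lognormal, |v(t)| = D · Λ(t; t0, μ, σ). The short sketch below simply evaluates this profile; the parameter values are purely illustrative, since in practice they are estimated from recorded strokes.

```python
# Sketch of the lognormal speed profile |v(t)| = D * lognormal(t; t0, mu, sigma)
# used in the Kinematic Theory; parameter values are illustrative only.
import numpy as np

def lognormal_speed(t, D=1.0, t0=0.0, mu=-1.5, sigma=0.3):
    v = np.zeros_like(t)
    m = t > t0                       # the profile is zero before the stroke onset t0
    tau = t[m] - t0
    v[m] = D / (sigma * np.sqrt(2 * np.pi) * tau) * \
        np.exp(-(np.log(tau) - mu) ** 2 / (2 * sigma ** 2))
    return v

t = np.linspace(0.0, 1.0, 500)
print(lognormal_speed(t).max())      # peak speed of the illustrative stroke
```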


Biography

Réjean Plamondon is a Full Professor in the department of Electrical Engineering at École Polytechnique de Montréal and Head of Laboratoire Scribens at this institution. Throughout his career, he has been involved in many pattern recognition projects, particularly in the field of on-line and off-line handwriting analysis and processing. His main contribution has been the development of a kinematic theory of rapid human movements which can take into account, with the help of lognormal functions, the major psychophysical phenomena reported in studies dealing with rapid movement control. The theory has been found successful in describing the basic kinematic properties of velocity profiles as observed in finger, hand, arm, head and eye movements. Professor Plamondon has studied and analyzed these bio-signals extensively in order to develop creative and powerful methods and systems in various domains of engineering, publishing more than 300 papers on these topics. He is a Fellow of the Netherlands Institute for Advanced Study in the Humanities and Social Sciences (NIAS; 1989), of the International Association for Pattern Recognition (IAPR; 1994) and of the Institute of Electrical and Electronics Engineers (IEEE; 2000). He recently received the IAPR/ICDAR 2013 outstanding achievement award for “theoretical contributions to the understanding of human movement and its applications to signature verification, handwriting recognition, instruction, and health assessment, and for promoting on-line document processing in numerous multidisciplinary fields”.

How technology is opening up new potential for democracy, participation and collaboration


Idiap Speaker Series
Date/time:

Jan 19, 2016 11:00 AM

Gareth Morlais  

Abstract

The barriers to production are being lowered so it's a good time to build platforms which make it as simple as possible for everyone to join in and help train and refine language technologies, share their stories and spread the word. Gareth draws on digital storytelling with the BBC, democratic activism via hyperlocal journalism and tools for citizenship to see if there's a new way to corral people's enthusiasm for languages to help build better, more relevant resources.


Probabilistic Models for Music Performance: Interaction, Creation, Cognition


Idiap Speaker Series
Date/time:

Dec 14, 2015 02:30 PM

Dr. Baptiste Caramiaux  

Abstract

Music performance is an epitome of complex and creative motor skills. It is indeed striking that musicians can continuously show more physical virtuosity in playing their instrument and can show more creativity in varying their interpretation. Technology-mediated music performance has naturally explored the potential of interfaces and interactions for enhancing musical expression. It is however a difficult (and ill-posed) problem and musical interactive systems cannot yet challenge traditional instruments in terms of expressive control and skill learning.

I believe that an important aspect of the problem lies in the understanding of variability in the performer's movements. I will start my talk by presenting a computational approach based on probabilistic models, which is particularly suited to handle the uncertainty in motion data that stems from noise or intentional variations of the performers. I will then illustrate the potential of the approach in the design of expressive music interactions through experiments with proofs of concept developed and evaluated in the lab, as well as real-world applications in artistic projects and in industrial products for consumer devices. Finally, I will present my upcoming EU-funded research project that takes a more theoretical perspective by examining how this approach could potentially be used to infer an understanding of the cognitive processes underlying sensorimotor learning in music performance.


Biography

Baptiste Caramiaux is a Marie Skłodowska-Curie Research Fellow between McGill University (Montreal, Canada) and IRCAM (Paris, France). His current research focuses on the understanding of the cognitive processes of motor learning in musical performance and the computational modelling of these processes. Before that, he worked on gesture expressivity and the design of musical interactive systems through machine learning. He conducted academic research at Goldsmiths, University of London, and applied part of his academic research work to industrial products at Mogees Ltd. Baptiste holds a PhD in computer science from University Pierre et Marie Curie in Paris and IRCAM Centre Pompidou.

Shape, Medialness and Applications


Idiap Speaker Series
Date/time:

Sep 03, 2015 02:00 PM

Prof. Frederic Fol Leymarie  

Abstract

I will present on-going research in my group with a focus on shape understanding with applications to computer vision, robotics and the creative industries. I will principally discuss our recent work on building an algorithmic chain exploiting models of shape derived from the cognitive science literature but relating closely to well-known approaches in computer vision and computational geometry: that of medial descriptors of shape.

Recent relevant publications:

[1] Point-based medialness for 2D shape description and identification

P. Aparajeya and F. F. Leymarie

Multimedia Tools and Applications, May 2015

http://link.springer.com/article/10.1007%2Fs11042-015-2605-6

[2] Portrait drawing by Paul the robot

P. Tresset and F. F. Leymarie

Computers & Graphics, April 2013

Special Section on Expressive Graphics

http://www.sciencedirect.com/science/article/pii/S0097849313000149


Biography

Frederic Fol Leymarie has been a Professor of Computing at Goldsmiths, University of London since late 2004. Previously he was the co-founder of the SHAPE Lab at Brown University (1999) and later its lab manager (2002-4) while a postdoctoral fellow. He completed his PhD thesis at Brown in 2002 on the topic of 3D Shape Representation by Shock Scaffolds. This work was supported in part by two (US) NSF grants Frederic co-wrote and one IBM Doctoral Fellowship (1999). Since joining Goldsmiths, Frederic has launched and directed the MSc Arts Computing (2004-7), as well as the MSc Computer Games Entertainment (since 2008) and the MA Computer Games Art and Design (starting in Sept. 2015), both of the latter in collaboration with Prof. William Latham. More details on his publication record, research and other interests, and professional activities can be found on his LinkedIn profile via: www.folleymarie.com

Enabling novices to create behaviours for autonomous agents


Public
Date/time:

Jun 16, 2015 11:15 AM

Dr Stéphane Magnenat  

Abstract

This talk will present my research path under the overarching theme of enabling non-specialists to create behaviours for autonomous robots. I will start with a short description of my work on scaling up robot autonomy in the context of autonomous construction. I will then focus on modular 3-D mapping using the iterative closest point algorithm, and on programming by demonstration with a method requiring few user-defined parameters to be tuned. Next, I will present my work on teaching the computer science concept of event handling using the Thymio mobile robot. I will present quantitative and qualitative results with students of different ages, and will show an experiment exploring the use of augmented reality to provide real-time program tracing. Finally, I will propose a roadmap for future work.


Biography

Dr Stéphane Magnenat is currently an Associate Research Scientist at Disney Research Zürich. He received his PhD from EPFL in 2010, and before joining Disney, worked as a senior researcher at the Autonomous Systems Lab at ETH Zürich. In fall 2012, he visited Willow Garage in Menlo Park, CA, USA. He then visited Tufts University, MA, USA in 2013 and Aalto University, Helsinki, Finland in 2015. He is a co-founder and board member of Mobsya, the association producing the Thymio educational robot. His current research focuses on mobile robotics, CS education, and visual computing.

A hybrid approach to segmentation of speech


Public
Date/time:

Jun 12, 2015 11:00 AM

Prof. Hema Murthy  

Abstract

The most common approach to automatic segmentation of speech is to perform forced alignment using monophone HMM models that have been obtained using embedded reestimation after flat-start initialisation. Segmentation using this approach requires large amounts of data and does not work very well for low-resource languages. To address the issue of paucity of data, signal processing cues are used to restrict embedded reestimation.

Voice activity detection is first performed to determine the voiced regions in an utterance. Short-term energy (STE) and spectral flux (SF) are computed within voiced segments. STE yields syllable boundaries, while locations of significant change in spectral flux are indicative of fricatives and nasals. STE and SF cannot be used directly to segment an utterance. Minimum-phase group delay based smoothing is performed to preserve these landmarks while reducing local fluctuations. Boundary corrections are performed at the syllable level, wherever it is known that the syllable boundaries are correct. Embedded reestimation of monophone HMM models is then restricted to the syllable boundaries. The boundaries obtained using group delay smoothing result in a number of false alarms; HMM boundaries are used to correct them. Similarly, spectral flux is used to correct fricative boundaries. Thus, using signal processing cues and HMM reestimation in tandem, robust monophone HMM models are built. These models are then used in an HTS framework to build speech synthesis systems for a number (9 at the time of this presentation) of Indian languages. Both quantitative and qualitative assessments indicate that there is a significant improvement in the quality of synthesis.
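
For readers unfamiliar with these cues, the sketch below computes frame-level short-term energy and spectral flux from a waveform under assumed framing parameters; the minimum-phase group delay smoothing and boundary-correction steps described above are not reproduced here.

```python
# Sketch: frame-level short-term energy (STE) and spectral flux (SF).
# Frame and hop sizes are assumed values, not those used in the talk.
import numpy as np

def ste_and_flux(x, sr, frame_ms=25, hop_ms=10):
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    win = np.hamming(frame)
    ste, flux, prev_mag = [], [], None
    for start in range(0, len(x) - frame, hop):
        seg = x[start:start + frame] * win
        ste.append(np.sum(seg ** 2))                    # short-term energy
        mag = np.abs(np.fft.rfft(seg))
        if prev_mag is not None:
            flux.append(np.sum((mag - prev_mag) ** 2))  # spectral flux
        prev_mag = mag
    return np.array(ste), np.array(flux)
```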

In another experiment on key word spotting (KWS) in speech, the group delay based syllable boundaries are used to reduce the search space for keyword spotting on Indian English lectures. Appropriate score normalisation based on vowel normalisation in a neural network framework is used to learn the thresholds. An F-score of 72.32% was obtained on a subset of the NPTEL lectures (http://www.nptel.iitm.ac.in).


Discovering life patterns


Idiap Speaker Series
Date/time:

Jun 08, 2015 04:00 PM

Prof. Fausto Giunchiglia  

Abstract

The main goal of this proposal is to discover a person's life patterns (e.g., where she goes, what she does, how she is and feels and whom she spends time with) namely those situations that repeat themselves, almost but not exactly identical, with regularity, and to exploit this knowledge for improving her quality of life.

The challenge is how to synchronize a sensor- and data-driven representation of the world, which is noisy, imprecise and agnostic of the user's needs, with a knowledge-level representation of the world, which should be: (i) general, by allowing for the representation and integration of different combinations of sensors and interesting aspects of the user's life and, (ii) adaptive, by representing life happenings at the desired level of abstraction, capturing their progress, and adapting to changes in the life dynamics.

The solution exploits three main components: (i) a methodology and mechanisms for an incremental evolution of a knowledge level representation of the world (e.g., ontologies), (ii) an extension of deep learning to take into account and adapt to the constraints coming from the knowledge level and (iii) a Question Answering (Q/A) service which allows the user to interact with the computer according to her needs and terminology.


Biography

Fausto Giunchiglia is a professor of computer science at the University of Trento, an ECCAI fellow, and a member of Academia Europaea. Fausto’s current main interest is in providing a theory, algorithms and systems for handling highly heterogeneous big data in highly dynamic and unpredictable environments. The issues he is mainly interested in are (in decreasing order of importance) variety, veracity and vulnerability. His focus is on three types of data: open government data, enterprise data and personal data. Fausto has covered the whole spectrum from theory to technology transfer and innovation. Some relevant roles: member of the panel "Computer Science and Informatics" of the European Research Council (ERC), "ERC Advanced Grants" (2008 – present); chair of the International Advisory Board of the Scottish Informatics and Computer Science Alliance (SICSA) of the 10 Scottish universities. More than 40 invited talks at international events; chair of more than 10 international events; editor or editorial board member of around 10 journals, among them: Journal of Autonomous Agents and Multi-agent Systems, Journal of Applied Non-Classical Logics, Journal of Software Tools for Technology Transfer, Journal of Artificial Intelligence Research. He has held the following roles in scientific organizations: member of the IJCAI Board of Trustees (01-11), President of IJCAI (05-07), President of KR, Inc. (02-04), Advisory Board member of KR, Inc., Steering Committee of the CONTEXT conference. Fausto has coordinated and participated in various EC projects, among them: coordination of the FP7 FET IP Smart Society and of the FP7 FET IP Living Knowledge, local coordinator of the FP7 IP Cubrik, Open Knowledge, Knowledge Web.

Modeling Human Communication Dynamics


Idiap Speaker Series
Date/time:

May 11, 2015 11:00 AM

Prof. Louis-Philippe Morency  

Abstract

Human face-to-face communication is a little like a dance, in that participants continuously adjust their behaviors based on verbal and nonverbal cues from the social context. Today's computers and interactive devices are still lacking many of these human-like abilities to hold fluid and natural interactions. Leveraging recent advances in machine learning, audio-visual signal processing and computational linguistics, my research focuses on creating human-computer interaction (HCI) technologies able to analyze, recognize and predict subtle human communicative behaviors in social context. I formalize this new research endeavor with a Human Communication Dynamics framework, addressing four key computational challenges: behavioral dynamics, multimodal dynamics, interpersonal dynamics and societal dynamics. Central to this research effort is the introduction of new probabilistic models able to learn the temporal and fine-grained latent dependencies across behaviors, modalities and interlocutors. In this talk, I will present some of our recent achievements in modeling multiple aspects of human communication dynamics, motivated by applications in healthcare (depression, PTSD, suicide, autism), education (learning analytics), business (negotiation, interpersonal skills) and social multimedia (opinion mining, social influence).


Biography

Louis-Philippe Morency is an Assistant Professor in the Language Technologies Institute at Carnegie Mellon University, where he leads the Multimodal Communication and Machine Learning Laboratory (MultiComp Lab). He received his Ph.D. and Master's degrees from the MIT Computer Science and Artificial Intelligence Laboratory. In 2008, Dr. Morency was selected as one of "AI's 10 to Watch" by IEEE Intelligent Systems. He has received 7 best paper awards in multiple ACM- and IEEE-sponsored conferences for his work on context-based gesture recognition, multimodal probabilistic fusion and computational models of human communication dynamics. For the past two years, Dr. Morency has been leading a DARPA-funded multi-institution effort called SimSensei, which was recently named one of the year’s top ten most promising digital initiatives by the NetExplo Forum, in partnership with UNESCO.

Can biometric similarity scores be used to calculate forensically interpretable likelihood ratios?


Public
Date/time:

May 08, 2015 11:00 AM

Geoffrey Stewart Morrison  

Abstract

Dr Morrison is currently Scientific Counsel, Office of Legal Affairs, INTERPOL General Secretariat. He is contributing to the European Union funded Speaker Identification Integrated Project (SIIP), which aims to develop investigative and police intelligence solutions for law enforcement agencies, including sharing of data via INTERPOL. He is also an Adjunct Associate Professor, Department of Linguistics, University of Alberta. He has been Director of the Forensic Voice Comparison Laboratory, School of Electrical Engineering & Telecommunication, University of New South Wales; Chair of the Forensic Acoustics Subcommittee, Acoustical Society of America; and a Subject Editor for the journal Speech Communication. He has been involved in forensic casework in Australia and the United States.


Robust image feature extraction learning and object registration


Idiap Speaker Series
Date/time:

Apr 24, 2015 11:00 AM

Prof. Vincent Lepetit  

Abstract

Extracting image features such as feature points or edges is a critical step in many Computer Vision systems; however, this is still performed with carefully handcrafted methods. In this talk, I will first present a new Machine Learning-based approach to detecting local image features, with application to contour detection in natural images, but also biomedical and aerial images, and to feature point extraction under drastic weather and lighting changes. I will then show that it is also possible to learn efficient object descriptions based on low-level features for scalable 3D object detection.


Biography

Dr. Vincent Lepetit is a Professor at the Institute for Computer Graphics and Vision, TU Graz and a Visiting Professor at the Computer Vision Laboratory, EPFL. He received the PhD degree in Computer Vision in 2001 from the University of Nancy, France, after working in the ISA INRIA team. He then joined the Virtual Reality Lab at EPFL as a post-doctoral fellow and became a founding member of the Computer Vision Laboratory. He became a Professor at TU GRAZ in February 2014. His research interests include vision-based Augmented Reality, 3D camera tracking, Machine Learning, object recognition, and 3D reconstruction. He often serves as program committee member and area chair of major vision conferences (CVPR, ICCV, ECCV, ACCV, BMVC). He is an editor for the International Journal of Computer Vision (IJCV) and the Computer Vision and Image Understanding (CVIU) journal. http://www.icg.tugraz.at/Members/lepetit/vincent-lepetits-homepage

Medical visual information retrieval: techniques & evaluation


Idiap Speaker Series
Date/time:

Feb 19, 2015 11:00 AM

Prof. Henning Mueller  

Abstract

Medical imaging has enormously increased in importance and volume in medical institutions, particularly 3D tomographic imaging. Via digital analysis the knowledge stored in medical cases can be used for more than a single patient to help decision-making.

This presentation will highlight several challenges in medical image data processing starting with the VISCERAL EU project that evaluates segmentation, lesion detection and similar case retrieval on large amounts of medical 3D data using a cloud-based infrastructure for participants. The description of the MANY project highlights techniques for 3D texture analysis that can be used in a variety of contexts. Finally an overview of the radiology search system of the Khresmoi project will show a combination of the 3D data and the 3D analyses in a multi-modal environment.


Biography

Henning Müller studied medical informatics at the University of Heidelberg, Germany, then worked at Daimler-Benz research in Portland, OR, USA. From 1998-2002 he worked on his PhD degree at the University of Geneva, Switzerland, with a research stay at Monash University, Melbourne, Australia. Since 2002 Henning has been working in medical informatics at the University Hospital of Geneva, where he habilitated in 2008 and was named titular professor in medicine in 2014. Since 2007 he has also been a full professor at the HES-SO Valais, and since 2011 he has been responsible for the eHealth unit of the school. Henning was coordinator of the Khresmoi EU project, is scientific coordinator of the VISCERAL EU project and initiator of the ImageCLEF benchmark. He has worked on several other EU projects that include the access to and the analysis of medical data. He has authored over 400 scientific papers and is on the editorial board of several journals.

The role of electrochemical energy storage systems in a Smart Grid


Public
Date/time:

Feb 18, 2015 11:00 AM

Prof. Hubert Girault  

Abstract

He will present the demonstrator they are installing at the water treatment plant in Martigny. It is based on a redox flow battery able to produce hydrogen in order to maintain the battery at an optimum state of charge. He will therefore explain how a redox flow battery works and discuss its advantages and disadvantages. He will then present their concept of a service station for electric cars, with lithium batteries like the Tesla or with hydrogen fuel cells like the Hyundai ix35.


Data Valorisation based on Linked (open) Data approaches


Public
Date/time:

Feb 12, 2015 11:00 AM

Prof. Maria Sokhn  

Abstract

Maria will also present her group at the HES-SO Valais-Wallis.


Video Inpainting of Complex Scenes


Idiap Speaker Series
Date/time:

Feb 05, 2015 02:00 PM

Prof. Yann Gousseau  

Abstract

While image inpainting is a relatively mature subject whose numerical results are often visually striking, the automatic filling-in of video is still prone to yield incoherent results in many situations. Moreover, the subject is impaired by strong computational bottlenecks. In this talk, we present a patch-based approach to inpaint videos, relying on a global, multi-scale optimization heuristic. Contrary to previous approaches, the best patch candidates are selected using texture attributes that are built within a multi-scale video representation. We show that this rationale prevents the usual wash-out of textured and cluttered parts of video. Combined with an appropriate nearest neighbor search and a simple stabilization-like procedure, the resulting approach is able to successfully and automatically inpaint complex situations, including high resolution sequences with dynamic textures and multiple moving objects.


Biography

Yann Gousseau received the engineering degree from the École Centrale de Paris, France, in 1995, and the Ph.D. degree in applied mathematics from the University of Paris-Dauphine in 2000. He is currently a professor at Telecom ParisTech. His research interests include the mathematical modeling of natural images and textures, mono and multi-image restoration, computational photography, stochastic geometry, image analysis, computer vision and image processing.

Trainable Interaction Models for Embodied Conversational Agents


Idiap Speaker Series
Date/time:

Jan 08, 2015 11:00 AM

Dr. Mary Ellen Foster  

Abstract

Human communication is inherently multimodal: when we communicate with one another, we use a wide variety of channels, including speech, facial expressions, body postures, and gestures. An embodied conversational agent (ECA) is an interactive character -- virtual or physically embodied -- with a human-like appearance, which uses its face and body to communicate in a natural way. Giving such an agent the ability to understand and produce natural, multimodal communicative behaviour will allow humans to interact with such agents as naturally and freely as they interact with one another, enabling the agents to be used in applications as diverse as service robots, manufacturing, personal companions, automated customer support, and therapy.

To develop an agent capable of such natural, multimodal communication, we must first record and analyse how humans communicate with one another. Based on that analysis, we then develop models of human multimodal interaction and integrate those models into the reasoning process of an ECA. Finally, the models are tested and validated through human-agent interactions in a range of contexts.

In this talk, I will give three examples where the above steps have been followed to create interaction models for ECAs. First, I will describe how human-like referring expressions improve user satisfaction with a collaborative robot; then I show how data-driven generation of facial displays affects interactions with an animated virtual agent; finally, I describe how trained classifiers can be used to estimate engagement for customers of a robot bartender.


Biography

Mary Ellen Foster is a Research Fellow in the Interaction Lab at the School of Mathematical and Computer Sciences at Heriot-Watt University in Edinburgh, Scotland. She received her Ph.D. in Informatics from the University of Edinburgh, and has previously worked in the Robotics and Embedded Systems Group at the Technical University of Munich and in the School of Informatics at the University of Edinburgh. Her research interests include embodied communication, natural language generation, and multimodal dialogue systems. In particular, she is interested in designing, implementing, and evaluating practical artificial systems that support embodied interaction with human users, such as embodied conversational agents and human-robot dialogue systems. She has worked on European and national projects including COMIC, JAST, ECHOES, JAMES, and EMOTE.

Language identification@BUT


Public
Date/time:

Nov 12, 2014 11:00 AM

Pavel Matejka  

Abstract

This talk presents ongoing work in language identification for the DARPA RATS programme. The talk will describe an application of Neural Network Bottleneck (BN) features in Language Identification (LID). BN features are generally used for Large Vocabulary Speech Recognition in conjunction with conventional acoustic features, such as MFCC or PLP. We compare the BN features to several common types of acoustic features used in present-day state-of-the-art LID systems. The test set is from the DARPA RATS (Robust Automatic Transcription of Speech) program, which seeks to advance state-of-the-art detection capabilities on audio from highly degraded radio communication channels. On this type of noisy data, we show that, on average, the BN features provide a 45% relative improvement in the Cavg or Equal Error Rate (EER) metrics across several test duration conditions, with respect to our single best acoustic features.
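
For readers unfamiliar with bottleneck features, the sketch below illustrates the general idea in Python/PyTorch: a feed-forward network with a narrow hidden layer is trained on phone-state targets, and the activations of that narrow layer are then used as frame-level features by the LID back-end. The layer sizes, the 39-dimensional input with 11-frame splicing and the 80-dimensional bottleneck are illustrative assumptions, not the BUT configuration.

    # Minimal sketch of a bottleneck (BN) feature extractor. Dimensions and
    # targets are hypothetical; not the actual BUT system.
    import torch
    import torch.nn as nn

    class BottleneckNet(nn.Module):
        def __init__(self, n_in=39 * 11, n_hidden=1024, n_bn=80, n_targets=2000):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(n_in, n_hidden), nn.Sigmoid(),
                nn.Linear(n_hidden, n_hidden), nn.Sigmoid(),
                nn.Linear(n_hidden, n_bn),            # narrow bottleneck layer
            )
            self.classifier = nn.Sequential(
                nn.Sigmoid(),
                nn.Linear(n_bn, n_hidden), nn.Sigmoid(),
                nn.Linear(n_hidden, n_targets),       # phone-state outputs used only for training
            )

        def forward(self, x):
            return self.classifier(self.encoder(x))

        def bottleneck_features(self, x):
            # Linear activations of the narrow layer become the LID front-end features.
            with torch.no_grad():
                return self.encoder(x)

    # Usage: after training `net` on phone-state targets, per-frame BN features
    # replace MFCC or PLP features in the LID back-end.
    net = BottleneckNet()
    frames = torch.randn(100, 39 * 11)    # 100 spliced frames (dummy data)
    bn = net.bottleneck_features(frames)  # -> (100, 80) BN features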


Speech technologies - going from the research labs to market


Public
Date/time:

Nov 12, 2014 10:00 AM

Petr Schwarz  

Abstract

Several speech technologies, such as speech transcription, keyword spotting, language identification and speaker identification, will be discussed from an architecture point of view. Then, cases of how these speech technologies are used in call centers, banks, governmental agencies, and by broadcast service providers for speech data mining, voice analytics or voice biometry will be presented. Each client and use case has specific requirements on technology, data handling and services. These requirements and their implications for technology development and research will be discussed.


Pose estimation and gesture recognition using structured deep learning


Idiap Speaker Series
Date/time:

Oct 17, 2014 11:00 AM

Prof. Christian Wolf  

Abstract

In this talk I will address the problem of gesture recognition and pose estimation from videos, following two different strategies:

(i) estimation of articulated pose (full body or hand pose) alleviates subsequent recognition steps in many conditions and allows smooth interaction modes and tight coupling between object and manipulator;

(ii) in situations of low image quality (e.g. large distances between hand and camera), obtaining an articulated pose is hard. Training a deep model directly on video data can give excellent results in these situations.

We tackle both cases by training deep architectures capable of learning discriminative intermediate representations. The main goal is to integrate structural information into the model in order to decrease the dependency on large amounts of training data. To achieve this, we propose an approach for hand pose estimation that requires very little labelled data. It leverages both unlabeled data and synthetic data produced by a rendering pipeline. The key to making it work is to integrate structural information not into the model architecture, which would slow down inference, but into the training objective. We show that adding unlabeled real-world samples significantly improves results compared to a purely supervised setting.

In the context of multi-modal gesture detection and recognition, we propose a deep recurrent architecture that iteratively learns and integrates discriminative data representations from individual channels (pose, video, audio), modeling complex cross-modality correlations and temporal dependencies. It is based on multi-scale and multi-modal deep learning. Each visual modality captures spatial information at a particular spatial scale (such as motion of the upper body or a hand), and the whole system operates at two temporal scales. Key to our technique is a training strategy which exploits i) careful initialization of individual modalities; and ii) gradual fusion of modalities from strongest to weakest cross-modality structure.
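
As a rough illustration of that two-stage schedule, the sketch below (Python/PyTorch, with invented layer sizes and modality dimensions) first pretrains each modality encoder with its own temporary classification head, then trains a shared fusion head while adding modalities one at a time, strongest first. It is only a schematic stand-in for the multi-scale recurrent architecture used in the actual system.

    # Minimal sketch of per-modality initialization followed by gradual fusion.
    # All dimensions and the modality ordering are illustrative assumptions.
    import torch
    import torch.nn as nn

    def make_encoder(n_in, n_feat=128):
        return nn.Sequential(nn.Linear(n_in, 256), nn.ReLU(), nn.Linear(256, n_feat))

    encoders = {            # ordered from strongest to weakest modality (assumption)
        "pose":  make_encoder(n_in=60),
        "video": make_encoder(n_in=512),
        "audio": make_encoder(n_in=40),
    }
    n_classes = 21           # e.g. 20 gestures + "no gesture" (illustrative)

    # Stage 1: careful per-modality initialization -- each encoder gets its own
    # temporary classification head and is trained alone on the gesture labels.
    def pretrain(name, loader):
        head = nn.Linear(128, n_classes)
        opt = torch.optim.Adam(list(encoders[name].parameters()) + list(head.parameters()))
        for x, y in loader:
            loss = nn.functional.cross_entropy(head(encoders[name](x)), y)
            opt.zero_grad(); loss.backward(); opt.step()

    # Stage 2: gradual fusion -- modalities are added one by one to a shared
    # fusion head, strongest first, so cross-modal structure is learned incrementally.
    fusion = nn.Linear(128 * len(encoders), n_classes)
    def fuse_step(active, batches):
        params = list(fusion.parameters())
        for m in active:
            params += list(encoders[m].parameters())
        opt = torch.optim.Adam(params)
        for xs, y in batches:  # xs: dict modality name -> tensor
            feats = [encoders[m](xs[m]) if m in active else torch.zeros(y.size(0), 128)
                     for m in encoders]
            loss = nn.functional.cross_entropy(fusion(torch.cat(feats, dim=1)), y)
            opt.zero_grad(); loss.backward(); opt.step()

    # Usage (with hypothetical data loaders yielding batches):
    #   for m in encoders: pretrain(m, loaders[m])
    #   for k in range(1, len(encoders) + 1):
    #       fuse_step(active=list(encoders)[:k], batches=fused_loader)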

We present experiments on the "ChaLearn 2014 Looking at People Challenge" gesture recognition track, organized in conjunction with ECCV 2014, in which we placed 1st out of 17 teams. The objective of the challenge was to detect, localize and classify Italian conversational gestures from a large database of 13,858 gestures. The multimodal data included color video, range maps and a skeleton stream.

The talk will be preceded by a brief introduction to the work done in my LIRIS team.

Site : http://liris.cnrs.fr/christian.wolf/research/gesturerec.html


Biography

Christian Wolf received his MSc in computer science from Vienna University of Technology in 2000, and his PhD in computer science from the National Institute of Applied Science (INSA de Lyon), France, in 2003. In 2012 he obtained the habilitation diploma, also from INSA de Lyon. From September 2004 to August 2005 he was an assistant professor at the Louis Pasteur University, Strasbourg, and a member of the Computer and Image Science and Remote Sensing Laboratory (LSIIT). Since September 2005 he has been an assistant professor at INSA de Lyon and a member of LIRIS, a laboratory of the CNRS, where he is interested in computer vision and machine learning, especially in structured models, deep learning, gesture and activity recognition, and computer vision for robotics.

Fitting Ancient Texts into Modern Technology: The Maya Hieroglyphic Codices Database Project


Idiap Speaker Series
Date/time:

Jul 01, 2014 11:00 AM

Dr. Gabrielle Vail  

Abstract

The Maya hieroglyphic codices provide a rich dataset concerning astronomical beliefs, divinatory practices, and the ritual life of prehispanic Maya cultures inhabiting the Yucatan Peninsula in the years leading up to the Spanish conquest in the early sixteenth century. Structurally, the codices are organized in terms of almanacs and astronomical tables, both of which incorporate several types of data--calendrical, iconographic, and textual--that together allowed Maya scribes to encode complex relationships among deities, dates having ritual and/or celestial significance, and associated activities. In order to better understand these relationships, the Maya Hieroglyphic Codices Database project was initiated to develop sophisticated online research tools to aid in analysis of these manuscripts. Because the Maya scribes did not live in a culture that demanded strict adherence to paradigms that we take for granted when organizing information for electronic search and retrieval, this posed a significant challenge in efforts to discover how data contained in ancient manuscripts could be converted into data structures that facilitated computer searching and information retrieval. This presentation discusses the approaches taken by the author and the architect of the database project to find compromises that enable computer analysis of a set of texts created by scribes more than half a millennium ago, while avoiding the biases inherent in translating knowledge across spatial and cultural divides. The presentation will be made by Dr. Vail; the technical architect to the project, William Giltinan, will be available to answer questions at the conclusion of the lecture.


Biography

Gabrielle Vail specializes in the study of Maya hieroglyphic texts, with an emphasis on prehispanic Maya ritual and religion as documented in screenfold manuscripts painted in the fourteenth and fifteenth centuries. Her research is highlighted in numerous print and online publications, as well as the online Maya Codices Database (www.mayacodices.org), a collaborative project undertaken with funding from the National Endowment for the Humanities. Dr. Vail has published ten books and edited journals, most recently Códice de Madrid (Universidad Mesoamericana, 2013) and Re-Creating Primordial Time: Foundation Rituals and Mythology in the Postclassic Maya Codices (University Press of Colorado, 2013; with Christine Hernández). Dr. Vail received her Ph.D. from Tulane University in 1996 and holds a research and faculty position at New College of Florida in Sarasota, where she teaches courses on a variety of subjects, including the decipherment of Maya hieroglyphic texts and the astronomy of prehispanic cultures of the Americas. Technical architect: William Giltinan earned his bachelor’s degree in computer science from New College of Florida and a master’s degree in computer science and engineering from the University of Michigan. Following this, he spent more than a decade as a software engineer and entrepreneur in technology-driven enterprises. In 1992, he assumed the role of technical architect of the Maya Hieroglyphic Codices Database project and has continued in this capacity through the present. Mr. Giltinan returned to academia in 2003 to earn his Juris Doctorate and later his Master of Law degree in intellectual property law from the George Washington University Law School. He is a practicing intellectual property attorney and teaches patent law as an adjunct professor.

Recognising people, motion and actions in video


Idiap Speaker Series
Date/time:

Jun 11, 2014 11:00 AM

Prof. Richard Bowden  

Abstract

Learning to recognise the motion or actions of people in video has wide applications, covering topics from sign or gesture recognition through to surveillance and HCI. This talk will discuss approaches to video mining, allowing the discovery of weakly supervised spatiotemporal signatures such as actions embedded in video or signs/facial motion weakly supervised by language. Whether the task is recognising an atomic action of an individual or their implied activity, the continuous multichannel nature of sign language recognition or the appearance of words on the lips, all approaches can be categorised at the most basic level as the learning and recognition of spatio-temporal patterns. However, in all cases, inaccuracies in labelling and the curse of dimensionality lead us to explore new learning approaches that can operate in a weakly supervised setting. This talk will discuss the adaptation of mining to the video domain and new approaches to learning spatiotemporal signatures, covering a broad range of application areas such as facial feature extraction and regression, lip reading, activity recognition, and sign and gesture recognition in both 2D and 3D.


Biography

Prof Richard Bowden received a BSc degree in computer science from the University of London in 1993, an MSc degree with distinction from the University of Leeds in 1995, and a PhD degree in computer vision from Brunel University. He is currently Professor of computer vision and machine learning at the University of Surrey, United Kingdom, where he leads the Cognitive Vision Group within the Centre for Vision Speech and Signal Processing and was recently awarded a Royal Society Leverhulme Trust Senior Research Fellowship. He was a visiting research fellow at the University of Oxford from 2001 to 2004, working with Profs Zisserman and Brady. His research focuses on the use of computer vision to locate, track, and understand humans, with specific examples in sign and gesture recognition, activity and action recognition, lip-reading and facial feature tracking. His research into tracking and artificial life received worldwide media coverage, appearing at the British Science Museum and the Minnesota Science Museum. He has published more than 140 peer-reviewed papers and has served as either program committee member or area chair for ICCV, CVPR and ECCV, in addition to numerous international workshops and conferences. He was general chair for BMVC2012, track chair for ICPR2012, and is associate editor for the journal Image and Vision Computing and IEEE Transactions on Pattern Analysis and Machine Intelligence. He was a member of the British Machine Vision Association (BMVA) executive committee and a company director for seven years. He is a member of the BMVA, a fellow of the Higher Education Academy, and a senior member of the IEEE. He has held over 20 research grants worth in excess of £5M and supervised over fifteen PhD students. His research has been recognised by prizes, plenary talks and media/press coverage, including the Sullivan thesis prize in 2000 and many best paper awards.

On the use of multimodal cues for the modeling of group involvement and individual engagement in multiparty dialogue


Public
Date/time:

Jun 05, 2014 10:30 AM

Catharine Oertel  

Abstract

Multiparty conversations are characterized by various degrees of participant engagement and group involvement. Humans are able to detect and interpret these degrees, basing their perception on multimodal cues. Automatic detection, however, poses many challenges, in particular for larger groups of people. In this talk, I will mainly focus on a study in which we analysed group behaviour in an eight-party, multimodal corpus. We propose four features that summarize different aspects of eye-gaze patterns and allow us to describe individual engagement as well as group involvement over time. Our overall aim is to build a system that is able to foster group involvement.

In addition, I will briefly comment on two studies in which we use the robot head Furhat to advance in this direction. Furhat is a robotic head that combines state-of-the-art facial animation with physical embodiment in order to facilitate multi-party dialogues with robots.


Biography

Catharine Oertel is a PhD candidate at the Department of Speech, Music and Hearing at the Royal Institute of Technology (KTH) in Sweden since 2012. She is a member of the Speech group and is supervised by Prof. Joakim Gustafson. She received her Master's degree in Linguistics: Communication, Cognition and Speech Technology from Bielefeld University in 2010. From 2010-2012 she was a member of the Speech Communication Lab at Trinity College, Dublin. Her work has mainly been focused on the multi-modal modeling of conversational dynamics but she has also been active in the area of Human-Robot-Interaction.

The Web: Wisdom of Crowds or Wisdom of a Few?


Idiap Speaker Series
Date/time:

May 21, 2014 03:00 PM

Prof. Ricardo Baeza-Yates  

Abstract

The Web continues to grow and evolve very fast, changing our daily lives. This activity represents the collaborative work of the millions of institutions and people that contribute content to the Web, as well as the more than two billion people that use it. In this ocean of hyperlinked data there is explicit and implicit information and knowledge. But what does the Web look like? What are people's activities? How is content generated? Web data mining is the main approach to answering these questions. Web data comes in three main flavors: content (text, images, etc.), structure (hyperlinks) and usage (navigation, queries, etc.), implying different techniques such as text, graph or log mining. Each case reflects the wisdom of some group of people that can be used to make the Web better; for example, user-generated tags in Web 2.0 sites. In this presentation we explore the wisdom of crowds in relation to several dimensions such as bias, privacy, scalability, and spam. We also cover related concepts such as the long tail of the special interests of people, or the digital desert, content that nobody sees.


Biography

Ricardo Baeza-Yates is VP of Yahoo! Labs for Europe and Latin America, leading the labs in Barcelona, Spain and Santiago, Chile, since 2006. Between 2008 and 2012 he also oversaw the Haifa lab. He is also a part-time Professor at the Dept. of Information and Communication Technologies of the Universitat Pompeu Fabra in Barcelona, Spain. During 2005 he was an ICREA research professor at the same university. Until 2004 he was Professor and Director of the Center for Web Research at the Dept. of Computing Science of the University of Chile (on leave of absence until today). He obtained a Ph.D. from the University of Waterloo, Canada, in 1989. Before that, he obtained two master's degrees (M.Sc. CS & M.Eng. EE) and the electrical engineering degree from the University of Chile in Santiago. He is co-author of the best-seller Modern Information Retrieval textbook, published in 1999 by Addison-Wesley with a second enlarged edition in 2011, which won the ASIST 2012 Book of the Year award. He is also co-author of the 2nd edition of the Handbook of Algorithms and Data Structures, Addison-Wesley, 1991, and co-editor of Information Retrieval: Algorithms and Data Structures, Prentice-Hall, 1992, among more than 500 other publications. From 2002 to 2004 he was elected to the board of governors of the IEEE Computer Society, and in 2012 he was elected to the ACM Council. He has received the Organization of American States award for young researchers in exact sciences (1993), the Graham Medal for innovation in computing given by the University of Waterloo to distinguished ex-alumni (2007), the CLEI Latin American distinction for contributions to CS in the region (2009), and the National Award of the Chilean Association of Engineers (2010), among other distinctions. In 2003 he was the first computer scientist to be elected to the Chilean Academy of Sciences, and since 2010 he is a founding member of the Chilean Academy of Engineering. In 2009 he was named ACM Fellow and in 2011 IEEE Fellow.

Interpersonal synchrony: social signal processing and social robotics for revealing social signatures


Idiap Speaker Series
Date/time:

Apr 15, 2014 11:00 AM

Prof. Mohamed Chetouani  

Abstract

Social signal processing is an emerging research domain with rich and open fundamental and applied challenges. In this talk, I'll focus on the development of social signal processing techniques for real applications in the field of psycho-pathology. I'll overview recent research and investigation methods allowing neuroscience, psychology and developmental science to move from isolated-individual paradigms to interactive contexts by jointly analyzing the behaviors and social signals of partners. Starting from the concept of interpersonal synchrony, we'll show how to address the complex problem of evaluating children with pervasive developmental disorders. These techniques are also demonstrated in the context of human-robot interaction by a new way of using robots in autism (moving from assistive devices to clinical investigation tools). I will finish by closing the loop between behaviors and physiological states by presenting new results on oxytocin and proxemics during early parent-infant interactions.


Biography

Prof. Mohamed Chetouani is the head of the IMI2S (Interaction, Multimodal Integration and Social Signal) research group at the Institute for Intelligent Systems and Robotics (CNRS UMR 7222), University Pierre and Marie Curie-Paris 6. He received the M.S. degree in Robotics and Intelligent Systems from the UPMC, Paris, 2001. He received the PhD degree in Speech Signal Processing from the same university in 2004. In 2005, he was an invited Visiting Research Fellow at the Department of Computer Science and Mathematics of the University of Stirling (UK). Prof. Chetouani was also an invited researcher at the Signal Processing Group of Escola Universitaria Politecnica de Mataro, Barcelona (Spain). He is currently a Full Professor in Signal Processing, Pattern Recognition and Machine Learning at the UPMC. His research activities, carried out at the Institute for Intelligent Systems and Robotics, cover the areas of social signal processing and personal robotics through non-linear signal processing, feature extraction, pattern classification and machine learning. He is the head of the interdisciplinary research group IMI2S (Interaction, Multimodal Integration and Social Signal) gathering researchers from social signal processing, social robotics, psycho-pathology and neuroscience. This group develops models and methods for the analysis, recognition and prediction of social signals, behaviors with a life-span perspective with a particular attention to disorders (autism, Alzheimer). He has published numerous research papers including some in high impact journals (Plos One, Biology Letters, Pattern Recognition, IEEE Transactions on Audio, Speech and Language Processing). He is also the co-chairman of the French Working Group on Human-Robots/Systems Interaction (GDR Robotique CNRS) and a Deputy Coordinator of the Topic Group on Natural Interaction with Social Robots (euRobotics).

Anthropomorphic media design and attention modeling


Public
Date/time:

Mar 10, 2014 11:00 AM

Dr. Tomoko Yonezawa and Ms. Yukari Nakatani  

Abstract

In this talk, we would like to introduce our past trials on human-robot and human-agent interactions, especially focusing on the user's attention and gaze communication.

At first, in "Communication on Anthropomorphic Media", Dr. Tomoko Yonezawa will make a presentation on the past researches on gaze-communication and the robot's behaviors. Additionally, she will talk about her current research on touch interaction between human and wearable robot.

Second, in "Presences with Avatars' Appearances Attached to Tex Communication in Twitter", Ms. Yukari Nakatani introduces her research theme on the representations of multiple virtual agents for sustainable communications in SNS.

Finally, we will introduce the students' research in our laboratory with some presentation videos.


Building a Multilingual Heritage Corpus with Applications in Geo-Tagging and Machine Translation


Public
Date/time:

Mar 03, 2014 04:00 PM

Martin Volk  

Abstract

In this talk Martin Volk will present the Text+Berg project, an initiative to digitize and annotate all the yearbooks of the Swiss Alpine Club from its start in 1864 until today. The resulting corpus of 40 million words contains texts in the 4 official Swiss languages, with a large parallel part in German and French. Based on these translations Martin's group works on domain-specific machine translation systems, but also on search systems for word-aligned parallel corpora as a new resource for translators and linguists. Most of the yearbooks (more than 100'000 pages) were scanned and converted to text at the University of Zurich. Martin Volk will share his experiences on automatically correcting OCR errors as well as on dealing with tokenization, lemmatization and PoS-tagging issues in a corpus that spans 150 years and multiple languages. He will also report on the Text+Berg toponym detection and classification as well as person name recognition and tagging of temporal expressions. Recently the group has released Kokos, a system for collaborative correction of OCR errors in the yearbooks of the 19th century (http://kokos.cl.uzh.ch) and asked the SAC members to join in creating a clean corpus.


Biography

Martin Volk is Professor of Computational Linguistics at the University of Zurich. His research focuses on multilingual systems, in particular on Machine Translation. His group has been investigating domain adaptation techniques for statistical machine translation, hybrid machine translation for lesser resourced languages, and machine translation into sign language. He is also known for his work on machine translation of film and TV subtitles. Together with Noah Bubenhofer he is leading the Text+Berg project for the digitization and annotation of a large multilingual heritage document as a showcase in the Digital Humanities.

Recent trends and future challenges in action recognition


Idiap Speaker Series
Date/time:

Feb 14, 2014 11:00 AM

Dr. Ivan Laptev  

Abstract

This talk will overview recent progress and open challenges in human action recognition. Specifically, I will focus on three problems: (i) action representation in video, (ii) weakly-supervised action learning and (iii) ambiguity of action vocabulary. For the first problem, I will overview local feature methods providing state-of-the-art results on current action recognition benchmarks. Motivated by the difficulty of large-scale video annotation, I will next present our recent work on weakly-supervised action learning from video and corresponding video scripts. I will finish by highlighting limitations of the standard action classification paradigm and will show some of our work addressing this problem.


Biography

Ivan Laptev is a research director at INRIA Paris-Rocquencourt, France. He received his PhD degree in Computer Science from the Royal Institute of Technology (KTH) in 2004 and a Master of Science degree from the same institute in 1997. He was a research assistant at the Technical University of Munich (TUM) during 1998-1999. He joined INRIA as a postdoc in 2004 and became a full-time INRIA researcher in 2005. Ivan's main research interests include visual recognition of human actions, objects and interactions. He has published over 50 papers at international conferences and in journals of computer vision and machine learning. He serves as an associate editor of the International Journal of Computer Vision and the Image and Vision Computing Journal, he was/is an area chair for CVPR 2010, ICCV 2011, ECCV 2012, CVPR 2013 and ECCV 2014, and he has co-organized several workshops and tutorials on human action recognition at major computer vision conferences. He has also co-organized a series of INRIA summer schools on computer vision and machine learning (2010-2013). Ivan was awarded an ERC Starting Grant in 2012.

Privacy & Trust Challenges in Open Public Display Networks


Idiap Speaker Series
Date/time:

Jan 21, 2014 11:00 AM

Prof. Marc Langheinrich  

Abstract

Future public displays have the potential to become much more than simple digital signage -- they can form the basis for a novel communication medium. By interconnecting displays and opening them up to applications and content from a wide range of sources, they can not only support individuals and their communities, but also increase their relevance and ultimately their economic benefits. Ultimately, open display networks could have the same impact on society as radio, television and the Internet. In this talk, I will briefly summarize this vision and its related challenges, in particular with respect to privacy and trust, and present the work that we did in this area in the context of a recently finished FET-Open project titled "PD-Net".


Biography

Marc Langheinrich is an Associate Professor at the Università della Svizzera italiana (USI) in Lugano, Switzerland. Marc received his PhD (Dr. sc. ETH) on the topic of "Privacy in Ubiquitous Computing" from the ETH Zurich, Switzerland, in 2005. He has published extensively on both privacy and usability of ubiquitous and pervasive computing systems, and is a regular program committee member of various conferences and workshops in the areas of pervasive computing, security and privacy, and usability. Marc currently serves on the editorial board of IEEE Pervasive Computing Magazine and Elsevier's "Personal and Mobile Communications" Journal, and is a Steering Committee member of the UbiComp and IoT conference series.

Cost-effective, Autonomic and Adaptive Cloud Resource Management


Public
Date/time:

Dec 18, 2013 10:00 AM

Thanasis Papaioannou  

Abstract

Current large scale web applications pose enormous and dynamic processing and storage requirements. Failures of any type are common in current datacenters, partly due to the higher scales of the data stored. As data scales up, its availability becomes more complex, while different availability levels per application or per data item may be required. At the same time, cloud infrastructures should be able to effectively deal with the elastic nature of these applications in an autonomic manner. To make things worse, as clients get increasingly averse to vendor lock-in and data unavailability risks, client data has to be efficiently split across clouds. In this talk, we briefly discuss three very effective cloud resource management solutions that deal with the different aforementioned requirements: Skute, Scarce and Scalia. Skute is a self-managed key-value store that dynamically allocates the resources of a data cloud to several applications in a cost-efficient and fair way. Scarce is a decentralized economic approach for dynamically adapting the cloud resources of various applications, so as to statistically meet their SLA performance and availability goals in the presence of varying loads or failures. Scalia is a cloud storage brokerage solution that continuously adapts the placement of data based on its access pattern, subject to optimizations objectives and data placement constraints, such as storage costs and vendor lock-in avoidance.


Biography

Dr. Thanasis G. Papaioannou is a senior researcher at the Information Technologies Institute of the Center for Research and Technology Hellas (CERTH). Formerly, he was a postdoctoral fellow at the Distributed Information Systems Laboratory of Ecole Polytechnique Fédérale de Lausanne (EPFL). He received his B.Sc. (1998) and M.Sc. (2000) in Networks and in Parallel/Distributed Systems from the Department of Computer Science, University of Crete, Greece, and his Ph.D. (2007) from the Department of Computer Science, Athens University of Economics and Business (AUEB). From spring 2007 to spring 2008, he was a Visiting Professor in the Department of Computer Science of AUEB, teaching i) Distributed Systems and ii) Networks - Network Security. He has over 45 publications in high quality journals and conferences including Springer Electronic Commerce Research, Elsevier Computer Networks Journal, INFOCOM'13, EDBT'13, CIKM'12, ACM SC'12 (SuperComputing), IEEE ICDE'10, ACM SOCC'10, IEEE CCGRID'11, INFOCOM'08, etc. He has been TPC member in over 25 conferences including SSDBM'14, ICDCS'13, SIGMOD Demo'13, SSDBM'13, SIGMOD Demo'12, SSDBM'12, ICDE'12, SocInfo'10, ICEC '07-09, Valuetools'08, etc.

Statistical methods for environmental modelling and monitoring


Public
Date/time:

Nov 29, 2013 10:00 AM

Dr. Eric A. Lehmann  

Abstract

The CSIRO Division of Computational Informatics (CCI) aims to transform information and decision making to enhance productivity, foster collaboration and deliver impact through services across a wide range of sectors. CCI researchers have in-depth expertise in applying statistical and mathematical methods in a variety of scientific fields including, among others, environmental and agricultural informatics, wireless sensor networks, information and communication technologies for healthcare and clinical treatment, development of early screening tests for Alzheimer's disease (bioinformatics), computational and simulation sciences (high performance computing), as well as statistical modelling for seasonal climate forecasting and complex biogeochemical systems (e.g. marine environments).

This presentation will focus on some aspects of the research being carried out at CCI on applications of statistical and computational methods for environmental modelling and natural resource management. In particular, I will present an overview of my recent work on the following topics:

- multi-sensor integration of remote sensing data for large-scale vegetation mapping and monitoring,

- data fusion methods for water resources assessment using ground-based and remote sensing data, and

- spatial modelling of extreme weather events and associated risks in the context of a changing climate.

These projects involve several aspects of multivariate Bayesian modelling and analysis (spatial and temporal), computational simulation methods (Markov chain Monte Carlo), issues of data quality and continuity, as well as scientific dissemination and stakeholder engagement.


Biography

Eric Lehmann graduated in 1999 from the Swiss Federal Institute of Technology in Zurich (ETHZ) with a Dipl. El.-Ing. ETH diploma (M.Sc. in Electrical Engineering). He received the M.Phil. and Ph.D. degrees, both in Electrical Engineering, from the Australian National University (Canberra) in 2000 and 2004, respectively. From 2004 to 2008, he held various research positions with National ICT Australia (NICTA) in Canberra and the Western Australian Telecommunications Research Institute (WATRI) in Perth, where he was active in the field of acoustics, array signal processing and beamforming, with emphasis on sequential Monte Carlo methods (particle filtering) for acoustic speaker localisation and tracking. He now works as a Research Scientist for CSIRO in Perth, within the division of Computational Informatics. His current work involves the development of statistical image processing techniques for remote sensing imagery (optical and synthetic aperture radar), with a focus on the multi-sensor analysis and integration of spatiotemporal data for environmental mapping and monitoring. He also contributes to the development of Bayesian hierarchical methods for natural resource management and climate modelling purposes.

Robot learning by imitation and exploration with probabilistic dynamical systems


Public
Date/time:

Nov 22, 2013 10:00 AM

Dr. Sylvain Calinon  

Abstract

Robots in current industrial settings reproduce repetitive movements in a stiff and precise manner, with sensory information often limited to the role of stopping the motion if a human or object enters the robot's workspace. The new developments in robot sensors and compliant actuators bring a new human-centric perspective to robotics. An increase of robots in small and medium-sized enterprises (SMEs) is predicted for the next few years. Products in SMEs are characterized by small batch sizes, short life-cycles and end-user driven customization, requiring frequent re-programming of the robot. SMEs also often involve confined spaces, so that the robots must work in safe collaboration with the users by generating natural movements and anticipating co-workers' movements with active perception and human activity understanding.

Interestingly, these robots are much closer to human capabilities in terms of compliance, precision and repeatability. In contrast to previous technology, the planning, control, sensing and interfacing aspects must work hand-in-hand, where the robot is only one part of a broader robotics-based technology. The variety of signals to process and the richness of interaction with the users and the environment constitute a formidable area of research for machine learning.

Current programming solutions used by the leading commercial robotics companies do not satisfy the new requirements of re-using the same robot for different tasks and interacting with multiple users. The representation of manipulation movements must be augmented with forces (for task execution, but also as a communication channel for collaborative manipulation), compliance and reactive behaviors. An attractive approach to the problem of transferring skills to robots is to take inspiration from the way humans learn by imitation and self-refinement.

I will present a task-parametrized model based on dynamic movement primitives and Gaussian mixture regression to exploit the local correlations in the movement and the varying accuracy requirements of the task. The model is used to devise a controller for the robot that can adapt to new situations and that is safe for the surrounding users. Examples of applications with a compliant humanoid and with gravity-compensated manipulators will be showcased.
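
The sketch below illustrates the Gaussian mixture regression step on which such models rely: a joint GMM is fitted over (time, position) samples from demonstrations, and conditioning on time yields a smooth reproduction of the movement. It uses scikit-learn and synthetic 2-D data, and leaves out the task parametrization, the dynamic movement primitives and the robot controller described in the talk.

    # Minimal sketch of Gaussian mixture regression (GMR) over demonstration data.
    # Synthetic data and scikit-learn are illustrative assumptions.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def gmr(gmm, t, d_in=1):
        """Conditional mean E[y | x = t] of a joint GMM fitted over [x, y]."""
        means, covs, weights = gmm.means_, gmm.covariances_, gmm.weights_
        y_dim = means.shape[1] - d_in
        num, den = np.zeros(y_dim), 0.0
        for k in range(gmm.n_components):
            mu_x, mu_y = means[k, :d_in], means[k, d_in:]
            S_xx, S_yx = covs[k][:d_in, :d_in], covs[k][d_in:, :d_in]
            diff = t - mu_x
            # responsibility of component k for the query input
            h = weights[k] * np.exp(-0.5 * diff @ np.linalg.solve(S_xx, diff)) \
                / np.sqrt(np.linalg.det(2 * np.pi * S_xx))
            num += h * (mu_y + S_yx @ np.linalg.solve(S_xx, diff))
            den += h
        return num / den

    # Demonstrations: noisy 2-D positions along a curve, indexed by time.
    t = np.linspace(0, 1, 200)
    demos = np.column_stack([t, np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
    demos[:, 1:] += 0.02 * np.random.randn(200, 2)

    gmm = GaussianMixture(n_components=5, covariance_type="full").fit(demos)
    reproduced = np.array([gmr(gmm, np.array([ti])) for ti in t])  # smooth reproduction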


Biography

Dr Sylvain Calinon is Team Leader of the Learning and Interaction Lab at the Italian Institute of Technology (IIT), and a visiting researcher at the Learning Algorithms and Systems Laboratory (LASA), Ecole Polytechnique Fédérale de Lausanne (EPFL). He received a PhD on robot programming by demonstration in 2007 from LASA, EPFL, which was awarded the Robotdalen Scientific Award, the ABB Award and the EPFL-Press Distinction. From 2007 to 2009, he was a postdoctoral research fellow at LASA, EPFL. His research interests cover robot learning by imitation, machine learning and human-robot interaction. Webpage: http://programming-by-demonstration.org/SylvainCalinon/

Quality in Face and Iris Research


Public
Date/time:

Nov 20, 2013 10:30 AM

Dr. Stephanie Schuckers  

Abstract

Because of limited resources (e.g. number and type of cameras, amount of time to focus on an individual, real-time processing power), using intelligence within standoff biometric capture systems can help in determining which individuals to focus on and for how long. Benchmark datasets available to the general research community are needed for designing a stand-off multimodal biometric system. The overall goal of the research is to investigate fusion approaches to measure face, iris, and voice through experiments for identity at distances from 10 to 25 meters. This research includes a growing corpus of data, the Quality in Face and Iris Research Ensemble (Q-FIRE) dataset, which includes the following: (1) Q-FIRE Release 1 (made available in early 2010) is composed of 4T of face and iris video for 90 subjects out to 8.3 meters (25 feet) with controlled quality degradation. (2) Release 2 is an additional 83 subjects with the same collection specifications. Releases 1 and 2 were used by NIST in IREX II: Iris Quality Calibration and Evaluation (IQCE). (3) Last, an extension of the dataset has been collected with unconstrained behavior of subjects on the same set of subjects, entitled Q-FIRE Phase II Unconstrained, out to 8.3 meters. In this talk, the datasets will be described, as well as results of experiments fusing face and iris scores with quality.

http://people.clarkson.edu/~sschucke/
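
As a toy illustration of score-level fusion with quality, the sketch below weights face and iris matcher scores by per-sample quality measures before summing them; the weighting rule, the score range and the example values are assumptions for illustration, not the Q-FIRE evaluation protocol.

    # Minimal sketch of quality-weighted score-level fusion of two matchers.
    import numpy as np

    def fuse(face_score, iris_score, face_quality, iris_quality):
        """Weighted-sum fusion: each matcher's score is weighted by its sample quality."""
        q = np.array([face_quality, iris_quality], dtype=float)
        w = q / q.sum()                                      # qualities -> fusion weights
        s = np.array([face_score, iris_score], dtype=float)  # scores assumed in [0, 1]
        return float(w @ s)

    # A blurry iris sample (low quality) contributes little to the fused score.
    print(fuse(face_score=0.82, iris_score=0.30, face_quality=0.9, iris_quality=0.2))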


Multimodal Interaction with Humanoid Robots


Public
Date/time:

Nov 19, 2013 10:00 AM

Prof. Kristiina Jokinen  

Abstract

In this talk I will discuss issues related to multimodal interaction with intelligent agents, and in particular, present Nao WikiTalk, an application that enables the user to query Wikipedia via the Nao robot. The robot can talk about an unlimited range of topics, so it supports open-domain conversations using Wikipedia as a knowledge source. The robot suggests some topics to start with, and the user can shift to related topics by speaking the topic names after the robot mentions them. The user can also switch to a totally new topic by spelling the first few letters. The challenge in presenting Wikipedia information is how to convey its structure to the user so that she can understand what is new information, and how to navigate in the topic structure. In Wikipedia, new relevant information is marked with hyperlinks to other entries, and the robot's interaction capabilities have been extended so that it signals these links non-verbally while reading the text. As well as speaking, the robot uses gestures, nods and other multimodal signals to enable clear and rich interaction. Gesture and posture changes can also be used to manage turn-taking, and to add liveliness to the interaction in general. To manage the interaction in a smooth way, it is also important to capture the user's emotional and attentional state. For this, we have experimented with gazing and face tracking to infer the user's interest level. The Nao WikiTalk system was evaluated by comparing the users' expectations with their experience of the robot interaction. In many respects the users had high expectations regarding the robot's interaction capabilities, but they were impressed by the robot's lively appearance and natural gesturing.


Biography

Kristiina Jokinen is Adjunct Professor and Research Manager at the University of Helsinki, and she is also Adjunct Professor of Interaction Technology at the University of Tampere, Finland, and Visiting Professor at the University of Tartu, Estonia. She received her PhD from the University of Manchester, UK, and spent altogether four years as a post-doc at NAIST and as an invited researcher at ATR in Japan. In 2009-2010 she was Visiting Professor at Doshisha University in Kyoto. Her research focuses on spoken dialogue modelling, multimodal interaction management (especially gestures and eye gaze), natural language communication, and human-machine interaction. She has published many papers and articles, and three books: "Constructive Dialogue Modelling - Speech Interaction and Rational Agents" (John Wiley), "Spoken Dialogue Systems" (together with M. McTear; Morgan & Claypool), and "New Trends in Speech-based Interactive Systems" (edited together with F. Chen; Springer). She has been an invited speaker, e.g. at IWSDS 2010 and the Multimodal Symposium in 2013. She organised the Nordic Research Training Course "Feedback, Communicative Gesturing, and Gazing" in Helsinki in 2011, and led the summer workshop "Speech, gaze and gesturing - multimodal conversational interaction with the Nao robot" in Metz, together with Graham Wilcock, in 2012. She has had several national and international cooperation projects and has served on several programme and review committees. She is Programme Chair for the 2013 International Conference on Multimodal Interaction (ICMI), and she is Secretary-Treasurer of SIGDial, the ACL/ISCA Special Interest Group for Discourse and Dialogue.

Advancing bio-microscopy with the help of image processing


Public
Date/time:

Nov 18, 2013 10:00 AM

Prof. Michael Liebling  

Abstract

Image processing in bio-microscopy is no longer confined to the post-processing stage, but has gained wide acceptance as an integral part of the image acquisition process itself, as it allows overcoming hard limits set by instrumentation and biology. In this talk, I will present my lab's efforts to image dim and highly dynamic biological samples by boosting the temporal and spatial resolution of optical microscopes via software solutions and modified imaging protocols. Focusing on spatio-temporal image registration strategies to build 3D+time models of samples with repetitive motions, a superresolution algorithm to reconstruct image sequences from multiple low temporal resolution acquisitions, and a fast multi-channel deconvolution algorithm for multi-view imaging, I will illustrate the central role signal processing can play in advancing bio-imaging. I will share the approaches we implemented in my group to rapidly bring new ideas from theory to full deployment in remote biology labs, where our tools can be applied with a variety of microscopy types. Finally, I will speculate on the future of image processing in bio-microscopy and suggest areas where efforts may be most rewarding.


Biography

Michael Liebling is an Associate Professor of Electrical and Computer Engineering at the University of California, Santa Barbara (UCSB). He received the MS in Physics (2000) and PhD in image processing (2004) from EPFL. From 2004 to 2007, he was a Postdoctoral Scholar in Biology at the California Institute of Technology, before joining the faculty in the Department of Electrical and Computer Engineering in 2007, first as an Assistant Professor and, since Summer 2013, as an Associate Professor. His research interests include biological microscopy and image processing for the study of dynamic biological processes and, more generally, computational methods for optical imaging. He teaches both at the graduate and undergraduate level in the areas of signal processing, image processing and biological microscopy. Michael Liebling is a recipient of prospective and advanced researcher fellowships from the Swiss National Science Foundation and a 2011 Hellman Family Faculty Fellowship. He is vice-chair (2014 Chair-elect) of the IEEE Signal Processing Society's Bio-Imaging and Signal Processing technical committee and was Technical Program co-chair of the IEEE International Symposium on Biomedical Imaging in 2011 and 2013.

Human-Centered Computing for Critical Multimodal Cyber-Physical Environments


Public
Date/time:

Nov 05, 2013 11:00 AM

Dr. Nadir Weibel  

Abstract

Critical cyber-physical environments such as the ones found in many healthcare settings or on the flight deck of modern airplanes are built on complex systems characterized by important properties spanning the physical and digital world, and centered on human activity. In order to properly understand this critical activity, researchers need to first understand the context and environment in which the activity is situated. Central in those environments is often interaction with the available technology and the communication between the individuals, both of which often involve multiple parallel modalities. Only an in-depth understanding of the properties of these multimodal distributed environments can inform the design and development of multimodal human-centered computing.

After presenting an overview of my current research in human-centered computing, this talk will present some of the challenges and proposed solutions in terms of technologies and theoretical frameworks for collecting and making sense of rich multimodal data in two critical cyber-physical environments: the cockpit of a Boeing 787 airplane, and the medical office. The talk will explain how the combination of a range of data collection devices such as depth cameras, eye tracking, digital-pens, and HD video cameras, combined with powerful data visualization and a flexible analysis suite, allows in-depth understanding of those complex environments. I will end with a discussion of cutting-edge multimodal technology and how devices such as depth cameras and wearable augmented reality glasses open up a range of opportunities to develop new technology for knowledge workers of critical cyber-physical environments.

Biography

Dr. Nadir Weibel is a Research Assistant Professor in the Department of Computer Science and Engineering at the University of California San Diego (UCSD), where he is teaching human-computer interaction and ubiquitous computing. His research is situated at the intersection of computer science, cognitive science, communication, health and social sciences. Dr. Weibel investigates tools, techniques and infrastructure supporting the deployment of innovative interactive multimodal and tangible devices in context, and studies the cognitive consequences of the introduction of this technology in everyday life. Current work focuses on interactive physical-digital systems that exploit pen-based and touch-based devices, depth-cameras, wearable and mobile devices, in the setting of critical populations such as healthcare and education. Dr. Weibel is author of more than 45 publications on these topics. His work has been funded by the Swiss National Science Foundation, the European Union, Boeing, the US NSF, NIH and AHRQ.


Interacting with the Embodied Mind


Idiap Speaker Series
Date/time:

Oct 31, 2013 11:00 AM

Prof. Francis Quek  

Abstract

Humans do not think like computers. Our minds are 'designed' for us to function as embodied beings in the world in ways that are: 1. Physical-Spatial; 2. Temporal-Dynamic; 3. Social-Cultural; and 4. Affective-Emotional. These aspects of embodiment give us four lenses through which to understand the embodied mind and how computation/technology may support its function. I adopt a two-pronged approach to human-computer interaction research: first, harnessing technological means to contribute to the understanding of how embodiment ultimately ascends into mind, and second, informing the design and engineering of technologies that support and augment the human higher psychological functions of learning, sensemaking, creating, and experiencing.

In line with the first approach, I shall first show how language, as a core human capacity, is rooted in human embodied function. We will see that mental imagery shapes multimodal (gesture, gaze, and speech) human discourse. In line with the second approach, I shall then present an assemblage of interactive projects that illustrate how our concept of human embodiment can inform technology design through the light of our four lenses. Projects cluster around three application domains, namely 1. Technology for special populations (e.g. mathematics instruction and reading for the blind, games for older adults); 2. Learning and Education (e.g. learning and knowledge discovery through device/display ecologies, creativity support for children); and 3. Experience (e.g. socially-based information access, experience of images, affective communication).


Biography

Francis Quek is currently Professor of Visualization and a TAMU Chancellor's Research Initiative hire at Texas A&M University. He was formerly Professor of Computer Science, Director of the Center for Human-Computer Interaction, and Director of the Vision Interfaces and Systems Laboratory at Virginia Tech. He has previously been affiliated with Wright State University, the University of Illinois at Chicago, the University of Michigan, and Hewlett-Packard. Francis received both his B.S.E. summa cum laude (1984) and M.S.E. (1984) in electrical engineering from the University of Michigan. He completed his Ph.D. in Computer Science at the same university in 1990. Francis is a member of the IEEE and ACM. He performs research in embodied interaction, embodied learning and sensemaking, interactive systems for special populations (individuals who are blind, children, older adults), systems to support learning and creativity in children, multimodal verbal/non-verbal interaction, multimodal meeting analysis, vision-based interaction, multimedia databases, medical imaging, assistive technology for the blind, human-computer interaction, computer vision, and computer graphics. He has published over 150 peer-reviewed journal and conference articles in human-computer interaction, computer vision, and medical imaging.

Technology Innovation and Related Partnerships – Case Idiap and Nokia


Public
Date/time:

Oct 10, 2013 10:45 AM

Dr. Juha K. Laurila  

Abstract

This talk focuses on technology-related innovation within companies like Nokia, covering the flow from early-phase ideas towards technology transfer and productization. Further, the role of research partnerships as part of the overall innovation process is discussed. More specifically, various modes of industry-academia collaboration and the related drivers for each of them are briefly covered. Aspects like technology licensing are briefly touched on too.

More particularly, this presentation focuses on collaboration between Idiap and Nokia as a case study and investigates the role of Idiap-Nokia interactions from the perspective of the overall innovation chain. This part covers e.g. Idiap's contribution to Nokia's Call for Research Proposals in 2008, joint initiatives around mobile data (the Lausanne Data Collection Campaign 2009-2012 and the Mobile Data Challenge 2011-2012), as well as bilateral research projects.


The power of the cellphone: small devices for big impact


Idiap Speaker Series
Date/time:

Sep 19, 2013 03:00 PM

Nuria Oliver  

Abstract

There are almost as many mobile phones in the world as humans. The mobile phone is the piece of technology with the highest levels of adoption in human history. We carry them with us all through the day (and night, in many cases). Therefore, mobile phones have become sensors of human activity in the large scale and also the most personal devices.

In my talk, I will present some of the work that we are doing at Telefonica Research in the area of mobile computing, both in terms of analyzing and understanding large-scale human behavioral data from mobile traces and in designing novel mobile systems in the areas of healthcare, education and information access.


The LiveLabs Urban LifeStyle Innovation Platform : Opportunities, Challenges, and Current Results


Public
Date/time:

Sep 13, 2013 03:00 PM

Rajesh K. Balan  

Abstract

A central question in mobile computing is how to test mobile applications that depend on real context, in real environments with real users. User studies done in lab environments are frequently insufficient to understand the real-world interactions between user context, environmental factors, application behaviour, and performance results. In this talk, I will describe LiveLabs, a new 5-year project that started at the Singapore Management University in early 2012. The goal of LiveLabs is to convert four real environments, the entire Singapore Management University campus, a popular resort island, a large airport, and a popular shopping mall, into living testbeds where we instrument both the environment and the cell phones of opted-in participants (drawn from the student population and members of the public). We can then provide third-party companies and researchers the opportunity to test their mobile applications and scenarios on the opted-in participants -- on their real phones in the four real environments described above. LiveLabs will provide the software necessary to collect network statistics and any necessary context information. In addition, LiveLabs will provide software and mechanisms to ensure that privacy, proper participant selection, resource management, and experimental results and data are maintained and provided on a need-to-know basis to the appropriate parties.

I will describe the broad LiveLabs vision and identify the key research challenges and opportunities. In particular, I will highlight our current insight into indoor location tracking, dynamic group and queue detection, and energy aware context sensing for mobile phones.


Detecting Conversing Groups in Still Images


Public
Date/time:

Sep 13, 2013 11:00 AM

Hayley Hung  

Abstract

In our daily lives, we cannot help but communicate with people. Aside from organised and more structured communication like emails, meetings, or phone calls, we communicate instantaneously and often in ad hoc, freely formed groups where it is not known beforehand how long the conversation will last, who will be in the conversation, or what it will be about. In crowded settings like a conference, for example, this type of conversing group exists, and who gravitates towards whom tells us a lot about the relationship between the members of the group. In this talk, I will discuss the challenges of this problem, solutions, and open questions of this emerging topic.


Biometric Recognition: Sketch to photo matching, Tattoo Matching and Fingerprint Obfuscation


Idiap Speaker Series
Date/time:

Sep 03, 2013 02:00 PM

Prof. Anil K. Jain  

Abstract

http://biometrics.cse.msu.edu

http://scholar.google.com/citations?user=g-_ZXGsAAAAJ&hl=en

If you are like many people, navigating the complexities of everyday life depends on an array of cards and passwords that confirm your identity. But lose a card, and your ATM will refuse to give you money. Forget a password, and your own computer may balk at your command. Allow your card or passwords to fall into the wrong hands, and what were intended to be security measures can become the tools of fraud or identity theft. Biometrics - the automated recognition of people via distinctive anatomical and behavioral traits - has the potential to overcome many of these problems.

Biometrics is not a new idea. Pioneering work by several British scholars, including Fauld, Galton and Henry in the late 19th century established that fingerprints exhibit a unique pattern that persists over time. This set the stage for the development of Automatic Fingerprint Identification Systems that are now used by law enforcement agencies worldwide. The success of fingerprints in law enforcement coupled with growing concerns related to homeland security, financial fraud and identity theft has generated renewed interest in research and development of biometric systems. It is, therefore, not surprising to see biometrics permeating our society (laptops and mobile phones, border crossing, civil registration, and access to secure facilities). Despite these successful deployments, biometrics is not a panacea for human recognition. There are challenges related to data acquisition, image quality, robust matching, multibiometrics, biometric system security and user privacy. This talk will introduce three challenging problems of particular interest to law enforcement and border crossing agencies: (i) face sketch to photo matching, (ii) scars, marks & tattoos (SMT) and (iii) fingerprint obfuscation.


Biography

Anil K. Jain is a University Distinguished Professor in the Department of Computer Science at Michigan State University where he conducts research in pattern recognition, computer vision and biometrics. He has received Guggenheim fellowship, Humboldt Research award, Fulbright fellowship, IEEE Computer Society Technical Achievement award, W. Wallace McDowell award, IAPR King-Sun Fu Prize, and ICDM Research Award for contributions to pattern recognition and biometrics. He served as the Editor-in-Chief of the IEEE Trans. Pattern Analysis and Machine Intelligence and is a Fellow of ACM, IEEE, AAAS, IAPR and SPIE. Holder of eight patents in biometrics, he is the author of several books. ISI has designated him as a highly cited author. He served as a member of the National Academies panels on Information Technology, Whither Biometrics and Improvised Explosive Devices (IED). He also served as a member of the Defense Science Board. His H-index is 137 (Source: Google Scholar).

Component Analysis for Human Sensing


Idiap Speaker Series
Date/time:

Aug 29, 2013 11:00 AM

Dr. Fernando De la Torre  

Abstract

Enabling computers to understand human behavior has the potential to revolutionize many areas that benefit society such as clinical diagnosis, human computer interaction, and social robotics. A critical element in the design of any behavioral sensing system is to find a good representation of the data for encoding, segmenting, classifying and predicting subtle human behavior. In this talk I will propose several extensions of Component Analysis (CA) techniques (e.g., kernel principal component analysis, support vector machines, spectral clustering) that are able to learn spatio-temporal representations or components useful in many human sensing tasks.

In the first part of the talk I will give an overview of several ongoing projects in the CMU Human Sensing Laboratory, including our current work on depression assessment from videos. In the second part, I will show how several extensions of CA methods outperform state-of-the-art algorithms in problems such as facial feature detection and tracking, temporal clustering of human behavior, early detection of activities, weakly-supervised visual labeling, and robust classification. The talk will be adaptive, and I will discuss the topics of major interest to the audience.


Biography

Fernando De la Torre received his B.Sc. degree in Telecommunications (1994) and his M.Sc. (1996) and Ph.D. (2002) degrees in Electronic Engineering from La Salle School of Engineering at Ramon Llull University, Barcelona, Spain. In 2003 he joined the Robotics Institute at Carnegie Mellon University, and since 2010 he has been a Research Associate Professor. Dr. De la Torre's research interests include computer vision and machine learning, in particular face analysis, optimization and component analysis methods, and their applications to human sensing. He is an Associate Editor of IEEE PAMI and leads the Component Analysis Laboratory (http://ca.cs.cmu.edu) and the Human Sensing Laboratory (http://humansensing.cs.cmu.edu).

Signal Analysis using Autoregressive Models of Amplitude Modulation


Public
Date/time:

Aug 23, 2013 11:00 AM

Dr. Sriram Ganapathy  

Abstract

Conventional speech analysis techniques are based on estimating the spectral content of relatively short (about 10-20 ms) segments of the signal. However, an alternate way to describe a speech signal is a long-term summation of amplitude modulated frequency bands, where each frequency band consists of a smooth envelope (gross structure) modulating a carrier signal (fine structure). We develop an auto-regressive (AR) modeling approach for estimating the smooth envelope of the sub-band signal. This model, referred to as frequency domain linear prediction (FDLP), is based on the application of linear prediction on discrete cosine transform of the signal and it describes the perceptually dominant peaks in the signal while removing the finer details. This suppression of detail is useful for developing a parametric representation of speech/audio signals. In this talk, I will also show several applications of the FDLP model for speech and audio processing systems.
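
For readers unfamiliar with FDLP, here is a minimal sketch of the core recipe described above: take the DCT of a sub-band signal and fit a linear-prediction (AR) model to the DCT coefficients; the model's power response then traces the smooth temporal envelope. The function name, model order and normalization are illustrative choices made for this note, not taken from the talk, and only numpy/scipy are used.

import numpy as np
from scipy.fftpack import dct
from scipy.linalg import solve_toeplitz

def fdlp_envelope(x, order=20, n_points=None):
    """Approximate the smooth temporal envelope of a sub-band signal x by
    linear prediction applied to its DCT coefficients (FDLP sketch)."""
    n_points = n_points or len(x)
    c = dct(x, type=2, norm='ortho')                  # DCT of the sub-band signal
    r = np.correlate(c, c, mode='full')[len(c) - 1:]  # autocorrelation of the DCT sequence
    r = r[:order + 1]
    # Yule-Walker equations: solve the Toeplitz system for the AR coefficients.
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    poly = np.concatenate(([1.0], -a))                # A(z) = 1 - a_1 z^-1 - ... - a_p z^-p
    err = r[0] + np.dot(poly[1:], r[1:order + 1])     # prediction error power (model gain)
    # By the FDLP duality, err / |A(e^{jw})|^2 evaluated on a fine grid
    # approximates the squared Hilbert envelope of x as a function of time.
    w = np.linspace(0, np.pi, n_points)
    A = np.exp(1j * np.outer(w, np.arange(order + 1))) @ poly
    return err / np.abs(A) ** 2

Applied to each sub-band of a longer segment and stacked, such envelopes give the kind of compact, detail-suppressing representation the abstract refers to.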

In the last leg of the talk, I will focus on our recent efforts at IBM for speech analysis in noisy radio communication channels. This will highlight the challenges involved along with a few solutions addressing parts of the problem.


Biography

Sriram Ganapathy received his Doctor of Philosophy from the Center for Language and Speech Processing, Johns Hopkins University, in January 2012. Prior to this, he obtained his Bachelor of Technology from the College of Engineering, Trivandrum, India in 2004 and his Master of Engineering from the Indian Institute of Science, Bangalore in 2006. From 2006 to 2008 he worked as a Research Assistant at the Idiap Research Institute, Switzerland, on speech and audio projects. Currently, he is a post-doctoral researcher at the IBM T.J. Watson Research Center working on signal analysis methods for radio communication speech in highly degraded environments. His research interests include signal processing, machine learning and robust methodologies for speech and speaker recognition.

Three Factor Authentication for Commodity Hand-Held Communication Devices


Public
Date/time:

Jul 17, 2013 02:00 PM

Prof Brian C. Lovell  

Abstract

User authentication to online services is at a crossroads. Attacks are increasing, and current authentication schemes are no longer able to provide adequate protection. The time has come to include the third factor of authentication and start using biometrics to authenticate people. However, despite significant progress in biometrics, they still suffer from a major mode of attack: replay attacks, where biometric signals may be captured previously and reused. Replay attacks defeat all current liveness tests. Current literature recognises replay attacks as a significant issue, but there are no practical and tested solutions available today. The purpose of this research is to improve authentication to online services by including a face recognition biometric, as well as providing one solution to the replay attack problem for the proposed face recognition system. If this research is successful, it will enable the use of enhanced authentication mechanisms on mobile devices, and open new research into methods of addressing biometric replay attacks.


Biography

Brian C. Lovell was born in Brisbane, Australia in 1960. He received a BE in electrical engineering (Honours I) in 1982, a BSc in computer science in 1983, and a PhD in signal processing in 1991, all from the University of Queensland (UQ). Professor Lovell is Project Leader of the Advanced Surveillance Group in the School of ITEE, UQ. He served as President of the International Association for Pattern Recognition (IAPR) from 2008 to 2010, is a Fellow of the IAPR, a Senior Member of the IEEE and a Fellow of the IEAust, and has been the voting member for Australia on the Governing Board of the IAPR since 1998. Professor Lovell was Program Co-Chair of ICPR 2008 in Tampa, Florida, General Co-Chair of ACPR 2011 in Beijing, and General Co-Chair of ICIP 2013 in Melbourne. His Advanced Surveillance Group works with port, rail and airport organizations as well as several national and international agencies to identify and develop solutions addressing operational and security concerns. http://itee.uq.edu.au/~lovell/ http://scholar.google.com.au/citations?user=gXiGxcMAAAAJ&hl=en

Biosignals and Interfaces


Public
Date/time:

May 14, 2013 11:00 AM

Prof. Tanja Schultz  

Abstract

Human communication relies on signals like speech, mimics, or gestures and the interpretation of these signals seems to be innate to humans. In contrast, human interaction with machines and thus human communication mediated through machines is far from being natural. To date, it is restricted to few channels and the capabilities of machines to interpret human signals are still very limited.

At the Cognitive Systems Lab (CSL) we explore human-centered cognitive systems to improve human-machine interaction as well as machine-mediated human communication. We aim to benefit from the strength of machines by departing from just mimicking the human way of communication. Rather we focus on considering the full range of biosignals emitted from the human body, such as electrical biosignals like brain and muscle activity. These signals can be directly measured and interpreted by machines, leveraging emerging wearable, small and wireless sensor technologies. Using these biosignals offers an inside perspective on human mental activities, intentions, or needs and thus complement the traditional way of observing humans from the outside.

In my talk I will discuss ongoing research on "Biosignals and Interfaces" at CSL, such as speech recognition, silent speech interfaces that rely on articulatory muscle movement, and interfaces that use brain activity to determine users' mental states, such as task activity, cognitive workload, attention, emotion, and personality. We hope that our research will lead to a new generation of human centered systems, which are completely aware of the users' needs and provide an intuitive, efficient, robust, and adaptive input mechanism to interaction and communication.


Biography

Tanja Schultz received her Ph.D. and Masters in Computer Science from the University of Karlsruhe, Germany, in 2000 and 1995 respectively, and obtained a German Staatsexamen in Mathematics, Sports, and Educational Science from the University of Heidelberg in 1990. She joined Carnegie Mellon University in 2000 and became a Research Professor at the Language Technologies Institute. Since 2007 she has also been a Full Professor at the Department of Informatics of the Karlsruhe Institute of Technology (KIT) in Germany. She is the director of the Cognitive Systems Lab, where her research activities focus on human-machine interfaces, with a particular area of expertise in rapid adaptation of speech processing systems to new domains and languages. She co-edited a book on this subject and received several awards for this work. In 2001 she received the FZI prize for an outstanding Ph.D. thesis. In 2002 she was awarded the Allen Newell Medal for Research Excellence from Carnegie Mellon for her contribution to speech translation, as well as the ISCA best paper award for her publication on language-independent acoustic modeling. In 2005 she received the Carnegie Mellon Language Technologies Institute Junior Faculty Chair. Her recent research focuses on human-centered technologies and intuitive human-machine interfaces based on biosignals, by capturing, processing, and interpreting signals such as muscle and brain activities. Her development of silent speech interfaces based on myoelectric signals was among the top-ten most important attractions at CeBIT 2010, received best demo and paper awards in 2006 and 2013, and was awarded the Alcatel-Lucent Research Award for Technical Communication in 2012. Tanja Schultz is the author of more than 250 articles published in books, journals, and proceedings. She has been a member of the German Informatics Society (GI) for more than 20 years, and is a member of the IEEE Computer Society and the International Speech Communication Association (ISCA), where she is serving her second term as an elected ISCA Board member.

Perceptually motivated speech recognition and mispronunciation detection


Public
Date/time:

Dec 12, 2012 04:00 PM

Christos Koniaris, PhD.  

Abstract

Chris will present his doctoral thesis, the result of a research effort in two fields of speech technology: speech recognition and mispronunciation detection. Although the two areas are clearly distinguishable, the proposed approaches share a common hypothesis based on psychoacoustic processing of speech signals. The conjecture implies that the human auditory periphery provides a relatively good separation of different sound classes. Hence, it is possible to use recent findings from psychoacoustic perception together with mathematical and computational tools to model the auditory sensitivities to small speech signal changes.


Incorporation of phonetic constraints in acoustic-to-articulatory inversion


Public
Date/time:

Dec 10, 2012 10:00 AM

Blaise Potard, PhD.  

Abstract

Blaise will talk about his doctoral research on the acoustic-to-articulatory inversion problem. The main aim of his Ph.D. was to investigate the use of additional constraints (phonetic and visual) to improve the realism of the solutions found by an existing inversion framework. This research was conducted at LORIA, Nancy, France, under the supervision of Yves Laprie.


Grapheme-to-Phoneme (G2P) Training and Conversion with WFSTs


Public
Date/time:

Jul 30, 2012 01:30 PM

Josef Novak  

Abstract

The talk is a tutorial: a hands-on introduction to some of the features of Phonetisaurus, an OpenFst-based G2P toolkit developed by Josef Novak, along with some high-level background information and a description of the features, shortcomings, and goals of the toolkit.

The slides, a special tutorial distribution, and cut-and-paste terminal commands in wiki format can be found on the Phonetisaurus Google Code site:

Home page and code:

http://code.google.com/p/phonetisaurus/ (see the downloads section of the left-hand sidebar)

Copy-and-paste tutorial companion:

http://code.google.com/p/phonetisaurus/wiki/FSMNLPTutorial


Biography

Josef Novak is currently a Ph.D. student in Hirose-Minematsu laboratory, in the EEIC department at the University of Tokyo. More information: http://www.gavo.t.u-tokyo.ac.jp/~novakj/

On the beauty of Online Selective Sampling


Public
Date/time:

May 02, 2012 11:00 AM

Francesco Orabona  

Abstract

Online selective sampling is an active variant of online learning in which the learner is allowed to adaptively subsample the labels of an observed sequence of feature vectors. The learner's goal is to achieve a good trade-off between the mistake rate and the number of sampled labels. This can be viewed as an abstract protocol for interactive learning applications. For example, a system for categorizing stories in a newsfeed asks for human supervision whenever it feels that more training examples are needed to keep the desired accuracy.

A formal theory, almost assumption-free, that allows exact confidence values on the predictions to be calculated will be presented. Using this theory, two selective sampling algorithms that use regularized least squares (RLS) as the base classifier will be shown. These algorithms have formal guarantees on the performance and on the maximum number of labels queried. Moreover, RLS is easy and efficient to implement, and empirical results will be shown as well to validate the theoretical results.
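
As a rough illustration of the protocol (and not of the specific algorithms presented in the talk), the sketch below shows a selective-sampling loop around an RLS classifier: the learner predicts on every example but asks for the label only when its confidence, measured by the margin relative to the RLS predictive uncertainty, is low. The class name, the query rule and all parameters are invented for this example.

import numpy as np

class SelectiveRLS:
    """Online regularized least squares that queries a label only when unsure."""

    def __init__(self, dim, reg=1.0, threshold=1.0):
        self.A_inv = np.eye(dim) / reg   # inverse of the regularized correlation matrix
        self.b = np.zeros(dim)           # accumulated label-weighted features
        self.threshold = threshold

    def predict(self, x):
        w = self.A_inv @ self.b                            # current RLS weight vector
        margin = float(w @ x)
        uncertainty = float(np.sqrt(x @ self.A_inv @ x))   # grows for unfamiliar directions
        return (1.0 if margin >= 0 else -1.0), margin, uncertainty

    def observe(self, x, label_oracle):
        pred, margin, unc = self.predict(x)
        query = abs(margin) <= self.threshold * unc        # low confidence -> sample the label
        if query:
            y = label_oracle(x)                            # the only place a label is paid for
            Ax = self.A_inv @ x
            self.A_inv -= np.outer(Ax, Ax) / (1.0 + float(x @ Ax))  # Sherman-Morrison update
            self.b += y * x
        return pred, query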


Overview of some research activities at Australia's Commonwealth Scientific and Industrial Research Organisation (CSIRO)


Public
Date/time:

Apr 20, 2012 02:00 PM

Eric Lehmann  

Abstract

CSIRO is Australia's national science agency and one of the largest and most diverse research organisations in the world. It employs over 6000 scientists at more than 50 centres throughout Australia and overseas. The core research undertaken at CSIRO focuses on the main challenges facing Australia at the present time, and includes research areas such as health, agriculture and food supply, mineral resources and mining, information and communication technologies, understanding climate change, and sustainable management of the environment, the oceans and water resources. In this talk, I will present an overview of my recent research work at CSIRO, which involves aspects of Bayesian filtering and hierarchical modelling for applications related to environmental mapping and monitoring, and model-data fusion for water resource assessment at continental scale.


Biography

Eric Lehmann graduated in 1999 from the Swiss Federal Institute of Technology in Zurich (ETHZ) with a Diploma in Electrical Engineering. He received the M.Phil. and Ph.D. degrees, both in Electrical Engineering, from the Australian National University (Canberra) in 2000 and 2004 respectively. Between 2004 and 2008, he held various research positions with National ICT Australia (NICTA) in Canberra and the Western Australian Telecommunications Research Institute (WATRI) in Perth, WA, where he was active in the field of acoustics and array signal processing, with emphasis on sequential Monte Carlo methods (particle filtering) for acoustic speaker tracking. He is now working as a Research Scientist for CSIRO in Perth, within the division of Mathematics, Informatics and Statistics. His current work involves the development of statistical image processing techniques for remote sensing imagery (optical and synthetic aperture radar), with a focus on the multi-sensor analysis and integration of spatio-temporal data for environmental mapping and monitoring. He also contributes to the scientific research on Bayesian hierarchical methods for the assimilation of soil moisture satellite data with modeled estimates (model-data fusion) for water resource management.

Fractal Marker Fields


Public
Date/time:

Apr 20, 2012 11:00 AM

Marketa Dubska  

Abstract

Many augmented reality systems are using fiduciary markers to localize the camera in the 3D scene. One big disadvantage of the markers used today is that the camera motion is tightly limited: the marker (one of the markers) must be visible and it must be observed at a proper scale.

This talk presents a fractal structure of markers similar to matrix codes (such as QR code or DataMatrix): the Fractal Marker Field. The FMF allows for embedding markers at a virtually unlimited number of scales. At the same time, for each of the scales it guarantees a constant density of markers at that scale. The talk sketches out the construction of the FMF and a baseline algorithm for detecting the markers.


Parallel Coordinates and Hough Transform


Public
Date/time:

Apr 19, 2012 11:00 AM

Marketa Dubska  

Abstract

Parallel coordinates provide a coordinate system used mostly, or even solely, for high-dimensional data visualization; only a few applications have used them for computational tasks. We propose a new use for them: a new line parameterization for the Hough transform. This parameterization, called PClines, outperforms the existing approaches in terms of accuracy. Besides, PClines are computationally extremely efficient, require no floating-point operations, and can easily be accelerated by different hardware architectures. What is more, regular patterns such as grids and groups of parallel lines can be effectively detected with this parameterization.
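
As a small worked illustration of the point-line duality such a parameterization builds on (the notation below is chosen for this note and may differ from the PClines paper): place the x'-axis at u = 0 and the y'-axis at u = d, so that a Cartesian point (x, y) is drawn as the segment joining (0, x) and (d, y),
\[
v(u) \;=\; x + (y - x)\,\frac{u}{d}, \qquad u \in [0, d].
\]
For the points of a line y = m x + b this becomes
\[
v(u) \;=\; x\Bigl(1 + (m-1)\,\frac{u}{d}\Bigr) + b\,\frac{u}{d},
\]
which is independent of x exactly when u = d/(1 - m) (for m \neq 1), where it equals b/(1 - m). All segments generated by points of the line therefore meet in the single point (d/(1-m), b/(1-m)), so a Hough-style accumulator can detect lines by rasterizing one short segment per edge point and looking for intersection maxima.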


Cost Minimization of WaldBoost Classifiers


Public
Date/time:

Apr 18, 2012 11:00 AM

Roman Juranek  

Abstract

Detection of objects in computer vision is a complex task. One of the most popular and well-explored approaches is the use of statistical classifiers and scanning windows. In this approach, classifiers learned by the AdaBoost algorithm are often used, as they achieve low error rates and high detection rates. The process of object detection can be implemented by various methods; for acceleration, graphics hardware, multi-core architectures, SIMD or custom hardware can be used. In this talk I will present a method which enhances object detection performance with respect to a user-defined cost function. The method balances the computations of a previously learned classifier between two or more different implementations in order to minimize the cost function. The method is verified on a basic example: division of a classifier into a pre-processing unit implemented in an FPGA and a post-processing unit on a standard PC. The technique has its application mainly in the design of low-power smart cameras.


Recent work at Graph@FIT


Public
Date/time:

Apr 17, 2012 11:00 AM

Roman Juranek  

Abstract

In this talk, I will present the ongoing work of the graphics and video processing groups at FIT BUT. In the past, we participated in several successful projects, such as the Center of Computer Graphics and various FP6/FP7 projects. Currently, we participate in the Artemis JU projects R3COP (development of robotic systems), SMECY (algorithms and compilers for embedded systems) and RECOMP, in FP7 projects such as SRS and TA2, and in projects funded from the structural funds of the EU, such as the Center of Excellence IT4I (IT for Innovations). Our research topics include, for example, object detection and recognition based on statistical classification, environment mapping for mobile robots, augmented reality, real-time rendering and more. I will briefly present important results of our research.


The magical, two-dimensional world of graphene


Public
Date/time:

Mar 09, 2012 11:00 AM

Prof. Philippe Jacquod  

Abstract

Carbon comes in different forms: graphite and diamond have been known for centuries, while fullerenes, buckyballs and carbon nanotubes were discovered in the second half of the twentieth century. A new allotrope of carbon was isolated in 2004: graphene, which is a one-atom-thick, two-dimensional lattice of carbon atoms. The discovery of graphene generated almost unprecedented hype in physics. As a matter of fact, graphene has proven to be the material of all superlatives. It is the thinnest, but also the strongest, the stiffest but also the most stretchable of all crystals. Its electronic properties, together with its dimensionality, make it a strong potential candidate for replacing silicon in information processors. In this colloquial presentation, I will make a general introduction to the wonder material graphene, stressing its exceptional electronic and mechanical properties, sketching the many surprises it gave us and discussing future potential applications. In the last part of my talk, I will summarize some of our recent investigations on the local topography and spectroscopy of graphene [Xue et al., Nature Materials 10, 282 (2011); Yankowitz et al., Nature Physics (in press, 2012)]. The presentation is intended to be pedagogical and directed at a general, nonspecialist audience of scientists.


Biography

Philippe Jacquod studied physics at the ETHZ and the University of Neuchatel, where he obtained his PhD in 1997. He was a postdoctoral associate at Yale University from 1997 to 2000 and at the University of Leiden from 2000 to 2003. He became assistant professor of theoretical physics at the University of Geneva in 2003. He joined the physics department at the University of Arizona in 2006, where he is now a professor of physics and optical sciences. His field of research is in condensed matter physics, with a focus on quantum transport and nanophysics.

Extended Pen+ Tools for Multimodal Analysis and Interaction


Public
Date/time:

Jan 31, 2012 11:00 AM

Nadir Weibel  

Abstract

Access to information is one of the most crucial aspects of everyday life. As computation becomes ubiquitous and our environment is enriched with new possibilities for communication and interaction, the existing infrastructure of science, business, and social interaction is confronted with the difficult challenges of supporting complex tasks, mediating networked interactions, and managing the increasing availability of digital information and technology. Despite the tremendous development in terms of both new digital devices and novel interaction techniques that we have all witnessed over the last years, it is almost unbelievable how paper documents and pen-based interaction still represent a very important way of interacting with both physical and digital information spaces. In an effort to rethink what pen and paper user interfaces (PPUI) mean in a modern world, we are studying multimodal interactions of the pen plus a range of tangible devices at the intersection of the physical and the digital worlds.

In this talk I will present my latest research around pen- and paper-computing, looking at how multimodal interaction with this "very old" technology enables a range of novel affordances and supports communication and interaction.

In the first part of the talk, I will speak about the development of new systems and prototypes that encompass the pen and other modalities, such as speech and gestures; different devices, such as smartphones, tablets, and high-resolution wall displays; as well as different domains such as healthcare, accessibility, data visualization and interaction, social networks, augmented office environments, and communication for early education, older adults and other specific populations. I will present some examples of the prototypes we developed and some brief extracts of the data we collected about their usage in the wild.

The second part of the talk will focus on pen- and paper-based techniques and tools to get richer access to multimodal data in various contexts. While a new generation of inexpensive digital recording devices and storage facilities is revolutionizing data collection in behavioral science, one of the main obstacles to fully capitalizing on this opportunity is the huge time investment required for analysis using current methods. To address this analysis bottleneck we developed ChronoViz, a system providing synchronized interactive visual representations of multiple data streams. By using two multimodal datasets (a recent study of pilot/co-pilot interaction in a Boeing 787 simulator, and an ongoing learning analytics research project), I will present how the analysis tool works and how the integration of paper-based annotations, analysis, and interactions as part of the tool itself enable the exploration of new exciting methods for observational research.


Biography

Dr. Nadir Weibel is a Post-doctoral fellow at the University of California San Diego, a member of both the Distributed Cognition and Human-Computer Interaction Laboratory and the Ubiquitous Computing and Social Dynamics research group. He holds a Bachelor and Master in Computer Science from ETH Zurich (Dipl. Informatik-Ing. ETH), and a Ph.D. in Computer Science also from ETH Zurich. During his Ph.D., he explored new ways of enhancing a seemingly mundane, but ubiquitous, resource such as paper to support everyday work, interaction and collaboration as a member of the Global Information Systems research group at ETH. His current research is situated at the intersection of computer science, communication, and social sciences, studying the cognitive consequences of the introduction and the deployment of interactive multimodal and tangible devices. His main interests range from software engineering to human-computer interaction, including computer-supported collaborative work and mobile and ubiquitous computing. In his work he is developing theory and methods, designing representations, implementing prototypes, and evaluating the effectiveness of interactive physical-digital systems in order to understand the broader design space in which they are situated. He is currently collaborating with researchers at UCSD, Stanford, Berkeley, Drexel University, Children's Hospital in Washington DC, TU Darmstadt, INRIA Paris / Université Paris Sud and Telecom Paristech.

Combining Transcription-based and acoustic-based speaker identifications for Broadcast news


Public
Date/time:

Dec 22, 2011 02:00 PM

Sylvain Meignier, Le Maine University, F  

Abstract

In this presentation, we consider the issue of speaker identification within audio recordings of broadcast news. The speaker identity information is extracted from both transcript-based and acoustic-based speaker identification systems. This information is combined in the belief functions framework, which provides a coherent representation of the knowledge about the problem. The Kuhn-Munkres algorithm is used to solve the assignment problem between speaker identities and speaker clusters. Experiments carried out on French broadcast news from the French evaluation campaign ESTER show the efficiency of the proposed combination method.

keywords: speaker identification, speaker diarization, belief functions.
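
For illustration only (the score matrix and its values below are invented, not taken from the paper), the final assignment step can be sketched with the Hungarian/Kuhn-Munkres solver available in SciPy: given combined transcript- and acoustic-based scores between speaker clusters and candidate identities, it returns the one-to-one mapping with the highest total score.

import numpy as np
from scipy.optimize import linear_sum_assignment

# scores[i, j]: combined evidence (e.g. fused belief masses) that cluster i
# corresponds to identity j -- purely illustrative numbers.
scores = np.array([[0.7, 0.1, 0.2],
                   [0.2, 0.6, 0.1],
                   [0.1, 0.3, 0.8]])

clusters, identities = linear_sum_assignment(scores, maximize=True)
mapping = dict(zip(clusters.tolist(), identities.tolist()))
print(mapping)   # {0: 0, 1: 1, 2: 2} for this toy matrix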


Speaker Verification Using the Spectral and Time Parameters of Voice Signal


Public
Date/time:

Dec 20, 2011 02:00 PM

Prof. Victor Sorokin  

Abstract

The speaker verification system developed in the VOXSEAL project is based on variations in formant frequencies at stationary fragments and transient processes of vowels, the spectral features of fricative sounds, and the duration of speech segments. The best features are chosen for each word from the fixed list of Russian numerals ranging from zero to nine. The password phrase is randomly generated by the system at each verification. Compensation for dynamic noise, and counteraction against interference by playback of intercepted and recorded speech, are provided by the repeated reproduction of several words. The total error probabilities for male and female voices are 0.006 and 0.025%, respectively, for 30 million tests, 429 speakers, and a maximum password phrase length of 10 words. Note that the probabilities of false identification and false rejection are almost equal.


Biography

Prof. Victor Sorokin is R&D Director of OOO Voxseal, Skolkovo, Moscow. A Russian national, he holds an MSc from the Moscow Aviation Institute, a PhD in Engineering, and a Doctor of Science degree in Physics and Mathematics (1987). He is a Leading Researcher at the Institute for Information Transmission Problems of the Russian Academy of Sciences, a member of the Acoustical Society of America, a board member of the Russian Acoustical Society, the author of the monographs "Theory of Speech Production" and "Speech Synthesis" and of about 150 publications, and the holder of 8 patents in speech technology.

Building-up child-robot relationship for therapeutic purposes


Public
Date/time:

Nov 02, 2011 04:00 PM

Joan Pons  

Abstract

Socially assistive robots (SAR) have shown great promise in therapeutic programs with children. Health-related goals such as in-clinic rehabilitation or quality-of-life improvement have been achieved through social interaction. In this context, a robot's effectiveness depends strongly on its ability to elicit long-term engagement in children. To explore the dynamics of the emergence of social bonds with robots, a field study with 49 sixth-grade schoolchildren (aged 11-12 years) and 4 different robots was carried out at an elementary school. Children's preferences, expectations on functionality and communication, and interaction behavior were studied. The results showed that differences in robot appearance and performance elicit distinctive perceptions and interactive behavior in children, and affect social processes such as role attribution and attachment. In a similar way, to explore the requirements of an effective human-robot interaction, a quiz game was developed: a NAO robot was used to play the popular game of 20 questions to evaluate different interaction capabilities (i.e. face following, speech recognition, visual and audio cues, and personalization).


Biography

Joan Saez Pons did his PhD at the Mobile Machines and Vision Lab (MMVL), Sheffield Hallam University, UK, on multi-robot systems collaborating with humans. He was also a Marie Curie researcher at the Cognitive Neuroscience Department (KN) at the University of Tuebingen, Germany. He has been working at the Technical Research Centre for Dependency Care and Autonomous Living (CETpD), UPC BarcelonaTech, in the field of social robotics and human-robot interaction. His research interests include mobile robot navigation, multi-robot systems, cognitive robotics and human-robot interaction.

Convex Relaxation Methods for Image Processing


Public
Date/time:

Sep 08, 2011 11:00 AM

Xavier Bresson  

Abstract

This talk will introduce recent methods to compute optimal solutions to fundamental problems in image processing. Several meaningful problems in imaging are usually defined as non-convex energy minimization problems, which are sensitive to the initial condition and slow to minimize. The ultimate objective of our work is to overcome this bottleneck of non-convexity. In other words, our goal is to "convexify" the original problems to produce more robust and faster algorithms for real-world applications. Our approach consists in finding a convex relaxation of the original non-convex optimization problem and thresholding the relaxed solution to reach the solution of the original problem. We will show that this approach is able to convexify important and difficult image processing problems such as image segmentation based on the level set method and image registration. Our algorithms are not only guaranteed to find a global solution to the original problem, they are also at least as fast as graph-cut combinatorial techniques while being more accurate. Finally, I will introduce recent promising extensions of this approach in machine learning.
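
One classical instance of this "convexify, then threshold" strategy, written here in a generic two-phase segmentation form (and not necessarily in the exact formulation used in the talk), replaces the binary labeling problem by its relaxation over [0,1]:
\[
\min_{u \in \{0,1\}} \int_\Omega |\nabla u|\,dx + \lambda \int_\Omega u(x)\,f(x)\,dx
\;\;\longrightarrow\;\;
\min_{u \in [0,1]} \int_\Omega |\nabla u|\,dx + \lambda \int_\Omega u(x)\,f(x)\,dx,
\]
where f encodes the data term. The relaxed problem is convex, and thresholding its minimizer u* at almost any level t in (0,1), i.e. taking \(\Sigma = \{x : u^*(x) > t\}\), recovers a global minimizer of the original binary problem.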


Biography

Prof. Xavier Bresson received his B.A. in Physics from the University of Marseille and his Master of Electrical Engineering from the Ecole Superieure d'Electricite in Paris, France. He received his Ph.D. from the Swiss Federal Institute of Technology (EPFL) in 2005. From 2006 to 2010, he was a Postdoctoral Scholar in the Department of Mathematics at the University of California, Los Angeles (UCLA). In 2010, he joined the Department of Computer Science at City University of Hong Kong as a Tenure-Track Assistant Professor. His current research focuses on convex relaxation methods and unified geometric methods in image processing and machine learning. He has published 38 papers in international journals and conferences.

Scalable multi-class/multi-view object detection


Public
Date/time:

May 13, 2011 02:30 PM

Mr. Nima Razavi  

Abstract

Scalability of object detectors with respect to the number of classes/views is a very important issue for applications where many object classes need to be detected. While combining single-class detectors yields a linear complexity for testing, multi-class detectors that localize all objects at once often come at the cost of reduced detection accuracy. In this work, we present a scalable multi-class detection algorithm which scales sublinearly with the number of classes without compromising accuracy. To this end, a shared discriminative codebook of feature appearances is jointly trained for all classes and detection is also performed for all classes jointly. Based on the learned sharing distributions of features among classes, we build a taxonomy of object classes. The taxonomy is then exploited to further reduce the cost of multi-class object detection. Our method has linear training and sublinear detection complexity in the number of classes. We have evaluated our method on the challenging PASCAL VOC'06 and PASCAL VOC'07 datasets and show that scaling the system does not lead to a loss in accuracy.


Latent Feature Models for the Structure and Meaning of Text


Public
Date/time:

Mar 11, 2011 11:00 AM

James Henderson and Paola Merlo  

Abstract

Much of the meaning of text is reflected in individual words or phrases, but its full information content requires structured analyses of the syntax and semantics of natural language. Our work on methods for extracting such structured meaning representations from natural language has focused on the joint modelling of syntactic and semantic dependency structures. We have addressed this problem by using latent variables to model correlations between these two structures without strong prior assumptions about the nature of these correlations. These models have achieved state-of-the-art results in both syntactic parsing and semantic role labelling across several languages. We have also used them to exploit syntactic information in correcting semantic roles automatically transferred from translations.

Our use of latent variable models is in part motivated by the recognition that the supervised learning paradigm is becoming increasingly impractical as research in natural language processing moves to more complex, deeper levels of semantic analysis. By developing robust efficient methods for learning latent representations, we hope to be able to induce semantic representations from large quantities of data for weakly correlated tasks, such as machine translation. Our latent variable models use vectors of latent features for robust learning and exploit neural networks for efficient approximate inference, while still exploiting methods from dependency parsing for efficient decoding with sufficiently powerful models.

(Work with Ivan Titov, Lonneke van der Plas, Nikhil Garg, and Andrea Gesmundo.)


Face Recognition and Intelligent Video Surveillance


Public
Date/time:

Nov 03, 2010 02:00 PM

Prof Stan Z. Li  

Abstract

Face recognition and intelligent video surveillance are important areas for the next generation ID management and public security.

In this talk, challenges and recent advances and applications of face biometric and intelligent video surveillance technologies will be described.


Biography

Stan Z. Li received his B.Eng from Hunan University, China, M.Eng from National University of Defense Technology, China, and PhD degree from Surrey University, UK. He is currently a professor and the director of Center for Biometrics and Security Research (CBSR), Institute of Automation, Chinese Academy of Sciences (CASIA). He worked at Microsoft Research Asia as a researcher from 2000 to 2004. Prior to that, he was an associate Professor at Nanyang Technological University, Singapore. He was elevated to IEEE Fellow for his contributions to the fields of face recognition, pattern recognition and computer vision.

Social Sensing for Epidemiological Behavior Change


Public
Date/time:

Oct 01, 2010 04:00 PM

Anmol Madan  

Abstract

An important question in behavioral epidemiology and public health is to understand how individual behavior is affected by illness and stress. Although changes in individual behavior are intertwined with contagion, epidemiologists today do not have sensing or modeling tools to quantitatively measure its effects in real-world conditions. We propose a novel application of ubiquitous computing. We use mobile phone based co-location and communication sensing to measure characteristic behavior changes in symptomatic individuals, reflected in their total communication, interactions with respect to time of day (e.g., late night, early morning), diversity and entropy of face-to-face interactions and movement. Using these extracted mobile features, it is possible to predict the health status of an individual, without having actual health measurements from the subject. Finally, we estimate the temporal information flux and implied causality between physical symptoms, behavior and mental health.
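
As a small illustration of one of the behavioral features mentioned above, the entropy of a person's face-to-face interactions can be computed from the distribution of their contacts. Everything in the snippet (data layout, function and variable names) is invented for the example and is not the study's actual pipeline.

import numpy as np
from collections import Counter

def interaction_entropy(contact_ids):
    """Shannon entropy (in bits) of the distribution of a subject's contacts.
    Low values indicate interactions concentrated on a few people; a drop
    over time is the kind of behavior change such a study looks for."""
    counts = np.array(list(Counter(contact_ids).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# One day of co-location detections for one subject (illustrative).
print(interaction_entropy(["alice", "bob", "alice", "carol", "alice"]))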


Biography

Anmol Madan recently completed his PhD at the MIT Media Lab, with Prof. Alex Pentland. Currently, he is working as a post doctoral researcher at Northeastern University and Harvard University with Prof. David Lazer. He has received honors from the MIT 100k Competition and the MIT Enterprise Forum for various startup-related ideas. His research interests are in modeling human behavior using large-scale mobile phone sensor datasets, using applied machine learning and data mining methods. You might have also read about his research in popular media like CNN, BBC, New York Times, Wired, BusinessWeek and Slashdot.

Tell Me Where You have Lived, and I will Tell You What You Like: Adapting Interfaces to Cultural Preferences


Public
Date/time:

Sep 06, 2010 11:00 AM

Abraham Bernstein  

Abstract

Adapting user interfaces to cultural preferences has been shown to improve a user's performance, but is often foregone because of its time-consuming and costly procedure. Moreover, it is usually limited to producing one uniform user interface (UI) for each nation, disregarding the intangible nature of cultural backgrounds. To overcome these problems, we exemplify a new approach with our culturally adaptive web application MOCCA, which is able to map information in a cultural user model onto adaptation rules in order to create personalized UIs. Apart from introducing the adaptation flexibility of MOCCA, the talk describes a study with 30 participants in which we compared users' UI preferences to MOCCA's automatically generated UIs. Another experiment, with over 40 participants from 3 countries, showed a performance improvement for culturally adapted UIs over non-adapted ones. The results confirm that automatically predicting cultural UI preferences is possible, paving the way for low-cost cultural UI adaptations.


Biography

Abraham Bernstein is a full professor of informatics at the University of Zurich, Switzerland. His current research focuses on various aspects of the semantic web, knowledge discovery, service discovery/matchmaking, and mobile/pervasive computing. His work is based on both social science (organizational psychology/sociology/economics) and technical (computer science, artificial intelligence) foundations. Mr. Bernstein holds a Ph.D. from MIT and a Diploma in Computer Science (comparable to an M.S.) from the Swiss Federal Institute of Technology in Zurich (ETH). He is the program chair of this year's ISWC and serves on the editorial boards of the International Journal on Semantic Web and Information Systems, the Informatik Spektrum by Springer, the Journal of the Association for Information Systems, and the newly approved ACM Transactions on Intelligent Interactive Systems.

Conjugate Mixture Models for Clustering and Tracking Multimodal Data.


Public
Date/time:

Jun 28, 2010 11:00 AM

Vassil Khalidov  

Abstract

The problem of multimodal tracking arises whenever the same objects are observed through time by different sensors. We address the general case when the observations from different modalities are not necessarily aligned, in the sense that there is no obvious way to associate or to compare them in some common space. Our objective is to construct a model that is able to estimate the number of objects and to cluster the data so that the clusters stay consistent across modalities through time. We use a Bayesian treatment and present an approach based on stochastic optimization and information criteria. The results are illustrated on a multiple audio-visual object tracking task with a "robot head" device, comprising a pair of stereoscopic cameras and a pair of microphones.


Statistical and knowledge-centric techniques in Natural Language Understanding: a valuable handshake?


Public
Date/time:

Mar 11, 2010 11:00 AM

Silvia Quarteroni  

Abstract

In this talk, I will draw from my experience in Information Retrieval and Spoken Dialogue Systems to discuss a number of situations where statistical (e.g. machine learning) techniques shake hands with knowledge-centric approaches to meet user needs and account for domain knowledge. I will present examples particularly from the areas of Question Answering and Spoken Language Understanding, two research fields that exhibit a number of common points.


Biography

Silvia Quarteroni is a Senior Marie Curie Research Fellow involved in the ADAMACH project at the University of Trento. She received her MSc and BSc in Computer Engineering at the Swiss Federal Institute of Technology in Lausanne (EPFL) and her PhD in Computer Science at the University of York (UK). She has been working in several fields of Natural Language Processing, focusing on human-computer dialogue, information retrieval and personalization. She has published about 30 articles in international conferences and journals and is part of the programme committee of several of these.

Subband temporal envelopes of speech signal and their central role in speech recognition by humans and machines


Public
Date/time:

Mar 05, 2010 11:00 AM

Cong-Thanh Do  

Abstract

The subband temporal envelopes of the speech signal have a central role in this presentation, which can be split into three parts.

The first part of the presentation deals with the automatic recognition of cochlear implant-like spectrally reduced speech (SRS) [1]. The automatic speech recognition (ASR) system, trained on the TI-digits database, is HMM-based, and the speech feature vectors are MFCCs along with their delta and acceleration coefficients. We show that, beyond a certain SRS spectral resolution, it is possible to achieve word accuracy as good as that attained with the original clean speech, even though the SRS is synthesized only from the subband temporal envelopes of the original clean speech [2]. This work motivated some perspectives on noise-robust ASR and speech feature vector enhancement dedicated to ASR [3].
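
For orientation, feature vectors of the kind mentioned above (MFCCs with delta and acceleration coefficients) can be computed roughly as follows. The snippet uses librosa and a common 13-coefficient setup purely for illustration; it is not the toolkit or configuration used in the cited experiments, and the input file name is hypothetical.

import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=None)        # hypothetical input file
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)    # static cepstral coefficients
delta = librosa.feature.delta(mfcc)                   # first-order (delta) coefficients
accel = librosa.feature.delta(mfcc, order=2)          # second-order (acceleration) coefficients
features = np.vstack([mfcc, delta, accel])            # 39 x n_frames feature matrix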

The human recognition of speech is addressed in the second part of the presentation. We present quantitative analyses of the speech fundamental frequency (F0) in the cochlear implant-like SRS which support the report of Zeng et al. 2005 [4], based on subjective tests, about the difficulty cochlear implant users have in identifying speakers. That is, the F0 distortion in state-of-the-art cochlear implants is large when the SRS, which is an acoustic simulation of a cochlear implant, is synthesized only from subband temporal envelopes [5]. The analyses also revealed a significant reduction in F0 distortion when frequency modulation is integrated in the cochlear implant, as proposed by Nie et al. 2005 [6]. On the other hand, the results of such quantitative analyses could be exploited to design subjective studies in cochlear implant research.

The third part of the presentation concerns audio-visual speech processing, in which a linear relationship between the subband temporal envelopes and the area of mouth opening was mathematically proposed [7]. This proposition is based on the pioneering research of Grant and Seitz [8], in which the authors reported different degrees of correlation between acoustic envelopes and visible movements. Our mathematical model helps in estimating the area of mouth opening from speech acoustics alone using blind deconvolution techniques [9]. The estimated areas of mouth opening are sufficiently correlated with the manually measured ones, with an average correlation coefficient of 0.73.


Biography

Cong-Thanh Do was born in Hanoi, Vietnam, in 1983. He received the Electrical Engineering degree from Hanoi University of Technology, Hanoi, and Grenoble Institute of Technology, Grenoble, France, in 2006, through the Programme de Formation d'Ingénieurs d'Excellence au Vietnam (PFIEV). In 2007, he received the M.S. degree in signal, image, speech, and telecommunication from the Grenoble Institute of Technology, Grenoble, France, and performed a research internship in the Speech and Cognition Department of GIPSA-Lab, Grenoble, France. He is currently working toward the Ph.D. degree in the Signal and Communications Department, Institut Télécom, Télécom Bretagne, UMR CNRS 3192 Lab-STICC, Technopôle Brest-Iroise, Brest, France. His current research interests include automatic speech recognition, audio-visual speech processing and statistical signal processing.

IDIAP Newcomers


Public
Date/time:

Jan 30, 2007 05:00 PM

Hervé Bourlard  

Abstract

If you are an IDIAP newcomer and we haven't had a chance to meet yet (e.g., at the previous similar meeting), I would like to invite you to a meeting all together for informal introductions, discussions, and Q&As.


Dry-run of my PhD defense


Public
Date/time:

Nov 24, 2006 04:00 PM

G. Lathoud  

Abstract

Spatio-Temporal Analysis of Spontaneous Speech with Microphone Arrays

Knowing the location of human speakers permits a wide spectrum of applications, including hearing aids, hands-free speech processing in cars, surveillance, intelligent homes and offices, and autonomous robots. This thesis focuses on the use of microphone arrays to analyze spontaneous multi-party speech. This is a challenging task, because such speech contains many very short utterances, and people interrupt each other a lot (overlapped speech). Moreover, in order to build applications with the least possible constraints on the users, we use distant microphones only, for example on a meeting room table. Finally, the developed approaches are as unsupervised as possible, having in mind the dominant proportion of non-technical users. We targeted the development of an automatic system that can handle both moving and static speakers, in order to answer the question "Who spoke where and when?". Several issues were investigated, from the signal processing level (where? when?) to the speaker clustering level (who?). The techniques developed in the course of this research were successfully tested on a large variety of real indoor recordings, including cases with multiple moving speakers as well as seated speakers in meetings. The versatility of the proposed techniques is illustrated by a direct application to two related cases: hands-free speech acquisition in cars, and noise-robust speech recognition through telephones. Finally, a close analysis of the speaker clustering results leads us to question the linearity of the transmission channel in a real indoor environment when a speaker is a few meters away from a microphone.


A Music Discovery Engine based on Audio Similarities


Public
Date/time:

Jul 10, 2006 04:00 PM

Nicolas SCARINGELLA  

Abstract

A Music Discovery Engine based on Audio Similarities

In the context of Electronic Music Distribution, huge databases coming from both the restoration of existing analog archives and new content have been created and are continuously growing. The biggest online services now offer around 2 million tracks, calling for efficient ways to browse collections. Providing the kind of robust access to the world's vast store of music that we currently provide for textual material has been the goal of the Music Information Retrieval (MIR) community over the past 10 years; however, it still remains a very challenging problem in the case of audio data.

Music information is indeed a multifaceted and sometimes complex data set that includes pitch, temporal (i.e. rhythm), harmonic, timbral (e.g. orchestration), textual (i.e. lyrics), symbolic, editorial, and metadata elements (without considering related visual elements). Music information is also extremely dynamic. That is, any given work can have its specific pitches altered, its rhythm modified, its harmony reset, its orchestration changed, its performance reinterpreted, and its performers arbitrarily chosen; yet, somehow, it remains the "same" piece of music as the "original". Within this extraordinarily fluid environment, the concept of "similarity" becomes particularly problematic while being crucial to design audio and music information retrieval systems.

In this talk, we will discuss the concept of similarity between music excerpts and propose possible research directions to build a music discovery engine based on audio analysis.


Prior Knowledge in Kernel Methods (PhD defense rehearsal)


Public
Date/time:

Jun 29, 2006 03:00 PM

Alexei Pozdnoukhov  

Abstract

Kernel methods are one of the most successful branches of Machine Learning. They allow applying linear algorithms, with well-founded properties such as generalization ability, to non-linear real-life problems. The Support Vector Machine is a well-known example of a kernel method which has found a wide range of applications in data analysis.

In many practical applications, some additional prior knowledge is often available. This can be the knowledge about the data domain, invariant transformations, inner geometrical structures in data, some properties of the underlying process, etc. If used smartly, this information can provide significant improvement to any data processing algorithm. Thus, it is important to develop methods for incorporating prior knowledge into data-dependent models.

The main objective of this thesis is to investigate approaches towards learning with kernel methods using prior knowledge. Invariant learning with kernel methods is considered in more detail.
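
As one small, generic example of injecting prior knowledge of invariances into a kernel (a textbook-style illustration, not the specific method developed in the thesis), a kernel can compare a sample against transformed copies of the other sample and keep the best match:

import numpy as np

def rbf(x, z, gamma=0.5):
    return float(np.exp(-gamma * np.sum((x - z) ** 2)))

def jittered_kernel(x, z, transforms, gamma=0.5):
    """Similarity that ignores the listed invariant transformations of z."""
    return max(rbf(x, t(z), gamma) for t in transforms)

# Example prior knowledge: the class label is invariant to small shifts.
shifts = [lambda v, s=s: np.roll(v, s) for s in (-1, 0, 1)]
x = np.array([0.0, 1.0, 0.0, 0.0])
z = np.array([0.0, 0.0, 1.0, 0.0])     # a shifted copy of x
print(jittered_kernel(x, z, shifts))   # close to 1.0, unlike the plain RBF value

Note that taking a maximum does not in general preserve positive semi-definiteness; augmenting the training set with virtual (transformed) examples is a common alternative way to encode the same prior knowledge.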


PhD defense Dry run:


Public
Date/time:

May 24, 2006 04:00 PM

Norman Poh  

Abstract

This thesis presentation is about combining multiple systems applied to biometric authentication. Its two-fold contribution is to provide a better understanding of the problem of fusion (with respect to correlation, the performance strength of the individual systems, and noise) and to exploit the knowledge of the claimed identity to improve the combined system performance. Conditioning on the claimed identity is difficult because one has to deal with a small learning sample size.


Using Auxiliary Sources of Knowledge for Automatic Speech Recognition


Public
Date/time:

May 27, 2005 04:00 PM

Mathew Magimai Doss  

Abstract

This is the second rehearsal of my PhD defense presentation. Your comments and suggestions would be of great help. Thank You!


ACM MultiMedia conference report


Public
Date/time:

Nov 24, 2003 11:00 AM

Florent Monay  

Abstract

I will describe some papers and demos from ACM MultiMedia 2003 and MIR2003 workshop (content-based multimedia information retrieval, home videos browsing/editing, home photos browsing, surveillance, sports video indexing, ...).

A discussion about the corresponding research directions will follow.