|
|
Talks that were given at Idiap
Title of the talk: Biosignals and Interfaces
Given By: Prof. Tanja Schultz (Karlsruhe Universi
Abstract:
Human communication relies on signals like speech, facial expressions, or gestures, and the interpretation of these signals seems to be innate to humans. In contrast, human interaction with machines, and thus human communication mediated through machines, is far from natural. To date, it is restricted to a few channels, and the capabilities of machines to interpret human signals are still very limited.
At the Cognitive Systems Lab (CSL) we explore human-centered cognitive systems to improve human-machine interaction as well as machine-mediated human communication. We aim to benefit from the strengths of machines by departing from merely mimicking the human way of communication. Rather, we focus on the full range of biosignals emitted by the human body, such as electrical biosignals like brain and muscle activity. These signals can be directly measured and interpreted by machines, leveraging emerging wearable, small, and wireless sensor technologies. Using these biosignals offers an inside perspective on human mental activities, intentions, or needs, and thus complements the traditional way of observing humans from the outside.
In my talk I will discuss ongoing research on "Biosignals and Interfaces" at CSL, such as speech recognition, silent speech interfaces that rely on articulatory muscle movement, and interfaces that use brain activity to determine users' mental states, such as task activity, cognitive workload, attention, emotion, and personality. We hope that our research will lead to a new generation of human-centered systems that are fully aware of the users' needs and provide an intuitive, efficient, robust, and adaptive input mechanism for interaction and communication.
Bio:
Tanja Schultz received her Ph.D. and Master's in Computer Science from the University of Karlsruhe, Germany, in 2000 and 1995 respectively, and a German Staatsexamen in Mathematics, Sports, and Educational Science from the University of Heidelberg in 1990. She joined Carnegie Mellon University in 2000 and became a Research Professor at the Language Technologies Institute. Since 2007 she has also been a Full Professor at the Department of Informatics of the Karlsruhe Institute of Technology (KIT) in Germany. She is the director of the Cognitive Systems Lab, where her research activities focus on human-machine interfaces, with particular expertise in the rapid adaptation of speech processing systems to new domains and languages. She co-edited a book on this subject and received several awards for this work. In 2001 she received the FZI prize for an outstanding Ph.D. thesis. In 2002 she was awarded the Allen Newell Medal for Research Excellence from Carnegie Mellon for her contribution to Speech Translation, and the ISCA best paper award for her publication on language-independent acoustic modeling. In 2005 she received the Carnegie Mellon Language Technologies Institute Junior Faculty Chair.
Her recent research focuses on human-centered technologies and intuitive human-machine interfaces based on biosignals, capturing, processing, and interpreting signals such as muscle and brain activity. Her development of silent speech interfaces based on myoelectric signals was among the ten most important attractions at CeBIT 2010, received best demo and paper awards in 2006 and 2013, and was awarded the Alcatel-Lucent Research Award for Technical Communication in 2012. Tanja Schultz is the author of more than 250 articles published in books, journals, and proceedings. She has been a member of the Society of Computer Science (GI) for more than 20 years, and is a member of the IEEE Computer Society and of the International Speech Communication Association (ISCA), where she is serving her second term as an elected ISCA Board member.
Day of the talk: Tuesday, 14 May 2013 11:00:00
The talk will be given at Idiap: 106, Conference room
|
|
Title of the talk: Perceptually motivated speech recognition and mispronunciation detection
Given By: Christos Koniaris, PhD.
Abstract: Chris will present his doctoral thesis, the result of research in two fields of speech technology, namely speech recognition and mispronunciation detection. Although the two areas are clearly distinguishable, the proposed approaches share a common hypothesis based on psychoacoustic processing of speech signals. The conjecture is that the human auditory periphery provides a relatively good separation of different sound classes. Hence, it is possible to use recent findings on psychoacoustic perception, together with mathematical and computational tools, to model the auditory sensitivity to small changes in the speech signal.
Day of the talk: Wednesday, 12 Dec 2012 16:00:00
The talk will be given at Idiap: 106, Conference room
|
|
Title of the talk: Incorporation of phonetic constraints in acoustic-to-articulatory inversion
Given By: Blaise Potard, PhD.
Abstract: Blaise will be talking about his doctoral research on the acoustic-to-articulatory inversion problem.
The main aim of his Ph.D. was to investigate the use of additional constraints (phonetic and visual)
to improve the realism of the solutions found by an existing inversion framework. This research was
conducted at LORIA, Nancy, France, under the supervision of Yves Laprie.
Day of the talk: Monday, 10 Dec 2012 10:00:00
The talk will be given at Idiap: 106, Conference room
|
|
Title of the talk: Grapheme-to-Phoneme (G2P) Training and Conversion with WFSTs
Given By: Josef Novak
Abstract: The talk is a tutorial: a hands-on introduction to some of the features of Phonetisaurus, the OpenFst-based G2P toolkit developed by Josef Novak, with some high-level background information and a description of the toolkit's features, shortcomings, and goals.
The slides, a special tutorial distribution, and cut-and-paste terminal commands in wiki format can be found on the Phonetisaurus Google Code site:
Home page and code:
http://code.google.com/p/phonetisaurus/ (see the Downloads section in the left-hand sidebar)
Copy-and-paste tutorial companion:
http://code.google.com/p/phonetisaurus/wiki/FSMNLPTutorial
######
Short Bio:
Josef Novak is currently a Ph.D. student in the Hirose-Minematsu laboratory, in the EEIC department at the University of Tokyo.
More information:
http://www.gavo.t.u-tokyo.ac.jp/~novakj/
Day of the talk: Monday, 30 Jul 2012 13:30:00
The talk will be given at Idiap: 405, Management Meeting Room (Gaugin)
|
|
Title of the talk: On the beauty of Online Selective Sampling
Given By: Francesco Orabona
Abstract: Online selective sampling is an active variant of online learning in
which the learner is allowed to adaptively subsample the labels of an
observed sequence of feature vectors. The learner's goal is to achieve
a good trade-off between mistake rate and number of sampled labels.
This can be viewed as an abstract protocol for interactive learning
applications. For example, a system for categorizing stories in a
newsfeed asks for human supervision whenever it judges that more
training examples are needed to keep the desired accuracy.
A formal theory, almost free of assumptions, for calculating exact
confidence values on the predictions will be presented. Using this
theory, two selective sampling algorithms that use regularized least
squares (RLS) as the base classifier will be shown. These algorithms have
formal guarantees on their performance and on the maximum number of
labels queried. Moreover, RLS is easy and efficient to implement, and
empirical results will be shown as well to validate the theoretical
results.
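To make the protocol concrete, here is a minimal Python sketch of selective sampling with an RLS base classifier, assuming NumPy is available. The class name `SelectiveRLS`, the parameters `reg` and `kappa`, and the query rule (ask for the label only when the squared RLS margin is no larger than `kappa` times the confidence term x^T A^{-1} x) are illustrative assumptions; this is not either of the two algorithms presented in the talk and carries none of their formal guarantees.

```python
import numpy as np

class SelectiveRLS:
    """Hypothetical selective sampler with a regularized least squares (RLS) base classifier."""

    def __init__(self, dim, reg=1.0, kappa=1.0):
        self.A = reg * np.eye(dim)   # reg * I + sum of x x^T over queried examples
        self.b = np.zeros(dim)        # sum of y * x over queried examples
        self.kappa = kappa            # assumed scale of the query rule (illustrative)
        self.queries = 0

    def weights(self):
        # RLS solution w = A^{-1} b
        return np.linalg.solve(self.A, self.b)

    def step(self, x, get_label):
        """Predict the label of x; call get_label() only when the margin looks unreliable."""
        x = np.asarray(x, dtype=float)
        margin = float(self.weights() @ x)
        prediction = 1 if margin >= 0 else -1
        confidence_width = float(x @ np.linalg.solve(self.A, x))  # x^T A^{-1} x
        if margin ** 2 <= self.kappa * confidence_width:
            y = get_label()                  # ask the human supervisor for the label
            self.A += np.outer(x, x)
            self.b += y * x
            self.queries += 1
        return prediction

# Usage: a stream of 2-D points labelled by a hidden direction u.
rng = np.random.default_rng(0)
u = np.array([0.6, 0.8])
learner = SelectiveRLS(dim=2)
for _ in range(2000):
    x = rng.standard_normal(2)
    learner.step(x, lambda x=x: 1 if u @ x >= 0 else -1)
print(learner.queries, "labels queried out of 2000")
```

On a separable stream like this one, one would expect the number of queried labels to grow much more slowly than the number of points while the learned direction stays close to `u`; the exact count depends on the data and on `kappa`.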
Day of the talk: Wednesday, 02 May 2012 11:00:00
The talk will be given at Idiap: 106, Conference room
|
|
Title of the talk: Fractal Marker Fields
Given By: Marketa Dubska
Abstract: Many augmented reality systems use fiducial markers to localize the camera in the 3D scene. One big disadvantage of the markers used today is that camera motion is tightly limited: at least one of the markers must be visible, and it must be observed at a proper scale.
This talk presents a fractal structure of markers similar to matrix codes (such as QR Code or DataMatrix): the Fractal Marker Field (FMF). The FMF allows markers to be embedded at a virtually unlimited number of scales. At the same time, for each of the scales it guarantees a constant density of markers at that scale. The talk sketches out the construction of the FMF and a baseline algorithm for detecting the markers.
Day of the talk: Friday, 20 Apr 2012 11:00:00
The talk will be given at Idiap: 106, Conference room
|
|
Title of the talk: Overview of some research activities at Australia's Commonwealth Scientific and Industrial Research Organisation (CSIRO)
Given By: Eric Lehmann
Abstract:
CSIRO is Australia's national science agency and one of the largest and most diverse research organisations in the world. It employs over 6000 scientists at more than 50 centres throughout Australia and overseas. The core research undertaken at CSIRO focuses on the main challenges currently facing Australia, and includes research areas such as health, agriculture and food supply, mineral resources and mining, information and communication technologies, understanding climate change, and sustainable management of the environment, the oceans, and water resources. In this talk, I will present an overview of my recent research work at CSIRO, which involves aspects of Bayesian filtering and hierarchical modelling for applications related to environmental mapping and monitoring, and model-data fusion for water resource assessment at continental scale.
About the presenter
Eric Lehmann graduated in 1999 from the Swiss Federal Institute of Technology in Zurich (ETHZ) with a Diploma in Electrical Engineering. He received the M.Phil. and Ph.D. degrees, both in Electrical Engineering, from the Australian National University (Canberra) in 2000 and 2004 respectively. Between 2004 and 2008, he held various research positions with National ICT Australia (NICTA) in Canberra and the Western Australian Telecommunications Research Institute (WATRI) in Perth, WA, where he was active in the field of acoustics and array signal processing, with emphasis on sequential Monte Carlo methods (particle filtering) for acoustic speaker tracking. He is now working as a Research Scientist for CSIRO in Perth, within the division of Mathematics, Informatics and Statistics. His current work involves the development of statistical image processing techniques for remote sensing imagery (optical and synthetic aperture radar), with a focus on the multi-sensor analysis and integration of spatio-temporal data for environmental mapping and monitoring. He also contributes to the scientific research on Bayesian hierarchical methods for the assimilation of soil moisture satellite data with modeled estimates (model-data fusion) for water resource management.
Day of the talk: Friday, 20 Apr 2012 14:00:00
The talk will be given at Idiap: 106, Conference room
|
|
Title of the talk: Parallel Coordinates and Hough Transform
Given By: Marketa Dubska
Abstract: Parallel coordinates are a coordinate system used mostly, or even solely, for high-dimensional data visualization. Only a few applications have used them for computational tasks. We proposed a new use for them: as a new line parameterization for the Hough transform. This parameterization, called PClines, outperforms existing approaches in terms of accuracy. Besides, PClines are computationally extremely efficient, require no floating-point operations, and can easily be accelerated on different hardware architectures. Moreover, regular patterns such as grids and groups of parallel lines can be detected effectively with this parameterization.
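The line-to-point mapping behind PClines can be written down in a few lines. The Python sketch below is our own reading of the parameterization, under stated assumptions: parallel x and y axes placed a distance `d` apart, so that an image point (x, y) becomes the segment from (0, x) to (d, y), i.e. v(u) = x + (y - x)u/d. Substituting y = m*x + b shows that v no longer depends on x exactly at u = d/(1 - m), where v = b/(1 - m); so all segments from one line meet at that point in the "straight" space (slopes m <= 0), and an analogous "twisted" space with a flipped y axis at u = -d covers m > 0. The function names are hypothetical, the accumulator covers only the straight space, and it rounds with integer arithmetic in the spirit of the no-floating-point claim; it is not the speakers' implementation.

```python
from collections import Counter
from fractions import Fraction

def line_to_pclines(m, b, d):
    """Map the image line y = m*x + b to its PClines point (space, u, v).

    Straight space "S" (x axis at u = 0, y axis at u = d) holds slopes m <= 0;
    twisted space "T" (x axis at u = 0, flipped y axis at u = -d) holds m > 0."""
    m, b = Fraction(m), Fraction(b)
    if m <= 0:
        return "S", d / (1 - m), b / (1 - m)
    return "T", -d / (1 + m), -b / (1 + m)

def vote_straight_space(points, d):
    """Each point (x, y) becomes the segment from (0, x) to (d, y) in S.
    For every integer column u we vote for the cell nearest to
    v = x + (y - x) * u / d, using integer arithmetic only."""
    acc = Counter()
    for x, y in points:
        for u in range(d + 1):
            num = x * d + (y - x) * u           # equals v * d
            acc[(u, (num + d // 2) // d)] += 1  # nearest cell (ties rounded up for even d)
    return acc

def detect_line(points, d):
    """Return (m, b, votes) for the strongest line with slope m <= 0."""
    (u, v), votes = vote_straight_space(points, d).most_common(1)[0]
    if u == 0:
        raise ValueError("a peak at u = 0 corresponds to a vertical line")
    # Invert u = d / (1 - m) and v = b / (1 - m).
    return 1 - Fraction(d, u), Fraction(v * d, u), votes

points = [(x, 10 - x) for x in range(100)]   # integer points on y = -x + 10
print(line_to_pclines(-1, 10, 20))            # ('S', Fraction(10, 1), Fraction(5, 1))
print(detect_line(points, 20))                # (Fraction(-1, 1), Fraction(10, 1), 100)
```

With d = 20, all 100 segments pass exactly through (u, v) = (10, 5), which maps back to m = -1 and b = 10; a full detector would also vote in the twisted space to cover positive slopes.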
Day of the talk: Thursday, 19 Apr 2012 11:00:00
The talk will be given at Idiap: 106, Conference room
|
|
Title of the talk: Cost Minimization of WaldBoost Classifiers
Given By: Roman Juranek
Abstract: Object detection in computer vision is a complex task. One of the most popular and well-explored approaches is the use of statistical classifiers and scanning windows. In this approach, classifiers learned by the AdaBoost algorithm are often used, as they achieve low error rates and high detection rates. Object detection can be implemented by various methods; for acceleration, graphics hardware, multi-core architectures, SIMD, or custom hardware can be used. In this talk I will present a method that improves object detection performance with respect to a user-defined cost function. The method balances the computations of a previously learned classifier between two or more different implementations in order to minimize the cost function. The method is verified on a basic example: dividing a classifier into a pre-processing unit implemented in an FPGA and a post-processing unit on a standard PC. The technique is mainly applicable to the design of low-power smart cameras.
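A toy version of the split decision can be expressed as a search over the stage at which the classifier is cut. Everything in this Python sketch is an assumed cost model, not the method from the talk: a WaldBoost-style cascade with T stages, `reach[t]` the fraction of scanning windows still alive when stage t starts (with `reach[T]` the fraction accepted), a cost `c_hw` for every stage placed in the FPGA, a cost `c_sw` for every stage evaluation on the PC, and a cost `c_tx` for every window handed from the FPGA to the PC. The function names are hypothetical.

```python
def split_cost(k, reach, c_hw, c_sw, c_tx):
    """Expected cost per window when stages [0, k) run in hardware and
    stages [k, T) run in software (hypothetical cost model)."""
    T = len(reach) - 1                       # number of stages
    hardware = k * c_hw                      # e.g. FPGA area/power per implemented stage
    transfer = reach[k] * c_tx               # windows alive after the hardware stages go to the PC
    software = c_sw * sum(reach[t] for t in range(k, T))  # stages evaluated on the PC
    return hardware + transfer + software

def best_split(reach, c_hw, c_sw, c_tx):
    """Try every cut point k = 0..T and return (k, cost) with the lowest cost."""
    T = len(reach) - 1
    costs = [split_cost(k, reach, c_hw, c_sw, c_tx) for k in range(T + 1)]
    k_best = min(range(T + 1), key=costs.__getitem__)
    return k_best, costs[k_best]

# Made-up cascade in which each stage rejects half of the remaining windows.
reach = [1.0, 0.5, 0.25, 0.125, 0.0625]
print(best_split(reach, c_hw=1.0, c_sw=4.0, c_tx=2.0))  # (3, 3.75)
```

With these made-up numbers the cut lands at k = 3: putting the fourth stage in hardware would cost more than the software work and transfers it saves.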
Day of the talk: Wednesday, 18 Apr 2012 11:00:00
The talk will be given at Idiap: 106, Conference room
|
|
Title of the talk: Recent work at Graph@FIT
Given By: Roman Juranek
Abstract: In this talk, I will present the ongoing work of the graphics and video processing groups at FIT BUT. In the past, we participated in several successful projects, such as the Center of Computer Graphics and FP6/FP7 projects. Currently, we participate in the Artemis JU projects R3COP (development of robotic systems), SMECY (algorithms and compilers for embedded systems), and RECOMP; FP7 projects such as SRS and TA2; and projects funded from the EU structural funds, such as the Center of Excellence IT4I (IT for Innovations). Our research topics include, for example, statistical-classification-based object detection and recognition, environment mapping for mobile robots, augmented reality, real-time rendering, and more. I will briefly present important results of our research.
Day of the talk: Tuesday, 17 Apr 2012 11:00:00
The talk will be given at Idiap: 106, Conference room
|
|
Title of the talk: The magical, two-dimensional world of graphene
Given By: Prof. Philippe Jacquod
Abstract: Carbon comes in different forms: graphite and diamond have been known for centuries, while fullerenes, such as buckyballs, and carbon nanotubes were discovered in the second half of the twentieth century. A new allotrope of carbon was isolated in 2004: graphene, a one-atom-thick, two-dimensional lattice of carbon atoms. The discovery of graphene generated almost unprecedented hype in physics. Indeed, graphene has proven to be the material of superlatives. It is the thinnest but also the strongest, the stiffest but also the most stretchable of all crystals. Its electronic properties, together with its dimensionality, make it a strong candidate for replacing silicon in information processors. In this colloquium presentation, I will give a general introduction to the wonder material graphene, stressing its exceptional electronic and mechanical properties, sketching the many surprises it has given us, and discussing potential future applications. In the last part of my talk, I will summarize some of our recent investigations on the local topography and spectroscopy of graphene [Xue et al., Nature Materials 10, 282 (2011); Yankowitz et al., Nature Physics (in press, 2012)]. The presentation is intended to be pedagogical and is directed at a general, nonspecialist audience of scientists.
Philippe Jacquod studied physics at ETHZ and the University of Neuchatel, where he obtained his PhD in 1997. He was a postdoctoral associate at Yale University from 1997 to 2000 and at the University of Leiden from 2000 to 2003. He became assistant professor of theoretical physics at the University of Geneva in 2003. He joined the physics department at the University of Arizona in 2006, where he is now a professor of physics and optical sciences. His research is in condensed matter physics, with a focus on quantum transport and nanophysics.
Day of the talk: Friday, 09 Mar 2012 11:00:00
The talk will be given at Idiap: 106, Conference room
|
|
Title of the talk: Extended Pen+ Tools for Multimodal Analysis and Interaction
Given By: Nadir Weibel
Abstract: Access to information is one of the most crucial aspects of everyday life. As computation becomes ubiquitous and our environment is enriched with new possibilities for communication and interaction, the existing infrastructure of science, business, and social interaction faces the difficult challenges of supporting complex tasks, mediating networked interactions, and managing the increasing availability of digital information and technology. Despite the tremendous development of new digital devices and novel interaction techniques that we have all witnessed in recent years, it is remarkable how paper documents and pen-based interaction still represent a very important way of interacting with both physical and digital information spaces. In an effort to rethink what pen and paper user interfaces (PPUI) mean in a modern world, we are studying multimodal interactions of pen plus a range of tangible devices (pen+) at the intersection of the physical and digital worlds.
In this talk I will present my latest research on pen- and paper-based computing, looking at how multimodal interaction with this "very old" technology enables a range of novel affordances and supports communication and interaction.
In the first part of the talk, I will speak about the development of new systems and prototypes that encompass the pen and other modalities, such as speech and gestures; different devices, such as smartphones, tablets, and high-resolution wall displays; and different domains, such as healthcare, accessibility, data visualization and interaction, social networks, augmented office environments, and communication for early education, older adults, and other specific populations. I will present some examples of the prototypes we developed and some brief extracts of the data we collected about their usage in the wild.
The second part of the talk will focus on pen- and paper-based techniques and tools for richer access to multimodal data in various contexts. While a new generation of inexpensive digital recording devices and storage facilities is revolutionizing data collection in behavioral science, one of the main obstacles to fully capitalizing on this opportunity is the huge time investment required for analysis using current methods. To address this analysis bottleneck, we developed ChronoViz, a system providing synchronized interactive visual representations of multiple data streams. Using two multimodal datasets (a recent study of pilot/co-pilot interaction in a Boeing 787 simulator, and an ongoing learning analytics research project), I will show how the analysis tool works and how integrating paper-based annotation, analysis, and interaction into the tool itself enables the exploration of exciting new methods for observational research.
Brief Bio: Dr. Nadir Weibel is a postdoctoral fellow at the University of California San Diego and a member of both the Distributed Cognition and Human-Computer Interaction Laboratory and the Ubiquitous Computing and Social Dynamics research group. He holds a Bachelor's and a Master's in Computer Science from ETH Zurich (Dipl. Informatik-Ing. ETH), and a Ph.D. in Computer Science, also from ETH Zurich. During his Ph.D., as a member of the Global Information Systems research group at ETH, he explored new ways of enhancing a seemingly mundane but ubiquitous resource, paper, to support everyday work, interaction, and collaboration.
His current research is situated at the intersection of computer science, communication, and the social sciences, studying the cognitive consequences of introducing and deploying interactive multimodal and tangible devices. His main interests range from software engineering to human-computer interaction, including computer-supported collaborative work and mobile and ubiquitous computing. In his work he develops theory and methods, designs representations, implements prototypes, and evaluates the effectiveness of interactive physical-digital systems in order to understand the broader design space in which they are situated. He is currently collaborating with researchers at UCSD, Stanford, Berkeley, Drexel University, Children's Hospital in Washington DC, TU Darmstadt, INRIA Paris / Université Paris Sud, and Telecom ParisTech.
Day of the talk: Tuesday, 31 Jan 2012 11:00:00
The talk will be given at Idiap: 106, Conference room
|
|
Title of the talk: Combining transcription-based and acoustic-based speaker identification for broadcast news
Given By: Sylvain Meignier, Le Maine University, F
Abstract: In this presentation, we consider the issue of speaker identification within audio recordings of broadcast news. Speaker identity information is extracted by both transcription-based and acoustic-based speaker identification systems. This information is combined within the framework of belief functions, which provides a coherent knowledge representation of the problem. The Kuhn-Munkres algorithm is used to solve the assignment of speaker identities to speaker clusters optimally. Experiments carried out on French broadcast news from the French evaluation campaign ESTER show the effectiveness of the proposed combination method.
Keywords: speaker identification, speaker diarization, belief functions.
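The assignment step can be illustrated with a generic Kuhn-Munkres (Hungarian) solver. In this hypothetical Python sketch, rows are speaker clusters, columns are candidate identities, and each entry is a made-up combined score; the talk does not say how belief-function outputs are turned into scores, so that part is assumed. The solver is the standard O(n^2 m) formulation with row and column potentials; it minimizes cost, so `assign_identities` (a hypothetical name) negates the scores to maximize the total. It requires at least as many identities as clusters.

```python
def kuhn_munkres(cost):
    """Minimum-cost assignment of each row to a distinct column (rows <= columns)."""
    n, m = len(cost), len(cost[0])
    if n > m:
        raise ValueError("need at least as many columns as rows")
    INF = float("inf")
    u = [0.0] * (n + 1)      # row potentials (1-based)
    v = [0.0] * (m + 1)      # column potentials (1-based, index 0 is a sentinel)
    p = [0] * (m + 1)        # p[j] = row matched to column j (0 = free)
    way = [0] * (m + 1)      # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:          # grow a shortest augmenting path from row i
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):   # update potentials to keep reduced costs >= 0
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:            # flip matches along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment

def assign_identities(scores):
    """Cluster -> identity assignment maximizing the total (hypothetical) score."""
    return kuhn_munkres([[-s for s in row] for row in scores])

scores = [[0.6, 0.5],
          [0.9, 0.1]]
print(assign_identities(scores))  # [1, 0]
```

In this toy matrix, a greedy choice would give cluster 0 identity 0 and leave cluster 1 with a total of 0.7, whereas the optimal assignment [1, 0] reaches 1.4; avoiding such greedy mistakes is why an exact assignment algorithm is useful here.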
Day of the talk: Thursday, 22 Dec 2011 14:00:00
The talk will be given at Idiap: 106, Conference room
|
|
Title of the talk: Speaker Verification Using the Spectral and Time Parameters of Voice Signal
Given By: Prof. Victor Sorokin, R&D Director OOO V
Abstract: The speaker verification system developed in the VOXSEAL project is based on variations in formant frequencies at stationary fragments and transient processes of vowels, the spectral features of fricative sounds, and the duration of speech segments. The best features are chosen for each word from the fixed list of Russian numerals ranging from zero to nine. The password phrase is randomly generated by the system at each verification. Compensation for dynamic noise, and counteraction of interference from playback of intercepted and recorded speech, are provided by the repeated reproduction of several words. The total error probabilities for male and female voices are 0.006% and 0.025%, respectively, for 30 million tests, 429 speakers, and a maximum password-phrase length of 10 words. Note that the probabilities of false identification and false rejection are almost equal.
Author: Prof. Victor Sorokin, R&D Director, OOO Voxseal, Skolkovo-Moscow. Russian national; M.Sc. from the Moscow Aviation Institute; Ph.D. (Engineering); Doctor of Sciences in Physics and Mathematics (1987). Leading Researcher at the Institute for Information Transmission Problems of the Russian Academy of Sciences, member of the Acoustical Society of America, board member of the Russian Acoustical Society, author of the monographs "Theory of Speech Production" and "Speech Synthesis" and about 150 publications, and holder of 8 patents in speech technology.
Day of the talk: Tuesday, 20 Dec 2011 14:00:00
The talk will be given at Idiap: 106, Conference room
|
|
Title of the talk: Building-up child-robot relationship for therapeutic purposes
Given By: Joan Pons
Abstract:
Socially assistive robots (SAR) have been shown to be very promising in therapeutic programs with children.
Health-related goals such as in-clinic rehabilitation or quality-of-life improvement have been achieved
through social interaction. In this context, a robot's effectiveness depends strongly on its ability to
elicit long-term engagement in children. To explore how social bonds with robots emerge,
a field study with 49 sixth-grade pupils (aged 11-12 years) and 4 different robots was carried out at
an elementary school. Children's preferences, expectations on functionality and communication, and
interaction behavior were studied. The results showed that the robots' different appearance and performance
elicit distinctive perceptions and interactive behavior in children, and affect social processes such as role
attribution and attachment. Similarly, to explore the requirements of effective human-robot
interaction, a quiz game was developed. A NAO robot was used to play the popular game of 20 Questions
to evaluate different interaction capabilities (namely face following, speech recognition, visual and audio
cues, and personalization).
Short Bio:
Joan Saez Pons did his PhD at the Mobile Machines and Vision Lab (MMVL), Sheffield Hallam University,
UK, on multi-robot systems that collaborate with humans. He was also a Marie Curie
researcher at the Cognitive Neuroscience Department (KN) at the University of Tuebingen, Germany. He has been
working at the Technical Research Centre for Dependency Care and Autonomous Living (CETpD), UPC, BarcelonaTech,
in the field of social robotics and human-robot interaction. His research interests include mobile robotics
navigation, multi-robot systems, cognitive robotics and human-robot interaction.
Day of the talk: Wednesday, 02 Nov 2011 16:00:00
The talk will be given at Idiap: 106, Conference room
|
|
Title of the talk: Convex Relaxation Methods for Image Processing
Given By: Xavier Bresson
Abstract: This talk will introduce recent methods to compute optimal solutions to fundamental problems in image processing.
Several meaningful problems in imaging are usually defined as non-convex energy minimization problems, which are
sensitive to initial conditions and slow to minimize. The ultimate objective of our work is to overcome the bottleneck
problem of non-convexity. In other words, our goal is to "convexify" the original problems to produce more robust and
faster algorithms for real-world applications. Our approach consists of finding a convex relaxation of the original
non-convex optimization problems and thresholding the relaxed solution to reach the solution of the original problem.
We will show that this approach is able to convexify important and difficult image processing problems such as image
segmentation based on the level set method and image registration. Our algorithms are not only guaranteed to find a
global solution to the original problem, they are also at least as fast as graph-cuts combinatorial techniques while
being more accurate. Finally, I will introduce recent promising extensions of this approach in machine learning.
Bio: Prof. Xavier Bresson received his B.A. in Physics from the University of Marseille and his Master of Electrical Engineering
from Ecole Superieure d'Electricite in Paris, France. He received his Ph.D. from the Swiss Federal Institute of Technology (EPFL)
in 2005. From 2006 to 2010, he was a Postdoctoral Scholar in the Department of Mathematics at the University of California,
Los Angeles (UCLA). In 2010, he joined the Department of Computer Science at City University of Hong Kong as a
tenure-track Assistant Professor. His current research focuses on convex relaxation methods and unified geometric
methods in image processing and machine learning. He has published 38 papers in international journals and conferences.
Day of the talk: Thursday, 08 Sep 2011 11:00:00
The talk will be given at Idiap: 106, Conference room
|
|
Title of the talk: Scalable multi-class/multi-view object detection
Given By: Mr. Nima Razavi
Abstract: Scalability of object detectors with respect to the number of classes/views is a very important issue for applications where
many object classes need to be detected. While combining single-class detectors yields a linear complexity for testing,
multi-class detectors that localize all objects at once often come at the cost of reduced detection accuracy. In this work,
we present a scalable multi-class detection algorithm which scales sublinearly with the number of classes without compromising accuracy.
To this end, a shared discriminative codebook of feature appearances is jointly trained for all classes and detection is also performed
for all classes jointly. Based on the learned sharing distributions of features among classes, we build a taxonomy of object classes.
The taxonomy is then exploited to further reduce the cost of multi-class object detection. Our method has linear training and sublinear
detection complexity in the number of classes. We have evaluated our method on the challenging PASCAL VOC'06 and PASCAL VOC'07
datasets and show that scaling the system does not lead to a loss in accuracy.
Day of the talk: Friday, 13 May 2011 14:30:00
The talk will be given at Idiap: 106, Conference room
|
|
Title of the talk: Latent Feature Models for the Structure and Meaning of Text
Given By: James Henderson and Paola Merlo
Abstract: Much of the meaning of text is
reflected in individual words or
phrases, but its full information
content requires structured analyses of
the syntax and semantics of natural
language. Our work on methods for
extracting such structured meaning
representations from natural language
has focused on the joint modelling of
syntactic and semantic dependency
structures. We have addressed this
problem by using latent variables to
model correlations between these two
structures without strong prior
assumptions about the nature of these
correlations. These models have
achieved state-of-the-art results in
both syntactic parsing and semantic
role labelling across several
languages. We have also used them to
exploit syntactic information in
correcting semantic roles automatically
transferred from translations.
Our use of latent variable models is in
part motivated by the recognition that
the supervised learning paradigm is
becoming increasingly impractical as
research in natural language processing
moves to more complex, deeper levels of
semantic analysis. By developing
robust efficient methods for learning
latent representations, we hope to be
able to induce semantic representations
from large quantities of data for
weakly correlated tasks, such as machine
translation. Our latent variable
models use vectors of latent features
for robust learning and exploit neural
networks for efficient approximate
inference, while still exploiting
methods from dependency parsing for
efficient decoding with sufficiently
powerful models.
(Work with Ivan Titov, Lonneke van der
Plas, Nikhil Garg, and Andrea Gesmundo.)
Day of the talk: Friday, 11 Mar 2011 11:00:00
The talk will be given at Idiap: 106, Conference room
|
|
Title of the talk: Face Recognition and Intelligent Video Surveillance
Given By: Prof Stan Z. Li
Abstract: Face recognition and intelligent video surveillance are important areas for next-generation ID management and public security.
In this talk, the challenges, recent advances, and applications of face biometrics and intelligent video surveillance technologies will be described.
Short Bio:
Stan Z. Li received his B.Eng from Hunan University, China, M.Eng from National University of Defense Technology, China, and PhD degree from Surrey University, UK.
He is currently a professor and the director of Center for Biometrics and Security Research (CBSR), Institute of Automation, Chinese Academy of Sciences (CASIA).
He worked at Microsoft Research Asia as a researcher from 2000 to 2004. Prior to that, he was an Associate Professor at Nanyang Technological University, Singapore.
He was elevated to IEEE Fellow for his contributions to the fields of face recognition, pattern recognition and computer vision.
Day of the talk: Wednesday, 03 Nov 2010 14:00:00
The talk will be given at Idiap: 106, Conference room
|
|
Title of the talk: Social Sensing for Epidemiological Behavior Change
Given By: Anmol Madan
Abstract: An important question in behavioral epidemiology and public health is how individual behavior is affected by illness and stress. Although changes in individual behavior are intertwined with contagion, epidemiologists today do not have sensing or modeling tools to quantitatively measure these effects in real-world conditions. We propose a novel application of ubiquitous computing: we use mobile-phone-based co-location and communication sensing to measure characteristic behavior changes in symptomatic individuals, reflected in their total communication, their interactions with respect to time of day (e.g., late night, early morning), the diversity and entropy of their face-to-face interactions, and their movement. Using these extracted mobile features, it is possible to predict the health status of an individual without having actual health measurements from the subject. Finally, we estimate the temporal information flux and implied causality between physical symptoms, behavior, and mental health.
Bio: Anmol Madan recently completed his PhD at the MIT Media Lab with Prof. Alex Pentland. He is currently working as a postdoctoral researcher at Northeastern University and Harvard University with Prof. David Lazer. He has received honors from the MIT 100k Competition and the MIT Enterprise Forum for various startup-related ideas. His research interests are in modeling human behavior using large-scale mobile phone sensor datasets, using applied machine learning and data mining methods. You might also have read about his research in popular media such as CNN, BBC, the New York Times, Wired, BusinessWeek, and Slashdot.
Day of the talk: Friday, 01 Oct 2010 16:00:00
The talk will be given at Idiap: 106, Conference room
|
|
Title of the talk: Tell Me Where You have Lived, and I will Tell You What You Like: Adapting Interfaces to Cultural Preferences
Given By: Abraham Bernstein
Abstract: Adapting user interfaces to cultural preferences has been shown to improve a user's performance, but is often forgone because the procedure is time-consuming and costly. Moreover, it is usually limited to producing one uniform user interface (UI) for each nation, disregarding the intangible nature of cultural backgrounds. To overcome these problems, we exemplify a new approach with our culturally adaptive web application MOCCA, which is able to map information in a cultural user model onto adaptation rules in order to create personalized UIs. Apart from introducing the adaptation flexibility of MOCCA, the talk describes a study with 30 participants in which we compared UI preferences to MOCCA's automatically generated UIs. Another experiment with over 40 participants from 3 countries showed a performance improvement for culturally adapted UIs. Results confirm that automatically predicting cultural UI preferences is possible, paving the way for low-cost cultural UI adaptations.
Bio:
Abraham Bernstein is a full professor of informatics at the University of Zurich, Switzerland. His current research focuses on various aspects of the semantic web, knowledge discovery, service discovery/matchmaking, and mobile/pervasive computing. His work is based on both social science (organizational psychology/sociology/economics) and technical (computer science, artificial intelligence) foundations. Mr. Bernstein holds a Ph.D. from MIT and a Diploma in Computer Science (comparable to an M.S.) from the Swiss Federal Institute of Technology in Zurich (ETH). He is the program chair of this year's ISWC and serves on the editorial boards of the International Journal on Semantic Web and Information Systems, Springer's Informatik Spektrum, the Journal of the Association for Information Systems, and the newly approved ACM Transactions on Interactive Intelligent Systems.
Day of the talk: Monday, 06 Sep 2010 11:00:00
The talk will be given at Idiap: 106, Conference room
|
|
Title of the talk: Conjugate Mixture Models for Clustering and Tracking Multimodal Data
Given By: Vassil Khalidov
Abstract: The problem of multimodal tracking arises whenever the same objects are observed through time by different sensors. We address the general case in which the observations from different modalities are not necessarily aligned, in the sense that there is no obvious way to associate or compare them in some common space. Our objective is to construct a model that is able to estimate the number of objects and to cluster the data so that the clusters stay consistent across modalities through time. We use a Bayesian treatment and present an approach based on stochastic optimization and information criteria. The results are illustrated on a multiple audio-visual object tracking task with a "robot head" device comprising a pair of stereoscopic cameras and a pair of microphones.
Day of the talk: Monday, 28 Jun 2010 11:00:00
The talk will be given at Idiap: 106, Conference room
|
|
Title of the talk: Statistical and knowledge-centric techniques in Natural Language Understanding: a valuable handshake?
Given By: Silvia Quarteroni
Abstract: In this talk, I will draw on my experience in Information Retrieval and Spoken Dialogue Systems to discuss a number of situations where statistical (e.g. machine learning) techniques shake hands with knowledge-centric approaches to meet user needs and account for domain knowledge. I will present examples particularly from the areas of Question Answering and Spoken Language Understanding, two research fields that have a number of points in common.
Short biography: Silvia Quarteroni is a Senior Marie Curie Research Fellow involved in the ADAMACH project at the University of Trento. She received her BSc and MSc in Computer Engineering from the Swiss Federal Institute of Technology in Lausanne (EPFL) and her PhD in Computer Science from the University of York (UK). She has worked in several fields of Natural Language Processing, focusing on human-computer dialogue, information retrieval, and personalization. She has published about 30 articles in international conferences and journals and serves on the programme committees of several of them.
Day of the talk: Thursday, 11 Mar 2010 11:00:00
The talk will be given at Idiap: 106, Conference room
|
|
Title of the talk: Subband temporal envelopes of speech signals and their central role in speech recognition by humans and machines
Given By: Cong-Thanh Do
Abstract: The subband temporal envelopes of speech signals play a central role in this presentation, which is divided into three parts.
The first part of the presentation deals with the automatic recognition of cochlear implant-like spectrally reduced speech (SRS) [1]. The automatic speech recognition (ASR) system is HMM-based and was trained on the TI-digits database; the speech feature vectors are MFCCs along with their delta and acceleration coefficients. We show that, beyond a certain SRS spectral resolution, it is possible to achieve word accuracy as good as that attained with the original clean speech, even though the SRS is synthesized only from the subband temporal envelopes of the original clean speech [2]. This work motivated some perspectives on noise-robust ASR and on speech feature vector enhancement dedicated to ASR [3].
The human recognition of speech is addressed in the second part of the presentation. We present quantitative analyses of the speech fundamental frequency (F0) in cochlear implant-like SRS that support the report of Zeng et al. (2005) [4], based on subjective tests, that cochlear implant users have difficulty identifying speakers. That is, the F0 distortion in state-of-the-art cochlear implants is large when the SRS, which is an acoustic simulation of a cochlear implant, is synthesized only from subband temporal envelopes [5]. The analyses also revealed a significant reduction in F0 distortion when frequency modulation is integrated into the cochlear implant, as proposed by Nie et al. (2005) [6]. In addition, the results of such quantitative analyses could be exploited to conduct subjective studies in cochlear implant research.
The third part of the presentation concerns audio-visual speech processing, for which a linear relationship between the subband temporal envelopes and the area of mouth opening was mathematically proposed [7]. This proposition is based on the pioneering research of Grant and Seitz [8], in which the authors reported different degrees of correlation between acoustic envelopes and visible movements. Our mathematical model makes it possible to estimate the area of mouth opening from speech acoustics alone, using blind deconvolution techniques [9]. The estimated areas of mouth opening are sufficiently correlated with the manually measured ones, with an average correlation coefficient of 0.73.
Biography: Cong-Thanh Do was born in Hanoi, Vietnam, in 1983. He received the Electrical Engineering degree from Hanoi University of Technology, Hanoi, and Grenoble Institute of Technology, Grenoble, France, in 2006, through the Programme de Formation d'Ingénieurs d'Excellence au Vietnam (PFIEV). In 2007, he received the M.S. degree in signal, image, speech, and telecommunication from the Grenoble Institute of Technology, Grenoble, France, and performed a research internship in the Speech and Cognition Department of GIPSA-Lab, Grenoble, France.
He is currently working toward the Ph.D. degree in the Signal and Communications Department, Institut Télécom, Télécom Bretagne, UMR CNRS 3192 Lab-STICC, Technopôle Brest-Iroise, Brest, France. His current research interests include automatic speech recognition, audio-visual speech processing, and statistical signal processing.
Day of the talk: Friday, 05 Mar 2010 11:00:00
The talk will be given at Idiap: 106, Conference room
|
|
Title of the talk: IDIAP Newcomers
Given By: Hervé Bourlard
Abstract: If you are an IDIAP newcomer and we haven't had a chance to meet yet (e.g., at the previous similar meeting), I would like to invite you all to a meeting for an informal introduction, discussions, and Q&A.
Day of the talk: Tuesday, 30 Jan 2007 17:00:00
The talk will be given at Idiap: Meeting Room (Villa Tissieres)
|
|
Title of the talk: Dry-run of my PhD defense
Given By: G. Lathoud
Abstract: Spatio-Temporal Analysis of Spontaneous Speech with Microphone Arrays
Knowing the location of human speakers permits a wide spectrum of
applications, including hearing aids, hands-free speech processing in
cars, surveillance, intelligent homes and offices, and autonomous
robots. This thesis focuses on the use of microphone arrays to analyze
spontaneous multi-party speech. This is a challenging task, because
such speech contains many very short utterances, and people frequently
interrupt each other (overlapped speech). Moreover, in order to build
applications with the fewest possible constraints on the users, we use
distant microphones only, for example on a meeting room table.
Finally, the developed approaches are as unsupervised as possible,
bearing in mind that most users are non-technical. We
targeted the development of an automatic system that can handle both
moving and static speakers, in order to answer the question "Who spoke
where and when?". Several issues were investigated, from the signal
processing level (where? when?) to the speaker clustering level
(who?). The techniques developed in the course of this research were
successfully tested on a large variety of real indoor recordings,
including cases with multiple moving speakers as well as seated
speakers in meetings. The versatility of the proposed techniques is
illustrated by a direct application to two related cases: hands-free
speech acquisition in cars, and noise-robust speech recognition
through telephones. Finally, a close analysis of the speaker
clustering results leads us to question the linearity of the transmission
channel in a real indoor environment when a speaker is a few meters
away from a microphone.
Day of the talk: Friday, 24 Nov 2006 16:00:00
The talk will be given at Idiap: Main Conference Room (UBS)
|
|
Title of the talk: A Music Discovery Engine based on Audio Similarities
Given By: Nicolas Scaringella
Abstract: In the context of Electronic Music Distribution, huge databases
coming from both the restoration of existing analog archives and new
content have been created and are continuously growing. The biggest
online services now offer around 2 million tracks, creating an urgent need
for efficient ways to browse collections. Providing the kind of robust
access to the world's vast store of music that we currently provide for
textual material has been the goal of the Music Information Retrieval
(MIR) community over the past 10 years; however, it still remains a
very challenging problem in the case of audio data.
Music information is indeed a multifaceted and sometimes complex
data set that includes pitch, temporal (i.e. rhythm), harmonic, timbral
(e.g. orchestration), textual (i.e. lyrics), symbolic, editorial, and
metadata elements (without considering related visual elements). Music
information is also extremely dynamic. That is, any given work can have
its specific pitches altered, its rhythm modified, its harmony reset,
its orchestration changed, its performance reinterpreted, and its
performers arbitrarily chosen; yet, somehow, it remains the "same"
piece of music as the "original". Within this extraordinarily fluid
environment, the concept of "similarity" becomes particularly
problematic, while being crucial to the design of audio and music
information retrieval systems.
In this talk, we will discuss the concept of similarity between music
excerpts and propose possible research directions to build a music
discovery engine based on audio analysis.
Day of the talk: Monday, 10 Jul 2006 16:00:00
The talk will be given at Idiap: Main Conference Room (UBS)
|
|
Title of the talk: Prior Knowledge in Kernel Methods (PhD defense rehearsal)
Given By: Alexei Pozdnoukhov
Abstract: Kernel methods are one of the most successful branches of Machine Learning.
They allow linear algorithms with well-founded properties,
such as generalization ability, to be applied to non-linear real-life problems.
The Support Vector Machine is a well-known example of a kernel method
that has found a wide range of applications in data analysis.
In many practical applications, additional prior knowledge
is available. This can be knowledge about the data domain,
invariant transformations, inner geometrical structures in the data,
some properties of the underlying process, etc.
If used wisely, this information can provide significant
improvement to any data processing algorithm.
Thus, it is important to develop methods for incorporating
prior knowledge into data-dependent models.
The main objective of this thesis is to investigate approaches
to learning with kernel methods using prior knowledge.
Invariant learning with kernel methods is considered in more detail.
Day of the talk: Thursday, 29 Jun 2006 15:00:00
The talk will be given at Idiap: Main Conference Room (UBS)
|
|
Title of the talk: PhD defense Dry run:
Given By: Norman Poh
Abstract: This thesis presentation is about combining multiple systems applied to biometric authentication. Its two-fold contribution is to provide a better understanding of the problem of fusion (with respect to correlation, the performance strength of individual systems, and noise) and to exploit knowledge of the claimed identity to improve the performance of the combined system. Conditioning on the claimed identity is difficult because one has to deal with a small learning sample size.
Day of the talk: Wednesday, 24 May 2006 16:00:00
The talk will be given at Idiap: Main Conference Room (UBS)
|
|
Title of the talk: Using Auxiliary Sources of Knowledge for Automatic Speech Recognition
Given By: Mathew Magimai Doss
Abstract: This is the second rehearsal of my PhD defense presentation. Your comments and suggestions would be of great help. Thank you!
Day of the talk: Friday, 27 May 2005 16:00:00
The talk will be given at Idiap: Meeting Room (Villa Tissieres)
|
|
Title of the talk: ACM MultiMedia conference report
Given By: Florent Monay
Abstract: I will describe some papers and demos from ACM MultiMedia 2003 and the MIR 2003 workshop (content-based multimedia information retrieval, home video browsing/editing, home photo browsing, surveillance, sports video indexing, ...).
A discussion about the corresponding research directions will follow.
Day of the talk: Monday, 24 Nov 2003 11:00:00
The talk will be given at Idiap: Smart Meeting Room (Pavillon)
|
|
|
|