
Idiap Public Talks Archives

Title: The role of electrochemical energy storage systems in a Smart Grid
Speaker: Prof. Hubert Girault, EPFL, Switzerland
Date: Wednesday, 18 Feb 2015 - 11:00:00

The speaker will present the demonstrator being installed at the water treatment plant in Martigny. It is based on a redox flow battery able to produce hydrogen to maintain the battery at an optimum state of charge. He will therefore explain how a redox flow battery works and discuss its advantages and disadvantages. He will then present the concept of a service station for electric cars, whether they run on lithium batteries, like the Tesla, or on hydrogen fuel cells, like the Hyundai ix35.

Title: Data Valorisation based on Linked (open) Data approaches
Speaker: Prof. Maria Sokhn, HES-SO Valais, Switzerland
Date: Thursday, 12 Feb 2015 - 11:00:00

Maria will also present her group at the HES-SO Valais-Wallis.

Title: Speech technologies - going from the research labs to market
Speaker: Petr Schwarz, Phonexia/Brno University of Technology, Czech Republic
Date: Wednesday, 12 Nov 2014 - 10:00:00

Several speech technologies, such as speech transcription, keyword spotting, language identification and speaker identification, will be discussed from the architecture point of view. Cases will then be presented showing how these technologies are used in call centers, in banks, by governmental agencies, or by broadcast service providers for speech data mining, voice analytics or voice biometrics. Each client and use case has specific requirements on technology, data handling and services. These requirements and their implications for technology development and research will be mentioned.

Title: Language identification@BUT
Speaker: Pavel Matejka, Brno University of Technology, Czech Republic
Date: Wednesday, 12 Nov 2014 - 11:00:00

This talk presents ongoing work on language identification for the DARPA RATS program. It describes an application of neural network bottleneck (BN) features to language identification (LID). BN features are generally used for large-vocabulary speech recognition in conjunction with conventional acoustic features, such as MFCC or PLP. We compare the BN features to several common types of acoustic features used in present-day state-of-the-art LID systems. The test set is from the DARPA RATS (Robust Automatic Transcription of Speech) program, which seeks to advance state-of-the-art detection capabilities on audio from highly degraded radio communication channels. On this type of noisy data, we show that, on average, the BN features provide a 45% relative improvement in the Cavg or Equal Error Rate (EER) metrics across several test duration conditions, with respect to our single best acoustic features.
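As a reading aid for the metrics quoted in the abstract above, the following is a minimal sketch of how an equal error rate and a relative improvement can be computed. It is not the speaker's evaluation code: the function names, the threshold sweep and all score and error values are hypothetical and chosen only for illustration.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: the operating point where the miss rate and the
    false-alarm rate are closest, found by sweeping over observed scores."""
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        miss = np.mean(target < t)              # true trials rejected at t
        false_alarm = np.mean(nontarget >= t)   # false trials accepted at t
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2.0
    return eer


def relative_improvement(baseline_error, new_error):
    """Fraction by which the new error is lower than the baseline error."""
    return (baseline_error - new_error) / baseline_error


# Hypothetical detection scores (higher means "more likely the target language").
eer = equal_error_rate([2.0, 3.0, 4.0], [0.0, 1.0, 1.5])

# Hypothetical error rates: a drop from 20% to 11% is a 45% relative improvement.
gain = relative_improvement(0.20, 0.11)
```

Note that a relative improvement is measured against the baseline error, so it says nothing about the absolute error level of either system.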

Title: On the use of multimodal cues for the modeling of group involvement and individual engagement in multiparty dialogue
Speaker: Catharine Oertel, KTH, Sweden
Date: Thursday, 05 Jun 2014 - 10:30:00

Multiparty conversations are characterized by various degrees of participants’ engagement and group involvement. Humans are able to detect and interpret these degrees, basing their perception on multimodal cues. Automatic detection, however, poses many challenges, in particular for bigger groups of people. In this talk, I will mainly focus on a study in which we analysed group behaviour in an eight-party, multimodal corpus. We propose four features that summarize different aspects of eye-gaze patterns and allow us to describe individual engagement as well as group involvement over time. Our overall aim is to build a system which is able to foster group involvement. In addition, I will briefly comment on two studies in which we use the robot head Furhat to advance in this direction. Furhat is a robotic head that combines state-of-the-art facial animation with physical embodiment in order to facilitate multi-party dialogues with robots. Biography: Catharine Oertel has been a PhD candidate at the Department of Speech, Music and Hearing at the Royal Institute of Technology (KTH) in Sweden since 2012. She is a member of the Speech group and is supervised by Prof. Joakim Gustafson. She received her Master's degree in Linguistics: Communication, Cognition and Speech Technology from Bielefeld University in 2010. From 2010 to 2012 she was a member of the Speech Communication Lab at Trinity College, Dublin. Her work has mainly focused on the multimodal modeling of conversational dynamics, but she has also been active in the area of human-robot interaction.

Title: Anthropomorphic media design and attention modeling
Speaker: Dr. Tomoko Yonezawa and Ms. Yukari Nakatani, Kansai University, Japan
Date: Monday, 10 Mar 2014 - 11:00:00

In this talk, we would like to introduce our past trials on human-robot and human-agent interaction, focusing especially on the user's attention and on gaze communication. First, in "Communication on Anthropomorphic Media", Dr. Tomoko Yonezawa will present past research on gaze communication and robot behaviors. She will also talk about her current research on touch interaction between a human and a wearable robot. Second, in "Presences with Avatars' Appearances Attached to Text Communication in Twitter", Ms. Yukari Nakatani will introduce her research on the representation of multiple virtual agents for sustainable communication in SNS. Finally, we will introduce the students' research in our laboratory with some presentation movies.

Title: Building a Multilingual Heritage Corpus with Applications in Geo-Tagging and Machine Translation
Speaker: Martin Volk, University of Zurich
Date: Monday, 03 Mar 2014 - 16:00:00

In this talk Martin Volk will present the Text+Berg project, an initiative to digitize and annotate all the yearbooks of the Swiss Alpine Club from its start in 1864 until today. The resulting corpus of 40 million words contains texts in the 4 official Swiss languages, with a large parallel part in German and French. Based on these translations, Martin's group works on domain-specific machine translation systems, but also on search systems for word-aligned parallel corpora as a new resource for translators and linguists. Most of the yearbooks (more than 100'000 pages) were scanned and converted to text at the University of Zurich. Martin Volk will share his experiences on automatically correcting OCR errors as well as on dealing with tokenization, lemmatization and PoS-tagging issues in a corpus that spans 150 years and multiple languages. He will also report on the Text+Berg toponym detection and classification as well as person name recognition and tagging of temporal expressions. Recently the group has released Kokos, a system for collaborative correction of OCR errors in the yearbooks of the 19th century, and asked the SAC members to join in creating a clean corpus. -- Martin Volk is Professor of Computational Linguistics at the University of Zurich. His research focuses on multilingual systems, in particular on Machine Translation. His group has been investigating domain adaptation techniques for statistical machine translation, hybrid machine translation for lesser-resourced languages, and machine translation into sign language. He is also known for his work on machine translation of film and TV subtitles. Together with Noah Bubenhofer he is leading the Text+Berg project for the digitization and annotation of a large multilingual heritage document as a showcase in the Digital Humanities.

Title: Cost-effective, Autonomic and Adaptive Cloud Resource Management
Speaker: Thanasis Papaioannou, Center for Research and Technology Hellas (CERTH)
Date: Wednesday, 18 Dec 2013 - 10:00:00

Current large-scale web applications pose enormous and dynamic processing and storage requirements. Failures of any type are common in current datacenters, partly due to the higher scales of the data stored. As data scales up, its availability becomes more complex, while different availability levels per application or per data item may be required. At the same time, cloud infrastructures should be able to effectively deal with the elastic nature of these applications in an autonomic manner. To make things worse, as clients get increasingly averse to vendor lock-in and data unavailability risks, client data has to be efficiently split across clouds. In this talk, we briefly discuss three very effective cloud resource management solutions that deal with the different aforementioned requirements: Skute, Scarce and Scalia. Skute is a self-managed key-value store that dynamically allocates the resources of a data cloud to several applications in a cost-efficient and fair way. Scarce is a decentralized economic approach for dynamically adapting the cloud resources of various applications, so as to statistically meet their SLA performance and availability goals in the presence of varying loads or failures. Scalia is a cloud storage brokerage solution that continuously adapts the placement of data based on its access pattern, subject to optimization objectives and data placement constraints, such as storage costs and vendor lock-in avoidance. Short Bio: Dr. Thanasis G. Papaioannou is a senior researcher at the Information Technologies Institute of the Center for Research and Technology Hellas (CERTH). Formerly, he was a postdoctoral fellow at the Distributed Information Systems Laboratory of Ecole Polytechnique Fédérale de Lausanne (EPFL). He received his B.Sc. (1998) and M.Sc. (2000) in Networks and in Parallel/Distributed Systems from the Department of Computer Science, University of Crete, Greece, and his Ph.D. (2007) from the Department of Computer Science, Athens University of Economics and Business (AUEB). From spring 2007 to spring 2008, he was a Visiting Professor in the Department of Computer Science of AUEB, teaching i) Distributed Systems and ii) Networks - Network Security. He has over 45 publications in high-quality journals and conferences including Springer Electronic Commerce Research, Elsevier Computer Networks Journal, INFOCOM'13, EDBT'13, CIKM'12, ACM SC'12 (SuperComputing), IEEE ICDE'10, ACM SOCC'10, IEEE CCGRID'11, INFOCOM'08, etc. He has been a TPC member in over 25 conferences including SSDBM'14, ICDCS'13, SIGMOD Demo'13, SSDBM'13, SIGMOD Demo'12, SSDBM'12, ICDE'12, SocInfo'10, ICEC '07-09, Valuetools'08, etc.

Title: Statistical methods for environmental modelling and monitoring
Speaker: Dr. Eric A. Lehmann, Commonwealth Scientific and Industrial Research Organisation
Date: Friday, 29 Nov 2013 - 10:00:00

The CSIRO Division of Computational Informatics (CCI) aims to transform information and decision making to enhance productivity, foster collaboration and deliver impact through services across a wide range of sectors. CCI researchers have in-depth expertise in applying statistical and mathematical methods in a variety of scientific fields including, among others, environmental and agricultural informatics, wireless sensor networks, information and communication technologies for healthcare and clinical treatment, development of early screening tests for Alzheimer's disease (bioinformatics), computational and simulation sciences (high performance computing), as well as statistical modelling for seasonal climate forecasting and complex biogeochemical systems (e.g. marine environments). This presentation will focus on some aspects of the research being carried out at CCI on applications of statistical and computational methods for environmental modelling and natural resource management. In particular, I will present an overview of my recent work on the following topics:
- multi-sensor integration of remote sensing data for large-scale vegetation mapping and monitoring,
- data fusion methods for water resources assessment using ground-based and remote sensing data, and
- spatial modelling of extreme weather events and associated risks in the context of a changing climate.
These projects involve several aspects of multivariate Bayesian modelling and analysis (spatial and temporal), computational simulation methods (Markov chain Monte Carlo), issues of data quality and continuity, as well as scientific dissemination and stakeholder engagement. Short biography: Eric Lehmann graduated in 1999 from the Swiss Federal Institute of Technology in Zurich (ETHZ) with a Dipl. El.-Ing. ETH diploma (M.Sc. in Electrical Engineering). He received the M.Phil. and Ph.D. degrees, both in Electrical Engineering, from the Australian National University (Canberra) in 2000 and 2004, respectively. From 2004 to 2008, he held various research positions with National ICT Australia (NICTA) in Canberra and the Western Australian Telecommunications Research Institute (WATRI) in Perth, where he was active in the field of acoustics, array signal processing and beamforming, with emphasis on sequential Monte Carlo methods (particle filtering) for acoustic speaker localisation and tracking. He now works as a Research Scientist for CSIRO in Perth, within the division of Computational Informatics. His current work involves the development of statistical image processing techniques for remote sensing imagery (optical and synthetic aperture radar), with a focus on the multi-sensor analysis and integration of spatiotemporal data for environmental mapping and monitoring. He also contributes to the development of Bayesian hierarchical methods for natural resource management and climate modelling purposes.

Title: Robot learning by imitation and exploration with probabilistic dynamical systems
Speaker: Dr. Sylvain Calinon, Italian Institute of Technology (IIT)
Date: Friday, 22 Nov 2013 - 10:00:00

Robots in current industrial settings reproduce repetitive movements in a stiff and precise manner, with sensory information often limited to the role of stopping the motion if a human or object enters the robot's workspace. The new developments in robot sensors and compliant actuators bring a new human-centric perspective to robotics. An increase of robots in small and medium-sized enterprises (SMEs) is predicted for the next few years. Products in SMEs are characterized by small batch sizes, short life-cycles and end-user driven customization, requiring frequent re-programming of the robot. SMEs also often involve confined spaces, so that the robots must work in safe collaboration with the users by generating natural movements and anticipating co-workers' movements with active perception and human activity understanding. Interestingly, these robots are much closer to human capabilities in terms of compliance, precision and repeatability. In contrast to previous technology, the planning, control, sensing and interfacing aspects must work hand-in-hand, where the robot is only one part of a broader robotics-based technology. The variety of signals to process and the richness of interaction with the users and the environment constitute a formidable area of research for machine learning. Current programming solutions used by the leading commercial robotics companies do not satisfy the new requirements of re-using the same robot for different tasks and interacting with multiple users. The representation of manipulation movements must be augmented with forces (for task execution, but also as a communication channel for collaborative manipulation), compliance and reactive behaviors. An attractive approach to the problem of transferring skills to robots is to take inspiration from the way humans learn by imitation and self-refinement. 
I will present a task-parametrized model based on dynamic movement primitives and Gaussian mixture regression to exploit the local correlations in the movement and the varying accuracy requirements of the task. The model is used to devise a controller for the robot that can adapt to new situations and that is safe for the surrounding users. Examples of applications with a compliant humanoid and with gravity-compensated manipulators will be showcased. Short bio: Dr Sylvain Calinon is Team Leader of the Learning and Interaction Lab at the Italian Institute of Technology (IIT), and a visiting researcher at the Learning Algorithms and Systems Laboratory (LASA), Ecole Polytechnique Fédérale de Lausanne (EPFL). He received a PhD on robot programming by demonstration in 2007 from LASA, EPFL, which was awarded the Robotdalen Scientific Award, the ABB Award and the EPFL-Press Distinction. From 2007 to 2009, he was a postdoctoral research fellow at LASA, EPFL. His research interests cover robot learning by imitation, machine learning and human-robot interaction.

Title: Quality in Face and Iris Research
Speaker: Dr. Stephanie Schuckers, Clarkson University
Date: Wednesday, 20 Nov 2013 - 10:30:00

Because of limited resources (e.g. number and type of cameras, amount of time to focus on an individual, real-time processing power), using intelligence within standoff biometric capture systems can help in determining which individuals to focus on and for how long. Benchmark datasets available to the general research community are needed for designing a standoff multimodal biometric system. The overall goal of the research is to investigate fusion approaches that measure face, iris, and voice through experiments for identity at distances from 10 to 25 meters. This research includes a growing corpus of data, entitled the Quality in Face and Iris Research Ensemble (Q-FIRE) dataset, which includes the following: (1) Q-FIRE Release 1 (made available in early 2010) is composed of 4T of face and iris video for 90 subjects out to 8.3 meters (25 feet) with controlled quality degradation. (2) Release 2 is an additional 83 subjects with the same collection specifications. Releases 1 and 2 were used by NIST in IREX II: Iris Quality Calibration and Evaluation (IQCE). (3) Last, an extension of the dataset, entitled Q-FIRE Phase II Unconstrained, has been collected with unconstrained behavior on the same set of subjects out to 8.3 meters. In this talk, the datasets will be described as well as results of experiments fusing face and iris scores with quality.

Title: Multimodal Interaction with Humanoid Robots
Speaker: Prof. Kristiina Jokinen, University of Helsinki
Date: Tuesday, 19 Nov 2013 - 10:00:00

In this talk I will discuss issues related to multimodal interaction with intelligent agents, and in particular, present the Nao WikiTalk, an application that enables the user to query Wikipedia via the Nao robot. The robot can talk about an unlimited range of topics, so it supports open-domain conversations using Wikipedia as a knowledge source. The robot suggests some topics to start with, and the user can shift to related topics by speaking the topic names after the robot mentions them. The user can also switch to a totally new topic by spelling the first few letters. The challenge in presenting Wikipedia information is how to convey its structure to the user so that she can understand what is new information, and how to navigate in the topic structure. In Wikipedia, new relevant information is marked with hyperlinks to other entries, and the robot's interaction capabilities have been extended so that it signals these links non-verbally while reading the text. As well as speaking, the robot uses gestures, nods and other multimodal signals to enable clear and rich interaction. Gesture and posture changes can also be used to manage turn-taking, and to add liveliness to the interaction in general. To manage the interaction in a smooth way, it is also important to capture the user's emotional and attentional state. For this, we have experimented with gazing and face tracking to infer the user's interest level. The Nao WikiTalk system was evaluated by comparing the users' expectations with their experience of the robot interaction. In many respects the users had high expectations regarding the robot's interaction capabilities, but they were impressed by the robot's lively appearance and natural gesturing. Short bio: Kristiina Jokinen is Adjunct Professor and Research Manager at University of Helsinki, and she is also Adjunct Professor of Interaction Technology at University of Tampere, Finland, and Visiting Professor at University of Tartu, Estonia. 
She received her PhD from University of Manchester, UK, and spent altogether four years as a post-doc at NAIST and as an invited researcher at ATR in Japan. In 2009-2010 she was Visiting Professor at Doshisha University in Kyoto. Her research focuses on spoken dialogue modelling, multimodal interaction management (especially gestures and eye gaze), natural language communication, and human-machine interaction. She has published many papers and articles, and three books: "Constructive Dialogue Modelling - Speech Interaction and Rational Agents" (John Wiley), "Spoken Dialogue Systems" (together with M. McTear; Morgan & Claypool), and "New Trends in Speech-based Interactive Systems" (edited together with F. Chen; Springer). She has been an invited speaker, e.g. at IWSDS 2010 and at the Multimodal Symposium in 2013. She organised the Nordic Research Training Course "Feedback, Communicative Gesturing, and Gazing" in Helsinki in 2011, and led the summer workshop "Speech, gaze and gesturing - multimodal conversational interaction with the Nao robot" in Metz, together with Graham Wilcock, in 2012. She has had several national and international cooperation projects and served on several programme and review committees. She is Programme Chair for the 2013 International Conference of Multimodal Interaction (ICMI), and she is Secretary-Treasurer of SIGDial, the ACL/ISCA Special Interest Group for Discourse and Dialogue.

Title: Advancing bio-microscopy with the help of image processing
Speaker: Prof. Michael Liebling, UC Santa Barbara
Date: Monday, 18 Nov 2013 - 10:00:00

Image processing in bio-microscopy is no longer confined to the post-processing stage, but has gained wide acceptance as an integral part of the image acquisition process itself, as it allows overcoming hard limits set by instrumentation and biology. In this talk, I will present my lab's efforts to image dim and highly dynamic biological samples by boosting the temporal and spatial resolution of optical microscopes via software solutions and modified imaging protocols. Focusing on spatio-temporal image registration strategies to build 3D+time models of samples with repetitive motions, a superresolution algorithm to reconstruct image sequences from multiple low temporal resolution acquisitions, and a fast multi-channel deconvolution algorithm for multi-view imaging, I will illustrate the central role signal processing can play to advance bio-imaging. I will share the approaches we implemented in my group to rapidly bring new ideas from theory to full deployment in remote biology labs, where our tools can be applied with a variety of microscopy types. Finally, I will speculate on the future of image processing in bio-microscopy and suggest areas where efforts may be most rewarding. Bio: Michael Liebling is an Associate Professor of Electrical and Computer Engineering at the University of California, Santa Barbara (UCSB). He received the MS in Physics (2000) and PhD in image processing (2004) from EPFL. From 2004 to 2007, he was a Postdoctoral Scholar in Biology at the California Institute of Technology, before joining the faculty in the department of Electrical and Computer Engineering in 2007, first as an Assistant Professor and, since Summer 2013, as an Associate Professor. His research interests include biological microscopy and image processing for the study of dynamic biological processes and, more generally, computational methods for optical imaging. 
He teaches at both the graduate and undergraduate levels in the areas of signal processing, image processing and biological microscopy. Michael Liebling is a recipient of prospective and advanced researcher fellowships from the Swiss National Science Foundation and a 2011 Hellman Family Faculty Fellowship. He is vice-chair (2014 Chair-elect) of the IEEE Signal Processing Society's Bio-Imaging and Signal Processing technical committee and was Technical Program co-chair of the IEEE International Symposium on Biomedical Imaging in 2011 and 2013.

Title: Human-Centered Computing for Critical Multimodal Cyber-Physical Environments
Speaker: Dr. Nadir Weibel, University of California San Diego (UCSD)
Date: Tuesday, 05 Nov 2013 - 11:00:00

Critical cyber-physical environments such as the ones found in many healthcare settings or on the flight deck of modern airplanes are built on complex systems characterized by important properties spanning the physical and digital world, and centered on human activity. In order to properly understand this critical activity, researchers need to first understand the context and environment in which the activity is situated. Central in those environments is often interaction with the available technology and the communication between the individuals, both of which often involve multiple parallel modalities. Only an in-depth understanding of the properties of these multimodal distributed environments can inform the design and development of multimodal human-centered computing. After presenting an overview of my current research in human-centered computing, this talk will present some of the challenges and proposed solutions in terms of technologies and theoretical frameworks for collecting and making sense of rich multimodal data in two critical cyber-physical environments: the cockpit of a Boeing 787 airplane, and the medical office. The talk will explain how the combination of a range of data collection devices such as depth cameras, eye tracking, digital-pens, and HD video cameras, combined with powerful data visualization and a flexible analysis suite, allows in-depth understanding of those complex environments. I will end with a discussion of cutting-edge multimodal technology and how devices such as depth cameras and wearable augmented reality glasses open up a range of opportunities to develop new technology for knowledge workers of critical cyber-physical environments. BIO: Dr. Nadir Weibel is a Research Assistant Professor in the Department of Computer Science and Engineering at the University of California San Diego (UCSD), where he is teaching human-computer interaction and ubiquitous computing. 
His research is situated at the intersection of computer science, cognitive science, communication, health and social sciences. Dr. Weibel investigates tools, techniques and infrastructure supporting the deployment of innovative interactive multimodal and tangible devices in context, and studies the cognitive consequences of introducing this technology into everyday life. Current work focuses on interactive physical-digital systems that exploit pen-based and touch-based devices, depth cameras, wearable and mobile devices, in the setting of critical populations such as healthcare and education. Dr. Weibel is the author of more than 45 publications on these topics. His work has been funded by the Swiss National Science Foundation, the European Union, Boeing, the US NSF, NIH and AHRQ.

Title: Technology Innovation and Related Partnerships – Case Idiap and Nokia
Speaker: Dr. Juha K. Laurila, Nokia
Date: Thursday, 10 Oct 2013 - 10:45:00

This talk focuses on technology-related innovation within companies like Nokia, and covers the flow from early-phase ideas towards technology transfer and productization. Further, the role of research partnerships as a part of the overall innovation process is discussed. More specifically, various modes of industry-academia collaboration and the related drivers for each of them are briefly covered. Aspects such as technology licensing are also touched on briefly. In particular, this presentation focuses on the collaboration between Idiap and Nokia as a case study and investigates the role of Idiap-Nokia interactions from the perspective of the overall innovation chain. This part covers e.g. Idiap’s contribution to Nokia’s Call for Research Proposals in 2008, joint initiatives around mobile data (Lausanne Data Collection Campaign 2009-2012 and Mobile Data Challenge 2011-2012) as well as bilateral research projects.

Title: Detecting Conversing Groups in Still Images
Speaker: Hayley Hung, Technical University of Delft
Date: Friday, 13 Sep 2013 - 11:00:00

In our daily lives, we cannot help but communicate with people. Aside from organised and more structured communication like emails, meetings, or phone calls, we communicate instantaneously and often in ad hoc, freely formed groups where it is not known beforehand how long the conversation will last, who will be in the conversation, or what it will be about. In crowded settings like a conference, for example, this type of conversing group exists, and who gravitates towards whom tells us a lot about the relationship between the members of the group. In this talk, I will discuss the challenges of this problem, solutions, and open questions of this emerging topic.

Title: The LiveLabs Urban LifeStyle Innovation Platform: Opportunities, Challenges, and Current Results
Speaker: Rajesh K. Balan, School of Information Systems, SMU (Singapore)
Date: Friday, 13 Sep 2013 - 15:00:00

A central question in mobile computing is how to test mobile applications that depend on real context, in real environments with real users. User studies done in lab environments are frequently insufficient to understand the real-world interactions between user context, environmental factors, application behaviour, and performance results. In this talk, I will describe LiveLabs, a new 5-year project that started at the Singapore Management University in early 2012. The goal of LiveLabs is to convert four real environments (the entire Singapore Management University campus, a popular resort island, a large airport, and a popular shopping mall) into living testbeds where we instrument both the environment and the cell phones of opted-in participants (drawn from the student population and members of the public). We can then provide third-party companies and researchers the opportunity to test their mobile applications and scenarios on the opted-in participants, on their real phones, in the four real environments described above. LiveLabs will provide the software necessary to collect network statistics and any necessary context information. In addition, LiveLabs will provide software and mechanisms to ensure that privacy, proper participant selection, resource management, and experimental results and data are maintained and provided on a need-to-know basis to the appropriate parties. I will describe the broad LiveLabs vision and identify the key research challenges and opportunities. In particular, I will highlight our current insight into indoor location tracking, dynamic group and queue detection, and energy-aware context sensing for mobile phones.

Title: Signal Analysis using Autoregressive Models of Amplitude Modulation
Speaker: Dr. Sriram Ganapathy, IBM T. J. Watson Research Center, USA
Date: Friday, 23 Aug 2013 - 11:00:00

Conventional speech analysis techniques are based on estimating the spectral content of relatively short (about 10-20 ms) segments of the signal. However, an alternate way to describe a speech signal is a long-term summation of amplitude-modulated frequency bands, where each frequency band consists of a smooth envelope (gross structure) modulating a carrier signal (fine structure). We develop an auto-regressive (AR) modeling approach for estimating the smooth envelope of the sub-band signal. This model, referred to as frequency domain linear prediction (FDLP), is based on the application of linear prediction to the discrete cosine transform of the signal; it describes the perceptually dominant peaks in the signal while removing the finer details. This suppression of detail is useful for developing a parametric representation of speech/audio signals. In this talk, I will also show several applications of the FDLP model for speech and audio processing systems. In the last leg of the talk, I will focus on our recent efforts at IBM for speech analysis in noisy radio communication channels. This will highlight the challenges involved along with a few solutions addressing parts of the problem. Short Biography: Sriram Ganapathy received his Doctor of Philosophy from the Center of Language and Speech Processing, Johns Hopkins University in January 2012. Prior to this, he obtained his Bachelor of Technology from College of Engineering, Trivandrum, India in 2004 and Master of Engineering from Indian Institute of Science, Bangalore in 2006. He worked as a Research Assistant at Idiap Research Institute, Switzerland from 2006 to 2008 on speech and audio projects. Currently, he is a post-doctoral researcher at IBM T.J. Watson Research Center working on signal analysis methods for radio communication speech in highly degraded environments. His research interests include signal processing, machine learning and robust methodologies for speech and speaker recognition.
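The FDLP idea outlined in the abstract above (linear prediction applied to the discrete cosine transform of a signal, yielding a smooth temporal envelope) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the speaker's implementation: the function name `fdlp_envelope`, the model order, the autocorrelation (Yule-Walker) solution and the synthetic test signal are all hypothetical choices, and the sub-band decomposition mentioned in the abstract is omitted, so the sketch works on the full band only.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz
from scipy.signal import freqz


def fdlp_envelope(signal, order=20):
    """Estimate a smooth temporal envelope by fitting an AR model to the
    DCT of the signal (hypothetical full-band sketch of the FDLP idea)."""
    x = np.asarray(signal, dtype=float)
    n = len(x)
    c = dct(x, type=2, norm="ortho")  # frequency-domain view of the signal

    # Autocorrelation of the DCT sequence for lags 0..order.
    r = np.array([np.dot(c[: n - k], c[k:]) for k in range(order + 1)])

    # Yule-Walker equations: symmetric Toeplitz system R a = r[1:].
    a = solve_toeplitz(r[:order], r[1 : order + 1])
    gain = r[0] - np.dot(a, r[1 : order + 1])  # prediction error energy

    # The AR power response of the DCT sequence, sampled at n points over
    # [0, pi), plays the role of a squared envelope over the n time samples.
    _, h = freqz([1.0], np.concatenate(([1.0], -a)), worN=n)
    return gain * np.abs(h) ** 2


# Synthetic example: a carrier whose amplitude has a burst around sample 600.
t = np.arange(1000)
amplitude = 0.1 + np.exp(-0.5 * ((t - 600) / 30.0) ** 2)
x = amplitude * np.cos(0.3 * np.pi * t)
envelope = fdlp_envelope(x, order=20)
```

In this sketch a higher `order` lets the envelope follow finer temporal detail, while a lower one smooths more aggressively, which is the trade-off behind the "suppression of detail" mentioned in the abstract.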

Title: Three Factor Authentication for Commodity Hand-Held Communication Devices
Speaker: Prof Brian C. Lovell, , The University of Queensland (UQ) St. Lucia, Brisbane QLD 40
Date: Wednesday, 17 Jul 2013 - 14:00:00

User authentication to online services is at a crossroads. Attacks are increasing, and current authentication schemes are no longer able to provide adequate protection. The time has come to include the third factor of authentication, and start using biometrics to authenticate people. However, despite significant progress in biometrics, they still suffer from a major mode of attack: replay attacks, where biometric signals may be captured previously and reused. Replay attacks defeat all current liveness tests. Current literature recognises replay attacks as a significant issue, but there are no practical and tested solutions available today. The purpose of this research is to improve authentication to online services by including a face recognition biometric, as well as providing one solution to the replay attack problem for the proposed face recognition system. If this research is successful, it will enable the use of enhanced authentication mechanisms on mobile devices, and open new research into methods of addressing biometric replay attacks. Speaker Biography: Brian C. Lovell was born in Brisbane, Australia in 1960. He received a BE in electrical engineering (Honours I) in 1982, a BSc in computer science in 1983, and a PhD in signal processing in 1991, all from the University of Queensland (UQ). Professor Lovell is Project Leader of the Advanced Surveillance Group in the School of ITEE, UQ. He served as President of the International Association of Pattern Recognition 2008-2010, and is a Fellow of the IAPR, Senior Member of the IEEE, Fellow of the IEAust, and voting member for Australia on the Governing Board of the International Association for Pattern Recognition since 1998. Professor Lovell was Program Co-Chair of ICPR2008 in Tampa, Florida, General Co-Chair of ACPR2011 in Beijing, and General Co-Chair of ICIP2013 in Melbourne.
His Advanced Surveillance Group works with port, rail and airport organizations as well as several national and international agencies to identify and develop solutions addressing operational and security concerns.

Title: Biosignals and Interfaces
Speaker: Prof. Tanja Schultz, , Karlsruhe University
Date: Tuesday, 14 May 2013 - 11:00:00

Human communication relies on signals like speech, mimics, or gestures and the interpretation of these signals seems to be innate to humans. In contrast, human interaction with machines and thus human communication mediated through machines is far from being natural. To date, it is restricted to few channels and the capabilities of machines to interpret human signals are still very limited. At the Cognitive Systems Lab (CSL) we explore human-centered cognitive systems to improve human-machine interaction as well as machine-mediated human communication. We aim to benefit from the strength of machines by departing from just mimicking the human way of communication. Rather we focus on considering the full range of biosignals emitted from the human body, such as electrical biosignals like brain and muscle activity. These signals can be directly measured and interpreted by machines, leveraging emerging wearable, small and wireless sensor technologies. Using these biosignals offers an inside perspective on human mental activities, intentions, or needs and thus complement the traditional way of observing humans from the outside. In my talk I will discuss ongoing research on "Biosignals and Interfaces" at CSL, such as speech recognition, silent speech interfaces that rely on articulatory muscle movement, and interfaces that use brain activity to determine users' mental states, such as task activity, cognitive workload, attention, emotion, and personality. We hope that our research will lead to a new generation of human centered systems, which are completely aware of the users' needs and provide an intuitive, efficient, robust, and adaptive input mechanism to interaction and communication. Bio: Tanja Schultz received her Ph.D. and Masters in Computer Science from University Karlsruhe, Germany in 2000 and 1995 respectively and got a German Staatsexamen in Mathematics, Sports, and Educational Science from University of Heidelberg, in 1990. 
She joined Carnegie Mellon University in 2000 and became a Research Professor at the Language Technologies Institute. Since 2007 she has also been a Full Professor at the Department of Informatics of the Karlsruhe Institute of Technology (KIT) in Germany. She is the director of the Cognitive Systems Lab, where her research activities focus on human-machine interfaces with a particular area of expertise in rapid adaptation of speech processing systems to new domains and languages. She co-edited a book on this subject and received several awards for this work. In 2001 she received the FZI prize for an outstanding Ph.D. thesis. In 2002 she was awarded the Allen Newell Medal for Research Excellence from Carnegie Mellon for her contribution to Speech Translation and the ISCA best paper award for her publication on language independent acoustic modeling. In 2005 she received the Carnegie Mellon Language Technologies Institute Junior Faculty Chair. Her recent research focuses on human-centered technologies and intuitive human-machine interfaces based on biosignals, by capturing, processing, and interpreting signals such as muscle and brain activities. Her development of silent speech interfaces based on myoelectric signals was in the top-ten most important attractions at CeBIT 2010, received best demo and paper awards in 2006 and 2013, and was awarded the Alcatel-Lucent Research Award for Technical Communication in 2012. Tanja Schultz is the author of more than 250 articles published in books, journals, and proceedings. She has been a member of the Society of Computer Science (GI) for more than 20 years, and is a member of the IEEE Computer Society and the International Speech Communication Association ISCA, where she serves her second term as an elected ISCA Board member.

Title: Perceptually motivated speech recognition and mispronunciation detection
Speaker: Christos Koniaris, PhD., , Idiap, Switzerland
Date: Wednesday, 12 Dec 2012 - 16:00:00

Chris will be presenting his doctoral thesis as the result of a research effort performed in two fields of speech technology, i.e., speech recognition and mispronunciation detection. Although the two areas are clearly distinguishable, the proposed approaches share a common hypothesis based on psychoacoustic processing of speech signals. The conjecture implies that the human auditory periphery provides a relatively good separation of different sound classes. Hence, it is possible to use recent findings from psychoacoustic perception together with mathematical and computational tools to model the auditory sensitivities to small speech signal changes.

Title: Incorporation of phonetic constraints in acoustic-to-articulatory inversion
Speaker: Blaise Potard, PhD., , Idiap, Switzerland
Date: Monday, 10 Dec 2012 - 10:00:00

Blaise will be talking about his doctoral research on the acoustic-to-articulatory inversion problem. The main aim of his Ph.D. was to investigate the use of additional constraints (phonetic and visual) to improve the realism of the solutions found by an existing inversion framework. This research was conducted at LORIA, Nancy, France, under the supervision of Yves Laprie.

Title: Grapheme-to-Phoneme (G2P) Training and Conversion with WFSTs
Speaker: Josef Novak, , University of Tokyo, Japan
Date: Monday, 30 Jul 2012 - 13:30:00

This talk is a tutorial: a hands-on introduction to some of the features of Phonetisaurus, the OpenFst-based G2P toolkit developed by Josef Novak, with some high-level background information and a description of the features, shortcomings, and goals of the toolkit. The slides, a special tutorial distribution, and cut-and-paste terminal commands in wiki format can be found on the Phonetisaurus googlecode site. Home page and code: (see the downloads' section of the lefthand sidebar) Copy-and-paste tutorial companion: Short Bio: Josef Novak is currently a Ph.D. student in the Hirose-Minematsu laboratory, in the EEIC department at the University of Tokyo. More information:

Title: On the beauty of Online Selective Sampling
Speaker: Francesco Orabona, , Toyota Technological Institute, Chicago, US
Date: Wednesday, 02 May 2012 - 11:00:00

Online selective sampling is an active variant of online learning in which the learner is allowed to adaptively subsample the labels of an observed sequence of feature vectors. The learner's goal is to achieve a good trade-off between the mistake rate and the number of sampled labels. This can be viewed as an abstract protocol for interactive learning applications. For example, a system for categorizing stories in a newsfeed asks for human supervision whenever it feels that more training examples are needed to maintain the desired accuracy. A formal, almost assumption-free theory that allows exact confidence values on the predictions to be calculated will be presented. Using this theory, two selective sampling algorithms that use regularized least squares (RLS) as the base classifier will be shown. These algorithms have formal guarantees on the performance and on the maximum number of labels queried. Moreover, RLS is easy and efficient to implement, and empirical results will be shown to validate the theoretical results.
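The protocol described above can be sketched with an RLS learner that asks for a label only while its prediction is still uncertain. The abstract does not state the query rules of the two algorithms in the talk, so the rule used here (query when the quadratic form x^T A^{-1} x is at least t^(-kappa)), the value kappa = 0.5, and the synthetic data stream are assumptions chosen for illustration only.

```python
import math
import random

def selective_sampling_rls(stream, dim, kappa=0.5):
    """RLS learner that decides online whether to request each label.

    stream yields (x, y) with x a list of floats and y in {-1, +1}.
    Returns (mistakes, queries). The true label is always used to count
    mistakes, but it only updates the model on rounds where it is queried.
    """
    # Inverse of the regularised correlation matrix A = I + sum of x x^T.
    a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    b = [0.0] * dim  # sum of y * x over queried rounds
    mistakes = queries = 0
    for t, (x, y) in enumerate(stream, start=1):
        ax = [sum(a_inv[i][j] * x[j] for j in range(dim)) for i in range(dim)]
        r = sum(x[i] * ax[i] for i in range(dim))       # x^T A^{-1} x
        margin = sum(b[i] * ax[i] for i in range(dim))  # (A^{-1} b) . x
        prediction = 1 if margin >= 0 else -1
        if prediction != y:
            mistakes += 1
        if r >= t ** (-kappa):  # assumed query rule: still too uncertain
            queries += 1
            # Sherman-Morrison rank-one update of A^{-1} for A + x x^T.
            denom = 1.0 + r
            for i in range(dim):
                for j in range(dim):
                    a_inv[i][j] -= ax[i] * ax[j] / denom
            for i in range(dim):
                b[i] += y * x[i]
    return mistakes, queries

def make_stream(rounds, seed=0):
    """Hypothetical data: unit vectors in the plane, labelled by a fixed line."""
    rng = random.Random(seed)
    target = (0.6, -0.8)
    for _ in range(rounds):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        x = [math.cos(angle), math.sin(angle)]
        y = 1 if x[0] * target[0] + x[1] * target[1] >= 0 else -1
        yield x, y

ROUNDS = 400
mistakes, queries = selective_sampling_rls(make_stream(ROUNDS), dim=2)
print(f"{queries} labels queried out of {ROUNDS} rounds, {mistakes} mistakes")
```

The quadratic form shrinks as more labels are collected in a given direction, so under this rule the learner queries often at the start and rarely later on.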

Title: Fractal Marker Fields
Speaker: Marketa Dubska, , Faculty of Information Technology, Brno University of Techno
Date: Friday, 20 Apr 2012 - 11:00:00

Many augmented reality systems use fiducial markers to localize the camera in the 3D scene. One big disadvantage of the markers used today is that the camera motion is tightly limited: the marker (or one of the markers) must be visible and it must be observed at a proper scale. This talk presents a fractal structure of markers similar to matrix codes (such as QR Code or DataMatrix): the Fractal Marker Field (FMF). The FMF allows for embedding markers at a virtually unlimited number of scales. At the same time, for each of the scales it guarantees a constant density of markers at that scale. The talk sketches out the construction of the FMF and a baseline algorithm for detecting the markers.

Title: Overview of some research activities at Australia's Commonwealth Scientific and Industrial Research Organisation (CSIRO)
Speaker: Eric Lehmann, , CSIRO in Perth
Date: Friday, 20 Apr 2012 - 14:00:00

Abstract: CSIRO is Australia's national science agency and one of the largest and most diverse research organisations in the world. It employs over 6000 scientists at more than 50 centres throughout Australia and overseas. The core research undertaken at CSIRO focuses on the main challenges facing Australia at present time, and includes research areas such as health, agriculture and food supply, mineral resources and mining, information and communication technologies, understanding climate change, and sustainable management of the environment, the oceans and water resources. In this talk, I will present an overview of my recent research work at CSIRO, which involves aspects of Bayesian filtering and hierarchical modelling for applications related to environmental mapping and monitoring, and model-data fusion for water resource assessment at continental scale. About the presenter: Eric Lehmann graduated in 1999 from the Swiss Federal Institute of Technology in Zurich (ETHZ) with a Diploma in Electrical Engineering. He received the M.Phil. and Ph.D. degrees, both in Electrical Engineering, from the Australian National University (Canberra) in 2000 and 2004 respectively. Between 2004 and 2008, he held various research positions with National ICT Australia (NICTA) in Canberra and the Western Australian Telecommunications Research Institute (WATRI) in Perth, WA, where he was active in the field of acoustics and array signal processing, with emphasis on sequential Monte Carlo methods (particle filtering) for acoustic speaker tracking. He is now working as a Research Scientist for CSIRO in Perth, within the division of Mathematics, Informatics and Statistics. His current work involves the development of statistical image processing techniques for remote sensing imagery (optical and synthetic aperture radar), with a focus on the multi-sensor analysis and integration of spatio-temporal data for environmental mapping and monitoring. 
He also contributes to the scientific research on Bayesian hierarchical methods for the assimilation of soil moisture satellite data with modeled estimates (model-data fusion) for water resource management.

Title: Parallel Coordinates and Hough Transform
Speaker: Marketa Dubska, , Faculty of Information Technology, Brno University of Techno
Date: Thursday, 19 Apr 2012 - 11:00:00

Parallel coordinates provide a coordinate system used mostly or solely for high-dimensional data visualization. Only a few applications have used them for computational tasks. We propose a new use for them: as a line parameterization for the Hough transform. This parameterization, called PClines, outperforms the existing approaches in terms of accuracy. In addition, PClines are computationally extremely efficient, require no floating-point operations, and can be easily accelerated by different hardware architectures. Moreover, regular patterns such as grids and groups of parallel lines can be detected effectively with this parameterization.
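The point-to-line duality behind a parallel-coordinates parameterization can be sketched as follows. A Cartesian point (x, y) becomes the line through (0, x) and (d, y) in parallel-coordinate space, and all points on one Cartesian line y = m*x + b (with m != 1) map to lines that meet in the single point (d / (1 - m), b / (1 - m)). The function names and the axis distance d are assumptions, the sketch uses floating-point arithmetic for readability, and it does not cover the accumulator or the handling of slopes near 1 that a full detector needs.

```python
import math

def point_to_pc_line(x, y, d=1.0):
    """Map a Cartesian point to a line v = slope * u + intercept in
    parallel-coordinate space; the line passes through (0, x) and (d, y)."""
    return (y - x) / d, x

def intersect(line_a, line_b):
    """Intersection of two non-parallel lines given as (slope, intercept)."""
    (s1, c1), (s2, c2) = line_a, line_b
    u = (c2 - c1) / (s1 - s2)
    return u, s1 * u + c1

def line_to_pc_point(m, b, d=1.0):
    """Image point of the Cartesian line y = m*x + b (requires m != 1)."""
    return d / (1.0 - m), b / (1.0 - m)

# Three collinear points on y = -2x + 3.
m, b = -2.0, 3.0
points = [(0.0, 3.0), (1.0, 1.0), (2.0, -1.0)]
pc_lines = [point_to_pc_line(x, y) for x, y in points]

# Every pair of image lines meets at the same point, which a Hough
# accumulator in (u, v) space would register as a peak.
meet_01 = intersect(pc_lines[0], pc_lines[1])
meet_02 = intersect(pc_lines[0], pc_lines[2])
expected = line_to_pc_point(m, b)
print(meet_01, meet_02, expected)
```

Since each input point votes along a straight line in the accumulator, the voting step reduces to line rasterization, which is consistent with the efficiency claim in the abstract.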

Title: Cost Minimization of WaldBoost Classifiers
Speaker: Roman Juranek, , Faculty of Information Technology, Brno University of Techno
Date: Wednesday, 18 Apr 2012 - 11:00:00

Detection of objects in computer vision is a complex task. One of the most popular and well-explored approaches is the use of statistical classifiers and scanning windows. In this approach, classifiers learned by the AdaBoost algorithm are often used, as they achieve low error rates and high detection rates. The object detection process can be implemented by various methods. For the purpose of acceleration, graphics hardware, multi-core architectures, SIMD or custom hardware can be used. In this talk I will present a method which enhances object detection performance with respect to a user-defined cost function. The method balances the computations of a previously learned classifier between two or more different implementations in order to minimize the cost function. The method is verified on a basic example: division of a classifier into a pre-processing unit implemented in an FPGA and a post-processing unit in a standard PC. The technique has its application mainly in the design of low-power smart cameras.

Title: Recent work at Graph@FIT
Speaker: Roman Juranek , , Faculty of Information Technology, Brno University of Techno
Date: Tuesday, 17 Apr 2012 - 11:00:00

In this talk, I will present the ongoing work of the graphics and video processing groups at FIT BUT. In the past, we participated in several successful projects, such as Center of Computer Graphics or FP6/FP7 projects. Currently, we participate in Artemis JU projects R3COP (development of robotic systems), SMECY (algorithms and compilers for embedded systems) and RECOMP, FP7 projects, such as SRS or TA2, and projects funded from the structural funds of the EU, such as Center of Excellence IT4I (IT for Innovations). Our research topics include, for example, statistical classification based object detection and recognition, environment mapping for mobile robots, augmented reality, real-time rendering and more. I will briefly present important results of our research.

Title: The magical, two-dimensional world of graphene
Speaker: Prof. Philippe Jacquod, , University of Arizona
Date: Friday, 09 Mar 2012 - 11:00:00

Carbon comes into different forms: graphite and diamond have been known for centuries, while fullerenes, buckyballs and carbon nanotubes, were discovered in the second half of the twentieth century. A new allotrope of carbon was isolated in 2004: graphene, which is a one-atom thick, two-dimensional lattice of carbon atoms. The discovery of graphene generated an almost unprecedented hype in physics. As a matter of fact, graphene has proven to be the material of all superlatives. It is the thinnest, but also the strongest, the stiffest but also the most stretchable of all crystals. Its electronic properties, together with its dimensionality, make it a strong potential candidate for replacing silicon in information processors. In this colloquial presentation, I will make a general introduction to the wonder material graphene, stressing its exceptional electronic and mechanical properties, sketching the many surprises it gave us and discussing future potential applications. In the last part of my talk, I will summarize some of our recent investigations on the local topography and spectroscopy of graphene [Xue et al., Nature Materials 10, 282 (2011); Yankowitz et al., Nature Physics (in press, 2012)]. The presentation is intended to be pedagogical and directed at a general, nonspecialist audience of scientists. Philippe Jacquod studied physics at the ETHZ and the University of Neuchatel, where he obtained his PhD in 1997. He was a postdoctoral associate at Yale University from 1997 to 2000 and at the University of Leiden from 2000 to 2003. He became assistant professor of theoretical physics at the University of Geneva in 2003. He joined the physics department at the University of Arizona in 2006, where he is now a professor of physics and optical sciences. His field of research is in condensed matter physics, with a focus on quantum transport and nanophysics.

Title: Extended Pen+ Tools for Multimodal Analysis and Interaction
Speaker: Nadir Weibel, , University of California San Diego
Date: Tuesday, 31 Jan 2012 - 11:00:00

Access to information is one of the most crucial aspects of everyday life. As computation becomes ubiquitous and our environment is enriched with new possibilities for communication and interaction, the existing infrastructure of science, business, and social interaction is confronted with the difficult challenges of supporting complex tasks, mediating networked interactions, and managing the increasing availability of digital information and technology. Despite the tremendous development in terms of both new digital devices and novel interaction techniques that we all witnessed during the last years, it is almost unbelievable how paper documents and pen-based interaction still represent a very important way of interacting with both physical and digital information spaces. In an effort of re-thinking what pen and paper user interfaces (PPUI) mean in a modern world, we are studying multi-modal interactions of pen+ a range of tangible devices at the intersection of the physical and the digital worlds.

In this talk I will present my latest research around pen- and paper-computing, looking at how multimodal interaction with this "very old" technology enables a range of novel affordances and supports communication and interaction.
In the first part of the talk, I will speak about the development of new systems and prototypes that encompass pen and other modalities, such as speech and gestures; different devices, such as smart phones, tablets, and high-resolution wall displays; and different domains, such as healthcare, accessibility, data visualization and interaction, social networks, augmented office environments, and communication for early education, older adults and other specific populations. I will present some examples of the prototypes we developed and some brief extracts of the data we collected about their usage in the wild.

The second part of the talk will focus on pen- and paper-based techniques and tools to get richer access to multimodal data in various contexts. While a new generation of inexpensive digital recording devices and storage facilities is revolutionizing data collection in behavioral science, one of the main obstacles to fully capitalizing on this opportunity is the huge time investment required for analysis using current methods. To address this analysis bottleneck we developed ChronoViz, a system providing synchronized interactive visual representations of multiple data streams. Using two multimodal datasets (a recent study of pilot/co-pilot interaction in a Boeing 787 simulator, and an ongoing learning analytics research project), I will present how the analysis tool works and how the integration of paper-based annotations, analysis, and interactions as part of the tool itself enables the exploration of exciting new methods for observational research.

Brief Bio Dr. Nadir Weibel is a Post-doctoral fellow at the University of California San Diego, member of both the Distributed Cognition and Human-Computer Interaction Laboratory and the Ubiquitous Computing and Social Dynamics research group. He holds a Bachelor and Master in Computer Science from ETH Zurich (Dipl. Informatik-Ing. ETH), and a Ph.D. in Computer Science also from ETH Zurich. During his Ph.D, he explored new ways of enhancing a seemingly mundane, but ubiquitous, resource such as paper to support everyday work, interaction and collaboration as a member of the Global Information Systems research group at ETH.
His current research is situated at the intersection of computer science, communication, and social sciences, studying the cognitive consequences of the introduction and the deployment of interactive multimodal and tangible devices. His main interests range from software engineering to human-computer interaction, including computer-supported collaborative work and mobile and ubiquitous computing. In his work he is developing theory and methods, designing representations, implementing prototypes, and evaluating the effectiveness of interactive physical-digital systems in order to understand the broader design space in which they are situated. He is currently collaborating with researchers at UCSD, Stanford, Berkeley, Drexel University, Children's Hospital in Washington DC, TU Darmstadt, INRIA Paris / Université Paris Sud and Telecom Paristech.

Title: Combining Transcription-based and acoustic-based speaker identifications for Broadcast news
Speaker: Sylvain Meignier, Le Maine University, F, , Idiap, Switzerland
Date: Thursday, 22 Dec 2011 - 14:00:00

In this presentation, we consider the issue of speaker identification within audio records of broadcast news. The speaker identity information is extracted from both transcript-based and acoustic-based speaker identification systems. This information is combined in the belief functions framework, which provides a coherent knowledge representation of the problem. The Kuhn-Munkres algorithm is used to solve the assignment problem between speaker identities and speaker clusters. Experiments carried out on French broadcast news from the French evaluation campaign ESTER show the efficiency of the proposed combination method. Keywords: speaker identification, speaker diarization, belief functions.
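The assignment step mentioned above can be sketched as follows. The code is a compact textbook formulation of the Kuhn-Munkres (Hungarian) algorithm for a cost matrix with no more rows than columns. The score matrix is invented for illustration and stands in for whatever combined belief values the real system produces. Maximizing the total score is expressed as minimizing its negation.

```python
INF = float("inf")

def kuhn_munkres(cost):
    """Minimum-cost assignment of each row to a distinct column.

    cost is an n x m matrix (list of lists) with n <= m.
    Returns a list where entry i is the column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (m + 1)   # column potentials
    p = [0] * (m + 1)     # p[j] = row currently matched to column j (1-based)
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment

# Hypothetical scores: rows are speaker clusters, columns are candidate identities.
scores = [
    [0.1, 0.9, 0.0],
    [0.7, 0.8, 0.1],
    [0.2, 0.1, 0.6],
]
costs = [[-s for s in row] for row in scores]
assignment = kuhn_munkres(costs)
total = sum(scores[i][j] for i, j in enumerate(assignment))
print(assignment, round(total, 2))
```

In this example, picking the best identity for each cluster independently would give identity 1 to both of the first two clusters; the global assignment resolves that conflict.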

Title: Speaker Verification Using the Spectral and Time Parameters of Voice Signal
Speaker: Prof. Victor Sorokin, R&D Director OOO V, , Idiap, Switzerland
Date: Tuesday, 20 Dec 2011 - 14:00:00

Abstract - The speaker verification system developed in the VOXSEAL project is based on variations in formant frequencies at stationary fragments and transient processes of vowels, the spectral features of fricative sounds, and the duration of speech segments. The best features are chosen for each word from the fixed list of Russian numerals ranging from zero to nine. The password phrase is randomly generated by the system at each verification. The compensation for dynamic noise and the counteraction with respect to interference using the reproduction of the intercepted and recorded speech are provided by the repeated reproduction of several words. The total error probabilities for male and female voices are 0.006 and 0.025%, respectively, for 30 million tests, 429 speakers, and a maximum length of the password phrase of 10 words. Note that the probabilities of false identification and false rejection are almost equal. Author - Prof. Victor Sorokin, R&D Director OOO Voxseal, Skolkovo-Moscow. Russian national, MSc. from Moscow Aviation Institute, PhD (Engineering), Doctor of Sc. Physics and Mathematics (1987). Leading Researcher of the Institute for Information Transmission Problems of Russian Academy of Sciences, member of the Acoustical Society of America, board member of the Russian Acoustical Society, author of the monographs "Theory of Speech Production" and "Speech Synthesis", and about 150 publications, owner of 8 patents in speech technology.

Title: Building-up child-robot relationship for therapeutic purposes
Speaker: Joan Pons, , UPC, Barcelona
Date: Wednesday, 02 Nov 2011 - 16:00:00

Summary: Socially assistive robots (SAR) have been shown to be very promising in therapeutic programs with children. Health-related goals such as in-clinic rehabilitation or quality of life improvement have been achieved through social interaction. In this context, a robot's effectiveness depends strongly on its ability to elicit long-term engagement in children. To explore the dynamics of the emergence of social bonds with robots, a field study with 49 sixth-grade pupils (aged 11-12 years) and 4 different robots was carried out at an elementary school. Children's preferences, expectations on functionality and communication, and interaction behavior were studied. The results showed that differences in robots' appearance and performance elicit distinctive perceptions and interactive behavior in children, and affect social processes such as role attribution and attachment. In a similar way, to explore the requirements of an effective human-robot interaction, a quiz game was developed. A NAO robot was used to play the popular game of 20 questions to evaluate different interaction capabilities (i.e. face following, speech recognition, visual and audio cues, and personalization). Short Bio: Joan Saez Pons did his PhD at the Mobile Machines and Vision Lab (MMVL), Sheffield Hallam University, UK, on the topic of multi-robot systems that collaborate with humans. He was also a Marie-Curie researcher at the Cognitive Neuroscience Department (KN) at University of Tuebingen, Germany. He has been working at the Technical Research Centre for Dependency Care and Autonomous Living (CETpD), UPC, BarcelonaTech, in the field of social robotics and human-robot interaction. His research interests include mobile robotics navigation, multi-robot systems, cognitive robotics and human-robot interaction.

Title: Convex Relaxation Methods for Image Processing
Speaker: Xavier Bresson, , Department of Computer Science at City University of Hong Ko
Date: Thursday, 08 Sep 2011 - 11:00:00

This talk will introduce recent methods to compute optimal solutions to fundamental problems in image processing. Several meaningful problems in imaging are usually defined as non-convex energy minimization problems, which are sensitive to initial condition and slow to minimize. The ultimate objective of our work is to overcome the bottleneck problem of non-convexity. In other words, our goal is to "convexify" the original problems to produce more robust and faster algorithms for real-world applications. Our approach consists in finding a convex relaxation of the original non-convex optimization problems and thresholding the relaxed solution to reach the solution of the original problem. We will show that this approach is able to convexify important and difficult image processing problems such as image segmentation based on the level set method and image registration. Our algorithms are not only guaranteed to find a global solution to the original problem, they are also at least as fast as graph-cuts combinatorial techniques while being more accurate. Finally, I will introduce recent promising extensions of this approach in machine learning. Bio: Prof. Xavier Bresson received his B.A. of Physics from University of Marseille and his Master of Electrical Engineering from Ecole Superieure d'Electricite in Paris, France. He got his Ph.D. at the Swiss Federal Institute of Technology (EPFL) in 2005. From 2006 to 2010, he was a Postdoctoral Scholar in the Department of Mathematics at University of California, Los Angeles (UCLA). In 2010, he joined the Department of Computer Science at City University of Hong Kong as Tenure-Track Assistant Professor. His current research works are focused on convex relaxation methods and unified geometric methods in image processing and machine learning. He has published 38 papers in international journals and conferences.

Title: Scalable multi-class/multi-view object detection
Speaker: Mr. Nima Razavi, , ETH Zurich , Switzerland
Date: Friday, 13 May 2011 - 14:30:00

Scalability of object detectors with respect to the number of classes/views is a very important issue for applications where many object classes need to be detected. While combining single-class detectors yields a linear complexity for testing, multi-class detectors that localize all objects at once come often at the cost of a reduced detection accuracy. In this work, we present a scalable multi-class detection algorithm which scales sublinearly with the number of classes without compromising accuracy. To this end, a shared discriminative codebook of feature appearances is jointly trained for all classes and detection is also performed for all classes jointly. Based on the learned sharing distributions of features among classes, we build a taxonomy of object classes. The taxonomy is then exploited to further reduce the cost of multi-class object detection. Our method has linear training and sublinear detection complexity in the number of classes. We have evaluated our method on the challenging PASCAL VOC'06 and PASCAL VOC'07 datasets and show that scaling the system does not lead to a loss in accuracy.

Title: Latent Feature Models for the Structure and Meaning of Text
Speaker: James Henderson and Paola Merlo, , CLCL, University of Geneva
Date: Friday, 11 Mar 2011 - 11:00:00

Much of the meaning of text is reflected in individual words or phrases, but its full information content requires structured analyses of the syntax and semantics of natural language. Our work on methods for extracting such structured meaning representations from natural language has focused on the joint modelling of syntactic and semantic dependency structures. We have addressed this problem by using latent variables to model correlations between these two structures without strong prior assumptions about the nature of these correlations. These models have achieved state-of-the-art results in both syntactic parsing and semantic role labelling across several languages. We have also used them to exploit syntactic information in correcting semantic roles automatically transferred from translations. Our use of latent variable models is in part motivated by the recognition that the supervised learning paradigm is becoming increasingly impractical as research in natural language processing moves to more complex, deeper levels of semantic analysis. By developing robust efficient methods for learning latent representations, we hope to be able to induce semantic representations from large quantities of data for weakly correlated tasks, such as machine translation. Our latent variable models use vectors of latent features for robust learning and exploit neural networks for efficient approximate inference, while still exploiting methods from dependency parsing for efficient decoding with sufficiently powerful models. (Work with Ivan Titov, Lonneke van der Plas, Nikhil Garg, and Andrea Gesmundo.)

Title: Face Recognition and Intelligent Video Surveillance
Speaker: Prof Stan Z. Li, , Chinese Academy of Sciences
Date: Wednesday, 03 Nov 2010 - 14:00:00

Face recognition and intelligent video surveillance are important areas for next-generation ID management and public security. In this talk, challenges, recent advances, and applications of face biometric and intelligent video surveillance technologies will be described.

Short Bio: Stan Z. Li received his B.Eng from Hunan University, China, M.Eng from National University of Defense Technology, China, and PhD degree from Surrey University, UK. He is currently a professor and the director of the Center for Biometrics and Security Research (CBSR), Institute of Automation, Chinese Academy of Sciences (CASIA). He worked at Microsoft Research Asia as a researcher from 2000 to 2004. Prior to that, he was an associate professor at Nanyang Technological University, Singapore. He was elevated to IEEE Fellow for his contributions to the fields of face recognition, pattern recognition and computer vision.

Title: Social Sensing for Epidemiological Behavior Change
Speaker: Anmol Madan, , Northeastern University and Harvard University
Date: Friday, 01 Oct 2010 - 16:00:00

An important question in behavioral epidemiology and public health is to understand how individual behavior is affected by illness and stress. Although changes in individual behavior are intertwined with contagion, epidemiologists today do not have sensing or modeling tools to quantitatively measure its effects in real-world conditions. We propose a novel application of ubiquitous computing. We use mobile phone based co-location and communication sensing to measure characteristic behavior changes in symptomatic individuals, reflected in their total communication, interactions with respect to time of day (e.g., late night, early morning), diversity and entropy of face-to-face interactions and movement. Using these extracted mobile features, it is possible to predict the health status of an individual, without having actual health measurements from the subject. Finally, we estimate the temporal information flux and implied causality between physical symptoms, behavior and mental health.
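
As an illustration of one feature named in the abstract, the entropy of face-to-face interactions could be computed along the following lines. This is a minimal sketch, not the speaker's implementation: the function name `interaction_entropy` and the input format (one contact identifier per observed co-location event) are assumptions made for the example.

```python
import math
from collections import Counter


def interaction_entropy(contacts):
    """Shannon entropy (in bits) of how interactions are spread over contacts.

    `contacts` is a hypothetical input format: one identifier per observed
    face-to-face interaction, e.g. as logged by co-location sensing.
    """
    counts = Counter(contacts)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no interactions observed
    probabilities = [c / total for c in counts.values()]
    return -sum(p * math.log2(p) for p in probabilities)


# Interactions spread evenly over four people versus concentrated on one.
diverse_day = ["ana", "ben", "cho", "dev"]
isolated_day = ["ana", "ana", "ana", "ana"]
print(interaction_entropy(diverse_day), interaction_entropy(isolated_day))
```

Under these assumptions, a drop in this value from one day to the next would indicate that a person's interactions have become concentrated on fewer contacts.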

Bio: Anmol Madan recently completed his PhD at the MIT Media Lab, with Prof. Alex Pentland. Currently, he is working as a postdoctoral researcher at Northeastern University and Harvard University with Prof. David Lazer. He has received honors from the MIT 100k Competition and the MIT Enterprise Forum for various startup-related ideas. His research interests are in modeling human behavior using large-scale mobile phone sensor datasets, using applied machine learning and data mining methods. You might also have read about his research in popular media like CNN, BBC, New York Times, Wired, BusinessWeek and Slashdot.

Title: Tell Me Where You have Lived, and I will Tell You What You Like: Adapting Interfaces to Cultural Preferences
Speaker: Abraham Bernstein, , University of Zurich
Date: Monday, 06 Sep 2010 - 11:00:00

Adapting user interfaces to cultural preferences has been shown to improve a user's performance, but is often forgone because the procedure is time-consuming and costly. Moreover, it is usually limited to producing one uniform user interface (UI) for each nation, disregarding the intangible nature of cultural backgrounds. To overcome these problems, we exemplify a new approach with our culturally adaptive web application MOCCA, which is able to map information in a cultural user model onto adaptation rules in order to create personalized UIs. Apart from introducing the adaptation flexibility of MOCCA, the talk describes a study with 30 participants in which we compared UI preferences to MOCCA's automatically generated UIs. Another experiment with over 40 participants from 3 countries showed a performance improvement for culturally adapted UIs. Results confirm that automatically predicting cultural UI preferences is possible, paving the way for low-cost cultural UI adaptations.

Bio: Abraham Bernstein is a full professor of informatics at the University of Zurich, Switzerland. His current research focuses on various aspects of the semantic web, knowledge discovery, service discovery/matchmaking, and mobile/pervasive computing. His work is based on both social science (organizational psychology/sociology/economics) and technical (computer science, artificial intelligence) foundations. Mr. Bernstein holds a Ph.D. from MIT and a Diploma in Computer Science (comparable to an M.S.) from the Swiss Federal Institute of Technology in Zurich (ETH). He is the program chair of this year's ISWC and on the editorial board of the International Journal on Semantic Web and Information Systems, the Informatik Spektrum by Springer, Journal of the Association for Information Systems, and the newly approved ACM Transactions on Intelligent Interactive Systems.

Title: Conjugate Mixture Models for Clustering and Tracking Multimodal Data.
Speaker: Vassil Khalidov, , INRIA Grenoble, Perception team
Date: Monday, 28 Jun 2010 - 11:00:00

The problem of multimodal tracking arises whenever the same objects are observed through time by different sensors. We address the general case when the observations from different modalities are not necessarily aligned, in the sense that there is no obvious way to associate or to compare them in some common space. Our objective is to construct a model that is able to estimate the number of objects and to cluster the data so that the clusters stay consistent across modalities through time. We use a Bayesian treatment and present an approach based on stochastic optimization and information criteria. The results are illustrated on a multiple audio-visual object tracking task with a "robot head" device, comprising a pair of stereoscopic cameras and a pair of microphones.

Title: Statistical and knowledge-centric techniques in Natural Language Understanding: a valuable handshake?
Speaker: Silvia Quarteroni, , University of Trento
Date: Thursday, 11 Mar 2010 - 11:00:00

In this talk, I will draw from my experience in Information Retrieval and Spoken Dialogue Systems to discuss a number of situations where statistical (e.g. machine learning) techniques shake hands with knowledge-centric approaches to meet user needs and account for domain knowledge. I will present examples particularly from the areas of Question Answering and Spoken Language Understanding, two research fields that exhibit a number of common points.

Short biography: Silvia Quarteroni is a Senior Marie Curie Research Fellow involved in the ADAMACH project at the University of Trento. She received her MSc and BSc in Computer Engineering at the Swiss Federal Institute of Technology in Lausanne (EPFL) and her PhD in Computer Science at the University of York (UK). She has been working in several fields of Natural Language Processing, focusing on human-computer dialogue, information retrieval and personalization. She has published about 30 articles in international conferences and journals and is part of the programme committee of several of these.

Title: Subband temporal envelopes of speech signal and their central role in speech recognition by humans and machines
Speaker: Cong-Thanh Do, , Institut Télécom, Télécom Bretagne, Brest, France
Date: Friday, 05 Mar 2010 - 11:00:00

The subband temporal envelopes of the speech signal have a central role in this presentation, which is split into three parts. The first part of the presentation deals with the automatic recognition of cochlear implant-like spectrally reduced speech (SRS) [1]. The automatic speech recognition (ASR) system, which was trained on the TI-digits database, is HMM-based and the speech feature vectors are the MFCCs along with the delta and acceleration coefficients. We show that from a certain SRS spectral resolution, it is possible to achieve word accuracy as good as that attained with the original clean speech even though the SRS is synthesized only from subband temporal envelopes of the original clean speech [2]. This work motivated some perspectives on noise robust ASR and speech feature vector enhancement dedicated to ASR [3]. The human recognition of speech is addressed in the second part of the presentation. We present quantitative analyses on the speech fundamental frequency (F0) in the cochlear implant-like SRS which support the report of Zeng et al. 2005 [4], based on subjective tests, about the difficulty of cochlear implant users in identifying speakers. That is, the F0 distortion in state-of-the-art cochlear implants is large when the SRS, which is an acoustic simulation of a cochlear implant, is synthesized only from subband temporal envelopes [5]. The analyses also revealed a significant reduction of F0 distortion when frequency modulation is integrated in the cochlear implant, as proposed by Nie et al. 2005 [6]. On the other hand, the results of such quantitative analysis could be exploited to conduct subjective studies in cochlear implant research. The third part of the presentation concerns audio-visual speech processing, in which a linear relationship between the subband temporal envelopes and the area of mouth opening was mathematically proposed [7]. This proposition is based on the pioneering research of Grant and Seitz [8], in which the authors reported different degrees of correlation between acoustic envelopes and visible movements. Our mathematical model helps in estimating the area of mouth opening only from speech acoustics using blind deconvolution techniques [9]. The estimated area of mouth opening is sufficiently correlated with the manually measured one, with an average correlation coefficient of 0.73.

Biography: Cong-Thanh Do was born in Hanoi, Vietnam, in 1983. He received the Electrical Engineering degree from Hanoi University of Technology, Hanoi and Grenoble Institute of Technology, Grenoble, France, in 2006, through the Programme de Formation d'Ingénieurs d'Excellence au Vietnam (PFIEV). In 2007, he received the M.S. degree in signal, image, speech, and telecommunication from the Grenoble Institute of Technology, Grenoble, France and performed a research internship in the Speech and Cognition Department of GIPSA-Lab, Grenoble, France. He is currently working toward the Ph.D. degree in the Signal and Communications Department, Institut Télécom, Télécom Bretagne, UMR CNRS 3192 Lab-STICC, Technopôle Brest-Iroise, Brest, France. His current research interests include automatic speech recognition, audio-visual speech processing and statistical signal processing.

Title: IDIAP Newcomers
Speaker: Hervé Bourlard, , Idiap, Switzerland
Date: Tuesday, 30 Jan 2007 - 17:00:00

If you are an IDIAP newcomer and we haven't had a chance to meet yet (e.g., at the previous similar meeting), I would like to invite you to a joint meeting for informal introductions, discussions, and Q&A.

Title: Dry-run of my PhD defense
Speaker: G. Lathoud, , IDIAP, Switzerland
Date: Friday, 24 Nov 2006 - 16:00:00

Spatio-Temporal Analysis of Spontaneous Speech with Microphone Arrays

Knowing the location of human speakers permits a wide spectrum of applications, including hearing aids, hands-free speech processing in cars, surveillance, intelligent homes and offices, and autonomous robots. This thesis focuses on the use of microphone arrays to analyze spontaneous multi-party speech. This is a challenging task, because such speech contains many very short utterances, and people interrupt each other a lot (overlapped speech). Moreover, in order to build applications with the least possible constraints on the users, we use distant microphones only, for example on a meeting room table. Finally, the developed approaches are as unsupervised as possible, having in mind the dominant proportion of non-technical users. We targeted the development of an automatic system that can handle both moving and static speakers, in order to answer the question "Who spoke where and when?". Several issues were investigated, from the signal processing level (where? when?) to the speaker clustering level (who?). The techniques developed in the course of this research were successfully tested on a large variety of real indoor recordings, including cases with multiple moving speakers as well as seated speakers in meetings. The versatility of the proposed techniques is illustrated by a direct application to two related cases: hands-free speech acquisition in cars, and noise-robust speech recognition through telephones. Finally, a close analysis of the speaker clustering results leads us to question the linearity of the transmission channel in a real indoor environment, when a speaker is a few meters away from a microphone.

Title: A Music Discovery Engine based on Audio Similarities
Speaker: Nicolas SCARINGELLA, , EPFL
Date: Monday, 10 Jul 2006 - 16:00:00

In the context of Electronic Music Distribution, huge databases coming from both restoration of existing analog archives and new content have been created and are continuously growing. The biggest online services now offer around 2 million tracks, which calls for efficient ways to browse collections. Providing the kind of robust access to the world's vast store of music that we currently provide for textual material has been the goal of the Music Information Retrieval (MIR) community over the past 10 years; however, it still remains a very challenging problem in the case of audio data. Music information is indeed a multifaceted and sometimes complex data set that includes pitch, temporal (i.e. rhythm), harmonic, timbral (e.g. orchestration), textual (i.e. lyrics), symbolic, editorial, and metadata elements (without considering related visual elements). Music information is also extremely dynamic. That is, any given work can have its specific pitches altered, its rhythm modified, its harmony reset, its orchestration changed, its performance reinterpreted, and its performers arbitrarily chosen; yet, somehow, it remains the "same" piece of music as the "original". Within this extraordinarily fluid environment, the concept of "similarity" becomes particularly problematic while being crucial to the design of audio and music information retrieval systems. In this talk, we will discuss the concept of similarity between music excerpts and propose possible research directions to build a music discovery engine based on audio analysis.

Title: Prior Knowledge in Kernel Methods (PhD defense rehearsal)
Speaker: Alexei Pozdnoukhov, , IDIAP, Switzerland
Date: Thursday, 29 Jun 2006 - 15:00:00

Kernel Methods are one of the most successful branches of Machine Learning. They allow applying linear algorithms with well-founded properties, such as generalization ability, to non-linear real-life problems. The Support Vector Machine is a well-known example of a kernel method, which has found a wide range of applications in data analysis. In many practical applications, some additional prior knowledge is available. This can be knowledge about the data domain, invariant transformations, inner geometrical structures in data, some properties of the underlying process, etc. If used smartly, this information can provide significant improvement to any data processing algorithm. Thus, it is important to develop methods for incorporating prior knowledge into data-dependent models. The main objective of this thesis is to investigate approaches towards learning with kernel methods using prior knowledge. Invariant learning with kernel methods is considered in more detail.
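
To make the idea of invariant learning with kernels concrete, here is a minimal sketch of one generic way to encode a known invariance: averaging a base kernel over a finite group of transformations. It is not claimed to be the method developed in the thesis. The names `rbf_kernel` and `invariant_kernel`, the choice of an RBF base kernel, and the sign-flip invariance are all assumptions chosen for illustration.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def invariant_kernel(x, y, transforms, base=rbf_kernel):
    """Average the base kernel over all pairs of transformed inputs.

    If `transforms` forms a finite group, the result does not change when
    either argument is replaced by one of its transformed versions.
    """
    values = [base(g(x), h(y)) for g in transforms for h in transforms]
    return sum(values) / len(values)


# Hypothetical prior knowledge: the sign of the input carries no information.
def identity(v):
    return list(v)


def negate(v):
    return [-a for a in v]


sign_group = [identity, negate]
x, y = [1.0, 2.0], [0.5, -1.0]
print(invariant_kernel(x, y, sign_group))
print(invariant_kernel(negate(x), y, sign_group))  # same value up to rounding
```

The resulting function can be used wherever a kernel is expected, for example to build the Gram matrix for a Support Vector Machine, so the prior knowledge enters the model without changing the learning algorithm itself.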

Title: PhD defense Dry run:
Speaker: Norman Poh, , IDIAP, Switzerland
Date: Wednesday, 24 May 2006 - 16:00:00

This thesis presentation is about combining multiple systems applied to biometric authentication. Its two-fold contribution is to provide a better understanding of the problem of fusion (w.r.t. correlation, performance strength of individual systems and noise) and to exploit the knowledge of claimed identity to improve the combined system performance. Conditioning on the claimed identity is difficult because one has to deal with a small learning sample size.

Title: Using Auxiliary Sources of Knowledge for Automatic Speech Recognition
Speaker: Mathew Magimai Doss, , IDIAP, Switzerland
Date: Friday, 27 May 2005 - 16:00:00

This is the second rehearsal of my PhD defense presentation. Your comments and suggestions would be of great help. Thank you!

Title: ACM MultiMedia conference report
Speaker: Florent Monay, , IDIAP, Switzerland
Date: Monday, 24 Nov 2003 - 11:00:00

I will describe some papers and demos from ACM MultiMedia 2003 and the MIR2003 workshop (content-based multimedia information retrieval, home video browsing/editing, home photo browsing, surveillance, sports video indexing, ...). A discussion about the corresponding research directions will follow.