Idiap EVENTS
Idiap is organizing various events open to the public, such as conferences and workshops. You will find below the list of upcoming and past events.
When recorded, past events can be watched on our YouTube channel or on our spin-off Klewel's platform, which includes searchable slides.
UPCOMING EVENTS
Idiap Create Challenge 2023
Aug 16, 2023 09:00 AM

Idiap Research Institute
Ever been frustrated by a partially achieved prototype created during a hackathon? As engineers, researchers, and creators, we experienced this limitation first-hand. Therefore we came up with our own unique 9-day AI SUPER HACKATHON! The best way to transform your ideas into prototypes!
Registrations are possible until June 15th 2023.
More information:
PAST EVENTS
EAB & CITeR Biometrics Workshop
Apr 18, 2023 09:00 AM

Idiap Research Institute
The European Association for Biometrics (EAB) and the Center for Identification Technology Research (CITeR) are organising a two-day biometrics workshop, hosted by the Idiap Research Institute in Martigny, Switzerland, on April 18-19, 2023, on the topics of bias mitigation, template security, presentation attack detection, and deepfakes. This workshop is the second edition of the EAB workshop on Presentation Attack Detection previously hosted at the Idiap Research Institute in 2020.
The workshop will be co-located with the CITeR Spring 2023 Program Review and training events of the TReSPAsS-ETN and PriMa-ITN projects.
Entrance fees: regular: EUR 360.00 / EAB members: EUR 240.00 / CITeR members and/or CITeR speakers (in person): EUR 90.00 / EAB or CITeR members (virtual): EUR 100.00 / non-EAB/CITeR members (virtual) and (non-)EAB speakers: EUR 200.00
More:
Organizers: The European Association for Biometrics (EAB) in collaboration with the Center for Identification Technology Research (CITeR) and Idiap
Automatic analysis of Parkinson's disease: unimodal and multimodal perspectives
Mar 23, 2023 03:00 PM
Prof. Juan Rafael Orozco-Arroyave
Parkinson's disease (PD) is a (mainly) movement disorder and appears due to the progressive death of dopaminergic neurons in the substantia nigra of the midbrain (part of the basal ganglia). Diagnosis and monitoring of PD patients are still highly subjective, time-consuming, and expensive. Existing medical scales used to evaluate the neurological state of PD patients cover many different aspects, including activities of daily living, motor skills, speech, and depression. This makes the task of automatically reproducing experts' evaluations very difficult because several bio-signals and methods are required to produce clinically acceptable/practical results.
This talk shows how different bio-signals (e.g., speech, gait, handwriting, and facial expressions) can be used to find suitable models for PD diagnosis and monitoring. Results with classical feature extraction and classification methods will be presented along with CNN- and GRU-based architectures.
Juan Rafael Orozco-Arroyave was born in Medellín, Colombia, in 1981. He received his degree in Electronics Engineering from the University of Antioquia in 2004. From 2004 to 2009 he worked for a telecommunications company in Medellín, Colombia. In 2011 he completed an MSc degree in Telecommunications at the Universidad de Antioquia. In 2015 he completed his PhD in Computer Science in a double-degree program between the University of Erlangen (Germany) and the University of Antioquia (Colombia). He is currently a Full Professor at the University of Antioquia and an adjunct researcher at the Pattern Recognition Lab of the University of Erlangen.
Making Sense of ChatGPT
Understanding the technology, its limitations and opportunities
Recent Artificial Intelligence (AI) models such as ChatGPT have captured the public's imagination regarding recent advances in the field, sparking a debate on the potential applications and implications of this technology.
The event is dedicated to the discussion of Large Neural Language Models, the underlying framework behind systems such as ChatGPT. In the workshop we will aim to answer critical questions about the principles behind these models and the associated challenges, emerging applications and opportunities that this technology brings.
The workshop is targeted towards academic and industrial participants willing to understand how these models work, their strengths, limitations and general societal impact.
The detailed program is available here:
REGISTRATIONS ARE CLOSED!
Regularized information geometric and optimal transport distances for Gaussian processes
Mar 07, 2023 11:00 AM

Dr Minh Ha Quang (RIKEN AIP)
Information geometry (IG) and Optimal transport (OT) have been attracting much research attention in various fields, in particular machine learning and statistics. In this talk, we present results on the generalization of IG and OT distances for finite-dimensional Gaussian measures to the setting of infinite-dimensional Gaussian measures and Gaussian processes. Our focus is on the Entropic Regularization of the 2-Wasserstein distance and the generalization of the Fisher-Rao distance and related quantities. In both settings, regularization leads to many desirable theoretical properties, including in particular dimension-independent convergence and sample complexity. The mathematical formulation involves the interplay of IG and OT with Gaussian processes and the methodology of reproducing kernel Hilbert spaces (RKHS). All of the presented formulations admit closed form expressions that can be efficiently computed and applied practically.
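As background for the quantities discussed above: the classical (unregularized) 2-Wasserstein distance between two finite-dimensional Gaussian measures has a well-known closed form, and the entropic-regularized and infinite-dimensional generalizations presented in the talk extend expressions of this kind. The formula below is standard textbook material, not a result from the talk itself.

```latex
% Closed-form 2-Wasserstein distance between Gaussian measures (standard result)
W_2^2\big(\mathcal{N}(m_1,\Sigma_1),\,\mathcal{N}(m_2,\Sigma_2)\big)
  = \|m_1 - m_2\|^2
  + \operatorname{tr}\!\Big(\Sigma_1 + \Sigma_2
      - 2\big(\Sigma_1^{1/2}\,\Sigma_2\,\Sigma_1^{1/2}\big)^{1/2}\Big)
```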
Minh Ha Quang is currently a unit leader at RIKEN AIP (Advanced Intelligence Project) in Tokyo, Japan, where he leads the Functional Analytic Learning Unit. He received the PhD degree in Mathematics from Brown University (RI, USA) under the supervision of Stephen Smale. Before joining AIP, he was a researcher at the Italian Institute of Technology in Genova, Italy. His current research focuses on functional analytic and geometrical methods in machine learning and statistics. More information: Center for Advanced Intelligence Project, https://aip.riken.jp/labs/generic_tech/funct_anl_learn/
Understanding Neural Speech Embeddings for Speech Assessment
Jan 20, 2023 10:00 AM
Prof. Elmar Nöth
In this talk, we present preliminary results on experiments performed to understand what information is represented in which layer of deep neural networks. We will motivate our experiments with an image processing problem (identification of orca individuals based on the dorsal fin), where we show that the result of unsupervised clustering of previously unseen individuals strongly depends on the underlying embedding and on what that embedding was trained for in a supervised manner. We then present preliminary results on t-SNE projections of different pathological and control corpora based on the different layers of a pre-trained wav2vec2 model, and end with an outlook on current and future research.
Elmar Nöth is a Professor of Applied Computer Science at the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) in Germany. He works at the Pattern Recognition Lab of the FAU and is the head of the speech group. He is author or co-author of more than 500 articles. His current interests are prosody, analysis of pathological speech, computer-aided language learning, emotion analysis, and the analysis of animal communication.
Action Recognition for People Monitoring
Aug 17, 2022 11:00 AM

François Brémond – STARS – INRIA – Sophia Antipolis
In this talk, we will discuss how video analytics can be applied to human monitoring using a video stream as input. Existing work has focused either on simple activities in real-life scenarios, or on the recognition of more complex (in terms of visual variability) activities in hand-clipped videos with well-defined temporal boundaries. We still lack methods that can retrieve multiple instances of complex human activities in a continuous (untrimmed) video stream in real-world settings.
Therefore, we will first review a few existing activity recognition/detection algorithms. Then, we will present several novel techniques for the recognition of ADLs (Activities of Daily Living) from 2D video cameras. We will illustrate the proposed activity monitoring approaches on several home care application datasets: Toyota SmartHome, NTU-RGB+D, Charades and Northwestern-UCLA. We will end the talk by presenting some results on home care applications.
Keywords: people tracking, behavior understanding, activity monitoring.
François Brémond is a Research Director at Inria Sophia Antipolis-Méditerranée, where he created the STARS team in 2012. He has pioneered the combination of Artificial Intelligence, Machine Learning and Computer Vision for Video Understanding since 1993, both at Sophia Antipolis and at USC (University of Southern California), LA. In 1997 he obtained his PhD degree in video understanding and pursued this work at USC on the interpretation of videos taken from UAVs (Unmanned Aerial Vehicles). In 2000, recruited as a researcher at Inria, he modeled human behavior for Scene Understanding: perception, multi-sensor fusion, spatio-temporal reasoning and activity recognition. He is a co-founder of Keeneo, Ekinnox and Neosensys, three companies in intelligent video monitoring and business intelligence. He also co-founded the CoBTek team from Nice University in January 2012 with Prof. P. Robert from Nice Hospital on the study of behavioral disorders in older adults suffering from dementia. He is author or co-author of more than 250 scientific papers published in international journals or conferences on video understanding. He has (co-)supervised 20 PhD theses. More information is available at: https://www-sop.inria.fr/members/Francois.Bremond/
The e-David project: Painting strategies and their influence on robotic painting
Aug 02, 2022 02:00 PM

Prof. Dr. Oliver Deussen, University of Konstanz
Our drawing robot e-David is able to create paintings using visual feedback. So far, our paintings have been created using a stroke-based metaphor. In my talk I will speak about the development of a number of stroke-based styles. However, being in close contact with artists, we realized at some point that painting can be modeled much better by interacting and contrasting areas instead of strokes, which are more the basis of drawings. This paradigm shift allows us to construct paintings from a different perspective: the interaction between areas enables us to model different forms of abstraction and reshape areas according to style settings. We will also be able to integrate machine-learning-based tools for analyzing and deconstructing input images. This enhances our creative space and will allow us to find our own forms of machine abstraction and representation.
Prof. Deussen graduated from the Karlsruhe Institute of Technology and is a professor of visual computing at the University of Konstanz (Germany) and a visiting professor at the Shenzhen Institute of Applied Technology (Chinese Academy of Sciences). He is one of the speakers of the Excellence Cluster "Centre for the Advanced Study of Collective Behavior" and built the SFB Transregio "Quantitative Methods for Visual Computing", a large research project conducted together with the University of Stuttgart. Until 2021 he was President of the Eurographics Association, and he served as Co-Editor-in-Chief of Computer Graphics Forum from 2012 to 2015. His areas of interest are modeling and rendering of complex biological systems, non-photorealistic rendering, and information visualization. He has also contributed papers on geometry processing, sampling methods and image-based modelling.
Artificial Intelligence meets Digital Forensics: a panorama
Jul 14, 2022 02:00 PM

Professor Anderson Rocha
In this talk, we will discuss a panoramic view of digital forensics in the last 10 years and how it needed to evolve from basic computer vision and simple natural language processing techniques to powerful AI-driven methods to deal with the signs of the new age. We will discuss tampering detection, fact-checking, deepfakes, and authorship analysis as well as recent advances in self-supervised learning to deal with large-scale search in some forensics problems.
Anderson Rocha is a full professor of Artificial Intelligence and Digital Forensics at the Institute of Computing, University of Campinas (Unicamp), Brazil. He is the Director of the Artificial Intelligence Lab, Recod.ai, and also the Director of the Institute for the 2019-2023 term. He has actively worked as an associate editor of important international journals such as the IEEE Transactions on Information Forensics and Security (T.IFS), the Elsevier Journal of Visual Communication and Image Representation (JVCI), IEEE Signal Processing Letters (SPL), and the IEEE Security & Privacy Magazine. He is an elected affiliate member of the Brazilian Academy of Sciences (ABC) and the Brazilian Academy of Forensic Sciences. He is a two-term elected member of the IEEE Information Forensics and Security Technical Committee (IFS-TC) and was its chair for the 2019-2020 term. He is a Microsoft Research and a Google Research Faculty Fellow, important academic recognitions bestowed on researchers by Microsoft Research and Google, respectively. In addition, in 2016 he was awarded the Tan Chin Tuan (TCT) Fellowship, a recognition promoted by the Tan Chin Tuan Foundation in Singapore. Finally, he is ranked in the top 2% of the most influential scientists worldwide, according to recent studies from Research.com and Stanford/PLOS ONE.
Physics-based modeling and the quest for intelligent robots
Jun 30, 2022 02:00 PM

Prof. Stelian Coros, Computational Robotics Lab, ETH Zurich
Thanks to recent advances in sensing, perception and actuation technologies, robots are no longer just mindless machines designed to perform repetitive tasks on factory floors. Nevertheless, the vision of intelligent robotic assistants capable of helping us with every-day tasks at work and at home remains elusive. This is to a large extent because, unlike humans, robots lack an innate understanding of the physical principles that govern the dynamics of the physical world. To overcome this technological barrier, our group develops theoretical and algorithmic foundations for computational models that enable machines to predict how physical objects move and deform. Our efforts in this area have led to an analytically differentiable formulation of dynamics for multi-body systems. Within a unified framework, our simulation model handles rigid bodies, deformable objects, as well as frictional contact. In this talk, I will present our simulation framework and show how it can be used for tasks such as trajectory optimization, policy learning and computational design. Through a set of applications that range from soft robot locomotion to dynamic manipulation of deformable objects, I will also highlight early successes in using our simulation model to bridge the reality gap.
Stelian Coros is an associate professor in the Department of Computer Science at ETH Zurich, where he leads the Computational Robotics Lab. Prior to joining ETH Zurich, he was an assistant professor in the Robotics Institute at Carnegie Mellon University, and he received his PhD in Computer Science from the University of British Columbia in 2011. Through fundamental advances in numerical simulation and motion control algorithms, Stelian's research bridges the fields of robotics, visual computing and computational fabrication. Applications of his work range from studying the principles of dexterous manipulation and legged locomotion to computation-driven design for new breeds of bioinspired robots. Stelian is the recipient of an Alain Fournier Ph.D. Dissertation Award, an Intel Early Career Faculty Award, a Research Initiation Award by the US National Science Foundation, and in 2020 he was awarded an ERC Consolidator grant.
Optical Tracking - from the lab to the NBA and the English Premier League
Jun 28, 2022 11:00 AM

Horesh Ben Shitrit and Charles Dubout
In this talk, we will present a complete automatic real-time system for sports analytics, including the tracking of the 3D skeletal pose of multiple players and of the ball from multiple video cameras. This system was developed by Second Spectrum and successfully deployed in top-tier sports leagues including the NBA and the English Premier League.
About Second Spectrum
Second Spectrum creates products that fuse design with spatiotemporal pattern recognition, machine learning, and computer vision to create sports insights and experiences. In September 2015, Second Spectrum acquired PlayfulVision, a spin-off from EPFL which provided video-based player and ball tracking technology. After integrating its core technology, Second Spectrum signed multiple multi-year league-wide contracts with major sports leagues such as the NBA (basketball), the English Premier League and MLS (football) to be their official player tracking provider. In May 2021, Second Spectrum was acquired by the Genius Sports group, which covers more than 400 leagues worldwide.
Dr. Horesh Ben Shitrit is the head of the Computer Vision group at Second Spectrum. Dr. Ben Shitrit received his PhD in Computer Science at the Swiss Federal Institute of Technology in Lausanne (EPFL). His research at EPFL focused on behavioral analysis from multiple video sources using people detection, tracking and identification. After obtaining his PhD, he was the CEO and co-founder of PlayfulVision, a spin-off from EPFL which provided video-based player and ball tracking technology, until the startup was acquired by Second Spectrum. Dr. Ben Shitrit also supports entrepreneurship and serves as a mentor at Innosuisse and startup accelerators. Dr. Charles Dubout leads the Computer Vision Algorithms team at Second Spectrum. Dr. Dubout received his PhD in Computer Science from the Idiap Research Institute in Switzerland. His research at Idiap focused on large-scale machine learning and fast object detection. After obtaining his PhD, he participated in the creation of PlayfulVision, where he led the development of the core technologies (camera calibration, player/ball detection and recognition, tracking). Since the startup was acquired by Second Spectrum, he has focused on researching, training and deploying novel computer vision algorithms for real-time tracking systems for top sports leagues around the world.
Active interaction between robots and humans for automatic curriculum learning and assistive robotics
Nov 17, 2021 10:00 AM

Dr. Sao Mai Nguyen - IMT Atlantique
We illustrate, through the example of a robot coach for physical rehabilitation, the application of GMMs on Riemannian manifolds, but also the need to represent complex movements and tasks, as well as the need to evaluate patients' motivation to interact with their coach.
This motivation to interact is modeled through the theory of intrinsic motivation.
Multi-task learning by robots poses the challenge of domain knowledge: the complexity of tasks, the complexity of the required actions, and the relationships between tasks for transfer learning. However, this domain knowledge can be learned to address the challenges of high dimensionality and unboundedness in life-long learning. For instance, the hierarchy between tasks of various complexities can be learned to bootstrap the transfer of knowledge from simple to composite tasks.
Within a hierarchical reinforcement learning framework, we focus on algorithms based on intrinsic motivation to explore the action and task spaces. They can discover the relationship between tasks and learnable subtasks. Robots can efficiently associate sequences of actions to multiple control tasks: representations of task dependencies, emergence of affordance mechanisms, curriculum learning and active imitation learning. These active learning algorithms choose the most appropriate exploration strategy based on empirical measures of competence and learning progress. The robot infers its curriculum by deciding which tasks to explore first, how to transfer knowledge, and when, how and whom to imitate.
Sao Mai Nguyen specializes in cognitive developmental learning, reinforcement learning, imitation learning, curriculum learning for robots, and human activity recognition. She received her PhD in computer science from Inria in 2013, for her machine learning algorithms combining reinforcement learning and active imitation learning for interactive and multi-task learning. She holds an engineering degree from Ecole Polytechnique, France, and a master's degree in adaptive machine systems from Osaka University, Japan. She has enabled a robot to coach physical rehabilitation in the project RoKInter and the experiment KERAAL, which she coordinated, funded by the European Union through the FP7 project ECHORD++. She has participated in the project AMUSAAL, analysing human activities of daily living through cameras, and in CPER VITAAL, developing assistive technologies for the elderly and disabled. She is currently an assistant professor at ENSTA Paris and is affiliated with IMT Atlantique, France. She also acts as an associate editor of the journal IEEE TCDS and co-chair of the Task Force "Action and Perception" of the IEEE Technical Committee on Cognitive and Developmental Systems. For more information visit her webpage: https://nguyensmai.free.fr.
Quality of Life: what vision for the future?
Sep 11, 2021 04:00 PM
Melanie Mitchell (Santa Fe) & Stuart Russell (Berkeley)
To celebrate its 50th anniversary, the Dalle Molle Foundation is organizing a conference including two AI oriented speeches by renowned international speakers:
- Artificial Intelligence for Thinking Humans by Prof. Melanie Mitchell from the Santa Fe Institute
- Human Compatible: AI and the Problem of Control by Prof. Stuart Russell from the University of California, Berkeley, and Honorary Fellow, Wadham College, Oxford
Free registrations and the full program are available here: https://idiap.ch/dallemolle
Schille Conference Room: ground floor, Idiap Research Institute
PhD public defense: Explainable Phonology-based Approach for Sign Language Recognition and Assessment
May 28, 2021 05:00 PM
Sandrine Tornay
Sign language technology, unlike spoken language technology, is an emerging area of research. Sign language technologies can help in bridging the gap between the Deaf community and the hearing community. One such computer-aided technology is sign language learning technology. To build such a technology, there is a need for sign language technologies that can assess the sign production of learners in a linguistically valid manner. Such a technology is yet to emerge. This thesis is a step towards that goal, where we aim to develop an "explainable" sign language assessment framework. Development of such a framework raises some fundamental open research questions: (a) how to effectively model the hand movement channel? (b) how to model the multiple channels inherent in sign language? and (c) how to assess sign language at different linguistic levels?
The present thesis addresses those open research questions by: (a) development of a hidden Markov model (HMM) based approach that, given only pairwise comparison between signs, derives hand movement subunits that are sharable across sign languages and domains; (b) development of phonology-based approaches, inspired from modeling of articulatory features in speech processing, to model the multichannel information inherent in sign languages in the framework of HMM, and validating it through monolingual, cross-lingual and multilingual sign language recognition studies; and (c) development of a phonology-based sign language assessment approach that can assess in an integrated manner a produced sign at two different levels, namely, lexeme level (i.e., whether the sign production is targeting the correct sign or not) and at form level (i.e. whether the handshape production and the hand movement production is correct or not), and validating it on the linguistically annotated Swiss German Sign Language database SMILE.
Computational methods for live heart imaging with speed-constrained microscopes
Public thesis defense (in French)
This thesis covers several numerical methods for assembling dynamic image series of the heart from a wide range of microscopes, including in particular laser scanning confocal microscopes, which are slow but common, and microscopes whose image acquisition rate is slow.
Olivia is a PhD candidate in the Computational Bioimaging Group.
Transformer-based Meta-Imitation Learning and robot kinematic feasibility learning for robot manipulation
Nov 27, 2020 02:00 PM

Dr. Julien Perez and Dr. Seungsu Kim (Naver Labs Europe)
This talk will present two recent works carried out at Naver Labs Europe at the intersection of robotic manipulation and deep learning.
In the first part of the talk, we will discuss imitation learning. Recently, one-shot imitation learning has shown encouraging results for executing variations of the initial conditions of a given task without requiring task-specific engineering. However, it remains inefficient for generalizing to variations of tasks involving different reward or transition functions. In this work, we aim at improving the generalization ability of demonstration-based learning to unseen tasks that are significantly different from the training tasks. We introduce the use of transformer-based sequence-to-sequence policy networks trained from limited sets of demonstrations. Then, we propose to meta-train our model from a set of training demonstrations by leveraging optimization-based meta-learning. We evaluate our approach and report encouraging results using the recently proposed Meta-World framework, which is composed of a large set of robotic manipulation tasks organized in various categories.
In the second part of the talk, we will present our recent work on kinematic feasibility. Validating the kinematic feasibility of a planned robot motion and finding the corresponding inverse solutions are time-consuming processes, especially for long-horizon manipulation tasks. As most existing approaches are based on solving iterative gradient-based optimization problems, they are time-consuming and carry a high risk of falling into local minima. We therefore propose a unified framework to learn a kinematic feasibility model and a one-shot inverse mapping model for a redundant robot manipulator. Once trained, the models can compute the kinematic reachability of a target pose and its inverse solutions directly, without an iterative process. We validate our approach on an object grasping application using a 7-DOF robot arm.
Julien Perez leads the Machine Learning and Optimization (MLO) group at Naver Labs Europe. His current research interests are deep learning, differentiable programming and reinforcement learning applied to robotic manipulation systems. After an engineering degree in computer science and a master's in machine learning and artificial intelligence from the University of Paris Dauphine, Julien obtained his PhD in deep reinforcement learning from Paris Sud University, France. After 2 years as a principal lecturer at Univ. Paris 12 in the domain of autonomous network systems, Julien Perez joined what was then the Xerox research labs in the fall of 2013 (Naver Labs Europe since August 2017) as a research scientist in the Machine Learning for Services team.
Seungsu Kim is a senior research scientist in the Machine Learning and Optimization group at Naver Labs Europe. He received his PhD in robotics from the Swiss Federal Institute of Technology in Lausanne (EPFL), Switzerland, in 2014. He was a researcher with the Center for Cognitive Robotics Research, Korea Institute of Science and Technology (KIST), Korea, from 2005 to 2009. He completed two postdoctoral positions, with the Neuroinformatics Group, CITEC, Bielefeld University (from 2014 to 2016) and with ISIR, Sorbonne University (from 2016 to 2018). He worked for Dyson in the UK as a Senior Robotics Engineer from 2018 to 2019 and joined Naver Labs Europe in 2019. In 2015, he was awarded the IEEE Transactions on Robotics King-Sun Fu Memorial Best Paper Award. His main research interests are machine learning techniques for autonomous robotic manipulation.
The talk of Dr. Nora Al-Badri is now postponed
Apr 01, 2020 11:00 AM

Dr. Nora Al-Badri
Technoheritage - when AI meets art.
What does future heritage look like? What role can AI and deep learning play? How can we visually reassemble the patterns of specific eras or locations in heritage digitisation, such as the richness of forms from Mesopotamia or today's Iraq? How can we build new decolonial, non-biased databases and thereby add to the contemporary visual imaginary?
Nora Al-Badri will talk about her artistic research and her new work, the "Neuronal Ancestral Sculptures Series", which examines the potential of GANs in this context as a new artistic tool and uses heritage data from artefacts of today's Iraq, data based on ancestral knowledge (forms, patterns, artefacts), to create a form of generative aesthetics that goes beyond representation and mimesis.
She will also show a few other projects, such as the Nefertiti Hack and a critical chatbot that decontextualises colonial museum collections.
Nora Al-Badri is a multi-disciplinary media artist with a German-Iraqi background. She lives and works in Berlin. She graduated in political sciences at Johann Wolfgang Goethe University in Frankfurt/Main. She is currently the first artist-in-residence at EPFL. Her work has been featured widely in the media, including The New York Times, BBC, The Times, Artnet, Wired, Le Monde Afrique, Financial Times, Arte TV, The Independent, New Statesman, Hyperallergic, Frankfurter Allgemeine Sonntagszeitung, and Spiegel Online, amongst others. She has exhibited in the Victoria and Albert Museum's Applied Arts Pavilion at La Biennale di Venezia, the 3rd Istanbul Design Biennial, ZKM Karlsruhe, Science Gallery Dublin, NRW Forum, Espacio Fundación Telefónica, Berliner Herbstsalon - Gorki Theater, Ars Electronica, Abandon Normal Devices, The Influencers, and the Gray Area Festival of Art & Technology.
Dr. Julien Perez's talk is now CANCELED. Another date will be scheduled
Mar 04, 2020 11:00 AM

Dr. Julien Perez (Naver Labs Europe)
Iterative Reasoning Path Retrieval for Multi-Hop Question Answering
Multi-hop machine reading necessitates retrieving multiple pieces of evidence over a possibly large collection of documents. One of the challenges comes from the fact that only a few lexical or semantic relationships overlap with the question. For this reason, classic information retrieval methods, which are mainly based on word matching techniques, even when distributional, have failed to achieve usable results in this task. This talk introduces a series of novel neural iterative retrieval approaches that learn to find the sequence of necessary pieces of evidence, also called reasoning paths, to answer open-domain multi-hop questions. The approach trains a neural network based on a multi-headed transformer architecture that learns to retrieve evidence paragraphs conditioned on the previously retrieved documents. Any sequential search algorithm, like beam search or MCTS, can then be coupled to our model to find the most probable sequence for a question. Finally, we use a neural approach as a stopping criterion for sequential retrieval. Experiments show encouraging results on an open-domain QA dataset, HotpotQA. We will conclude the talk with current limitations and perspectives associated with this task.
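As an illustration of how such a retriever can be coupled with a sequential search procedure, here is a minimal sketch of a beam search over reasoning paths. The functions score_paragraph and should_stop are hypothetical stand-ins for the transformer-based retriever and the neural stopping criterion; this is not the actual implementation described in the talk.

```python
# Sketch: beam search over reasoning paths for multi-hop retrieval.
# score_paragraph(question, path, para) -> log-score of retrieving `para`
#   conditioned on the question and the previously retrieved paragraphs.
# should_stop(question, path) -> True when the path is judged sufficient.

def retrieve_reasoning_path(question, corpus, score_paragraph, should_stop,
                            beam_width=4, max_hops=3):
    beams = [([], 0.0)]            # (path of paragraphs, cumulative log-score)
    finished = []
    for _ in range(max_hops):
        candidates = []
        for path, score in beams:
            if path and should_stop(question, path):
                finished.append((path, score))   # stop expanding this path
                continue
            for para in corpus:
                if para in path:
                    continue
                s = score_paragraph(question, path, para)
                candidates.append((path + [para], score + s))
        if not candidates:
            beams = []
            break
        # keep the beam_width highest-scoring partial paths
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    finished.extend(beams)
    return max(finished, key=lambda c: c[1])     # most probable reasoning path
```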
Julien Perez leads the Machine Learning and Optimization (MLO) group at Naver Labs Europe. He obtained his PhD in machine learning and autonomous computing from Paris Sud University, France. After 2 years as a principal lecturer at Univ. Paris 12 in the domain of autonomous network systems, he joined what was then the Xerox research labs in the fall of 2013 (Naver Labs Europe since August 2017) as a research scientist in the Machine Learning for Services team. His current research interests are deep learning and differentiable programming, reinforcement learning, machine reading and autonomous dialog systems. He has been on the organisational committee for the Dialog Systems and Technology Challenge (DSTC6, DSTC7).
Brain MRI Analysis and Machine Learning for Diagnosis of Neurodegeneration
Cognitive and neuro-biological profiles of elderly individuals are very heterogeneous. Healthy aging, psychiatric disorders, genetic variability, and neuro-degenerative diseases all contribute to this heterogeneity. Classification systems with categories and sub-categories of neuro-degenerative disorders were traditionally used to identify differences across groups. However, a large portion of the observed variability in clinical manifestation on the individual level remains unexplained. Uncertain diagnostic labels, the presence of possibly multiple interacting conditions, and systematic biases due to technical variations in the acquisition hardware further call for innovative solutions. During his PhD, Ahmed Abdulkadir focused on improving MRI image classification and segmentation using deep (end-to-end) and shallow machine learning. The outcomes from the thesis and the literature corroborated that disruptive innovations in the field will be facilitated by the ability to make predictions with confidence bounds in heterogeneous populations with incomplete data. Methodological grounds and preliminary results of ongoing studies as well as a quick preview on the recently accepted SNF Postdoc.Mobility proposal titled MRI-based characterization of subgroups in aging and dementia will be presented.
After an EPFL diploma in Life Sciences Engineering with a specialization in Neurosciences, Ahmed Abdulkadir worked as a research assistant at the Uniklinikum Freiburg im Breisgau and subsequently at the Universitäre Psychiatrische Dienste (UPD) Bern (University of Bern). In December 2018, he defended his PhD (Dr.-Ing.) in Computer Science at the Albert-Ludwigs-Universität, Freiburg im Breisgau. Since August 2019 he has been a postdoctoral fellow at the University of Pennsylvania, Department of Radiology, at the Center for Biomedical Image Computing and Analytics (CBICA), and is also affiliated with UPD Bern.
Behaviomedics - Objective Assessment of Clinically Relevant Expressive Behaviour
Dec 13, 2019 03:00 PM
Prof. Michel Valstar
Behaviomedics is the application of automatic analysis and synthesis of affective and social signals to aid objective diagnosis, monitoring, and treatment of medical conditions that alter one's affective and socially expressive behaviour. Or, put more succinctly, it is the objective assessment of clinically relevant expressive behaviour. Objective assessment of expressive behaviour has been around for a couple of decades at least, perhaps most notably in the form of facial muscle action detection (FACS AUs) or pose estimation. While often presented alongside work on emotion recognition, with many works presented as a solution to both emotion and objective behaviour assessment, the two problems are actually incredibly different in terms of machine learning. I would argue that a rethink of behaviour assessment is useful, with emotion recognition and other 'higher-level behaviours' building on objective assessment methods. This is particularly pertinent in an era where the interpretability of machine learning systems is increasingly a basic requirement. In this talk I will first present our lab's efforts in the objective assessment of expressive behaviour, followed by three areas where we have applied this to automatic assessment of behaviomedical conditions, to wit, depression analysis, distinguishing ADHD from ASD, and measuring the intensity of pain in infants and in adults with shoulder pain. Finally, I will discuss how we see Virtual Humans being used to aid the process of screening, diagnosing, and monitoring behaviomedical conditions.
Michel Valstar (http://www.cs.nott.ac.uk/~mfv) is an associate professor at the University of Nottingham, and a member of both the Computer Vision and Mixed Reality Labs. He received his master's degree in Electrical Engineering at Delft University of Technology in 2005 and his PhD in computer science at Imperial College London in 2008, and was a Visiting Researcher at MIT's Media Lab in 2011. He works in the fields of computer vision and pattern recognition, where his main interest is the automatic recognition of human behaviour, specialising in the analysis of facial expressions. He is the founder of the facial expression recognition challenges (FERA 2011/2015/2017) and the Audio-Visual Emotion recognition Challenge series (AVEC 2011-2019). He was the coordinator of the EU Horizon 2020 project ARIA-VALUSPA, which built the next generation of virtual humans, deputy director of the £6M Biomedical Research Centre's Mental Health and Technology theme, and recipient of Bill & Melinda Gates Foundation funding to help premature babies survive in the developing world, which won the FG 2017 best paper award. His work has received popular press coverage in, among others, Science Magazine, The Guardian, New Scientist and on BBC Radio. He has published over 90 peer-reviewed papers at venues including PAMI, CVPR, ICCV, SMC-Cybernetics, and Transactions on Affective Computing (h-index 38, >8,400 citations).
Towards robust question-answering systems
Nov 27, 2019 10:00 AM

Andrei Popescu-Belis and Gabriel Luthier
Voice-based virtual assistants on smartphones or smart speakers are quite wide-spread, but they are often specialized in tasks related to their providers' business. Some chatterbots, on the contrary, support extended conversations but offer no access to explicit knowledge. We will present a conversational agent designed at HEIG-VD which combines task-oriented dialogue (to answer questions based on the knowledge enclosed in documents such as Wikipedia pages) with the capacity to chat, i.e. to reply to non-task-oriented utterances and manage the social aspects of a conversation. The combination is achieved using dialogue act recognition, which routes user utterances to one of the components. The system is demonstrated as an 'action' or 'skill' on a smart speaker.
Clinical Natural Language Processing
Clinical research has never been more active and diverse than it is at this moment. Research efforts span national and cultural borders, and broad online dissemination of results makes insights available at a global scale with ever-decreasing latency. In the face of these developments, individual researchers and practitioners are confronted with a seemingly intractable amount of material (approximately 1 million scholarly articles are newly published in the life sciences each year). While highly trained human experts excel at making precision diagnoses, coverage, especially for uncommon conditions, can be greatly improved. In this talk, we will discuss a range of (deep) machine learning techniques that provide automatic clinical decision support on the basis of large-scale data collections. I will present early and ongoing work on a) predictive assistants in post-operative care of cardiac surgery patients, which serve as early warning systems in case of undesirable and dangerous complications; b) automatic summarization of individually long patient records to obtain concise and topically targeted summaries for physicians; and c) data-driven diagnosis of rare diseases that individually occur too infrequently to allow clinical specialists to establish the necessary routine and experience.
Carsten is an assistant professor of medical and computer science at Brown University where he leads the Biomedical AI Lab, specializing in the development of data science and information retrieval techniques with the goal of improving patient safety, individual health and quality of medical care. Before coming to Brown, he studied artificial intelligence and machine learning at the University of Edinburgh, TU Delft and ETH Zurich. Carsten has authored more than 80 conference and journal articles on topics pertaining to automatic large-scale text processing and retrieval as well as information extraction from unstructured natural language resources. Aside from his academic endeavors, he is involved in several deep technology startups in the health sector that strive to translate technological innovation to improved safety and quality of life for patients.
Investigating Multiple facets of communication skill assessment and feedback
Jun 13, 2019 11:00 AM

Dr. Dinesh Babu Jayagopi, Assistant Professor at IIIT Bangalore since Dec 2013
Communication is an important soft skill that candidates need, and that employers look for, to succeed in teams. From the candidate's point of view, automatic feedback to aid behavior change is helpful; for interviewers, interviewing 100 candidates is easier than interviewing 1000. Automatic methods to assess and select candidates for an in-depth interview or a training program are a relevant problem in social computing, involving the analysis of nonverbal and verbal cues using speech, spoken text and visual behavior analysis.
In our research, we have systematically investigated Asynchronous Video Interviews (AVIs), which companies like HireVue are starting to deploy widely, vis-à-vis face-to-face interviews, in terms of interviewer perception, performance of automatic assessment methods, and interviewee setting preferences. Face-to-face interviews serve as the benchmark setting, while AVIs are relatively new and less understood. Unlike video resumes, which do not allow prompting, AVIs are both scalable (i.e., they can be done anywhere, anytime) and allow prompting. We have also compared spoken versus written interviews, the latter in typed and handwritten form. Apart from this, we will discuss work on automatic feedback prediction and on assessment in human-agent interaction settings. This leads to automatic follow-up question generation, an interesting text generation problem. Finally, we will conclude with some preliminary work on behavior generation in the context of Indian Sign Language synthesis, and show some demos of the works discussed.
Dr. Dinesh Babu Jayagopi has been an Assistant Professor at IIIT Bangalore since December 2013, where he heads the Multimodal Perception Lab. He is currently visiting Prof. Marianne Schmid Mast at the University of Lausanne. His research interests are in audio-visual signal processing, applied machine learning, and social computing. He obtained his doctorate from the Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland, at the beginning of 2011, working with Dr. Daniel Gatica-Perez at Idiap, and continued with a postdoc on human-robot interaction, working with Daniel and Jean-Marc. He received the Outstanding Paper Award at the International Conference on Multimodal Interaction (ICMI) 2012 and the Idiap PhD student research award for the year 2009. More recently, his PhD student's work was nominated for Best Student Paper at ICMI 2016 in Tokyo, Japan, and another work received the Best Student Paper Award at MedPRAI 2018. He also received a Department of Science and Technology (DST) Young Scientist Start-up Grant in 2016. In the past, he has successfully collaborated on and executed sponsored research projects with CAIR, DRDO (Defence Research and Development Organisation) and NI Systems.
Recent advances in weakly-supervised learning and reliable learning
In this talk, I will introduce our recent research on weakly-supervised learning and reliable learning.
The motivation for weakly-supervised learning is to accurately perform machine learning only from "weak" data that can be collected more easily/cheaply than fully-labeled data. In the first half of this talk, I give an overview of our recently developed empirical risk minimization framework for weakly-supervised classification, covering binary classification only from PU data, PNU data, Pconf data, UU data, SU data, and Comp data (P:positive, N:negative, U:unlabeled, Conf:confidence, S:similar, and Comp:complementary).
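As one concrete instance of this empirical risk minimization framework, the sketch below states the widely used non-negative risk estimator for PU classification (binary classification from positive and unlabeled data). This is a standard formulation from the weakly-supervised learning literature, given here only as illustrative background.

```latex
% Non-negative PU risk estimator (sketch).
% \pi_p: positive class prior; \hat{R}_p^{+}, \hat{R}_p^{-}: empirical risks of
% positive samples evaluated against labels +1 and -1; \hat{R}_u^{-}: empirical
% risk of unlabeled samples evaluated against label -1.
\widehat{R}_{\mathrm{PU}}(g)
  = \pi_p\,\widehat{R}_p^{+}(g)
  + \max\!\Big\{0,\;\widehat{R}_u^{-}(g) - \pi_p\,\widehat{R}_p^{-}(g)\Big\}
```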
For reliable deployment of machine learning systems in the real world, various types of robustness are needed. In the latter half of this talk, I will give an overview of our recent work on robust learning with respect to noisy training data, changing environments, and adversarial test inputs.
Finally, I will briefly introduce our RIKEN Center for Advanced Intelligence Project (AIP), which is a national AI project in Japan started in 2016. AIP covers a wide range of topics from generic AI research (machine learning, optimization, applied math., etc.), goal-oriented AI research (material, disaster, cancer, etc.), and AI-in-society research (ethics, data circulation, laws, etc.).
Masashi Sugiyama received the PhD degree in Computer Science from Tokyo Institute of Technology, Japan in 2001. He has been Professor at the University of Tokyo since 2014 and concurrently appointed as Director of RIKEN Center for Advanced Intelligence Project in 2016. His research interests include theory, algorithms, and applications of machine learning. He (co)-authored several books such as Density Ratio Estimation in Machine Learning (Cambridge University Press, 2012), Machine Learning in Non-Stationary Environments (MIT Press, 2012), Statistical Reinforcement Learning (Chapman and Hall, 2015), and Introduction to Statistical Machine Learning (Morgan Kaufmann, 2015). He served as a Program co-chair and General co-chair of the Neural Information Processing Systems conference in 2015 and 2016, and as a Program co-chair for the AISTATS conference in 2019. Masashi Sugiyama received the Japan Society for the Promotion of Science Award and the Japan Academy Medal in 2017.
Deep learning for cosmology: parameter measurement and generation of simulations
Apr 17, 2019 11:00 AM

Dr Tomasz Kacprzak
Deep learning-based analysis methods are gaining interest in cosmology due to their unique ability to create very rich and complex models.
These models are particularly well suited for analysis of large scale structure data, as the matter density fields are comprised of highly nonlinear, complicated features, such as halos, filaments, sheets and voids.
But can this information be utilised by the deep learning algorithm to gain a better understanding of the cosmological model?
In this talk I will present the application of Convolutional Neural Networks (CNNs) for constraining cosmological parameters.
I will compare the constraining power against the commonly used statistic, the power spectrum, and explore different regimes in quality of data and simulations.
Finally, I will introduce the Generative Adversarial Networks (GANs): a CNN-based technique, which can learn from a training set and then generate new, statistically similar data.
I will present a study applying GANs to generating samples of the cosmic web and discuss the prospects of applying them to render 2D and 3D N-body-simulation-like data.
Tomasz Kacprzak is a postdoctoral researcher at ETH Zurich. His main area of research is observational cosmology with gravitational lensing. He is involved in the cosmic shear analysis of the Dark Energy Survey, the largest cosmological imaging programme to date. Tomasz has worked on shear measurement theory, especially noise bias calibration strategies with simulations, with application to the DES Science Verification dataset. He is currently leading the cosmology analysis with the Monte Carlo Control Loops approach, relying heavily on forward modelling and Approximate Bayesian Computation. He is also interested in optimal information retrieval strategies from cosmological mass maps, and applied the peak statistics analysis to DES SV data. Recently, together with Aurelien Lucchi, he won a grant at the Swiss Data Science Centre (SDSC) titled "Deep Learning for Observational Cosmology". Within this programme, he is pioneering deep learning approaches to measure cosmological parameters, as well as developing emulators of cosmological simulations via deep-learning-based generative models. He obtained a PhD in Cosmology from University College London (UCL), and previously an MSc degree in Machine Learning at UCL.
Gaussian process optimization with simulation failures
Feb 27, 2019 11:00 AM

Dr. François Bachoc
We address the optimization of a computer model, where each simulation either fails or returns a valid output performance. We suggest a joint Gaussian process model for classification of the inputs (computation failure or success) and for regression of the performance function. We discuss the maximum likelihood estimation of the covariance parameters, with a stochastic approximation of the gradient. We then extend the celebrated expected improvement criterion to our setting of joint classification and regression, thus obtaining a global optimization algorithm. We prove the convergence of this algorithm. We also study its practical performances, on simulated data, and on a real computer model in the context of automotive fan design.
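To make the flavour of such an acquisition function concrete, the sketch below shows a common simplification of the joint setting: weighting the standard expected improvement from the regression GP by the success probability predicted by the classification GP. It is an illustrative sketch under that simplifying assumption, not the exact criterion studied in the talk.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best_f):
    """Standard expected improvement for minimization, given the GP posterior
    mean `mu` and standard deviation `sigma` at a candidate point."""
    sigma = np.maximum(sigma, 1e-12)
    z = (best_f - mu) / sigma
    return (best_f - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def feasibility_weighted_ei(mu, sigma, best_f, p_success):
    """Illustrative acquisition: EI from the regression GP, weighted by the
    probability of a successful simulation from the classification GP."""
    return expected_improvement(mu, sigma, best_f) * p_success
```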
AI and Radiology
Topics covered:
- a simple definition of AI and Radiology
- radiology needs and problems
- why we've been talking so much about AI and radiology for the past few years
- Can radiologists be replaced by algorithms?
- the radiologist's value chain and clinical workflow
- the AI and Radiology industrial ecosystem
- AI solutions approved by medical regulators in 2019
- clinical use cases to explore
- possible future scenarios.
Medical Doctor, Swiss Board Certified Radiologist & Neuroradiologist and HealthTech expert, fully committed to bridging Medicine, Science, Technology and Business and bringing innovation to market. Clinical AI expert.
Teaching Robots Social Autonomy from In Situ Human Supervision
Feb 14, 2019 11:00 AM

Dr Emmanuel Senft, Plymouth University
Traditionally the behaviour of social robots has been programmed by engineers, but robots should be able to learn from their users to increase their range of application and improve their behaviour over time. This talk will start by presenting Supervised Progressively Autonomous Robot Competencies (SPARC), a machine learning framework enabling non-technical users to control and teach a robot to interact meaningfully with people in an efficient and safe way. The core premise is that the user initially remotely operates the robot, while an algorithm associates actions to states and gradually learns. Over time, the robot is able to take over from the user, while still giving the user oversight of its behaviour by ensuring that every action the robot executes has been actively or passively approved by the user. The latter half of this talk will present results of a recent study evaluating SPARC in a real human-robot interaction where a robot was taught to support children in an educational activity.
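A minimal sketch of the suggest-then-approve loop described above is given below; the environment, learner and supervisor interfaces are hypothetical placeholders chosen here for illustration, not the actual SPARC implementation.

```python
# Sketch of a SPARC-style supervision loop: the robot proposes an action for
# the current state, the human supervisor approves (possibly passively) or
# corrects it, and the learner is updated from the executed pair.

def supervised_autonomy_loop(env, learner, supervisor, n_steps=100):
    state = env.reset()
    for _ in range(n_steps):
        proposed = learner.suggest(state)             # robot's suggested action
        action = supervisor.review(state, proposed)   # approved or corrected action
        learner.update(state, action)                 # learn from what was executed
        state = env.step(action)
```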
Emmanuel Senft is a research fellow at Plymouth University (UK), where he obtained his PhD in Human-Robot Interaction in 2018. Prior to joining Plymouth, he completed his master's in robotics at EPFL in 2013, where he worked on locomotion for modular robots. His current research explores Human-Robot Interaction, with a focus on how humans can teach robots to interact socially. During his PhD, he worked with the DREAM project on developing new Robot-Assisted Therapies for children with Autism Spectrum Disorders and explored ways to teach robots to interact socially from in situ human supervision.
Optimal Transport for Machine Learning
Optimal transport (OT) has become a fundamental mathematical tool at the interface between the calculus of variations, partial differential equations and probability. It took, however, much more time for this notion to become mainstream in numerical applications. This situation is in large part due to the high computational cost of the underlying optimization problems. There is a recent wave of activity on the use of OT-related methods in fields as diverse as image processing, computer vision, computer graphics, statistical inference, and machine learning.
In this talk, I will review an emerging class of numerical approaches for the approximate resolution of OT-based optimization problems. This offers a new perspective for the application of OT in high dimension, to solve supervised (learning with transportation loss function) and unsupervised (generative network training) machine learning problems.
More information and references can be found on the website of our book "Computational Optimal Transport" https://optimaltransport.github.io/
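As a pointer to what these numerical approaches look like in practice, below is a minimal sketch of the Sinkhorn iteration for entropic-regularized discrete OT, one of the workhorse algorithms in this area. The variable names and the fixed iteration count are choices made here for illustration.

```python
import numpy as np

def sinkhorn(a, b, C, eps=1e-1, n_iters=1000):
    """Entropic-regularized OT between histograms a (n,) and b (m,) with cost
    matrix C (n, m). Returns an approximate transport plan P whose marginals
    match a and b."""
    K = np.exp(-C / eps)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)         # alternating scaling updates
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]
```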
Presentation of Code / RESIDENCY ART RESEARCH / a project connecting art and science
Dec 06, 2018 04:00 PM
Valérie Félix
The project Code is dedicated to research on digital issues within our societal structures. Through a program of artistic residencies, conferences, exhibitions and publications, its axes of reflection aim to deploy new discursive questioning by connecting art and science. The integration of digital technology into our daily lives redefines data related to societal aspects, and it is exactly on this axis of reflection that Code is situated. Too often seen as a threat to our own reality, or on the contrary as embodying the utopian promise of a better world, the ubiquitous digital environment is approached as an element that creates a gap between "true and wrong", whereas, on the contrary, bonds must be created. By allowing art to be integrated as a scientific element of discussion related to our everyday and societal environment, the project Code positions theoretical and empirical reflections side by side.
Valérie Félix is both an art historian specialized in cultural-digital studies and a visual artist. A graduate of the University of Montreal, Canada, she received the SSHRC (Social Sciences and Humanities Research Council) scholarship for her master's thesis research exploring the importance of traces in interactive digital art. Alongside several workshops and conferences on digital art, she has also collaborated with Canadian and Swiss institutions. She teaches in art schools, while being active in the independent art community through involvement in artistic residency structures. Her interests centre on the societal conditioning of images. She lives and works in Switzerland.
Airworthy AI: challenges of certification
Full autonomy in aircraft requires systems that can solve problems that are currently the exclusive domain of human cognitive function. Deep Neural Networks are the only feasible option we have today, but we need new methods to satisfy the burden of proof of airworthiness for the safety authorities. We will lay out the problem and propose directions in which to find the answer.
Dr. Luuk van Dijk is founder and CEO of Daedalean, a startup dedicated to make flying cars have their own pilot's license, so you won't have to.
*** The talk of Florian Metze is canceled. *** Open-domain audiovisual speech recognition and video summarization
Sep 27, 2018 04:00 PM

Florian Metze, Associate Professor
Video understanding is one of the hardest challenges in AI. If a machine can look at videos and "understand" the events that are being shown, then machines could learn by themselves, perhaps even without supervision, simply by "watching" broadcast TV, Facebook, Youtube, or similar sites. Making progress towards this goal requires contributions from experts in diverse fields, including computer vision, automatic speech recognition, machine translation, natural language processing, multimodal information processing, and multimedia. I will report the outcomes of the JSALT 2018 Workshop on this topic, including advances in multitask learning for joint audiovisual captioning, summarization, and translation, as well as auxiliary tasks such as text-only translation, language modeling, story segmentation, and classification. I will demonstrate a few results on the "How-to" dataset of instructional videos harvested from the web by my team at Carnegie Mellon University and discuss remaining challenges and possible other datasets for this research.
Florian Metze is an Associate Research Professor at Carnegie Mellon University, in the School of Computer Science's Language Technologies Institute. His work covers many areas of speech recognition and multi-media analysis with a focus on end-to-end deep learning. Currently, he focuses on multi-modal processing of speech in how-to videos, and information extraction from medical interviews. He has also worked on low-resource and multi-lingual speech processing, speech recognition with articulatory features, large-scale multi-media retrieval and summarization, along with recognition of personality or similar meta-data from speech. He is the founder of the "Speech Recognition Virtual Kitchen" project, which strives to make state-of-the-art speech processing techniques usable by non-experts in the field, and started the "Query by Example Search on Speech" task at MediaEval. He was Co-PI and PI of the CMU team in the IARPA Aladdin and Babel projects. Most recently, his group released the "Eesen" toolkit for end-to-end speech recognition using recurrent neural networks and connectionist temporal classification. He was the local organizer for the JSALT 2017 workshop at CMU, and is a co-leader of the "Grounded Sequence to Sequence Transduction" team at the JSALT 2018 workshop at JHU. He received his PhD from the Universität Karlsruhe (TH) for a thesis on "Articulatory Features for Conversational Speech Recognition" in 2005. He worked at Deutsche Telekom Laboratories (T-Labs) from 2006 to 2009, and led research and development projects involving language technologies in the customer care and mobile services area. In 2009, he joined Carnegie Mellon University, where he is also the associate director of the InterACT center. His work is funded by a series of grants from IARPA, DARPA, and NSF as well as industry. He serves on the senior program committee of multiple conferences and on journal editorial boards, and was an elected member of the IEEE Speech and Language Technical Committee from 2011 to 2017. For more information, please see http://www.cs.cmu.edu/directory/fmetze
A unified view of entropy-regularized Markov decision processes
Aug 24, 2018 11:00 AM

Gergely Neu
Entropy regularization, while a standard technique in the online learning toolbox, has only been recently discovered by the reinforcement learning community: In recent years, numerous new reinforcement learning algorithms have been derived using this principle, largely independently of each other. So far, a general framework for these algorithms has remained elusive. In this work, we propose such a general framework for entropy-regularized average-reward reinforcement learning in Markov decision processes (MDPs). Our approach is based on extending the linear-programming formulation of policy optimization in MDPs to accommodate convex regularization functions. Our key result is showing that using the conditional entropy of the joint state-action distributions as regularization yields a dual optimization problem closely resembling the Bellman optimality equations. This result enables us to formalize a number of state-of-the-art entropy-regularized reinforcement learning algorithms as approximate variants of Mirror Descent or Dual Averaging, and thus to argue about the convergence properties of these methods. In particular, we show that the exact version of the TRPO algorithm of Schulman et al. (2015) actually converges to the optimal policy, while the entropy-regularized policy gradient methods of Mnih et al. (2016) may fail to converge to a fixed point.
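As a rough, hedged sketch of the formulation described above (the notation below is assumed and is not taken from the paper itself), the regularized linear program over state-action occupancy measures μ can be written as follows, where the regularizer is the negative conditional entropy of actions given states and the dual of this program is what resembles the Bellman optimality equations:

```latex
% Hedged sketch: entropy-regularized occupancy-measure LP (notation assumed, not from the talk)
\max_{\mu \ge 0} \;\; \sum_{s,a} \mu(s,a)\, r(s,a)
\;-\; \frac{1}{\eta} \sum_{s,a} \mu(s,a) \log \frac{\mu(s,a)}{\sum_{a'} \mu(s,a')}
\quad \text{s.t.} \quad
\sum_{a} \mu(s',a) = \sum_{s,a} P(s' \mid s,a)\, \mu(s,a) \;\; \forall s',
\qquad
\sum_{s,a} \mu(s,a) = 1 .
```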
Gergely Neu is a research assistant professor at the Pompeu Fabra University, Barcelona, Spain. He has previously worked with the SequeL team of INRIA Lille, France and the RLAI group at the University of Alberta, Edmonton, Canada. He obtained his PhD degree in 2013 from the Technical University of Budapest, where his advisors were Andras Gyorgy, Csaba Szepesvari and Laszlo Gyorfi. His main research interests are in machine learning theory, including reinforcement learning and online learning with limited feedback and/or very large action sets.
Machine Learning for Pattern Recognition at ZHAW: Deep Learning Nuggets for e.g. Speaker Diarization, End-to-End Neural Clustering & OMR
Jul 02, 2018 11:00 AM

Prof. Thilo Stadelmann
The first part of this talk will introduce data science research at ZHAW - from the broader picture of the ZHAW Datalab, an interdisciplinary and inter-departmental network of collaborations within our university comprising more than 70 researchers, to the specific work of my core group of PhD and master students. I will showcase specific results of industrial-academic collaborations to illustrate our setup for applied research. Examples are: automated newspaper segmentation for real-time print media monitoring; enabling database queries in natural language; or improving hiring processes with information retrieval.
The second part will focus on recent results of my core group in pattern recognition with deep learning. I will demonstrate recent results in speaker clustering using learned embeddings; learning novel clustering methods purely from scratch, thus extending current metric learning approaches to neural architectures that learn to output a clustering of previously unseen data / unseen classes end-to-end; and our novel dataset and object detection method to detect many tiny objects on large images of musical symbols.
Thilo Stadelmann is senior lecturer of computer science at ZHAW School of Engineering in Winterthur. He received his doctor of science degree from Marburg University in 2010, where he worked on multimedia analysis and voice recognition. Thilo joined the automotive industry for 3 years prior to switching back to academia. His current research focuses on applications of machine learning, especially deep learning, to diverse kinds of pattern recognition tasks. He is head of the ZHAW Datalab, vice president of the Swiss Group for Artificial Intelligence and Cognitive Sciences, and co-founder of the Swiss Alliance for Data-Intensive Services.
Concentration Bounds for Stochastic Approximation with Applications to Reinforcement Learning
Jun 27, 2018 11:00 AM

Dr. Gugan Thoppe
Stochastic Approximation (SA) is useful when the aim is to find optimal points, or zeros of a function, given only its noisy estimates. In this talk, we will review our recent advances in techniques for SA analysis. This talk has four major parts. In the first part, we will see a motivating application of SA to network tomography. Here, we shall also discuss the convergence of a novel stochastic Kaczmarz method. In the second part, we shall discuss a novel tool based on Alekseev's formula to obtain the rate of convergence of a nonlinear SA to a specific solution, given that there are multiple locally stable solutions. In the third part, we shall extend the previous tool to the two-timescale, linear SA setting. We shall also discuss how this tool applies to gradient Temporal Difference (TD) methods such as GTD(0), GTD2, and TDC used in reinforcement learning. For the analyses in the second and third parts to hold, the initial step size must be chosen sufficiently small, depending on unknown problem-dependent parameters; or, alternatively, one must use projections. In the fourth part, we shall discuss a trick to obviate this in the context of the one-timescale, linear TD(0) method. We strongly believe that this trick is generalizable. We also provide here a novel expectation bound. We shall end with some future directions.
This is joint work with several people.
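For readers less familiar with the setting, a generic SA recursion and its linear TD(0) instance can be sketched as follows (standard textbook forms, not specific to the new results above):

```latex
% Generic stochastic approximation: step sizes alpha_n, mean field h, martingale noise M_{n+1}
x_{n+1} = x_n + \alpha_n \bigl( h(x_n) + M_{n+1} \bigr)

% Linear TD(0): value estimate V(s) = \phi(s)^\top \theta, reward r_n, discount factor gamma
\theta_{n+1} = \theta_n + \alpha_n \bigl( r_n + \gamma\, \phi(s_{n+1})^\top \theta_n - \phi(s_n)^\top \theta_n \bigr)\, \phi(s_n)
```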
Gugan Thoppe is a Post-Doctoral Associate at Duke University, USA with Prof. Sayan Mukherjee. Earlier, he worked with Prof. Robert Adler as an ERC Senior Researcher (postdoc) at Technion, Israel. He did his PhD in Systems Science with Prof. Vivek Borkar at TIFR, India. His work won the TAA-Sasken best thesis award for 2017. He is also a two-time recipient of the IBM PhD fellowship award (2013–14 and 2014-15). His research interests include random topology and stochastic approximation.
Understanding the impact of climate variability on water and food security for US
Apr 24, 2018 04:00 PM

Dr. Laureline Josset
To ensure food and water security, it is crucial to understand the impact of climate on our capabilities to meet water demand for all sectors, and in particular for its most consumptive one, agriculture. A great challenge is the scale at which this needs to be tackled. Indeed, while all water demands need to be met locally, large river and groundwater systems, food distribution networks and political decisions span the scale of states or nations.
In this paper, we propose to quantify the risks associated with climate variability under our current water demands. This assessment is performed by exploring the response of an integrated water model developed for the continental United States at a county scale. The model comprises a surface water network with river nodes and reservoirs, a simple representation of groundwater, water demands at the county scale and a statistical crop model. The integrated model then allocates water across the sectors by solving an optimization problem, where the choice of water sources is driven both by relative cost and water availability as it varies over months and years.
To understand the impact of climate variability, reconstructions of past precipitation, temperature and run-off for the last 60 years are used as input for the integrated model. We analyse the spatial and temporal variation in water stress in response to the climate forcing to highlight regions at risk. This analysis is extended by considering 500 years of streamflow reconstructed from paleoclimate data. We then quantify the risks of crop failures, groundwater depletion, and losses of farmer income. The proposed approach results in a quantification of the risks associated with our present water consumption and might serve as a tool for the identification of integrated solutions for water across large spatial scales.
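To make the allocation step above concrete, a toy monthly formulation could look as follows (illustrative only; the indices, costs and constraints are assumptions, not the authors' model):

```latex
% Toy allocation for one month t: x_{ij} = water taken from source i for sector j
\min_{x \ge 0} \; \sum_{i} \sum_{j} c_i \, x_{ij}
\quad \text{s.t.} \quad
\sum_{i} x_{ij} \ge d_j \;\; \forall j \;(\text{sector demands}),
\qquad
\sum_{j} x_{ij} \le a_i(t) \;\; \forall i \;(\text{source availability}).
```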
xtensor: High performance array computing in C++
Jan 18, 2018 02:00 PM

Dr. Wolf Vollprecht
xtensor is a C++ high-performance n-dimensional array computing library which sets out to be as fast as Eigen, while borrowing many API ideas from NumPy. The library was started in 2016 and uses many features of modern C++ and a good deal of template metaprogramming to be as efficient as possible. Besides n-dimensional arrays, xtensor also features the same broadcasting rules as NumPy. We hope that xtensor will be of great use in the Robotics community, bridging the gap between Python prototyping and implementing algorithms in C++.
Accompanying the core package, we already have an ecosystem of several useful libraries:
- xtensor-python (bindings from Python to C++, supporting NumPy arrays)
- xtensor-ros (ROS bindings, making it easy to send & receive xtensor-arrays in C++)
- xtensor-blas (BLAS/LAPACK bindings mimicking the numpy.linalg interface)
- xtensor-fftw (Fast Fourier Transforms using NumPy's API, and the well-known fftw-library)
- xtensor-julia / xtensor-r (bindings for Julia and R)
- xtensor-io (easy reading and writing of audio, image, and NumPy npy/npz files)
Link to interactive presentation on Binder:
mybinder.org/v2/gh/wolfv/presentations/master
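As a minimal usage sketch (not taken from the presentation; header paths may differ between xtensor versions), the NumPy-style broadcasting and lazy expressions mentioned above look roughly like this:

```cpp
#include <iostream>
#include <xtensor/xarray.hpp>
#include <xtensor/xmath.hpp>
#include <xtensor/xio.hpp>

int main()
{
    // A 2x3 array and a length-3 vector
    xt::xarray<double> a = {{1.0, 2.0, 3.0},
                            {4.0, 5.0, 6.0}};
    xt::xarray<double> b = {10.0, 20.0, 30.0};

    // NumPy-style broadcasting: b is implicitly expanded along the first axis
    xt::xarray<double> c = a + b;

    // Expressions such as xt::sin(a) * 2.0 are lazy and evaluated on assignment
    xt::xarray<double> d = xt::sin(a) * 2.0;

    std::cout << c << std::endl << d << std::endl;
    return 0;
}
```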
Wolf Vollprecht holds a Master's degree in Robotics, Systems and Control from ETH Zurich. He finished his master's thesis in 2017 at the Autonomous Systems Lab at Stanford and started a position as a scientific software developer at QuantStack in Paris, focusing on building open-source software for scientific computing and robotics.
Can Power-sharing Foster Peace? Evidence From Northern Ireland
In the absence of power-sharing, minority groups in opposition have powerful incentives to substitute the ballot with the bullet. In contrast, when power is shared among all major groups in society, the relative gains of sticking to electoral politics are larger for minority groups. After making the theoretical argument, we provide in the current paper an empirical analysis of the impact of power-sharing at the local level, making use of fine-grained data from Northern Ireland's 26 local district councils over the 1973-2001 period. We find that power-sharing has a sizable and robust conflict-reducing impact.
Holding a PhD in Economics from the University of Cambridge, Dominic Rohner is a Professor of Economics at the University of Lausanne. He is, among others, an Associate Editor of the Journal of the European Economic Association, and a Research Fellow of CEPR, CESifo, OxCarre and HiCN. His research focuses on political and development economics and has won several prizes, such as the KfW Development Bank Excellence Award and the SNIS International Geneva Award. He currently holds a Starting Grant of the European Research Council (ERC) investigating “Policies for Peace”. He has published and forthcoming papers in several leading international journals, including, among others: American Economic Review, Econometrica, Journal of Political Economy, Quarterly Journal of Economics, and Review of Economic Studies.
Algorithms on manifolds: geometric means and recommender systems
Sep 06, 2017 11:00 AM

Prof Bart Vandereycken
Much of the data in scientific computing and machine learning is highly structured. When this structure is given as a mathematically smooth manifold, it is usually advisable to explicitly exploit this property in theoretical analyses and numerical algorithms. I will illustrate this using two examples. In the first, the manifold is classical: the set of symmetric and positive definite matrices. The problem we consider is the computation of the geometric mean, also called the Karcher mean, which is a generalization of the arithmetic mean where we explicitly take into account that the data lives on a manifold. The application is denoising or interpolation of covariance matrices. The other example considers a non-standard manifold: the set of matrices of fixed rank. The application is now recommender systems (the Netflix problem) and the algorithm is low-rank matrix completion. I will show that one of the benefits of the manifold approach is that the generalisation to low-rank tensor completion is conceptually straightforward but also computationally efficient.
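For context, the geometric (Karcher) mean mentioned above is usually defined as the minimizer of the sum of squared geodesic distances on the manifold of symmetric positive definite matrices; this is the standard textbook definition, not a formula taken from the talk:

```latex
% Karcher mean of SPD matrices A_1, ..., A_k under the affine-invariant metric
X^{*} = \arg\min_{X \succ 0} \sum_{i=1}^{k} \delta^{2}(X, A_i),
\qquad
\delta(A, B) = \bigl\| \log\!\bigl( A^{-1/2} B\, A^{-1/2} \bigr) \bigr\|_{F} .
```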
Bart Vandereycken is an assistant professor in the numerical analysis group at the mathematics department of the University of Geneva. Prior to joining the University of Geneva, he was an instructor of mathematics at Princeton University and a postdoc at EPF Lausanne and ETH Zurich. He obtained his PhD at KU Leuven in December 2010. He was awarded the Alston S. Householder award for the best PhD thesis in numerical linear algebra. For his research on Riemannian optimization for low-rank matrix equations, he received a Leslie Fox Prize in 2011 and a SIAM Outstanding Paper prize in 2012. His research is on large-scale and high-dimensional problems that are solved numerically using low-rank matrix and tensor techniques. Examples of such problems are the electronic Schrödinger equation, parametric partial differential equations, and low-rank matrix completion. In his work, he tends to focus on practical algorithms that can be formulated on Riemannian matrix manifolds and use techniques from numerical linear algebra. His other research interests include pseudospectra, matrix means, model-order reduction, and multilevel preconditioning.
Computational methods for fluorescence microscopy and quantitative bioimaging
Aug 30, 2017 02:00 PM

Dr. Charles Kervrann, Senior Researcher
During the past two decades, biological imaging has undergone a revolution in the development of new microscopy techniques that allow visualization of tissues, cells, proteins and macromolecular structures at all levels of resolution. Thanks to recent advances in optics, digital sensors and labeling probes, one can now visualize sub-cellular components and organelles at the scale of a few dozen to several hundred nanometers. As a result, fluorescence microscopy and multimodal imaging have become the workhorse of modern biology. All these technological advances in microscopy have created new challenges for researchers in quantitative image processing and analysis. Therefore, dedicated efforts are necessary to develop and integrate cutting-edge approaches in image processing and optical technologies to push the limits of the instrumentation and to analyze the large amount of data being produced.
In this talk, we present image processing methods, mathematical models, and algorithms to build an integrated imaging approach that bridges the resolution gaps between the molecule and the whole cell, in space and time. The presented methods are dedicated to the analysis of proteins in motion inside the cell, with a special focus on Rab protein trafficking observed in time-lapse confocal microscopy or total internal reflection fluorescence microscopy. Nevertheless, the proposed image processing methods and algorithms are flexible in most cases, with a minimal number of control parameters to be tuned. They can be applied to a large range of problems in cell imaging and can be integrated in generic image-based workflows, including for high content screening applications.
Charles Kervrann received the M.Sc. (1992), the PhD (1995) and the HDR (2010) in Signal Processing and Telecommunications from the University of Rennes 1, France. From 1997 to 2010, he was a researcher at the INRA Applied Mathematics and Informatics Department (1997-2003) and he joined the VISTA Inria research group in 2003 (Rennes, France). In 2010, he was appointed to the rank of Research Director at the Inria Research Centre in Rennes. He is currently the head of the Serpico (Space-timE RePresentation, Imaging and cellular dynamics of molecular COmplexes) research group. His work focuses on image sequence analysis, motion estimation, object detection, noise modeling for microscopy, and protein trafficking and dynamics modeling in cell biology. He is a member of the editorial board of IEEE Signal Processing Letters, a member of the IEEE BISP (Bio Imaging and Signal Processing) technical committee and co-head of the IPDM-BioImage Informatics node of the French national infrastructure France-BioImaging.
Multilingual speech recognition in under-resourced environments
When speech processing systems are designed for use in multilingual environments, additional complexity is introduced. Identifying when language switching has occurred, predicting how cross-lingual terms will be pronounced, obtaining sufficient speech data from diverse language backgrounds: such factors all complicate the development of practical speech-oriented systems. In this talk, I will discuss our research group's experience in building speech recognition systems for the South African environment, one in which 11 official languages are recognised. I will also show how this relates to our participation in the BABEL project, a recent 5-year international collaborative project aimed at solving the spoken term detection task in under-resourced languages.
Marelie Davel is a research professor at North-West University, South Africa, and the director of the Multilingual Speech Technologies (MuST) research group. She has a specific interest in multilingual speech technology development in under-resourced environments and the data-driven modelling of human speech and language. She received her BSc degree (Computer Science & Mathematics) from Stellenbosch University, her MSc from University of London, and her PhD (Electronic Engineering, 2005) from the University of Pretoria. She joined the South African CSIR in 1995 as an electronic engineer, later becoming a principal researcher and the research group leader of the Human Language Technologies (HLT) research group at the same institution. In 2002 she spent a year as a visiting scholar at Carnegie Mellon University’s Robust Speech group. She joined MuST in 2011 and became the group’s director in 2014. Recent MuST projects include the development of multilingual resources for Google, pronunciation modelling for the BABEL project, and the development of an automatic speech transcription platform for the South African government. She has published approx. 90 papers related to speech and language processing.
Charisma: Measurement and outcomes
Charisma has been devilishly difficult to measure; there has also been a dearth of studies estimating the causal impact of charisma on outcomes. In this seminar I will use a new definition of charisma to demonstrate how it can be manipulated, and will also show the economic impact of charisma on worker productivity. Moreover, I will discuss how charisma can be coded from archival data, and demonstrate its utility for predicting a range of outcomes including winning the U.S. presidential election, the amount of views on TED talks, and retweets of tweets.
John Antonakis is of Swiss, Greek, and South-African nationality. He is Professor of Organizational Behavior, and Director of the Ph.D. Program in Management in the Faculty of Business and Economics of the University of Lausanne, Switzerland. He received his Ph.D. from Walden University in Applied Management and Decision Sciences, specializing in the psychometrics of leadership. He was a postdoctoral fellow in the Department of Psychology at Yale University focusing on leader development and expertise. Professor Antonakis’ research is currently focused on charisma, predictors of leadership, and research methods. Professor Antonakis is Editor in Chief of The Leadership Quarterly. He has previously served as associate editor for The Leadership Quarterly and Organizational Research Methods, and is on the boards of several top academic journals including the Academy of Management Review and the Journal of Management. He is a fellow of the Society of Industrial and Organizational Psychology as well as the Association for Psychological Science. He has published in prestigious academic journals such as Science, Psychological Science, Academy of Management Journal, Intelligence, The Leadership Quarterly, Journal of Operations Management, Journal of Management, Harvard Business Review, Academy of Management Learning and Education, Organizational Research Methods, among others. He has also published two books: The Nature of Leadership (two editions), and Being There Even When You Are Not: Leading Through Strategy, Structures, and Systems. He has been awarded or directed research funds totaling over Sfr. 2.3 million (about $2.45 million). He frequently consults—and provides talks, trainings, and workshops—to organizations on leadership and human resources issues. His clients regularly include organizations in various business sectors including banks, manufacturing, high-tech, consulting, and finance as well as government organizations, NGOs, and athletics organizations. His research is regularly quoted in the international media and has been showcased on political and science-based TV shows. He engages a general audience in many science-based videos; for an example, refer to his TEDx talk on charisma: https://youtu.be/SEDvD1IICfE
Domain Adaptation for Visual Recognition: From Shallow to Deep
Apr 24, 2017 11:00 AM
Mathieu Salzmann
In this talk, I will present our work on Domain Adaptation, which tackles scenarios where the training (source) and test (target) data have been acquired in different conditions. To address this, we have introduced learning algorithms that attempt to make the distributions of the source and target data as similar as possible. In particular, I will present a (shallow) transformation learning method, and discuss different measures that can be used to compare the source and target distributions. I will then turn to a Deep Learning approach, in which I will show that allowing the weights of the network to differ between the source and target samples yields better accuracy. I will show results on standard image recognition benchmarks, as well as on the task of leveraging synthetic data to train a classifier for real images.
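One widely used measure for comparing the source and target distributions in this line of work is the empirical Maximum Mean Discrepancy, given here as a representative example rather than necessarily the exact criterion used in the talk:

```latex
% Biased empirical MMD^2 between n source samples x^s and m target samples x^t, with kernel k
\widehat{\mathrm{MMD}}^{2} =
\frac{1}{n^{2}} \sum_{i,j} k\bigl(x^{s}_{i}, x^{s}_{j}\bigr)
+ \frac{1}{m^{2}} \sum_{i,j} k\bigl(x^{t}_{i}, x^{t}_{j}\bigr)
- \frac{2}{n m} \sum_{i,j} k\bigl(x^{s}_{i}, x^{t}_{j}\bigr) .
```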
Deep Learning for Speech Processing: An NST Perspective
Sep 27, 2016 11:00 AM

Prof. Mark Gales
The Natural Speech Technology EPSRC Programme Grant was a 5-year collaboration between Edinburgh, Cambridge and Sheffield Universities, with the aim of improving core speech recognition and synthesis technology. During the lifetime of the project, dramatic changes took place in the underlying technology for speech processing with the introduction of deep learning. This has yielded significant performance improvements, as well as offering a very rich space of models to investigate. This talk discusses the general area of deep learning for speech processing, with a particular emphasis on sequence-to-sequence models: in speech recognition, waveform to text; and in synthesis, text to waveform. Both generative and discriminative sequence-to-sequence models are described, along with variants on the standard topologies and the implications for both training and inference. Rather than focusing on results for particular models, the talk aims to describe the connections and differences between sequence-to-sequence models and the underlying assumptions for these models.
Mark Gales studied for the B.A. in Electrical and Information Sciences at the University of Cambridge from 1985-88. Following graduation he worked as a consultant at Roke Manor Research Ltd. In 1991 he took up a position as a Research Associate in the Speech Vision and Robotics group in the Engineering Department at Cambridge University. In 1995 he completed his doctoral thesis: Model-Based Techniques for Robust Speech Recognition supervised by Professor Steve Young. From 1995-1997 he was a Research Fellow at Emmanuel College Cambridge. He was then a Research Staff Member in the Speech group at the IBM T.J.Watson Research Center until 1999 when he returned to Cambridge University Engineering Department as a University Lecturer. He was appointed Reader in Information Engineering in 2004. He is currently a Professor of Information Engineering and a College Lecturer and Official Fellow of Emmanuel College. Mark Gales is a Fellow of the IEEE, a Senior Area Editor of IEEE/ACM Transactions on Audio Speech and Language Processing for speech recognition and synthesis, and a member of the Speech and Language Processing Technical Committee (2015-2017, previously a member from 2001-2004). He was an associate editor for IEEE Signal Processing Letters from 2008-2011 and IEEE Transactions on Audio Speech and Language Processing from 2009-2013. He is currently on the Editorial Board of Computer Speech and Language. Mark Gales has been awarded a number of paper awards, including a 1997 IEEE Young Author Paper Award for his paper on Parallel Model Combination and a 2002 IEEE Paper Award for his paper on Semi-Tied Covariance Matrices.
TUTORIAL - Tutorial on Regression
Jul 15, 2016 03:00 PM

Dr. Freek Stulp
Tutorial on Regression based on the article:
Freek Stulp and Olivier Sigaud (2015). Many Regression Algorithms, One Unified Model - A Review. Neural Networks, 69:60-79.
Link:
freekstulp.net/publications/pdfs/stulp15many.pdf
Bio: http://freekstulp.net/#Bio
Dr. Freek Stulp's research focuses on using machine learning and artificial intelligence to improve the robustness and adaptivity of planning and control for autonomous robots. One of his main research themes is enabling robots to autonomously acquire and refine skills through imitation and reinforcement learning. He received his doctorate degree in Computer Science from the Technische Universität München in 2007. He was awarded post-doctoral research fellowships from the Japanese Society for the Promotion of Science and the German Research Foundation (DFG), to pursue his research at the Advanced Telecommunications Research Institute International (Kyoto) and the University of Southern California (Los Angeles). From 2011 to 2015 he was an assistant professor at the École Nationale Supérieure de Techniques Avancées (ENSTA-ParisTech). Since March 2016 he has been the head of the new department of cognitive robotics at the German Aerospace Center (DLR) in Oberpfaffenhofen, Germany.
TALK - Robot Skill Learning: From Reinforcement Learning to Evolution Strategies
Jul 15, 2016 11:00 AM

Dr. Freek Stulp
A popular approach to robot skill learning is to initialize a skill through imitation learning, and to then refine and improve the skill through reinforcement learning. In this presentation, I highlight three contributions to this approach:
1) Enabling skills to adapt to task variations by using multiple demonstrations for imitation learning,
2) Improving skills through reinforcement learning based on reward-weighted averaging and black-box optimization with evolution strategies (a generic form of this update is sketched below),
3) Using covariance matrix adaptation to automatically tune exploration during reinforcement learning.
Throughout the presentation I show several applications to challenging manipulation tasks on several humanoid robots.
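As a hedged sketch of the reward-weighted averaging idea in point 2 above (a generic form, not necessarily the exact update used in the talk): K perturbed parameter vectors are sampled and evaluated, and the new policy parameters are their average, weighted so that low-cost (high-reward) samples dominate:

```latex
% K perturbed parameter vectors theta_k with costs J_k; lambda sets the weighting "temperature"
w_k = \frac{\exp(-\lambda J_k)}{\sum_{k'=1}^{K} \exp(-\lambda J_{k'})},
\qquad
\theta_{\mathrm{new}} = \sum_{k=1}^{K} w_k\, \theta_k .
```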
Bio: http://freekstulp.net/#Bio
Dr. Freek Stulp's research focuses on using machine learning and artificial intelligence to improve the robustness and adaptivity of planning and control for autonomous robots. One of his main research themes is enabling robots to autonomously acquire and refine skills through imitation and reinforcement learning. He received his doctorate degree in Computer Science from the Technische Universität München in 2007. He was awarded post-doctoral research fellowships from the Japanese Society for the Promotion of Science and the German Research Foundation (DFG), to pursue his research at the Advanced Telecommunications Research Institute International (Kyoto) and the University of Southern California (Los Angeles). From 2011 to 2015 he was an assistant professor at the École Nationale Supérieure de Techniques Avancées (ENSTA-ParisTech). Since March 2016 he has been the head of the new department of cognitive robotics at the German Aerospace Center (DLR) in Oberpfaffenhofen, Germany.
Eliciting and recognising complex emotions and mental states including engagement and boredom
Jul 07, 2016 02:00 PM

Harry Witchel & Carina Westling
Complex emotions are emotional states other than Ekman's six basic emotions: happy, sad, fear, anger, surprise and disgust. Complex emotions can include mixtures of the basic emotions (e.g. horror), emotions outside the basic emotions (e.g. musical "tension"), and emotions mixed with mental states that are not emotions (e.g. engagement and boredom). Eliciting and recognising complex emotions, and allowing systems to respond to them, will be useful for eLearning, human factors (including vigilance), and responsive systems including human-robot interaction.
In this talk we will present our work towards the elicitation and recognition of conscious or subconscious responses. Engineering and psychological solutions to non-invasively determine such mental states and complex emotions may use movement, posture, facial expression, physiology, and sound. Furthermore, our team has shown that what people suppress is as revealing as what they do. We consider aspects of music listening, movie watching, game playing, quiz-taking, reading, and walking to untangle the complex emotions that can arise. The mental states of engagement and boredom are considered in relation to fidgeting and to Non-Instrumental Movement Inhibition (NIMI), in order to clarify fundamental research problems and direct research design toward improved solutions.
In 2016 Harry Witchel and Carina Westling published their ninth inter-disciplinary paper together, on Non-Instrumental Movement Inhibition. It received significant international media attention, including an article about it in Scientific American. Harry Witchel is Discipline Leader in Physiology at Brighton and Sussex Medical School at the University of Sussex. His research interests are: Nonverbal Behaviour; Motion Capture; Gait in Multiple Sclerosis; Soundscape; Engagement; Psychobiology. His laboratory uses wearable sensors, motion capture and time series analysis to determine the cognitive and behavioural correlates of engagement and disengagement in response to different psychologically relevant stimuli, especially music. He has performed experiments for many consultancy clients, including Honda, Nike, DHL and Tesco. He also has an international track record of promoting public engagement with science including appearances on the Discovery Channel, BBC World Service Radio, and the Financial Times. In 2004 he was awarded the national honour of the Charles Darwin Award lecture by the British Science Association. In 2011 his book on music was published: “You Are What You Hear: How Music and Territory Change Who We Are” (Algora, New York). Carina Westling researches live and mediated interaction design, and worked as a researching designer with Punchdrunk theatre company 2011-2014. She is the Creative Director of the Nimbus Group, who produce digital arts projects, including Giddy (2016), The Nimbus (2014), and 0-1 (2012). She is a contributing author to Digital Make-Believe, which was published in May 2016 (Springer, Berlin). Her research interests include interface design, interactive system narratives, audience research, spatial sound design, and nonverbal behaviour.
Training models with images: algorithms and applications
Jun 22, 2016 11:00 AM

Asst Prof Gregoire Mariethoz
Multiple-point geostatistics (MPS) has received a lot of attention in the last decade for modeling complex spatial patterns. The underlying principle consists in representing spatial variability using training images. A common conception is that a training image can be seen as a prior for the desired spatial variability. As a result, a variety of algorithmic tools have been developed to generate stochastic realizations of spatial processes based on what can be seen broadly as texture generation algorithms.
While the initial applications of MPS were dedicated to the characterization of 3D subsurface structures and the study of geological/hydrogeological reservoirs, a new trend is to use MPS for the modeling of earth surface processes. In this domain, the availability of remote sensing data as a basis to construct training images offers new possibilities for representing complexity with such non-parametric, data-driven approaches. Repeated satellite observations or climate model outputs, available at a daily frequency for periods of several years, provide the repetition of patterns required to obtain robust statistics on high-order patterns that vary in both space and time.
This presentation will delineate recent results in this direction, including MPS applications to the stochastic downscaling of climate models, the completion of partially informed satellite images, the removal of noise in remote sensing data, and modeling of complex spatio-temporal phenomena such as precipitation.
Grégoire Mariethoz was born in Neuchâtel (Switzerland) in 1978. He received an M.S. degree (2003), a MAS degree (2006) and a Ph.D. degree (2009) in hydrogeology from the University of Neuchâtel. In 2009-2010 he worked as a postdoctoral researcher at Stanford University, then between 2010 and 2014 he was a Senior Lecturer at UNSW Australia. Since 2014 he has been an Assistant Professor at the University of Lausanne, Switzerland. His interests include the development of spatial statistics algorithms and their application in hydrology, hydrogeology and remote sensing.
Adaptation of Neural Network Acoustic Models
May 12, 2016 10:30 AM

Prof. Steve Renals
Neural networks can learn invariances through many layers of non-linear transformations. Explicit adaptation to speaker or acoustic characteristics can further improve accuracy. A good adaptation technique should: (1) have a compact representation, to allow the speaker-dependent parameters to be estimated from small amounts of adaptation data and to minimise storage requirements; (2) operate in an unsupervised fashion, without requiring labelled adaptation data; and (3) allow for both test-only adaptation and speaker-adaptive training.
In this talk I'll discuss some approaches to the adaptation of neural network acoustic models - for both speech recognition and speech synthesis - with a focus on some approaches that we have explored in the "Natural Speech Technology" programme: factorised i-vectors, LDA domain codes, learning hidden unit contributions (LHUC), and differentiable pooling.
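As one concrete example from the list above, learning hidden unit contributions (LHUC) re-scales the hidden activations of each layer with a small set of speaker-dependent parameters, roughly as follows (schematic form):

```latex
% Speaker s: element-wise re-scaling of the layer-l hidden activations h_l
\mathbf{h}_{l}^{(s)} = a\!\bigl(\mathbf{r}_{l}^{(s)}\bigr) \odot \mathbf{h}_{l},
\qquad
a(r) = \frac{2}{1 + e^{-r}} \in (0, 2),
```

where only the speaker-dependent parameters r are estimated from the adaptation data.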
Steve Renals is professor of Speech Technology and director of the Institute for Language, Cognition, and Communication in the School of Informatics, at the University of Edinburgh. Previously, he was director of the Centre for Speech Technology Research (CSTR). He received a BSc in Chemistry from the University of Sheffield in 1986, an MSc in Artificial Intelligence from the University of Edinburgh in 1987, and a PhD in Speech Recognition and Neural Networks, also from Edinburgh, in 1990. From 1991-92 he was a postdoctoral fellow at the International Computer Science Institute (ICSI), Berkeley, and was then an EPSRC postdoctoral fellow in Information Engineering at the University of Cambridge (1992-94). From 1994-2003 he was lecturer, then reader, in Computer Science at the University of Sheffield, moving to Edinburgh in 2003. He has over 200 publications in speech and language processing, and has led several large projects in the field, including EPSRC Programme Grant Natural Speech Technology and the AMI and AMIDA Integrated Projects. He is a senior area editor of the IEEE/ACM Transactions on Audio, Speech, and Language Processing and a member of the ISCA Advisory Council. He is a fellow of the IEEE, and a member of ISCA and of the ACM.
Securing Encrypted Biometric Authentication With Multi-Factor Liveness Detection And One Time Passwords
May 04, 2016 02:00 PM
Kenneth Okereafor
Basic multi-biometric authentication systems were thought to have sealed the vulnerabilities and escape routes exploited by cyber criminals, but emerging attack patterns have proved us wrong. In spite of their benefits, multi-biometric systems also have peculiar challenges, especially circumvention of the security strategy. Circumvention refers to how susceptible the system or the presented biometric trait is to spoof attacks and identity fraud. Liveness detection has long been applied as an anti-spoofing mechanism to counter spoofing; however, the way it is applied has thrown up more vulnerabilities. We have adopted the Structured Systems Analysis and Design Methodology (SSADM) to help us understand the weaknesses and propose a solution which integrates liveness detection to halt spoofing. In this seminar, we present a different approach to performing liveness detection in multi-biometric systems to significantly minimize the probability of circumvention and considerably strengthen the overall security strategy of the authentication process.
Kenneth Okereafor is a Ph.D student of the University of Azteca, Mexico. His doctoral research focuses on Multi-biometric liveness detection. With over 18 years’ professional IT experience, he currently works with the Nigerian National Health Insurance Scheme (NHIS) as Assistant Director of Network Security and has facilitated several International presentations in Cybersecurity. A multiple recipient of the United Nations Cybersecurity Scholarship award under the ITU Global Cybersecurity Agenda, Kenneth has a combined background in Electrical & Electronics Engineering, and Computer Information Systems Security, with special interests in biometric security, electronic communications, and digital forensics. He is a certified Network Security Specialist.
The Lognormality Principle
Mar 21, 2016 02:00 PM

Prof. Réjean Plamondon
The Kinematic Theory of rapid human movements and its family of lognormal models provide analytical representations of pen tip strokes, often considered as the basic unit of handwriting. This paradigm has not only been experimentally confirmed in numerous predictive and physiologically significant tests, it has also been shown to be the ideal mathematical description of the impulse response of a neuromuscular system. This proof has led to the postulation of the LOGNORMALITY PRINCIPLE. In its simplest form, this fundamental premise states that the lognormality of the neuromuscular impulse responses is the result of an asymptotic convergence, a basic global feature reflecting the behaviour of individuals who are in perfect control of their movements. As a corollary, motor control learning in young children can be interpreted as a migration toward lognormality. For the larger part of their lives, healthy human adults take advantage of lognormality to control their movements. Finally, as aging and health issues intensify, a progressive departure from lognormality occurs. To illustrate this principle, we present various software tools and psychophysical tests used to investigate the fine motor control of subjects, with respect to these ideal lognormal behaviors, from childhood to old age. In this latter case, we focus particularly on investigations dealing with strokes, Parkinson's disease and Alzheimer's disease. We also show how lognormality can be exploited in many pattern recognition applications for the automatic generation of gestures, signatures, words and script-independent patterns, as well as CAPTCHA production, graffiti generation, anthropomorphic robot control and even speech modelling. Among other things, this lecture aims at elaborating a theoretical background for many handwriting applications, as well as providing some basic knowledge that could be integrated or taken care of in the development of new automatic pattern recognition systems to be used for e-Learning, e-Security and e-Health.
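For reference, the lognormal building block of the Kinematic Theory describes the speed profile of a single stroke starting at time t0 as follows (standard form from the literature; parameter names follow common usage):

```latex
% Lognormal speed profile: command amplitude D, time offset t_0, log-time delay mu, response time sigma
v(t) = \frac{D}{\sigma \sqrt{2\pi}\,(t - t_0)}
\exp\!\left( -\,\frac{\bigl( \ln(t - t_0) - \mu \bigr)^{2}}{2\sigma^{2}} \right),
\qquad t > t_0 .
```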
Réjean Plamondon is a Full Professor in the department of Electrical Engineering at École Polytechnique de Montréal and Head of Laboratoire Scribens at this institution. Throughout his career, he has been involved in many pattern recognition projects, particularly in the field of on-line and off-line handwriting analysis and processing. His main contribution has been the development of a kinematic theory of rapid human movements which can take into account, with the help of lognormal functions, the major psychophysical phenomena reported in studies dealing with rapid movement control. The theory has been found successful in describing the basic kinematic properties of velocity profiles as observed in finger, hand, arm, head and eye movements. Professor Plamondon has studied and analyzed these bio-signals extensively in order to develop creative and powerful methods and systems in various domains of engineering, publishing more than 300 papers on these topics. He is a Fellow of the Netherlands Institute for Advanced Study in the Humanities and Social Sciences (NIAS; 1989), of the International Association for Pattern Recognition (IAPR; 1994) and of the Institute of Electrical and Electronics Engineers (IEEE; 2000). He recently received the IAPR/ICDAR 2013 outstanding achievement award for “theoretical contributions to the understanding of human movement and its applications to signature verification, handwriting recognition, instruction, and health assessment, and for promoting on-line document processing in numerous multidisciplinary fields”.
How technology is opening up new potential for democracy, participation and collaboration
The barriers to production are being lowered so it's a good time to build platforms which make it as simple as possible for everyone to join in and help train and refine language technologies, share their stories and spread the word. Gareth draws on digital storytelling with the BBC, democratic activism via hyperlocal journalism and tools for citizenship to see if there's a new way to corral people's enthusiasm for languages to help build better, more relevant resources.
Probabilistic Models for Music Performance: Interaction, Creation, Cognition
Music performance is an epitome of complex and creative motor skills. It is indeed striking that musicians can continuously show more physical virtuosity in playing their instrument and can show more creativity in varying their interpretation. Technology-mediated music performance has naturally explored the potential of interfaces and interactions for enhancing musical expression. It is however a difficult (and ill-posed) problem and musical interactive systems cannot yet challenge traditional instruments in terms of expressive control and skill learning.
I believe that an important aspect of the problem lies in the understanding of variability in the performer's movements. I will start my talk by presenting the computational approach based on probabilistic models, particularly suited to handling uncertainty in motion data that stems from noise or intentional variations of the performers. I will then illustrate the potential of the approach in the design of expressive music interactions, through experiments with proofs of concept developed and evaluated in the lab, as well as real-world applications in artistic projects and in industrial products for consumer devices. Finally, I will present my upcoming EU-funded research project that takes a more theoretical perspective by examining how this approach could potentially be used to infer an understanding of the cognitive processes underlying sensorimotor learning in music performance.
Baptiste Caramiaux is a Marie Skłodowska-Curie Research Fellow between McGill University (Montreal, Canada) and IRCAM (Paris, France). His current research focuses on the understanding of the cognitive processes of motor learning in musical performance and the computational modelling of these processes. Before that, he worked on gesture expressivity and the design of musical interactive systems through machine learning. He conducted academic research at Goldsmiths, University of London, and applied part of his academic research on industrial products at Mogees Ltd. Baptiste holds a PhD in computer science from Université Pierre et Marie Curie in Paris and IRCAM Centre Pompidou.
Shape, Medialness and Applications
I will present ongoing research in my group with a focus on shape understanding, with applications to computer vision, robotics and the creative industries. I will principally discuss our recent work on building an algorithmic chain exploiting models of shape derived from the cognitive science literature but relating closely to well-known approaches in computer vision and computational geometry: that of medial descriptors of shape.
Recent relevant publications:
[1] Point-based medialness for 2D shape description and identification
P. Aparajeya and F. F. Leymarie
Multimedia Tools and Applications, May 2015
link.springer.com
[2] Portrait drawing by Paul the robot
P. Tresset and F. F. Leymarie
Computers & Graphics, April 2013
Special Section on Expressive Graphics
www.sciencedirect.com
Frederic Fol Leymarie has been a Professor of Computing at Goldsmiths, University of London, since late 2004. Previously he was the co-founder of the SHAPE Lab at Brown University (1999) and later its lab manager (2002-4) while a postdoctoral fellow. He completed his PhD thesis at Brown in 2002 on the topic of 3D Shape Representation by Shock Scaffolds. This work was supported in part by two (US) NSF grants Frederic co-wrote and one IBM Doctoral Fellowship (1999). Since joining Goldsmiths, Frederic has launched and directed the MSc Arts Computing (2004-7), as well as the MSc Computer Games Entertainment (since 2008) and the MA Computer Games Art and Design (starting in Sept. 2015), both of the latter in collaboration with Prof. William Latham. More details on his publication record, research and other interests and professional activities can be found on his LinkedIn profile via: www.folleymarie.com
Enabling novices to create behaviours for autonomous agents
Jun 16, 2015 11:15 AM
Dr Stéphane Magnenat
This talk will present my research path under the overarching theme of enabling non-specialists to create behaviours for autonomous robots. I will start with a short description of my work on scaling up robot autonomy in the context of autonomous construction. I will then focus on modular 3-D mapping using the iterative closest point algorithm, and on programming by demonstration with a method requiring few user-defined parameters to be tuned. I will then present my work on teaching the computer science concept of event handling using the Thymio mobile robot. I will present quantitative and qualitative results with students of different ages, and will show an experiment exploring the use of augmented reality to provide real-time program tracing. Finally, I will propose a roadmap for future work.
Dr Stéphane Magnenat is currently an Associate Research Scientist at Disney Research Zürich. He received his PhD from EPFL in 2010 and, before joining Disney, worked as a senior researcher at the Autonomous Systems Lab at ETH Zürich. In fall 2012, he visited Willow Garage at Menlo Park, CA, USA. He then visited Tufts University, MA, USA in 2013 and Aalto University, Helsinki, Finland in 2015. He is a co-founder and board member of Mobsya, the association producing the Thymio educational robot. His current research focuses on mobile robotics, CS education, and visual computing.
A hybrid approach to segmentation of speech
Jun 12, 2015 11:00 AM
Prof. Hema Murthy
The most common approach to automatic segmentation of speech is to perform forced alignment using monophone HMM models that have been obtained using embedded reestimation after flat-start initialisation. Segmentation using this approach requires large amounts of data and does not work very well for low-resource languages. To address the issue of paucity of data, signal processing cues are used to restrict embedded reestimation.
Voice activity detection is first performed to determine the voiced regions in an utterance. Short-term energy (STE) and spectral flux (SF) are computed on intra-voiced segments. STE yields syllable boundaries, while locations of significant change in spectral flux are indicative of fricatives and nasals. STE and SF cannot be used directly to segment an utterance. Minimum phase group delay based smoothing is performed to preserve these landmarks, while at the same time reducing the local fluctuations. Boundary corrections are performed at the syllable level, wherever it is known that the syllable boundaries are correct. Embedded reestimation of monophone HMM models is then restricted to the syllable boundaries. The boundaries obtained using group delay smoothing result in a number of false alarms. HMM boundaries are used to correct these boundaries. Similarly, spectral flux is used to correct fricative boundaries. Thus, using signal processing cues and HMM reestimation in tandem, robust monophone HMM models are built. These models are then used in an HTS framework to build speech synthesis systems for a number (9 at the time of this presentation) of Indian languages. Both quantitative and qualitative assessments indicate that there is a significant improvement in quality of synthesis.
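Schematically, the two signal-processing cues mentioned above are the standard short-time measures below, computed per frame n from the windowed samples and the magnitude spectrum |X_n(k)| (generic definitions, not necessarily the exact ones used by the authors):

```latex
% Short-term energy of frame n and spectral flux between consecutive frames
\mathrm{STE}(n) = \sum_{m} \bigl( x[m]\, w[n - m] \bigr)^{2},
\qquad
\mathrm{SF}(n) = \sum_{k} \bigl( |X_{n}(k)| - |X_{n-1}(k)| \bigr)^{2} .
```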
In another experiment on key word spotting (KWS) in speech, the group delay based syllable boundaries are used to reduce the search space for keyword spotting on Indian English lectures. Appropriate score normalisation based on vowel normalisation in a neural network framework is used to learn the thresholds. An F-score of 72.32% was obtained on a subset of the NPTEL lectures (http://www.nptel.iitm.ac.in).
Discovering Life Patterns
The main goal of this proposal is to discover a person's life patterns (e.g., where she goes, what she does, how she is and feels, and whom she spends time with), namely those situations that repeat themselves, almost but not exactly identically, with regularity, and to exploit this knowledge for improving her quality of life.
The challenge is how to synchronize a sensor- and data-driven representation of the world, which is noisy, imprecise and agnostic of the user's needs, with a knowledge-level representation of the world which should be: (i) general, by allowing for the representation and integration of different combinations of sensors and interesting aspects of the user's life, and (ii) adaptive, by representing life happenings at the desired level of abstraction, capturing their progress, and adapting to changes in the life dynamics.
The solution exploits three main components: (i) a methodology and mechanisms for an incremental evolution of a knowledge level representation of the world (e.g., ontologies), (ii) an extension of deep learning to take into account and adapt to the constraints coming from the knowledge level and (iii) a Question Answering (Q/A) service which allows the user to interact with the computer according to her needs and terminology.
Fausto Giunchiglia is a professor of computer science at the University of Trento, an ECCAI fellow, and a member of Academia Europaea. Fausto's current main interest is in providing a theory, algorithms and systems for the handling of highly heterogeneous big data in highly dynamic and unpredictable environments. The issues he is mainly interested in are (in decreasing order of importance) variety, veracity and vulnerability. His focus is on three types of data: open government data, enterprise data and personal data. Fausto has covered the whole spectrum from theory to technology transfer and innovation. Some relevant roles: member of the panel "Computer Science and Informatics" of the European Research Council (ERC) "ERC Advanced Grants" (2008 – present); Chair of the International Advisory Board of the Scottish Informatics and Computer Science Alliance (SICSA) of the 10 Scottish universities. More than 40 invited talks in international events; chair of more than 10 international events; was/is editor or editorial board member of around 10 journals, among them: Journal of Autonomous Agents and Multi-agent Systems, Journal of Applied Non-Classical Logics, Journal of Software Tools for Technology Transfer, Journal of Artificial Intelligence Research. He held the following roles in scientific organizations: member of the IJCAI Board of Trustees (01-11), President of IJCAI (05-07), President of KR, Inc. (02-04), Advisory Board member of KR, Inc., Steering Committee of the CONTEXT conference. Fausto has coordinated and participated in various EC projects, among them: coordination of the FP7 FET IP Smart Society and of the FP7 FET IP Living Knowledge, local coordinator of the FP7 IP Cubrik, Open Knowledge, Knowledge Web.
Modeling Human Communication Dynamics
Human face-to-face communication is a little like a dance, in that participants continuously adjust their behaviors based on verbal and nonverbal cues from the social context. Today's computers and interactive devices are still lacking many of these human-like abilities to hold fluid and natural interactions. Leveraging recent advances in machine learning, audio-visual signal processing and computational linguistics, my research focuses on creating human-computer interaction (HCI) technologies able to analyze, recognize and predict subtle human communicative behaviors in social context. I formalize this new research endeavor with a Human Communication Dynamics framework, addressing four key computational challenges: behavioral dynamics, multimodal dynamics, interpersonal dynamics and societal dynamics. Central to this research effort is the introduction of new probabilistic models able to learn the temporal and fine-grained latent dependencies across behaviors, modalities and interlocutors. In this talk, I will present some of our recent achievements in modeling multiple aspects of human communication dynamics, motivated by applications in healthcare (depression, PTSD, suicide, autism), education (learning analytics), business (negotiation, interpersonal skills) and social multimedia (opinion mining, social influence).
Louis-Philippe Morency is Assistant Professor in the Language Technology Institute at the Carnegie Mellon University where he leads the Multimodal Communication and Machine Learning Laboratory (MultiComp Lab). He received his Ph.D. and Master degrees from MIT Computer Science and Artificial Intelligence Laboratory. In 2008, Dr. Morency was selected as one of "AI's 10 to Watch" by IEEE Intelligent Systems. He has received 7 best paper awards in multiple ACM- and IEEE-sponsored conferences for his work on context-based gesture recognition, multimodal probabilistic fusion and computational models of human communication dynamics. For the past two years, Dr. Morency has been leading a DARPA-funded multi-institution effort called SimSensei which was recently named one of the year’s top ten most promising digital initiatives by the NetExplo Forum, in partnership with UNESCO.
Can biometric similarity scores be used to calculate forensically interpretable likelihood ratios?
May 08, 2015 11:00 AM
Geoffrey Stewart Morrison
Dr Morrison is currently Scientific Counsel, Office of Legal Affairs, INTERPOL General Secretariat. He is contributing to the European Union funded Speaker Identification Integrated Project (SIIP), which aims to develop investigative and police intelligence solutions for law enforcement agencies, including sharing of data via INTERPOL. He is also an Adjunct Associate Professor, Department of Linguistics, University of Alberta. He has been Director of the Forensic Voice Comparison Laboratory, School of Electrical Engineering & Telecommunication, University of New South Wales; Chair of the Forensic Acoustics Subcommittee, Acoustical Society of America; and a Subject Editor for the journal Speech Communication. He has been involved in forensic casework in Australia and the United States.
Robust image feature extraction learning and object registration
Extracting image features such as feature points or edges is a critical step in many computer vision systems; however, it is still performed with carefully handcrafted methods. In this talk, I will first present a new machine learning-based approach to detecting local image features, with application to contour detection in natural images, but also to biomedical and aerial images, and to feature point extraction under drastic weather and lighting changes. I will then show that it is also possible to learn efficient object descriptions based on low-level features for scalable 3D object detection.
Dr. Vincent Lepetit is a Professor at the Institute for Computer Graphics and Vision, TU Graz and a Visiting Professor at the Computer Vision Laboratory, EPFL. He received the PhD degree in Computer Vision in 2001 from the University of Nancy, France, after working in the ISA INRIA team. He then joined the Virtual Reality Lab at EPFL as a post-doctoral fellow and became a founding member of the Computer Vision Laboratory. He became a Professor at TU GRAZ in February 2014. His research interests include vision-based Augmented Reality, 3D camera tracking, Machine Learning, object recognition, and 3D reconstruction. He often serves as program committee member and area chair of major vision conferences (CVPR, ICCV, ECCV, ACCV, BMVC). He is an editor for the International Journal of Computer Vision (IJCV) and the Computer Vision and Image Understanding (CVIU) journal. http://www.icg.tugraz.at/Members/lepetit/vincent-lepetits-homepage
Medical visual information retrieval: techniques & evaluation
Medical imaging has enormously increased in importance and volume in medical institutions, particularly 3D tomographic imaging. Through digital analysis, the knowledge stored in medical cases can be used beyond a single patient to support decision-making.
This presentation will highlight several challenges in medical image data processing, starting with the VISCERAL EU project, which evaluates segmentation, lesion detection and similar-case retrieval on large amounts of medical 3D data using a cloud-based infrastructure for participants. The description of the MANY project highlights techniques for 3D texture analysis that can be used in a variety of contexts. Finally, an overview of the radiology search system of the Khresmoi project will show a combination of the 3D data and the 3D analyses in a multimodal environment.
Henning Müller studied medical informatics at the University of Heidelberg, Germany, then worked at Daimler-Benz research in Portland, OR, USA. From 1998 to 2002 he worked on his PhD degree at the University of Geneva, Switzerland, with a research stay at Monash University, Melbourne, Australia. Since 2002 Henning has been working in medical informatics at the University Hospital of Geneva, where he habilitated in 2008 and was named titular professor in medicine in 2014. Since 2007 he has also been a full professor at the HES-SO Valais, and since 2011 he has been responsible for the eHealth unit of the school. Henning was coordinator of the Khresmoi EU project, is scientific coordinator of the VISCERAL EU project and initiator of the ImageCLEF benchmark. He has worked on several other EU projects involving access to and analysis of medical data. He has authored over 400 scientific papers and is on the editorial board of several journals.
The role of electrochemical energy storage systems in a Smart Grid
Feb 18, 2015 11:00 AM
Prof. Hubert Girault
He will present the demonstrator being installed at the water treatment plant in Martigny. It is based on a redox flow battery able to produce hydrogen to maintain the battery at an optimum state of charge. He will explain how a redox flow battery works and discuss its advantages and disadvantages. He will then present the concept of a service station for electric cars, whether powered by lithium batteries like the Tesla or by hydrogen fuel cells like the Hyundai ix35.
Data Valorisation based on Linked (open) Data approaches
Feb 12, 2015 11:00 AM
Prof. Maria Sokhn
Maria will also present her group at the HES-SO Valais-Wallis.
Video Inpainting of Complex Scenes
Feb 05, 2015 02:00 PM

Prof. Yann Gousseau
While image inpainting is a relatively mature subject whose numerical results are often visually striking, the automatic filling-in of video is still prone to yield incoherent results in many situations. Moreover, the subject is impaired by strong computational bottlenecks. In this talk, we present a patch-based approach to inpainting videos, relying on a global, multi-scale optimization heuristic. Contrary to previous approaches, the best patch candidates are selected using texture attributes that are built within a multi-scale video representation. We show that this rationale prevents the usual wash-out of textured and cluttered parts of video. Combined with an appropriate nearest-neighbor search and a simple stabilization-like procedure, the resulting approach is able to successfully and automatically inpaint complex situations, including high-resolution sequences with dynamic textures and multiple moving objects.
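As a rough illustration of the patch-matching step underlying such methods, the following sketch searches a single image for the known patch most similar to a partially known target patch. The real system works on multi-scale video patches with texture attributes, so this is only a simplified stand-in.

```python
# Minimal single-image sketch of patch matching for inpainting (illustrative only).
import numpy as np

def best_patch(image, mask, target_xy, size=7):
    """Return the top-left-ish centre of the fully known patch most similar to the
    (partially known) patch centred at target_xy, using masked SSD."""
    image = image.astype(float)
    r = size // 2
    ty, tx = target_xy
    target = image[ty - r:ty + r + 1, tx - r:tx + r + 1]
    known = ~mask[ty - r:ty + r + 1, tx - r:tx + r + 1]          # compare valid pixels only
    best, best_cost = None, np.inf
    H, W = image.shape
    for y in range(r, H - r):
        for x in range(r, W - r):
            if mask[y - r:y + r + 1, x - r:x + r + 1].any():      # candidate must be fully known
                continue
            cand = image[y - r:y + r + 1, x - r:x + r + 1]
            cost = np.sum(((cand - target) * known) ** 2)
            if cost < best_cost:
                best, best_cost = (y, x), cost
    return best
```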
Yann Gousseau received the engineering degree from the École Centrale de Paris, France, in 1995, and the Ph.D. degree in applied mathematics from the University of Paris-Dauphine in 2000. He is currently a professor at Telecom ParisTech. His research interests include the mathematical modeling of natural images and textures, mono and multi-image restoration, computational photography, stochastic geometry, image analysis, computer vision and image processing.
Trainable Interaction Models for Embodied Conversational Agents
Human communication is inherently multimodal: when we communicate with one another, we use a wide variety of channels, including speech, facial expressions, body postures, and gestures. An embodied conversational agent (ECA) is an interactive character -- virtual or physically embodied -- with a human-like appearance, which uses its face and body to communicate in a natural way. Giving such an agent the ability to understand and produce natural, multimodal communicative behaviour will allow humans to interact with such agents as naturally and freely as they interact with one another, enabling the agents to be used in applications as diverse as service robots, manufacturing, personal companions, automated customer support, and therapy.
To develop an agent capable of such natural, multimodal communication, we must first record and analyse how humans communicate with one another. Based on that analysis, we then develop models of human multimodal interaction and integrate those models into the reasoning process of an ECA. Finally, the models are tested and validated through human-agent interactions in a range of contexts.
In this talk, I will give three examples where the above steps have been followed to create interaction models for ECAs. First, I will describe how human-like referring expressions improve user satisfaction with a collaborative robot; then I will show how data-driven generation of facial displays affects interactions with an animated virtual agent; finally, I will describe how trained classifiers can be used to estimate engagement for customers of a robot bartender.
Mary Ellen Foster is a Research Fellow in the Interaction Lab at the School of Mathematical and Computer Sciences at Heriot-Watt University in Edinburgh, Scotland. She received her Ph.D. in Informatics from the University of Edinburgh, and has previously worked in the Robotics and Embedded Systems Group at the Technical University of Munich and in the School of Informatics at the University of Edinburgh. Her research interests include embodied communication, natural language generation, and multimodal dialogue systems. In particular, she is interested in designing, implementing, and evaluating practical artificial systems that support embodied interaction with human users, such as embodied conversational agents and human-robot dialogue systems. She has worked on European and national projects including COMIC, JAST, ECHOES, JAMES, and EMOTE.
Language identification@BUT
Nov 12, 2014 11:00 AM
Pavel Matejka
This talk presents ongoing work on language identification for the DARPA RATS program. The talk will describe an application of neural network bottleneck (BN) features in Language Identification (LID). BN features are generally used for large-vocabulary speech recognition in conjunction with conventional acoustic features, such as MFCC or PLP. We compare the BN features to several common types of acoustic features used in present-day state-of-the-art LID systems. The test set is from the DARPA RATS (Robust Automatic Transcription of Speech) program, which seeks to advance state-of-the-art detection capabilities on audio from highly degraded radio communication channels. On this type of noisy data, we show that, on average, the BN features provide a 45% relative improvement in the Cavg or Equal Error Rate (EER) metrics across several test-duration conditions, with respect to our single best acoustic features.
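For readers unfamiliar with the metric quoted above, the following sketch computes an Equal Error Rate from hypothetical target and non-target detection scores; the numbers are synthetic and unrelated to the RATS results.

```python
# Minimal sketch: Equal Error Rate (EER) from detection scores (synthetic data).
import numpy as np

def eer(target_scores, nontarget_scores):
    """EER is the operating point where the false-alarm rate equals the miss rate."""
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    miss = np.array([(target_scores < t).mean() for t in thresholds])
    fa = np.array([(nontarget_scores >= t).mean() for t in thresholds])
    idx = np.argmin(np.abs(miss - fa))
    return (miss[idx] + fa[idx]) / 2

rng = np.random.default_rng(0)
print(eer(rng.normal(1.5, 1, 1000), rng.normal(0, 1, 10000)))  # roughly 0.23 here
```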
Speech technologies - going from the research labs to market
Nov 12, 2014 10:00 AM
Petr Schwarz
Several speech technologies, such as speech transcription, keyword spotting, language identification and speaker identification, will be discussed from an architectural point of view. Cases will then be presented of how these technologies are used in call centers, banks, by governmental agencies, and by broadcast service providers for speech data mining, voice analytics and voice biometry. Each client and use case has specific requirements on technology, data handling and services. These requirements and their implications for technology development and research will be discussed.
Pose estimation and gesture recognition using structured deep learning
In this talk I will address the problem of gesture recognition and pose estimation from videos, following two different strategies:
(i) estimation of articulated pose (full body or hand pose) alleviates subsequent recognition steps in many conditions and allows smooth interaction modes and tight coupling between object and manipulator;
(ii) in situations of low image quality (e.g. large distances between hand and camera), obtaining an articulated pose is hard. Training a deep model directly on video data can give excellent results in these situations.
We tackle both cases by training deep architectures capable of learning discriminative intermediate representations. The main goal is to integrate structural information into the model in order to decrease the dependency on large amounts of training data. To achieve this, we propose an approach for hand pose estimation that requires very little labelled data. It leverages both unlabeled data and synthetic data produced by a rendering pipeline. The key to making it work is to integrate structural information not into the model architecture, which would slow down inference, but into the training objective. We show that adding unlabeled real-world samples significantly improves results compared to a purely supervised setting.
In the context of multi-modal gesture detection and recognition, we propose a deep recurrent architecture that iteratively learns and integrates discriminative data representations from individual channels (pose, video, audio), modeling complex cross-modality correlations and temporal dependencies. It is based on multi-scale and multi-modal deep learning. Each visual modality captures spatial information at a particular spatial scale (such as motion of the upper body or a hand), and the whole system operates at two temporal scales. Key to our technique is a training strategy which exploits i) careful initialization of individual modalities; and ii) gradual fusion of modalities from strongest to weakest cross-modality structure.
We present experiments on the "ChaLearn 2014 Looking at People Challenge" gesture recognition track, organized in conjunction with ECCV 2014, in which we placed 1st out of 17 teams. The objective of the challenge was to detect, localize and classify Italian conversational gestures from a large database of 13,858 gestures. The multimodal data included color video, range maps and a skeleton stream.
The talk will be preceded by a brief introduction to the work done in my LIRIS team.
liris.cnrs.fr/christian.wolf/research/gesturerec.html
Christian Wolf received his MSc in computer science from Vienna University of Technology in 2000, and his PhD in computer science from the National Institute of Applied Sciences (INSA de Lyon), France, in 2003. In 2012 he obtained the habilitation diploma, also from INSA de Lyon. From September 2004 to August 2005 he was an assistant professor at the Louis Pasteur University, Strasbourg, and a member of the Computer and Image Science and Remote Sensing Laboratory (LSIIT). Since September 2005 he has been an assistant professor at INSA de Lyon and a member of LIRIS, a CNRS laboratory, where he is interested in computer vision and machine learning, especially in structured models, deep learning, gesture and activity recognition, and computer vision for robotics.
Fitting Ancient Texts into Modern Technology: The Maya Hieroglyphic Codices Database Project
Jul 01, 2014 11:00 AM

Dr. Gabrielle Vail
The Maya hieroglyphic codices provide a rich dataset concerning astronomical beliefs, divinatory practices, and the ritual life of prehispanic Maya cultures inhabiting the Yucatan Peninsula in the years leading up to the Spanish conquest in the early sixteenth century. Structurally, the codices are organized in terms of almanacs and astronomical tables, both of which incorporate several types of data--calendrical, iconographic, and textual--that together allowed Maya scribes to encode complex relationships among deities, dates having ritual and/or celestial significance, and associated activities. In order to better understand these relationships, the Maya Hieroglyphic Codices Database project was initiated to develop sophisticated online research tools to aid in analysis of these manuscripts. Because the Maya scribes did not live in a culture that demanded strict adherence to paradigms that we take for granted when organizing information for electronic search and retrieval, this posed a significant challenge in efforts to discover how data contained in ancient manuscripts could be converted into data structures that facilitated computer searching and information retrieval. This presentation discusses the approaches taken by the author and the architect of the database project to find compromises that enable computer analysis of a set of texts created by scribes more than half a millennium ago, while avoiding the biases inherent in translating knowledge across spatial and cultural divides. The presentation will be made by Dr. Vail; the technical architect to the project, William Giltinan, will be available to answer questions at the conclusion of the lecture.
Gabrielle Vail specializes in the study of Maya hieroglyphic texts, with an emphasis on prehispanic Maya ritual and religion as documented in screenfold manuscripts painted in the fourteenth and fifteenth centuries. Her research is highlighted in numerous print and online publications, as well as the online Maya Codices Database (www.mayacodices.org), a collaborative project undertaken with funding from the National Endowment for the Humanities. Dr. Vail has published ten books and edited journals, most recently Códice de Madrid (Universidad Mesoamericana, 2013) and Re-Creating Primordial Time: Foundation Rituals and Mythology in the Postclassic Maya Codices (University Press of Colorado, 2013; with Christine Hernández). Dr. Vail received her Ph.D. from Tulane University in 1996 and holds a research and faculty position at New College of Florida in Sarasota, where she teaches courses on a variety of subjects, including the decipherment of Maya hieroglyphic texts and the astronomy of prehispanic cultures of the Americas. Technical architect: William Giltinan earned his bachelor’s degree in computer science from New College of Florida and a master’s degree in computer science and engineering from the University of Michigan. Following this, he spent more than a decade as a software engineer and entrepreneur in technology-driven enterprises. In 1992, he assumed the role of technical architect of the Maya Hieroglyphic Codices Database project and has continued in this capacity through the present. Mr. Giltinan returned to academia in 2003 to earn his Juris Doctorate and later his Master of Law degree in intellectual property law from the George Washington University Law School. He is a practicing intellectual property attorney and teaches patent law as an adjunct professor.
Recognising people, motion and actions in video
Learning to recognise the motion or actions of people in video has wide applications, covering topics from sign or gesture recognition through to surveillance and HCI. This talk will discuss approaches to video mining, allowing the discovery of weakly supervised spatiotemporal signatures such as actions embedded in video, or signs and facial motion weakly supervised by language. Whether the task is recognising an atomic action of an individual or their implied activity, the continuous multichannel nature of sign language recognition, or the appearance of words on the lips, all these approaches can be categorised at the most basic level as the learning and recognition of spatio-temporal patterns. However, in all cases, inaccuracies in labelling and the curse of dimensionality lead us to explore new learning approaches that can operate in a weakly supervised setting. This talk will discuss the adaptation of mining to the video domain and new approaches to learning spatiotemporal signatures, covering a broad range of application areas such as facial feature extraction and regression, lip reading, activity recognition, and sign and gesture recognition in both 2D and 3D.
Prof. Richard Bowden received a BSc degree in computer science from the University of London in 1993, an MSc degree with distinction from the University of Leeds in 1995, and a PhD degree in computer vision from Brunel University. He is currently Professor of computer vision and machine learning at the University of Surrey, United Kingdom, where he leads the Cognitive Vision Group within the Centre for Vision Speech and Signal Processing, and was recently awarded a Royal Society Leverhulme Trust Senior Research Fellowship. He was a visiting research fellow at the University of Oxford from 2001 to 2004, working with Profs Zisserman and Brady. His research focuses on the use of computer vision to locate, track, and understand humans, with specific examples in sign and gesture recognition, activity and action recognition, lip-reading and facial feature tracking. His research into tracking and artificial life received worldwide media coverage, appearing at the British Science Museum and the Minnesota Science Museum. He has published more than 140 peer-reviewed papers and has served as either program committee member or area chair for ICCV, CVPR and ECCV, in addition to numerous international workshops and conferences. He was general chair for BMVC 2012, track chair for ICPR 2012, and is associate editor for the journal Image and Vision Computing and for IEEE Transactions on Pattern Analysis and Machine Intelligence. He was a member of the British Machine Vision Association (BMVA) executive committee and a company director for seven years. He is a member of the BMVA, a fellow of the Higher Education Academy, and a senior member of the IEEE. He has held over 20 research grants worth in excess of £5M and supervised over fifteen PhD students. His research has been recognised by prizes, plenary talks and media/press coverage, including the Sullivan thesis prize in 2000 and many best paper awards.
On the use of multimodal cues for the modeling of group involvement and individual engagement in multiparty dialogue
Jun 05, 2014 10:30 AM
Catharine Oertel
Multiparty conversations are characterized by varying degrees of participant engagement and group involvement. Humans are able to detect and interpret these degrees, basing their perception on multimodal cues. Automatic detection, however, poses many challenges, in particular for larger groups of people. In this talk, I will mainly focus on a study in which we analysed group behaviour in an eight-party, multimodal corpus. We propose four features that summarize different aspects of eye-gaze patterns and allow us to describe individual engagement as well as group involvement over time. Our overall aim is to build a system that is able to foster group involvement.
In addition, I will briefly comment on two studies in which we use the robot head Furhat to advance in this direction. Furhat is a robotic head that combines state-of-the-art facial animation with physical embodiment in order to facilitate multi-party dialogues with robots.
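As a toy illustration of gaze-based features of the kind mentioned above (not necessarily the four features proposed in the talk), the sketch below computes two simple per-participant quantities from a hypothetical who-looks-at-whom log.

```python
# Hedged sketch: two illustrative eye-gaze features over a who-looks-at-whom log
# (synthetic data; self-gaze is not excluded in this toy example).
import numpy as np

# gaze[t, i] = index of the person participant i looks at in frame t (-1 = elsewhere)
rng = np.random.default_rng(1)
n_frames, n_people = 1000, 8
gaze = rng.integers(-1, n_people, size=(n_frames, n_people))

# Feature 1: how often each participant is looked at by others, per frame
being_looked_at = np.array([(gaze == p).sum(axis=1).mean() for p in range(n_people)])

def gaze_entropy(targets):
    """Entropy of a participant's gaze-target distribution (higher = more dispersed)."""
    vals, counts = np.unique(targets, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

# Feature 2: how dispersed each participant's own gaze is
entropies = np.array([gaze_entropy(gaze[:, i]) for i in range(n_people)])
group_involvement = being_looked_at.mean()   # a crude group-level summary
```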
Catharine Oertel has been a PhD candidate at the Department of Speech, Music and Hearing at the Royal Institute of Technology (KTH) in Sweden since 2012. She is a member of the Speech group and is supervised by Prof. Joakim Gustafson. She received her Master's degree in Linguistics: Communication, Cognition and Speech Technology from Bielefeld University in 2010. From 2010 to 2012 she was a member of the Speech Communication Lab at Trinity College Dublin. Her work has mainly focused on the multimodal modeling of conversational dynamics, but she has also been active in the area of human-robot interaction.
The Web: Wisdom of Crowds or Wisdom of a Few?
The Web continues to grow and evolve very fast, changing our daily lives. This activity represents the collaborative work of the millions of institutions and people that contribute content to the Web, as well as the more than two billion people that use it. In this ocean of hyperlinked data there is explicit and implicit information and knowledge. But what does the Web look like? What are people doing on it? How is content generated? Web data mining is the main approach to answering these questions. Web data comes in three main flavors: content (text, images, etc.), structure (hyperlinks) and usage (navigation, queries, etc.), implying different techniques such as text, graph or log mining. Each case reflects the wisdom of some group of people that can be used to make the Web better, for example user-generated tags in Web 2.0 sites. In this presentation we explore the wisdom of crowds in relation to several dimensions such as bias, privacy, scalability, and spam. We also cover related concepts such as the long tail of the special interests of people, or the digital desert, content that nobody sees.
Ricardo Baeza-Yates is VP of Yahoo! Labs for Europe and Latin America, leading the labs in Barcelona, Spain and Santiago, Chile, since 2006. Between 2008 and 2012 he also oversaw the Haifa lab. He is also a part-time Professor at the Dept. of Information and Communication Technologies of the Universitat Pompeu Fabra in Barcelona, Spain. During 2005 he was an ICREA research professor at the same university. Until 2004 he was Professor and Director of the Center for Web Research at the Dept. of Computing Science of the University of Chile (on leave of absence until today). He obtained a Ph.D. from the University of Waterloo, Canada, in 1989. Before that, he obtained two master's degrees (M.Sc. CS & M.Eng. EE) and the electrical engineering degree from the University of Chile in Santiago. He is co-author of the best-selling textbook Modern Information Retrieval, published in 1999 by Addison-Wesley, with a second enlarged edition in 2011 that won the ASIST 2012 Book of the Year award. He is also co-author of the 2nd edition of the Handbook of Algorithms and Data Structures, Addison-Wesley, 1991; and co-editor of Information Retrieval: Algorithms and Data Structures, Prentice-Hall, 1992, among more than 500 other publications. From 2002 to 2004 he was elected to the board of governors of the IEEE Computer Society, and in 2012 he was elected to the ACM Council. He has received the Organization of American States award for young researchers in exact sciences (1993), the Graham Medal for innovation in computing given by the University of Waterloo to distinguished ex-alumni (2007), the CLEI Latin American distinction for contributions to CS in the region (2009), and the National Award of the Chilean Association of Engineers (2010), among other distinctions. In 2003 he was the first computer scientist to be elected to the Chilean Academy of Sciences, and since 2010 he has been a founding member of the Chilean Academy of Engineering. In 2009 he was named ACM Fellow and in 2011 IEEE Fellow.
Interpersonal synchrony: social signal processing and social robotics for revealing social signatures
Social signal processing is an emerging research domain with rich and open fundamental and applied challenges. In this talk, I'll focus on the development of social signal processing techniques for real applications in the field of psychopathology. I'll overview recent research and investigation methods that allow neuroscience, psychology and developmental science to move from isolated-individual paradigms to interactive contexts by jointly analyzing the behaviors and social signals of partners. Starting from the concept of interpersonal synchrony, we'll show how to address the complex problem of evaluating children with pervasive developmental disorders. These techniques are also demonstrated in the context of human-robot interaction, with a new way of using robots in autism (moving from assistive devices to clinical investigation tools). I will finish by closing the loop between behaviors and physiological states by presenting new results on oxytocin and proxemics during early parent-infant interactions.
Prof. Mohamed Chetouani is the head of the IMI2S (Interaction, Multimodal Integration and Social Signal) research group at the Institute for Intelligent Systems and Robotics (CNRS UMR 7222), University Pierre and Marie Curie-Paris 6. He received the M.S. degree in Robotics and Intelligent Systems from UPMC, Paris, in 2001, and the PhD degree in Speech Signal Processing from the same university in 2004. In 2005, he was an invited Visiting Research Fellow at the Department of Computer Science and Mathematics of the University of Stirling (UK), and he was also an invited researcher at the Signal Processing Group of Escola Universitaria Politecnica de Mataro, Barcelona (Spain). He is currently a Full Professor in Signal Processing, Pattern Recognition and Machine Learning at UPMC. His research activities, carried out at the Institute for Intelligent Systems and Robotics, cover the areas of social signal processing and personal robotics through non-linear signal processing, feature extraction, pattern classification and machine learning. The interdisciplinary IMI2S group gathers researchers from social signal processing, social robotics, psychopathology and neuroscience, and develops models and methods for the analysis, recognition and prediction of social signals and behaviors with a life-span perspective, with particular attention to disorders (autism, Alzheimer's). He has published numerous research papers, including some in high-impact journals (PLOS ONE, Biology Letters, Pattern Recognition, IEEE Transactions on Audio, Speech and Language Processing). He is also the co-chairman of the French Working Group on Human-Robots/Systems Interaction (GDR Robotique CNRS) and a Deputy Coordinator of the Topic Group on Natural Interaction with Social Robots (euRobotics).
Anthropomorphic media design and attention modeling
Mar 10, 2014 11:00 AM
Dr. Tomoko Yonezawa and Ms. Yukari Nakatani
In this talk, we would like to introduce our past work on human-robot and human-agent interaction, focusing especially on the user's attention and gaze communication.
First, in "Communication on Anthropomorphic Media", Dr. Tomoko Yonezawa will present past research on gaze communication and robot behaviors. She will also talk about her current research on touch interaction between humans and a wearable robot.
Second, in "Presences with Avatars' Appearances Attached to Text Communication in Twitter", Ms. Yukari Nakatani will introduce her research on representations of multiple virtual agents for sustainable communication in social networks.
Finally, we will introduce our students' research in our laboratory with some presentation videos.
Building a Multilingual Heritage Corpus with Applications in Geo-Tagging and Machine Translation
Mar 03, 2014 04:00 PM
Martin Volk
In this talk Martin Volk will present the Text+Berg project, an initiative to digitize and annotate all the yearbooks of the Swiss Alpine Club from their start in 1864 until today. The resulting corpus of 40 million words contains texts in the four official Swiss languages, with a large parallel part in German and French. Based on these translations, Martin's group works on domain-specific machine translation systems, but also on search systems for word-aligned parallel corpora as a new resource for translators and linguists. Most of the yearbooks (more than 100'000 pages) were scanned and converted to text at the University of Zurich. Martin Volk will share his experiences on automatically correcting OCR errors as well as on dealing with tokenization, lemmatization and PoS-tagging issues in a corpus that spans 150 years and multiple languages. He will also report on the Text+Berg toponym detection and classification as well as person name recognition and tagging of temporal expressions. Recently the group has released Kokos, a system for collaborative correction of OCR errors in the yearbooks of the 19th century (http://kokos.cl.uzh.ch), and has asked SAC members to join in creating a clean corpus.
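One simple, assumed approach to the OCR-correction problem mentioned above (not necessarily the project's actual pipeline) is to match unknown tokens against a lexicon, as in this sketch:

```python
# Minimal sketch: correcting isolated OCR errors by fuzzy matching against a lexicon.
import difflib

lexicon = {"gipfel", "gletscher", "hütte", "aufstieg", "seil"}   # toy word list

def correct(token, cutoff=0.75):
    if token.lower() in lexicon:
        return token                        # known word, keep as is
    match = difflib.get_close_matches(token.lower(), lexicon, n=1, cutoff=cutoff)
    return match[0] if match else token     # replace only if a close match exists

print([correct(t) for t in ["Gletscber", "Aufstieg", "Sei1"]])
# -> ['gletscher', 'Aufstieg', 'seil']
```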
Martin Volk is Professor of Computational Linguistics at the University of Zurich. His research focuses on multilingual systems, in particular on Machine Translation. His group has been investigating domain adaptation techniques for statistical machine translation, hybrid machine translation for lesser resourced languages, and machine translation into sign language. He is also known for his work on machine translation of film and TV subtitles. Together with Noah Bubenhofer he is leading the Text+Berg project for the digitization and annotation of a large multilingual heritage document as a showcase in the Digital Humanities.
Recent trends and future challenges in action recognition
This talk will overview recent progress and open challenges in human action recognition. Specifically, I will focus on three problems: (i) action representation in video, (ii) weakly-supervised action learning and (iii) ambiguity of action vocabulary. For the first problem, I will overview local feature methods providing state-of-the-art results on current action recognition benchmarks. Motivated by the difficulty of large-scale video annotation, I will next present our recent work on weakly-supervised action learning from videos and corresponding video scripts. I will finish by highlighting limitations of the standard action classification paradigm and will show some of our work addressing this problem.
Ivan Laptev is a research director at INRIA Paris-Rocquencourt, France. He received his PhD degree in Computer Science from the Royal Institute of Technology (KTH) in 2004 and a Master of Science degree from the same institute in 1997. He was a research assistant at the Technical University of Munich (TUM) during 1998-1999. He joined INRIA as a postdoc in 2004 and became a full-time INRIA researcher in 2005. Ivan's main research interests include visual recognition of human actions, objects and interactions. He has published over 50 papers at international conferences and in journals of computer vision and machine learning. He serves as an associate editor of the International Journal of Computer Vision and the Image and Vision Computing Journal; he has been an area chair for CVPR 2010, ICCV 2011, ECCV 2012, CVPR 2013 and ECCV 2014; and he has co-organized several workshops and tutorials on human action recognition at major computer vision conferences. He has also co-organized a series of INRIA summer schools on computer vision and machine learning (2010-2013). Ivan was awarded an ERC Starting Grant in 2012.
Privacy & Trust Challenges in Open Public Display Networks
Jan 21, 2014 11:00 AM

Prof. Marc Langheinrich
Future public displays have the potential to become much more than simple digital signage -- they can form the basis for a novel communication medium. By interconnecting displays and opening them up to applications and content from a wide range of sources, they can not only support individuals and their communities, but also increase their relevance and ultimately their economic benefits. Ultimately, open display networks could have the same impact on society as radio, television and the Internet. In this talk, I will briefly summarize this vision and its related challenges, in particular with respect to privacy and trust, and present the work that we did in this area in the context of a recently finished FET-Open project titled "PD-Net".
Marc Langheinrich is an Associate Professor at the Università della Svizzera italiana (USI) in Lugano, Switzerland. Marc received his PhD (Dr. sc. ETH) on the topic of "Privacy in Ubiquitous Computing" from the ETH Zurich, Switzerland, in 2005. He has published extensively on both privacy and usability of ubiquitous and pervasive computing systems, and is a regular program committee member of various conferences and workshops in the areas of pervasive computing, security and privacy, and usability. Marc currently serves on the editorial board of IEEE Pervasive Computing Magazine and Elsevier's "Personal and Mobile Communications" Journal, and is a Steering Committee member of the UbiComp and IoT conference series.
Cost-effective, Autonomic and Adaptive Cloud Resource Management
Dec 18, 2013 10:00 AM
Thanasis Papaioannou
Current large-scale web applications pose enormous and dynamic processing and storage requirements. Failures of any type are common in current datacenters, partly due to the larger scales of the data stored. As data scales up, maintaining its availability becomes more complex, while different availability levels may be required per application or per data item. At the same time, cloud infrastructures should be able to deal effectively with the elastic nature of these applications in an autonomic manner. To make things worse, as clients become increasingly averse to vendor lock-in and data-unavailability risks, client data has to be efficiently split across clouds. In this talk, we briefly discuss three very effective cloud resource management solutions that address these requirements: Skute, Scarce and Scalia. Skute is a self-managed key-value store that dynamically allocates the resources of a data cloud to several applications in a cost-efficient and fair way. Scarce is a decentralized economic approach for dynamically adapting the cloud resources of various applications, so as to statistically meet their SLA performance and availability goals in the presence of varying loads or failures. Scalia is a cloud storage brokerage solution that continuously adapts the placement of data based on its access pattern, subject to optimization objectives and data placement constraints, such as storage costs and vendor lock-in avoidance.
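To make the data-placement idea concrete, here is a hedged toy sketch (not Scalia's actual algorithm) that picks the cheapest set of storage providers meeting an availability target, assuming full replication and independent failures; the provider numbers are invented.

```python
# Toy multi-cloud placement: cheapest provider subset meeting an availability target.
from itertools import combinations

providers = {                  # hypothetical (cost per GB-month, availability)
    "A": (0.023, 0.995),
    "B": (0.020, 0.990),
    "C": (0.026, 0.999),
    "D": (0.018, 0.985),
}

def cheapest_placement(target=0.9999):
    best = None
    for r in range(1, len(providers) + 1):
        for combo in combinations(providers, r):
            cost = sum(providers[p][0] for p in combo)
            unavailability = 1.0
            for p in combo:
                unavailability *= 1 - providers[p][1]   # independent-failure assumption
            if 1 - unavailability >= target and (best is None or cost < best[0]):
                best = (cost, combo)
    return best

print(cheapest_placement())    # roughly (0.041, ('A', 'D')) with these toy numbers
```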
Dr. Thanasis G. Papaioannou is a senior researcher at the Information Technologies Institute of the Center for Research and Technology Hellas (CERTH). Formerly, he was a postdoctoral fellow at the Distributed Information Systems Laboratory of Ecole Polytechnique Fédérale de Lausanne (EPFL). He received his B.Sc. (1998) and M.Sc. (2000) in Networks and in Parallel/Distributed Systems from the Department of Computer Science, University of Crete, Greece, and his Ph.D. (2007) from the Department of Computer Science, Athens University of Economics and Business (AUEB). From spring 2007 to spring 2008, he was a Visiting Professor in the Department of Computer Science of AUEB, teaching i) Distributed Systems and ii) Networks - Network Security. He has over 45 publications in high quality journals and conferences including Springer Electronic Commerce Research, Elsevier Computer Networks Journal, INFOCOM'13, EDBT'13, CIKM'12, ACM SC'12 (SuperComputing), IEEE ICDE'10, ACM SOCC'10, IEEE CCGRID'11, INFOCOM'08, etc. He has been TPC member in over 25 conferences including SSDBM'14, ICDCS'13, SIGMOD Demo'13, SSDBM'13, SIGMOD Demo'12, SSDBM'12, ICDE'12, SocInfo'10, ICEC '07-09, Valuetools'08, etc.
Statistical methods for environmental modelling and monitoring
Nov 29, 2013 10:00 AM
Dr. Eric A. Lehmann
The CSIRO Division of Computational Informatics (CCI) aims to transform information and decision making to enhance productivity, foster collaboration and deliver impact through services across a wide range of sectors. CCI researchers have in-depth expertise in applying statistical and mathematical methods in a variety of scientific fields including, among others, environmental and agricultural informatics, wireless sensor networks, information and communication technologies for healthcare and clinical treatment, development of early screening tests for Alzheimer's disease (bioinformatics), computational and simulation sciences (high performance computing), as well as statistical modelling for seasonal climate forecasting and complex biogeochemical systems (e.g. marine environments).
This presentation will focus on some aspects of the research being carried out at CCI on applications of statistical and computational methods for environmental modelling and natural resource management. In particular, I will present an overview of my recent work on the following topics:
- multi-sensor integration of remote sensing data for large-scale vegetation mapping and monitoring,
- data fusion methods for water resources assessment using ground-based and remote sensing data, and
- spatial modelling of extreme weather events and associated risks in the context of a changing climate.
These projects involve several aspects of multivariate Bayesian modelling and analysis (spatial and temporal), computational simulation methods (Markov chain Monte Carlo), issues of data quality and continuity, as well as scientific dissemination and stakeholder engagement.
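For readers unfamiliar with the Markov chain Monte Carlo methods mentioned above, the following is a minimal random-walk Metropolis sketch on a toy one-dimensional posterior; it is illustrative only and unrelated to the actual environmental models.

```python
# Minimal random-walk Metropolis sampler (toy Gaussian model, synthetic data).
import numpy as np

def log_post(theta, data):
    # Gaussian likelihood with known unit variance and a N(0, 10^2) prior on the mean
    return -0.5 * np.sum((data - theta) ** 2) - 0.5 * (theta / 10.0) ** 2

def metropolis(data, n_iter=5000, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    theta, samples = 0.0, []
    for _ in range(n_iter):
        prop = theta + step * rng.normal()                         # propose a move
        if np.log(rng.uniform()) < log_post(prop, data) - log_post(theta, data):
            theta = prop                                           # accept
        samples.append(theta)
    return np.array(samples)

data = np.random.default_rng(1).normal(3.0, 1.0, 50)
print(metropolis(data)[1000:].mean())   # posterior mean, close to 3 after burn-in
```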
Eric Lehmann graduated in 1999 from the Swiss Federal Institute of Technology in Zurich (ETHZ) with a Dipl. El.-Ing. ETH diploma (M.Sc. in Electrical Engineering). He received the M.Phil. and Ph.D. degrees, both in Electrical Engineering, from the Australian National University (Canberra) in 2000 and 2004, respectively. From 2004 to 2008, he held various research positions with National ICT Australia (NICTA) in Canberra and the Western Australian Telecommunications Research Institute (WATRI) in Perth, where he was active in the field of acoustics, array signal processing and beamforming, with emphasis on sequential Monte Carlo methods (particle filtering) for acoustic speaker localisation and tracking. He now works as a Research Scientist for CSIRO in Perth, within the division of Computational Informatics. His current work involves the development of statistical image processing techniques for remote sensing imagery (optical and synthetic aperture radar), with a focus on the multi-sensor analysis and integration of spatiotemporal data for environmental mapping and monitoring. He also contributes to the development of Bayesian hierarchical methods for natural resource management and climate modelling purposes.
Robot learning by imitation and exploration with probabilistic dynamical systems
Nov 22, 2013 10:00 AM
Dr. Sylvain Calinon
Robots in current industrial settings reproduce repetitive movements in a stiff and precise manner, with sensory information often limited to the role of stopping the motion if a human or object enters the robot's workspace. The new developments in robot sensors and compliant actuators bring a new human-centric perspective to robotics. An increase of robots in small and medium-sized enterprises (SMEs) is predicted for the next few years. Products in SMEs are characterized by small batch sizes, short life-cycles and end-user driven customization, requiring frequent re-programming of the robot. SMEs also often involve confined spaces, so that the robots must work in safe collaboration with the users by generating natural movements and anticipating co-workers' movements with active perception and human activity understanding.
Interestingly, these robots are much closer to human capabilities in terms of compliance, precision and repeatability. In contrast to previous technology, the planning, control, sensing and interfacing aspects must work hand-in-hand, where the robot is only one part of a broader robotics-based technology. The variety of signals to process and the richness of interaction with the users and the environment constitute a formidable area of research for machine learning.
Current programming solutions used by the leading commercial robotics companies do not satisfy the new requirements of re-using the same robot for different tasks and interacting with multiple users. The representation of manipulation movements must be augmented with forces (for task execution, but also as a communication channel for collaborative manipulation), compliance and reactive behaviors. An attractive approach to the problem of transferring skills to robots is to take inspiration from the way humans learn by imitation and self-refinement.
I will present a task-parametrized model based on dynamic movement primitives and Gaussian mixture regression to exploit the local correlations in the movement and the varying accuracy requirements of the task. The model is used to devise a controller for the robot that can adapt to new situations and that is safe for the surrounding users. Examples of applications with a compliant humanoid and with gravity-compensated manipulators will be showcased.
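The conditioning step at the heart of Gaussian mixture regression can be sketched as follows for a toy one-dimensional input and output; the task-parametrized model presented in the talk is considerably richer, so this is only an illustration.

```python
# Hedged sketch of Gaussian mixture regression (GMR): E[y | x] under a joint GMM.
import numpy as np

# Joint GMM over x (e.g. time) and y (e.g. a joint angle): weights, means, covariances
weights = np.array([0.5, 0.5])
means = np.array([[0.2, 1.0], [0.8, -1.0]])
covs = np.array([[[0.05, 0.01], [0.01, 0.1]],
                 [[0.05, -0.01], [-0.01, 0.1]]])

def gmr(x):
    """Responsibility-weighted conditional means of each Gaussian component."""
    resp, cond_means = [], []
    for w, mu, S in zip(weights, means, covs):
        var_x = S[0, 0]
        resp.append(w * np.exp(-0.5 * (x - mu[0]) ** 2 / var_x) / np.sqrt(var_x))
        cond_means.append(mu[1] + S[1, 0] / var_x * (x - mu[0]))
    resp = np.array(resp) / np.sum(resp)
    return float(np.dot(resp, cond_means))

print(gmr(0.2), gmr(0.8))   # roughly 0.95 and -0.95: each component dominates near its input region
```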
Dr Sylvain Calinon is Team Leader of the Learning and Interaction Lab at the Italian Institute of Technology (IIT), and a visiting researcher at the Learning Algorithms and Systems Laboratory (LASA), Ecole Polytechnique Fédérale de Lausanne (EPFL). He received a PhD on robot programming by demonstration in 2007 from LASA, EPFL, which was awarded by the Robotdalen Scientific Award, ABB Award and EPFL-Press Distinction. From 2007 to 2009, he was a postdoctoral research fellow at LASA, EPFL. His research interests cover robot learning by imitation, machine learning and human-robot interaction. Webpage: http://programming-by-demonstration.org/SylvainCalinon/
Quality in Face and Iris Research
Nov 20, 2013 10:30 AM
Dr. Stephanie Schuckers
Because of limited resources (e.g. number and type of cameras, amount of time to focus on an individual, real-time processing power), using intelligence within standoff biometric capture systems can help determine which individuals to focus on and for how long. Benchmark datasets available to the general research community are needed for designing a stand-off multimodal biometric system. The overall goal of the research is to investigate fusion approaches for measuring face, iris, and voice through experiments on identification at distances from 10 to 25 meters. This research includes a growing corpus of data, the Quality in Face and Iris Research Ensemble (Q-FIRE) dataset, which includes the following: (1) Q-FIRE Release 1 (made available in early 2010) is composed of 4T of face and iris video for 90 subjects out to 8.3 meters (25 feet) with controlled quality degradation. (2) Release 2 is an additional 83 subjects with the same collection specifications. Releases 1 and 2 were used by NIST in IREX II: Iris Quality Calibration and Evaluation (IQCE). (3) Lastly, an extension of the dataset has been collected with unconstrained subject behavior on the same set of subjects, entitled Q-FIRE Phase II Unconstrained, out to 8.3 meters. In this talk, the datasets will be described as well as results of experiments fusing face and iris scores with quality.
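As a hedged illustration of quality-based fusion (not the actual rules evaluated on Q-FIRE), the sketch below weights each modality's score by its normalized quality measure.

```python
# Toy quality-weighted score-level fusion for face and iris (illustrative only).
import numpy as np

def fuse(face_score, iris_score, face_quality, iris_quality):
    """Weighted sum where each modality's weight is its normalized quality."""
    q = np.array([face_quality, iris_quality], dtype=float)
    w = q / q.sum()
    return w[0] * face_score + w[1] * iris_score

# A blurry iris sample (low quality) contributes less to the fused score.
print(fuse(face_score=0.7, iris_score=0.2, face_quality=0.9, iris_quality=0.3))
```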
http://people.clarkson.edu/~sschucke/
Multimodal Interaction with Humanoid Robots
Nov 19, 2013 10:00 AM
Prof. Kristiina Jokinen
In this talk I will discuss issues related to multimodal interaction with intelligent agents, and in particular, present the Nao Wikitalk, an application that enables the user to query Wikipedia via the Nao robot. The robot can talk about an unlimited range of topics, so it supports open-domain conversations using Wikipedia as a knowledge source. The robot suggests some topics to start with, and the user can shift to related topics by speaking the topic names after the robot mentions them. The user can also switch to a totally new topic by spelling the first few letters. The challenge in presenting Wikipedia information is how to convey its structure to the user so that she can understand what is new information, and how to navigate in the topic structure. In Wikipedia, new relevant information is marked with hyperlinks to other entries, and the robot's interaction capabilities have been extended so that it signals these links non-verbally while reading the text. As well as speaking, the robot uses gestures, nods and other multimodal signals to enable clear and rich interaction. Gesture and posture changes can also be used to manage turn-taking, and to add liveliness to the interaction in general. To manage the interaction in a smooth way, it is also important to capture the user's emotional and attentional state. For this, we have experimented with gazing and face tracking to infer the user's interest level. The Nao WikiTalk system was evaluated by comparing the users' expectations with their experience of the robot interaction. In many respects the users had high expectations regarding the robot's interaction capabilities, but they were impressed by the robot's lively appearance and natural gesturing.
Kristiina Jokinen is Adjunct Professor and Research Manager at the University of Helsinki; she is also Adjunct Professor of Interaction Technology at the University of Tampere, Finland, and Visiting Professor at the University of Tartu, Estonia. She received her PhD from the University of Manchester, UK, and spent altogether four years as a post-doc at NAIST and as an invited researcher at ATR in Japan. In 2009-2010 she was Visiting Professor at Doshisha University in Kyoto. Her research focuses on spoken dialogue modelling, multimodal interaction management (especially gestures and eye gaze), natural language communication, and human-machine interaction. She has published many papers and articles, and three books: "Constructive Dialogue Modelling - Speech Interaction and Rational Agents" (John Wiley), "Spoken Dialogue Systems" (together with M. McTear; Morgan & Claypool), and "New Trends in Speech-based Interactive Systems" (edited together with F. Chen; Springer). She has been an invited speaker, e.g. at IWSDS 2010 and the Multimodal Symposium in 2013. She organised the Nordic Research Training Course "Feedback, Communicative Gesturing, and Gazing" in Helsinki in 2011, and led the summer workshop "Speech, gaze and gesturing - multimodal conversational interaction with the Nao robot" in Metz, together with Graham Wilcock, in 2012. She has had several national and international cooperation projects and has served on several programme and review committees. She is Programme Chair for the 2013 International Conference on Multimodal Interaction (ICMI), and she is Secretary-Treasurer of SIGdial, the ACL/ISCA Special Interest Group for Discourse and Dialogue.
Advancing bio-microscopy with the help of image processing
Nov 18, 2013 10:00 AM
Prof. Michael Liebling
Image processing in bio-microscopy is no longer confined to the post-processing stage, but has gained wide acceptance as an integral part of the image acquisition process itself, as it allows overcoming hard limits set by instrumentation and biology. In this talk, I will present my lab's efforts to image dim and highly dynamic biological samples by boosting the temporal and spatial resolution of optical microscopes via software solutions and modified imaging protocols. Focusing on spatio-temporal image registration strategies to build 3D+time models of samples with repetitive motions, a superresolution algorithm to reconstruct image sequences from multiple low temporal resolution acquisitions, and a fast multi-channel deconvolution algorithm for multi-view imaging, I will illustrate the central role signal processing can play in advancing bio-imaging. I will share the approaches we implemented in my group to rapidly bring new ideas from theory to full deployment in remote biology labs, where our tools can be applied with a variety of microscopy types. Finally, I will speculate on the future of image processing in bio-microscopy and suggest areas where efforts may be most rewarding.
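One classical building block of such deconvolution pipelines is the Richardson-Lucy iteration, sketched here on a toy one-dimensional signal; the lab's multi-view, multi-channel algorithms go well beyond this illustration.

```python
# Minimal Richardson-Lucy deconvolution sketch (1-D toy signal, assumed illustration).
import numpy as np

def richardson_lucy(observed, psf, n_iter=50):
    psf_mirror = psf[::-1]
    estimate = np.full_like(observed, observed.mean())      # flat initial estimate
    for _ in range(n_iter):
        blurred = np.convolve(estimate, psf, mode="same")
        ratio = observed / np.maximum(blurred, 1e-12)        # avoid division by zero
        estimate *= np.convolve(ratio, psf_mirror, mode="same")
    return estimate

signal = np.zeros(100); signal[30] = 1.0; signal[60] = 0.5
psf = np.exp(-0.5 * (np.arange(-5, 6) / 2.0) ** 2); psf /= psf.sum()
observed = np.convolve(signal, psf, mode="same")
print(richardson_lucy(observed, psf).round(2)[25:35])   # sharpened peak near index 30
```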
Michael Liebling is an Associate Professor of Electrical and Computer Engineering at the University of California, Santa Barbara (UCSB). He received the MS in Physics (2000) and PhD in image processing (2004) from EPFL. From 2004 to 2007, he was a Postdoctoral Scholar in Biology at the California Institute of Technology, before joining the faculty in the Department of Electrical and Computer Engineering in 2007, first as an Assistant Professor and, since Summer 2013, as an Associate Professor. His research interests include biological microscopy and image processing for the study of dynamic biological processes and, more generally, computational methods for optical imaging. He teaches at both the graduate and undergraduate level in the areas of signal processing, image processing and biological microscopy. Michael Liebling is a recipient of prospective and advanced researcher fellowships from the Swiss National Science Foundation and a 2011 Hellman Family Faculty Fellowship. He is vice-chair (2014 chair-elect) of the IEEE Signal Processing Society's Bio-Imaging and Signal Processing technical committee and was Technical Program co-chair of the IEEE International Symposium on Biomedical Imaging in 2011 and 2013.
Human-Centered Computing for Critical Multimodal Cyber-Physical Environments
Nov 05, 2013 11:00 AM
Dr. Nadir Weibel
Critical cyber-physical environments such as the ones found in many healthcare settings or on the flight deck of modern airplanes are built on complex systems characterized by important properties spanning the physical and digital world, and centered on human activity. In order to properly understand this critical activity, researchers need to first understand the context and environment in which the activity is situated. Central in those environments is often interaction with the available technology and the communication between the individuals, both of which often involve multiple parallel modalities. Only an in-depth understanding of the properties of these multimodal distributed environments can inform the design and development of multimodal human-centered computing.
After presenting an overview of my current research in human-centered computing, this talk will present some of the challenges and proposed solutions in terms of technologies and theoretical frameworks for collecting and making sense of rich multimodal data in two critical cyber-physical environments: the cockpit of a Boeing 787 airplane, and the medical office. The talk will explain how the combination of a range of data collection devices such as depth cameras, eye tracking, digital-pens, and HD video cameras, combined with powerful data visualization and a flexible analysis suite, allows in-depth understanding of those complex environments. I will end with a discussion of cutting-edge multimodal technology and how devices such as depth cameras and wearable augmented reality glasses open up a range of opportunities to develop new technology for knowledge workers of critical cyber-physical environments.
Dr. Nadir Weibel is a Research Assistant Professor in the Department of Computer Science and Engineering at the University of California San Diego (UCSD), where he teaches human-computer interaction and ubiquitous computing. His research is situated at the intersection of computer science, cognitive science, communication, health and social sciences. Dr. Weibel investigates tools, techniques and infrastructure supporting the deployment of innovative interactive multimodal and tangible devices in context, and studies the cognitive consequences of introducing this technology into everyday life. Current work focuses on interactive physical-digital systems that exploit pen-based and touch-based devices, depth cameras, wearable and mobile devices, in settings involving critical populations such as healthcare and education. Dr. Weibel is the author of more than 45 publications on these topics. His work has been funded by the Swiss National Science Foundation, the European Union, Boeing, the US NSF, NIH and AHRQ.
Interacting with the Embodied Mind
Humans do not think like computers. Our minds are 'designed' for us to function as embodied beings in the world in ways that are: 1. Physical-Spatial; 2. Temporal-Dynamic; 3. Social-Cultural; and 4. Affective-Emotional. These aspects of embodiment give us four lenses through which to understand the embodied mind and how computation and technology may support its function. I adopt a two-pronged approach to human-computer interaction research: first, harnessing technological means to contribute to the understanding of how embodiment ultimately ascends into mind, and second, informing the design and engineering of technologies that support and augment the human higher psychological functions of learning, sensemaking, creating, and experiencing.
In line with the first approach, I shall first show how language, as a core human capacity, is rooted in human embodied function. We will see that mental imagery shapes multimodal (gesture, gaze, and speech) human discourse. In line with the second approach, I shall then present an assemblage of interactive projects that illustrate how our concept of human embodiment can inform technology design through the light of our four lenses. Projects cluster around three application domains, namely 1. Technology for special populations (e.g. mathematics instruction and reading for the blind, games for older adults); 2. Learning and Education (e.g. learning and knowledge discovery through device/display ecologies, creativity support for children); and 3. Experience (e.g. socially-based information access, experience of images, affective communication).
Francis Quek is currently Professor of Visualization and a TAMU Chancellor's Research Initiative hire at Texas A&M University. He was formerly Professor of Computer Science, Director of the Center for Human-Computer Interaction, and Director of the Vision Interfaces and Systems Laboratory at Virginia Tech. He has previously been affiliated with Wright State University, the University of Illinois at Chicago, the University of Michigan, and Hewlett-Packard. Francis received both his B.S.E. summa cum laude (1984) and M.S.E. (1984) in electrical engineering from the University of Michigan. He completed his Ph.D. in Computer Science at the same university in 1990. Francis is a member of the IEEE and ACM. He performs research in embodied interaction, embodied learning and sensemaking, interactive systems for special populations (individuals who are blind, children, older adults), systems to support learning and creativity in children, multimodal verbal/non-verbal interaction, multimodal meeting analysis, vision-based interaction, multimedia databases, medical imaging, assistive technology for the blind, human-computer interaction, computer vision, and computer graphics. He has published over 150 peer-reviewed journal and conference articles in human-computer interaction, computer vision, and medical imaging.
Technology Innovation and Related Partnerships – Case Idiap and Nokia
Oct 10, 2013 10:45 AM
Dr. Juha K. Laurila
This talk focuses on technology-related innovation within companies like Nokia, covering the flow from early-phase ideas to technology transfer and productization. Further, the role of research partnerships as part of the overall innovation process is discussed. More specifically, various modes of industry-academia collaboration and the related drivers for each of them are briefly covered. Aspects like technology licensing are also touched on briefly.
More particularly, this presentation focuses on the collaboration between Idiap and Nokia as a case study and investigates the role of Idiap-Nokia interactions from the perspective of the overall innovation chain. This part covers, for example, Idiap's contribution to Nokia's Call for Research Proposals in 2008, joint initiatives around mobile data (the Lausanne Data Collection Campaign 2009-2012 and the Mobile Data Challenge 2011-2012), as well as bilateral research projects.
The power of the cellphone: small devices for big impact
There are almost as many mobile phones in the world as humans. The mobile phone is the piece of technology with the highest level of adoption in human history. We carry them with us all through the day (and, in many cases, the night). Mobile phones have therefore become large-scale sensors of human activity and also the most personal of devices.
In my talk, I will present some of the work that we are doing at Telefonica Research in the area of mobile computing, both in terms of analyzing and understanding large-scale human behavioral data from mobile traces and in designing novel mobile systems in the areas of healthcare, education and information access.
The LiveLabs Urban LifeStyle Innovation Platform : Opportunities, Challenges, and Current Results
Sep 13, 2013 03:00 PM
Rajesh K. Balan
A central question in mobile computing is how to test mobile applications that depend on real context, in real environments, with real users. User studies done in lab environments are frequently insufficient to understand the real-world interactions between user context, environmental factors, application behaviour, and performance results. In this talk, I will describe LiveLabs, a new 5-year project that started at Singapore Management University in early 2012. The goal of LiveLabs is to convert four real environments, namely the entire Singapore Management University campus, a popular resort island, a large airport, and a popular shopping mall, into living testbeds where we instrument both the environment and the cell phones of opted-in participants (drawn from the student population and members of the public). We can then provide third-party companies and researchers the opportunity to test their mobile applications and scenarios on the opted-in participants -- on their real phones in the four real environments described above. LiveLabs will provide the software necessary to collect network statistics and any necessary context information. In addition, LiveLabs will provide software and mechanisms to ensure that privacy, proper participant selection, resource management, and experimental results and data are maintained and provided on a need-to-know basis to the appropriate parties.
I will describe the broad LiveLabs vision and identify the key research challenges and opportunities. In particular, I will highlight our current insight into indoor location tracking, dynamic group and queue detection, and energy aware context sensing for mobile phones.
Detecting Conversing Groups in Still Images
Sep 13, 2013 11:00 AM
Hayley Hung
In our daily lives, we cannot help but communicate with people. Aside from organised and more structured communication like emails, meetings, or phone calls, we communicate instantaneously and often in ad hoc, freely formed groups where it is not known beforehand how long the conversation will last, who will be in it, or what it will be about. In crowded settings like a conference, for example, this type of conversing group exists, and who gravitates towards whom tells us a lot about the relationships between the members of the group. In this talk, I will discuss the challenges of this problem, solutions, and open questions of this emerging topic.
Biometric Recognition: Sketch to photo matching, Tattoo Matching and Fingerprint Obfuscation
http://biometrics.cse.msu.edu
http://scholar.google.com/citations?user=g-_ZXGsAAAAJ&hl=en
If you are like many people, navigating the complexities of everyday life depends on an array of cards and passwords that confirm your identity. But lose a card, and your ATM will refuse to give you money. Forget a password, and your own computer may balk at your command. Allow your card or passwords to fall into the wrong hands, and what were intended to be security measures can become the tools of fraud or identity theft. Biometrics - the automated recognition of people via distinctive anatomical and behavioral traits - has the potential to overcome many of these problems.
Biometrics is not a new idea. Pioneering work by several British scholars, including Faulds, Galton and Henry in the late 19th century, established that fingerprints exhibit a unique pattern that persists over time. This set the stage for the development of Automatic Fingerprint Identification Systems that are now used by law enforcement agencies worldwide. The success of fingerprints in law enforcement, coupled with growing concerns related to homeland security, financial fraud and identity theft, has generated renewed interest in research and development of biometric systems. It is, therefore, not surprising to see biometrics permeating our society (laptops and mobile phones, border crossing, civil registration, and access to secure facilities). Despite these successful deployments, biometrics is not a panacea for human recognition. There are challenges related to data acquisition, image quality, robust matching, multibiometrics, biometric system security and user privacy. This talk will introduce three challenging problems of particular interest to law enforcement and border crossing agencies: (i) face sketch to photo matching, (ii) scars, marks & tattoos (SMT) and (iii) fingerprint obfuscation.
Anil K. Jain is a University Distinguished Professor in the Department of Computer Science at Michigan State University, where he conducts research in pattern recognition, computer vision and biometrics. He has received the Guggenheim Fellowship, the Humboldt Research Award, the Fulbright Fellowship, the IEEE Computer Society Technical Achievement Award, the W. Wallace McDowell Award, the IAPR King-Sun Fu Prize, and the ICDM Research Award for contributions to pattern recognition and biometrics. He served as Editor-in-Chief of the IEEE Transactions on Pattern Analysis and Machine Intelligence and is a Fellow of the ACM, IEEE, AAAS, IAPR and SPIE. Holder of eight patents in biometrics, he is the author of several books. ISI has designated him a highly cited author. He served as a member of the National Academies panels on Information Technology, Whither Biometrics, and Improvised Explosive Devices (IEDs). He also served as a member of the Defense Science Board. His h-index is 137 (source: Google Scholar).
Component Analysis for Human Sensing
Enabling computers to understand human behavior has the potential to revolutionize many areas that benefit society such as clinical diagnosis, human computer interaction, and social robotics. A critical element in the design of any behavioral sensing system is to find a good representation of the data for encoding, segmenting, classifying and predicting subtle human behavior. In this talk I will propose several extensions of Component Analysis (CA) techniques (e.g., kernel principal component analysis, support vector machines, spectral clustering) that are able to learn spatio-temporal representations or components useful in many human sensing tasks.
In the first part of the talk I will give an overview of several ongoing projects in the CMU Human Sensing Laboratory, including our current work on depression assessment from videos. In the second part, I will show how several extensions of CA methods outperform state-of-the-art algorithms in problems such as facial feature detection and tracking, temporal clustering of human behavior, early detection of activities, weakly-supervised visual labeling, and robust classification. The talk will be adaptive, and I will discuss the topics of major interest to the audience.
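As a rough illustration of two of the Component Analysis building blocks named above (kernel PCA and spectral clustering, not Dr. De la Torre's own extensions), the following Python sketch reduces synthetic "behavior descriptor" windows with kernel PCA and then groups them with spectral clustering; all data, parameter values and the two-cluster setup are made up for the example.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.cluster import SpectralClustering

# Synthetic "behavior descriptors": two groups of 10-D feature windows
# standing in for pose/appearance features extracted from video.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=-1.0, scale=0.3, size=(50, 10)),
               rng.normal(loc=1.0, scale=0.3, size=(50, 10))])

# Non-linear dimensionality reduction with kernel PCA ...
embedding = KernelPCA(n_components=2, kernel="rbf", gamma=0.5).fit_transform(X)

# ... followed by spectral clustering to group similar behaviour windows,
# a crude stand-in for temporal clustering of human behavior.
labels = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                            n_neighbors=10, random_state=0).fit_predict(embedding)
```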
Fernando De la Torre received his B.Sc. degree in Telecommunications (1994) and his M.Sc. (1996) and Ph.D. (2002) degrees in Electronic Engineering from La Salle School of Engineering at Ramon Llull University, Barcelona, Spain. In 2003 he joined the Robotics Institute at Carnegie Mellon University, and since 2010 he has been a Research Associate Professor. Dr. De la Torre's research interests include computer vision and machine learning, in particular face analysis, optimization and component analysis methods, and their applications to human sensing. He is an Associate Editor of IEEE PAMI and leads the Component Analysis Laboratory (http://ca.cs.cmu.edu) and the Human Sensing Laboratory (http://humansensing.cs.cmu.edu).
Signal Analysis using Autoregressive Models of Amplitude Modulation
Aug 23, 2013 11:00 AM
Dr. Sriram Ganapathy
Conventional speech analysis techniques are based on estimating the spectral content of relatively short (about 10-20 ms) segments of the signal. However, an alternate way to describe a speech signal is a long-term summation of amplitude modulated frequency bands, where each frequency band consists of a smooth envelope (gross structure) modulating a carrier signal (fine structure). We develop an auto-regressive (AR) modeling approach for estimating the smooth envelope of the sub-band signal. This model, referred to as frequency domain linear prediction (FDLP), is based on the application of linear prediction on discrete cosine transform of the signal and it describes the perceptually dominant peaks in the signal while removing the finer details. This suppression of detail is useful for developing a parametric representation of speech/audio signals. In this talk, I will also show several applications of the FDLP model for speech and audio processing systems.
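To make the FDLP idea above more concrete, here is a minimal Python sketch (not the speaker's implementation): linear prediction is applied to the DCT of a segment, and the resulting all-pole model gives a smooth temporal envelope of the signal. The model order, grid size and toy signal are arbitrary choices for illustration.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import toeplitz
from scipy.signal import freqz

def fdlp_envelope(x, order=20, n_points=400):
    # The DCT swaps the roles of time and frequency, so an AR (linear
    # prediction) model fitted on the DCT coefficients describes the
    # temporal envelope rather than the spectral envelope.
    c = dct(x, type=2, norm='ortho')
    # Autocorrelation method of linear prediction on the DCT sequence.
    r = np.correlate(c, c, mode='full')[len(c) - 1:]
    a = np.linalg.solve(toeplitz(r[:order]), -r[1:order + 1])
    # The squared magnitude response of the all-pole model, sampled on a
    # grid, is a smooth envelope across the analysed segment.
    _, h = freqz([1.0], np.concatenate(([1.0], a)), worN=n_points)
    return np.abs(h) ** 2

# Toy example: a 3 Hz amplitude modulation on a 440 Hz carrier (1 s at 8 kHz).
t = np.arange(8000) / 8000.0
sig = (1.0 + 0.8 * np.sin(2 * np.pi * 3 * t)) * np.sin(2 * np.pi * 440 * t)
envelope = fdlp_envelope(sig)
```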
In the last leg of the talk, I will focus on our recent efforts at IBM for speech analysis in noisy radio communication channels. This will highlight the challenges involved along with a few solutions addressing parts of the problem.
Sriram Ganapathy received his Doctor of Philosophy from the Center for Language and Speech Processing, Johns Hopkins University, in January 2012. Prior to this, he obtained his Bachelor of Technology from the College of Engineering, Trivandrum, India, in 2004 and his Master of Engineering from the Indian Institute of Science, Bangalore, in 2006. He worked as a Research Assistant at the Idiap Research Institute, Switzerland, from 2006 to 2008 on speech and audio projects. Currently, he is a post-doctoral researcher at the IBM T.J. Watson Research Center working on signal analysis methods for radio communication speech in highly degraded environments. His research interests include signal processing, machine learning and robust methodologies for speech and speaker recognition.
Three Factor Authentication for Commodity Hand-Held Communication Devices
Jul 17, 2013 02:00 PM
Prof Brian C. Lovell
User authentication to online services is at a crossroads. Attacks are increasing, and current authentication schemes are no longer able to provide adequate protection. The time has come to include the third factor of authentication and start using biometrics to authenticate people. However, despite significant progress in biometrics, they still suffer from a major mode of attack: replay attacks, where biometric signals may be captured previously and reused. Replay attacks defeat all current liveness tests. Current literature recognises replay attacks as a significant issue, but there are no practical and tested solutions available today. The purpose of this research is to improve authentication to online services by including a face recognition biometric, as well as providing one solution to the replay attack problem for the proposed face recognition system. If this research is successful, it will enable the use of enhanced authentication mechanisms on mobile devices and open new research into methods of addressing biometric replay attacks.
Brian C. Lovell was born in Brisbane, Australia in 1960. He received a BE in electrical engineering (Honours I) in 1982, a BSc in computer science in 1983, and a PhD in signal processing in 1991, all from the University of Queensland (UQ). Professor Lovell is Project Leader of the Advanced Surveillance Group in the School of ITEE, UQ. He served as President of the International Association for Pattern Recognition 2008-2010, is a Fellow of the IAPR, a Senior Member of the IEEE and a Fellow of the IEAust, and has been the voting member for Australia on the Governing Board of the International Association for Pattern Recognition since 1998. Professor Lovell was Program Co-Chair of ICPR 2008 in Tampa, Florida, General Co-Chair of ACPR 2011 in Beijing, and General Co-Chair of ICIP 2013 in Melbourne. His Advanced Surveillance Group works with port, rail and airport organizations as well as several national and international agencies to identify and develop solutions addressing operational and security concerns. http://itee.uq.edu.au/~lovell/ http://scholar.google.com.au/citations?user=gXiGxcMAAAAJ&hl=en
Biosignals and Interfaces
May 14, 2013 11:00 AM
Prof. Tanja Schultz
Human communication relies on signals like speech, facial expressions, or gestures, and the interpretation of these signals seems to be innate to humans. In contrast, human interaction with machines, and thus human communication mediated through machines, is far from natural. To date, it is restricted to a few channels, and the capabilities of machines to interpret human signals are still very limited.
At the Cognitive Systems Lab (CSL) we explore human-centered cognitive systems to improve human-machine interaction as well as machine-mediated human communication. We aim to benefit from the strengths of machines by departing from merely mimicking the human way of communication. Rather, we focus on the full range of biosignals emitted by the human body, such as electrical biosignals from brain and muscle activity. These signals can be directly measured and interpreted by machines, leveraging emerging wearable, small and wireless sensor technologies. Using these biosignals offers an inside perspective on human mental activities, intentions, or needs, and thus complements the traditional way of observing humans from the outside.
In my talk I will discuss ongoing research on "Biosignals and Interfaces" at CSL, such as speech recognition, silent speech interfaces that rely on articulatory muscle movement, and interfaces that use brain activity to determine users' mental states, such as task activity, cognitive workload, attention, emotion, and personality. We hope that our research will lead to a new generation of human centered systems, which are completely aware of the users' needs and provide an intuitive, efficient, robust, and adaptive input mechanism to interaction and communication.
Tanja Schultz received her Ph.D. and Master's degrees in Computer Science from the University of Karlsruhe, Germany, in 2000 and 1995 respectively, and a German Staatsexamen in Mathematics, Sports, and Educational Science from the University of Heidelberg in 1990. She joined Carnegie Mellon University in 2000 and became a Research Professor at the Language Technologies Institute. Since 2007 she has also been a Full Professor at the Department of Informatics of the Karlsruhe Institute of Technology (KIT) in Germany. She is the director of the Cognitive Systems Lab, where her research activities focus on human-machine interfaces, with a particular area of expertise in the rapid adaptation of speech processing systems to new domains and languages. She co-edited a book on this subject and received several awards for this work. In 2001 she received the FZI prize for an outstanding Ph.D. thesis. In 2002 she was awarded the Allen Newell Medal for Research Excellence from Carnegie Mellon for her contribution to speech translation, and the ISCA best paper award for her publication on language-independent acoustic modeling. In 2005 she received the Carnegie Mellon Language Technologies Institute Junior Faculty Chair. Her recent research focuses on human-centered technologies and intuitive human-machine interfaces based on biosignals, by capturing, processing, and interpreting signals such as muscle and brain activity. Her development of silent speech interfaces based on myoelectric signals was among the top-ten most important attractions at CeBIT 2010, received best demo and paper awards in 2006 and 2013, and was awarded the Alcatel-Lucent Research Award for Technical Communication in 2012. Tanja Schultz is the author of more than 250 articles published in books, journals, and proceedings. She has been a member of the German Informatics Society (GI) for more than 20 years, and is a member of the IEEE Computer Society and the International Speech Communication Association (ISCA), where she is serving her second term as an elected ISCA Board member.
Perceptually motivated speech recognition and mispronunciation detection
Dec 12, 2012 04:00 PM
Christos Koniaris, PhD.
Chris will be presenting his doctoral thesis as the result of a research effort performed in two fields of speech technology, i.e., speech recognition and mispronunciation detection. Although the two areas are clearly distinguishable, the proposed approaches share a common hypothesis based on psychoacoustic processing of speech signals. The conjecture implies that the human auditory periphery provides a relatively good separation of different sound classes. Hence, it is possible to use recent findings from psychoacoustic perception together with mathematical and computational tools to model the auditory sensitivities to small speech signal changes.
Incorporation of phonetic constraints in acoustic-to-articulatory inversion
Dec 10, 2012 10:00 AM
Blaise Potard, PhD.
Blaise will be talking about his doctoral research on the acoustic-to-articulatory inversion problem. The main aim of his Ph.D. was to investigate the use of additional constraints (phonetic and visual) to improve the realism of the solutions found by an existing inversion framework. This research was conducted at LORIA, Nancy, France, under the supervision of Yves Laprie.
Grapheme-to-Phoneme (G2P) Training and Conversion with WFSTs
Jul 30, 2012 01:30 PM
Josef Novak
The talk is of a tutorial nature: a hands-on introduction to using some of the features of the OpenFst-based G2P toolkit Phonetisaurus, developed by Josef Novak, together with some high-level background information and a description of the features, shortcomings, and goals of the toolkit.
The slides, a special tutorial distribution, and cut-and-paste terminal commands in wiki format can be found on the Phonetisaurus Google Code site.
Home page and code:
http://code.google.com/p/phonetisaurus/ (see the downloads' section of the lefthand sidebar)
Copy-and-paste tutorial companion:
http://code.google.com/p/phonetisaurus/wiki/FSMNLPTutorial
Josef Novak is currently a Ph.D. student in Hirose-Minematsu laboratory, in the EEIC department at the University of Tokyo. More information: http://www.gavo.t.u-tokyo.ac.jp/~novakj/
On the beauty of Online Selective Sampling
May 02, 2012 11:00 AM
Francesco Orabona
Online selective sampling is an active variant of online learning in which the learner is allowed to adaptively subsample the labels of an observed sequence of feature vectors. The learner's goal is to achieve a good trade-off between the mistake rate and the number of sampled labels. This can be viewed as an abstract protocol for interactive learning applications. For example, a system for categorizing stories in a newsfeed asks for human supervision whenever it feels that more training examples are needed to maintain the desired accuracy.
A formal, almost assumption-free theory that allows exact confidence values to be calculated for the predictions will be presented. Using this theory, two selective sampling algorithms that use regularized least squares (RLS) as the base classifier will be shown. These algorithms have formal guarantees on their performance and on the maximum number of labels queried. Moreover, RLS is easy and efficient to implement, and empirical results will be shown as well to validate the theoretical results.
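A minimal Python sketch of the general selective-sampling idea, not of the specific algorithms or confidence bounds presented in the talk: an online RLS learner predicts on every example but queries a label only when its own prediction looks uncertain. The query rule, regularization value, and toy data below are deliberately simplified stand-ins.

```python
import numpy as np

def selective_sampling_rls(X, label_oracle, reg=1.0):
    """Predict on every incoming example; query the label only when the
    RLS margin is small relative to the uncertainty of the estimate."""
    n, d = X.shape
    A = reg * np.eye(d)                  # regularized correlation matrix
    b = np.zeros(d)
    predictions, n_queries = [], 0
    for t in range(n):
        x = X[t]
        A_inv = np.linalg.inv(A)
        w = A_inv @ b                    # current RLS weight vector
        margin = float(w @ x)            # signed confidence of the prediction
        variance = float(x @ A_inv @ x)  # RLS uncertainty for this input
        predictions.append(1.0 if margin >= 0 else -1.0)
        # Simplified query rule: ask for supervision only when uncertain.
        if abs(margin) <= np.sqrt(variance):
            y = float(label_oracle(t))   # +1 / -1 label from the annotator
            A += np.outer(x, x)
            b += y * x
            n_queries += 1
    return np.array(predictions), n_queries

# Toy usage: a linearly separable stream with a simulated annotator.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
true_w = rng.normal(size=5)
oracle = lambda t: 1.0 if X[t] @ true_w >= 0 else -1.0
preds, asked = selective_sampling_rls(X, oracle)
```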
Overview of some research activities at Australia's Commonwealth Scientific and Industrial Research Organisation (CSIRO)
Apr 20, 2012 02:00 PM
Eric Lehmann
CSIRO is Australia's national science agency and one of the largest and most diverse research organisations in the world. It employs over 6000 scientists at more than 50 centres throughout Australia and overseas. The core research undertaken at CSIRO focuses on the main challenges facing Australia at the present time, and includes research areas such as health, agriculture and food supply, mineral resources and mining, information and communication technologies, understanding climate change, and sustainable management of the environment, the oceans and water resources. In this talk, I will present an overview of my recent research work at CSIRO, which involves aspects of Bayesian filtering and hierarchical modelling for applications related to environmental mapping and monitoring, and model-data fusion for water resource assessment at continental scale.
About the presenter:
Eric Lehmann graduated in 1999 from the Swiss Federal Institute of Technology in Zurich (ETHZ) with a Diploma in Electrical Engineering. He received the M.Phil. and Ph.D. degrees, both in Electrical Engineering, from the Australian National University (Canberra) in 2000 and 2004 respectively. Between 2004 and 2008, he held various research positions with National ICT Australia (NICTA) in Canberra and the Western Australian Telecommunications Research Institute (WATRI) in Perth, WA, where he was active in the field of acoustics and array signal processing, with emphasis on sequential Monte Carlo methods (particle filtering) for acoustic speaker tracking. He is now working as a Research Scientist for CSIRO in Perth, within the division of Mathematics, Informatics and Statistics. His current work involves the development of statistical image processing techniques for remote sensing imagery (optical and synthetic aperture radar), with a focus on the multi-sensor analysis and integration of spatio-temporal data for environmental mapping and monitoring. He also contributes to the scientific research on Bayesian hierarchical methods for the assimilation of soil moisture satellite data with modeled estimates (model-data fusion) for water resource management.
Fractal Marker Fields
Apr 20, 2012 11:00 AM
Marketa Dubska
Many augmented reality systems use fiducial markers to localize the camera in the 3D scene. One big disadvantage of the markers used today is that camera motion is tightly limited: the marker (or one of the markers) must be visible and it must be observed at a proper scale.
This talk presents a fractal structure of markers similar to matrix codes (such as QR Code or DataMatrix): the Fractal Marker Field. The FMF allows for embedding markers at a virtually unlimited number of scales. At the same time, for each of the scales it guarantees a constant density of markers at that scale. The talk sketches out the construction of the FMF and a baseline algorithm for detecting the markers.
Parallel Coordinates and Hough Transform
Apr 19, 2012 11:00 AM
Marketa Dubska
Parallel coordinates provide a coordinate system used mostly, or even solely, for high-dimensional data visualization; only a few applications have used them for computational tasks. We propose a new use for them: as a line parametrization for the Hough transform. This parameterization, called PClines, outperforms existing approaches in terms of accuracy. Moreover, PClines are computationally very efficient, require no floating-point operations, and can easily be accelerated by different hardware architectures. What is more, regular patterns such as grids and groups of parallel lines can be effectively detected with this parameterization.
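A hedged sketch of the underlying idea for one of the two PClines spaces (the "straight" space): an image point (x, y) becomes a segment joining (0, x) and (d, y) on two parallel axes, and all points of the image line y = m*x + b produce segments through the single parallel-coordinate point (d/(1-m), b/(1-m)), so a line shows up as an accumulator peak, exactly as in a Hough transform. The accumulator size, value range, float-based rasterization, and example line are illustrative simplifications (the published method uses integer arithmetic only, and a second, "twisted" space covers the slopes that fall outside this strip).

```python
import numpy as np

def pclines_straight_accumulator(points, width=256, height=256, v_range=(0.0, 256.0)):
    """Vote each image point as a segment in the 'straight' parallel-coordinate
    space; collinear points intersect in one accumulator cell (Hough-style peak).
    With this axis ordering, lines with non-positive slope land inside the strip."""
    acc = np.zeros((height, width), dtype=np.int32)
    v_min, v_max = v_range
    for x, y in points:
        for u in range(width):                    # walk along the strip between the axes
            v = x + (y - x) * u / (width - 1)     # value of the point's segment at position u
            r = int(round((v - v_min) / (v_max - v_min) * (height - 1)))
            if 0 <= r < height:
                acc[r, u] += 1                    # one vote per rasterized cell
    return acc

# Points on y = -0.5*x + 100 all vote for the same cell of the accumulator.
points = [(x, -0.5 * x + 100.0) for x in range(0, 200, 5)]
acc = pclines_straight_accumulator(points)
peak_row, peak_col = np.unravel_index(acc.argmax(), acc.shape)
```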