A personalized speech recognition framework for audio messaging on the edge

Today, we interact with our friends and colleagues through computational systems. When we work together towards a shared goal, computing should augment our capacity to achieve collective success, yet distributed collaboration remains full of obstacles and frustrations. In response, a large number of communication applications have been developed, particularly since the pandemic. However, most of them are designed to fit the needs of their creators, namely knowledge workers operating from a desk. For the larger part of the workforce, which is mobile-first and customer-facing (i.e., deskless workers), no solution has yet met the demands of their workflows. In parallel, voice-based solutions are increasingly used, especially in the form of asynchronous audio messages. Building on several years of research and hacking, together with 3 co-founders, I have started to develop Voki, an asynchronous voice communication platform designed to increase the flow and quality of information in deskless workflows. An essential component of our app is speech recognition: converting raw audio into text. The state-of-the-art approach to audio-to-text conversion requires compute-heavy deep learning models that run on remote servers. Speech recognition performed directly on the user’s phone is currently limited to big corporations with large datasets and computing power, while small companies remain dependent on expensive cloud-based services. For now, through our key partnership with Microsoft for Startups, we can handle the server costs. However, in order to scale our app, we need to build a new speech recognition framework for mobile devices with near-state-of-the-art performance and low user-perceived latency. The project is the culmination of years of research and a continuation of our community-recognized prototype, which was awarded 2nd place at HackZurich 2021, the largest hackathon in Europe.
We have since tested our pilot project with a potential customer and already have 5 more planned. The goal of the proposed project is to turn our prototype into a market-ready solution, with a particular focus on scalability and user experience.
Idiap Research Institute
Swiss National Science Foundation
Jun 01, 2023
May 31, 2024