Global markets for AI-generated media and deepfake detection are growing rapidly, turning what was a niche technical concern into a mainstream security risk. Audio deepfakes, synthetic voices that convincingly mimic real people, are now used in CEO fraud, investment scams and disinformation, while most countermeasures remain heavy, cloud-based services aimed at large institutions. Ordinary users, SMEs, tele-support services and online communities have almost no way to check a suspicious voice in real time without sending biometric audio to external servers, which is difficult to reconcile with Swiss and European data-protection rules.

Proof-of-Voice (PoV) aims to close this protection gap with a compact audio deepfake detector that runs directly on phones, browsers, laptops and small servers. The project builds on my research on deepfake-native small speech foundation models and the multi-dataset Speech-DF-Arena benchmark. Preliminary experiments indicate that the compact synthetic-first model attains performance competitive with substantially larger commercial systems.

Over 12 months, PoV will compress and calibrate this detector, package it into a cross-platform Software Development Kit (SDK), and demonstrate its use in two concrete settings. First, a Discord “VoiceGuard” bot will provide real-time risk scores and short explanation tags to moderators in youth-heavy online communities, helping them spot likely synthetic voices during live conversations. Second, endpoint prototypes with Sonaid.ai and Fortemedia will screen high-risk calls and embedded-device interactions without exporting raw audio, showing how the same SDK can be integrated into tele-support and edge-AI pipelines.
Across these pilots, PoV will collect incident logs, false-alarm statistics, usability feedback and deployment-cost estimates to assess technical and commercial viability.

The expected outcome is a technically validated, privacy-preserving on-device detector, an SDK ready for further industrial hardening, and two demonstrators that showcase societal relevance in real-world environments. By prioritizing compact, quantized models over large cloud services, PoV limits additional compute and network overhead and supports more sustainable use of AI. At the same time, it will lay the groundwork for a broader family of small speech foundation models that can support related tasks such as robust speaker verification, authenticity checks for AI-generated audio services, and trustworthy human–AI voice interaction, positioning Swiss actors as providers of secure speech-AI technology.
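To make the intended integration surface concrete, the on-device scoring flow described above (audio stays local, only a risk score and a short explanation tag leave the device) could look roughly like the following sketch. All names here (`PoVDetector`, `score_clip`, `RiskResult`) are illustrative assumptions, not the project's actual API, and the scoring function is a trivial placeholder standing in for the quantized neural model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RiskResult:
    score: float  # 0.0 (likely genuine) .. 1.0 (likely synthetic)
    tag: str      # short explanation tag shown to moderators


class PoVDetector:
    """Hypothetical on-device detector: raw audio never leaves the device;
    callers receive only a numeric risk score and a human-readable tag."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold

    def score_clip(self, frames: List[float]) -> RiskResult:
        # Placeholder logic: a real detector would run a compact,
        # quantized model over the audio frames. Here we map mean
        # frame energy to a pseudo-probability purely for illustration.
        if not frames:
            return RiskResult(score=0.0, tag="empty-clip")
        energy = sum(abs(x) for x in frames) / len(frames)
        score = min(1.0, energy)
        tag = "likely-synthetic" if score >= self.threshold else "likely-genuine"
        return RiskResult(score=score, tag=tag)


# Example use inside a moderation hook (e.g. a Discord bot callback):
detector = PoVDetector(threshold=0.5)
result = detector.score_clip([0.1, 0.2, 0.1])
print(result.tag)  # prints "likely-genuine" for this low-energy clip
```

The key design point is that the detector exposes only derived, non-biometric outputs (score and tag), which is what allows the pilots to run without exporting raw audio.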