Spiking for Denoising and Regression

The main objective of the present proposal is to advance the state of the art in spiking neural networks (SNNs), in particular for generative models. It arises from work done under a recently completed SNSF project, Neural Architectures for Speech Technology (NAST), which was concerned with SNNs and with text-to-speech synthesis (TTS, a particular application of generative modelling), albeit separately. SNNs have potentially better representational capability than their artificial counterparts, and much lower power consumption.

Under the SNN thread of NAST, we were able to show that state-of-the-art performance currently depends on the concept of surrogate gradients. Such surrogates are important because the gradients associated with spikes are undefined at the threshold and flat elsewhere. Under the TTS thread of NAST, we were also able to show that state-of-the-art performance depends on denoising approaches, notably diffusion. Diffusion defines a forward process that gradually warps a complex data distribution into a simple one such as Gaussian noise; the generative process is then the reverse of this.

Recent literature has shown that well-defined noise implies a concrete definition of surrogate gradients, which are otherwise quite ad hoc. Together with the observations above, this leads to a first concrete goal of the present proposal: to show that generative modelling using spiking implies a much tighter integration of the noise associated with surrogates and that associated with diffusion. This in turn has two potential outcomes. The first is state-of-the-art generative modelling that can benefit from the advantages of spiking. The other is a principled approach to spiking that should improve the performance of SNNs in general applications.

Another observation from NAST is that the membrane potential inside spiking neurons has an impulse response similar to that of the muscle models previously studied by the applicant. Such muscle models were shown to lead to more natural-sounding speech synthesis.
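The two ingredients above, surrogate gradients and the diffusion forward process, can each be sketched in a few lines. This is a minimal numpy illustration only; the fast-sigmoid surrogate shape and the linear noise schedule are illustrative assumptions, not the methods proposed here:

```python
import numpy as np

# Spiking nonlinearity: a Heaviside step on the membrane potential u.
# Its true derivative is zero almost everywhere and undefined at the
# threshold, so backpropagation needs a surrogate.
def spike(u, threshold=1.0):
    return (u >= threshold).astype(float)

# One common surrogate: the derivative of a fast sigmoid, used in place
# of the Heaviside derivative during the backward pass (the shape is an
# illustrative choice; many alternatives appear in the literature).
def surrogate_grad(u, threshold=1.0, beta=10.0):
    return beta / (1.0 + beta * np.abs(u - threshold)) ** 2

# Diffusion forward process: gradually warp data x0 towards Gaussian noise.
def forward_diffusion(x0, betas, rng):
    x = x0.copy()
    for b in betas:
        x = np.sqrt(1.0 - b) * x + np.sqrt(b) * rng.standard_normal(x.shape)
    return x

rng = np.random.default_rng(0)
x0 = np.sin(np.linspace(0.0, 2.0 * np.pi, 64))  # toy "data"
betas = np.linspace(1e-4, 0.2, 200)             # illustrative schedule
xT = forward_diffusion(x0, betas, rng)          # approximately N(0, 1)
```

The generative model is then trained to run the `forward_diffusion` loop in reverse, step by step, starting from pure noise.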
Generally, the literature on the use of spiking for regression (hence generative modelling) is quite sparse. A key issue is how to convert the binary spikes of an SNN into continuous outputs; approaches such as convolution layers are available, but ad hoc. This leads to a second concrete goal of the proposal: to show that natural speech synthesis can arise from principled use of the continuous signals available inside the spiking neurons, or from physiologically derived models. The potential outcome, dependent on the first goal above, is natural-sounding TTS using SNNs.

A third, more speculative, goal arises from the literature and from observations under NAST: the current state of the art in generative modelling is based on score matching and probability flow. These related concepts can be seen as continuous versions of the discrete diffusion described above. Typically, they require differential equation solvers to discretise the continuous process. The goal, then, is to combine the results above with these techniques to yield truly state-of-the-art performance. At the outset, we note that the equation solvers use the same (Euler) methods as the SNN neurons and the diffusion processes, suggesting that advances could be built on these relationships.

The research plan is written as three task groups that broadly map onto the three goals above. Each task group has baseline and development phases, leading to two more speculative final tasks. This enables a graded risk profile with the potential to truly innovate, whilst supporting mitigation by backing off to the development tasks or by focusing on just one speculative task. Most tasks have both practical and theoretical components.

The overall relevance and impact of the proposal is detailed in terms of five areas: scientific output, software, collaboration, teaching, and industrial transfer.
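The shared Euler structure noted above can be made explicit: the standard discretisation of a leaky integrate-and-fire (LIF) membrane potential is exactly one Euler step of an ordinary differential equation, the same numerical scheme used by probability-flow samplers. This is a minimal numpy sketch; the time constant and input current are illustrative assumptions:

```python
# Euler-discretised LIF membrane potential: du/dt = (-u + I) / tau
def lif_step(u, current, tau=10.0, dt=1.0):
    return u + dt * (-u + current) / tau

# Generic Euler step for dx/dt = f(x, t), as used by probability-flow
# ODE samplers in score-based generative models.
def euler_step(x, t, f, dt):
    return x + dt * f(x, t)

# The LIF update is an Euler step with f(u, t) = (-u + I) / tau;
# with constant input I = 1.0 the potential converges towards 1.0.
u = 0.0
for _ in range(100):
    u = lif_step(u, current=1.0)
```

With `tau=10.0` and `dt=1.0` the error towards the fixed point shrinks by a factor of 0.9 per step, so after 100 steps `u` is within about `1e-4` of the input current.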
Idiap Research Institute
SNSF
Nov 01, 2025
Oct 31, 2029