Idiap at NeurIPS 2020

This week, Idiap's speech and machine learning group presented their joint work on Fast Transformers with Clustered Attention.

Transformers have recently gained a lot of popularity after establishing a new state of the art on a number of applications dealing with text, image and speech data. However, computing the attention matrix, which is their key component, has quadratic complexity with respect to the sequence length, making them prohibitively expensive for long sequences. To address this limitation, clustered attention approximates the true attention by grouping queries into clusters and computing attention only for the cluster centroids, so that every query in a cluster shares the attention of its centroid. This approximation is further improved by recomputing the attention of each query exactly on a small number of important keys. For any query, the keys that receive the highest attention weights from the centroid of its cluster form a good set of candidates.
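The idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: plain k-means stands in for whatever clustering the actual method uses, the function names (`full_attention`, `kmeans_assign`, `clustered_attention`) and the parameters `n_clusters` and `top_k` are hypothetical, and a single attention head without batching is assumed.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def full_attention(Q, K, V):
    # Exact softmax attention: cost grows with (number of queries) x (number of keys).
    scale = 1.0 / np.sqrt(Q.shape[-1])
    return softmax(Q @ K.T * scale) @ V


def kmeans_assign(Q, n_clusters, n_iter=10, seed=0):
    # Plain k-means on the queries (an assumption made for this sketch).
    rng = np.random.default_rng(seed)
    centroids = Q[rng.choice(len(Q), n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        dist = ((Q[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(axis=1)
        for c in range(n_clusters):
            members = Q[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return labels, centroids


def clustered_attention(Q, K, V, n_clusters, top_k=0):
    scale = 1.0 / np.sqrt(Q.shape[-1])
    labels, centroids = kmeans_assign(Q, n_clusters)
    # Attention is computed once per centroid instead of once per query.
    A_c = softmax(centroids @ K.T * scale)  # shape: (n_clusters, n_keys)
    if top_k == 0:
        # Every query reuses the output of its cluster's centroid.
        return (A_c @ V)[labels]

    out = np.empty((Q.shape[0], V.shape[1]))
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        if idx.size == 0:
            continue
        # Keys with the highest centroid attention are the candidates.
        top = np.argpartition(A_c[c], -top_k)[-top_k:]
        mass = A_c[c, top].sum()
        # Exact per-query attention on the candidates, rescaled so that it
        # carries the same total weight the centroid assigned to them.
        A_top = mass * softmax(Q[idx] @ K[top].T * scale)
        # The remaining keys keep the centroid's attention weights.
        rest = A_c[c].copy()
        rest[top] = 0.0
        out[idx] = A_top @ V[top] + rest @ V
    return out


rng = np.random.default_rng(0)
Q = rng.normal(size=(16, 8))   # 16 queries
K = rng.normal(size=(32, 8))   # 32 keys
V = rng.normal(size=(32, 4))   # 32 values

exact = full_attention(Q, K, V)
approx = clustered_attention(Q, K, V, n_clusters=4)
improved = clustered_attention(Q, K, V, n_clusters=4, top_k=8)
```

With `top_k=0` the sketch computes only one attention row per cluster; with `top_k` equal to the number of keys the improved variant reduces to exact attention, which makes a convenient sanity check.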
More information, including code, is available on the project page: