Lightweight AI for Greater Performance

As artificial intelligence (AI) becomes more deeply embedded in everyday tech, from smartphones to medical devices, the demand for models that are not just powerful but also efficient and lightweight is rising fast. This is especially important in settings where computing resources are limited. To meet this challenge, Idiap researchers Mutian He and Philip Garner have developed a new conversion method.

As AI becomes more widely used across different tools and technologies, its growing appetite for computing power raises important questions. To address this, Mutian He and Philip Garner have developed a method that transforms large, resource-intensive AI models into faster, more compact versions with a different, more efficient architecture, without retraining them from scratch.

AI systems that handle tasks like speech recognition or language understanding, such as voice assistants and chatbots, are typically built on transformers. Transformers deliver impressive results, but they slow down severely when processing large amounts of data, such as long conversations or audio recordings, because their computational cost grows rapidly with the length of the input.

He and Garner’s technique, called CALD (short for Cross-Architecture Layerwise Distillation), offers a smart workaround. Rather than starting from scratch, CALD takes an existing pretrained model and, while fine-tuning it for the target task, converts it into a leaner, more efficient architecture, transferring the original model’s knowledge layer by layer. This approach allows developers to recycle pretrained models, saving time, money, and energy.
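To give a flavour of what "layerwise distillation" means in practice, the sketch below shows one common way such an objective can be written: each layer of the new, lighter model is trained to reproduce the hidden states of the matching layer in the original model, alongside the usual task loss. This is a minimal illustration only; the function name, the `alpha` weight, and the exact loss terms are assumptions for the example, not the authors' released implementation.

```python
# Illustrative sketch of a layerwise distillation objective (not the CALD code).
import torch
import torch.nn as nn

def layerwise_distillation_loss(teacher_hiddens, student_hiddens, task_loss, alpha=1.0):
    """Combine the task loss with a hidden-state matching term.

    teacher_hiddens / student_hiddens: lists of tensors of shape
    (batch, seq_len, hidden_dim), one per layer of each model.
    `alpha` is a hypothetical weight for the distillation term.
    """
    distill = sum(
        nn.functional.mse_loss(s, t.detach())   # match each student layer to its teacher layer
        for s, t in zip(student_hiddens, teacher_hiddens)
    ) / len(student_hiddens)
    return task_loss + alpha * distill
```

Because the teacher's hidden states are detached, only the lighter student model is updated, which is what allows the knowledge of the pretrained model to be reused without retraining it.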

The method was tested across a range of tasks, including text comprehension, speech recognition, and speaker identification, using popular pretrained models such as RoBERTa and Wav2Vec2. By swapping the slower transformer-based components for newer, speed-optimized ones such as Mamba, while keeping the model’s essential knowledge intact, the researchers achieved strong results with minimal performance loss.
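The sketch below illustrates the kind of component swap involved, under simplifying assumptions: the quadratic self-attention sub-layer of a block is replaced by a stand-in token mixer whose cost grows only linearly with the input length, while the rest of the pretrained block is kept. The module and attribute names are hypothetical; a real conversion would use a layer such as Mamba rather than this toy mixer.

```python
# Toy illustration of an architecture swap (module names are assumptions).
import copy
import torch
import torch.nn as nn

class LinearTimeMixer(nn.Module):
    """Stand-in for a linear-complexity layer: a gated causal running average,
    costing O(seq_len) per layer instead of attention's O(seq_len^2)."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(dim, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                       # x: (batch, seq_len, dim)
        steps = torch.arange(1, x.size(1) + 1, device=x.device).view(1, -1, 1)
        context = x.cumsum(dim=1) / steps       # causal running average of the sequence
        return self.proj(context) * torch.sigmoid(self.gate(x))

def convert_block(pretrained_block, dim):
    """Replace the attention sub-layer of a (hypothetical) transformer block,
    keeping the feed-forward network and layer norms and their weights."""
    new_block = copy.deepcopy(pretrained_block)
    new_block.attention = LinearTimeMixer(dim)  # the only architectural change
    return new_block
```

In the actual work, the swapped-in layers are then trained with the layerwise distillation objective so that the converted model behaves like the original one at a fraction of the cost on long inputs.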

CALD is especially effective in natural language processing (NLP), where the structure and “thought process” of the original model continue to be valuable even after conversion.

In short, this technique makes it easier to bring cutting-edge AI into real-world applications, even when computing power is limited, making human-machine interaction more efficient.

Furthermore, the researchers have made their code available on GitHub, inviting others to build on their work.

This study will be presented at the 13th International Conference on Learning Representations (ICLR) at the end of April.

Reference:
He, M., & Garner, P. N. (2025). Joint fine-tuning and conversion of pretrained speech and language models towards linear complexity. 13th International Conference on Learning Representations (ICLR).