Interpretable Beliefs and Programmable Knowledge with Bayesian Attention in Large Language Models

We address the controllability of large language models (LLMs) by giving them interpretable beliefs and programmable knowledge, leveraging the PI's work on understanding and improving transformer embeddings. Transformers' empirical success comes from the attention function's ability to induce graphs of relations from text. Our recent work has extended this ability to knowledge graphs, and to inducing the nodes of the graph as well (known as entity induction), with the first variational-Bayesian generalisation of the attention mechanism. This project will further develop this information-theoretic understanding of transformer embeddings and its sparsity-inducing regulariser, for learning graphs of higher-level abstract entities. The resulting Bayesian beliefs over generalised transformer embeddings of texts and graphs will give us the more interpretable, more programmable and more learnable abstract representations which are the core of this proposed project.

To leverage and extend these fundamental advances in representation learning, we will develop LLM architectures with a memory. Motivated by the success of Retrieval Augmented LLMs, our Belief Augmented Language Models (BALMs) will move knowledge extracted from training data out of large uninterpretable weight matrices and into our interpretable Bayesian beliefs over large transformer embeddings. These beliefs will then be: augmented with human-editable knowledge graphs and selected new texts; refined with control objectives and multi-hop reasoning; and combined with inference of consensus beliefs and opinion summarisation. BALMs will be developed both to evaluate these beliefs and as a chat interface for specifying, accessing and editing the beliefs themselves, including the collaborative specification of shared beliefs. These fundamental advances in deep learning theory and architectures will allow us to control what an LLM says by controlling what it believes, thereby unlocking the power of AI for society.
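To make the graph-induction view of attention concrete, the following is a minimal sketch of standard scaled dot-product attention (not the project's variational-Bayesian generalisation; the function and variable names are purely illustrative), showing how the attention matrix can be read as a weighted relation graph over tokens:

```python
import numpy as np

def attention_graph(X, Wq, Wk):
    """Return the row-stochastic attention matrix for token embeddings X.

    Entry A[i, j] can be read as the weight of the induced edge from
    token i to token j in a relation graph over the tokens.
    """
    Q, K = X @ Wq, X @ Wk                         # queries and keys
    scores = Q @ K.T / np.sqrt(K.shape[-1])       # scaled dot products
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    A = np.exp(scores)
    return A / A.sum(axis=-1, keepdims=True)      # row-wise softmax

# Illustrative example: 5 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
d = 8
X = rng.normal(size=(5, d))
A = attention_graph(X, rng.normal(size=(d, d)), rng.normal(size=(d, d)))
# Each row of A is a distribution over the other tokens, i.e. the
# outgoing edge weights of one node in the induced relation graph.
```

The Bayesian generalisation described above replaces such point-estimate attention weights with beliefs over them; this sketch only illustrates the baseline mechanism being generalised.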
Idiap Research Institute
Horizon Europe
Nov 01, 2025
Oct 31, 2030