NeuMath: Neural Discourse Inference over Mathematical Texts

Recent advances in NLP enabled by deep-learning-based architectures bring the opportunity to support the interpretation of textual content at scale. The application of natural language inference methods can facilitate scientific discovery, reducing the gap between current research and the available large-scale scientific knowledge. Contemporary NLP models, however, are still limited in their ability to interpret abstract and mathematical knowledge. The articulation of mathematical arguments is a fundamental part of scientific reasoning and communication. Across many scientific disciplines, expressing relations and inter-dependencies between quantities (usually in equational form) is at the centre of scientific argumentation. NeuMath aims to address this gap by building neuro-symbolic representation models which can support mathematical natural language inference (MNLI). The project will develop models which can jointly represent and reason over two symbolic modalities (natural language and mathematical expressions) and will build the foundations to deliver embedding models which can interpret and support the generation of mathematical arguments (by leveraging available large-scale scientific corpora). The project will pioneer neural representation paradigms with novel semantic control/explicit representation mechanisms over embedding spaces (the injection of structural biases), aiming to enable explainable, multi-hop and abstractive MNLI. NeuMath will focus on the design of representations which facilitate the transfer of inter-sentence and inter-document relations into inference relations and will develop novel neuro-symbolic architectures which can efficiently represent multi-type/multi-hop inference. The project takes a head-on approach to a paradigmatic problem in AI and NLP: the development of models which combine the flexibility of neural models with the rigour and abstraction capability of symbolic models.
MNLI provides an ideal problem space to push the envelope of explainable and precise NLP models. These properties match the requirements for applying AI in economic sectors of high national relevance, such as pharmaceuticals and finance. More specifically, the ability to interpret mathematical discourse at scale can serve as fundamental infrastructure for catalysing scientific discovery.
Idiap Research Institute
Swiss National Science Foundation
Apr 01, 2022
Mar 31, 2025