Structured-entities Ontology Extension

Reference ontologies play an essential role in organising knowledge in the life sciences and other domains. They are built and maintained manually, and because this is an expensive process, many reference ontologies cover only a small fraction of their domain. The goal of this project is to develop techniques that automatically extend the coverage of a reference ontology with classes that have not yet been added manually. The extension should remain faithful to the (often implicit) design decisions made by the developers of the reference ontology. While this is a generic problem, our use case is the automatic extension of the Chemical Entities of Biological Interest (ChEBI) ontology with classes of molecules, since the chemical domain is particularly suited to our approach. We achieve our goal by using the leaf classes of the manually curated reference ontology to train a system that predicts subclass relationships between mid-level classes and new classes. Our method therefore uses machine learning techniques but, unlike other approaches, does not rely on text corpora as input; instead, it uses the content of the ontology itself. Class annotations that carry information relevant to classifying a given entity within the ontology play a key role in this learning task. In ChEBI, for example, these are annotations that represent the structure of chemical entities (e.g., molecules and functional groups). In addition, the axioms of the ontology are represented as logical neural networks, which are used when training the prediction models. Our approach to ontology extension thus relies on neural-symbolic integration. In previous work, we established the feasibility of the approach by comparing how well a number of machine learning approaches perform at subclass prediction. Despite the limitations of this initial work, some of our models compare favourably with ClassyFire.
ClassyFire is a rule-based system that represents the state of the art for this task and is already used in the development of ChEBI. Our results also show that different machine learning approaches are suited to different kinds of chemical entities, so we plan to use an ensemble approach in this project. The outcomes of this project will be (a) a benchmark training set for chemical ontology extension models, and (b) a system that, given a set of new chemical entities as input, automatically generates a new ontology that extends ChEBI to cover these entities. One benefit of this work is a novel methodology for extending the coverage of existing reference ontologies; if adopted, it will improve interoperability and knowledge integration for the communities that use these ontologies. Another benefit will be a novel neural-symbolic architecture integrating graph neural networks, transformers and logical neural networks. We will also explore neural-symbolic methods for making neural networks explainable.
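
The abstract does not say exactly how the ontology's axioms enter training. As a minimal sketch, assuming subclass axioms are relaxed into Łukasiewicz real-valued logic (the kind of logic that logical neural networks generalise), each axiom "every A is a B" can add a penalty whenever a model gives a molecule a higher membership probability for A than for B. The class labels, probabilities, the `weight` parameter and all function names below are hypothetical illustrations, not the project's implementation.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical sketch: turning subclass axioms into a training penalty.
# This is NOT the project's implementation; labels are illustrative, not ChEBI IDs.
SUBCLASS_AXIOMS: List[Tuple[str, str]] = [
    ("carboxylic acid", "organic acid"),  # (subclass, superclass)
    ("organic acid", "acid"),
]


def implication_truth(a: float, b: float) -> float:
    """Lukasiewicz implication a -> b: a real-valued truth degree in [0, 1]."""
    return min(1.0, 1.0 - a + b)


def consistency_penalty(probs: Dict[str, float],
                        axioms: List[Tuple[str, str]]) -> float:
    """Sum over axioms of how far 'member of sub implies member of super' is from true.

    Each term equals max(0, p(sub) - p(super)), so it is zero whenever the
    predicted superclass probability is at least the subclass probability.
    """
    return sum(1.0 - implication_truth(probs[sub], probs[sup])
               for sub, sup in axioms)


def binary_cross_entropy(probs: Dict[str, float],
                         labels: Dict[str, int]) -> float:
    """Mean binary cross-entropy over the labelled classes."""
    eps = 1e-7  # keeps log() away from 0
    total = 0.0
    for cls, y in labels.items():
        p = min(max(probs[cls], eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def training_loss(probs: Dict[str, float], labels: Dict[str, int],
                  axioms: List[Tuple[str, str]], weight: float = 0.5) -> float:
    """Data loss plus a weighted (hypothetical weight) penalty for violated axioms."""
    return binary_cross_entropy(probs, labels) + weight * consistency_penalty(probs, axioms)


# Example: predicted class memberships for one hypothetical molecule.
# p("carboxylic acid") > p("organic acid") violates the first axiom.
example_probs = {"carboxylic acid": 0.9, "organic acid": 0.6, "acid": 0.95}
example_labels = {"carboxylic acid": 1, "organic acid": 1, "acid": 1}

penalty = consistency_penalty(example_probs, SUBCLASS_AXIOMS)
loss = training_loss(example_probs, example_labels, SUBCLASS_AXIOMS)
print(f"penalty={penalty:.2f} loss={loss:.3f}")
```

Here only the first axiom is violated, by roughly 0.3, so the loss rises above the plain cross-entropy; predictions that respect the hierarchy incur no penalty. Encoding axioms as a soft penalty lets the hierarchy shape training without hard-wiring it into the network, with the weight trading data fit against logical consistency.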

Start

Mar 01, 2024

Stop

Feb 28, 2027

People

HASTINGS, Janna