SDialog: Idiap's Open-Source Toolkit for Reproducible Conversational AI

Building reliable conversational AI systems, such as chatbots or virtual assistants like Siri or Alexa, is more challenging than it may initially appear. Although modern large language models (LLMs) have improved dramatically in capability, researchers and developers still contend with a fragmented ecosystem. Datasets are often stored in incompatible formats, evaluation methodologies lack consistency, and reproducibility remains limited across studies and implementations.

Good science depends on being able to compare, reproduce, and build on one another's work, and that has been surprisingly difficult in the dialogue systems field. To address this, researchers at Idiap have developed SDialog, a fully open-source Python toolkit designed to bring structure, transparency, and reproducibility to the complete conversational AI development pipeline. The toolkit is freely available on GitHub under the MIT license and represents a concrete contribution by Idiap to the growing open science movement in AI research.

SDialog covers the full lifecycle of a dialogue system in a single, coherent workflow: building agents, simulating users, generating synthetic dialogues, and evaluating results. Rather than stitching together incompatible tools, researchers can work within one framework from start to finish.

Key capabilities include:

Realistic simulation: Create detailed personas and let models generate believable conversations between them.
Standard format: Use a unified format to store and share dialogue data.
Built-in evaluation: Compare dialogue quality and systems using the included metrics.
Model insight: Analyze what’s happening inside models and even influence their behavior.
Audio generation: Turn text dialogues into realistic spoken conversations.
Wide compatibility: Works with most major AI platforms.
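
To make the idea of a standard format and built-in evaluation more concrete, here is a minimal Python sketch. It does not use SDialog and does not reproduce its actual schema or API: the `Turn` and `Dialogue` classes, their field names, and the `mean_turn_length` metric are all hypothetical, chosen only to show why one shared, serializable representation makes dialogues easy to exchange and compare.

```python
import json
from dataclasses import dataclass, field, asdict


# Hypothetical data model for illustration only; NOT the SDialog schema.
@dataclass
class Turn:
    speaker: str
    text: str


@dataclass
class Dialogue:
    dialogue_id: str
    personas: dict  # speaker name -> short persona description
    turns: list = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recursively converts the nested Turn objects to dicts.
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)

    @classmethod
    def from_json(cls, payload: str) -> "Dialogue":
        data = json.loads(payload)
        turns = [Turn(**t) for t in data["turns"]]
        return cls(data["dialogue_id"], data["personas"], turns)


def mean_turn_length(dialogue: Dialogue) -> float:
    """Toy metric (an assumption, not an SDialog metric):
    average number of whitespace-separated tokens per turn."""
    if not dialogue.turns:
        return 0.0
    total = sum(len(t.text.split()) for t in dialogue.turns)
    return total / len(dialogue.turns)


# A two-turn example dialogue between two made-up personas.
example = Dialogue(
    dialogue_id="demo-001",
    personas={"doctor": "a calm general practitioner",
              "patient": "a worried first-time visitor"},
    turns=[
        Turn("patient", "I have had a headache since Monday."),
        Turn("doctor", "Does it get worse in bright light?"),
    ],
)

# Round-trip through JSON: what is shared is exactly what is loaded back.
restored = Dialogue.from_json(example.to_json())
```

Because every dialogue round-trips through the same JSON structure, a metric written once, such as the toy `mean_turn_length` above, can be applied unchanged to any dataset converted to that structure. That is the general benefit a unified format is meant to provide.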

The choice to release SDialog as open-source is not incidental — it reflects a deliberate commitment to the principles of open science. The code, documentation, tutorials, and even pre-built container images are all publicly available. Datasets standardized to the SDialog format are published on Hugging Face for the whole community to use.

This kind of openness is increasingly recognized as essential for trustworthy AI research. When methods and tools are shared, results can be independently verified, experiments can be reproduced, and the whole community benefits from accumulated progress rather than duplicating effort in isolation.

SDialog was recently presented as a system demonstration at EACL 2026 in Rabat, one of the major European venues for natural language processing research.

The work was developed within the framework of the EU Horizon 2020 ELOQUENCE project, and received additional development contributions during the JSALT 2025 workshop at Johns Hopkins University, reflecting the international collaborative reach of Idiap's research. The repository has already attracted significant community interest, with over 125 GitHub stars and 25 forks since its release.

SDialog is available at github.com/idiap/sdialog, with full documentation at sdialog.readthedocs.io. A video demonstration and interactive tutorials are also available for those who want to get started quickly.

Researchers or developers who wish to contribute — whether by converting a dataset, proposing a new evaluation metric, or simply reporting a bug — are encouraged to open an issue on GitHub.