


Disco-Annotation is a collection of training and test sets with manually annotated discourse relations for 8 discourse connectives in Europarl texts.

The 8 connectives with their annotated relations are:

although     (contrast|concession)
as           (prep|causal|temporal|comparison|concession)
however      (contrast|concession)
meanwhile    (contrast|temporal)
since        (causal|temporal|temporal-causal)
though       (contrast|concession)
while        (contrast|concession|temporal|temporal-contrast|temporal-causal)
yet          (adv|contrast|concession)
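
The inventory above can be written down as a lookup table, for instance to check labels while reading the data. The following is a minimal Python sketch: the names `RELATIONS` and `is_valid_label` are illustrative and not part of the dataset, and nothing here assumes a particular file format for the distributed sets.

```python
# Label inventory copied from the list of connectives above.
# RELATIONS and is_valid_label are illustrative names, not part of the dataset.
RELATIONS = {
    "although": {"contrast", "concession"},
    "as": {"prep", "causal", "temporal", "comparison", "concession"},
    "however": {"contrast", "concession"},
    "meanwhile": {"contrast", "temporal"},
    "since": {"causal", "temporal", "temporal-causal"},
    "though": {"contrast", "concession"},
    "while": {"contrast", "concession", "temporal",
              "temporal-contrast", "temporal-causal"},
    "yet": {"adv", "contrast", "concession"},
}


def is_valid_label(connective: str, label: str) -> bool:
    """Return True if `label` is one of the relations listed for `connective`."""
    return label in RELATIONS.get(connective.lower(), set())
```

For example, `is_valid_label("since", "causal")` holds, while `is_valid_label("since", "contrast")` does not, because "contrast" is not among the relations annotated for "since".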

For each connective there is a training set and a test set. The relations were annotated by two trained annotators using a translation-spotting method. The fixed division into training and test sets also makes results comparable if you train your own models.
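
One simple way to use the training/test division for comparison is a majority-class baseline: predict the most frequent training label for every test instance and report accuracy. The sketch below is only an illustration of that idea; the helper names and the toy labels are made up for this example, and it does not read the actual data files.

```python
from collections import Counter


def majority_label(train_labels):
    """Return the most frequent label in the training set."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(gold, predicted):
    """Fraction of positions where the prediction matches the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("empty test set")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Toy labels for "since", invented for illustration only (not real data).
train = ["causal", "causal", "temporal", "causal", "temporal-causal"]
test = ["causal", "temporal", "causal", "causal"]

baseline = majority_label(train)
score = accuracy(test, [baseline] * len(test))
```

A model trained on the training set of a connective can then be scored with the same `accuracy` function on the corresponding test set and compared against this baseline.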

If you need software for the latter, have a look at:

The publication (Meyer et al., submitted; see below) will also explain how to map the discourse relations annotated here in the Europarl corpus to the senses used in the reference data set of the Penn Discourse TreeBank (PDTB) (Prasad et al., 2008).


For any questions please contact:
Thomas Meyer: thomas.meyer (at), ithurtstom (at)



Please cite the following papers if you make use of these datasets (they also describe the annotation method in more detail):

  author = {Popescu-Belis, Andrei and Meyer, Thomas and Liyanapathirana, Jeevanthi and Cartoni, Bruno and Zufferey, Sandrine},
  title = {{D}iscourse-level {A}nnotation over {E}uroparl for {M}achine {T}ranslation:
    {C}onnectives and {P}ronouns},
  booktitle = {Proceedings of the eighth international conference on Language Resources and Evaluation ({LREC})},
  year = {2012},
  address = {Istanbul, Turkey}

  author = {Cartoni, Bruno and Zufferey, Sandrine and Meyer, Thomas},
  title = {{Annotating the meaning of discourse connectives by looking at their translation: The translation-spotting technique}},
  journal = {Dialogue \& Discourse},
  volume = {4},
  number = {2},
  pages = {65--86},
  year = {2013}

  author = {Meyer, Thomas and Hajlaoui, Najeh and Popescu-Belis, Andrei},
  title = {{Disambiguating Discourse Connectives for Statistical Machine Translation in Several Languages}},
  journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  year = {submitted},
  volume = {},
  pages = {},
  number = {}

(will be updated as soon as the paper is published)


Work regarding these datasets was funded by the SNF Sinergia project COMTIS. We would also like to thank Bastien Crettol, Yann Rodriguez and Vincent Spano of Idiap for their help in making the data available.