Disco-Annotation is a collection of training and test sets with manually annotated discourse relations for 8 discourse connectives in Europarl texts ( ).

The 8 connectives with their annotated relations are:

although     (contrast|concession)
as           (prep|causal|temporal|comparison|concession)
however      (contrast|concession)
meanwhile    (contrast|temporal)
since        (causal|temporal|temporal-causal)
though       (contrast|concession)
while        (contrast|concession|temporal|temporal-contrast|temporal-causal)
yet          (adv|contrast|concession)
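
If you process the annotations programmatically, it can help to keep the label inventory above in one place and to check labels against it. The following is only a minimal sketch in Python: the names CONNECTIVE_RELATIONS and is_valid_label are illustrative assumptions and not part of the distributed data or of any accompanying software, and nothing is assumed here about the file format of the datasets.

```python
# Label inventory copied from the list of connectives above.
# CONNECTIVE_RELATIONS and is_valid_label are hypothetical names
# chosen for this sketch; they are not part of the dataset release.
CONNECTIVE_RELATIONS = {
    "although": {"contrast", "concession"},
    "as": {"prep", "causal", "temporal", "comparison", "concession"},
    "however": {"contrast", "concession"},
    "meanwhile": {"contrast", "temporal"},
    "since": {"causal", "temporal", "temporal-causal"},
    "though": {"contrast", "concession"},
    "while": {"contrast", "concession", "temporal",
              "temporal-contrast", "temporal-causal"},
    "yet": {"adv", "contrast", "concession"},
}


def is_valid_label(connective, relation):
    """Return True if `relation` is listed for `connective` above."""
    allowed = CONNECTIVE_RELATIONS.get(connective.lower(), set())
    return relation in allowed


if __name__ == "__main__":
    print(is_valid_label("since", "causal"))    # True: listed for "since"
    print(is_valid_label("since", "contrast"))  # False: not listed for "since"
```

A check of this kind can catch mislabeled or misspelled relations early when you train your own models.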

For each connective there is a training set and a test set. The relations were annotated by two trained annotators with a translation spotting method. The fixed division into training and test sets also makes results comparable if you train your own models.

If you need software for the latter, have a look at:

The publication (Meyer et al., submitted, see below) will also include an explanation of how to map the discourse relations annotated here in the Europarl corpus to the senses used in the reference data set of the Penn Discourse TreeBank (PDTB) (Prasad et al., 2008).


For any questions please contact:
Thomas Meyer: thomas.meyer (at), ithurtstom (at)



Please cite the following two papers if you make use of these datasets (they also describe the annotation method in more detail):

(will be updated as soon as the paper is published)


Work regarding these datasets was funded by the SNF Sinergia project COMTIS. We would also like to thank Bastien Crettol, Yann Rodriguez and Vincent Spano of Idiap for their help in making the data available.

