Shared Task on Scene Segmentation (STSS)


Task Description

The goal of our shared task is to develop methods that automatically detect scenes in narrative texts. A scene can be understood as a segment of a text in which the story time and the discourse time are more or less equal, the narration focuses on one action, and the space and character constellation stay the same. Scenes are found predominantly in narrative texts such as novels or biographies, which can be understood as a sequence of segments, some of which are scenes and others not. Scene segmentation is of great interest for the high-level analysis of longer texts, for example the reconstruction of plot. It is also relevant to many areas of NLP that deal with longer narrative texts, since even modern methods struggle with processing text longer than a couple of sentences or paragraphs. The shared task thus also provides a testbed for exploring methods that handle long texts.


The data set used in the shared task consists of German-language dime novels annotated with scene boundaries. The annotation guidelines are already available here. A partial data set of 15 novels has already been published in the context of an EACL paper (preprint). As the annotation process continues, test data annotated according to the same guidelines will be available soon.

We structure our shared task into two tracks, distinguished by the evaluation dataset. Track 1 focuses on dime novels, which are ‘simple texts’ without strong variation. In Track 2, contemporary high literature is used as the test set, which makes it possible to evaluate transfer across different narrative text types.

While novels are substantially longer than ‘typical NLP texts’, the number of annotated novels in our dataset is not very large. Participants are thus encouraged to incorporate additional knowledge sources and/or data sets.

Evaluation Metrics

Evaluating a segmentation into segments of variable length is not straightforward. Attempts to allow for some leeway (i.e., to penalise near misses less harshly) introduce parameters that are difficult to optimise. The ranking metric will therefore be the exact F1 score over all boundaries: a predicted boundary counts as correct only if it coincides exactly with a gold boundary. For informational purposes, we will also publish various other metrics and visualisations that allow for a deeper understanding of the performance of the submitted systems.
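
As an illustration, an exact boundary F1 score can be sketched as follows. This is a minimal sketch and not the official scorer: it assumes that each boundary is represented as an integer position (for example a sentence index or character offset), and the function name boundary_f1 is hypothetical.

```python
def boundary_f1(gold, predicted):
    """Exact-match F1 over boundary positions.

    Assumption: boundaries are integer positions (e.g. sentence indices);
    the official task format may differ.
    """
    gold_set = set(gold)
    pred_set = set(predicted)

    # A predicted boundary is correct only if it matches a gold boundary exactly.
    true_positives = len(gold_set & pred_set)

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0

    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Two of four predicted boundaries match exactly; the near miss at 118
# (gold: 120) receives no credit under the exact metric.
score = boundary_f1([0, 120, 340, 560], [0, 118, 340, 600])
print(score)
```

In this example precision and recall are both 2/4, so the printed score is 0.5. The near miss at position 118 counts as both a false positive and a false negative.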

We will also offer an interim evaluation, i.e., participants will be able to test their systems on unseen test data before the final submission deadline.

Important Dates

This is the timeline for this challenge. Further information will be announced in the future. All dates are given in the AoE (anywhere on earth) timezone.


If you have any questions regarding the challenge, please write an email to stss2021@informatik.uni-wuerzburg.de.

