Simon Päpcke and Ulrik Brandes
Time and Place: Thursday, 01.07., 10:20–10:40, Room 1
Session: Networking Publications
Keywords: Networks extracted from texts; Character constellation; Graph distance; Literary networks
Background
We consider a 19th-century corpus of novellas and analyze whether their character constellation networks share common structural properties. In recent years, network analysis has become a fast-growing tool in the digital humanities. In a common application, literary scholars adopt the network methods used in the social sciences and apply them to character networks in literature. Traditional approaches use, among others, centrality measures, network motifs, or community detection and apply these to static or dynamic character networks. In this way, one can either verify known results from literary studies or gain a deeper understanding of the characters' roles. However, a large part of these analyses deals with a single work or a collection of texts with recurring characters (e.g. the Marvel Universe or Harry Potter). In contrast, we want to analyze and compare an entire network ensemble. Our corpus was compiled with the editors' intention of providing a paradigmatic sample of the novella style, and a strict formal criterion was given to distinguish novellas from novels. This motivated our research question: does the restricted form of a novella favor a specific character constellation?
Methods and Data
We illustrate our methods on the Deutsche Novellenschatz, a corpus published by Paul Heyse and Hermann Kurz between 1871 and 1876. It contains 86 novellas of varying length by 82 different authors (eleven female, 71 male). The texts, originally published between 1811 and 1875, cover the epochs of German Romanticism, Biedermeier, Young Germany, Vormärz, and literary realism. Main topics are love, wedding, and marriage as well as village life, art, and justice. Moreover, the corpus itself contains a long introduction in which the editors state their intent to build a canonical collection of 19th-century German-language novellas. As a criterion for the defining characteristic of a novella, they introduced the phrase "strong silhouette" („starke Silhouette“) and claimed it to be their guiding principle in the selection of texts. Accordingly, a text does not need a certain length to be considered a novella but instead needs to stay focused on a single topic that can then be developed thoroughly. Hence, we hypothesize that the novellas in the corpus have similar character constellation networks. To test this, we use methods from natural language processing and network analysis to generate and compare the required character networks. First, we consider undirected co-occurrence networks in which the nodes represent main characters that are derived from the text by named entity recognition. In these networks, a link between two nodes is present if they co-occur in the same sentence. Moreover, two weights are attached to each link, reflecting the strength of the link and the overall sentiment between the characters involved. To obtain the latter, a sentiment analysis is conducted on the sentences in which characters co-occur. Second, we also derive character networks from syntactic structures and make use of case grammar networks as proposed by Franzosi (2004, chap. 2).
While the nodes are again the main characters, the links are now directed from subjects to objects via an action and weighted by the action's sentiment. To answer our research question, we consider these networks as instances in a metric space. The structural graph distance (Nagel 2011, chap. 6, 7) is a similarity measure that consistently matches network motifs by considering spectrum transformation costs. Its invariance under network size and automorphisms makes it applicable for our purpose of comparing 86 different networks of diverse size with unrelated characters.
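The co-occurrence construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used for the corpus: the function name `cooccurrence_network` is hypothetical, the character sets and sentence-level sentiment scores are assumed to come from upstream named entity recognition and sentiment analysis steps, and the two link weights are assumed to be the number of shared sentences (strength) and the mean sentence sentiment.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_network(sentences):
    """Build an undirected co-occurrence network with two weights per link.

    `sentences` is an iterable of (characters, sentiment) pairs: the main
    characters recognized in one sentence and a sentiment score for that
    sentence. Both are assumed outputs of earlier NER and sentiment steps.
    """
    counts = defaultdict(int)
    sentiment_sums = defaultdict(float)
    for characters, sentiment in sentences:
        # Sorting gives each unordered pair one canonical key.
        for pair in combinations(sorted(set(characters)), 2):
            counts[pair] += 1
            sentiment_sums[pair] += sentiment
    # Assumption: strength = number of shared sentences,
    # sentiment = mean sentiment over those sentences.
    return {
        pair: {
            "strength": counts[pair],
            "sentiment": sentiment_sums[pair] / counts[pair],
        }
        for pair in counts
    }


# Toy input with invented character names and sentiment scores.
example_sentences = [
    ({"Anna", "Karl"}, 0.5),
    ({"Anna", "Karl", "Marie"}, -1.0),
    ({"Marie"}, 0.2),  # a single character creates no link
]
network = cooccurrence_network(example_sentences)
for pair, weights in sorted(network.items()):
    print(pair, weights)
```

In the toy input, Anna and Karl share two sentences, so their link has strength 2 and a mean sentiment of -0.25, while the sentence mentioning only Marie contributes no link.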
Findings
While co-occurrence networks display the simple appearance of two characters in the same sentence, case grammar networks require an interaction of the characters for tie formation. Hence, these networks can be viewed as subgraphs of the co-occurrence networks; in fact, only approximately one third of the links remain, so we can expect different results for our two approaches if the thinning does not happen uniformly. For co-occurrence networks, we observe that more than half of the novellas form a core with a very short structural graph distance and hence a structurally similar character constellation. However, there is still a large group of texts that deviate more strongly from this core. With the fine-grained case grammar approach, graph distances are even more pronounced and clusters of character constellation types become visible.
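The share of links that survive the step from co-occurrence to case grammar networks can be checked with a small computation. The sketch below is an assumption-laden illustration: the function name `retained_link_share` is hypothetical, links are compared after ignoring arc direction, and the toy data are invented.

```python
def retained_link_share(cooccurrence_links, case_grammar_arcs):
    """Fraction of co-occurrence links that also appear as an interaction.

    Direction is ignored for the comparison (an assumption): an arc
    (subject, object) counts as retaining the undirected link between
    the two characters. Self-directed arcs are skipped.
    """
    undirected = {frozenset(link) for link in cooccurrence_links}
    interacting = {
        frozenset((source, target))
        for source, target in case_grammar_arcs
        if source != target
    }
    if not undirected:
        return 0.0
    return len(undirected & interacting) / len(undirected)


# Toy data: three co-occurrence links, one of which carries an interaction.
links = [("Anna", "Karl"), ("Anna", "Marie"), ("Karl", "Marie")]
arcs = [("Karl", "Anna"), ("Anna", "Karl")]
share = retained_link_share(links, arcs)
print(f"{share:.2f}")
```

Computing this share per novella, rather than over the whole corpus, would indicate whether the thinning is uniform across texts.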
References
Franzosi, Roberto. 2004. From Words to Numbers: Narrative, Data, and Social Science. Cambridge: Cambridge University Press.
Nagel, Uwe. 2011. "Analysis of Network Ensembles." PhD diss., University of Konstanz.