{"id":485,"date":"2021-06-15T12:53:34","date_gmt":"2021-06-15T12:53:34","guid":{"rendered":"http:\/\/hnr2021.historicalnetworkresearch.org\/?page_id=485"},"modified":"2021-06-21T12:35:22","modified_gmt":"2021-06-21T12:35:22","slug":"robustness-in-network-extraction-from-text-a-case-study","status":"publish","type":"page","link":"http:\/\/hnr2021.historicalnetworkresearch.org\/?page_id=485","title":{"rendered":"Robustness in Network Extraction from Text: a Case Study"},"content":{"rendered":"\n<h2 style=\"text-align:center\"><em>Ana L. C. Bazzan<\/em>, Silvio Renato Dahmen, Sandra Denise Prado, M\u00e1ir\u00edn MacCarron, Julia Hillner and Ulriika Vihervalli<\/h2>\n\n\n\n<p class=\"box\"><strong>Time and Place:<\/strong> Friday, 02.07., 12:10\u201312:30, Room 1<br><strong>Session:<\/strong> Data and Methodology<\/p>\n\n\n\n<p><strong>Keywords:<\/strong> Robustness of networks; early medieval history; ecclesiastical history<\/p>\n\n\n\n<p><strong>Background<\/strong><\/p>\n\n\n\n<p>Techniques stemming from the theory of social networks are increasingly being used as quantitative tools with which one may analyse and quantify interpersonal relationships. In particular, historians are employing them to gain new insights in several case studies (Gould, 2003).<br><br>A social network, or graph, G is formally defined as G = (N; L), where N is the set of nodes (the actors in the network) and L is the set of links. A link is a connection (or interaction) of any sort between two nodes. Many measures quantify the structure of a network and the importance of its nodes (see Costa et al., 2007). One of these is degree centrality, which counts how many direct connections a node has. Extracting a network from a textual source is a key step in this quantitative process. 
If this step is not carried out carefully, the insights gained from analysing the structure and other characteristics of the network may be flawed or at least partially invalid.<\/p>\n\n\n\n<p><strong>Methods and Data<\/strong><\/p>\n\n\n\n<p>In this work, we investigate the robustness of networks to mistakes arising in data extraction from textual sources. Specifically, we take networks that were manually compiled (and are considered gold standards) and insert, with a certain probability, noise of three types: (i) removal, (ii) addition, and (iii) rewiring of connections. Removal models connections that are missed during data extraction; addition models extra connections that are accidentally inserted; rewiring models a mistake by the human compiler, such as connecting node A to C instead of the expected connection from A to B. We then compare the results for the original network to those for the perturbed network. For this experiment, we use early medieval texts in which the role of women as connectors is being investigated within the project \u2018Women, Conflict and Peace: Gendered Networks in Early Medieval Narratives\u2019. Among them are Bede\u2019s Ecclesiastical History of the English People, Eusebius\u2019 Ecclesiastical History, Stephen\u2019s Life of Wilfrid, Baudonivia\u2019s Life of Radegund, and Venantius Fortunatus\u2019 Life of Radegund.<br><br>In that project, data were extracted from early medieval texts. These texts date from the fourth to the eighth centuries and have survived in manuscripts, which were later edited into volume compilations in the original languages, Koine Greek and Latin. 
The data compilers worked through the narrative of the edited Greek or Latin volumes, using their expertise in the language in question and in the historical context of the work to record every active character and any interactions they have. To classify these interactions, the historians themselves developed a data model of 21 categories. Identities of characters, names, dates, genealogies, etc. were all double-checked. While the primary material is sometimes straightforward, this is not always the case.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"http:\/\/hnr2021.historicalnetworkresearch.org\/wp-content\/uploads\/2021\/06\/Graphik-Robustness.jpg\"><img decoding=\"async\" loading=\"lazy\" width=\"826\" height=\"455\" src=\"http:\/\/hnr2021.historicalnetworkresearch.org\/wp-content\/uploads\/2021\/06\/Graphik-Robustness.jpg\" alt=\"\" class=\"wp-image-486\" srcset=\"http:\/\/hnr2021.historicalnetworkresearch.org\/wp-content\/uploads\/2021\/06\/Graphik-Robustness.jpg 826w, http:\/\/hnr2021.historicalnetworkresearch.org\/wp-content\/uploads\/2021\/06\/Graphik-Robustness-300x165.jpg 300w, http:\/\/hnr2021.historicalnetworkresearch.org\/wp-content\/uploads\/2021\/06\/Graphik-Robustness-768x423.jpg 768w, http:\/\/hnr2021.historicalnetworkresearch.org\/wp-content\/uploads\/2021\/06\/Graphik-Robustness-800x441.jpg 800w\" sizes=\"(max-width: 826px) 100vw, 826px\" \/><\/a><figcaption>Figure 1. Change in Women&#8217;s Average Degrees with Probability of Rewiring<br><\/figcaption><\/figure>\n\n\n\n<p>Establishing the actors and their links in the text can sometimes take some effort. To handle such difficult passages, the compilers held several meetings and discussed them all, especially as regards where certain interactions fell within the 21 categories. 
Each recorded interaction is thus the outcome not only of close reading of a text but also of a data harvesting process involving numerous steps, checks and discussions by experts. The database also undergoes continuous quality checks to ensure the accuracy of the thousands of entries and the even more numerous links between them, verifying that links are made correctly and between the right people. This work was done by all project members to ensure consistency between databases and to maintain the high quality and accuracy of the data, which will enable comparisons between different databases and their networks. Hence, the historians assess the quality of the collected databases as extremely good, with very accurate data.<br><br>As mentioned, these texts were used to draw conclusions about the role of female actors in the network. For instance, in Prado et al. (2020), the text by Bede was used to investigate the communicability of various nodes. One conclusion is that two women were fairly relevant: Eanfled, a former queen of Northumbria, and Hild, abbess of Whitby. Regarding Venantius\u2019 Radegund, one important characteristic of the network is the high proportion of women (nearly 50%). The other texts are also providing interesting insights (under investigation). Thus, one may ask how such conclusions would change if each network were not carefully extracted from the textual sources.<br><br>To investigate this, we devised the aforementioned robustness measures: we perturbed those networks by artificially removing, adding, or rewiring connections with varying probability. For instance, 1% of connections can be changed. 
We then perform two types of comparisons, with results as follows.<\/p>\n\n\n\n<p><strong>Findings<\/strong><\/p>\n\n\n\n<p>The first type of comparison concerns the average degree of women, i.e., how much the degree of all women in each network has changed. Here, results show that noise of type (iii), i.e., making the wrong connection between two nodes in the network (whether men or women), is less likely to affect the overall conclusions, as seen in Figure 1. However, the other two types of error, namely failing to include connections that in fact exist and adding connections that in fact are not present, may affect the conclusions drawn, since they change the degree of women.<br><br>The second type of comparison concerns the position of key actors in the ranking of women. Here we investigated whether the most relevant women would change their position in the ranking by degree centrality. The main conclusion so far is that the ranking of women is resilient to those perturbations.<\/p>\n\n\n\n<p><strong>References<\/strong><\/p>\n\n\n\n<p>Costa, L. da F., F. A. Rodrigues, G. Travieso, and P. R. V. Boas (2007). Characterization of complex networks: A survey of measurements. Advances in Physics 56(1), 167\u2013242.<\/p>\n\n\n\n<p>Gould, R. V. (2003). Uses of network tools in comparative historical research, pp. 241\u2013269. Cambridge Studies in Comparative Politics. Cambridge: Cambridge University Press.<\/p>\n\n\n\n<p>Prado, S. D., S. R. Dahmen, A. L. C. Bazzan, M. MacCarron, and J. Hillner (2020). Gendered networks and communicability in medieval historical narratives. (available at <a href=\"https:\/\/arxiv.org\/abs\/2002.01396\">https:\/\/arxiv.org\/abs\/2002.01396<\/a>).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Ana L. C. 
Bazzan, Silvio Renato Dahmen, Sandra Denise Prado, M\u00e1ir\u00edn MacCarron, Julia Hillner and Ulriika Vihervalli Time and Place: Friday, 02.07., 12:10\u201312:30, Room 1Session: Data and Methodology Keywords: Robustness of networks; early medieval history; ecclesiastical history Background&nbsp; Techniques stemming from the theory of social networks are increasingly being used as quantitative\u00a0 tools with which one may analyse and quantify<\/p>\n<p><a class=\"more-link\" href=\"http:\/\/hnr2021.historicalnetworkresearch.org\/?page_id=485\">Weiterlesen<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":98,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":[],"_links":{"self":[{"href":"http:\/\/hnr2021.historicalnetworkresearch.org\/index.php?rest_route=\/wp\/v2\/pages\/485"}],"collection":[{"href":"http:\/\/hnr2021.historicalnetworkresearch.org\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/hnr2021.historicalnetworkresearch.org\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/hnr2021.historicalnetworkresearch.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/hnr2021.historicalnetworkresearch.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=485"}],"version-history":[{"count":5,"href":"http:\/\/hnr2021.historicalnetworkresearch.org\/index.php?rest_route=\/wp\/v2\/pages\/485\/revisions"}],"predecessor-version":[{"id":636,"href":"http:\/\/hnr2021.historicalnetworkresearch.org\/index.php?rest_route=\/wp\/v2\/pages\/485\/revisions\/636"}],"up":[{"embeddable":true,"href":"http:\/\/hnr2021.historicalnetworkresearch.org\/index.php?rest_route=\/wp\/v2\/pages\/98"}],"wp:attachment":[{"href":"http:\/\/hnr2021.historicalnetworkresearch.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=485"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}