Christian Henriot and Cecile Armand
Time and Place: Thursday, 01.07., 14:45–15:05, Room 2
Session: Biographies and Careers in China
Keywords: China; elite; biography; NLP
Our objective is fourfold: to reconstruct the network of social relations among historical figures extracted from a biographical dictionary, the Biographical Dictionary of Republican China (BDRC); to derive a network of social relations from networks of words; to reveal hidden patterns of connections within and across the articles, and between individuals (as well as institutions and places); and ultimately to propose an alternative, non-linear reading of this dictionary.
Methods and data
The BDRC consists of 4 volumes of about 500 pages each and an index volume. The four main volumes describe 588 individuals. To extract and analyze relations, we proceeded in three steps:
- Building networks of words: We developed a specific workflow to automatically extract relations between entities from the BDRC, relying on Named Entity Recognition (NER) and a visualization tool (Padagraph) (Magistry et al., 2019). We extracted not only individuals but also institutions, places and events. The first challenge was language (Chinese, English, Wade-Giles transliteration) and inconsistencies in naming entities across the book. A long process of manual checking and data cleaning produced a mapping of all entities and their connections. Yet as social historians, we are not just interested in relations between named entities per se, but in retrieving actual relations between historical actors from the extracted words. To exemplify the issue of moving from named entities to actors, we focused on interpersonal relations (relations between individuals).
- Reconstructing interpersonal networks: We examined the relations between individuals regardless of the nature of the relations, using directed networks, various centrality measures (degree, betweenness) and clustering techniques in Cytoscape: (1) We distinguished between “biographical nodes” (bionodes) – i.e. individuals who are the subject of an entry in the BDRC – and non-biographical nodes (individuals only mentioned in an entry); (2) We identified different profiles depending on their position in the network (bionodes, brokers, outsiders); (3) We focused on “brokers” – actors who are not necessarily central in the BDRC, but who nevertheless occupied a pivotal position in our network, by linking disconnected data on actors in the articles; (4) We identified different clusters of relations that suggested patterns of political, professional, or kinship affiliations.
- Classifying relationships: We propose to move beyond the sentiment-based classification of relations used in previous studies (van de Camp and van den Bosch, 2011) and common in contemporary SNA (for example, with Facebook data) in order to build an ontology of historical relations: (1) We first distinguished between actual/social and textual/contextual relationships (simple co-occurrences of words); (2) Then we qualified more specifically the nature of relationships (personal, political, business, kinship, etc.) using a four-step approach: (a) annotating relations in a representative sample of articles using INCEpTION; (b) applying these annotations to the entire corpus; (c) checking and correcting possible errors and inconsistencies; (d) visualizing and analyzing results using Cytoscape.
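The second step above can be illustrated with a minimal sketch in plain Python. It is not the Cytoscape workflow described here: the toy entries, the single-letter names, and the three helper functions are hypothetical, and the sketch assumes that each mention of an individual in an entry yields one directed edge from the entry's subject to the person mentioned. It computes a normalised degree centrality and an unnormalised betweenness score (using Brandes' algorithm) so that bionodes, non-biographical nodes, and possible brokers can be compared.

```python
from collections import defaultdict, deque

# Hypothetical toy data: each key is the subject of an entry (a "bionode"),
# each value lists the individuals mentioned in that entry.
entries = {
    "A": ["B", "X"],
    "B": ["C"],
    "C": ["D", "X"],
    "D": [],
}

def build_directed_edges(entries):
    """Return the set of (subject, mentioned) edges, skipping self-mentions."""
    return {(s, m) for s, ms in entries.items() for m in ms if m != s}

def degree_centrality(edges):
    """Total degree (in + out) per node, divided by n - 1."""
    deg = defaultdict(int)
    for s, t in edges:
        deg[s] += 1
        deg[t] += 1
    n = len(deg)
    return {v: d / (n - 1) for v, d in deg.items()}

def betweenness(edges):
    """Unnormalised betweenness over directed shortest paths (Brandes)."""
    adj = defaultdict(list)
    nodes = set()
    for s, t in edges:
        adj[s].append(t)
        nodes.update((s, t))
    score = dict.fromkeys(nodes, 0.0)
    for src in nodes:
        # Breadth-first search from src, counting shortest paths (sigma).
        stack, preds = [], defaultdict(list)
        sigma = dict.fromkeys(nodes, 0)
        sigma[src] = 1
        dist = {src: 0}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to src.
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != src:
                score[w] += delta[w]
    return score

edges = build_directed_edges(entries)
degree = degree_centrality(edges)
between = betweenness(edges)

# Individuals who are mentioned but have no entry of their own.
non_bionodes = {t for _, t in edges} - set(entries)

for node in sorted(degree):
    kind = "non-bionode" if node in non_bionodes else "bionode"
    print(node, kind, degree[node], between[node])
```

In this toy network, a node with a high betweenness score relative to its degree would be a candidate broker in the sense used above. Whether that reading holds for the real data depends on how edges are defined and weighted, which this sketch does not address.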
We observed that “bionodes” were not necessarily the most central actors in terms of degree centrality (number/density of connections), although the top ten includes well-known political or military leaders (Chiang Kai-shek, Sun Yat-sen, Yuan Shikai, Mao Zedong, Zhou Enlai). SNA highlights a mixed group of intermediate characters in politics (Zhang Binglin), academia (Li Dazhao), and the military (Li Zhongren). Intellectuals, including some with religious affiliation, often play the role of brokers in the whole network. Furthermore, community detection brings to the fore groups around individuals that we did not expect, particularly women (such as Song Meiling). Finally, the network of classified relations, combined with clustering techniques, helps redefine configurations of social and political power among Chinese Republican elites.
Boorman, H.L. Biographical Dictionary of Republican China (New York: Columbia University Press, 1967).
Camp, Matje van de, and Antal van den Bosch. “A Link to the Past: Constructing Historical Social Networks.” In Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis, 61–69. WASSA’11. Stroudsburg, PA, USA: Association for Computational Linguistics, 2011.
Magistry, P. et al. Mining the Biographical Dictionary of Republican China. From print to network exploration. In Proceedings of the 3rd Conference on Biographical Data in a Digital World 2019. Varna, Bulgaria, 2019.