From Textual to Historical Networks: Reconstructing Social Relations among Chinese Elites from the Biographical Dictionary of Republican China (BDRC)

Christian Henriot and Cecile Armand

Time and Place: Thursday, 01.07., 14:45–15:05, Room 2
Session: Biographies and Careers in China

Keywords: China; elite; biography; NLP

Background

Our objective is fourfold: (1) to reconstruct the network of social relations among historical figures extracted from a biographical dictionary, the Biographical Dictionary of Republican China (BDRC); (2) to derive a network of social relations from networks of words; (3) to reveal hidden patterns of connections within and across the articles, and between individuals (as well as institutions and places); and (4) ultimately, to propose an alternative, non-linear reading of this dictionary.

Methods and data

The BDRC consists of four volumes of about 500 pages each and an index volume. The four main volumes describe 588 individuals. To extract and analyze relations, we proceeded in three steps:

  1. Building networks of words: We developed a specific workflow to automatically extract relations between entities from the BDRC, relying on Named Entity Recognition (NER) and a visualization tool (Padagraph) (Magistry et al., 2019). We extracted not only individuals but also institutions, places and events. The first challenge was language (Chinese, English, Wade-Giles transliteration) and inconsistencies in the naming of entities across the book. A long process of manual checking and data cleaning produced a mapping of all entities and their connections. Yet as social historians, we are not just interested in relations between named entities per se, but in retrieving actual relations between historical actors from the extracted words. To exemplify the issue of moving from named entities to actors, we focused on interpersonal relations (relations between individuals).
  2. Reconstructing interpersonal networks: We examined the relations between individuals regardless of the nature of the relations, using directed networks, various centrality measures (degree, betweenness) and clustering techniques in Cytoscape: (1) We distinguished between “biographical nodes” (bionodes), i.e. individuals who are the subject of an entry in the BDRC, and non-biographical nodes (individuals only mentioned in an entry); (2) We identified different profiles depending on their position in the network (bionodes, brokers, outsiders); (3) We focused on “brokers”, actors who are not necessarily central in the BDRC but who nevertheless occupy a pivotal position in our network by linking actors who would otherwise remain disconnected across the articles; (4) We identified different clusters of relations that suggested patterns of political, professional, or kinship affiliations.
  3. Classifying relationships: We propose to move beyond the sentiment-based classification of relations used in previous studies (van de Camp and van den Bosch, 2011) and common in contemporary SNA (for example with Facebook data) in order to build an ontology of historical relations: (1) We first distinguished between actual/social relationships and textual/contextual relationships (simple co-occurrences of words); (2) We then qualified more specifically the nature of relationships (personal, political, business, kinship, etc.) using a four-step approach: (a) annotating relations in a representative sample of articles using INCEpTION; (b) applying these annotations to the entire corpus; (c) checking and correcting possible errors and inconsistencies; (d) visualizing and analyzing results using Cytoscape.
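The second step can be illustrated with a minimal sketch. It is not the authors' Cytoscape workflow: it is a pure-Python stand-in, and the toy entries, the function names, and the choice to direct each edge from the subject of an entry to the person mentioned in it are all assumptions made for illustration. Betweenness is computed with Brandes' algorithm on the unweighted directed graph and is left unnormalized.

```python
from collections import deque

# Toy input (hypothetical, not BDRC data): entry subject -> persons mentioned.
ENTRIES = {
    "A": ["B", "X"],
    "B": ["C", "X"],
    "C": ["A"],
}

def build_network(entries):
    """Return directed edges (subject -> mentioned person) and the bionodes."""
    edges = {}
    for subject, mentioned in entries.items():
        edges.setdefault(subject, set())
        for person in mentioned:
            edges.setdefault(person, set())  # mentioned-only persons are nodes too
            if person != subject:
                edges[subject].add(person)
    return edges, set(entries)  # bionodes = persons who have their own entry

def in_degree(edges):
    """Count how many distinct entries mention each person."""
    counts = {node: 0 for node in edges}
    for targets in edges.values():
        for target in targets:
            counts[target] += 1
    return counts

def betweenness(edges):
    """Unnormalized betweenness centrality (Brandes' algorithm, directed)."""
    score = {node: 0.0 for node in edges}
    for source in edges:
        stack = []
        preds = {node: [] for node in edges}
        sigma = {node: 0 for node in edges}  # number of shortest paths
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source
            v = queue.popleft()
            stack.append(v)
            for w in edges[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {node: 0.0 for node in edges}
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    return score

edges, bionodes = build_network(ENTRIES)
in_deg = in_degree(edges)
betw = betweenness(edges)
for node in sorted(edges):
    kind = "bionode" if node in bionodes else "mentioned only"
    print(node, kind, in_deg[node], betw[node])
```

In this toy network, X has no entry of its own yet receives the most mentions, while A lies on the most shortest paths. Candidate brokers would be sought among nodes that combine high betweenness with modest degree.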

Findings

We observed that “bionodes” were not necessarily the most central actors in terms of degree centrality (number/density of connections), although the top ten includes well-known political and military leaders (Chiang Kai-shek, Sun Yat-sen, Yuan Shikai, Mao Zedong, Zhou Enlai). SNA highlights a mixed group of intermediate characters in politics (Zhang Binglin), academia (Li Dazhao), and the military (Li Zhongren). Intellectuals, including some with religious affiliation, often play the role of brokers in the whole network. Furthermore, community detection brings to the fore groups around individuals that we did not expect, particularly women (such as Song Meiling). Finally, the network of classified relations, combined with clustering techniques, helps to redefine the configurations of social and political power among Chinese Republican elites.

References  

Boorman, H.L. Biographical Dictionary of Republican China (New York: Columbia University Press,  1967).  

Camp, Matje van de, and Antal van den Bosch. “A Link to the Past: Constructing Historical Social Networks.” In Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis, 61–69. WASSA’11. Stroudsburg, PA, USA: Association for Computational Linguistics, 2011.

Magistry, P., et al. “Mining the Biographical Dictionary of Republican China: From Print to Network Exploration.” In Proceedings of the 3rd Conference on Biographical Data in a Digital World 2019. Varna, Bulgaria, 2019.

INCEpTION. https://inception-project.github.io/