Andrea Sangiacomo, Raluca Tanasescu, Silvia Donker and Hugo Hogenbirk
Time and Place: Thursday, 01.07., 11:15–11:35, Room 1
Session: History of Science
Early modern natural philosophy (the ancestor of today’s natural sciences) underwent dramatic transformations that completely reshaped its conceptual framework and set of practices. The master narrative about the seventeenth and eighteenth centuries, the Scientific Revolution, has often presented this as a more or less linear process, which progressed from the dismissal of Aristotelian natural philosophy to the establishment of a new Newtonian paradigm. Today’s scholarship is critical of this overly simplified reconstruction, but it struggles to find ways of delving into the actual historical complexity of the period. The difficulty is mostly due to the limitations of traditional methods and approaches, which are not well suited to handling the vast amount of material that should be taken into account in a more satisfying investigation of the evolution of the field.
Our long-term project aims at integrating traditional scholarship and network analysis in order to explore the co-evolution of social and semantic dimensions that shaped early modern natural philosophy. In the first phase of the project, we reconstructed a large corpus of works related to natural philosophy, compiled from the point of view of how the discipline was taught, thus focusing on textbooks and other works connected with the early modern academic milieu.
In this paper, we explore the following question: how can we best combine the social and semantic dimensions of a network in which we do not have access to explicit ties among the authors or works composing it?
2. Methods and data
Building on our previous research, we compiled a corpus of 239 early modern printed books, containing approximately twenty million words, written in Latin, French, and English, all of which aim to provide a systematic and encompassing account of the changing field of natural philosophy between 1587 (Abraham de la Framboisière’s Methodicae Institutiones) and 1832 (John Robison’s A System of Mechanical Philosophy). The OCR quality of the corpus scores a minimum of 90% per page, which allows for reliable text mining. The selection of works for this corpus has been largely shaped by the bio-bibliographical information available in secondary scholarship and by web-scraping procedures applied to WorldCat. This provided access to a wealth of titles, most of which are obscure or entirely forgotten in today’s scholarship. We do not have access to explicit information about how particular authors or works were connected with one another (e.g. personal relationships or correspondence between authors, direct references among works). Although a large amount of exciting research could be conducted on these materials, little can be done without finding suitable ways of representing this collection of scattered works as forming a coherent whole.
In this paper, we offer a method for creating a multifaceted representation of our corpus, which expresses key aspects or features of the corpus in terms of different but connected multiplex networks. In particular, we assume that a thorough study of our corpus should encompass at least three different dimensions: (i) social; (ii) semantic; (iii) linguistic (textual). The social dimension concerns the question of ‘who’ the authors of our works were, and how we can bind them together on the basis of social properties, such as having studied or worked at certain institutions or having interacted with certain publishers. The semantic dimension encompasses the way in which specific keywords were used in our corpus, from which we expect to derive information about how certain concepts were understood, reshaped, and disseminated by different authors or appropriated by different approaches and traditions. The linguistic dimension represents even broader features, such as the homogeneity of style and linguistic usage in the overall corpus, both among works written in the same language and across multiple languages. These three dimensions, then, tackle the potential ‘similarity’ between the authors and works in our corpus from different perspectives, and our method consists in using this threefold notion of similarity to build links between the authors and works by formalizing the relationships they establish as networks.
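To make the idea of building links from shared social properties more concrete, the following is a minimal sketch of a co-affiliation construction, not the procedure actually used in the project. It assumes that affiliation data are available as (author, layer, affiliation) records; the function name `co_affiliation_layers` and all author, institution, and publisher names are hypothetical placeholders rather than data from the corpus.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical affiliation records: (author, layer, affiliation).
# All names are placeholders, not data from the corpus.
RECORDS = [
    ("author_a", "institution", "university_x"),
    ("author_b", "institution", "university_x"),
    ("author_c", "institution", "university_y"),
    ("author_a", "publisher", "publisher_p"),
    ("author_c", "publisher", "publisher_p"),
]


def co_affiliation_layers(records):
    """Return {layer: {(author_1, author_2): number of shared affiliations}}."""
    # Group authors by the affiliation they share within a given layer.
    members = defaultdict(set)
    for author, layer, affiliation in records:
        members[(layer, affiliation)].add(author)

    # Link every pair of authors who share an affiliation in that layer.
    layers = defaultdict(lambda: defaultdict(int))
    for (layer, _affiliation), authors in members.items():
        for pair in combinations(sorted(authors), 2):
            layers[layer][pair] += 1
    return {layer: dict(edges) for layer, edges in layers.items()}


layers = co_affiliation_layers(RECORDS)
for layer, edges in sorted(layers.items()):
    print(layer, edges)
```

Each key of the returned dictionary can be read as one layer of a social multiplex network: the same authors appear in every layer, but the ties differ depending on the kind of affiliation considered.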
Since each of the three dimensions we consider is in itself complex and multifaceted, the networks we construct for each of them cannot be a single-layered network, but rather a multiplex network composed of several layers. Each multiplex network combines different computational approaches: co-affiliation and assortativity coefficient for the social dimension, collocate analysis for the semantic dimension, and a combination of topic modelling, tf-idf and word embeddings for the linguistic dimension.
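As an illustration of how one layer of the linguistic multiplex might be obtained, the sketch below derives weighted ties between works from tf-idf vectors and cosine similarity. It is a simplified stand-in under stated assumptions: the tf-idf weighting (relative term frequency times the logarithm of inverse document frequency), the threshold value, the function names, and the toy token lists are all illustrative choices, not the settings or data used in the project.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: {work: list of tokens} -> {work: {term: tf-idf weight}}."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))
    vectors = {}
    for work, tokens in docs.items():
        counts = Counter(tokens)
        vectors[work] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_layer(docs, threshold):
    """Keep an edge between two works when their similarity reaches the threshold."""
    vectors = tfidf_vectors(docs)
    works = sorted(vectors)
    edges = {}
    for i, a in enumerate(works):
        for b in works[i + 1:]:
            score = cosine(vectors[a], vectors[b])
            if score >= threshold:
                edges[(a, b)] = score
    return edges


# Toy token lists (invented strings standing in for tokenized works).
DOCS = {
    "work_1": "corpus motus locus corpus".split(),
    "work_2": "corpus motus tempus".split(),
    "work_3": "anima forma materia".split(),
}
edges = similarity_layer(DOCS, threshold=0.1)
print(edges)
```

In this toy example only the two works with shared vocabulary end up connected. Repeating the same thresholding step with similarities derived from topic models or word embeddings would yield further layers over the same set of works.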
In order to exemplify how our method works, we pick a small selection of books, which illustrate how human readers with some background knowledge would connect and group together different works included in the corpus at hand. We use these works as a reference, and throughout our discussion we compare where they are located and how they are represented in the networks we build. In this way, we offer a more direct insight into how our distant computational perspective adds to and integrates our initial expectations and assumptions as human readers. Our purpose here is not to advance any specific claim about the history of early modern natural philosophy and science that can be derived by using our method or studying these particular works, but rather to establish that the method is sound and effective and that it can be implemented for exploring our sources in new ways.
The result of the method is that we can now represent our starting corpus from the point of view of three multiplex networks, which are connected with one another by virtue of the fact that they are derived from the same entities (ultimately, the 239 books). This result is already sufficient to begin exploring the properties of this corpus and how it can be used to investigate the history of early modern natural philosophy. However, the method has even greater potential, since our three multiplex networks can themselves be combined into a complex multilayer network, which would then allow for a synoptic representation of the three dimensions described here as a single unified graph. In this sense, the methodology presented in this paper provides the groundwork for such further development. Given the technical and conceptual complexities involved in this research, we focus for now on the more technical and practical methodological dimensions, in order to also demonstrate the method’s potential for being applied to any other multilingual corpus relevant for other disciplines or time periods.
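One possible way of picturing the combination into a single multilayer graph is sketched below. It assumes, purely for illustration, that each dimension has been reduced to one layer of weighted ties between works, and that copies of the same work in different layers are coupled by inter-layer edges. The function `build_multilayer`, the layer names, and the toy weights are hypothetical and are not results from the project.

```python
from itertools import combinations


def build_multilayer(layers):
    """layers: {layer_name: {(u, v): weight}} -> (nodes, intra, inter).

    Nodes are (entity, layer_name) pairs. Intra-layer edges copy each
    layer's own edges; inter-layer edges couple the copies of one entity
    across the layers in which it appears.
    """
    nodes = set()
    intra = {}
    for name, edges in layers.items():
        for (u, v), weight in edges.items():
            nodes.update([(u, name), (v, name)])
            intra[((u, name), (v, name))] = weight

    # Collect, for each entity, the layers in which it occurs.
    layers_of = {}
    for entity, name in nodes:
        layers_of.setdefault(entity, []).append(name)

    # Couple the replicas of the same entity across layers.
    inter = set()
    for entity, names in layers_of.items():
        for a, b in combinations(sorted(names), 2):
            inter.add(((entity, a), (entity, b)))
    return nodes, intra, inter


# Toy input: one simplified layer per dimension, with invented weights.
TOY_LAYERS = {
    "social": {("work_1", "work_2"): 1.0},
    "semantic": {("work_1", "work_3"): 0.4},
    "linguistic": {("work_2", "work_3"): 0.7},
}
nodes, intra, inter = build_multilayer(TOY_LAYERS)
print(len(nodes), len(intra), len(inter))
```

Keeping intra-layer and inter-layer edges separate preserves the information about which dimension a tie comes from, while still allowing the whole structure to be traversed as one graph.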
4. Selected bibliography:
Bianconi, Ginestra. 2018. Multilayer Networks: Structure and Function. Oxford: Oxford Scholarship Online.
Borgatti, S., & Halgin, D. 2014. “Analyzing affiliation networks.” In Scott, J., & Carrington, P. J. (eds.). The SAGE Handbook of Social Network Analysis, pp. 417-433. London: SAGE Publications.
Brezina, V., McEnery, T., & Wattam, S. 2015. “Collocations in context: A new perspective on collocation networks.” International Journal of Corpus Linguistics, 20(2), pp. 139-173.
de Bolla, P., Jones, E., Nulty, P., Recchia, G., & Regan, J. 2019. “Distributional Concept Analysis: A Computational Model for History of Concepts.” Contributions to the History of Concepts, pp. 66-92.
Dickison, Mark E., Matteo Magnani, and Luca Rossi. 2016. Multilayer Social Networks. Cambridge: Cambridge University Press.
Roth, Camille, and Jean-Philippe Cointet. 2010. “Social and Semantic Coevolution in Knowledge Networks.” Social Networks 32 (1), pp. 16–29. https://doi.org/10.1016/j.socnet.2009.04.005.
Jo, Taeho. 2019. Text Mining: Concepts, Implementation, and Big Data Challenge. New York: Springer.