Christof Schöch: “Big? Smart? Clean? Messy? Data in the Humanities,”

This article examines the characteristics of the data encountered in Humanities research today and asks what kind of data we want for better research in the future. It introduces the concepts of smart data and big data and draws out the advantages and limitations of each. Big data can satisfy our need to cover a wide range of texts or research data, but it smooths out many of the detailed features of individual items, which is not what researchers want. Smart data, by contrast, is better at highlighting the features of individual items because it carries annotation and other forms of content produced by human intelligence. Because we cannot create large amounts of smart data efficiently, the smart data we have is necessarily small and selective, which is somewhat at odds with the methodology of humanities research. By proposing the concept of big smart data, the author answers the question of where the data that the Humanities wants should come from and what characteristics it should have: it should be drawn from a wide range of sources and cover a large amount of content, while also having a certain structure and being processed by human cognition (e.g., classification, labeling, etc.).

Humanities Data: A Necessary Contradiction

I think an important point of this article is that data in the Humanities has different characteristics from experimental data and needs to be processed with a different methodology. These differences show up in several respects: the source of the data, whether the data tends to be precise or ambiguous, how the data is organized, and the need to build networks of data. Interdisciplinary communication between the humanities and other fields helps humanities research gain more methodological support. I was impressed by the last paragraph: numbers or text alone cannot fully represent reality, and we need to be mindful of the nature of these data. Beyond analyzing data directly, what digital humanities wants are methods that give scholars different ways to observe or “perceive” what the data contains and so generate insights.