Scale and Cultural Analytics - Daniel Kessler
1. Scale: The Law of Large Numbers
I am particularly interested in the means by which we can aggregate and assess large bodies of humanistic data at scale. As mentioned in this section, “the humanities have historically been the province of close analysis of limited data sets,” but with larger datasets becoming more widely accessible, we can now ask new questions of our data, looking at large-scale trends and themes. Narrative-form data, which fascinate me most, now also take new forms in tweets, video game recordings, and other content shared in any number of peer-to-peer forums. Furthermore, I am primarily interested in theories that can be proven, and it can be challenging to prove the generalizability of a theory when your dataset is limited, perhaps especially in the humanities. “The Law of Large Numbers” introduces another distinctive aspect of this problem: what does an experiment look like in the humanities? How can we test humanistic research artefacts and subjects so that we obtain quantitative findings that can be clearly proven or disproven? In social science experiments, we often try to evaluate whether a human being will behave differently when their environment or context is altered in a specific way. But to test how the variables in the equation of a humanistic artefact affect its “outcomes,” would we not have to fabricate new artefacts that do or do not include specific components, and then observe the differences in their effects? In any case, and as this section makes clear, analyzing humanistic data at scale requires new research and design paradigms, and new expectations. With them, we can ask and answer new kinds of questions that tell us more about the “behaviors” of such subjects, so to speak, on the scale of societies and civilizations and across long periods of time.
2. Cultural Analytics, Aggregation, and Data-Mining
Data mining and aggregation, in this case, appear to encompass an initial stage of data acquisition and cleaning (and filtering, by applying specific parameters, to pre-select only data that contain variables useful to our inquiry). Displaying these data can reveal relationships that could not be seen without visualization, which makes display an important step in the analysis; in a sense it belongs to formal data analysis even though it is also part of data collection. But the “final” stage in this process, cultural analytics, is particularly interesting in that it applies computational tools (e.g., topic modeling, term frequency analysis) that have not traditionally been applied to humanistic content. Of note, and as mentioned here, the benefit of these approaches is not simply that they are large-scale, but that they are large-scale and can be combined with “close reading” approaches to gain a more holistic view of the content being studied. Some observations can only be made when content is evaluated at scale, while others will naturally be made (more subjectively) in close analysis. These processes become synergistic through the acquisition and cleaning of structured data that capture both patterns and individual cases of interest.
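
To make the pairing of term frequency analysis with close reading concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the three example sentences are invented rather than drawn from any real corpus, the stop-word list is deliberately tiny, and the function names (tokenize, term_frequencies, documents_containing) are hypothetical helpers, not part of any library or of the reading itself. The idea is simply that a count taken across all texts can point to the individual texts worth reading closely.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "was"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def term_frequencies(documents, stop_words=STOP_WORDS):
    """Count how often each non-stop-word term occurs across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(t for t in tokenize(doc) if t not in stop_words)
    return counts


def documents_containing(documents, term):
    """Return the documents that mention a term, as candidates for close reading."""
    return [doc for doc in documents if term in tokenize(doc)]


# Invented example texts, not real data.
corpus = [
    "The whale surfaced and the crew watched the whale.",
    "A storm scattered the crew across the deck.",
    "The captain spoke of the whale in a whisper.",
]

# "Distant" step: aggregate counts over the whole corpus.
frequencies = term_frequencies(corpus)
top_term, top_count = frequencies.most_common(1)[0]

# "Close" step: pull the individual texts that the aggregate points to.
close_reading_set = documents_containing(corpus, top_term)

print(top_term, top_count)
for doc in close_reading_set:
    print("-", doc)
```

A real project would likely swap in a proper tokenizer and stop-word list, but the two-step shape (aggregate first, then return to individual cases) is the point of the sketch.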