While Hearst’s piece is a useful primer on the initial concepts of text mining, she closes by claiming that the fundamental limitation of text mining is that “we will not be able to write programs that fully interpret text for a very long time,” a framing that I think obscures the meaning and use of these tools. Binder’s chapter “Alien Reading” addresses the fundamental problem with this statement. Ultimately, there is no “full interpretation” of any text, not one that any single human could ever generate, let alone any machine. And when thinking beyond the human, the drive to use algorithms to extract some definitive semantic understanding of a text will always be futile: the tools themselves will always tend to interpret text in particular ways, shaped by the kinds of language and writing style for which they were designed, among many other implicit biases. In “Alien Reading,” Binder gives the example of Latent Dirichlet Allocation (LDA), a topic-modeling method based on research from a DARPA initiative in the mid-nineties to conduct topic-based analysis of news feeds. Though the updated topic-modeling tool can provide useful interpretations of texts well beyond news feeds, up to and including eighteenth-century essays, it breaks down when applied to various forms of poetry.
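To make concrete what a topic model like LDA actually computes, here is a minimal sketch of LDA inference via collapsed Gibbs sampling on a toy corpus. Everything here is invented for illustration (the corpus, the two-topic setup, and the hyperparameters are my assumptions, not anything from Binder or Hearst); the point is simply that the “topics” are nothing more than word-count distributions, with no semantic understanding behind them.

```python
import random
from collections import defaultdict

# Toy corpus: two loosely themed vocabularies (news-like vs. poetry-like).
docs = [
    "market stocks economy trade market economy".split(),
    "economy trade stocks market trade".split(),
    "moon light dream night moon dream".split(),
    "night dream light moon night".split(),
]

K = 2                  # number of topics (chosen by the analyst, not the data)
alpha, beta = 0.1, 0.01  # Dirichlet hyperparameters (illustrative values)
random.seed(0)

vocab = sorted({w for d in docs for w in d})
V = len(vocab)

# Randomly assign an initial topic to every token.
z = [[random.randrange(K) for _ in d] for d in docs]
ndk = [[0] * K for _ in docs]                 # per-document topic counts
nkw = [defaultdict(int) for _ in range(K)]    # per-topic word counts
nk = [0] * K                                  # per-topic total counts
for di, d in enumerate(docs):
    for wi, w in enumerate(d):
        t = z[di][wi]
        ndk[di][t] += 1; nkw[t][w] += 1; nk[t] += 1

# Collapsed Gibbs sampling: repeatedly resample each token's topic
# from its conditional distribution given all other assignments.
for _ in range(200):
    for di, d in enumerate(docs):
        for wi, w in enumerate(d):
            t = z[di][wi]
            ndk[di][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
            weights = [
                (ndk[di][k] + alpha) * (nkw[k][w] + beta) / (nk[k] + V * beta)
                for k in range(K)
            ]
            t = random.choices(range(K), weights=weights)[0]
            z[di][wi] = t
            ndk[di][t] += 1; nkw[t][w] += 1; nk[t] += 1

# A "topic" is just the most frequent words assigned to it.
for k in range(K):
    top = sorted(nkw[k], key=nkw[k].get, reverse=True)[:3]
    print(f"topic {k}: {top}")
```

Notice that every interpretive decision (how many topics, how documents are tokenized, what counts as a word) is baked into the tool before it ever sees a text, which is exactly the situatedness Binder emphasizes.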

Binder’s point here is useful not only in this context, but especially when looking at contemporary modes of machine learning and creation beyond text. He suggests that the output of these tools can absolutely be useful and provide additional interpretations, but should always be understood within the context of the tool itself. Any text-scraping analysis can only be taken as part of the “truth” of a given text within the context of the tool used to scrape and analyze it. This idea isn’t particularly new: just as one would expect a psychoanalyst to provide a different reading of a text than a biologist, one must take these interpretive tools as interpretive! This attention to the situatedness of analytical tools is a practice pushed by both STS and Media Studies, and I think the tendency to forget this property of machine readings is one of the remaining frictions from the twentieth century that we are slowly overcoming. There could be (and likely already has been) fascinating work comparing the outputs of a variety of textual analysis tools when given the same input text. I would be interested to examine some of these examples.