Anna 11/12 Reading Response
Both Hearst and Gitelman provided important background information and challenged the assumptions we may have about data. Hearst’s essay was relatively straightforward but still very useful. I know that I have conflated what Hearst calls “real” text mining with what she calls “approaches that find overall trends in textual data.” Based only on this piece, I can’t say exactly where the boundary between these two types of work lies (and I’m sure there’s a grey area), but I do think it’s important to keep in mind that text mining is more specific than we often consider it to be. I’ve read the Gitelman introduction before, so it’s hard to respond to it on its own, but I think it presents a lot of really useful ideas that set up discussions of data well. It makes sense that it’s referenced so often! She really emphasizes the agency that exists behind any data, whether in capturing data or in mobilizing it graphically.
I found the Binder piece to be especially relevant to my own work. I have definitely used text mining carelessly in the past, focusing more on the implications of my results than on the implications of the method I used to get them. The most alarming examples, for me, were those about ‘non-standard’ speech being excluded from a dataset.
As with a few of the pieces this semester, I found myself agreeing with the premise but wondering how the ideas presented would work in practice. I completely agree that humanists need to engage historically and critically with the tools we are using, but I am unsure how to do that without reducing that engagement to a cursory acknowledgment of the problems or letting it become a significant diversion from a piece of scholarship. I don’t know how to engage in the type of interchange with media studies that Binder (and Alan Liu) suggest while maintaining a coherent humanistic argument, at least in shorter pieces of scholarship.
The closest I’ve come to doing the type of work Binder suggests is a project last year on the Russian elegiac canon, which does not seem far from his example of Lisa Marie Rhody’s work. I think it was possible because I was looking at a closed set of data and did not make claims about Russian speech or even Russian poetry. My paper listed the most commonly used “unusual” words, then explored why each of them was used in different contexts, how different poets used these words, and when they became part of the canon. It also looked into poets whose individual elegiac canons differed significantly from the ‘traditional’ elegiac canon. That said, the entire paper was about the sort of problems addressed here, so I’m not sure how one would be able to incorporate text mining strategies successfully (and ethically) without going into each word’s specific uses, contexts, and connotations. I’m sure that I have thrown in word frequencies or collocations as evidence in other papers without fully thinking through the explanation they required. Furthermore, even in the elegiac paper, I definitely didn’t fully explore the historical and cultural implications of text mining as a practice.
Across the pieces, something that seemed unfortunately consistent was, as Hearst writes, that “to get further though we need more sophisticated language analysis.” This is related to her point (and Binder’s) that the tools we have work best for certain types of texts and fields, especially in the sciences. This semester of digital humanities has emphasized that nuance is important to humanistic inquiry, and that is certainly true of text mining.