Commentary on Jeffrey M. Binder, “Alien Reading: Text Mining, Language Standardization, and the Humanities”

This article explains how text mining is transforming the humanities, but warns that the algorithms and statistical methods used for text mining behave very differently from human readers. We should be careful about how these methods are designed and which aspects of a text they capture, because they may embed false or skewed assumptions. I can see a danger in people applying technical tools without understanding how they work and then drawing incorrect conclusions. Having taken a number of machine learning and statistical inference classes, I’m really curious about how the language processing algorithms work. Unlike many problems in machine learning, this one does not appear to be solvable simply with a large enough quantity of data. Longer texts can carry more complex meanings, weaving together multiple narratives, and it is difficult to establish “ground truths” for evaluating how well a computational approach “understands” a text.