I’ve been interested in Natural Language Processing since high school, so I’m familiar with many of its cool applications in a general sense (https://medium.com/@ianminoso/a-textual-analysis-of-harry-potter-by-an-amateur-data-analyst-6f02c09617e0). But reading this article was my first “formal” introduction to the subject and the technical terms associated with it. I agree with Binder that topic modeling (for example, with MALLET) may not give an accurate representation of what a text is discussing, but I do think it’s useful for getting a very general sense of a text’s subject matter. However, with poetry and other texts that depend on metaphor, the so-called “bag of words” approach, which doesn’t even account for the relative position of words, let alone their syntax, may do the text a huge disservice. Moreover, it’s important to keep in mind that English grammar is such that misinterpreting syntax (or ignoring it entirely) can produce a meaning opposite to the intended one even when a text contains no metaphors at all. For example, the difference in meaning between “let’s eat, grampa” and “let’s eat grampa” is, in a sense, a matter of life or death, yet the only difference between the two sentences is the crucial placement of a comma. I think this emphasizes the importance of accounting for syntax and semantics in natural language processing algorithms, a problem that many labs here at MIT (especially the Course 9 cognition labs) are working on.
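
To make the bag-of-words point concrete, here’s a minimal Python sketch (my own toy tokenizer, not MALLET’s actual preprocessing pipeline) showing that once punctuation and word order are stripped away, the two “grampa” sentences become literally indistinguishable:

```python
from collections import Counter
import re

def bag_of_words(text):
    # A typical bag-of-words step: lowercase the text, drop
    # punctuation, and keep only the word counts. Word order
    # and the comma are discarded entirely.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

print(bag_of_words("Let's eat, grampa"))   # Counter({"let's": 1, 'eat': 1, 'grampa': 1})
print(bag_of_words("Let's eat grampa"))    # Counter({"let's": 1, 'eat': 1, 'grampa': 1})

# The two representations are identical:
print(bag_of_words("Let's eat, grampa") == bag_of_words("Let's eat grampa"))  # True
```

Any model that only ever sees these counts, topic models included, has no way to recover what that comma was doing, which is exactly why syntax-blind approaches can miss (or invert) a text’s meaning.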