Assignment 8 Reading & Project
Binder - “Alien Reading: Text Mining, Language Standardization, and the Humanities”
I thought this reading was a necessary examination of text mining and other computational analyses performed on large quantities of text, and of the problems that can arise when we rely on those methods alone for work previously done by human readers, with their very human paradigms and language comprehension skills. One approach I found especially interesting was Lisa Marie Rhody's use of LDA to produce topic models for poetry. She can't use the resulting models in the same way that someone applying LDA to more scientific documents might, but that isn't to say the information is not useful. Rather than accurately determining what the poems are about, the topic model reveals how the poems are structured and lets possible poetic traditions emerge. I thought this was very cool, and a good nod to the idea of using these models to “suggest” or “reveal” something about a text rather than to provide an absolute analysis. I think this mindset is worth holding onto when using any computational text analysis method. As I continued reading, I wondered whether there are text analysis methods that handle more complex forms of language, such as poetry and fiction, more successfully. I then realized that Binder raises a related point, drawing on Ian Bogost, when the text states that “this approach would involve encountering text mining as an alien form of reading—alien both in the fact that it emerged from a discipline with very different concerns from our own and the fact that it is performed by a machine, the sort of nonhuman agent that Ian Bogost has sought to understand with his idea of alien phenomenology”. I was also drawn to Sturm and Turner's idea of thinking of “computation as ‘a caricature of thinking’”, and I would definitely want to apply that framing to other computational methods, such as neural nets. But as the article concludes, we can't use these caricatures for real understanding unless we also situate them alongside human research and analysis that examines the historical background of the text.
Mini Project: Text Mining & NLP
For this mini project, I decided to use Voyant to analyze my chosen text, mostly because it seemed more complex than the other tool I was exploring, Topic Graph. The text I chose was Mozart: The Man and the Artist, as Revealed in his own Words, by Friedrich Kerst (translated by Henry Edward Krehbiel), obtained from Project Gutenberg. When I pulled the document from the site, I wondered whether I should remove the Project Gutenberg text that comes before the actual book, but decided against it. After my initial analysis, however, I went back and removed the Project Gutenberg text, as well as the URLs that kept appearing throughout the rest of the document. This turned out to be a smart move, as it gave me much more accurate results.
Voyant revealed that the document had 32,688 total words and 4,714 unique word forms. It also showed that the most frequently used words in the corpus were “father,” “mozart,” “vienna,” “music,” and “opera.” This makes sense, and the same words dominate the word cloud that Voyant generates. These terms give a basic understanding of what the text is about, namely Mozart, his life, and his music, though that could just as easily be gathered from the title alone. I suppose the frequency list reveals a bit more: that Mozart was specifically a musician who lived in Vienna. It also revealed that Mozart talks a lot about his father, which was unexpected. The Trends graph was also very revealing, as it showed how the main themes rise and fall across the document. For example, “mozart” stays fairly consistent throughout, as expected, but “opera” is heavily discussed in the first third of the book. Mentions of “vienna” and “father” repeatedly dip and rise but trend upward over the course of the book.
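To get a feel for what Voyant is doing behind its summary and Trends panels, here is a minimal Python sketch using only the standard library. It assumes the book is saved as plain text and that the Project Gutenberg wrapper is delimited by the usual `*** START OF ...` / `*** END OF ...` marker lines; the function names and the short stopword list are hypothetical, not Voyant's own. Its tokenizer is simpler than Voyant's, so its counts would not exactly match the 32,688 / 4,714 figures above.

```python
import re
from collections import Counter

# ASSUMPTION: the book sits between Project Gutenberg's usual marker lines, e.g.
# "*** START OF THE PROJECT GUTENBERG EBOOK ... ***" (older files say "THIS") and
# "*** END OF THE PROJECT GUTENBERG EBOOK ...". If a marker is missing, that end
# of the file is kept as-is.
START_RE = re.compile(
    r"\*\*\*\s*START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*?\*\*\*", re.IGNORECASE
)
END_RE = re.compile(r"\*\*\*\s*END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK", re.IGNORECASE)
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Runs of lowercase letters, optionally joined by straight or curly apostrophes.
WORD_RE = re.compile(r"[a-z]+(?:['’][a-z]+)*")

# Hypothetical, deliberately tiny stopword list; Voyant ships a much longer one.
STOPWORDS = {
    "the", "a", "an", "and", "of", "to", "in", "is", "i", "my", "me", "you",
    "he", "his", "it", "that", "this", "was", "for", "with", "as", "on", "at",
    "by", "be", "not", "but", "have", "had", "which", "from",
}


def strip_gutenberg(raw: str) -> str:
    """Keep only the text between the Gutenberg markers and blank out URLs."""
    start = START_RE.search(raw)
    begin = start.end() if start else 0
    end = END_RE.search(raw, begin)  # look for the end marker after the start
    finish = end.start() if end else len(raw)
    return URL_RE.sub(" ", raw[begin:finish])


def tokenize(text: str) -> list[str]:
    """Lowercase the text and return its word tokens in order."""
    return WORD_RE.findall(text.lower())


def summarize(tokens: list[str], top_n: int = 5) -> dict:
    """Total tokens, distinct word forms, and the most frequent non-stopwords."""
    content = [t for t in tokens if t not in STOPWORDS]
    return {
        "total_words": len(tokens),
        "unique_forms": len(set(tokens)),
        "top_terms": Counter(content).most_common(top_n),
    }


def trend(tokens: list[str], term: str, segments: int = 10) -> list[float]:
    """Relative frequency of `term` in up to `segments` consecutive, near-equal chunks."""
    size = max(1, -(-len(tokens) // segments))  # ceiling division
    chunks = [tokens[i:i + size] for i in range(0, len(tokens), size)]
    return [chunk.count(term) / len(chunk) for chunk in chunks]
```

Applied to the downloaded file, `summarize(tokenize(strip_gutenberg(raw)))` gives the counts and top terms, and `trend(tokens, "opera")` gives a series comparable in spirit to one line of the Trends graph. Stripping the wrapper before counting matters for the same reason it did in Voyant: license text and URLs would otherwise be counted as if they were part of the book.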
Because this is a music-themed text, I think it would be very interesting to develop a specific machine learning technique for extracting the musical themes and terminology Mozart uses and identifying his “musical intuition” and inspirations from the text. This technique could be applied to other musical texts as well.