Assignment 8 Commentaries and Mini-Project
Alien Reading: Text Mining, Language Standardization, and the Humanities
This article takes a proceed-with-caution approach to using text mining software in the humanities. It points out that these tools are best-fit models that were trained on scientific and news-focused texts and that fail on humanistic texts. Binder notes that many people ignore the fact that text mining tools can’t be separated from the circumstances of their creation when they are applied to non-standardized works. For example, when a tool looks for topics and associates texts with them, it ignores the nuances of creative writing. These bag-of-words assumptions, which discard syntax, overlook the human aspect of writing and limit the interpretability of humanities texts that prioritize aesthetics over pure information. I thought the discussion of how these models need to avoid overfitting while still producing accurate results was interesting, given that there isn’t any ideal way to interpret certain works. Clearly the critical approach described is necessary, but I wonder whether any statistical methods other than this (seemingly) tree-based model would work.
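To make the bag-of-words point concrete, here is a minimal sketch in Python. It is only an illustration of the general assumption, not the software Binder discusses, and the function name bag_of_words and the two example sentences are my own. It shows that once word order is thrown away, two sentences with opposite meanings look identical to the model.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Count each word in the text, discarding order and syntax.

    Hypothetical helper for illustration only.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Two example sentences (my own) that use the same words in a different order.
first = "the dog bit the man"
second = "the man bit the dog"

# The meanings differ, but the word counts are exactly the same.
same_bag = bag_of_words(first) == bag_of_words(second)
print(same_bag)
```

If a change this simple is invisible to the representation, it makes sense that subtler features of literary writing, like irony or tone, would be lost as well.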
Text as Data: A Modest Proposal, JSTOR TopicGraph
I tried out TopicGraph for the mini-project and decided to analyze A Modest Proposal by Jonathan Swift because I was interested in how the tool would handle a piece of satire, especially given its goal of helping you quickly understand the topics covered in a text.
The tool is straightforward, allowing users to either choose from a selection of documents or upload a PDF of their own. After you upload a document, it fairly quickly sends an email notifying you that the document is ready. You’re then linked to an easy-to-read, well-structured website that lists the phrases that occurred most frequently. It also allows you to go to individual pages, which I found particularly useful.
Some of the words and phrases it picked up were “Irish Nationalism,” “Meats,” “Christian History,” and “Body Fat,” which are definitely relevant but on their own don’t contribute much to understanding the text itself.
Ultimately, this tool can’t be used for a deep and involved analysis of a text, since it misses important contextual clues and prioritizes word count instead. That being said, it is a nice way to get a vague idea of what is going on in the text and how the author is trying to get their point across.