Alien Reading: Text Mining, Language Standardization, and the Humanities

In this article, Binder introduced LDA (Latent Dirichlet Allocation), a useful text-mining tool for finding topics, and noted that LDA performs particularly well when it is fed scientific texts. According to Binder, a possible reason for this is that the vocabulary of scientific texts correlates with their topics more uniformly than the vocabulary of a poem or a piece of prose does. This accords with his argument that "there's a congruity between text mining and the language standardization efforts": both "tend to reinforce the 'literal' conceptions of language and meaning" and marginalize "nonstandard linguistic conventions and modes of expression". He further analysed the statistical nature of the tool, which outputs only the meanings with the highest probabilities and ignores other meanings that are valuable but less probable. This marks an inherent limitation of topic modeling, and of other tools based on statistical analysis, in dealing with "non-literal" language, which makes up a significant part of many humanistic corpora. As Binder puts it, "If we are to adopt text-mining tools in humanistic research, we will need to take account of the assumptions they make about language and how those assumptions could serve ideological interests".
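To make the point about probabilities concrete, here is a minimal toy sketch. It is not Binder's method and not real LDA output: it assumes a single hypothetical topic represented as a probability distribution over words (the words and numbers are invented for illustration), and a summary that, like many topic-model reports, lists only the k most probable words. The helper names `top_k_words` and `discarded_mass` are hypothetical.

```python
# Toy illustration only: NOT Binder's method and NOT actual LDA output.
# A hypothetical topic is a probability distribution over words, and the
# summary keeps only the k most probable words, dropping the long tail.

def top_k_words(topic, k):
    """Return the k highest-probability words, in descending order of probability."""
    return sorted(topic, key=topic.get, reverse=True)[:k]

def discarded_mass(topic, k):
    """Total probability of the words that a top-k summary leaves out."""
    kept = set(top_k_words(topic, k))
    return sum(p for word, p in topic.items() if word not in kept)

# Hypothetical "sea" topic; the probabilities are invented and sum to 1.0.
sea_topic = {
    "ship": 0.30,
    "water": 0.25,
    "wave": 0.20,
    "sail": 0.10,
    "grief": 0.06,     # figurative, low-probability associations
    "eternity": 0.05,
    "mother": 0.04,
}

summary = top_k_words(sea_topic, 3)
print(summary)                                  # ['ship', 'water', 'wave']
print(round(discarded_mass(sea_topic, 3), 2))   # 0.25
```

In this toy case a quarter of the topic's probability, including every figurative association, never appears in the three-word summary, which is the kind of "less probable but valuable" meaning the paragraph above describes being ignored.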

It is interesting that Binder associated the newly emerged technology of text mining with language standardization efforts dating back several centuries. This historical insight shows how long people have been working to abstract information from large bodies of text. Text mining is still in its early years; despite its limitations, it works well for analysing and connecting large volumes of literature or archives, where the overall trends and features of the corpus matter more than the details.
