Alien Reading: Text Mining, Language Standardization, and the Humanities

Text Mining Mindfulness

Much of the field of natural language processing is focused on taking the “natural” representation of information and converting it into structured data, which computer scientists are more comfortable working with. Two summers ago, I worked at a social robotics startup called Jibo, where I spent a good portion of my time on an open-ended conversation handler based on work done with IBM Watson.

To do this, we took a classic approach: natural language understanding (NLU) extracted structured data from the input, a dialog manager built a structured output, and a natural language generator (NLG) converted that output, with some randomness, into more natural-sounding sentences. While this abstraction works really well for computer scientists (the intermediate structured step allows for easy storage and gives us a model of the brain that we can understand), it definitely does not pass the Turing test: it is easy to tell that you aren’t having a conversation with a real human.
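As a rough illustration of that three-stage pipeline, here is a minimal Python sketch. It is not the system built at Jibo: the intents, templates, function names (`understand`, `decide`, `generate`, `respond`), and the hard-coded "sunny" slot are all hypothetical, and a regular-expression matcher stands in for what would really be a trained NLU model.

```python
import random
import re

# Hypothetical intent patterns; a real NLU component would use a trained model.
INTENT_PATTERNS = {
    "greet": re.compile(r"\b(hello|hi|hey)\b", re.IGNORECASE),
    "ask_weather": re.compile(r"\bweather\b", re.IGNORECASE),
}

# Hypothetical surface templates; several per act so the output can vary.
TEMPLATES = {
    "greet_back": ["Hello there!", "Hi! Nice to see you."],
    "report_weather": ["It looks {condition} today.",
                       "Expect {condition} skies today."],
    "fallback": ["Sorry, I didn't catch that.", "Could you rephrase that?"],
}


def understand(utterance):
    """NLU step: map free text to a structured frame."""
    for intent, pattern in INTENT_PATTERNS.items():
        if pattern.search(utterance):
            return {"intent": intent}
    return {"intent": "unknown"}


def decide(frame):
    """Dialog manager step: map the input frame to a structured output."""
    if frame["intent"] == "greet":
        return {"act": "greet_back", "slots": {}}
    if frame["intent"] == "ask_weather":
        # Placeholder value; a real system would look this up.
        return {"act": "report_weather", "slots": {"condition": "sunny"}}
    return {"act": "fallback", "slots": {}}


def generate(action, rng=random):
    """NLG step: pick a template at random and fill in its slots."""
    template = rng.choice(TEMPLATES[action["act"]])
    return template.format(**action["slots"])


def respond(utterance, rng=random):
    """Run the full pipeline: NLU, then dialog manager, then NLG."""
    return generate(decide(understand(utterance)), rng)


if __name__ == "__main__":
    for line in ["Hello robot", "What's the weather like?", "Tell me a secret"]:
        print(line, "->", respond(line))
```

Even in a toy like this, the limitation is visible: everything the system can say is bounded by the frames and templates someone wrote in advance, and the randomness only varies the wording.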

In this sense, I agree with Binder’s analysis that our current models of language do not address certain human qualities of language. However, I do not think this is a fundamental shortcoming of “topic modeling” or other approaches; I expect it to be fixed in the future. We are still in the “early adopter” phase of widespread NLP, much as many people once doubted that anyone could communicate effectively without being face-to-face, until technologies began enabling it the “right” way.

Alien Reading Approach

I have only one quick point about the approach Binder suggests. Essentially, he wants to make sure we approach these newly enabled humanistic problems in a multidisciplinary fashion. I would add that we should preserve the processes we have used for technology-free learning. Culturally, we forget how people “used to do things” rather quickly, so we need to make a point of preserving them. Otherwise, we may lose some quality of analysis that we won’t notice is missing in the short term but that could be useful farther down the road. An analogy: people who forget how to ride a bike once they learn to drive, since they don’t see any short-term benefit to biking.