Text mining
Written as a clarification for people new to the topic, ‘What is text mining’ offers definitions that set text mining apart from related tasks through easy-to-understand analogies. To me, the distinctions seem obvious once you read them, but they are not easy to come up with when you try to articulate the concepts yourself. In ‘Untangling text data mining’, Marti Hearst further contrasts textual and non-textual data processing tasks in terms of what kind of information is extracted at the end; real text data mining is the case of finding ‘nuggets’ that lead to novel conclusions. It is a very nice conceptual model to start with, although I wonder whether it is too idealistic to say that only work resulting in new findings constitutes real text mining: an attempt may not always end in the result wanted. Should such attempts be excluded because they did not produce the right results? Some boundaries are also blurry. For example, standard classification and segmentation may need to be performed before secondary patterns can be discovered. And sometimes new information does not imply a meaningful finding. Moving beyond the terminology, these classifications may be better discussed on a case-by-case basis.
In ‘“Raw data” is an Oxymoron’, the final part, which introduces ‘dataveillance’, particularly interests me. It connects back to our previous readings and discussions about big companies’ monopoly on data access. The competition to accumulate data has already begun. I’ve seen the phrase ‘digital twin’ being used in many industries. In healthcare, for instance, it refers to building a digital version of each person, so that the body’s condition can be simulated for the prevention and treatment of illness. Promising as it may sound for more precise treatment, whether our identities will be misused once they fall into the wrong hands is another matter. And it’s probably just a matter of time before such a thing happens.