Assignment 18 - Data Mining - Sally Chen
Data mining is the process of discovering patterns in data, and the main approaches include clustering, association, word clouds, etc. These methods draw on findings from cognitive science research, and computer algorithms simulate common human cognitive strategies. I think the similarity between the algorithms and human cognitive strategies is the reason these methods stand out as the most commonly used: the algorithms are more likely to produce “human-like” insights and decisions. However, the results of data mining are necessarily different from the results of human thinking, especially since academia has not yet fully explained the mechanisms behind more complex psychological processes, e.g. common sense and creativity.
One of the major advantages of data mining is that it helps to answer questions we did not know to ask, and in particular to obtain insights that may be counter-intuitive or unconventional. Even so, interpreting the results and making decisions still require human involvement. For example, many parts of the text analysis process for large samples need to be done manually, and these operations may indirectly affect the results of data mining.
The biggest problem of data mining in humanities studies is that there is a fundamental difference between computer programs and human cognitive processes. Although computer programs are designed around human cognitive strategies, they do not fully simulate human thought processes, which may lead to biased results. The most typical problem is that human analysis of “topics” is based not only on the frequency and combination of individual words, but also on the context in which the words occur, and a computer's frequency calculation does not take that context into account. Other challenges include the accessibility of certain databases and the non-transparency of data mining algorithms.
Some databases are not always accessible to scholars, so a data mining analysis can only reflect the data that is available, and its results may be biased accordingly.
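The point about word frequency ignoring context can be illustrated with a small sketch. It is not taken from any particular data mining tool; the function name `word_frequencies` and the example sentences are hypothetical and chosen only for illustration.

```python
from collections import Counter
import re


def word_frequencies(text):
    """Count lowercase word tokens, ignoring word order and context."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


# Two hypothetical sentences that use "bank" in unrelated senses.
river_text = "We sat on the bank of the river."
money_text = "She opened an account at the bank."

river_counts = word_frequencies(river_text)
money_counts = word_frequencies(money_text)

# A pure frequency count records one occurrence of "bank" in each text
# and cannot tell the riverside from the financial institution.
print(river_counts["bank"], money_counts["bank"])

# Word order is lost as well: these two sentences mean different things
# but produce identical frequency tables.
print(word_frequencies("dog bites man") == word_frequencies("man bites dog"))
```

A human reader separates the two senses of "bank" immediately from the surrounding words, while the frequency table treats them as the same item. That gap is the kind of difference between human topic analysis and frequency-based analysis described above.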