Thoughts on “Six Provocations for Big Data”
2. Claims to Objectivity and Accuracy are Misleading
I mostly agree with this claim. Big data is often portrayed as an almost magical source of information and analysis, as if sheer quantity guaranteed truth. However, as the paper says, “a dataset may have many millions of pieces of data, but this does not mean it is random or representative.” In other words, a large dataset can still be biased toward a certain subset of the population, as is the case with public tweets or tweets that have been filtered. The paper also advises that “researchers must be able to account for the biases in their interpretation of the data.” I do think that big data is still a very useful tool, as long as those who use it are aware of the assumptions behind their analyses and the implications of the biases in them.
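A small simulation makes the point about size versus representativeness concrete. The sketch below is purely illustrative: the population, the 20% share of "active posters", and the support rates of 70% and 40% are made-up assumptions, not figures from the paper. It compares a very large sample drawn only from active posters with a small sample drawn at random from everyone.

```python
import random


def compare_samples(seed=0, population_size=200_000, random_sample_size=1_000):
    """Compare a large biased sample with a small random one.

    All numbers here are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)

    # Build a made-up population of (is_active_poster, supports_issue) pairs.
    population = []
    for _ in range(population_size):
        active = rng.random() < 0.2          # assumed: 20% post publicly
        p_support = 0.7 if active else 0.4   # assumed: posters differ from the rest
        supports = rng.random() < p_support
        population.append((active, supports))

    true_rate = sum(s for _, s in population) / len(population)

    # "Big data": every active poster, a large but unrepresentative sample.
    biased = [s for a, s in population if a]
    biased_rate = sum(biased) / len(biased)

    # Small sample drawn uniformly at random from the whole population.
    small = rng.sample(population, random_sample_size)
    small_rate = sum(s for _, s in small) / len(small)

    return {
        "true_rate": true_rate,
        "biased_n": len(biased),
        "biased_rate": biased_rate,
        "random_n": len(small),
        "random_rate": small_rate,
    }


if __name__ == "__main__":
    for name, value in compare_samples().items():
        print(f"{name}: {value}")
```

Under these assumptions the biased sample is many times larger than the random one, yet its estimate sits much further from the true rate, because collecting more data from the same skewed subgroup does nothing to remove the skew.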
6. Limited Access to Big Data Creates New Digital Divides
I found this point interesting, as it is something I’ve never consciously thought about before. While I don’t necessarily disagree with the issues raised regarding digital divides, I wonder why they are being highlighted for big data in particular, when they seem to apply to almost any specialized technical tool or methodology. It is true that there is “unevenness in the system,” as the authors state: some people have more access to data than others, and there are “new hierarchies around ‘who can read the numbers’”. But the same can be said of essentially all specialized types of analysis. What about statisticians who have been trained to use R and SQL to run very powerful queries, or biologists who have access to the most cutting-edge equipment and technologies? The fact that not everyone has complete access to “all big data” does not imply that big data is not valid or useful. I do commend the authors for raising this point, however, because it is something I had overlooked, and perhaps we should do more to ensure that more people at least have the technical skills to do computational analyses on datasets, regardless of their career.