Comment on "Six Provocations for Big Data" - Elva Si
I mostly agree with Danah Boyd’s arguments about big data. In particular, two of them resonate strongly with me: the second, "Claims to Objectivity and Accuracy are Misleading," and the fifth, "Just Because it is Accessible Doesn’t Make it Ethical."
There remains a mistaken belief that qualitative researchers are in the business of interpreting stories and quantitative researchers are in the business of producing facts.
The results of quantitative research are so often treated as “facts.” With more and more big data joining the party, the divide between these two research traditions may grow even wider. However, as Boyd points out, all researchers are interpreters of data. A model may be mathematically sound and an experiment may seem valid, but as soon as a researcher seeks to understand what the results mean, the process of interpretation has begun. We need to remind ourselves that data is fundamentally human-made: people decide what to collect, how to clean it, and how to read it. “Data-driven” doesn’t mean “unmistakably true.” We should let go of the belief that big data offers absolute control and universal truth and instead accept that big data is another form of subjectivity.
With Big Data emerging as a research field, little is understood about the ethical implications of the research being done. Should someone be included as a part of a large aggregate of data? What if someone’s ‘public’ blog post is taken out of context and analyzed in a way that the author never imagined?
While doing database research for my final project, I was struck by this argument. Many of my current project interests lie in entertainment topics such as TV production and social media (which raises its own questions about data accessibility), and that focus kept ethical issues out of my mind. But Boyd raises an important point: just because content is publicly accessible doesn’t mean it was meant to be consumed by just anyone. Some of the CSV files out there almost certainly contain data about vulnerable people, including people who never wanted to be part of a public dataset. We need to keep asking ourselves about the ethics of our data collection, analysis, and publication.