Reading Response to 6 Provocations

I am most taken with the author’s notes on the accessibility and equity of Big Data: “who gets access to it, how it is deployed, and to what ends.” This concern is contextualized by the fact that “an anthropologist working for Facebook or a sociologist working for Google will have access to data that the rest of the scholarly community will not.” It is complicated by the fact that “automated research changes the definition of knowledge”; nonetheless, data extraction, cleaning, and analysis tools must be made available to those outside of isolated academic and industry centers, especially if we wish our findings, our archives, and our work to represent broader cultural imperatives than our own. This is also, inherently, an issue of bias, one that reflects the methods applied within these contexts. As the author later states, “regardless of the size of a data set, it is subject to limitation and bias. Without those biases and limitations being understood and outlined, misinterpretation is the result.”

Joi Ito’s cited point that “Big Data is about exactly right now, with no historical context that is predictive” is also noteworthy, especially as we continue to overvalue social media and other aggregate online data that go back, at most, several decades. These data are also filtered through any number of black-box processes that are not made clear to the journalists, researchers, or community members who wish to use them.

I was particularly excited by the author’s claim that, “without taking into account the sample of a dataset, the size of the dataset is meaningless.” This is an enormous problem, frankly, in my own research group, where my colleagues primarily use large-scale social media datasets but take little care over how the data are collected, by whom, or even which keywords are used for text extraction (this last one seems particularly egregious to me). Our team also makes the mistake of comparing possibly incomparable network data from different platforms. Then again, we will always try to make connections where we can; what matters is that we note the limitations of our approaches.

I also agreed with the author’s claim about the value of single-case designs: “research insights can be found at any level, including at very modest scales. In some cases, focusing just on a single individual can be extraordinarily valuable.” I’m interested in potentially using single-case designs in my own research.

I also agree with the author’s later insight into how “the current ecosystem around Big Data creates a new kind of digital divide: the Big Data rich and the Big Data poor,” contextualized by Manovich’s account of the three classes of people who access and utilize Big Data. MIT community members are particularly privileged to belong to the third class.