Do numbers speak for themselves?

Boyd and Crawford criticize the idea that numbers speak for themselves, but that is a sweeping claim. The question of whether numbers speak for themselves doesn’t have a binary answer. Boyd and Crawford are correct that numbers need interpretation, but even without human-driven data wrangling, machine learning techniques have produced some incredible insights and predictions given enough data. The canonical Target example (http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html) shows how computers processing “Big Data” can surface trends without human training or interpretation. The Target example is also interesting because it ties into the authors’ point that “just because it’s accessible, doesn’t make it ethical.” Our shopping habits and behaviors are visible to the stores we shop at, and people generally appreciate relevant advertisements, but where is the line?

Is Big Data self-explanatory?

On this question, I am inclined to agree with Boyd and Crawford. When unstructured Big Data is computationally analyzed, humans must provide some constraint and structure to the massive quantities of information. Any form of filtering is subjective, and even data “cleaning” introduces biases. I worked on a project analyzing tweets, and by their nature tweets are messy and unstructured. To gain any meaningful insight I needed to clean the data, for example by eliminating “stop words”, and every word I put on that list was in one way or another a subjective choice. This kind of analysis also touches on Boyd and Crawford’s critique that limited access to Big Data creates a digital divide. Some companies have more access to data than others, and it is entirely valid to point out that research done with better, more comprehensive datasets has an advantage.
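To make the point about stop words concrete, here is a minimal Python sketch. It is not the code from my project; the sample tweets, the two stop-word lists, and the `top_words` helper are all made up for illustration. It only shows that the same tweets can yield different “most common words” depending on which words the analyst decides to discard.

```python
import re
from collections import Counter


def top_words(tweets, stop_words, n=3):
    """Return the n most frequent words after dropping stop words.

    Hypothetical helper for illustration only.
    """
    tokens = []
    for tweet in tweets:
        # Lowercase and keep runs of letters/apostrophes as tokens.
        tokens.extend(re.findall(r"[a-z']+", tweet.lower()))
    kept = [t for t in tokens if t not in stop_words]
    return [word for word, _ in Counter(kept).most_common(n)]


# Invented sample data, not real tweets.
tweets = [
    "I am not happy with the new update",
    "the update is not bad, I like it",
    "not sure I like the new update",
]

# Two plausible stop-word lists; which one is "right" is a judgment call.
minimal = {"i", "am", "the", "is", "it", "with"}
aggressive = minimal | {"not", "new"}

print(top_words(tweets, minimal, 2))
print(top_words(tweets, aggressive, 2))
```

With the minimal list, “not” stays among the most frequent words; with the aggressive list it disappears, and a negation that changes the meaning of all three tweets is silently dropped. Neither list is objectively correct, which is exactly the sense in which cleaning is an act of interpretation.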

Conclusion

Although I disagreed with many of the points brought up in the reading, the general message is commendable. More and more research is moving toward computational analysis of “Big Data”, and researchers need to do a more thorough job of understanding how to perform this analysis, what its limitations are, what ethical concerns it raises, and what biases and values might be propagated through its expanding use.