Comment on Big? Smart? Clean? Messy? Data in the Humanities - Elva Si
Christof Schöch: Big? Smart? Clean? Messy? Data in the Humanities
This is a very powerful article. It introduces us to both smart data and big data, which seem to have opposing characteristics, advantages, and limitations, and it proposes a new concept for the future, smart big data, that could be empowering and enlightening.
I used to work a lot with smart data at my former company, where I built and constantly revised the metadata database for character representations in our digital textbooks. My experience was similar to what Schöch describes in his article. I added descriptions to hundreds of texts, added tags, revised the tagging system, and modified the tags regularly. However, several questions remained: How do you come up with a tagging system in the first place? How do you keep the tagging system consistent and comprehensive when it has to cover hundreds or thousands of humanistic texts? How can others who use the tags later understand the original intentions behind them?
Another point on which I would echo Schöch is the challenge of generating smart data. It is time-consuming and does not scale well. It can only be partially automated; ultimately, smart data depends on manual work by real people. Classifying descriptions in their context according to formal, semantic, and narratological categories is not something computers can do just yet. I like the idea of applying machine learning here, yet I still wonder whether and how it could become a reality.
According to Vannevar Bush, “it may be well to mention one such possibility, not to prophesy but merely to suggest, for prophecy based on extension of the known has substance, while prophecy founded on the unknown is only a doubly involved guess.” Since smart data and big data have each already been investigated extensively, it is worth imagining the possibility of smart big data. I hope one day to see data that exists in sufficient quantity to enable quantitative methods of inquiry, with a level of precision that captures the features of humanistic objects of inquiry that scholars consider relevant.