Karrie & Ali - HathiTrust

We changed the focus of our project: we are now exploring the limitations of HathiTrust, a well-known content repository, as a basis for DH projects, rather than pursuing our earlier goal of using a HathiTrust collection to create a dataset that others could explore.

Problems encountered: There are unexpected restrictions on the data (licensing that trumps copyright and takes large swaths of data out of reach for public scholarship), which means we will have even less data than we originally imagined. The documentation for creating worksets is so poor that we had a hard time recreating work we had already done. The interface does not make it clear what you are putting into your workset, which makes for tedious and brittle workflows.

New Concept: Document the barriers to doing simple things with a content repository that is promoted to scholars as a useful resource, by trying to accomplish a series of relatively simple tasks.

Next Steps: We need to see whether we can actually get a dataset that has all the pieces we need, possibly by Friday. Karrie will document all the barriers already encountered. Ali will work on creating a search tool and some visualizations, and will document how working with the repository content was difficult. We will also look for recommendations for massaging the data (text markup?) that might allow for more user-friendly ways to explore and visualize the content.

Project ppt