Webscraping with Octoparse

Webscraping Activity

For this activity, I used Octoparse to scrape data from the MIT Museum’s website. One helpful feature was that Octoparse auto-detects the data on a webpage and then creates a workflow to extract more data based on what it found. However, I found it difficult to extract more specific data. Whenever I clicked on a piece in a collection, I got the error “Application error: a client-side exception has occurred,” and I was not sure what caused it. Another difficulty was that I could not find a back button in browse mode, so whenever I got an error message I had to redo my set of tasks or start over. Although Octoparse and similar services seem very helpful for web scraping, collecting this kind of metadata still seems like a difficult process.