Octoparse data scraping
I used Octoparse to scrape data about Dutch art from the Harvard Art Museums collections website (https://harvardartmuseums.org/collections?q=dutch+art). It efficiently sorts the collection into structured records with each item’s title, image URL, artwork, and author. The data can then be downloaded in several formats: Excel, CSV, and JSON. However, it only captures the data that is visible on the page, which is limited to 48 items at a time. To see more results for the search, the user has to click ‘load more’, and I could not find a way to make Octoparse click that button while scraping. Furthermore, the URL input is also limited, with each URL entered on its own line.
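To make the 48-item limit concrete, here is a minimal Python sketch of the loop a scraper needs in order to go past the first page: keep requesting the next batch until one comes back short, then export everything to CSV. This is not how Octoparse works internally, and it does not contact the museum website. The names `collect_all`, `to_csv`, and `fake_fetch_page`, along with the field names, are hypothetical and used only for illustration.

```python
import csv
import io
from typing import Callable, Dict, List

PAGE_SIZE = 48  # items shown per page, as observed on the search results page

Record = Dict[str, str]


def collect_all(
    fetch_page: Callable[[int], List[Record]],
    page_size: int = PAGE_SIZE,
    max_pages: int = 1000,
) -> List[Record]:
    """Request pages 1, 2, 3, ... until a page comes back short or empty."""
    records: List[Record] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        records.extend(batch)
        if len(batch) < page_size:
            break  # a short page means there is nothing left to load
    return records


def to_csv(records: List[Record], fields=("title", "image_url", "artist")) -> str:
    """Serialise the records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fields), extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def fake_fetch_page(page: int) -> List[Record]:
    """Stand-in for a real request (assumption): pretends the search has 100 results."""
    total = 100
    start = (page - 1) * PAGE_SIZE
    stop = min(start + PAGE_SIZE, total)
    return [
        {
            "title": f"Item {i}",
            "image_url": f"https://example.org/{i}.jpg",
            "artist": "Unknown",
        }
        for i in range(start, stop)
    ]


if __name__ == "__main__":
    rows = collect_all(fake_fetch_page)
    print(len(rows), "records collected")
    print(to_csv(rows[:2]))
```

In a real run, `fake_fetch_page` would be replaced by whatever actually loads the next batch of results, which is the step performed by the ‘load more’ button on the page.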