Assignment 10 - Web Scraping - Sally Chen
I used ParseHub for web scraping, and in my experience it is an easy-to-use and effective tool. Its first-time tutorials walk through each step in detail, so beginners can follow the process. I chose to work with the Institute of Contemporary Art Boston collection (https://www.icaboston.org/collection). On the browse page, each work displays basic data: an image, the name of the work, and the artist's name. Clicking a work brings up more detailed information, including the year, work type, and size. I followed the tutorial to capture as much data as possible, and the results seemed satisfactory. Because the structure of the web page is relatively simple and the relative positions of the different types of metadata are almost fixed, further processing of the data should be easier.

One advantage of web scraping is that it can quickly retrieve the information contained in web pages and assemble it into an Excel file, which is very helpful for preliminary, basic analysis. Collecting social media information through web scraping is a good example. However, a major limitation is that this method can only extract what appears on the surface of a web page, mainly text and images. If more metadata or information about the web page itself is needed, web scraping is of limited help.
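
As a rough illustration of the "further processing" step, here is a minimal Python sketch. It assumes the scraped results were exported as CSV. The column names (name, artist, year, work_type, size) and the sample rows are hypothetical placeholders, not real data from the ICA Boston collection and not the actual field names from my ParseHub project.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a CSV export of the scraped data.
# The column names and rows are invented for illustration only.
SAMPLE_EXPORT = """name,artist,year,work_type,size
Untitled Work A,Artist One,1998,Painting,40 x 30 in.
Untitled Work B,Artist Two,2005,Photograph,20 x 24 in.
Untitled Work C,Artist One,2005,Painting,
"""


def load_works(csv_text):
    """Parse CSV text into a list of dicts.

    Whitespace is stripped and empty values become None, so missing
    metadata is explicit instead of an empty string.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    works = []
    for row in reader:
        cleaned = {key: ((value or "").strip() or None) for key, value in row.items()}
        works.append(cleaned)
    return works


def count_by(works, field):
    """Count how many works share each value of the given field."""
    return Counter(work[field] for work in works if work[field] is not None)


works = load_works(SAMPLE_EXPORT)
type_counts = count_by(works, "work_type")
year_counts = count_by(works, "year")

print(type_counts)
print(year_counts)
```

Because each record has the same fields in the same positions, a simple count per work type or per year is enough for a first look at the collection. A real export would likely need extra cleaning, for example for inconsistent size formats.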