A “Data-Driven Game” from the Comédie-Française Registers Project

Harihar Subramanyam, Yang Chen, James Hobin

Our approach to this project is unique in two ways. First, we target a generalist audience rather than historians or other experts. Our goal is to engage this audience and encourage them to explore the overall trends in the data without showing them any statistics. We aim to answer two questions: what was it like to manage the theaters, and why did the theater have the shows and locations that it did? Second, we will do this by building a game in which the player manages a set of theaters around France. The mechanics of the game will be governed by statistics we compute from the Comédie-Française Registers Project (CFRP) data.

Our first step is to digitize the scanned registers. This poses two challenges: the registers are images, not text, and they are in French. Our approach to digitization is as follows. Since the registers tend to follow a specific format, we know which section of the register contains the prices, number of tickets sold, play name, etc. We can identify these sections and tag them accordingly, which largely avoids the need to translate from French. Next, we will write some basic software to examine these sections and perform OCR (using OpenCV or Tesseract OCR). Anything that fails to be digitized will either be transcribed manually (if there are only a few such registers) or handled with crowdsourcing techniques like CAPTCHA or Amazon Mechanical Turk (if there are many). Any text that still needs translation will be translated using a translation API (e.g. Google, Bing) or by a human translator. Since the dataset is small (a conservative estimate is 100 entries/register * 500 registers/year * 100 years * 100 bytes/entry = 500 MB), we can store it in a document database (e.g. MongoDB, which is well suited to semi-structured data) and replicate it on a few machines.
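The section-tagging idea can be sketched as follows. This is only an illustration under stated assumptions: the `Region` type, the `TEMPLATE` layout, and every coordinate in it are hypothetical placeholders, not measurements taken from the actual registers. The sketch converts a fractional page layout into pixel boxes that could then be cropped and handed to an OCR tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A tagged area of a register page, given as fractions of page size."""
    name: str
    left: float
    top: float
    right: float
    bottom: float


# Hypothetical layout: these coordinates are made up for illustration and
# would have to be measured on real register scans.
TEMPLATE = [
    Region("play_name", 0.10, 0.05, 0.90, 0.15),
    Region("tickets_sold", 0.10, 0.20, 0.45, 0.80),
    Region("prices", 0.55, 0.20, 0.90, 0.80),
]


def pixel_boxes(template, width, height):
    """Map each region name to a (left, top, right, bottom) box in pixels."""
    return {
        r.name: (
            round(r.left * width),
            round(r.top * height),
            round(r.right * width),
            round(r.bottom * height),
        )
        for r in template
    }
```

For a scan of 2000 by 3000 pixels, `pixel_boxes(TEMPLATE, 2000, 3000)["play_name"]` gives the box `(200, 150, 1800, 450)`. Because each crop is already tagged with its meaning, the OCR output for that crop needs no translation to be filed under the right field.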

After digitization, we’ll compute our statistics of interest. There are many statistics and visualizations we would compute, but the key ones are:

  • Performance (attendees, revenue) vs. time to identify how each play/theater performed over the years
  • Clustering to identify which play was associated with which distribution of ticket sales. This could help tell us if a play appealed to a particular demographic (ex. unusually high sales of expensive seats for a given play may indicate that the aristocracy liked it).
  • Map to visualize how plays performed in different regions of France (may require some historical research to see where the theater performed)
  • Visualization of the theater that highlights the occupancy of different seat types during different plays

We will make these publicly available, but their main goal is to guide us in building our game.
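The first two statistics in the list above might start from aggregations like the ones below. This is a minimal sketch, not a clustering implementation: it only computes the per-play seat-mix distribution that a clustering step could later consume. The entry format, the play names, the seat categories, and all numbers in `ENTRIES` are made-up sample values, not CFRP data.

```python
from collections import defaultdict

# Hypothetical digitized entries (all values invented for illustration):
# (year, play, seat_category, tickets_sold, price_per_ticket)
ENTRIES = [
    (1750, "Play A", "cheap", 200, 1.0),
    (1750, "Play A", "expensive", 20, 4.0),
    (1751, "Play A", "cheap", 150, 1.0),
    (1751, "Play A", "expensive", 10, 4.0),
    (1750, "Play B", "cheap", 80, 1.0),
    (1750, "Play B", "expensive", 60, 4.0),
]


def performance_by_year(entries):
    """Total attendees and revenue for each (play, year) pair."""
    totals = defaultdict(lambda: {"attendees": 0, "revenue": 0.0})
    for year, play, _seat, tickets, price in entries:
        totals[(play, year)]["attendees"] += tickets
        totals[(play, year)]["revenue"] += tickets * price
    return dict(totals)


def seat_mix(entries):
    """Fraction of each play's tickets sold in each seat category."""
    counts = defaultdict(lambda: defaultdict(int))
    for _year, play, seat, tickets, _price in entries:
        counts[play][seat] += tickets
    return {
        play: {seat: n / sum(seats.values()) for seat, n in seats.items()}
        for play, seats in counts.items()
    }
```

In this sample, "Play B" sells a much larger share of expensive seats than "Play A", which is the kind of signal the clustering step is meant to surface.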

Our final task is to build our game. The premise is this: you are the manager of the Comédie-Française and you want to maximize the number of attendees and the revenue generated by your theaters over your career. The gameplay is as follows. The player is presented with a map of France and given information about the plays and audience demographics. Every turn (one turn per year), they make a set of moves. Moves include building a new theater (and determining the number and types of seats), staging a set of plays at a given theater, closing a theater, hiring or firing actors, and training actors for a new play. After they make their moves, a year in the game proceeds and the player sees an animation of how their theaters are doing (which plays are popular? which locations are succeeding? what are the demographics of the audience?). Using the profit generated during the year, the player can fund their moves for the next year (the player begins with an initial amount of money). The simulation of the year (e.g. popularity of plays, success of locations, audience demographics) will be built heavily on the insights we draw from our initial statistics and visualizations. We will include demo gameplay based on the actual history of the Comédie-Française.
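One way the turn loop described above could be structured is sketched here. Everything in it is an assumption made for illustration: the `Theater` and `GameState` types, the two moves shown, and all constants (fill rates, ticket price, performances per year, build cost) are hypothetical. In the real game these parameters would come from the CFRP statistics rather than being hard-coded.

```python
from dataclasses import dataclass, field


@dataclass
class Theater:
    city: str
    seats: int
    play: str


@dataclass
class GameState:
    year: int
    money: float
    theaters: list = field(default_factory=list)


# Hypothetical parameters; placeholders for values derived from the data.
FILL_PERCENT = {"Play A": 90, "Play B": 50}  # percent of seats filled
TICKET_PRICE = 2.0
PERFORMANCES_PER_YEAR = 100
BUILD_COST_PER_SEAT = 50.0


def build_theater(state, city, seats, play):
    """Move: pay for a new theater and add it to the player's holdings."""
    cost = seats * BUILD_COST_PER_SEAT
    if cost > state.money:
        raise ValueError("not enough money to build this theater")
    state.money -= cost
    state.theaters.append(Theater(city, seats, play))


def close_theater(state, city):
    """Move: remove every theater the player owns in the given city."""
    state.theaters = [t for t in state.theaters if t.city != city]


def simulate_year(state):
    """Advance one turn and return yearly attendees per city."""
    report = {}
    for t in state.theaters:
        per_show = t.seats * FILL_PERCENT.get(t.play, 0) // 100
        attendees = per_show * PERFORMANCES_PER_YEAR
        report[t.city] = report.get(t.city, 0) + attendees
        state.money += attendees * TICKET_PRICE
    state.year += 1
    return report
```

With these placeholder numbers, a player who starts with 100000 and builds a 500-seat theater staging "Play A" is left with 75000, and one simulated year brings in 45000 attendees and 90000 in ticket revenue. The returned report is the kind of data the end-of-year animation could be driven from.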

In this game, we use the player’s desire to win in order to teach them. To win, they will need to learn which plays were popular with which demographics, which locations were good spots for a theater, what the “lifecycle of a play” was over the years, and more. More important than the “what”, they will learn WHY the life of the theater was what it was.