Compare Two Data Sets Using Box and Whisker Plots



SWBAT compare, analyze, and make inferences about two sets of data using a Box and Whisker Plot.

Big Idea

Students delve deeper into statistics, beyond mean, median, mode, and range, by self-assessing and critiquing their own work and their partners' work.

Warm up

15 minutes

The goal of this lesson is for students to compare and analyze two sets of data using Parallel Box and Whisker Plots.  The Warm Up is intended to take about 10 to 15 minutes: students write their analysis and then self-assess it against a rubric that we create as a class, based on the Warm Up. In the next activity, students use the same rubric to self-assess their own work and critique their partners' work. This allows students to take ownership of their own learning.

There are 3 categories for which I expect students to be able to identify the relevant statistic, state an answer, and provide reasoning (MP3).

The 3 categories are:

  1. Center of Data
  2. Dispersion
  3. Shape of the Data  

I have scaffolded these skills in the tasks of previous lessons in this unit to help students develop the skills and tools necessary (MP5) to compare and analyze the data.  I also use the Warm Up to model this process again for them before they work on today's activity (MP4).  Again, the goal is to compare, analyze, and draw conclusions from two sets of data using Box and Whisker Plots.

When analyzing these two Box Plots, it is important to note to students that Parallel Box Plots are easy to compare because they are on the same number line.  The top Box Plot represents the water levels of the Arizona reservoirs and the bottom Box Plot represents Colorado.  

Both of these Box Plots show the same spread if you look only at the range.  However, Colorado has a smaller Interquartile Range, so the middle half of Arizona's data is more spread out.  The Center of each Box Plot is its median, and Colorado's median is higher than Arizona's.  Therefore, it looks as if Colorado reservoirs stay at consistently higher levels than Arizona's.
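The comparison above rests on three numbers per plot: the median, the range, and the Interquartile Range. As a minimal sketch of how those values can be checked, the Python below computes a five-number summary with the median-of-halves method commonly taught for Box Plots. The reservoir values are made up for illustration (they are not the data from the Warm Up), and the function names are hypothetical.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]           # values below the middle position
    upper = data[half + n % 2:]   # values above the middle position
    return (data[0], median(lower), median(data), median(upper), data[-1])


def center_and_spread(values):
    """Summarize the center (median) and two measures of dispersion."""
    lowest, q1, med, q3, highest = five_number_summary(values)
    return {"median": med, "range": highest - lowest, "iqr": q3 - q1}


# Hypothetical reservoir levels (percent full), invented for this sketch.
arizona = [10, 20, 30, 40, 50, 60, 70, 80, 90]
colorado = [10, 55, 60, 62, 65, 68, 70, 75, 90]

print("Arizona: ", center_and_spread(arizona))
print("Colorado:", center_and_spread(colorado))
```

With these made-up values both ranges are 80, but Colorado's Interquartile Range is 15 compared with 50 for Arizona, and Colorado's median is 65 compared with 50. That mirrors the pattern described above: same range, tighter middle half, higher center.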

I have provided a sample copy of the rubric.  The information may vary slightly from class to class as we create the Rubric. Again though, it is focused on the three categories of Center, Dispersion, and Shape of the data.  




Compare Two Data Sets

20 minutes

After creating the rubric with students using the Warm Up, I introduce the new Data Sets Activity. The focus of this activity is for students to find the best representation of the center of the data, analyze how spread out the data is, and describe the shape of the data.  Students create a Box and Whisker Plot for each data set to summarize it, then use those summaries to draw conclusions. Students use appropriate formulas and techniques to compare, analyze, and draw conclusions from the U.S. box office revenue of the Harry Potter movies versus the Avengers movies.  I begin introducing the activity by playing a short clip from a popular movie in each of the two series.  One has had 8 sequels and the other has had 9 sequels. Here is a short clip from each of the movies: 

I only show clips that are a couple of minutes long because I want it to be a Hook into the lesson, but not the focus.  I use it to gain students' interest.

I instruct students to draw Parallel Box Plots on the same number line on Poster Paper.  I clarify that this means drawing one Box Plot for each set of data: one for the money made by the Harry Potter movies and the other for the Avengers movies.  The Box Plots must be parallel, on the same number line, so that they are easy to compare.  Students should also state their driving question and show all of their work and analysis on the Poster.  

Students struggled with creating their own scale for a number line that would represent both movie series.  I had to question students repeatedly about what to look for to determine the scale.  For example:

What was the lowest number in both data sets?

What was the highest number in both data sets?  

What interval between numbers would be a good representation for both data sets?
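The three questions above amount to a small procedure: find the overall lowest value, find the overall highest value, and pick a convenient interval that covers both. The sketch below is one possible way to do that in Python, not the method used in class. The revenue figures are invented for illustration, and the function name `shared_scale` and its `target_intervals` parameter are hypothetical.

```python
import math


def shared_scale(set_a, set_b, target_intervals=10):
    """Suggest (start, end, step) for one number line that fits both data sets."""
    lowest = min(min(set_a), min(set_b))     # lowest number in both data sets
    highest = max(max(set_a), max(set_b))    # highest number in both data sets
    if highest == lowest:
        raise ValueError("the data sets need at least two different values")

    # Start from the step that would give exactly target_intervals intervals,
    # then round it up to a "friendly" size: 1, 2, 5, or 10 times a power of ten.
    raw_step = (highest - lowest) / target_intervals
    magnitude = 10 ** math.floor(math.log10(raw_step))
    for factor in (1, 2, 5, 10):
        step = factor * magnitude
        if step >= raw_step:
            break

    # Extend the ends outward to the nearest multiples of the step.
    start = math.floor(lowest / step) * step
    end = math.ceil(highest / step) * step
    return start, end, step


# Hypothetical revenues in millions of dollars, invented for this sketch.
series_a = [120, 150, 180, 210]
series_b = [95, 300, 410, 620]

print(shared_scale(series_a, series_b))
```

For these invented values the lowest number is 95 and the highest is 620, so the suggested number line runs from 0 to 700 in steps of 100. Because the ends are rounded outward, the final number of intervals is only roughly the target.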


Even after I stressed that students should create Parallel Box Plots on the same number line, some students still drew two different Box Plots on two separate number lines.  These students had a more difficult time analyzing the data.  The driving question that students were trying to answer was, "Which set of movies was more popular based on the money that was made?"

Here are some sample responses from two different pairs of students.

1.  The first pair of students drew the Parallel Box Plot correctly.

a.  They compared the Centers using the Medians: 294 for Harry Potter and 206 for Avengers.

b.  They stated that the money made by the Harry Potter movies was closer together and not as spread out as the money made by the Avengers movies.  They justified their answer by comparing the Harry Potter range of 131 million with the Avengers range of 493 million.

c.  They stated the shape for both data sets as right skewed.

d.  Their conclusion to the driving question was that the Harry Potter movies consistently made more money, even though one of the Avengers movies had the maximum revenue.


2.  The second pair of students did not draw the Parallel Box Plots correctly.  They drew two different Box Plots on different number lines that were scaled differently.  This made it difficult to compare.

a.  This pair of students found the same values for Center as the first pair and also stated that both data sets were right skewed, so their statistics were the same.

b.  However, because the scales were different, the two Box Plots looked nearly the same.  

c.  Their conclusion to the driving question was that the Avengers movies were more popular because the Avengers Box Plot had the highest maximum.

The first pair of students was more accurate in its conclusions.  The second pair could have stated that at least one of the Avengers movies was more popular because it had the maximum value.  However, that statement does not provide an overall description of the data based on Center, Spread, and Shape.
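The first pair's reasoning can be laid out as a short calculation. The sketch below uses only the statistics reported above (medians of 294 and 206, ranges of 131 and 493) and assumes the medians are in millions of dollars like the ranges. The variable names are hypothetical.

```python
# Statistics reported by the first pair of students (assumed to be in
# millions of dollars, matching the units given for the ranges).
reported = {
    "Harry Potter": {"median": 294, "range": 131},
    "Avengers": {"median": 206, "range": 493},
}

# Center: which series has the higher median?
higher_center = max(reported, key=lambda name: reported[name]["median"])

# Dispersion: which series has the smaller range?
tighter_spread = min(reported, key=lambda name: reported[name]["range"])

# How far apart the centers are, and how much wider the Avengers range is.
median_gap = reported["Harry Potter"]["median"] - reported["Avengers"]["median"]
range_ratio = reported["Avengers"]["range"] / reported["Harry Potter"]["range"]

print(f"Higher center: {higher_center} (by {median_gap} million)")
print(f"Tighter spread: {tighter_spread} (Avengers range is {range_ratio:.1f}x wider)")
```

The Harry Potter median is 88 million higher, and the Avengers range is about 3.8 times as wide. Both facts support the first pair's conclusion, while a single maximum value says nothing about either one.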







Exit Ticket

15 minutes

I use the Exit Slip at the end of this lesson as an individual formative assessment.  I am checking for each student's ability to:

  • Create a Box and Whisker Plot from the Data
  • Find the best representation of the Center of the Data
  • Find the Dispersion of the Data
  • Find the Shape of the Data
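Of the four skills listed, shape is usually the hardest to pin down from a Box Plot. One rough heuristic, offered here as an assumption and not as the rubric's definition, is to compare the distance from the median to each end of the plot: a longer upper side suggests right skew, a longer lower side suggests left skew. The function name `describe_shape` and the `tolerance` parameter are hypothetical.

```python
def describe_shape(summary, tolerance=0.1):
    """Label the shape from a five-number summary (min, Q1, median, Q3, max).

    Heuristic only: compares the length of the lower side (min to median)
    with the upper side (median to max). Sides that differ by no more than
    `tolerance` times the range are treated as roughly symmetric.
    """
    minimum, _q1, med, _q3, maximum = summary
    total = maximum - minimum
    if total == 0:
        return "no spread"
    lower_side = med - minimum
    upper_side = maximum - med
    if abs(upper_side - lower_side) <= tolerance * total:
        return "roughly symmetric"
    return "right skewed" if upper_side > lower_side else "left skewed"


# Hypothetical five-number summaries, invented for this sketch.
print(describe_shape((10, 20, 25, 40, 90)))    # long upper side
print(describe_shape((0, 25, 50, 75, 100)))    # balanced sides
print(describe_shape((10, 60, 75, 85, 90)))    # long lower side
```

A check like this looks only at the five summary values, so it cannot see gaps or clusters inside the data. It is best treated as a quick cross-check on what students read from their plots.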

I am assessing whether the goal for today has been met.  That is, can the student compare the two data sets by analyzing each of the components above and then, if the data is given in context, use that analysis to make inferences or draw conclusions that answer the driving question?  I will check the posters from the class activity for the students' ability to draw conclusions, since the Exit Slip data set is not given in context.  If time becomes an issue, I will assign the Exit Slip as homework.