Wednesday, November 12, 2008

Creating Accurate Venn Diagrams in Excel, Part 2

This post is an extention of an earlier post. If you are interested in this, you may be interested in my book Data Analysis Using SQL and Excel.

This post is about creating a Venn diagram using two circles. A Venn diagram is used to explain data such as:
  • Group A has 81 members.
  • Group B has 25 members.
  • There are 15 members in both groups A and B.
The above data is shown as a Venn diagram as:


Unfortunately, creating a simple Venn diagram is not built into Excel, so we need to create one manually. This is another example that shows off the power of Excel charting to do unexpected things.

Specifically, creating the above diagram requires the following capabilities:
  1. We need to draw a circle with a given radius and a center at any point.
  2. We need to fill in the circle with appropriate shading.
  3. We need to calculate the appropriate centers and radii given data.
  4. We need to annotate the chart with text.
Each of these are explained below. All of the charts and formulas are available in the accompanying Excel file.
Drawing a Circle Using Scatter Plots

To create the circle, we start with a bunch of points, that when connected with smoothed lines will look like a circle. To get the points, we'll create a table with values from 0 to 360 degrees, and borrow some formulas from trigonometry. These say:

  • X = radius*sin() + X-offset
  • Y = radius*cos() + Y-offset
The only slight complication is that the functions SIN() and COS() take their arguments in something called radian rather than degrees. This makes the formula look like:
  • X = radius*sin(*2*PI/360) + X-offset
  • Y = radius*cos(*2*PI/360) + Y-offset
The following shows the formulas:


Degrees X-Value Y-Value

0 =$E$4+SIN(2*PI()*B11/360)*$D$4 =$F$4+COS(2*PI()*B11/360)*$D$4

5 =$E$4+SIN(2*PI()*B12/360)*$D$4 =$F$4+COS(2*PI()*B12/360)*$D$4

10 =$E$4+SIN(2*PI()*B13/360)*$D$4 =$F$4+COS(2*PI()*B13/360)*$D$4

15 =$E$4+SIN(2*PI()*B14/360)*$D$4 =$F$4+COS(2*PI()*B14/360)*$D$4

20 =$E$4+SIN(2*PI()*B15/360)*$D$4 =$F$4+COS(2*PI()*B15/360)*$D$4

25 =$E$4+SIN(2*PI()*B16/360)*$D$4 =$F$4+COS(2*PI()*B16/360)*$D$4

30 =$E$4+SIN(2*PI()*B17/360)*$D$4 =$F$4+COS(2*PI()*B17/360)*$D$4

Where E4 contains the X-offset; F4 contains the Y-offset; and D4 contains the radius.

The degree values need to extend all the way to 360 to get a full circle, which can then be plotted as a scatter plot. When choosing which variety of the scatter plot, choose the option of points connected with smoothed lines.

The following chart shows the resulting circle with the points highlighted, along with axis labels and grid lines (which should be removed before creating the final version):


Creating a second circle is as easy as creating one, by just adding a second set of series onto the chart.


Filling in the Circle with Appropriate Shading

Unfortunately, to Excel, the circle is really just a collection of points, and we cannot fill it with shading. However, with a clever idea of using error bars, we can put in a pattern, such as:



The idea is to create X error bars for horizontal lines and Y error bars for vertical lines. To do this. right click on the circle and choose "Format Data Series". Then go to the "X Error Bars" or "Y Error Bars" tab (whichever is appropriate). Put 101 in the "Percent" box.

This adds the error bars. To format then, double click on one of them. You can set the color for them and also remove the little line at the edge.

You will notice that these bars are not evenly spaced. The spacing is related to the degrees. With the proper choice of degrees, the points would be evenly spaced. However, I do not mind the uneven spacing, and have not bothered to figure out a better set of points for even spacing.


Calculating Where the Circles Should Be

Given the area of a circle, calculating the radius is a simple matter of reversing the area formula. So, we have:
  • radius = SQRT(area/PI())
So, getting the radii for the two circles is easy. The questions is: where should the second circle be place to get the right overlap?

Unfortunately, there is no easy solution. First, we have to apply some complicated arithmetic to calculate the overlap between two circles, given a width of the overlap. Then we have to find the overlap that gives the correct area.

The first part is solved by finding the area of overlap between two circules, at a site such as Wolfram Math World.

The second is solved by using the "Goal Seek" functionality under the tools bar. We simple set up a worksheet that calculates the area of the overlap, given the width of the overlap and the two radii. One of the cells has the difference between this value and the area that we want. We then use Goal Seek to set this value to 0.

Annotating the Chart with Text
The final step is annotating the chart with text, such as "A Only: 65". First, we put this string in a cell, using a formula such as:
  • ="A Only: "&C4-C6
Then, we inlcude this text in the chart by selecting the chart, and typing "=" and followed by the cell address (or using the mouse).
Publish Post

In the end, we are able to create an accurate Venn diagram with two circles, of any size and overlap.



venn-20080112.xls

9 comments:

  1. I've found several uses for this just in the past week. Thank you!

    ReplyDelete
  2. 4.36 is the distance between the two centers of the circles. How do you calculate it if you know the radii and the area that the two circles overlap?
    Thanks,
    Jose

    ReplyDelete
  3. Thanks very much!

    ReplyDelete
  4. Thank you just what I needed. Can you explain how I should go about creating a 3 circle Venn diagram?

    Many thanks
    Caroline

    ReplyDelete
  5. Thank you just what I needed. Can you explain how I should go about creating a 3 circle Venn diagram?

    Many thanks
    Caroline

    ReplyDelete
  6. THAAAAAAANK YOOUU very much! I would never have managed to calculate this on my own

    ReplyDelete
  7. Just in case you ever have the need for this. All you would have to do is open the workbook venn-20080112.xlsm and then go to worksheet Circle Venn, fill in the values for C4, C5 and C6. Then run Goal Seek for K9.

    Turns out to be handy for some things. I use it as a quick AND ACCURATE visual to represent coverage of requirements in a design document.

    Thanks to the author for a useful tool.

    ReplyDelete
    Replies
    1. Please explain Goal Seek
      Set cell: K9
      To Value:???
      By Changing cell:???

      thanks

      Delete
  8. To add the pattern fill, I find it much easier to put a text box inside the square/circle and then fill the text box.

    ReplyDelete

Your comment will appear when it has been reviewed by the moderators.