A problem that frequently comes up when visualizing data is the need to examine more than a couple of dimensions at once. Oftentimes I find myself trying to understand the relationship between three or more variables, wishing there were _good_ ways to really see the underlying relationships. Of course, varying color and size in things like scatterplots can come close, but the eye isn’t as good at picking out those patterns as it is at picking out spatial ones. So unless there is a clear trend and relatively sparse data, adding size or color variations often doesn’t add much to the chart.
There is, however, one way to visualize a specific type of three-dimensional data: the ternary plot. In fact, we employ one in our upcoming RiskRecon Risk Surface Report, specifically Figure 13, which appears below.
We are going to build up to fully understanding the chart above, so don’t spend too much time poring over it yet. Our goal is to understand how the value of an organization’s internet-facing hosts is distributed. RiskRecon categorizes hosts into three value categories (Low, Medium, and High), so for each organization we can calculate the fraction of its hosts that falls into each one. This leaves us with three variables to visualize. Let’s start by making a scatterplot and using color as our third dimension.
Each organization here is a point: its percentage of Low value hosts is on the horizontal axis, its percentage of Medium value hosts is on the vertical axis, and its color shows its percentage of High value hosts.
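To make the construction of each point concrete, here is a minimal Python sketch that turns host counts into the three plotted values. The organization names and counts are hypothetical and purely illustrative, not RiskRecon data, and the drawing step is left out: the sketch only computes the Low share (horizontal axis), the Medium share (vertical axis), and the High share (color) for each organization.

```python
from fractions import Fraction

# Hypothetical host counts per organization as (low, medium, high).
# Illustrative numbers only, not RiskRecon data.
host_counts = {
    "org_a": (6, 3, 1),
    "org_b": (2, 2, 0),
    "org_c": (1, 1, 1),
}

def value_proportions(low, medium, high):
    """Return the share of an organization's hosts in each value category."""
    total = low + medium + high
    if total == 0:
        raise ValueError("organization has no hosts")
    return Fraction(low, total), Fraction(medium, total), Fraction(high, total)

# x = Low share, y = Medium share, color = High share, one entry per organization.
x, y, color = [], [], []
for org, counts in host_counts.items():
    low_share, medium_share, high_share = value_proportions(*counts)
    x.append(low_share)
    y.append(medium_share)
    color.append(high_share)
    print(f"{org}: low={float(low_share):.2f} "
          f"medium={float(medium_share):.2f} high={float(high_share):.2f}")
```

Exact `Fraction` values keep the three shares summing to exactly 1 for every organization. With a plotting library such as matplotlib, the three lists could then be passed as `plt.scatter(x, y, c=color)`.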
1. There are some interesting patterns within the points. These emerge because each proportion is calculated out of a whole number of hosts, so simple fractions like 1/2 and 1/3 come up again and again.
1. No organization lies above the diagonal line running from (0,1) to (1,0). This boundary represents organizations that have only Low or Medium value hosts. Anything above it would imply that the sum of a firm’s value proportions exceeds 100%, which isn’t possible.
1. Finally, the color shifts smoothly from dark purple (0% High value hosts) along that boundary line to yellow as points move away from it.
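All three observations can be checked by brute force. The sketch below is a minimal illustration, assuming organizations with between 1 and 10 hosts (an arbitrary cutoff chosen for the example, not taken from the report). It enumerates every Low/Medium share pair such an organization could have and counts how many host totals land on each pair. Pairs built from simple fractions are reachable from many totals, which is one plausible reading of why points stack up into visible patterns; every pair stays on or below the diagonal; and the leftover High share is 0 on the diagonal and grows toward the origin. The helper names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def reachable_shares(max_hosts):
    """Count how many host totals (1..max_hosts) produce each (low, medium) share pair."""
    counts = Counter()
    for total in range(1, max_hosts + 1):
        for low in range(total + 1):
            # medium is capped so that high = total - low - medium is never negative
            for medium in range(total - low + 1):
                counts[(Fraction(low, total), Fraction(medium, total))] += 1
    return counts

def high_share(low, medium):
    """The High share (the point's color) is whatever the other two leave over."""
    return 1 - low - medium

shares = reachable_shares(10)

# Observation 2: nothing lies above the diagonal from (0, 1) to (1, 0).
assert all(low + medium <= 1 for low, medium in shares)

# Observation 3: High is 0 on the diagonal and reaches 1 at the origin.
assert high_share(Fraction(1, 2), Fraction(1, 2)) == 0
assert high_share(0, 0) == 1

# Observation 1: simple fractions are reachable from many host totals.
print(shares[(Fraction(1, 2), Fraction(1, 2))])  # 5 (totals 2, 4, 6, 8, 10)
print(shares[(Fraction(1, 7), Fraction(2, 7))])  # 1 (total 7 only)
```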
Unfortunately, because of the density of points, there isn’t much we can say about the distribution of host value here. Perhaps the points are a bit denser toward the lower right (many Low value hosts), but it’s difficult to tell. What we really need is a three-dimensional plot of this data (courtesy of plotly).