In our last post, we shared a technique to make operational business metrics easier to analyze: plotting the expected value of a metric alongside its actual value can reveal anomalies in your data. In this post, we continue to ask the question “Is everything healthy?” Last time we looked at events that occurred millions of times per day. What if we wanted to look at events that occur only hundreds of times per day? In our case, this happens when we scope our analysis to a small segment of users. Answering these questions of health is just as important for small subsets of users, because they drive value for our clients.
In our first attempt, we plotted the sparse data directly (Fig. A). We found these charts to be unusable for analysis. Fig. A shows a downward trend only if you know what you’re looking for (the patch of blue toward the end of the series). Fig. B plots the same data in a much clearer format: a daily cumulative view. In Fig. A, we draw a data point at the actual value for every ten-minute sample. In Fig. B, we draw the running total per day, so each drawn data point is the count of all events observed so far that day. The running total peaks ten minutes before midnight and resets to zero at midnight.
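The transformation behind the daily cumulative view can be sketched in a few lines of Python. This is a minimal illustration, not the code behind Fig. B: the function name `daily_cumulative` and the input shape (a list of `(timestamp, count)` pairs, one per ten-minute sample, each labeled with the start of its bucket) are assumptions made for the example.

```python
from datetime import datetime


def daily_cumulative(samples):
    """Turn per-sample counts into a running total that restarts each day.

    `samples` is a list of (datetime, count) pairs. Assumption: each pair
    is one ten-minute bucket, labeled with the bucket's start time.
    Returns a list of (datetime, running_total) pairs in time order.
    """
    result = []
    current_day = None
    running_total = 0
    for timestamp, count in sorted(samples):
        # A new calendar day starts the running total over.
        if timestamp.date() != current_day:
            current_day = timestamp.date()
            running_total = 0
        running_total += count
        result.append((timestamp, running_total))
    return result


# Hypothetical sparse data spanning midnight.
example = [
    (datetime(2024, 1, 1, 23, 40), 2),
    (datetime(2024, 1, 1, 23, 50), 3),
    (datetime(2024, 1, 2, 0, 0), 1),
    (datetime(2024, 1, 2, 0, 10), 4),
]
cumulative = daily_cumulative(example)
```

With this bucket labeling, the last point of each day (the 23:50 sample) carries that day's total, and the first point of the next day starts again from that bucket's own count.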
A common reaction to noisy charts is smoothing: applying a transformation that hides the peaks and valleys. Fig. C is derived from the same data and transformed with a one-hour rolling average (each point is the average of the data within the hour). Smoothing works when the outliers are easy to distinguish, which isn’t true for this dataset. The trend appears at the same point in time but is just as difficult to find, because the line is still spiky.
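For comparison, a one-hour rolling average over ten-minute samples might look like the sketch below. It assumes a trailing window of six samples (the current sample plus the five before it); the post does not say whether Fig. C uses a trailing or a centered window, so treat that choice, and the name `rolling_average`, as illustrative.

```python
from collections import deque


def rolling_average(counts, window=6):
    """Trailing rolling average over a list of per-sample counts.

    Assumption: six ten-minute samples make up the one-hour window.
    Early points average over however many samples exist so far.
    """
    recent = deque(maxlen=window)  # drops the oldest value automatically
    averages = []
    for count in counts:
        recent.append(count)
        averages.append(sum(recent) / len(recent))
    return averages


# A single spike in otherwise empty data is spread out, not removed.
smoothed = rolling_average([0, 6, 0, 0, 0, 0, 0])
```

With sparse data most samples are zero or close to it, so each isolated event still shows up as a bump that lasts for the width of the window, which is consistent with the spiky line described above.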
The daily cumulative view is clearer for assessing the health of a metric with sparse data. First, it makes the gap between the observed and expected values more prominent, so we can catch trends faster. Second, once a trend is found, we can estimate its size much faster. Another surprising but useful effect: the value at the peak is the total for that day! This is handy when you want to answer “how many clicks have we seen so far today?”
In analyzing operational business metrics, it’s imperative to answer questions of health as quickly as possible: if we can assess the health of our business at a quick glance, we can spend more of our time growing our accounts. It’s as important for us to understand data for small subsets of users as it is for large subsets. The daily cumulative view made it simpler for our Business Analysts to find trends and estimate their size quickly.
Pratik Prasad is a software engineer at TellApart.