“It’s all fine and good to see an anomaly when it is one customer for one material. However, what if you want to see not only that, but also anomalies across all customers. For instance, what if you want to see when a drug, such as fentanyl has an anomalous ordering pattern across all customers? Or what if you want to see that same metric across a subset of customers? Or…what if you want to see anomalous ordering patterns from the big three distributors (McKesson, Cardinal and AmerisourceBergen) for a drug classification such as Opioids?” My critical co-worker asked me after reading the Spooky Orders blog. “Well,” I responded, “then what you need is a dashboard and the new R package anomalize released in April by Business Science.” He shrugged his shoulders…I think he really just wanted to bash my little blog.
It is truly great when the interests of both the client and the data science community align. The anomalize package was created by Business Science for a client and that client had the rare disposition of allowing others to benefit from this work. I have mixed feelings about anomalilze. It is a fantastic package, yet, it is so easy to use and deploy that one might do so without even understanding what is happening under the hood. Basically…it’s too easy. It is the data scientist’s responsibility to truly understand the packages, libraries and the APIs they employ. Don’t underestimate this challenge. When you have piles of work and something just works well out-of-the-box, the temptation is to use it and move on. I’m certainly not innocent of this transgression.
I’d love nothing more than to dive deep into this package and explain all that I understand. However, I’d not do as well as Matt Dancho and this article. Take a side journey and read this if you want to look under the hood (and you should).
In short, the anomalize package is a scalable adaptation of the Twitter AnomalyDetection package. That’s what was used in the original Spooky Orders blog so it fits right into the progression of this theme. In this blog, I’ll take the same data I used in that blog, use the new anomalize R package and create a dashboard of it in PowerBI. It’ll be fun. Let’s go!
First, we start with a simple csv comprised of Material, Order Date, Customer and Quantity. If we wanted to also do anomaly detection based on Material Classification and/or Customer Group (distributors or direct customers) we would have that in the csv as well. However, we’ll leave that out for our proof of concept.
To incorporate that into PowerBI is simple. Just upload the data. Start PowerBI Desktop and click on the Get Data button.
Select the Text/CSV option and then click on the Connect button.
Navigate to where you have your file, select it and then hit the Open button.
PowerBI will then ask if you want to load or edit the data. Because our CSV is so simple, there is no editing necessary.
We’ll just select the Load button.
On the PowerBI canvas we will start by adding a couple of slicers. Select the Slicer button. Then select the field you want to filter/slice by. We will start with Material.
Repeat the same thing and add a filter/slicer for Customer.
So far everything has been mundane. Now though, we get to the fun stuff. Select the R button from the visualizations to add R to our PowerBI report.
There is a panel at the bottom of the page into which you can put your code.
Now select the fields that will be used to detect anomalies. For us, that is Quantity and Date.
Note: Make sure your date is of type date and not date hierarchy.
Your selections will show up in the R script editor.
Now you can put in your code. My code is very simple, I’ll give a summary of it here but remember a more detailed description is in the link above by Matt Dancho.
We call two libraries. Of course, the anomalize package as well as the tidyverse package because we need our dataframe to be of type ‘tibble’. Then the simple process to visualize anomalies is to time_decompose(), anomalize(), time_recompose() and finally plot_anomalies().
library(anomalize) library(tidyverse) ord <- dataset ord$Date <- as.Date(ord$Date) ord_ts <- as.tibble(ord) ord_ts %>% time_decompose(Quantity) %>% anomalize(remainder) %>% time_recompose() %>% plot_anomalies(time_recomposed = TRUE, ncol=3, alpha_dots = 0.5) + ggtitle("Spooky Orders: Wooooooo")
Note: it is the job of time_recomposed = TRUE to set the lower and upper bounds. For this example, though, a lower bounds really isn’t necessary but we’ll keep it for illustrative purposes.
Repeat the same process and add another R section. This one will be a table of the anomalies. The only difference between this code and the one above is that we do not plot_anomalies(), instead we print a table using the library(gridExtra).
library(anomalize) library(tidyverse) ord <- dataset ord$Date <- as.Date(ord$Date) ord_ts <- as.tibble(ord) anomalies <- ord_ts %>% time_decompose(Quantity) %>% anomalize(remainder) %>% time_recompose() %>% filter(anomaly == 'Yes') library(gridExtra) tt <- ttheme_default(colhead=list(fg_params = list(parse=TRUE))) grid.table(anomalies, theme=tt)
Finally, we add yet one more R section. This one will break out the various pieces of the anomaly detection and allow us to visualize what is happening.
library(anomalize) library(tidyverse) ord <- dataset ord$Date <- as.Date(ord$Date) ord_ts <- as.tibble(ord) ord_ts %>% time_decompose(Quantity, method="stl", frequency = "auto", trend = "auto") %>% anomalize(remainder, method = "gesd", alpha= 0.05, max_anoms = 0.2) %>% plot_anomaly_decomposition() + ggtitle("Spooky Order Detection")
Now that we have everything in place, you may ask yourself, “Why wouldn’t I just do this in R Studio?” You could, but then you’d be missing out on the cool feature we have not shown yet. Those filters we added in the beginning will subset the data before the R script runs. This allows the user to select assorted options and the Anomaly detection will run on that subset of data. You could do the same in Shiny or R Markdown, but I don’t think as easily. Perhaps I’ll consider a blog doing the same in several ways so the reader can decide on their own what they would prefer.
In the final PowerBI dashboard the user can select the Material and/or Customer combination to see if there are anomalies in the ordering. It is a dynamic report and will automatically rerun the R code if the selections change.
I’m sure my friend at work will find something missing/wrong with this too. However, to me it is a another proof-of-concept that shows the advantages of integrating technologies like R and PowerBI.