The opioid crisis reached this point because the pharmaceutical industry of the 1990s aggressively pushed opioids as safe and effective drugs for pain management.
Upon further investigation, the real reason is that the distributors shipped copious amounts of opioids to pharmacies.
Wait…after deeper research it becomes clear that doctors are most culpable; after all, if they had not prescribed so much, the demand wouldn’t have been there.
Scratch all that…isn’t there something to be said about individual responsibility? After all, only a small percentage of those prescribed opioids become addicted. What’s more, it is estimated that 67% of surgical patients don’t even take all the opioids they are prescribed.
It doesn’t end there. The finger-pointing blame game continues. Pretty soon it’s going to be Sushi’s fault. (She’s pictured below. Personally, we think Mr. P looks more suspicious. We’ll need to investigate.)
We are going to steer away from blame and cynicism and try to look at what we believe could help us out of this situation. Data Science (well, of course!). This is going to be a technical Proof Of Concept on an idea around suspicious order monitoring, not some fluff piece about how we need to change our perspective or punish big pharma without providing any tangible details. For now, we want to see if it can be done and if we should invest more time in it.
The DEA has regulations in place that provide guidelines for monitoring suspicious orders: the Suspicious Order Monitoring Regulations (21 C.F.R. 1301.74(b)). If you’re having trouble sleeping…give it a read. What exactly constitutes a suspicious order? The DEA sums this up in the term “Know Your Customer”.
“It is fundamental for sound operations that handlers take reasonable measures to identify their customers, understand the normal and expected transactions typically conducted by those customers, and, consequently, identify those transactions conducted by their customers that are suspicious in nature.”
There are numerous ways to try to detect anomalies or suspicious orders in our supply chain. The simplest would be to set a threshold and, when that threshold is reached, flag the order for review.
However, this is lazy and would require setting the threshold relatively low (to be safe), thereby flagging too many orders. It also leaves open whether we consider a single order or an aggregation of orders over a period. We need something more substantial than this. After all, how is this Knowing Your Customer?
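To see why the naive approach falls short, here is a minimal sketch in base R with made-up orders and an arbitrary cutoff (both are illustrative assumptions, not real data):

```r
# Naive threshold flag: mark any single order over a fixed quantity.
orders <- data.frame(customer = c("A", "A", "B"),
                     quantity = c(10, 250, 40))
threshold <- 100  # arbitrary cutoff; in practice it must be set low "to be safe"
orders$flagged <- orders$quantity > threshold
orders[orders$flagged, ]  # the orders a reviewer would have to look at
```

Note that this knows nothing about the customer’s history: a quantity of 250 is flagged whether it is wildly unusual for that customer or entirely routine.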
What we would like to identify is whether the ordering pattern for a ship-to location has an anomaly. Perhaps a location typically orders one case a week, and then they suddenly order two. Should that be an anomaly? Well, what if the previous week they didn’t order at all and were therefore making up for a shortage? Then no, this is within a frequency pattern that should be allowed.
We are going to try to use the open source AnomalyDetection package in R from Twitter. This package is designed to detect both local and global anomalies. A local anomaly is one that veers off the pattern of its nearby neighbors, such as an order placed by a customer that is 5x the quantity of those around it. A global anomaly considers things such as seasonality and trend; it is triggered when that overall pattern changes. Orders that increase every fall and suddenly start increasing in the winter instead would trigger this anomaly. Surges in frequency would also be a global anomaly, unless of course that increase in frequency is seasonal.
The AnomalyDetection package is for univariate time series; that is, the input is a series of timestamp and count pairs. About as simple as you can get. There are some key parameters that we will use.
| Parameter | Description |
| --- | --- |
| max_anoms | The maximum percentage of the data that can be flagged as anomalies. We will set this to 0.05, so up to 5% of our set could possibly be anomalies. |
| direction | The direction of the anomaly. The options are ‘pos’, ‘neg’ and ‘both’. For our purposes, we want to know if there are surges either positive or negative, so we’ll use ‘both’. |
| threshold | Only report positive anomalies above the threshold specified. Options are: ‘None’, ‘med_max’, ‘p95’, ‘p99’. We will start with ‘p99’ and then try ‘p95’ to see the difference. |
| plot | Set this to TRUE if you want to plot the results. |
Note: there are many other parameters that could be useful, like alpha which sets the level of statistical significance with which to reject or accept anomalies. Again, this is a POC so we’ll keep it as simple as possible.
- We will take controlled substance orders from an ERP system and download them as a txt file. My file came out pretty messy from the rough download. Often in these articles, the steps to clean and shape the data are skipped. I’m not going to do that…be ready for some garbage cleaning in R.
- Install and load the AnomalyDetection library.
install.packages("devtools")
devtools::install_github("twitter/AnomalyDetection")
library(AnomalyDetection)
- Read in your txt file. In my case, I had to cut off some random lines at the top.
ord <- read.csv("d:/data/orders_c.txt", skip = 5, header = TRUE, sep = '\t', stringsAsFactors = FALSE)
- Remove some extraneous columns that were not read properly.
ord$X <- NULL
ord$X.1 <- NULL
ord$X.2 <- NULL
ord <- ord[complete.cases(ord),]
- Change the date column to a proper date type.
ord$Created.on <- as.Date(ord$Created.on, format = '%m/%d/%Y')
- The time element in our file is down to the second. That is too granular for us. We want to stay at an hourly level. This regex changes all minutes and seconds to 00:00.
ord$Time <- gsub('[0-9][0-9]\\:[0-9][0-9]$', '00:00', ord$Time)
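A quick sanity check of that regex on a single (made-up) time value shows the trailing minutes and seconds being zeroed while the hour is kept:

```r
# The pattern matches the final MM:SS and replaces it with 00:00
truncated <- gsub('[0-9][0-9]\\:[0-9][0-9]$', '00:00', '13:45:27')
truncated  # "13:00:00"
```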
- Merge the date and the time columns. Then, using the ‘lubridate’ package in R, convert that field to the format ymd_hms (year-month-day hour:minutes:seconds). Finally drop the unused date and time columns.
library(lubridate)
ord$DateTime <- paste(ord$Created.on, ord$Time)
ord$DateTime <- ymd_hms(ord$DateTime)
ord$Created.on <- NULL
ord$Time <- NULL
- Give the column names something decent.
colnames(ord) <- c("Material", "Customer", "Quantity", "UoM", "datetime")
- Make the order quantity numeric.
ord$Quantity <- as.numeric(ord$Quantity)
- Use the package ‘dplyr’ to group the orders and summarize the quantities.
library(dplyr)
ord <- ord %>%
  group_by(Material, Customer, datetime) %>%
  summarize_at(vars(Quantity), sum) %>%
  ungroup()  # drop the grouping so later column subsetting behaves
- Subset the orders to get just one customer for one material.
ord <- ord[which(ord$Customer == '99999' & ord$Material == '9999-9999-99'),]
- Take just the datetime and the quantity fields.
ord <- ord[, c("datetime", "Quantity")]  # select by name to be safe
ord <- tbl_df(ord)  # not totally necessary
- Because this table of orders for a single customer and a single material contains gaps, we have to fill those gaps before applying AnomalyDetection. We fill the gaps with zeros because a date and time that has no orders would be zero. It would be wrong to apply a mean or median in this case. First we create a data frame of all hours from the beginning of our dataset to the end.
seq_df <- data_frame(datetime = seq.POSIXt(as.POSIXct("2013-01-08 00:00:00", tz = "GMT"),
                                           as.POSIXct("2017-11-09 23:59:59", tz = "GMT"),
                                           by = "hour"))
- Next, we merge our orders data frame with the sequence data frame and replace all missing values with zero.
ord <- merge(seq_df, ord, all.x = TRUE)
ord$Quantity[is.na(ord$Quantity)] <- 0
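The merge-and-fill step can be demonstrated end to end on a toy six-hour window (made-up data, base R only):

```r
# Six hourly slots, but orders exist in only two of them
seq_df <- data.frame(datetime = seq(as.POSIXct("2017-01-01 00:00:00", tz = "GMT"),
                                    by = "hour", length.out = 6))
orders <- data.frame(datetime = seq_df$datetime[c(2, 5)],
                     Quantity = c(3, 7))
filled <- merge(seq_df, orders, all.x = TRUE)  # left join on datetime
filled$Quantity[is.na(filled$Quantity)] <- 0   # hours with no orders become zero
```

After the fill, all six hours are present and the four order-free hours carry a quantity of zero, which is exactly what the time-series input needs.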
- Do a simple plot of the data to see if things are ok.
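The post doesn’t show the plotting call; one minimal way in base R, with toy data standing in for the cleaned `ord` (the real order data isn’t shown), looks like:

```r
# Toy stand-in for the cleaned orders: mostly zeros with a couple of spikes
ord <- data.frame(
  datetime = seq(as.POSIXct("2017-01-01", tz = "GMT"), by = "hour", length.out = 48),
  Quantity = c(rep(0, 40), 5, rep(0, 6), 12)
)
plot(ord$datetime, ord$Quantity, type = "h",
     xlab = "Order date/time", ylab = "Quantity")
```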
Things look good so far. We can see that most of the data is for zero order quantity, which makes sense. We also see that there are some clear outliers, such as the one just before 2014.
- Let’s do a quick application of AnomalyDetection.
res <- AnomalyDetectionTs(ord, max_anoms = 0.05, direction = 'both',
                          plot = TRUE, threshold = 'p99')
res$plot
Not a bad start…let’s play with the parameters and see if we can make it better. Also, the x-axis is defaulting to month-day labels; we’d have to fix that, but we’re not going to work on it now.
Let’s remove the threshold to see if it is limiting the number of anomalies we are seeing.
res <- AnomalyDetectionTs(ord, max_anoms = 0.05, direction = 'both', plot = TRUE)
res$plot
That didn’t work well. It even flagged zero quantities as anomalies.
Note: what I don’t like is that max_anoms doesn’t seem to have held. I’ll have to play with that later; this is still just a POC.
Let’s try the ‘p95’ threshold instead.
res <- AnomalyDetectionTs(ord, max_anoms = 0.05, direction = 'both',
                          plot = TRUE, threshold = 'p95')
res$plot
We think this Proof Of Concept has borne fruit! It appears to have identified not only local spikes in ordering but also nearly all orders in a segment where the frequency appears to have increased. Granted, this is a simple example of identifying suspicious orders using an open source R package. However, it does open the door for further research into applying machine learning to this pressing issue.
To operationalize this work we’d need a feed of orders from an ERP system such as SAP so we could analyze each order as it was created. Then we’d need to perform real-time analysis using more sophisticated techniques than we have done here, such as RNNs (recurrent neural networks) or autoencoders. Then the presentation layer of the Spooky-Order Dashboard would need to push actions back to the ERP system, such as blocking or releasing an order and/or attaching an XML/JSON report to the order itself. This would be necessary in case the DEA came a knockin’ and wanted to know why a particular order was or was not flagged.
All of this is doable and would help pharmaceutical manufacturers and distributors more accurately monitor their controlled substance orders and shipments. One might take a cynical view that they knew of places like the pharmacy in Kermit, West Virginia, that was shipped over 9 million opioid pills in a two-year period. However, we prefer to look at the data and the solution, not the blame. Did the manufacturers and distributors simply not recognize the anomalies hidden in their masses of data?
We threw up some concepts on a whiteboard and came up with this highly detailed dashboard. 😉
This would be a great application of not only data science but also integrating R with SAP. If you have some ideas to add to this, please let us know. Some food for thought…what other reasons might you have to want to know if your customer’s ordering pattern suddenly changes? If your main customer starts ordering less or misses a few orders, that should interest you. Or, conversely, what if a small customer’s orders surge for a period? There are many applications for this besides the “spooky orders”. Drop us a comment if you have other ideas or questions.
Thanks for reading!
Note: If you made it this far then why “Spooky Orders?” My brother recently had surgery and when in recovery (and a little loopy) he said to me, “Uh…hey…you workin’ on that thing?” I said, “What thing is that?” “You know…” he responded, “spooky orders stuff.” I laughed and said, “You mean Suspicious Order Monitoring…yes…and, you just gave me a great title.”