Embrace the Abstraction

“You haven’t posted in a while,” a good friend said to me. “Well, I’m trying to get some good material gathered,” I responded. He laughed and said, “You know, it doesn’t have to be War & Peace.”

OK then. That freed me up, and my writer’s block disappeared. I’ve wanted to write about the abstraction layers that are out there for data science, and why I have avoided them up to now, one in particular.

In 2014, upon returning from Nepal, I started studying data science in earnest. One of my very first misguided forays (and boy, were there a lot) was to start with Azure Machine Learning Studio. I spent about a month on it before realizing, “I honestly don’t know what those little drag-and-drop boxes are doing.” This is not the way to learn data science; this is how to learn an application.

I didn’t want that. So into the trash it went, along with any other abstraction layers. I decided it was all bare metal for me. I turned to R and RStudio along with Python, some statistics learning materials, and tons of online reading and podcasts. It was a worthwhile and rewarding path to take.

Very recently a colleague asked if I used Azure Machine Learning Studio, and I scoffed. “What? I write code, dammit! Get away from me with your toys.” I may have slapped him. Later, something about his query bugged me and I remembered…

“If you think you are absolutely right, then it is likely you aren’t open to all options, and possibly not the best options. Consider that you are wrong on everything first.” – The gist of something I read or heard

This weekend I decided to spend a couple of hours in Azure Machine Learning Studio (AMLS), just to see if I was wrong. My conclusion? I was not wrong in the way I learned, but now it is time to embrace abstraction tools. As I was working through a couple of tutorials and moving the little boxes around, I thought this time, “I know what those boxes are doing!” Take, for example, a Data Transformation function that normalizes the data. I know how to do this in R and Python and have done it more times than I can count. Do I need to continue doing it? Perhaps it’s time to quit repeating the same manual code and save the hand-written R or Python for the harder pieces.
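For anyone who hasn’t hand-coded it, here is roughly the kind of thing a normalization box takes off your plate. This is only a minimal sketch using Python’s standard library; the function names and the sample `ages` column are made up for illustration, and it is not a description of what AMLS does internally.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Center values on the mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Avoid dividing by zero for a constant column.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical sample column, purely for illustration.
ages = [20, 30, 40, 50, 60]
print(min_max_scale(ages))
print(z_score(ages))
```

It is only a dozen or so lines, but rewriting and re-testing them on every project is exactly the repetition a drag-and-drop box removes.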

That’s my conclusion: the approach to learning data science wasn’t wrong, but it was wrong to ignore hugely productive tools just because I was a bare metal programmer. Blue Diesel needs productivity tools like AMLS to increase the amount, consistency, and quality of the work we do.
