A Large Investment or a Questionable Fad?
When I was on the eve of making the jump from Finance to Analytics, I had two big fears. The first was that my chosen path to Analytics, a full Master's program, was excessive given that I was already an experienced Finance professional. The second was that Analytics might be a fad that wouldn't last or, worse, that it would not be useful. Both fears came from one basic (and very typically Finance) misconception: I couldn't rid myself of the idea that Analytics would boil down to using a few tools, which would then be replaced by other tools when the next fad came along.
The paths to data science are many, and some of the online training routes are free and short. So why was I taking an expensive two-year Master's program? These early doubts turned to confusion during my interview with the program director. I explained that I was well prepared for the program and detailed my Finance accomplishments. The director listened politely to my explanation of financial modelling, BI, and planning systems, and then helpfully suggested that perhaps I should take the less quantitative Marketing concentration, since my background obviously hadn't prepared me for data science. My shock was emotional rather than intellectual. In my head, I knew that data science combines multiple statistics and computer science domains, and that each of those domains alone could take many years to learn. Deep down, though, I couldn't shake the preconception that data science would be some quick and easy niche such as activity-based costing, driver-based budgeting, or rolling forecasts.
"A data scientist is someone who is better at statistics than most software engineers and better at software engineering than most statisticians."
Bad Data and Data Monkeys
Shaking off any doubts, I plunged into the program and was lucky enough to have a series of classes with a brilliant data scientist who taught textbook statistics, the perils of statistics, and the trap of being a data monkey. As it turns out, once you learn statistics, and linear regression in particular, you have the vocabulary and the concepts to learn the truth. The truth is that plain vanilla statistics, including regression, doesn't help much with most business problems, because in the majority of cases it demands cleaner, simpler data than most business processes provide. It is very easy to dump data into a tool with an easy front-end like Excel or SPSS and get statistical results such as a variance or a regression line. Push the right button and SPSS or SAS will churn through dozens of variables and spit out both a predictive equation and a score that seems to indicate that the model is great. It is very easy, and also very wrong. This is the data monkey trap. (The name is a play on the idea that an infinite number of monkeys pounding on an infinite number of typewriters could produce the works of Shakespeare, but that this is a really poor method for producing literature.)
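The data monkey trap can be reproduced with pure noise. The sketch below is a simplified stand-in for what a push-button tool does, not a reproduction of any SPSS or SAS procedure: it screens a pile of random candidate variables against a random target and keeps the one with the strongest correlation. The function names, sample sizes, and seed are all assumptions chosen for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def data_monkey_demo(n_train=20, n_test=500, n_vars=200, seed=0):
    """Pick the 'best' of many noise variables and re-check it on fresh data.

    Every variable, including the target, is independent random noise,
    so any apparent relationship is an artifact of the search itself.
    """
    rng = random.Random(seed)
    y_train = [rng.gauss(0, 1) for _ in range(n_train)]
    y_test = [rng.gauss(0, 1) for _ in range(n_test)]

    best_train_r, best_test_r = 0.0, 0.0
    for _ in range(n_vars):
        x_train = [rng.gauss(0, 1) for _ in range(n_train)]
        x_test = [rng.gauss(0, 1) for _ in range(n_test)]
        r = pearson(x_train, y_train)
        # Keep whichever candidate looks strongest on the small sample.
        if abs(r) > abs(best_train_r):
            best_train_r = r
            best_test_r = pearson(x_test, y_test)
    return best_train_r, best_test_r


if __name__ == "__main__":
    train_r, test_r = data_monkey_demo()
    print(f"correlation on the data used for the search: {train_r:+.2f}")
    print(f"same variable on fresh data:                 {test_r:+.2f}")
```

With a small sample and enough candidates, the winning variable typically shows a sizeable correlation on the data used for the search, while on fresh data it tends to fall back toward zero. The impressive score came from the search, not from the data.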
From the Trough of Disillusionment to Zoo Keeping
It is very disconcerting to suddenly realize you have just spent nine months learning statistics only to reach the point where you can understand its limitations. Disillusionment can set in, and you may be left wondering, "Can data science really add value in the real world of business data? Or did I just study Stats, programming, Linear Algebra, and Calculus for nothing?" Luckily, this is also the moment when you know enough to begin to understand where the real analytic magic can be found.
Once you know the vocabulary, the concepts, and the pitfalls, you can understand and appreciate the higher-level techniques. Clustering can group your cost centers so that performance variances become meaningful. LASSO regression addresses many of the hidden flaws of ordinary regression analysis for sales forecasting, such as overfitting when there are many candidate predictors. Support Vector Machines (SVM) and Naïve Bayes can offer real improvements over logistic regression for classifications such as "Fraud vs. Not-Fraud" or "Hire vs. Don't Hire."
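To make the LASSO point concrete, here is a minimal sketch of the idea using cyclic coordinate descent in plain Python. It is an illustration under stated assumptions, not how any particular statistics package implements LASSO: the toy data, the penalty value lam=0.3, and all function names are hypothetical, and no intercept is fitted.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return exactly 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n) * sum((y - Xb)^2) + lam * sum(|b|) by cyclic coordinate descent.

    X is a list of rows; no intercept is fitted, so the data should be roughly centred.
    """
    n, p = len(X), len(X[0])
    coefs = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Average product of feature j with the residual that leaves feature j out.
            rho = 0.0
            scale = 0.0
            for row, target in zip(X, y):
                prediction = sum(c * v for c, v in zip(coefs, row))
                partial_residual = target - prediction + coefs[j] * row[j]
                rho += row[j] * partial_residual
                scale += row[j] ** 2
            coefs[j] = soft_threshold(rho / n, lam) / (scale / n)
    return coefs


def make_toy_data(n=100, n_noise=4, seed=1):
    """Hypothetical data: only the first column drives the target."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(1 + n_noise)] for _ in range(n)]
    y = [3.0 * row[0] + rng.gauss(0, 0.5) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(lasso_coordinate_descent(X, y, lam=0.3))
```

The soft-threshold step is what distinguishes LASSO from ordinary least squares: a predictor whose link to the target is weaker than the penalty gets a coefficient of exactly zero, so noise variables drop out of the model instead of being fitted. The price is that the surviving coefficient is shrunk somewhat below its true value.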
Unfortunately, it's at the moment when you see how everything begins to come together that you realize there is still a lot of hard work in front of you. Most data science work is not rocket science so much as the simple cleaning and transformation of data so that it can be properly digested by the different techniques (aka the algorithms). Each technique may require a different transformation of the data. At times, being a data scientist can even feel like being a zoo keeper who specializes in the care and feeding of different mathematical algorithms.
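As a small illustration of this "care and feeding," the sketch below shows two common transformations. Distance-based techniques such as clustering and SVM are sensitive to the scale of their inputs, so numeric columns are often standardized, and most algorithms expect numbers, so category labels are often turned into indicator columns. The cost-center figures and function names are hypothetical.

```python
def standardize(values):
    """Rescale to mean 0 and standard deviation 1 (population formula)."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]


def one_hot(labels):
    """Turn category labels into 0/1 indicator columns, one per distinct label."""
    categories = sorted(set(labels))
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, rows


if __name__ == "__main__":
    # Hypothetical cost-center records: monthly spend and department.
    spend = [1200.0, 56000.0, 8300.0, 410.0]
    department = ["Sales", "IT", "Sales", "HR"]
    print(standardize(spend))
    print(one_hot(department))
```

Which transformation is appropriate depends on the algorithm being fed, which is why the same raw data often has to be prepared several different ways.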
Returning Home to Finance
Coming home to Finance, however, could be the hardest part of my journey into data science. It is as if you are Marco Polo, who has travelled to a land of wonders and sophisticated inventions, but on returning home you find that even the basic elements of your journey sound bizarre. In Marco Polo's case, few believed that money could be made of paper or that coal from under the earth could power industry. In the case of explaining data science to Finance, both the language and the concepts can be strange. Finance, and particularly accounting, is all about altering and summarizing data to simplify and add information. Data science is about getting the most detailed, purest data possible and then extracting information from it on the fly, based on each analytic objective. When the two cultures successfully combine, the result can be transformational, but getting to that point can involve a substantial amount of culture shock.