What is the Future for SQL Developers in a Machine Learning World?
When I graduated from college in the late 1990s, it was just in time to enjoy the Y2K crisis. If you remember those fun times, then you are old enough to enjoy this blog. I graduated with a Management Information Systems (MIS) degree, which is a cross between Computer Science (CS) and Business Management, and although I was stronger in CS than Business Management, I survived. One class that spanned both disciplines, and that I partially excelled in, was Database Theory, which taught the basics of Relational Database Management Systems (RDBMSs). We learned everything from proper table structures, Primary Keys and Foreign Keys, to basic modeling techniques. It is also where we first heard the term SQL, pronounced “sequel” (or “squirrel,” as some people think).
SQL stands for Structured Query Language and is supported by a set of standards, although every database vendor seems to implement them slightly differently. Even though SQL is always a little different depending on whether you are using MySQL, Oracle, DB2, or some other vendor's tool, if you are good at writing SQL and know the database model, you can adapt quickly to get whatever data you need.
In my career, I have spent about 14 years in various integration roles, almost always using some type of RDBMS as my sources and targets. I excelled at building different data models to support reporting, data marts, and operational data stores (ODSs). All these data models supported operations, financial consolidation, and other diverse business needs. I became VERY, VERY good at writing complex and efficient SQL throughout my career in IT.
Today, I still enjoy trying out different systems and databases that all claim SQL support in some form or another. For example, I recently gained my Data Vault 2.0 Certification, for which I built a Data Vault for our corporate needs using Snowflake, a cloud data warehouse that supports ANSI SQL. Happily, I have not lost my skills.
But the question that all this is leading up to is: Can someone like me still find a place in this world of new platforms and processing?
To SQL or not to SQL
The database paradigm has changed. There are now NoSQL, Document Databases, Columnar Databases, Graph Databases, Hadoop, Spark, and many other Massively Parallel Processing (MPP) platforms popping up daily. They all provide great benefits for the many use cases that just don't work well with traditional RDBMSs.
Big data platforms provide a way to process more diverse data faster than we could have imagined in 1999, when most IT professionals had to know SQL to meet business needs. Today, you need to know many more platforms and environments to take advantage of all the capabilities and benefits that Big Data vendors are promising. Can those of us who have depended on SQL-compliant systems survive, or do we need to learn Scala, Python, R, Java, or whatever the next cool language or platform turns out to be?
The saving grace is tools like Hive and Impala, which let you use your SQL skills to find and access data on Hadoop platforms and Data Lakes, but these tools all come with restrictions. You are limited to the built-in functions that SQL has always supported. Of course, you can write User Defined Functions (UDFs), but then you get into programming quickly.
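To make the point about UDFs concrete, here is a minimal sketch in Python. It uses the standard library's sqlite3 module as a stand-in for Hive or Impala (an assumption made only so the example runs anywhere; those platforms have their own UDF mechanisms). The customers table and the normalize_phone function are hypothetical names invented for illustration.

```python
import sqlite3


def normalize_phone(raw):
    """Hypothetical cleanup rule: keep only the digits of a phone number."""
    if raw is None:
        return None
    return "".join(ch for ch in raw if ch.isdigit())


# In-memory database standing in for a SQL-on-Hadoop engine (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ada", "(555) 010-1234"), ("Grace", "555.010.9876")],
)

# Built-in SQL functions work out of the box, with no programming required.
builtin_rows = conn.execute(
    "SELECT UPPER(name), LENGTH(phone) FROM customers ORDER BY name"
).fetchall()

# Anything beyond the built-ins means writing code and registering it
# with the engine so that SQL can call it by name.
conn.create_function("normalize_phone", 1, normalize_phone)
udf_rows = conn.execute(
    "SELECT name, normalize_phone(phone) FROM customers ORDER BY name"
).fetchall()

print(builtin_rows)
print(udf_rows)
```

The query itself still looks like plain SQL, but the logic now lives in a general-purpose language, which is exactly the moment a SQL developer becomes a programmer.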
Is Python in your future?
SQL-based systems fall short when you start applying the latest machine learning methods to your data, or when you want to take advantage of huge volumes of streaming data and query data in motion. Yes, for those of us who love RDBMSs, data is not always at rest.
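For readers who have only ever queried tables, here is a toy sketch of what "querying data in motion" can mean. Instead of running a SELECT over rows that are already stored, the code updates an answer as each event arrives. The sensor_readings feed and the rolling_average function are hypothetical, and a real streaming platform would add far more (distribution, fault tolerance, time-based windows); this only illustrates the idea.

```python
from collections import deque


def rolling_average(events, window_size):
    """Yield the average of the last `window_size` values as each event arrives."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    window = deque(maxlen=window_size)
    total = 0.0
    for value in events:
        if len(window) == window.maxlen:
            total -= window[0]  # oldest value is about to be evicted
        window.append(value)
        total += value
        yield total / len(window)


def sensor_readings():
    """Hypothetical stand-in for an unbounded feed of events."""
    for reading in [10, 20, 30, 40, 50]:
        yield reading


# Each result is available as soon as its event arrives; nothing is stored
# beyond the small window, unlike a table that is loaded first and queried later.
averages = list(rolling_average(sensor_readings(), window_size=3))
print(averages)
```

In SQL terms this is roughly an AVG over a sliding window, except that the result is produced continuously rather than after the data has landed in a table.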
Times have changed and so must our skills. I personally have started to learn Python, as it is an easier language to use than Java and many machine learning methods are supported in Python. In five to ten years, every information worker or IT support person will have to know how to use machine learning, or at least how to support it. You will have to support MPP systems like Hadoop and Spark in some form for data processing. Machine learning will be key to supporting data-driven decision making and to gaining the competitive insights required to win in your market.
SQL is dead, long live SQL
There is still a very strong need for SQL, as I see methodologies such as data vaults evolve and become widely popular in the NoSQL and HDFS/storage spaces. There will always be structured systems, such as ERP and CRM systems, that need structured data warehouses. You can count on that not going away. But when your CxO comes to you to predict the future and its business implications, or to understand highly automated optimization solutions that get smarter over time, you may want to stop looking in the same old places. You may have to start looking at all the data available in its most natural forms (unstructured or semi-structured) and find ways to become more predictive, prescriptive, and even cognitive. So, while SQL and RDBMSs, like the mainframe, will exist for many, many years, the tide is shifting towards tools for real-time analytics, where SQL currently falls short!