Data Engineer is now the new star. Is that the death of Business Intelligence Engineer?

  • Post by Lan
  • Apr 26, 2022


Suddenly there is this Big Data movement, and Data Engineers are the new stars. This post explains the skills gap you need to close to transition from BI to DE.

It seems these days that every person I talk to is either a data scientist or a data engineer… Since Data Engineer is rocketing in popularity, is there still a place for traditional roles such as BI Engineer and ETL Developer? In this post, I want to point out the overlaps and differences between a BI Engineer and a Data Engineer. If you want to make the change and become a data engineer, you will also learn what the skills gaps are.

Well, let me first ask you a question. Does your working day usually look like this?

And as a BI Engineer, do you mainly work with SQL and feel insecure about not getting any respect because:

· You are seen as a “second-class citizen” and feel left behind by oh-so-smart engineers with Computer Science degrees

· SQL data engineering — are you kidding me?

· Come on! It’s drag and drop. That is so low-skill. A 10-year-old kid could do it.

I feel you. At least deep down inside, I know that moving data from A to B using a drag-and-drop tool is probably not the sexiest job one can think of…

What is a BI Engineer?

First, let’s try to understand who BI Engineers are and what they do. A BI Engineer does ETL (Extract, Transform, Load). The term ETL has been around since the 90s, supporting a whole ecosystem of Business Intelligence, which is a discipline of Data Management. The ultimate goal of a BI implementation is to turn operational data into meaningful information. A BI Engineer takes raw data scattered across different operational databases and turns it into insights stored in a destination database such as a Data Warehouse. If you want to know more about Data Warehouse (and Data Lake) architecture for BI, please feel free to check my previous post.

Now, if you come from a BI background, you must be very familiar with Microsoft SQL Server and stored procedures. SQL is the language used to communicate with databases. A BI Engineer uses SQL to extract raw data from a data source, transform it, and load it into a Data Warehouse or Data Lake as the business requires. You have probably also heard of ETL tools such as Informatica, IBM DataStage, Microsoft SSIS, and so on. These tools are usually icon-based, with a graphical drag-and-drop interface that lets developers express complex ETL jobs in a straightforward, visual way, which makes life much easier. A traditional BI Engineer also has solid knowledge of data modeling techniques such as the star schema or snowflake schema. It is all about how you structure the data into tables and describe their relationships at the right grain, so that the model correctly captures the business and answers all business questions. BI Engineers also visualize the information in the form of reports and dashboards.
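To make the extract, transform, and load steps concrete, here is a minimal sketch in Python using only the standard-library `sqlite3` module, with two in-memory databases standing in for an operational source and a Data Warehouse. All table and column names (`crm_customers`, `shop_orders`, `dim_customer`, `fact_sales`) and the cleaning rules are hypothetical; a BI team would more typically write this as SQL Server stored procedures or build it in a tool like SSIS.

```python
import sqlite3


def build_source(conn):
    # Hypothetical operational tables standing in for a CRM and a web shop.
    conn.executescript("""
        CREATE TABLE crm_customers (id INTEGER, name TEXT, country TEXT);
        CREATE TABLE shop_orders (order_id INTEGER, customer_id INTEGER,
                                  order_date TEXT, amount_cents INTEGER);
        INSERT INTO crm_customers VALUES (1, ' Alice ', 'nl'), (2, 'Bob', 'DE');
        INSERT INTO shop_orders VALUES
            (10, 1, '2022-04-01', 1250),
            (11, 1, '2022-04-02', 800),
            (12, 2, '2022-04-02', 4000);
    """)


def run_etl(src, dwh):
    # Extract: pull the raw rows out of the source with plain SQL.
    customers = src.execute("SELECT id, name, country FROM crm_customers").fetchall()
    orders = src.execute(
        "SELECT order_id, customer_id, order_date, amount_cents FROM shop_orders"
    ).fetchall()

    # Transform: trim names, standardize country codes, convert cents to euros.
    dim_rows = [(cid, name.strip(), country.upper()) for cid, name, country in customers]
    fact_rows = [(oid, cid, day, cents / 100) for oid, cid, day, cents in orders]

    # Load: write into a tiny star schema (one dimension table, one fact table).
    dwh.executescript("""
        CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY,
                                   name TEXT, country TEXT);
        CREATE TABLE fact_sales (order_id INTEGER PRIMARY KEY,
                                 customer_key INTEGER REFERENCES dim_customer(customer_key),
                                 order_date TEXT, amount_eur REAL);
    """)
    dwh.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", dim_rows)
    dwh.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", fact_rows)
    dwh.commit()


src = sqlite3.connect(":memory:")
dwh = sqlite3.connect(":memory:")
build_source(src)
run_etl(src, dwh)

# A business question answered from the star schema: revenue per country.
for country, revenue in dwh.execute("""
        SELECT d.country, SUM(f.amount_eur)
        FROM fact_sales f JOIN dim_customer d ON f.customer_key = d.customer_key
        GROUP BY d.country ORDER BY d.country"""):
    print(country, revenue)
```

For brevity, the source `id` doubles as the dimension key here; a real star schema would usually assign its own surrogate keys in `dim_customer`.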

BI is a very mature field, and lots of people have built successful careers as BI Engineers. Everyone was happy until, suddenly, along came this movement called Big Data. While traditional ETL tools have proven their value in building the back-end DWH (Data Warehouse) that organizes all of a company’s information, the birth of Big Data has brought more modern ways of getting data from here to there over the last decade. Data Engineers have now become the new stars…

What is a Data Engineer?

According to Maxime Beauchemin in his famous post The Rise of the Data Engineer, the term “Data Engineer” was first introduced by Facebook around 2013. The BI Engineers at Facebook at the time realized that they were using approaches and tooling very different from traditional ETL tools to process the massive amounts of data that Facebook had. The Business Intelligence Engineer role was therefore refactored into Data Engineer. The biggest reason for this change was the rapid shift in the data that companies work with.

Big Data has driven changes in how organizations process, store, and analyze their data. This required a completely new discipline, because there was simply no off-the-shelf tool that could manage such massive amounts of data, which may reach hundreds of terabytes in volume or arrive at a velocity of, for instance, 100 GB per day. Companies had to build their own tools and become more code-based instead of relying on traditional ETL tools. Plus, many of these ETL tools do not fit well into a CI/CD pipeline: for example, they lack proper version control or partial deployment, that is, the ability to deploy a single small component or change to production.

Maxime Beauchemin defines a data engineer as a software engineer who specializes in data. Data Engineers focus on designing, building, and maintaining the data infrastructure and platforms, consisting of servers, applications, tools, and databases, that keep the business going. Data stores such as a Data Warehouse, a Data Mart, a Data Lake, or even Excel files are part of this infrastructure. A Data Engineer also implements ETL pipelines and makes data available for further analysis. They don’t spend much time exploring and analyzing the data themselves.

You know what, let’s now look up the definition of data engineer on Wikipedia and see what is written there. It starts with:

“A data engineer is someone who creates big data ETL pipelines, and makes it possible to take huge amounts of data and translate it into insights.”

OK, so far so good. It sounds exactly like the job of a BI Engineer. But wait a minute! The story continues:

“Data engineers usually hail from a software engineering background and are proficient in programming languages like Java, Python, and Scala.”

Oooops!!!…

I recently talked to a friend who works as a DE at Booking.com. At first, the way he describes his work sounds exactly like the job of a BI Engineer who engineers data pipelines based on business requirements. However, as a Data Engineer, he also specializes in building data platforms and infrastructure. This includes tasks like setting up and operating platforms such as Hadoop/Hive/HBase, Spark, a scheduler system, and a data analytics platform. I can imagine that in small companies, where there is no data engineering team, the IT team may cover the Data Engineers’ workload of setting up and operating the organization’s data infrastructure.

Python, SQL, and Scala are the languages my friend uses for his daily tasks, which include, but are not limited to:

· Extracting data from the source databases using SQL.

· Using Airflow, which is written in Python, as the scheduler system.

· Designing the data models and ELT pipelines for data processing. The pipelines are written in Spark, using its Python and Scala APIs. PySpark and Spark SQL are must-haves.

· Deploying Machine Learning models to production. Data Engineers and Machine Learning Engineers write production code for Machine Learning models, which again requires an understanding of Python.
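The scheduling part of that list comes down to one idea: a pipeline is a directed acyclic graph (DAG) of tasks, and a task may only start once all of its upstream tasks have finished. The sketch below shows that idea with Python’s standard-library `graphlib.TopologicalSorter` (Python 3.9+). It is not Airflow’s API, and the task names are hypothetical; in Airflow itself you would declare the same dependencies with its own DAG and operator classes and let its scheduler do the running.

```python
from graphlib import TopologicalSorter

# Hypothetical daily pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform_sales": {"extract_orders", "extract_customers"},
    "load_warehouse": {"transform_sales"},
    "refresh_dashboard": {"load_warehouse"},
}


def run(task):
    # Placeholder for the real work (a SQL query, a Spark job, ...).
    print(f"running {task}")


def run_pipeline(dag):
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # also raises CycleError if the graph has a cycle
    order = []
    while sorter.is_active():
        # Every task returned here has all of its upstream tasks finished,
        # so a real scheduler could run this whole batch in parallel.
        ready = sorted(sorter.get_ready())
        for task in ready:
            run(task)
            order.append(task)
            sorter.done(task)
    return order


executed = run_pipeline(pipeline)
```

The two extract tasks come out in the same batch because neither depends on the other; the transform only becomes ready after both are marked done.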

So, Wikipedia seems correct! (As it always is, isn’t it?!)

Overlaps and differences between BI Engineer and Data Engineer

In my humble opinion, it really depends on the company you work for. Most large companies have engineers who take care of the infrastructure and ETL pipelines. The main function of BI is then to focus on building reports and dashboards for the business. But at my employer, there is no Data Engineering function, and BI Engineers do everything from the back end to the front end.

On the one hand, a Data Engineer has some of the same expertise as a BI Engineer. First, the data warehouse is as relevant to Data Engineers as it is to BI Engineers. It is the focal point for both DEs and BIEs: they work and gravitate around it, and both oversee many aspects of its design, construction, and operation. Data modeling (Data Vault, Dimensional Data Modeling) is a core skill for both. It is all about how you structure your data and model it at the correct grain to answer all business questions. In addition, Data Engineers and BI Engineers both need good knowledge of databases and SQL in order to query and extract data.
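“The correct grain” is easiest to see with a small example. The minimal sketch below uses hypothetical order-line data and rolls it up to a declared grain of one row per day per product. Questions at that grain, or at a coarser one, can be answered; anything that needs a detail that was aggregated away cannot, which is why the grain has to be chosen from the business questions up front.

```python
from collections import defaultdict

# Hypothetical order lines: the lowest grain available (one row per item line sold).
order_lines = [
    {"day": "2022-04-01", "product": "book", "customer": "alice", "qty": 2},
    {"day": "2022-04-01", "product": "pen", "customer": "bob", "qty": 5},
    {"day": "2022-04-01", "product": "book", "customer": "bob", "qty": 1},
    {"day": "2022-04-02", "product": "book", "customer": "alice", "qty": 3},
]


def to_daily_product_grain(lines):
    # Roll the lines up to the declared grain: one row per (day, product).
    totals = defaultdict(int)
    for line in lines:
        totals[(line["day"], line["product"])] += line["qty"]
    return dict(totals)


daily = to_daily_product_grain(order_lines)

# Answerable at this grain: how many books were sold on 2022-04-01?
print(daily[("2022-04-01", "book")])  # prints 3

# Not answerable at this grain: which customers bought those books?
# The customer column was aggregated away, so that question needs the
# order-line grain instead.
```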

But the data engineering role was born from the Big Data movement and comes with additional skills borrowed from the software engineering discipline. Traditional tools like Informatica, IBM DataStage, or Microsoft SSIS, which can’t handle big data, are no longer relevant to modern data engineers (funny how my friend said he had never heard of Microsoft SSIS!). These engineers have more generic software engineering skills, such as reading and writing programming languages like Python, Scala, and Java. A Data Engineer would also be familiar with version control (Git) and continuous integration and continuous deployment (CI/CD).
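One practical reason code-based pipelines fit version control and CI/CD so well is that each transformation can be a plain function with automated tests. The sketch below is hypothetical (the `normalize_country` rule is made up for illustration). The tests follow the naming convention the pytest runner collects, and a CI pipeline triggered by each Git push would typically run them before anything is deployed.

```python
def normalize_country(code):
    """Hypothetical transformation step: clean up a raw country code."""
    if code is None or not code.strip():
        return "UNKNOWN"
    return code.strip().upper()


# Tests a runner such as pytest would collect (functions named test_*).
# In a CI/CD setup, a job triggered by each Git push would run them and
# block the deployment if any of them fails.
def test_country_is_trimmed_and_upper_cased():
    assert normalize_country(" nl ") == "NL"


def test_missing_country_becomes_unknown():
    assert normalize_country(None) == "UNKNOWN"
    assert normalize_country("   ") == "UNKNOWN"
```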

Is a BI Engineer a Data Engineer?

In my humble opinion, a BI Engineer is a data engineer in the sense that, at the end of the day, they both engineer data pipelines based on business requirements. However, if you come from a Business Intelligence background and want to switch careers to become the sexy Data Engineer who works with the so-called Big Data, you need to reskill yourself and develop skills in writing code and using version control. Plus, a Data Engineer needs a high-level understanding of how different tools and technologies work in order to make the right choices when designing solutions.

Thank you for reading. This post took hours of hard work to write, so if you found it useful and are thinking of becoming a Medium member, you can consider supporting me through my referred membership link :) I’ll receive a portion of your membership fee at no extra cost to you.

By Lan Chu on April 26, 2022.
