The harsh reality of being a Data Scientist.

  • Post by Lan
  • Dec 26, 2022

That you should know.

Data science is like an ever-evolving river: it is never the same from one day to the next. It keeps moving. First there was statistics. Then came Machine Learning and Artificial Intelligence. Then Transformers became the new star, and now ChatGPT is blowing up the internet. It must be cool to work as a Data Scientist, isn’t it?

In 2019, I was working as a Business Intelligence Developer and I really wanted to become a data scientist. I thought that the title sounded prestigious, and that building machine learning models to predict the future must be so sexy. I wanted to try it so badly that I made a big investment in it: I completed a Data Science Master’s degree and landed a Data Scientist job. Only to find out that…

1. Data science is the wrong focus! You can’t do DS without any (good quality) data!

There is no doubt that trying to make sense of data and build AI on top of bad data is not a good idea. Garbage in, garbage out. It is “like starting your day with a shot of tequila at 8:00 am” (I stole this from the computer scientist Bill Inmon). It is just not a good idea.

In my experience, many organizations still lack Data Governance processes and technical foundations, and that often leads to dirty and disconnected data. So Machine Learning is just the wrong focus. More value comes from building the foundations, and the focus should be on the discipline of making data usable and accessible by data scientists. A lot of work needs to be done before anyone can start building a machine-learning model: setting up the infrastructure, building the data model, storing the data in the data warehouse, making that data accessible to end-users (usually the job of IT and data engineering), and so on.

Machine Learning is just the wrong focus. The focus should be on the discipline of making data usable and accessible by data scientists.

If Data Scientists have access to well-defined, high-quality data and a data dictionary that tells them exactly what that data means, it can save them huge amounts of time, and they can truly be data scientists and not data garbagemen.
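To make the idea of a data dictionary a bit more concrete, here is a minimal sketch in Python of what checking a dataset against one could look like. Everything in it is made up for illustration: the column names, the `DATA_DICTIONARY` structure, and the `find_issues` helper are assumptions, not a standard or a real library. In practice this job is usually done by a data catalog or a data quality tool, but the principle is the same.

```python
from datetime import date

# Hypothetical data dictionary: column name -> expected type, nullability, meaning.
DATA_DICTIONARY = {
    "customer_id": {"type": int, "nullable": False, "description": "Unique customer key"},
    "signup_date": {"type": date, "nullable": False, "description": "Date the account was created"},
    "monthly_spend": {"type": float, "nullable": True, "description": "Average monthly spend in EUR"},
}


def find_issues(rows, dictionary):
    """Return a list of human-readable data quality issues."""
    issues = []
    for i, row in enumerate(rows):
        # Columns the dictionary does not describe are undocumented data.
        for column in row.keys() - dictionary.keys():
            issues.append(f"row {i}: undocumented column '{column}'")
        for column, spec in dictionary.items():
            value = row.get(column)
            if value is None:
                # A missing value is only a problem for required columns.
                if not spec["nullable"]:
                    issues.append(f"row {i}: missing required '{column}'")
            elif not isinstance(value, spec["type"]):
                issues.append(
                    f"row {i}: '{column}' should be {spec['type'].__name__}, "
                    f"got {type(value).__name__}"
                )
    return issues


# Made-up sample records: one clean, two with typical problems.
rows = [
    {"customer_id": 1, "signup_date": date(2022, 1, 5), "monthly_spend": 42.5},
    {"customer_id": None, "signup_date": "2022-02-30", "monthly_spend": None},
    {"customer_id": 3, "signup_date": date(2022, 3, 1), "spend": 10.0},
]

for issue in find_issues(rows, DATA_DICTIONARY):
    print(issue)
```

Even a toy check like this catches the usual suspects: a missing key, a date stored as text, and a column nobody documented. Without a dictionary, the data scientist has to discover each of these by hand.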

2. Expectation vs harsh reality

A study by 365 Data Science shows that data scientists switch employers every 1.7 years on average. In a big organization, that is about the time needed to be completely onboarded!

I think the expectations of Data Scientists are all wrong. Data scientists go to school and learn all these fancy neural networks and Bayesian theories, then come into the real world and feel robbed of their careers. Companies hire data scientists by painting aspirational visions of the work they will be doing, such as driving business value with cutting-edge AI and ML. Many folks thought they would land a job and deliver high-impact solutions using state-of-the-art machine learning and AI, only to find out that they spend most of their time on:

  • Searching for and gathering the data
  • Running urgent ad-hoc data requests
  • Diving deep into data analysis and building dashboards
  • Building data engineering pipelines
  • Spending way too much time figuring out what data they need and where and how to get it, trying to figure out what the data means in business terms, etc.
  • 100 other things before they can actually start building a model.

Such work is important to any organization and requires an equally high level of skill, but unfortunately it is not always what data scientists expect. The data scientist gets to be a data scientist for only a small fraction of their time.


3. What can you do?

Tip #1: Be clear on what you want and ask yourself: which flavor of data professional do you want to be?

If you get hired to be a data scientist in an organization where there is no data and no data engineer to make clean data accessible, guess who is going to be the data engineer? Yes, you are right, it is you! But data science and data engineering are completely different disciplines, and being a data scientist does not equip you with the skills to be a data engineer.

Ask yourself: are you willing to roll up your sleeves and do whatever it takes to create value out of data, even if it means starting with making the data usable and accessible in the first place? Then this is an incredible opportunity to learn a new in-demand skill called data engineering! But prepare yourself to start from scratch, and be realistic about how efficient you may be because, hey, you are learning a completely new skill!

Are you willing to roll up your sleeves and do whatever it takes to create value out of data?

Tip #2: Learn how to influence as a Data Scientist

Position yourself to have some influence over the scope of the tasks you will do. Let us embrace working together instead of trying to do things alone. Data scientists need to be paired with data engineers so they are free from having to wrangle data. Data scientists need to work with domain experts to ask the right business questions. Consider negotiating for ways to hold your data engineering colleagues (if there are any!) accountable for collaborating with you. Share your concerns with your manager, explain why you should be doing what you were hired for, and ask for all the support that you need.

Tip #3: Be proactive in identifying problems and proposing solutions

Listen, I know data scientists are not often trained to ask the right questions in the first place! We are trained to provide the answer to a question. But very often, the business is clueless about what data scientists can do. So instead of complaining, being miserable, and forgetting our data science skills, why not learn the business and start asking the questions that lead to valuable business ideas?

This may involve working with domain experts or business stakeholders to understand what they are doing, their needs and concerns, and how you can help with your DS skills. Most importantly, follow up on it, present your ideas, and build an MVP. Being curious will take you farther than you can ever imagine.

Tip #4: Know what you are getting into & manage your expectations

I promise you that if you keep changing jobs, there is a high chance that you will end up in the same situation. Give the job and the organization time. Every mature profession was once immature, and data and data science are no different. There is reason to believe that in the future, data scientists will be able to land a job and start making real contributions to the business right away, with high-quality and accessible data.

Thanks for reading!


By Lan Chu on December 26, 2022.
