How to become a data scientist: A data-driven approach to careers in data science

Many devs and IT professionals looking for their next career wonder how to become a data scientist. Ashwin Thota matches up skills to job titles.

Imagine a world without any data. Now imagine the extent of all the available data. Both are equally unimaginable. Every second of the day, petabytes of data are being produced. However, copious amounts of data, also known as big data, are not useful in and of themselves until they are processed. As John Allen Paulos famously said, “Data, data everywhere, but not a thought to think.” We are surrounded by an overwhelming amount of data, yet the real value of having data and analytics teams lies in generating insights from it.

Big data needs to be explored, tamed, analyzed, and interpreted before it can be of use. In the recent past, organizations have hired unicorns to help them solve this problem: data scientists who can store, retrieve, and tame their unstructured data. They are also expected to generate predictive insights from the data using advanced analytics.

What does a data scientist do?

Data science combines various disciplines such as data analysis, data engineering, mathematics, statistics, domain expertise, advanced computing, visualization, and much more. A data scientist is expected to have mastery in four fundamental areas: business/domain, mathematics, computer science, and communication.

Thus, a data scientist is someone who can leverage the seemingly infinite amount of data to achieve business goals. They are expected to uncover the latest trends in unstructured data through microscopic attention and analysis. The analysis is followed by effective communication and interpretation of the results, enabling the organization to drive change.

Why is it an exciting time to start a career in data science?

Data science became the rite of passage for many data enthusiasts after the Harvard Business Review called it “the sexiest job of the 21st century”. The field of data science has made enormous strides since this article in 2012. When no universities were offering a degree in data science, Harvard scholars predicted the rapid growth and popularity of this field. Immediately after, LinkedIn saw a rise in the number of people identifying as data scientists.

Data science has been the most sought after career in the US for four consecutive years, according to Glassdoor. Bloomberg reported a 75% increase in job listings for data scientists. In 2018, the demand for data scientists in India shot up by an incredible 417%. According to the US Bureau of Labor Statistics, 11.5 million new jobs are expected to be created in the field of data science by 2026.

Considering the supersonic speed at which data is being produced, the demand for data scientists is not expected to take a dip anytime soon. With technological advancements and the high tide of data submerging every form of industry, organization, and enterprise, it is no surprise that experts who can crunch data are required everywhere. With more and more companies realizing the importance of big data, AI, and machine learning, they are rapidly implementing data science capabilities.

Data scientists are being sought by every sector, not just by tech companies. Tech giants like Amazon, Apple, Facebook, Google, and Microsoft employ only one-half of one percent of the data scientists in the U.S. While there is no shortage of data, there is a shortage of data scientists. According to DataRobot, 73% of organizations surveyed do not have any data scientists or AI specialists.

Since data science is all the rage, it creates a demand for individuals with advanced educational qualifications. All the big organizations look for people with at least a Master’s or a PhD. The higher the educational qualification, the greater the depth of knowledge a candidate brings to the role. According to KDnuggets, 88% of data scientists have a Master’s degree, and 46% have a PhD. However, with increasing demand for this position and not enough highly qualified professionals to fill it, small companies accept citizen data scientists.

Apart from the high demand and the handsome pay, data science is a satisfying career because of the vast array of problems you can solve with it.

Roles that are close to data science

There are multiple career paths that one could take to become a data scientist. The most obvious route is through education. According to the Bureau of Labor Statistics, most data scientists possess a master’s degree or higher in computer science. However, thanks to the diversity of skills required in data science organizations, not everyone needs an advanced degree to get into the data science space. There are many paths a working professional can take to become a data scientist. Most modern, data-driven organizations have realized the value of data. This realization led to the creation of multiple specialized roles that deliver value using data. Below are some of the data-related roles that exist in most organizations:

Data Analyst

A data analyst is required to collect, process, and transform data into usable forms. They help companies make better business decisions. Depending on the organization, an analyst’s job could include tracking web analytics, extracting insights from consumer datasets, analyzing A/B tests, making strategic recommendations based on financial data, or merely organizing messy, unstructured data. A qualified data analyst is expected to understand Python, C/C++, HTML, visualization tools, and SQL.

Data Engineer

Data engineers are architects of big data and information pipelines. They strive to create a reliable, interconnected network of data for use within an organization. They design, build, and manage systems that analyze and process data for the organization. They also ensure that the systems run smoothly. This job is different from typical data science careers because it focuses more on hardware and data warehousing than analysis. It is an advanced position and requires a background in software engineering. Data engineers must be skilled in SQL, databases, big data, cloud computing, Java, Python, Ruby, MATLAB, Hive, Pig, SAS, etc.

Database Administrator

Database administrators (DBAs) are responsible for effectively storing and organizing an organization’s data. DBAs are often seen as the guardian angels of the company’s data: they create highly available databases, design security rules and protocols to safeguard data, and perform upgrades to keep databases up to date. DBAs are expected to have at least a bachelor’s degree in computer science or information technology. DBAs are skilled in SQL, database management, big data, algorithms, optimization, UNIX/Linux, data analysis, AWS, Python, etc.

Business Analyst

Business analysis is a great career choice for someone with an educational background in business and a strong foundation in numbers. It is a less technical position that requires collaboration between business and IT. A business analyst needs knowledge of business processes, data visualization tools, and data modeling. They are expected to understand and map out business processes, identify key business problems that can be solved with analytics, and collaborate with other technical groups to translate business problems into solutions.

Business Intelligence Developer

This position requires extensive knowledge about business and the ability to translate data into consumable data products. BI developers and analysts gather data, design and develop systems to increase a business’ efficiency, and help management make good decisions. They do this by either mining data from a company’s software or reviewing competitor data and industry trends. These experts are expected to be tech-savvy, and they can either use the existing BI tools or develop their own BI analytic applications.

Machine Learning Engineer

Machine learning engineers deliver machine learning-focused software solutions to meet an organization’s needs. They design and build machine learning systems that can interpret data and make predictions or draw conclusions. In addition to creating data funnels and machine learning software, they also run tests and monitor the system’s performance to ensure accuracy. These engineers require advanced skills in statistics, programming, and data science.

Data Architect

With the increasing importance of big data, this position is becoming more and more critical. Data architects engineer new database systems, design analytics applications, and create blueprints to integrate, centralize, maintain, and protect data. They ensure the performance of data solutions, improve the functionality of existing systems, and provide access to database analysts and administrators. The position requires a solid understanding of languages and tools like Hive, XML, SQL, Pig, and Spark, along with systems development and database architecture skills.

How to break into data science?

As I alluded to in the previous section, many roles come close to data science. What are the similarities and differences between these roles? What skills should one add to their profile to become a data scientist? This section will show a data-driven approach explaining the career path options for a citizen data scientist.

Data scientist skills profile

I performed an analysis of the skills that are required to become a data scientist. The information represented below is manually collected from LinkedIn. Figure 1 summarizes the skills of 200 data scientists from leading technology destinations such as the San Francisco Bay Area, New York City Metropolitan Area, Seattle, Dallas-Fort Worth Metroplex, Raleigh-Durham-Chapel Hill Area, Greater Chicago Area, Greater Boston, London Area, Bengaluru, and New Delhi.

Figure 1: Top 20 skills of data scientists
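A tally like the one behind Figure 1 can be sketched in a few lines of Python. This is a minimal illustration, not the author's actual workflow: the `profiles` list and the `top_skills` helper are hypothetical names, and the sample values are placeholders rather than the data collected from LinkedIn.

```python
from collections import Counter

# Hypothetical sample: each inner list is the skills section of one profile.
# These values are placeholders, not the data behind Figure 1.
profiles = [
    ["Python", "Data Mining", "SQL"],
    ["Python", "Machine Learning", "python"],
    ["R", "Data Mining", "Python"],
]


def top_skills(profiles, n=20):
    """Count how many profiles list each skill and return the n most common."""
    counts = Counter()
    for skills in profiles:
        # Normalize case and de-duplicate so each profile counts once per skill.
        counts.update({skill.strip().lower() for skill in skills})
    return counts.most_common(n)


print(top_skills(profiles, n=2))
```

Normalizing case before counting matters for manually collected data, where the same skill is often typed in slightly different ways.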

It is abundantly clear from Figure 1 that the top skill that most data scientists have is Python, followed by data mining. Figure 2 represents the fastest-growing skills among data scientists for the last 12 months. It’s interesting to observe that four of the top 20 growing skills are traditionally held by members of other data teams. For instance, “Data Analytics” and “Data Visualization” skills are very common among Data Analysts, Business Analysts, and Business Intelligence Engineers. “AWS” and “Data management” skills are commonly found among Data Engineers and Database Administrators.

Figure 2: Fastest growing skills among data scientists

Motivated by these growing skills, I conducted a separate analysis to understand the job titles most commonly associated with these skills. In other words, I manually collected the job titles of people who have at least one of the top 20 skills that data scientists have. I have excluded data science-related jobs such as Statistician, AI Engineer, ML Developer, etc., from this analysis. Figure 3 summarizes the top 10 job titles for these skills. It is clear from Figure 3 that people who are already in roles such as Software Engineer, Data Analyst, Business Analyst, Data Engineer, Business Intelligence Developer, and Database Administrator have the highest chances of getting into data science.

Figure 3: Top 10 job titles with skills that overlap those of a data scientist
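The filtering step described above (keep people with at least one top skill, drop data science-related titles, then rank the remaining titles) might look roughly like the sketch below. All names and values here are assumptions for illustration: `TOP_SKILLS` stands in for the top 20 skills, `EXCLUDED_TITLES` contains only the three titles named in the text, and `people` is made-up sample data.

```python
from collections import Counter

# Placeholder for the top 20 skills; the real list comes from Figure 1.
TOP_SKILLS = {"python", "data mining", "sql"}
# Only the titles named in the text; the article's full exclusion list is not given.
EXCLUDED_TITLES = {"statistician", "ai engineer", "ml developer"}

# Hypothetical records, not the data behind Figure 3.
people = [
    {"title": "Software Engineer", "skills": ["Python", "Java"]},
    {"title": "Data Analyst", "skills": ["SQL", "Excel"]},
    {"title": "Statistician", "skills": ["R", "Python"]},
    {"title": "Software Engineer", "skills": ["SQL"]},
    {"title": "Recruiter", "skills": ["Sourcing"]},
]


def overlapping_titles(people, top_skills, excluded, n=10):
    """Rank job titles of people who hold at least one of the top skills."""
    counts = Counter()
    for person in people:
        title = person["title"].strip().lower()
        if title in excluded:
            continue  # skip data science-related roles
        skills = {skill.strip().lower() for skill in person["skills"]}
        if skills & top_skills:  # non-empty intersection means at least one match
            counts[title] += 1
    return counts.most_common(n)


print(overlapping_titles(people, TOP_SKILLS, EXCLUDED_TITLES))
```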

Another data point I looked at is the annual Kaggle Data Science and Machine Learning survey. The latest survey, from 2020, garnered 20,036 responses from global participants at all career and educational levels. It asks an extensive list of questions about respondents' careers, tool usage, nature of work, etc. The specific question that caught my attention is: “Select any activities that make up an important part of your role at work: (Select all that apply)”. The results of this question are summarized in Figure 4. As in the previous analysis, I have excluded data science-related jobs such as Statistician, AI Engineer, ML Developer, etc., from Figure 4.

The spider chart in Figure 4 shows that Data Analysts and Business Analysts spend a significant amount of their time performing data analysis. Data Engineers balance their time between building and running data infrastructure and performing data analysis. DBAs spend a significant amount of their time building and running data infrastructure. DBAs are also seen supporting the data science function by conducting data analysis and assisting with the building of ML prototypes. But what’s interesting is the duties of data scientists (yellow line in the spider chart). Data scientists appear to perform duties that are distributed across many skill sets: data analysis, experimentation, ML prototyping, and building ML services. The chart also shows that data scientists spend more of their time performing data analysis than any other duty.

Figure 4: Kaggle- Duties performed vs. roles.
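For a "select all that apply" question, one reasonable way to compare roles is the share of respondents in each role who picked each activity. The sketch below assumes the survey has already been reduced to (role, selected activities) pairs. It does not reproduce the layout of the real Kaggle export, and the rows and the `activity_share_by_role` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (role, set of activities the respondent selected).
responses = [
    ("Data Analyst", {"Analyze data"}),
    ("Data Analyst", {"Analyze data", "Build ML prototypes"}),
    ("Data Engineer", {"Build data infrastructure", "Analyze data"}),
    ("Data Engineer", {"Build data infrastructure"}),
]


def activity_share_by_role(responses):
    """Return, per role, the fraction of respondents who selected each activity."""
    totals = defaultdict(int)  # respondents per role
    picks = defaultdict(lambda: defaultdict(int))  # selections per role and activity
    for role, activities in responses:
        totals[role] += 1
        for activity in activities:
            picks[role][activity] += 1
    return {
        role: {activity: count / totals[role] for activity, count in picks[role].items()}
        for role in totals
    }


shares = activity_share_by_role(responses)
for role, by_activity in shares.items():
    print(role, by_activity)
```

Because respondents can select several activities, the shares for a role do not sum to 1. Each value is best read as one axis of a spider chart like Figure 4.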

Career paths for citizen data scientists

Based on the above data points and analysis, I created four different career paths for citizen data scientists. Data analysts, data engineers and DBAs, business analysts, and software engineers have the most direct career paths to becoming data scientists. Figure 5 outlines the options for professionals in data careers to become data scientists and identifies the transferable skills and the gaps that existing data professionals might have to fill.

Figure 5: Career paths for citizen Data Scientists
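The idea of transferable skills and gaps maps naturally onto set operations: the intersection of two skill lists is what carries over, and the difference is what remains to be learned. The sketch below is only an illustration of that idea. The `skill_gap` helper and both skill lists are assumptions, not the contents of Figure 5.

```python
def skill_gap(current_skills, target_skills):
    """Split a target role's skills into those already held and those missing."""
    current = {skill.strip().lower() for skill in current_skills}
    target = {skill.strip().lower() for skill in target_skills}
    return {
        "transferable": sorted(current & target),  # skills that carry over
        "gaps": sorted(target - current),  # skills still to acquire
    }


# Illustrative lists only; real profiles would come from an analysis like the one above.
analyst_skills = ["Python", "SQL", "Data Visualization"]
scientist_skills = ["Python", "Data Mining", "Machine Learning", "SQL"]

print(skill_gap(analyst_skills, scientist_skills))
```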

How to become a data scientist

Data science has become a hot field in the recent past, and it will continue to be one for the foreseeable future. The shortage of people with data science skills will soon be amplified as more companies realize the value of investing in data scientists. This article provides a view into data science-related careers and analyzes the skills and duties of these related fields. Finally, it provides career path options for data professionals who want to become data scientists.