What’s a data scientist? Explaining roles in big data

The term “big data” has become a bit of a buzzword in the last decade, and the number of job listings for roles like data scientist and data engineer has multiplied rapidly. There is no doubt that the data sector will continue to grow, but there is a lot of confusion about similar-sounding roles. What’s a machine learning engineer? How is that different from a data engineer? And what is the difference between a data scientist, a data analyst, and a business analyst?

Untangling Big Data

The dominant trait among data scientists is an intense curiosity—a desire to go beneath the surface of a problem, find the questions at its heart, and distil them into a very clear set of hypotheses that can be tested.

Thomas Davenport and DJ Patil in the Harvard Business Review.

While the explosion of roles in big data is relatively recent, the idea of data science is not. As far back as the sixties and seventies, some mathematicians and statisticians were suggesting it as an alternative name for computer science or statistics. But the modern idea of data science started to become more visible in the early 2000s, with the founding of publications like the Data Science Journal. By 2012 Thomas Davenport and DJ Patil had labelled data scientist “the sexiest job of the 21st century”.

This flowchart highlights some of the tasks that distinguish different data roles. Source

They predicted that the field of big data and the demand for data scientists would only grow in the coming years – looking at the number of job postings involving data, that certainly feels true. Many of these roles have similar-sounding descriptions and varying degrees of overlap between them, which is not a coincidence. As the field of data grew, its practitioners diversified and specialised. Data needs not only to be interpreted, but also to be collected, processed, and stored. Databases need to be built and maintained. And even the interpretation and analysis can be done in different ways, giving rise to a multitude of people with similar job titles and responsibilities.

A radar chart showing the overlapping responsibilities between data engineers, analysts, scientists and machine learning engineers. Source

Ranking the Roles

The following list ranks seven common roles in data in ascending order of estimated salary. Many of these roles are closely related, often differing only slightly in their specialisation and implementation. Ultimately, what these roles look like may differ on a case-by-case basis, depending on the needs of the business or project at hand.

1. Data Analyst

This role is one of the common “archetypes” in the field of big data and is perhaps the one most closely linked with data science. Unlike data scientists, who work with large amounts of unstructured data, an analyst works with smaller, structured data sets to solve specific problems. The role can cover anything from cleaning and visualising the data to creating predictive models or actionable recommendations and reports.

Many data analysts have degrees in subjects that teach analytical skills, such as finance, maths or economics. Others come into data analytics from fields like software development, and have completed additional bootcamp courses or certifications. In fact, an IBM/BHEF study from 2017 found that only 6% of data analyst job postings required a master's degree or higher qualification.

2. Business Intelligence Analyst

The business (intelligence) analyst is a hybrid role that is very similar to that of the data analyst. The two overlap considerably, but the business intelligence analyst focuses more on business aspects, and on using the data to improve organisation and workflows. The table below gives a brief insight into the potential differences between the two roles.

While some business intelligence analysts come in with a business background, most of them, like data analysts, have undergraduate or advanced degrees in a STEM field.

A comparison of data and business analysts' tasks and tools. Source

3. Database Administrator

This job title is perhaps the most self-explanatory: an administrator is responsible for maintenance, as well as ensuring the uptime and reliability of the database. This might include anything from backup and data recovery to installation, migration, and performance monitoring. Ultimately it's about recognising and meeting the needs of the users, while maintaining the integrity of the database.

Database administrators usually have degrees in information or computer science, and have completed further relevant software certifications and advanced training courses.

4. Data Scientist

As the poster child of big data, the title of data scientist can act as something of a catch-all. Depending on how the role is implemented, it may overlap significantly with the tasks and responsibilities of the other roles listed here. This would have been especially true in the past, before the field began to diversify. Like statisticians, data scientists might design and test hypotheses, or use machine-learning techniques to create predictive models based on their data.

These roles usually require a minimum of a bachelor's degree in a quantitative field like statistics, engineering or computer science.

5. Data Architect

Data architects are the ones who design and develop a data infrastructure – the structures needed to collect, process and analyse data. Their responsibility covers not just the literal infrastructure needed to house the data, but also the organisational framework within the business, which might extend as far as overseeing integration, governance and compliance procedures.

Data architects usually hold degrees in computer science or engineering, and have experience in relevant fields like data modelling, engineering and warehousing.

6. Data Engineer

In many ways a data engineer is similar to an architect or even an administrator – their concern, too, lies with the infrastructure the data comes into contact with. Simply put, their responsibility is to make data available for analysis and processing. This may mean being responsible for database and pipeline maintenance and testing, as well as developing analytical tools.

Data engineering roles usually require a degree in a quantitative field, experience in software development, and proficiency with a variety of programming languages and data tools.

7. Machine Learning Engineer

Sometimes also known as “applied scientists”, these roles overlap considerably with those of data scientists and engineers. Their focus is on the development, deployment and monitoring of machine learning algorithms – essentially automating certain processes. This may also mean turning the models created by data scientists into workable code, which requires proficiency in programming and software development.

Like data scientists and engineers, these roles require bachelor's or graduate degrees in a quantitative field and fluency in programming languages. This is also one of the roles that may require further certifications and training courses in machine learning frameworks and platforms.

This diagram shows the potential points of overlap between different roles in data. Source

For anyone looking to learn more about data, analytics, or business intelligence, it is worth exploring the TDWI conference in Munich, which takes place from June 20-22.

Sources

8 Data and Analytics Careers to Check Out | The Muse

A Guide to Data Roles | Data Captains

Data Analyst vs. Business Analyst: What’s the Difference? | Coursera

Data Roles – Friends but not the same | Wizeline

Data science | Wikipedia

Data Scientist: The Sexiest Job of the 21st Century | Harvard Business Review

The Anatomy of a Data Team — Different Data Roles | DataCamp

The Quant Crunch: How the Demand for Data Science Skills Is Disrupting the Job Market | BHEF

What Does a Data Architect Do? A Career Guide | Coursera

What Is a Database Administrator? | Indeed.com