Page 1 of 2 next >>

A Road Map to Closing the Data Science Skills Gap

Let’s cut right to the chase. Today, “big data” is just data, and the majority of organizations recognize the importance of being data-driven. But data, of whatever size, is only valuable if it’s accessible, trustworthy, and usable.

That’s where data scientists enter the picture. Data scientists today are expected to go beyond using and interpreting data. They also act as the human intermediary, making sure that the average user can get accurate answers from the data.

To ensure the best results, the modern data scientist must do more than think critically about data. They must also promote a positive culture around data, one that leads more people to become “citizen data scientists” in their own roles. Siloed data isn’t valuable. Data from multiple sources, which users can aggregate and collaborate on, is critical to making data-driven decisions that will positively impact the business.

The Overarching Role of the Data Scientist

In today’s complex, data-rich world, data scientists are essential. They understand the data and can act as guides through the data landscape.

A basic response to the question of what data scientists do would be that they model data and get results. At least, that’s the part that everyone talks about. But there is so much more that goes into the interrogative process of data analysis.

First, they must evaluate and understand the data, asking detailed questions such as: How is it generated? How trustworthy is it? What are the limitations? When is data missing?
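
One of those questions, when data is missing, lends itself to a quick check. The Python sketch below is only an illustration: the field names and sample records are invented, and a real audit would also look at how the data was generated and how trustworthy it is.

```python
from collections import Counter


def missing_value_report(rows, fields):
    """Return the share of rows in which each field is missing or blank."""
    missing = Counter()
    for row in rows:
        for field in fields:
            value = row.get(field)
            # Treat absent keys, None, and whitespace-only strings as missing.
            if value is None or str(value).strip() == "":
                missing[field] += 1
    total = len(rows)
    return {field: (missing[field] / total if total else 0.0) for field in fields}


# Hypothetical sample records, invented for illustration only.
sample_rows = [
    {"customer_id": "c1", "region": "EMEA", "revenue": 1200},
    {"customer_id": "c2", "region": "", "revenue": 800},
    {"customer_id": "c3", "region": "APAC", "revenue": None},
    {"customer_id": "c4", "region": None, "revenue": 450},
]

report = missing_value_report(sample_rows, ["customer_id", "region", "revenue"])
for field, share in report.items():
    print(f"{field}: {share:.0%} missing")
```

A report like this does not say why values are missing, only where to start asking.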

Next, they must check whether the available data fits the question being posed: Can the data answer the original question? Do assumptions need to be made? Even once the right data has been identified, it might still need to be transformed into the correct format and then aggregated, combined, restricted, or compared.
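
To make those preparation steps concrete, here is a minimal Python sketch. Everything in it is assumed for illustration: the order records, the customer-to-region lookup, and the question being asked (completed revenue by region) are hypothetical.

```python
from collections import defaultdict

# Hypothetical inputs, invented for illustration only.
orders = [
    {"customer_id": "c1", "amount": "120.50", "status": "complete"},
    {"customer_id": "c2", "amount": "80.00", "status": "complete"},
    {"customer_id": "c1", "amount": "40.00", "status": "cancelled"},
    {"customer_id": "c3", "amount": "200.00", "status": "complete"},
]
customer_regions = {"c1": "EMEA", "c2": "APAC", "c3": "EMEA"}


def revenue_by_region(orders, customer_regions):
    """Total completed-order revenue per region."""
    totals = defaultdict(float)
    for order in orders:
        # Restrict: only completed orders count toward revenue.
        if order["status"] != "complete":
            continue
        # Transform: amounts arrive as text and must become numbers.
        amount = float(order["amount"])
        # Combine: look up the region from a second data source.
        region = customer_regions.get(order["customer_id"], "unknown")
        # Aggregate: sum revenue within each region.
        totals[region] += amount
    return dict(totals)


totals = revenue_by_region(orders, customer_regions)
# Compare: which region brought in the most completed revenue?
top_region = max(totals, key=totals.get)
print(totals, top_region)
```

Each comment maps to one of the verbs in the paragraph above; in practice the same steps are often done in SQL or a data-frame library.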

For more articles like this one, go to the 2020 Data Sourcebook

Then data scientists might get to apply the skills they pride themselves on: modeling. Once there’s a model and results, the final and most important step is answering the original question. If the question was which investment to make, for example, then the answer needs to be a recommended investment, not just a model and its coefficients. Data scientists must also be able to communicate those results, especially to a business audience, including how to interpret what they’ve done and what its limitations are.

Those are just the basics. Data science goes well beyond these traditional tasks. Having a team of data scientists closes the loop on data and makes the use of data an active, living process throughout the organization. Once people begin accessing and using data, they start to understand not only the gaps and limitations of that data but also what the data can reveal. Data scientists not only activate the potential of an organization’s current data but also prepare the organization for more advanced and impactful analysis in the future.

Today’s organizations strive to be data-driven, and data scientists often serve as the human-in-the-loop for data, an important role in building that culture. Increasing collaboration around data and enabling users to make confident decisions lead to more people understanding and using data in their roles, which leads us to ...

The Rise of the Citizen Data Scientist

Data isn’t just for “data scientists” in the traditional sense anymore. Fundamentally, data science comes down to thinking critically about data, and that’s something everyone can accomplish. Part of what every individual does in an organization generates data, and therefore everyone can (and should) do their part to aim for the highest data quality.

With this, the oft-used phrase “garbage in, garbage out” comes to mind. If a salesperson inputs data into a CRM system, the data must be accurate or it can’t be effectively leveraged in downstream analysis. Bad data leads to bad results.
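
One way to keep garbage out is to check records at the point of entry. The Python sketch below assumes a hypothetical CRM record with three invented fields (account_name, contact_email, deal_value); it does not reflect any particular CRM product, and the email check is deliberately loose.

```python
import re

# Loose shape check only: something@something.something, no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_crm_record(record):
    """Return a list of problems found in a (hypothetical) CRM record."""
    problems = []
    if not str(record.get("account_name") or "").strip():
        problems.append("account_name is missing")
    email = str(record.get("contact_email") or "")
    if not EMAIL_PATTERN.match(email):
        problems.append("contact_email is not a valid address")
    value = record.get("deal_value")
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if not is_number or value < 0:
        problems.append("deal_value must be a non-negative number")
    return problems


# Hypothetical records, invented for illustration only.
good = {"account_name": "Acme", "contact_email": "jane@example.com", "deal_value": 5000}
bad = {"account_name": " ", "contact_email": "jane.example.com", "deal_value": -5}

print(validate_crm_record(good))
print(validate_crm_record(bad))
```

Returning a list of problems, rather than stopping at the first one, lets the person entering the data fix everything in one pass.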

This makes it necessary to establish and empower citizen data scientists across the organization. Developing these individuals depends on exposure to data and awareness of its impact, both of which should be driven by communication and education from the data science team.

Not everyone needs to know how to run a regression, but everyone should work toward these goals:

  1. Identifying how data can help inform a decision.
  2. Finding the necessary data and understanding the prior analysis that has been done on it.
  3. Thinking critically about data, including data that others share in support of their ideas.
  4. Understanding their system and the data it generates, and then thinking about how to use it to measure effectiveness.

A citizen data scientist also has the power to advocate for ensuring that the data they generate is easy to find, track, and measure. This type of initiative begins with recognizing the value of data, whether it might confirm a suspicion or be a total surprise.

For example, if a product manager introduces a new feature, the intended impact of that feature should be considered as well as the way user behaviors are expected to change. When a new feature or enhancement is framed in the context of the value it will generate or the behavior it will change, the logical next step is to track those behavioral changes with defined key metrics and measure the impact.
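
As a rough illustration of tracking a defined key metric, the Python sketch below compares a conversion rate before and after a feature launch. The metric, the session records, and the numbers are all hypothetical, and a simple before/after comparison like this is not a controlled experiment: other changes during the same period can also move the metric.

```python
def conversion_rate(sessions):
    """Share of sessions in which the user completed the target action."""
    if not sessions:
        return 0.0
    converted = sum(1 for session in sessions if session["converted"])
    return converted / len(sessions)


# Hypothetical session logs, invented for illustration only.
before_launch = [
    {"user": "u1", "converted": False},
    {"user": "u2", "converted": True},
    {"user": "u3", "converted": False},
    {"user": "u4", "converted": False},
]
after_launch = [
    {"user": "u5", "converted": True},
    {"user": "u6", "converted": False},
    {"user": "u7", "converted": True},
    {"user": "u8", "converted": False},
]

rate_before = conversion_rate(before_launch)
rate_after = conversion_rate(after_launch)
change = rate_after - rate_before
print(f"before: {rate_before:.0%}, after: {rate_after:.0%}, change: {change:+.0%}")
```

The useful part is deciding on the metric before the launch, so that the measurement reflects the behavior the feature was meant to change.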

One thing to keep in mind is that even if the analysis doesn’t show the expected result, it still provides value. Maybe assumptions about the user and/or feature need to be re-evaluated. Is the product being used as expected? Even if the results aren’t “positive,” this data culture around understanding what the data says will build on itself and, ultimately, lead to better results. Incorrect assumptions can be corrected, leading to better decisions in the future.


