Understanding the Basics of Data Science


Interest in data science is expanding due to the explosion in the volume, variety, and velocity of data, commonly described as big data. In 2013, Science Daily reported that 90% of the data in the world had been created in the preceding 2 years alone. And that trend has only continued.

To help provide information on how to harness the vast potential this data represents, Joe Caserta, president and founder of Caserta Concepts, will present an “Introduction to Data Science” workshop at the Data Summit 2015 conference in New York City and also participate in a panel discussion titled “The Data Lake: From Hype to Reality.”

“I’m looking forward to the conference. It is in my hometown and both of these topics are kind of mysterious to a lot of people. I’m going to be more than happy to share what I know about them,” stated Caserta.

Data science, famously described as the sexiest job of the 21st century in the Harvard Business Review, represents the ability to sift through massive amounts of data to discover hidden patterns and predict future trends and actions. The catch is that it requires an understanding of many elements of data analytics.

Fortunately, said Caserta, many colleges have begun offering data science programs and master's degrees in data science. “We’re seeing some of the best data scientists coming out of those programs,” he said.

However, the area is still fairly new and academics alone are not enough to make a great data scientist, Caserta emphasized. It is important to have experience not only with data science but also in a particular subject area. “The best candidate to be a data scientist is someone who has been working in a specific field such as pharmaceuticals, healthcare, or manufacturing, and then goes to school, becoming educated in data science and learning statistical algorithms and how to apply them to data to retrieve insight. That is typically how a data scientist is born,” explained Caserta.

Breaking into the data science industry is simple, but being a data scientist is hard. “There are lots of people who are educated in statistics, but there are very few people who make good data scientists,” Caserta said. This is due to the many different skills that a highly proficient data scientist must possess. “Data scientists need to have really good analytical, statistical, and mathematical skills. But you also need to be a data engineer and a subject matter expert, and to find someone with all of those capabilities is very challenging.”

The data lake strategy emerged from the relentless growth in data volumes and types. In the past, the most common method of data storage was the data warehouse, which is highly structured and highly governed. However, Caserta said, “When dealing with data you don’t necessarily have control over, you need to come up with a new way of doing things. If you’re streaming data from sensors or social media, you don’t have control over it. The data can’t be structured until it gets ‘interrogated’ first. The way to interrogate it is to ingest the data into a place you own (e.g., a data lake) and determine what is valuable.” Data governance can be a challenge when working with data lakes, though, because it is difficult to govern data that has no structure and that you do not control.

Data Summit 2015 will take place on Tuesday, May 12, and Wednesday, May 13, with preconference workshops on Monday, May 11, at the New York Hilton Midtown.

To hear more from Joe Caserta, register for Data Summit 2015 at

Caserta will present the “Introduction to Data Science” workshop on Monday, May 11, at 1:30 pm, and will participate in the panel discussion on “The Data Lake: From Hype to Reality” along with Venkat Eswara, director of product marketing - Big Data and Cloud Solutions, GE Software; and George Corugedo, RedPoint Global.