At Data Summit 2015 in New York City, Tony Shan, chief architect at Wipro, gave a talk on the key components of a successful big data methodology and shared lessons learned from real-world big data implementations. According to Shan, a methodical big data framework follows an 8-step process, with specific techniques and methods used in each step.
- Problem - Defining the problem is an obvious but important first step. Hadoop is a buzzword that many executives throw around, but they often have no real knowledge of whether it is the key to their problem.
- Assess the problem - After defining the problem, breaking it down into smaller parts makes it easier to assess. “It’s very similar to a doctor. When you see a doctor, he goes through a checklist to try and determine how to treat your illness,” explained Shan.
- Pertinent Facts - Collect the pertinent facts. Big data issues involve large amounts of data and information, but not all of it will be pertinent to the problem.
- Analysis - Even with all of the pertinent facts in hand, thinking critically about them is crucial. Asking questions and analyzing the facts may lead to discoveries that you would otherwise have overlooked.
- Formulate Hypothesis - Ascertain what you are sure of and what you are unsure of after the analysis. You can then formulate a hypothesis on the issue.
- Design Solution - The solution will never be crystal clear and may take numerous attempts to get right.
- Build Prototype - Many people see only the successful end product, but most of the time there are many prototypes behind it. It is important to “fail fast” so you do not waste resources on an option that might not work. Failing fast also allows you to grow faster.
- Develop Implementation - The final step is implementing the solution. “It is fairly easy once it is implemented. You must finish by checking operations, quality, and security,” explained Shan.
According to Shan, organizations that view data holistically and follow these 8 steps can solve their big data issues. The session concluded with a short presentation by Casey Gwodz, senior consultant with CA Technologies, a Platinum Sponsor of Data Summit 2015, on the importance of data with context and of communicating the benefits of data to business users. “Data without context is meaningless, and if the data is not making the business money, it is pointless,” said Gwodz.