
Don’t Get Washed Out by the Overflowing Data Lake: 5 Key Considerations


Much of today’s big data is time-series data, so having functions built specifically for time is crucial. Whether you’re looking at IoT data, financial services data, or data from your IT infrastructure, data created at regular intervals presents its own challenges for data quality and analytics. For example, a handful of systems provide gap-filling functionality, constructing new datapoints through interpolation within the range of a discrete set of known datapoints. Another example is event-based windows, which let you break time-series data into windows anchored on significant events within the data. This is especially relevant in financial data, where analysis often focuses on specific events as triggers for potentially nefarious activity. If you need to analyze time-series data, make sure your analytics system has features that can actually do it. Otherwise, you may be burdened with extra custom coding and extensive data preparation.
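To make the gap-filling idea concrete, here is a minimal sketch using pandas rather than any particular analytics platform; the timestamps, readings, and 1-minute interval are illustrative assumptions, not details from the article.

```python
# Minimal sketch of gap filling for time-series data using pandas.
# The readings and the 1-minute grid are hypothetical examples.
import pandas as pd

# Irregular readings with a gap between 00:01 and 00:04
readings = pd.DataFrame(
    {"value": [10.0, 12.0, 18.0, 20.0]},
    index=pd.to_datetime(
        ["2024-01-01 00:00", "2024-01-01 00:01",
         "2024-01-01 00:04", "2024-01-01 00:05"]
    ),
)

# Resample onto a regular 1-minute grid (gaps show up as NaN),
# then interpolate new datapoints within the range of known values.
filled = readings.resample("1min").mean().interpolate(method="time")
print(filled)
```

Systems with built-in gap filling do this inside the query engine; the point of the sketch is only to show what would otherwise become custom code and extra data preparation.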


Predictive analytics is changing the way companies across every industry operate, grow, and stay competitive. Consider early whether you need predictive analytics now or anticipate needing it in the future, and whether your analytics architecture supports it. Organizations are applying predictive analytics to everything from improving machine uptime to reducing customer churn. With platforms that support interactive analysis of very large datasets, analysts can now use SQL to create and deploy machine learning models natively on full datasets, without downsampling, to accelerate decision making.
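As a simplified illustration of the kind of churn prediction mentioned above, the sketch below trains a basic model with scikit-learn; the CSV path, column names, and features are hypothetical, and an in-database platform would express the same idea in SQL instead.

```python
# Hedged sketch of a predictive churn model.
# "customers.csv" and its columns are made-up placeholders.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

df = pd.read_csv("customers.csv")  # hypothetical customer dataset
X = df[["tenure", "monthly_spend", "support_tickets"]]
y = df["churned"]

# Hold out a test set, fit a simple model, and report ranking quality.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```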

3-Consider Deploying Anywhere, Not Just in the Cloud

Particularly for public cloud deployments, many analytical solutions mandate that you bring the data into the database to perform analytics. That may seem like no big deal, but there’s a catch: moving your data out of a public cloud, even for routine operations, costs you real money for every gigabyte. It harkens back to the days when companies would buy an appliance for data warehousing and load all of their data into it. The appliance was a locked system that made it difficult to export data, and so are many cloud platforms. Watch out for systems that lock you into one solution.

It’s imperative that you choose tools with the widest range of deployment models. It shouldn’t matter whether you deploy on-premises, in the cloud, or on Hadoop. You should be able to bring analytics to the data without making any copies. This can save you not only the time and cost of moving the data but also the licensing costs of storing it in a commercial database.
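One hedged illustration of bringing analytics to the data rather than copying it first: the sketch below uses DuckDB, one of several engines that can query files in place, to scan Parquet data directly on object storage. The bucket, path, and column names are hypothetical, and real use would require configured S3 credentials.

```python
# Sketch: query Parquet files where they live instead of loading
# them into a database first. Bucket and columns are made up.
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs")  # extension for reading remote/S3 files
con.execute("LOAD httpfs")

# Aggregate directly over the files in object storage; no copy is made.
result = con.execute(
    """
    SELECT device_id, avg(temperature) AS avg_temp
    FROM read_parquet('s3://example-bucket/sensor-data/*.parquet')
    GROUP BY device_id
    """
).fetchdf()
print(result.head())
```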

Looking back 3 years and comparing their deployment strategy then with their strategy today, most companies would admit they had no idea where they would end up. Cloud deployment is popular today, but tomorrow, who knows? If you can deploy anywhere, whether on-premises, in the cloud, or on virtual machines, you will have little to worry about with future deployments.

4-Consider Storing Data in Multiple Tiers, Not Just One Tier

It’s well known that different storage tiers come with very different costs. In-memory databases, such as Spark or SAP HANA, sit at the high end because they require lots of expensive memory and generally more expensive hardware to run. Hadoop and S3 storage are low cost by comparison, but generally don’t offer the analytical performance of an in-memory or columnar database. Enterprise architects need to store data on the correct tier, one that will meet the enterprise’s service-level agreements while keeping costs low. Companies will frequently store data in Amazon S3 or Hadoop without knowing the value of much of it, then peel off portions of that data into a database, usually a data warehouse, to perform analytics.
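To show what a tiering decision can look like in practice, here is a minimal sketch of a rule that keeps recently accessed, heavily queried data on a fast, expensive tier and pushes the rest to low-cost storage; the datasets, thresholds, and field names are assumptions for illustration only.

```python
# Hedged sketch of a simple storage-tiering rule.
# Datasets, thresholds, and tier names are hypothetical.
from datetime import datetime, timedelta

datasets = [
    {"name": "orders_current", "last_access": datetime.now() - timedelta(days=2),
     "queries_per_week": 500},
    {"name": "clickstream_2019", "last_access": datetime.now() - timedelta(days=400),
     "queries_per_week": 1},
]

HOT_ACCESS_WINDOW = timedelta(days=30)   # accessed within the last month
HOT_QUERY_THRESHOLD = 50                 # queried often enough to justify cost

for ds in datasets:
    recently_used = datetime.now() - ds["last_access"] < HOT_ACCESS_WINDOW
    heavily_queried = ds["queries_per_week"] >= HOT_QUERY_THRESHOLD
    tier = "in-memory / data warehouse" if (recently_used and heavily_queried) else "S3 / Hadoop"
    print(f'{ds["name"]}: {tier}')
```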


