Hadoop has evolved into a major part of the data management landscape over the last decade. Yet for some organizations, Hadoop has required many more resources than originally planned, both for implementation and for management.
Additionally, accessing and analyzing data in Hadoop has proven to be a challenge: many enterprises succeed in analyzing only a fraction of their data, and complex queries often take too long to execute to be effective.
DBTA recently held a webinar with David Leichner, CMO, SQream and Arnon Shimoni, Product Manager and Solution Architect, SQream, who discussed the future of Hadoop along with strategies for companies using the technology.
Hadoop is firmly entrenched in the landscape - it is deployed in thousands of organizations - but using it effectively has proved to be a key challenge, and many of those organizations have not achieved widespread success with their deployments, Leichner and Shimoni explained.
Hadoop users are saying:
- The Hadoop ecosystem is complex and expensive to manage
- Data management is labor-intensive, requiring two data engineers for every data scientist
- There is a steep learning curve, and the platform is inefficient with small datasets and slow for analytics
- Response times are long on real-world data volumes
- It needs skilled resources that are hard to find and costly
- Configuration and scaling are extremely difficult and expensive
- It is hard to secure and govern
- It offers a messy mix of roles, authentication, and permissions
In choosing the right modernization approach, enterprises can weigh several options: fully re-engineering all applications to match the new infrastructure; taking a hybrid approach by investing in selective modernization, such as accelerated hardware, improved applications, some SaaS, and some microservices; or rehosting applications as-is on a new platform while retiring old ones.
Migrating Hadoop to the cloud is one option, Leichner and Shimoni said. The latest infrastructure technology can come at a fraction of the cost of on-premises infrastructure, data can stream seamlessly from sources to cloud processing services, and users can scale up or out during heavy traffic periods, making it easier to process large amounts of data.
A hybrid approach is the sweet spot from which most companies will see the greatest benefit, Leichner and Shimoni explained.
Hadoop and SQream together can make this happen, they said. Hadoop offers distributed, cost-effective storage and is well established as a data lake - already deployed and well-integrated - while SQream is scalable in both compute and storage, offers high-performance ad hoc queries, leverages existing SQL skills, and is easier to use.
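As a rough illustration of that division of labor, the sketch below stands in for the pattern of keeping raw data in cost-effective data-lake storage while analysts run ad hoc SQL against an analytics engine. It is a minimal sketch only: Python's built-in sqlite3 module stands in for the SQL engine (it is not SQream's actual API), the in-memory list stands in for records held in a Hadoop data lake, and all table and column names are hypothetical.

```python
import sqlite3

# Hypothetical records held in a Hadoop data lake
# (stand-in for Parquet/ORC files on HDFS).
lake_records = [
    ("2024-01-01", "eu", 120.0),
    ("2024-01-01", "us", 310.5),
    ("2024-01-02", "eu", 95.25),
    ("2024-01-02", "us", 280.0),
]

# sqlite3 stands in for an accelerated SQL engine; the point is
# that analysts keep using plain SQL skills on lake-resident data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", lake_records)

# Ad hoc query: total sales per region.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # → [('eu', 215.25), ('us', 590.5)]
conn.close()
```

The design point is that the storage layer and the query layer stay separate, so existing SQL knowledge carries over regardless of where the data physically lives.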
An archived on-demand replay of this webinar is available here.