Research@DBTA: Hadoop Increasingly Landing in the Cloud, But the Ride Is Bumpy

One of the challenges of working with Hadoop environments has been maintaining the infrastructure for big data projects. That’s where the cloud makes things easier, and it has increasingly served as the underlying infrastructure platform of choice for Hadoop initiatives. At the same time, not everything in big data environments has moved to the cloud just yet. Many IT managers expect to live in a hybrid environment. They are planning for multi-cloud data management to deliver business value, yet they still rely on old-school approaches and manual tools to support their data environments.

These are some takeaways from a recent survey of 216 data managers fielded by Unisphere Research, a division of Information Today, Inc., in partnership with Radiant Advisors, and sponsored by WANdisco. The survey collected responses from data users already working with Hadoop or considering a Hadoop data migration to the cloud, including those who have previously migrated their Hadoop deployment (“2021 Hadoop-to-Cloud Migration Benchmark Report”).

The survey found that most IT leaders are pursuing cloud migration to lay the foundation for future business value creation, including cloud-scale analytics and data modernization strategies. The survey also suggests that a significant wave of Hadoop data migrations is coming. At present, 21% of enterprises in the survey have Hadoop running in the cloud, and another 32% are either planning to move Hadoop to the cloud or are in the process of doing so.

At the same time, Hadoop-to-cloud migration projects can take a toll on enterprises. About one-quarter of respondents indicated their projects were on time and on budget. Close to one-third, 31%, said their projects went over budget. Additionally, 44% said their efforts took longer than originally planned.

The two biggest factors impacting the overall cost of Hadoop migration projects were the complexity of handling on-prem data changes during migration and the need to acquire IT resources to perform and manage the data migration. These two challenges were followed closely by the cost of developing custom code and maintaining data migration scripts and programs.

The survey revealed that the challenges due to handling custom code development and the complexity of on-prem data changes during the migration “most likely led to the 55% of completed projects being over-time or over-budget,” noted the survey report’s authors, John O’Brien and Lindy Ryan, both with Radiant Advisors. “This aligns to the leading concerns companies have when planning to migrate their Hadoop data to the cloud: business disruption due to downtime, hidden costs and complexities, loss of critical business data during migration, and custom code development. Utilizing a Hadoop data migration-to-the-cloud tool would have likely controlled costs while eliminating complexities and potential loss of data changes during migration.”

After their companies complete the Hadoop data migration to the cloud, 42% of respondents plan to maintain a hybrid environment for the next 1–3 years, while 36% plan to maintain it indefinitely. The remaining 21% plan to shut down their on-prem Hadoop as soon as possible following cloud migration.

There is considerable manual work associated with Hadoop on-prem-to-cloud migrations. For recent and planned Hadoop migrations to the cloud, 50% of respondents planned to use, or have used, software to automate the migration of data changes made during migration. In contrast, 33% of respondents planned to reconcile any data changes manually, and 14% opted not to allow data changes during migration. Only 3% of respondents had no plan to reconcile the changes.

When asked which cloud providers respondents were leveraging at the time of the survey, the three major public cloud vendors led the way, with Amazon Web Services (48%) followed by Microsoft Azure (38%) and Google Cloud Platform (21%). Other cloud vendors selected by respondents included Oracle (12%), IBM (10%), and Alibaba (4%).