Hadoop

The Apache Hadoop framework for the processing of data on commodity hardware is at the center of the Big Data picture today. Key solutions and technologies include the Hadoop Distributed File System (HDFS), YARN, MapReduce, Pig, Hive, Security, as well as a growing spectrum of solutions that support Business Intelligence (BI) and Analytics.



Hadoop Articles

What does the Oracle DBA need to know about NoSQL? Charles Pack, technical director, CSX Technology, answered that question in a session at Data Summit 2016 titled "Oracle NoSQL for the Oracle RDBMS DBA." The Oracle NoSQL Database offers benefits and Oracle DBAs have the opportunity to add it to their portfolios, according to Pack, who covered where the NoSQL database fits in within the overall Oracle ecosystem. "It is not a replacement for all your databases. It is a piece in the puzzle," emphasized Pack.

Posted May 10, 2016

At the center of the new big data movement is the Hadoop framework, which provides an efficient file system and related ecosystem of solutions to store and analyze big datasets. The Hadoop ecosystem was addressed from two points of view in a session at Data Summit 2016. James Casaletto, principal solutions architect, Professional Services at MapR, presented a talk titled "Harnessing the Hadoop Ecosystem," and Tassos Sarbanes, mathematician / data scientist, Investment Banking at Credit Suisse, covered the advantages of HBase in a talk titled "HBase Data Model - The Ultimate Model on Hadoop."

Posted May 10, 2016

Despite the increasing focus on offering more access to more users in organizations, ad hoc querying of big data remains a problem for most, according to Jair Aguirre, data scientist at Booz Allen Hamilton, who presented a session at Data Summit 2016 titled "De-Siloing Data Using Apache Drill."

Posted May 10, 2016

IT and businesses don't always see eye to eye when it comes to overall goals within an enterprise. To address this glaring issue, Anne Buff, business solutions manager and thought leader for SAS Best Practices, a thought leadership organization at SAS Institute, discussed aligning data strategy goals at Data Summit 2016.

Posted May 10, 2016

Data Summit 2016 kicked off at the New York Hilton Midtown earlier this month with keynote presentations by Ben Wellington, the creator of I Quant NY, and Nicholas Chandra, vice president of Cloud Customer Success at Oracle.

Posted May 10, 2016

EMC Corp.'s Enterprise Content Division (ECD) is releasing an upgraded version of its EMC InfoArchive platform, enhancing the ability to secure and leverage large amounts of critical data and content.

Posted May 09, 2016

Sinequa has announced the general availability of Sinequa ES Version 10. Powered by machine learning capabilities, the new version aims to deliver deep analytics of contents and user behavior, and offer information with continually improving relevance to users in their work environments. To achieve this advancement into the world of cognitive computing, Sinequa has integrated the Spark platform in its distributed architecture and implemented machine learning algorithms on Spark within the core of its product.

Posted May 05, 2016

Say what you will about Oracle, it certainly can't be accused of failing to move with the times. Typically, Oracle comes late to a technology party but arrives dressed to kill.

Posted May 04, 2016

Oracle database migration can pose a variety of learning curve challenges. However, a platform does exist that can make the transition easier. In a recent DBTA webinar, Bill Brunt, product manager of SharePlex at Dell, discussed how users can reduce downtime, migrate at speed, eliminate risk, and validate success by tapping into SharePlex.

Posted May 03, 2016

The new name for Dell after it merges with EMC later in 2016 will be Dell Technologies. The new name was announced by Michael Dell, chairman and CEO of Dell Inc., at EMC World and in a letter to Dell team members.

Posted May 02, 2016

Qubole is announcing two major changes. It is releasing an open sourced version of its StreamX tool and forming a partnership with Looker.

Posted May 02, 2016

Enterprises are constantly searching for ways to capture, leverage, and analyze data effectively. However, bottlenecks can wreak havoc on the application development process.

Posted April 29, 2016

Enabled by a partnership with Pentaho, a Hitachi Group Company, and integration with Pentaho's Big Data Integration and Analytics platform, Melissa Data's data quality tools and services can now be scaled across the Hadoop cluster to cleanse and verify data center records.

Posted April 27, 2016

Cisco is launching an appliance that includes the MapR Converged Data Platform for SAP HANA, making it easier and faster for users to take advantage of big data. The UCS Integrated Infrastructure for SAP HANA is easy to deploy, speeds time to market, and reduces operational expenses, while giving users the flexibility to choose a scale-up (on-premises) or scale-out (cloud) storage strategy.

Posted April 27, 2016

Voting has opened for the 2016 DBTA Readers' Choice Awards. Cloud, in-memory, real-time, virtualization, SaaS, IoT - today, there are many opportunities for data-driven companies to take advantage of more data in more varieties flowing at greater velocity than ever before.

Posted April 27, 2016

Cloudera, provider of a data management and analytics platform built on Apache Hadoop and open source technologies, has announced the general availability of Cloudera Enterprise 5.7. According to the vendor, the new release offers an average 3x improvement for data processing with added support of Hive-on-Spark, and an average 2x improvement for business intelligence analytics with updates to Apache Impala (incubating).

Posted April 26, 2016

Neo Technology, creator of Neo4j, is releasing an improved version of its signature platform, enhancing its scalability and introducing new language drivers and a host of other developer-friendly features.

Posted April 26, 2016

Along with an increasing flow of big data that needs to be captured and analyzed, IT departments today also have more solution choices than ever before. However, before making a solution selection, organizations need to understand their requirements and also evaluate the attributes of the possible tools.

Posted April 25, 2016

The COLLABORATE 16 conference for Oracle users kicked off with a presentation by Unisphere Research analyst Joe McKendrick who shared insights from a ground-breaking study that examined future trends and technology among 690 members of three major Oracle users groups.

Posted April 25, 2016

The need for data integration has never been more intense than it has been recently. The Internet of Things and its muscular sibling, the Industrial Internet of Things, are now being embraced as a way to better understand the status and working order of products, services, partners, and customers. Mobile technology is ubiquitous, pouring in a treasure trove of geolocation and usage data. Analytics has become the only way to compete, and with it comes a need for terabytes—and gigabytes—worth of data. The organization of 2016, in essence, has become a data machine, with an insatiable appetite for all the data that can be ingested.

Posted April 25, 2016

GridGain Systems, provider of enterprise-grade in-memory data fabric solutions based on Apache Ignite, is releasing a new version of its platform. GridGain Professional Edition includes the latest version of Apache Ignite plus LGPL libraries, along with a subscription that includes monthly maintenance releases with bug fixes that have been contributed to the Apache Ignite project but will be included only with the next quarterly Ignite release.

Posted April 20, 2016

Dataguise, a provider of data security solutions, is making DgSecure available for the detection, monitoring, and protection of sensitive data across Amazon Web Services (AWS) Simple Storage Service (S3) and all Elastic MapReduce (EMR) platforms that use AWS S3.

Posted April 19, 2016

Sumo Logic, a provider of cloud-native, machine data analytics services, is unveiling a new platform that natively ingests, indexes, and analyzes structured metrics data, and unstructured log data together in real-time.

Posted April 18, 2016

Hortonworks is making several key updates to its platform and furthering its mission to be a leading innovator of open and connected data solutions by enhancing partnerships with Pivotal and expanding established integrations with Syncsort.

Posted April 15, 2016

Everyone within an enterprise agrees that data is an asset, but it's what to do with it that causes divisiveness between business leaders and IT personnel.

Posted April 15, 2016

First created as part of a research project at UC Berkeley AMPLab, Spark is an open source project in the big data space, built for sophisticated analytics, speed, and ease of use. It unifies critical data analytics capabilities such as SQL, advanced analytics, and streaming in a single framework. Databricks is a company that was founded by the team that created and continues to lead both the development and training around Apache Spark.

Posted April 14, 2016

There are many different definitions of the term "big data," some of them reasonable, others not so much. However, the overriding issue for many data professionals, especially those who use more traditional data management tools, is confusion about what to do with big data and how to get the most out of it.

Posted April 08, 2016

Thanks to the digital business transformation, the world around us is changing—and quickly—to a very consumer- and data-centric economy, where companies must transform to remain competitive and survive. The upshot is that for many companies today, it is a full-on Darwinian experience of survival of the fittest.

Posted April 08, 2016

Qubole, a big data-as-a-service company, is open sourcing its Quark platform, a cost-based SQL optimizer. The Quark project is also available in a SaaS implementation via the Qubole Data Service (QDS).

Posted April 07, 2016

IBM says it is making it easier and faster for organizations to access and analyze data in-place on the IBM z Systems mainframe with a new z/OS Platform for Apache Spark. The platform enables Spark to run natively on the z/OS mainframe operating system.

Posted April 04, 2016

Databricks, the company behind Apache Spark, is releasing a new set of APIs that will enable enterprises to automate their Spark infrastructure to accelerate the deployment of production data-driven applications.

Posted April 01, 2016

ManageEngine is introducing a new application performance monitoring solution, enabling IT operations teams in enterprises to gain operational intelligence into big data platforms. Applications Manager enables performance monitoring of Hadoop clusters to minimize downtime and performance degradation. Additionally, the platform's monitoring support for Oracle Coherence provides insights into the health and performance of Coherence clusters and facilitates troubleshooting of issues.

Posted April 01, 2016

It's become almost a standard career path in Silicon Valley: A talented engineer creates a valuable open source software commodity inside of a larger organization, then leaves that company to create a new startup to commercialize the open source product. Indeed, this is virtually the plot line for the hilarious HBO comedy series, Silicon Valley. Jay Kreps, a well-known engineer at LinkedIn and creator of the NoSQL database system, Voldemort, has such a story.

Posted March 31, 2016

With well over a hundred open source projects now part of the Hadoop ecosystem, it can be hard to know which technologies are best for which requirements. To help users get started with Hadoop and understand their technology choices, James Casaletto will present "Harnessing the Hadoop Ecosystem" at Data Summit 2016 in NYC. Casaletto is a solutions architect for MapR, where he develops and deploys big data solutions with Apache Hadoop.

Posted March 31, 2016

What can you learn from the structure of an email, and what really constitutes a "good" post? What kinds of data can you grab to create the best marketing campaign? Matt Laudato will address those questions during his presentation, titled "Supercharging Your Marketing with Big Data," at Data Summit 2016 in NYC.

Posted March 31, 2016

Denodo, a provider of data virtualization software, is releasing Denodo Platform 6.0, further accelerating its "fast data" strategy. "It's a major release for us," said Ravi Shankar, Denodo CMO. There are three important areas that nobody else is focusing on in the industry, he noted. "This, we hope, will change how data virtualization, and in a broader sense, data integration will shape up this year."

Posted March 31, 2016

NoSQL databases were born out of the need to scale transactional persistence stores more efficiently. In a world where the relational database management system (RDBMS) was king, this was easier said than done.

Posted March 29, 2016

MapR is now available as part of Bigstep's big data platform-as-a-service, supporting a wide range of Hadoop applications.

Posted March 29, 2016

Reltio is releasing an enhanced version of Reltio Cloud 2016.1, adding new analytics integration, collaboration, and recommendation capabilities to help companies be right faster.

Posted March 29, 2016

Teradata has introduced a new "design pattern" approach for data lake deployment. The company says its concept of a data lake pattern leverages IP from its client engagements, as well as services and technology to help organizations more quickly and securely get to successful data lake deployment.

Posted March 28, 2016

The data lake has been the subject of more than its fair share of criticism since its inception. Pundits claim it's a source of chaos and risk. Analysts often slam the concept, calling it a "data swamp" or "data dump." As a result of this scrutiny, the definition and understanding of the data lake are rather murky.

Posted March 24, 2016

The rise of big data technologies in enterprise IT is now seen as an inevitability, but adoption has occurred at a slower pace than expected, according to Joe Caserta, president and CEO of Caserta Concepts, a firm focused on big data strategy consulting and technology implementation. Caserta recently discussed the trends in big data projects, the technologies that offer key advantages now, and why he thinks big data is reaching a turning point.

Posted March 23, 2016

Informatica has launched an end-to-end solution to help customers gain greater insight from big data.

Posted March 23, 2016

SAP SE's newest in-memory query engine, SAP HANA Vora, is now generally available, equipping enterprises with contextual analytics across all data stored in Hadoop, enterprise systems, and other distributed data sources.

Posted March 23, 2016
