Big Data Guidance for Relational DBAs


The current driving force for many IT projects is big data and analytics. Organizations are looking to exploit the growing mountain of data by creating systems of insight that help them make better business decisions. Big data analytics can uncover patterns in that data and reveal opportunities that were previously unknown.

So how will the job of the DBA be affected as companies deploy big data analytics systems? The answer is quite a bit, but don’t forget everything you already know!

Life is always changing for DBAs. The DBA is at the center of new application development and therefore is always learning new technologies – and those technologies are not always exclusively database-related. Big data will have a similar impact. There is a lot of new technology to learn. Of course, not every DBA will have to learn each and every type of technology.

The first thing most DBAs should start learning about is NoSQL DBMS technologies. But it is important to understand that NoSQL will not be replacing relational. NoSQL database technologies (key/value, wide column, document store, and graph) are currently very common in big data and analytics projects. But these products are not designed to be wholesale replacements for the rich, in-depth technology embedded within relational systems.
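To make the four NoSQL categories a little more concrete, here is a rough sketch of how one customer record might be shaped in each model. It uses plain Python data structures only; it is not the API of any particular NoSQL product, and the record, keys, and names (customer, c42, ORDERED, and so on) are invented for illustration.

```python
import json

# Hypothetical sample record used for every model below.
customer = {"id": "c42", "name": "Ada", "orders": [{"sku": "A1", "qty": 2}]}

# Key/value: the store sees only a key and an opaque value.
kv_store = {"customer:c42": json.dumps(customer)}

# Document store: the value is a structured document that can be queried by field.
documents = [customer]
big_orders = [d["name"] for d in documents
              if any(o["qty"] >= 2 for o in d["orders"])]

# Wide column: row key -> column family -> sparse set of columns.
wide_column = {"c42": {"profile": {"name": "Ada"}, "orders": {"A1": 2}}}

# Graph: nodes plus labeled edges; queries follow relationships.
nodes = {"c42": {"label": "Customer", "name": "Ada"},
         "A1": {"label": "Product"}}
edges = [("c42", "ORDERED", "A1", {"qty": 2})]
bought = [dst for src, rel, dst, _ in edges
          if src == "c42" and rel == "ORDERED"]

print(big_orders, bought)
```

None of these shapes gives you joins, constraints, or transactions across records for free, which is part of why they complement rather than replace relational.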

The RDBMS is adaptable, reliable, and has been used for decades in Fortune 500 businesses. Relational offers stability and integrity in the form of atomicity, consistency, isolation and durability (ACID) in transactions. ACID compliance guarantees that each transaction either completes correctly in its entirety or leaves the data unchanged; it is a guarantee about correctness, not about speed. The RDBMS will continue to be the bellwether data management platform for most applications today and into the foreseeable future.
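For readers who want to see atomicity in action, the following is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a full RDBMS. The accounts table, the transfer function, and the balances helper are assumptions made up for this example. The point is that when the second statement of a transaction fails, the first one is rolled back too.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one atomic unit of work."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was undone.
        return False

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

ok = transfer(conn, 1, 2, 30)       # succeeds
failed = transfer(conn, 1, 2, 500)  # would overdraw account 1, so nothing changes
print(ok, failed, balances(conn))
```

The failed transfer credits account 2 before the debit is rejected, yet the final balances show no trace of that credit.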

But the stability of relational comes with a cost. RDBMS offerings are costly and carry a lot of built-in technology. A NoSQL offering can be lightweight, without all of the bells and whistles built into the RDBMS, thereby offering high performance and suitability for certain types of applications, such as those used for big data analytics.

That means that DBAs must be capable of managing relational as well as NoSQL database systems. And they will have to adapt as the market consolidates and the existing RDBMSes adopt NoSQL capabilities (just as they adopted Object-Oriented capabilities in the 1990s). So instead of offering only a relational database engine, a future RDBMS (such as Oracle or DB2) will offer additional engines, such as key/value or document store.

And DBAs who spend the time to learn what the NoSQL database technologies do today will be well-prepared for the multi-engine DBMS of the future. Not only will the NoSQL-knowledgeable DBA be able to help implement projects where organizations are using NoSQL databases today, but they will also be ahead of their peers when NoSQL functionality is added to their RDBMS product(s).

DBAs should also take the time to learn Hadoop, MapReduce and Spark. Hadoop is not a DBMS, but it is likely to be a long-term mainstay for data management, particularly for managing big data. An education in Hadoop and MapReduce will bolster a DBA’s career and make them more employable long-term. Spark appears to be here for the long run, too. So learning how Spark can speed up big data requests with in-memory capabilities is also a good career bet.
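If MapReduce is new to you, the programming model itself is simple enough to sketch in a few lines. What follows is a single-process illustration in plain Python, not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) are my own, and a real cluster would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Collapse the values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

print(word_count(["big data", "Big deal"]))
```

Because each map call looks at only one line and each reduce call at only one key, the work can be spread over as many machines as the data requires.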

It would also be a good idea for DBAs to read up on analytics and data science. Although most DBAs will not become data scientists, some of their significant users will be. And learning what your users do – and want to do with the data – will make for a better DBA.

And, of course, a DBA should be able to reasonably discuss what is meant by the term “Big Data.” Industry analyst firms have come up with their definitions of what it means to be processing “Big Data”, the most famous of which talks about the 4 “V”s: volume, variety, velocity, and veracity. As interesting as these definitions may be, and as much discussion as they create, you can’t really figure out whether you are working with big data by counting up “V”s!

Analytics and insight are the motivating factors for big data. As with all of the other systems that DBAs must manage, there is data (in this case big data) and there are processes/programs (in this case analytics). We don’t just store or access a bunch of data because we can; we do it to learn something that will give us a business advantage. That is the purpose of analytics. And every good DBA knows that understanding the business purpose for the data will make you a better DBA. So understanding the analytics systems and applications used on your big data is also an appropriate use of time for DBAs.

Finally, I would urge DBAs to automate as many data management tasks as possible. The more automated existing management tasks become, the more available DBAs become to learn about and work on the newer, more interesting projects. So automating the traditional, time-consuming processes that must be performed on your relational systems will open up more time for you to devote to learning the new technologies being brought into your organization to develop big data analytics systems.