Cloud Computing and Advanced Relationship Analytics

There is a wealth of information, connections and relationships within the terabytes and petabytes of data being collected by organizations on distributed cloud platforms. Utilizing these complex, multi-dimensional relationships will be the key to developing systems to perform advanced relationship analysis. From predictive analytics to the next generation of business intelligence, "walking" the social and professional graphs will be critical to the success of these endeavors.

Most applications and data layers today are capable of only simple analytics: finding out who did what or who bought what. Advanced analytics yields far more knowledge at a much deeper level: understanding who, where, what, how and why, right now.

Google, Yahoo and other leaders in online and personalized services are looking to use complex, multi-dimensional graph data to improve everything they do, from supporting vast indexes and catalogs of content to providing the best value and return to their advertisers. Graph technology offers a superior solution for relationship analytics requirements. It enables organizations to traverse relationships of any number and complexity, in virtually any amount of distributed data, from any number of sources and types, in near real-time, and on the same commodity hardware available through cloud computing platform providers.

Solve the problem.

The NoSQL (or "not only SQL") movement is defined by a simple premise: Use the solution that best suits the problem and objectives. If the data structure is more appropriately accessed through key-value pairs, then the best solution is likely a dedicated key-value database. If the objective is to quickly find connections within data containing objects and relationships, then the best solution is a graph database that can get results without any need for translation (Object/Relational mapping). The numerous technologies now available that finally support this simple premise are helping to simplify the application environment and enable solutions that actually exceed the requirements, while also supporting performance and scalability objectives far into the future.

Cloud computing has adopted a broad variety of NoSQL technologies to support these leading-edge requirements. By using solutions designed to support specific tasks and requirements, organizations can more easily achieve reductions, often significant ones, in the complexity and costs associated with their systems. And by using more targeted and capable components, these systems are also able to achieve even greater levels of performance and scalability.

Graph databases may be the most important part of the NoSQL movement.

Graph databases typically solve problems related to the complexity of data, while key-value and column-store solutions seek to address common issues encountered as data volumes grow. A distributed graph database can address both the complexity and the scalability requirements, giving users the best of both worlds: managing data that is both complex and big.

Graph data is represented by nodes (also called vertices) and edges, where any node can be connected to any number of other nodes via the edges between them.
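As a minimal sketch of this representation, the Python below stores a graph as nodes plus labeled edges in an adjacency list. The Graph class, its method names, and the sample data ("alice", "knows", and so on) are hypothetical, chosen only for illustration; they are not the API of any particular graph database.

```python
from collections import defaultdict


class Graph:
    """A tiny in-memory directed graph: nodes connected by labeled edges."""

    def __init__(self):
        # Every node seen so far, whether or not it has outgoing edges.
        self._nodes = set()
        # Maps each source node to a list of (edge_label, target_node) pairs.
        self._edges = defaultdict(list)

    def add_edge(self, source, label, target):
        """Connect source to target with an edge carrying the given label."""
        self._nodes.update((source, target))
        self._edges[source].append((label, target))

    def neighbors(self, node, label=None):
        """Return targets reachable from node in one hop, optionally filtered by label."""
        return [
            target
            for edge_label, target in self._edges.get(node, [])
            if label is None or edge_label == label
        ]

    def nodes(self):
        """Return a copy of the set of all nodes."""
        return set(self._nodes)


# Hypothetical sample data mixing social and purchase relationships.
graph = Graph()
graph.add_edge("alice", "knows", "bob")
graph.add_edge("alice", "bought", "camera")
graph.add_edge("bob", "bought", "camera")
graph.add_edge("bob", "works_with", "carol")
```

In this sketch the relationships are stored directly alongside the nodes, so a call such as graph.neighbors("alice", "bought") is a lookup on the node itself rather than a search across separate tables.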

Graph database technologies can support rapid traversal of these edges to get results in a matter of seconds (or less). Because the data is persisted in a form where relationships are first-class citizens, traversal performance is far less of an issue. When this type of work is done in a relational database or key-value environment, by contrast, performance is subject to very expensive constraints and limitations. And, of course, if the graph database architecture is distributed, then scalability limits are also addressed very nicely.
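To illustrate what "walking" edges means in practice, here is a minimal sketch of a breadth-first traversal that finds the shortest chain of connections between two nodes. It assumes a small in-memory adjacency dictionary; the function name shortest_path and the sample names are hypothetical, and a real graph database would run this kind of traversal against persisted, possibly distributed, data.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Breadth-first search over an adjacency dict.

    Returns the shortest list of nodes from start to goal,
    or None if the two nodes are not connected.
    """
    if start == goal:
        return [start]
    # parents doubles as the visited set and as the trail used to rebuild the path.
    parents = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, ()):
            if neighbor in parents:
                continue  # already reached by a path that is at least as short
            parents[neighbor] = current
            if neighbor == goal:
                # Walk the parent links back to the start, then reverse.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical "who knows whom" data, stored as an undirected graph.
adjacency = {
    "alice": ["bob", "dave"],
    "bob": ["alice", "carol"],
    "carol": ["bob", "erin"],
    "dave": ["alice"],
    "erin": ["carol"],
    "frank": [],
}

print(shortest_path(adjacency, "alice", "erin"))
```

Each hop here is a direct lookup of a node's neighbors, so the work grows with the number of edges actually visited rather than with the total size of the data set.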

As organizations in social media, personalized web and advertising services, business intelligence and other spaces come to understand the importance of the deeper relationships within their data, their success in utilizing this information will depend on the technology they choose. Trying to scale out graph data and relationship analysis using relational technology is simply not the answer. In addition, the complex custom code, high-end server hardware, MapReduce layers, and administration overhead required to support these architectures can significantly increase costs.

The days of compromising requirements to support centralized database server architectures, or relational systems that simply aren't designed to solve some problems, are over. Today, there are other options and solutions.