The Advantages of Using Structured Data for E-Discovery

While emails have been "the smoking gun" in many recent court cases, the next big wave in what is "discoverable" is structured (database) data. Data is simpler and much faster to retrieve from structured sources than from unstructured ones, so an e-discovery response drawn from a structured format can usually be produced more quickly than the alternatives, reducing the risk of steep fines for a late response.

A combination of stringent regulations and new technology is giving judges and litigators more muscle to subpoena more data. Structured data in any application or database, no matter how old or obsolete, can be used in court as evidence, and it is increasingly being requested. For example, in a wrongful termination suit, a judge can order an audit of PeopleSoft HR application data from a decade ago, including any comments in its free-text fields. Litigation aside, companies often need to conduct internal investigations into personnel, security and corporate policy matters, which requires the ability to search large volumes of data in their systems of record.

This raises a big question: how can all this data, some of it years old and residing in obsolete or dormant systems, be stored cost-effectively and kept available for ready access in the event of legal actions and investigations? The problem grows harder as the scope of e-discovery widens, and soon it may encompass everything under the sun. The cost of being able to deliver all the required information is already overwhelming, and the penalties for failing to deliver it quickly and efficiently are even more crushing.

Imagine you are a pharmaceutical company named in a major class action suit that requires you to produce every relevant piece of data on a certain drug, from development through clinical testing to marketing. Can you produce all the subpoenaed data in just 48 hours? Could you absorb a $1 million penalty for each day you fall behind? What happens when you uncover relevant data a week or two late? That's not far-fetched when the data sits in old systems that are rarely or no longer used, which is frequently the case.

Legacy and redundant systems present their own challenges for e-discovery. They are often kept running solely to provide access to their data for reporting, e-discovery and similar purposes. Keeping them running is expensive because of software maintenance and license fees, application and infrastructure support, and dependence on obsolete technical and application skills that grow scarcer over time. If such a system goes down while the responsible support person has just left the company or is on vacation, and an investigation requires a response within 48 hours, the risk of incurring large fines is huge. The way to retain the data and keep it accessible while retiring the application, the database and its entire infrastructure is to archive the application data to a common format. This could save your organization millions of dollars in maintenance fees, database licenses, infrastructure costs and steep legal fines.

Live application archiving relocates inactive data from live production systems (transactional databases and/or data warehouses) on a schedule driven by predefined retention policies (as defined by regulations and business requirements). This minimizes production database growth while keeping the archived data easy to access. Application retirement and live application archiving are two separate use cases, but the underlying database archiving solution is the same: in both cases, archiving converts the data to a common format. That archive format is fully compliant, highly compressed (up to 98%), secure and easily accessible. The optimized archive files can be read by any ODBC/JDBC-compatible analytics or reporting tool, and because they are storage-independent, they can be placed on any file system. Tools are also available for rapidly building interfaces to the archive that resemble the retired application, so users retain much of their familiarity with it.
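To make the relocation step concrete, here is a minimal sketch using Python's standard-library sqlite3 module. It assumes a hypothetical `orders` production table, a hypothetical `orders_archive` table with the same columns, and a simple policy that treats an order as inactive once it has been closed for more than a given number of days. It is only an illustration of the idea, not how any particular archiving product works; a real solution would also convert, compress and secure the data in its own archive format.

```python
import sqlite3
from datetime import date, timedelta


def archive_inactive_orders(conn: sqlite3.Connection, today: date, inactive_after_days: int) -> int:
    """Relocate orders closed before the retention cutoff from `orders` to `orders_archive`.

    Both tables are hypothetical and share the column layout
    (order_id, customer, total, closed_on). Returns the number of rows moved.
    """
    # ISO-8601 date strings ("YYYY-MM-DD") compare correctly as plain text.
    cutoff = (today - timedelta(days=inactive_after_days)).isoformat()
    predicate = "closed_on IS NOT NULL AND closed_on < ?"
    with conn:  # single transaction: copy and delete commit together or roll back together
        conn.execute(f"INSERT INTO orders_archive SELECT * FROM orders WHERE {predicate}", (cutoff,))
        deleted = conn.execute(f"DELETE FROM orders WHERE {predicate}", (cutoff,))
    return deleted.rowcount


# Demo with an in-memory database standing in for the production system.
conn = sqlite3.connect(":memory:")
columns = "(order_id INTEGER PRIMARY KEY, customer TEXT, total REAL, closed_on TEXT)"
conn.execute(f"CREATE TABLE orders {columns}")
conn.execute(f"CREATE TABLE orders_archive {columns}")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "Acme", 120.0, "2015-03-01"),   # closed years ago: archived
        (2, "Globex", 75.5, "2023-11-20"),  # closed recently: stays in production
        (3, "Initech", 40.0, None),         # still open: stays in production
    ],
)
conn.commit()

moved = archive_inactive_orders(conn, today=date(2024, 1, 1), inactive_after_days=365)
# The archive remains queryable with ordinary SQL.
archived = conn.execute("SELECT customer, total FROM orders_archive").fetchall()
print(moved, archived)
```

Copying and deleting inside one transaction means a failure cannot leave a record in both places or in neither, which matters when the archive may later be produced as evidence.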

Why Structured Data?

Enterprises today simply need to preserve more information without overloading their systems. Whether the data is stored on premises or in the cloud, structured data is a critical part of an organization's managed information that needs to be considered in responding to any legal discovery request. Structured data is both queried and archived as business entities, so it can be retrieved as a complete response to legal discovery (for example, complete invoices or purchase orders). As courts become more savvy about digital data, they will increasingly expect legal discovery responses to include data from application databases. Structured data broadens the information provided in a response, mitigating the risk of further auditing or discovery. Some data archiving solutions can also automate the retention management process, ensuring that appropriate data is retained for compliance and that expired data, which can be a liability to the company, is deleted.
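Automated retention management comes down to deciding, for each archived business entity, whether to keep it, purge it, or keep it regardless because a legal hold applies. The sketch below assumes a hypothetical retention table (`RETENTION_YEARS`) and a hypothetical `ArchivedEntity` record; the retention periods shown are placeholders, since real ones come from the regulations and business requirements that apply to your organization.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical retention policy: years each business-entity type must be kept.
RETENTION_YEARS = {"invoice": 7, "purchase_order": 7, "hr_record": 10}


@dataclass
class ArchivedEntity:
    entity_id: str
    entity_type: str
    created_on: date
    legal_holds: set[str] = field(default_factory=set)  # IDs of cases that have placed a hold


def expiry_date(entity: ArchivedEntity) -> date:
    """Date on which the entity's retention period ends."""
    years = RETENTION_YEARS[entity.entity_type]
    created = entity.created_on
    try:
        return created.replace(year=created.year + years)
    except ValueError:  # created on Feb 29 and the target year is not a leap year
        return created.replace(year=created.year + years, day=28)


def disposition(entity: ArchivedEntity, today: date) -> str:
    """Return 'hold', 'purge' or 'retain' for one archived business entity."""
    if entity.legal_holds:
        return "hold"  # a legal hold always overrides expiry
    if today >= expiry_date(entity):
        return "purge"  # expired data can be a liability, so it is disposed of
    return "retain"


entities = [
    ArchivedEntity("INV-1", "invoice", date(2015, 6, 30)),
    ArchivedEntity("INV-2", "invoice", date(2015, 6, 30), {"CASE-42"}),
    ArchivedEntity("PO-7", "purchase_order", date(2020, 2, 29)),
    ArchivedEntity("HR-3", "hr_record", date(2016, 1, 1)),
]
for e in entities:
    print(e.entity_id, disposition(e, today=date(2024, 1, 1)))
```

Checking for a legal hold before checking expiry means an expired entity that is relevant to a case is never purged, and deciding per entity rather than per table is what allows retention to be managed at the business entity level.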

Your checklist for structured data e-discovery

Here's what you want to look for when implementing archiving for e-discovery:

  • On-demand data access - Your archived data should be fully accessible on demand via standard SQL queries and standard (ODBC/JDBC-compatible) reporting and discovery tools, including your enterprise-standard e-discovery tool.
  • Ability to establish a chain of custody - You want to be able to help establish a chain of custody for legal cases by auditing and validating the archival process, data access, retention, purging and legal hold activities for the archived data.
  • Strong data security and full integrity - Data security and privacy should be maintained through end-to-end encryption, and data integrity should be ensured through application-specific business rules.
  • Retention management at a granular level - Retention management, including data disposal, should be available down to the individual business entity level.
  • Ensured performance - Time counts in e-discovery, so you want a structured data archiving solution fast enough to meet e-discovery legal deadlines.
  • High compression ratios - These reduce storage space, cost and maintenance.
  • Broad platform support - You should be able to archive data from any database platform.
  • Archiving of all structured data types - Whatever form the structured data is in, it should be archivable and accessible for e-discovery.
  • Federation capability - You should be able to archive complete business entities from multiple data sources, even if they are spread across multiple database platforms.
  • Support for legal hold - Expired data must be protected from deletion when it is relevant to a legal case.

Data archiving solutions supporting e-discovery need to incorporate all of the features mentioned above.
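One common way to make archive activity auditable, and so support a chain of custody, is a tamper-evident log in which each entry stores a SHA-256 hash computed over the previous entry's hash plus its own contents. The sketch below is a hypothetical illustration of that general technique using only the Python standard library; it does not describe any vendor's audit mechanism, and the `AuditLog` class and event fields are made up for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, event: dict) -> str:
    # Canonical JSON (sorted keys, no extra whitespace) so an event always hashes the same way.
    payload = prev_hash + json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only, hash-chained log of archive activities (hypothetical design)."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, event: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        h = _entry_hash(prev, event)
        self.entries.append({"event": event, "prev": prev, "hash": h})
        return h

    def verify(self) -> bool:
        """Recompute the chain; any edited, reordered or removed entry breaks it."""
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != _entry_hash(prev, entry["event"]):
                return False
            prev = entry["hash"]
        return True


log = AuditLog()
log.append({"action": "archive", "entity": "INV-1", "by": "archive_job", "at": "2024-01-01T02:00:00Z"})
log.append({"action": "legal_hold", "entity": "INV-1", "case": "CASE-42", "at": "2024-02-10T09:30:00Z"})
log.append({"action": "query", "entity": "INV-1", "by": "outside_counsel", "at": "2024-02-11T14:05:00Z"})
intact = log.verify()
log.entries[1]["event"]["case"] = "CASE-99"  # simulate someone editing a recorded event
tampered = log.verify()
print(intact, tampered)
```

A hash chain only reveals tampering if the latest hash is also kept somewhere the person editing the log cannot rewrite (for example, recorded with a third party); otherwise an editor could simply recompute every hash after the change.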

Archiving structured data can speed up e-discovery, ensure compliance, reduce application overhead and improve production system performance. As more regulations are published and structured data from more systems is required, a comprehensive data archiving strategy and solution needs to be an integral part of your e-discovery strategy. The amount of data you will have to keep accessible for e-discovery will only keep growing, and you will not be able to keep it in production systems and databases forever. Archiving your structured database and application data onto lower-cost storage with significant compression, full query access, auditability and built-in retention management is the only way out of the growing e-discovery dilemma of massive data growth combined with stiffening e-discovery requirements.

About the author

Adam Wilson is senior vice president and general manager, Information Lifecycle Management at Informatica. You can follow him on Twitter at @a_adam_wilson.