When Archives Are Arcane: Communications Repositories Built a Decade Ago for SOX May Fail You Today

As the old saying goes, the more things change, the more they stay the same. Many companies have spent the past two years in a frenzy preparing for regulations that are either turning or about to turn the corporate world upside down. Financial services firms have been scrambling to deliver the transparency required by the second Markets in Financial Instruments Directive (MiFID II), while nearly every other company serving EU residents has been getting its ducks in a row to comply with the General Data Protection Regulation (GDPR).

Thanks to the near-meltdown of our financial system a decade ago, and to the proliferating security hacks, breaches, and scams that are making cyberspace look like a scene out of Mad Max, governments have raised the compliance bar. As a result, companies are still figuring out what changes they need to make to meet the new standards.

But for all that’s different in these new directives, one central tenet remains core to regulatory compliance: Companies need to track employees’ digital communications so that they can get to the bottom of real or alleged negligence or malfeasance, should regulators come calling.

The more things change, the more they stay the same.

Today’s Dynamic Messaging Options Cause "Static" for Yesterday’s Archives

Of course, retrieving correspondence is a very different animal today. The once state-of-the-art archives that the world's biggest banks, healthcare networks, and other regulated companies built in the early 2000s to comply with the Sarbanes-Oxley Act (SOX), the Health Insurance Portability and Accountability Act (HIPAA), and the Securities and Exchange Commission (SEC) rules of that time are now unwieldy because communications technology has changed so dramatically.

Back then, IT was charged with finding storage space for what was at the time considered a voluminous amount of email. Arguably, the great archiving challenge of the day was figuring out how to record messages sent through then-newfangled instant messenger (IM) applications, such as AOL Instant Messenger (AIM), Yahoo Messenger, and MSN Messenger, whose messages were designed to vanish without a trace. In both cases, compliance functions were dealing solely with static, text-based messages.

Fast-forward a decade and a half, and organizations now need to account for dynamic social media posts, unified communications (UC), video conferencing, personal mobile devices, and other channels whose messages can contain emojis, memes, links, and other non-text content. Not only must IT and compliance teams find space for exponentially growing volumes of communications data, but they must also be able to retrieve that data and organize it chronologically and in context.

Linear Archives Struggle With Labyrinthine Paths of Digital Conversations

Organizing digital communication is no small task, especially considering how a basic dialogue between, say, a stock trader and a client commonly plays out today. The trader might start a conversation about a trade with the client over email, follow up with a Skype message, and then take the conversation to a personal cellphone. Along the way, the trader may tag the client in a LinkedIn post about the stock in question, and then revise or even delete that post after another round of emails and voicemails.

Where finding a place to simply store static emails and IMs was once half the battle, saving these new types of messages is now only a small first step. If a regulator ever questioned anything related to this trade, the employer would need to do more than pluck these messages out of an archive; it would have to reassemble the parts of the conversation like pieces of a jigsaw puzzle. Unfortunately, reconstructing all of this activity is difficult if only the text of the messages is preserved and the messages themselves are stored in no particular order. Text-only storage may once have sufficed for email and IM, but LinkedIn, Facebook, Microsoft Teams, WhatsApp, WeChat, Slack, Jabber, and Facebook Workplace simply don't work that way.
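To make the reassembly problem concrete, here is a minimal Python sketch of the most basic step: pulling every captured event between two people out of all channels and ordering it by time. The `ArchivedEvent` record and `reassemble_conversation` function are hypothetical, not any archiving product's API, and the sketch assumes each event already carries a timezone-aware timestamp and canonical participant IDs (mapping platform accounts to people is a separate problem). Edits and deletions are kept as events of their own so the history survives.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ArchivedEvent:
    """One captured item from any channel (hypothetical schema)."""
    channel: str             # e.g. "email", "skype", "linkedin"
    timestamp: datetime      # timezone-aware time the event occurred
    participants: frozenset  # canonical person IDs involved
    action: str              # "send", "post", "edit", or "delete"
    content: str             # message text or a pointer to a stored snapshot


def reassemble_conversation(events, person_a, person_b):
    """Return every event involving both people, across all channels, oldest first.

    Edits and deletions are separate events rather than overwrites, so the
    original post and its later removal both appear in the timeline.
    """
    pair = {person_a, person_b}
    involved = [e for e in events if pair <= e.participants]
    return sorted(involved, key=lambda e: e.timestamp)


def at(hour, minute):
    # Helper for the example: all events on one arbitrary day, in UTC.
    return datetime(2018, 3, 1, hour, minute, tzinfo=timezone.utc)


both = frozenset({"trader", "client"})
events = [
    ArchivedEvent("linkedin", at(11, 0), both, "post", "Thoughts on XYZ"),
    ArchivedEvent("email", at(9, 15), both, "send", "Re: XYZ position"),
    ArchivedEvent("skype", at(9, 40), both, "send", "Call you in 5?"),
    ArchivedEvent("linkedin", at(12, 30), both, "delete", "Thoughts on XYZ"),
    ArchivedEvent("email", at(10, 0), frozenset({"trader", "colleague"}), "send", "Lunch?"),
]

for e in reassemble_conversation(events, "trader", "client"):
    print(e.timestamp.strftime("%H:%M"), e.channel, e.action)
# 09:15 email send
# 09:40 skype send
# 11:00 linkedin post
# 12:30 linkedin delete
```

Sorting on a normalized timestamp is what turns a bucket of channel-specific captures into a readable timeline. A real archive would also have to deal with clock skew between platforms and with richer threading, which this sketch deliberately ignores.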

Fail to piece these threads together in a way that makes the intent and proper context of all communication clear, and you could spend months and thousands of dollars arguing over the implications of a single sentence or paragraph. It's critical to minimize the number of instances in which communication is open to interpretation; emojis alone are currently the source of significant legal headaches for many companies.

Assembling the Conversation

So, how do organizations catalog this activity? First, their archives need to evolve away from a text-centric orientation, just as digital communication itself has over the years. This does not mean that traditional email channels should be de-emphasized—on the contrary, email is still the primary mode of communication for many. However, archives must also be able to capture “snapshots” of posts made on popular social media apps such as LinkedIn or Facebook, images of all kinds included in IMs and UC messages, and unstructured files and documents that enter discussions taking place on collaboration platforms.

Equally important, archives must now capture and index the metadata, such as timestamps, attached to the images, videos, spreadsheets, and webpages contained in correspondence on popular applications and services such as Jive, Cisco Spark, SharePoint, Slack, Microsoft Teams, and Jabber. This helps preserve changes, revisions, deletions, and insertions of embedded files in the order they were made on these channels and in the context of the corresponding email communication. In addition, identity management and data mapping need to be unified so that Facebook, LinkedIn, Twitter, Instagram, Facebook Workplace, Slack, and WeChat profiles and user accounts can be quickly linked to the correspondence. Without this, you would be left with nothing but the text of these messages in no particular order. Again, that won't help you much in a legal setting: in litigation, you can find yourself on shakier ground if all you did was throw the Lego pieces into a bucket rather than assemble the entire structure, so to speak.
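Unified identity mapping can be illustrated with a small Python sketch that links each platform-specific handle to one canonical person ID and then attributes raw captures to that person. `IdentityMap` and `attribute` are hypothetical names, not a real identity-management API, and the case-insensitive, `@`-stripped handle normalization is an assumption that will not hold for every platform.

```python
from typing import Optional


def _normalize(handle: str) -> str:
    # Assumption: handles are case-insensitive and a leading "@" is cosmetic.
    return handle.strip().lower().lstrip("@")


class IdentityMap:
    """Links platform-specific accounts to one canonical person ID (hypothetical)."""

    def __init__(self) -> None:
        self._by_account: dict = {}

    def link(self, person_id: str, platform: str, handle: str) -> None:
        self._by_account[(platform, _normalize(handle))] = person_id

    def resolve(self, platform: str, handle: str) -> Optional[str]:
        return self._by_account.get((platform, _normalize(handle)))


def attribute(raw_messages, identities):
    """Tag each raw capture with the canonical person behind its sender handle.

    Captures whose sender cannot be resolved are returned separately for
    review instead of being dropped.
    """
    attributed, unresolved = [], []
    for msg in raw_messages:
        person = identities.resolve(msg["platform"], msg["sender"])
        if person is None:
            unresolved.append(msg)
        else:
            attributed.append({**msg, "person_id": person})
    return attributed, unresolved


ids = IdentityMap()
ids.link("emp-042", "slack", "jsmith")
ids.link("emp-042", "linkedin", "@JaneSmithCFA")
ids.link("emp-042", "wechat", "jane_s_88")

raw = [
    {"platform": "slack", "sender": "JSmith", "text": "Sending the deck now"},
    {"platform": "linkedin", "sender": "janesmithcfa", "text": "Great call today"},
    {"platform": "twitter", "sender": "unknown_user", "text": "DM me"},
]
done, todo = attribute(raw, ids)
print([m["person_id"] for m in done])  # ['emp-042', 'emp-042']
print([m["platform"] for m in todo])   # ['twitter']
```

Routing unresolved senders to a review queue rather than discarding them is a deliberate choice: an unmapped account is exactly the kind of gap that leaves an archive holding orphaned text with no owner.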

As with just about every other enterprise technology, analytics is an indispensable part of archiving today. AI and machine learning are necessary to automatically detect suspicious patterns of behavior across terabytes of data in real time. Big data applications can smell the smoke of fires caused by IP leakage, money laundering, bribery, and other forms of white-collar crime that manual keyword searches simply can't detect. That said, no regulated business should abandon its own search terms and put all of its detection eggs in the advanced analytics basket: the same messages, posts, and files concerning sensitive topics that were forbidden 10 years ago must still be blocked today. Likewise, the rules that have long prohibited certain teams, users, or groups (e.g., traders and researchers at an investment firm) from communicating with each other via email and IM must now be applied to these newer channels.
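The requirement that legacy rules carry over to new channels can be sketched in Python as a single policy check that never looks at which channel a message came from. The wall and lexicon contents, the `Message` type, and `policy_violations` are illustrative assumptions, not any product's rule format. A gateway could block any message for which the returned list is non-empty, with the analytics layer running alongside it rather than replacing it.

```python
from dataclasses import dataclass

# Illustrative policy data (assumed, not from any real rulebook):
# pairs of groups that must not communicate, and terms flagged on any channel.
ETHICAL_WALLS = {frozenset({"trading", "research"})}
LEXICON = {"guaranteed return", "off the books", "delete this"}


@dataclass
class Message:
    channel: str           # kept for audit only; the checks never branch on it
    sender_group: str
    recipient_groups: set
    text: str


def policy_violations(msg: Message) -> list:
    """Apply the same wall and lexicon rules to a message from any channel."""
    violations = []
    for group in sorted(msg.recipient_groups):
        if frozenset({msg.sender_group, group}) in ETHICAL_WALLS:
            violations.append(f"ethical wall: {msg.sender_group} -> {group}")
    lowered = msg.text.lower()
    for term in sorted(LEXICON):
        if term in lowered:
            violations.append(f"lexicon hit: {term!r}")
    return violations


m = Message("slack", "trading", {"research", "compliance"},
            "Keep this OFF THE BOOKS until Friday")
print(policy_violations(m))
# ['ethical wall: trading -> research', "lexicon hit: 'off the books'"]
```

Because the rule set is keyed on groups and terms rather than on channels, adding Slack or WeChat to the archive means feeding their messages into the same check, not writing new rules.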

These modern communications tools bring new legal and technological challenges. Courts are still establishing norms for retrieving and interpreting multidimensional, image-heavy digital workplace correspondence, and gathering this type of data for e-discovery is a different beast for organizations. Nevertheless, one thing will always be true: the financial services firms, healthcare entities, governments, and other regulated organizations that can produce these communications as clearly and quickly as possible have the best chance of avoiding regulatory fines.

The more things change, the more they stay the same.


Subscribe to Big Data Quarterly E-Edition