Enabling AI for Real World Results at Data Summit Connect Fall 2020

Comprehending natural language text, with its inherent challenges of ambiguity, synonymy, and co-reference, has been a long-standing problem in natural language processing.

Transfer learning takes models that have been pre-trained on terabytes of data and fine-tunes them for the problem at hand. It's the new way to efficiently implement machine learning solutions without spending months on a data cleaning pipeline.

Jayeeta Putatunda, senior data scientist, Indellient US Inc., discussed how to implement the BERT language model during her Data Summit Connect Fall 2020 session, “The Power of Transfer Learning in NLP using BERT.”

Videos of presentations from Data Summit Connect Fall 2020, a free series of data management and analytics webinars presented by DBTA and Big Data Quarterly, are available for viewing on the DBTA YouTube channel.

The general idea is to transfer learned feature representations from the pre-trained model (trained on the large dataset) and fine-tune the last layers on task-specific data, creating a new output layer.
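As a rough illustration of that idea, the sketch below keeps a "pre-trained" layer frozen and trains only a new output layer on a small task-specific dataset. This is the simplest variant of transfer learning (feature extraction plus a new head), not the method shown in the session. The weight values, the `task_data` examples, and all function names are invented for illustration.

```python
import math

# Stand-in for pre-trained weights. In practice these come from a model
# trained on a large corpus; the values here are invented for illustration.
PRETRAINED_WEIGHTS = [[0.9, -0.4], [-0.3, 0.8]]


def frozen_features(x):
    """Pre-trained layer: its weights are used but never updated."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)))
            for row in PRETRAINED_WEIGHTS]


def predict(head, x):
    """New output layer: logistic regression on top of the frozen features."""
    feats = frozen_features(x)
    z = sum(w * f for w, f in zip(head["w"], feats)) + head["b"]
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune(data, epochs=200, lr=0.5):
    """Train only the new head with gradient descent on log loss."""
    head = {"w": [0.0, 0.0], "b": 0.0}
    for _ in range(epochs):
        for x, y in data:
            feats = frozen_features(x)
            error = predict(head, x) - y  # gradient of log loss w.r.t. z
            head["w"] = [w - lr * error * f for w, f in zip(head["w"], feats)]
            head["b"] -= lr * error
    return head


# Hypothetical task-specific dataset: label 1 when the first input dominates.
task_data = [([1.0, 0.0], 1), ([0.9, 0.2], 1), ([0.0, 1.0], 0), ([0.1, 0.8], 0)]
head = fine_tune(task_data)
```

Only `head` changes during training; `PRETRAINED_WEIGHTS` stays exactly as it was, which is what makes the approach cheap compared with training a full model from scratch.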

NLP helps machines understand and communicate back in free-flowing human speech, she explained. It typically integrates AI and machine learning into the technology.

It’s used for a few different purposes such as machine translation from one language to another, chatbots, natural text generation, topic modeling, and text classification, she said.

NLP works by:

  • Cleaning the data
  • Tokenization
  • Performing spell check
  • Contraction mapping
  • Stemming/lemmatization
  • Reducing stop-words
  • Creating case-based PP
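Several of the steps listed above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the pipeline from the session: the lookup tables are tiny hypothetical samples, the suffix stripping is a crude stand-in for a real stemmer or lemmatizer, and spell check is left out because it needs a dictionary.

```python
import re

# Tiny illustrative lookup tables; real pipelines use much larger ones.
CONTRACTIONS = {"can't": "cannot", "won't": "will not",
                "it's": "it is", "don't": "do not"}
STOP_WORDS = {"a", "an", "the", "is", "it", "to", "of", "and"}
SUFFIXES = ("ing", "ed", "s")


def clean(text):
    """Lowercase and replace everything except letters, apostrophes, spaces."""
    return re.sub(r"[^a-z' ]+", " ", text.lower())


def expand_contractions(tokens):
    """Replace each known contraction with its expanded words."""
    expanded = []
    for token in tokens:
        expanded.extend(CONTRACTIONS.get(token, token).split())
    return expanded


def stem(token):
    """Crude suffix stripping; keeps at least three characters of the word."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    tokens = clean(text).split()                          # cleaning + tokenization
    tokens = expand_contractions(tokens)                  # contraction mapping
    tokens = [t for t in tokens if t not in STOP_WORDS]   # stop-word removal
    return [stem(t) for t in tokens]                      # stemming


print(preprocess("It's raining, and the dogs can't stop barking!"))
# prints ['rain', 'dog', 'cannot', 'stop', 'bark']
```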

Transfer learning is a machine learning technique where a model trained on one task is re-purposed for a second, related task.

This method can help with speech recognition, image recognition, and text recognition, she explained.

BERT (Bidirectional Encoder Representations from Transformers) can use word embeddings to help connect NLP to clusters of the same words or contextual background, she said.

The concept behind BERT is that it learns information from both the left and right sides of a token's context during training. It can also model relationships between sentences.
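The two properties can be illustrated with a small sketch. The first function shows which neighboring tokens are available when predicting a given token, with and without right-hand context. The second lays out a sentence pair in the `[CLS] ... [SEP] ... [SEP]` form with segment IDs that BERT uses for sentence-pair input. The function names are hypothetical, and whitespace splitting stands in for BERT's actual WordPiece tokenizer.

```python
def visible_context(tokens, position, bidirectional=True):
    """Return the (left, right) tokens usable to predict tokens[position]."""
    left = tokens[:position]
    # A left-to-right model sees nothing to the right of the target token.
    right = tokens[position + 1:] if bidirectional else []
    return left, right


def build_pair_input(sentence_a, sentence_b):
    """Lay out two sentences as one sequence with per-token segment IDs."""
    tokens_a = sentence_a.lower().split()  # simplification of WordPiece
    tokens_b = sentence_b.lower().split()
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment 0 covers [CLS], sentence A, and the first [SEP];
    # segment 1 covers sentence B and the final [SEP].
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


tokens = "the bank raised interest rates".split()
print(visible_context(tokens, 1, bidirectional=False))
# prints (['the'], [])
print(visible_context(tokens, 1, bidirectional=True))
# prints (['the'], ['raised', 'interest', 'rates'])
```

In this example the words to the right of "bank" are what signal the financial sense of the word, and only the bidirectional view includes them.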

After Putatunda’s presentation, Ben Sharma, co-founder and chief product officer, Zaloni, took viewers on a journey through the technology evolution that has enabled today’s DataOps.

During his session, he discussed how to streamline the data supply chain to efficiently and securely deliver analytics-ready data while reducing costs.

Many organizations are facing data sprawl with a lack of tools to sift through the information needed to uncover the insights they are looking for, he said.

The key aspects of streamlining DataOps for analytics success include turning multi-cloud data sprawl into an end-to-end system. Companies need to keep in mind that automated and reusable data pipelines improve efficiency.

Choosing a platform that delivers secure, trusted data and enables data democratization with standard governance is key to accelerating analytics with self-service data consumption.

According to Sharma, Zaloni offers a platform called Arena that provides end-to-end DataOps built on an agile platform that improves and safeguards data assets. Another platform called EndZone provides governance that is scalable and future-proof.