Thinkdeeply provides AI as a Service for business users. Our mission is to accelerate the adoption of AI/ML. We make AI easy by simplifying AI development with our No Code AI Platform, and we accelerate adoption through our Industry Solution Packs and AI Hub.
In these posts we intend to discuss best practices for AI/ML adoption and the latest advancements in the field. The intended audience includes:
Executives who want to learn how AI/ML can help their business and how to accelerate its adoption
AI/ML practitioners such as data scientists, ML engineers, and MLOps teams
Summary
Foundation Models and Prompt-based Model Development have revolutionized the field of Natural Language Processing (NLP). The majority of ML tasks can now achieve reasonable performance with little or no additional tuning of the models. This helps infuse AI/ML into delivering better customer experiences, increasing automation, and generating insights. Some of these ML tasks and their common use cases include:
Classification
Classifying a given piece of content (e.g., an email, document, or product) into a predefined list of classes. Alternatively, class labels can be generated using translation-style methods (i.e., treating the target taxonomy as a language to translate the text into). A minimal zero-shot classification sketch appears after this list of tasks.
Entity Extraction
Extracting relevant entities (e.g., people/places, product attributes, key facts, tables) from various data sources such as free-form text or OCR outputs
Entity Matching
Finding and deleting duplicate records
Linking two partial records of the same subject to create a more complete record
Conversational AI
Chatbots etc.
Generative Tasks such as Translation and Generation
Translating one language into another
Changing writing style of text from one author to another
Generating an image from a text description
Summarization, Error Detection, Data Imputation, and more.
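To make the classification use case concrete, below is a minimal zero-shot sketch using the Hugging Face transformers zero-shot-classification pipeline. The model checkpoint, sample email, and candidate labels are illustrative assumptions, not part of our platform.

```python
# Minimal zero-shot classification sketch.
# Assumes the transformers library and the public facebook/bart-large-mnli
# checkpoint; the email text and labels are illustrative.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

email = "My order arrived damaged and I would like a refund."
candidate_labels = ["billing", "shipping issue", "product question", "refund request"]

result = classifier(email, candidate_labels)
# result["labels"] is sorted by score; the top entry is the predicted class.
print(result["labels"][0], result["scores"][0])
```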
In most of the above use cases, the field has traditionally progressed from standard tokenization/TF-IDF representations towards embeddings (character embeddings, token embeddings, or both) and, more recently, transformers. This progression has significantly improved accuracy while using less data. However, these are still task-specific architectures that require data engineering on task-specific labeled data. While the algorithms have improved, the infrastructure still demands significant engineering effort and leads to siloed, hard-to-maintain systems.
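For contrast, here is a sketch of the traditional task-specific route described above: a TF-IDF representation feeding a linear classifier, which only works once labeled data for that one task has been collected and engineered. The toy dataset is purely illustrative.

```python
# Traditional task-specific baseline: TF-IDF features + a linear classifier.
# Requires labeled examples for this single task (toy data for illustration).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "refund my order",
    "where is my package",
    "card was charged twice",
    "item never shipped",
]
labels = ["billing", "shipping", "billing", "shipping"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["the card was charged twice again"]))  # expected: ['billing']
```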
Foundation Models and Prompt-based Model Development help accelerate adoption through task-agnostic architectures and little to no labeled data. Before we delve further, we need to understand the following concepts:
Foundation Models
Extremely large language models such as GPT-3, BERT, and GPT-2 are not trained to answer specific instructions. These models are typically trained to predict the next word (or token) that follows a given piece of text. Yet they have been shown to respond well to all sorts of instructions. The assumption is that these models have seen so much data that they not only understand language but have also seen many varied examples of responses to instructions. You can imagine GPT-3 has probably seen conversations that include “Tell me a story: Well, once upon a time…” and “Can you tell me where the Burlington Mall is? It’s on Fifth Street.”
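As a small illustration of the next-token behavior described above, the sketch below asks a publicly available GPT-2 checkpoint to continue an instruction-style prompt. GPT-2 is used here only because it is freely downloadable; the prompt and generation settings are illustrative assumptions.

```python
# Sketch: a model trained only to predict the next token can still continue
# an instruction-style prompt (GPT-2 used as a freely available stand-in).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Tell me a story: Well, once upon a time"
output = generator(prompt, max_length=60, do_sample=True)
print(output[0]["generated_text"])
```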
Zero-Shot
Zero-shot learning allows a model, at test time, to infer responses to inputs without any task-specific training. The model typically takes some context and a prompt and produces an inference. Here are a couple of examples:
Example 1 (Error Detection)
Context
Country: US, City: Bangkok
Prompt
Is there an error in Country?
Inference
Yes
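One way to issue a prompt like the one above is to concatenate the context and the question and send them to a completion-style model. The sketch below does this with the legacy OpenAI completions interface; the model name and exact prompt wording are assumptions for illustration, and any completion endpoint could be substituted.

```python
# Sketch: zero-shot error detection via a completion-style model.
# Assumes the openai package (pre-1.0 interface) and an API key;
# the model name and prompt wording are illustrative.
import openai

context = "Country: US, City: Bangkok"
prompt = f"{context}\nIs there an error in Country? Answer Yes or No:"

response = openai.Completion.create(
    model="text-davinci-003",  # illustrative model choice
    prompt=prompt,
    max_tokens=3,
    temperature=0,
)
print(response["choices"][0]["text"].strip())  # expected: "Yes"
```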
Example 2 (Typical OCR output)
Context
Type_of_insurance
COMMERCIAL GENERAL LIABILITY: true
CLAIMS-MADE: false
OCCUR: true
PRO- JECT: false
CLAIMS-MADE: false
2500$ deductible
Prompt
What is the type of insurance?
Inference
COMMERCIAL GENERAL LIABILITY
Prompt
Is it per Claim?
Inference
False
Prompt
What is the deductible?
Inference
$2500
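Prompts like those in Example 2 can also be handled with an off-the-shelf extractive question-answering model, which selects an answer span directly from the OCR text. The sketch below uses a public Hugging Face checkpoint; the model choice is an assumption and the context is copied from the example above.

```python
# Sketch: answering prompts over OCR output with an extractive QA model.
# Assumes the transformers library and a public SQuAD-tuned checkpoint.
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

context = (
    "Type_of_insurance: COMMERCIAL GENERAL LIABILITY: true, CLAIMS-MADE: false, "
    "OCCUR: true, PROJECT: false, 2500$ deductible"
)

for question in ["What is the type of insurance?", "What is the deductible?"]:
    answer = qa(question=question, context=context)
    print(question, "->", answer["answer"])
```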
Few-Shot
If zero-shot doesn’t provide the required accuracy, we should be able to fine-tune Foundation Models with only a few examples. Few-shot learning differs from transfer learning. The goal of transfer learning is to “transfer” learned features to various downstream discriminative tasks, and we may still need large amounts of labeled data to achieve the desired outcomes. The goal of few-shot learning is to produce models that can generalize from only a few samples.
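Few-shot adaptation can be done either by fine-tuning on a handful of labeled examples or, more simply, by placing those examples directly in the prompt (in-context learning). The sketch below illustrates the latter with the legacy OpenAI completions interface; the tickets, labels, and model name are all illustrative assumptions.

```python
# Sketch: few-shot (in-context) classification by embedding a handful of
# labeled examples in the prompt. Model name and examples are illustrative.
import openai

few_shot_prompt = """Classify the support ticket as Billing or Shipping.

Ticket: I was charged twice for the same order.
Label: Billing

Ticket: My package has been stuck in transit for two weeks.
Label: Shipping

Ticket: The invoice total does not match my receipt.
Label:"""

response = openai.Completion.create(
    model="text-davinci-003",  # illustrative model choice
    prompt=few_shot_prompt,
    max_tokens=2,
    temperature=0,
)
print(response["choices"][0]["text"].strip())  # expected: "Billing"
```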
Prompt-based Model Development
Prompt-based model development is a strategy where models accept prompts as inputs and generate the desired responses with few (i.e., few-shot) or no examples (i.e., zero-shot). There are some challenges in prompt-based model development, mostly around prompt engineering. However, Foundation Models in combination with prompt-based models provide a great alternative that is relatively cheap in terms of both time (i.e., time-to-market) and cost (requiring few labeled examples and less engineering effort).
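Much of the prompt-engineering effort mentioned above comes down to building and maintaining reusable prompt templates per task. Below is a minimal, hypothetical sketch of such a helper; the template strings and the build_prompt function are illustrative assumptions, not a specific library's API.

```python
# Sketch: a tiny, hypothetical prompt-template helper.
# The templates and build_prompt function are illustrative, not a library API.
TEMPLATES = {
    "classification": "Classify the following text into one of {labels}.\n\nText: {text}\nLabel:",
    "extraction": "Extract the {entity} from the following text.\n\nText: {text}\n{entity}:",
}

def build_prompt(task: str, **fields) -> str:
    """Fill in the template for the given task with the supplied fields."""
    return TEMPLATES[task].format(**fields)

print(build_prompt("classification",
                   labels="[billing, shipping]",
                   text="My package never arrived."))
```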
Conclusion and Next Steps
In practice, Foundation and prompt-based models achieve reasonable accuracy. For the Entity Extraction task, we have seen 50+% accuracy with a zero-shot model (depending on the domain) and 90+% accuracy with few-shot models. In the next article, we will discuss in detail how to build and fine-tune these models.
If you are interested in learning how the above technologies might help your business, please feel free to reach out to info@thinkdeeply.ai for a free consultation call. Besides ideation, we have set up end-to-end pipelines for rapid prototyping to quickly validate whether these approaches may help you.