Build a chatbot for your application using RAG and ChatGPT
- Published on
- Author: Abhishek D M
Introduction
Imagine you have an application that users typically interact with through a UI. Now you want to give those users a chatbot they can talk to instead.
Fine-tuning a large language model (LLM) on your application's data is impractical: the data changes with every user interaction, so the model would need constant retraining. Instead, we can use Retrieval-Augmented Generation (RAG) to supply the LLM with private or proprietary data not available on the public internet, enabling it to provide useful insights specific to your application.
Architecture
The high-level architecture for RAG includes the following workflows:
Text generation workflow - The user interacts with the chatbot via text. Each user message is converted into embeddings using an embedding model compatible with the chosen LLM; in this example, we use OpenAI's text-embedding-3-small.
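As a concrete illustration, here is a minimal sketch of embedding a user message with LangChain's OpenAI integration. The query text is a made-up example, and the package layout (`langchain-openai`) assumes a recent LangChain release:

```python
# Minimal sketch: turn a user message into an embedding vector.
# Assumes `pip install langchain-openai` and OPENAI_API_KEY in the environment.
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

user_message = "How do I reset my password?"  # hypothetical user input
vector = embeddings.embed_query(user_message)

print(len(vector))  # text-embedding-3-small produces 1536-dimensional vectors
```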
Data ingestion workflow - To supplement the LLM with application-specific knowledge, we use RAG. This involves the following (a sketch of the full ingestion flow follows this list):
- Data source - This is the proprietary data to be used for the chatbot's responses.
- Vector store - Data from the source is broken into smaller chunks, converted into embeddings, and stored in a vector database. Here, we'll use Chroma as our vector store.
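A minimal ingestion sketch under these assumptions: the application exposes a hypothetical REST endpoint returning JSON records with a `body` field, and the `langchain-openai`, `langchain-chroma`, and `langchain-text-splitters` packages are installed:

```python
# Minimal ingestion sketch: fetch data, chunk it, embed it, store it in Chroma.
# The API endpoint and field names are hypothetical placeholders for your app.
import requests
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

# 1. Pull proprietary data from the application's API (hypothetical endpoint).
records = requests.get("https://your-app.example.com/api/articles").json()
texts = [record["body"] for record in records]

# 2. Break the documents into smaller, overlapping chunks.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.create_documents(texts)

# 3. Embed the chunks and persist them in a local Chroma collection.
vector_store = Chroma.from_documents(
    documents=chunks,
    embedding=OpenAIEmbeddings(model="text-embedding-3-small"),
    collection_name="app-knowledge",
    persist_directory="./chroma_db",
)
```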
GenAI inference workflow - The chunks retrieved from the vector store are passed to the LLM as context alongside the user's query, and the LLM uses that context to generate its response.
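Putting retrieval and generation together, the inference step might look like the sketch below. It reopens the Chroma collection created above; the model name, `k` value, and prompt wording are illustrative choices, not requirements:

```python
# Minimal retrieval + generation sketch, continuing from the ingestion step.
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_chroma import Chroma

vector_store = Chroma(
    collection_name="app-knowledge",
    embedding_function=OpenAIEmbeddings(model="text-embedding-3-small"),
    persist_directory="./chroma_db",
)

def answer(question: str) -> str:
    # Retrieve the chunks most similar to the user's question.
    docs = vector_store.similarity_search(question, k=4)
    context = "\n\n".join(doc.page_content for doc in docs)

    # Pass the retrieved chunks to the LLM as grounding context.
    llm = ChatOpenAI(model="gpt-4")
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm.invoke(prompt).content

print(answer("How do I reset my password?"))
```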
Tech Stack
Our stack includes:
- Language: Python
- LLM: OpenAI GPT-4
- Embedding model: text-embedding-3-small
- Vector store: Chroma
- SDK: LangChain
- UI: Streamlit
- Application: any application that exposes API endpoints
Setup
```python
print("Hello AI")
```
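Before wiring up the pipeline, install the dependencies with `pip install langchain-openai langchain-chroma langchain-text-splitters streamlit requests` (package names assume current LangChain releases) and export your OpenAI API key. A quick check that the key is visible to Python, assuming it is exported as the standard `OPENAI_API_KEY` variable:

```python
# Verify the OpenAI API key is available before running the app.
import os

assert os.environ.get("OPENAI_API_KEY"), "Export OPENAI_API_KEY first"
print("Environment is ready")
```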