Large AI models can generate text on a wide range of topics, but their knowledge is limited to the publicly available data they were trained on. If you want to build AI applications that use private or real-time data for inference, you need to augment the model's knowledge with that specific information. Retrieval augmented generation (RAG) is the technique of retrieving relevant information and inserting it into the model's input.
In this article, we will show how to develop a simple RAG Q&A application using LangChain. We'll walk through a typical Q&A architecture step by step, discuss the relevant LangChain components, and show how to trace and understand our application.

RAG is a technique that combines retrieval and generation: it lets large models draw on additional data sources when generating text, improving the quality and accuracy of the output. The basic RAG process is as follows:
First, given a user's input, such as a question or a topic, RAG retrieves pieces of text related to it from a data source, such as a web page, a document, or a database record. These text fragments are called contexts. RAG then stitches the user's input and the retrieved contexts together into a complete input that is passed to a large model, such as GPT. This input usually contains a prompt that guides the model toward the desired output, such as an answer or a summary. Finally, RAG extracts or formats the required information from the model's output and returns it to the user. The sketch below illustrates these three steps.
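Here is a minimal, framework-free sketch of this flow. The helpers retrieve_context and generate are hypothetical placeholders standing in for a retriever and a large model; only the stitching step is spelled out.

def answer_with_rag(question: str) -> str:
    # 1. Retrieval: find text fragments (contexts) related to the question.
    contexts = retrieve_context(question)  # hypothetical retriever
    # 2. Augmentation: stitch the contexts and the question into one prompt.
    prompt = (
        "Answer the question using only the context below.\n\n"
        "Context:\n" + "\n\n".join(contexts) + "\n\n"
        "Question: " + question
    )
    # 3. Generation: pass the augmented prompt to a large model.
    return generate(prompt)  # hypothetical LLM call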
LangChain is a framework focused on developing applications powered by large models, and it provides a series of components and tools that make it easy to build RAG applications:

- DocumentLoader: a loader that reads data from a data source and converts it into document objects. A document object has two properties: page_content (str), the text content of the document, and metadata (dict), metadata such as title, author, and date.
- TextSplitter: a splitter that divides a document object into multiple smaller document objects. This helps both retrieval and generation, since the input window of a large model is limited and relevant information is easier to find in shorter texts. A short sketch of these first two components follows the list.
- Text embeddings: an embedder that converts text into an embedding, a high-dimensional vector. Embeddings can be used to measure the similarity between texts, which is what makes retrieval possible.
- VectorStore: a store that can save and query embeddings. Vector stores typically use indexing techniques, such as FAISS or Annoy, to speed up retrieval.
- Retriever: an object that returns related document objects for a text query. A common implementation is the vector store retriever, which uses the vector store's similarity search to perform retrieval.
- ChatModel: a model that generates an output message from a sequence of input messages. Chat models are usually backed by large models, such as GPT-3, to provide text generation.
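As a small illustration of the first two components, here is a sketch that builds a document object and splits it. It assumes the classic langchain package layout (import paths differ in newer releases), and the sample content and metadata are made up.

from langchain.schema import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter

# A document object: text content plus free-form metadata.
doc = Document(
    page_content="LangChain provides components for building RAG applications...",
    metadata={"title": "LangChain and RAG", "author": "example", "date": "2023-01-01"},
)

# Split into smaller document objects that fit a model's input window.
splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)
chunks = splitter.split_documents([doc])
print(len(chunks), chunks[0].page_content)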
The general process of building a RAG application with LangChain is as follows:

First, we need to load our data. We do this with a document loader, choosing the right loader for the type of data source. For example, if our data source is a web page, we can use WebBaseLoader, which uses urllib and BeautifulSoup to load and parse the page, returning document objects.

Then, we need to split our document objects into smaller ones. We use a text splitter for this step, choosing one suited to the characteristics of the text. For example, if our text is a blog post, we can use RecursiveCharacterTextSplitter, which recursively splits the text on common separators (like line breaks) until each piece is the required size.

Next, we need to convert our document objects into embeddings and store them in a vector store. We use a text embedder and a vector store for this step, choosing them according to the quality and speed of the embeddings. For example, if we want to use OpenAI's embedding model and Chroma's vector store, we can use OpenAIEmbeddings and Chroma.

Then, we need to create a retriever that fetches the relevant document objects for the user's input. We can use a vector store retriever, created from the vector store so that it answers queries with similarity search.

Finally, we need to create a chat model that generates an output message from the user's input and the retrieved document objects. We can use one of the chat models provided by LangChain, chosen according to the model's performance and cost. For example, if we want to use OpenAI's GPT-3.5 model, we can use ChatOpenAI.

Here's an example of building a RAG application with LangChain:
# Import the required components (classic langchain package layout;
# import paths differ in newer releases).
from langchain import hub
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import WebBaseLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.schema import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

# Load the data source (fill in the URL of the page to load).
loader = WebBaseLoader("")
docs = loader.load()

# Split the document objects.
splitter = RecursiveCharacterTextSplitter(chunk_size=512)
splits = splitter.split_documents(docs)

# Convert the document objects to embeddings and store them in the vector store.
embedder = OpenAIEmbeddings()
vector_store = Chroma.from_documents(documents=splits, embedding=embedder)

# Create a retriever.
retriever = vector_store.as_retriever()

# Create the chat model and pull a ready-made RAG prompt from the hub.
prompt = hub.pull("rlm/rag-prompt")
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# Create the Q&A chain.
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Launch the app.
rag_chain.invoke("What is the main purpose of xxx.html?")
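A note on the chain's design: the | operator pipes each component's output into the next, and the dict at the head of the chain runs the retriever (followed by format_docs) and a passthrough of the raw question in parallel, filling the prompt's context and question variables. Because the chain is a standard runnable, it also exposes rag_chain.stream(...) if you want token-by-token output.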
The combination of LangChain and RAG can bring the following advantages:
- Flexibility: You can customize your RAG application by choosing different components and parameters to suit your needs and data sources. You can also plug in custom components, as long as they follow LangChain's interface conventions.
- Scalability: You can use LangChain's cloud services to deploy and run your RAG application without worrying about resource and performance limits, and use its distributed computing capabilities to accelerate your application with the parallel processing power of multiple nodes.
- Visualization: You can use LangSmith to visualize your RAG application's workflow, viewing the inputs and outputs of each step as well as the performance and status of each component (see the sketch after this list). You can also use LangSmith to debug and optimize your application, identifying and resolving potential issues and bottlenecks.
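As a minimal sketch of the visualization point: LangSmith tracing can be switched on with environment variables, assuming you have a LangSmith account. The API key and project name below are placeholders.

import os

# Enable LangSmith tracing for every chain invocation in this process.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"  # placeholder
os.environ["LANGCHAIN_PROJECT"] = "rag-qa-demo"  # hypothetical project name

# With these set, each rag_chain.invoke(...) produces a trace whose steps,
# inputs, outputs, and latencies can be inspected in the LangSmith UI.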
The combination of LangChain and RAG can be applied in a variety of scenarios, such as:

- Professional question answering: You can build a Q&A application for a professional domain, such as healthcare, law, or finance, retrieving relevant information from domain data sources to help the large model answer users' questions. For example, you can retrieve disease diagnoses and treatment protocols from the medical literature to help the model answer medical questions (see the sketch after this list).
- Text summarization: You can build a text summarization application, such as a news digest or a document digest, retrieving relevant text from multiple data sources to help the large model generate a comprehensive summary. For example, you can retrieve stories about the same event from multiple news sources and have the model summarize them together.
- Text generation: You can build a text generation application, such as poetry or story generation, retrieving inspiration from different data sources to help the large model produce more interesting and creative text. For example, you can retrieve related passages from poems, lyrics, or other texts to help the model generate a poem, a song, or a story.
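For the domain Q&A case, one simple sketch is to narrow what the retriever returns. Assuming the Chroma store from the example above and documents whose metadata carries a hypothetical domain field, the retriever can filter on it:

# Return only the top 4 chunks whose metadata marks them as medical content.
# (The "domain" metadata field is hypothetical; the filter syntax is Chroma's.)
medical_retriever = vector_store.as_retriever(
    search_kwargs={"k": 4, "filter": {"domain": "medical"}}
)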
In this article, we have shown how to develop a simple Q&A application using LangChain. We introduced the basic concepts and benefits of RAG, discussed the relevant LangChain components, and covered the advantages and application scenarios of combining LangChain with RAG.

We hope this article helps you understand the potential and value of combining LangChain and RAG, and encourages you to try developing your own application with them. If you have any questions or suggestions, please feel free to contact us; we look forward to communicating and cooperating with you.