Building RAG using Google Gemini
In the ever-evolving landscape of AI, efficiently leveraging domain-specific knowledge without retraining large models is a game-changer. Enter Retrieval Augmented Generation (RAG) using Google Gemini, a powerful method to enhance your LLMs seamlessly. This blog post will guide you through building a simple yet effective RAG system using Google Gemini and Langchain, unlocking the potential of your LLMs with minimal effort and cost.
Before diving into the building process, it’s crucial to understand the need for Retrieval Augmented Generation (RAG). Large Language Models (LLMs) are typically trained on vast amounts of general data available on the internet. However, they often lack specialized knowledge in domains like healthcare, finance, or other specific fields. This limitation means that to use LLMs effectively in these domains, they would need to be retrained with domain-specific data — a process that demands significant time, financial resources, and infrastructure, making it impractical for smaller, specialized tasks.
RAG offers an efficient solution by enhancing LLMs with domain-specific knowledge without the need for retraining. Instead of relying solely on the pre-trained model, RAG supplements it with relevant context from a domain-specific knowledge base, enabling the LLM to generate accurate and relevant responses.
You might wonder why we can’t just feed the entire domain knowledge as context to the LLM. While this idea is valid, the reality is that LLMs have limitations on the amount of context they can handle. For example, Google Gemini, despite its advanced capabilities, can process up to 1 million tokens — an impressive capacity but still finite. Processing large amounts of context requires significant computational power and cost, which is impractical for many applications.
This is where RAG remains highly relevant. By splitting the domain text into manageable chunks and retrieving only the most pertinent information for a given query, RAG provides a cost-effective and efficient way to leverage domain-specific knowledge. Now, let’s explore how to build a RAG system using Google Gemini and Langchain.
How it works:
- First, split the domain text into small chunks.
- Convert each chunk into an embedding using an embedding model (in our case, Google Gemini).
- Store the embeddings in a vector database (in our case, Typesense).
- Query the vector database with semantic search to retrieve the documents most relevant to a given question.
- Pass those documents to the LLM as context along with the question and get an answer.
Let’s code each step:
Note: We need to generate a Google API key (from Google AI Studio) and export it as the GOOGLE_API_KEY environment variable.
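The key only needs to be visible as an environment variable before the Langchain Gemini classes are instantiated. One way to set it from inside a script or notebook (the key value below is a placeholder):

import os

# Make the Gemini API key available to langchain-google-genai (placeholder value)
os.environ["GOOGLE_API_KEY"] = "<your-google-api-key>"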
- Split text into chunks: Splitting the text into chunks is a crucial step in building a RAG system. The way you divide the data directly impacts the relevance and accuracy of the retrieved documents for any given query, and ultimately the quality of the output. It is therefore essential to perform semantic splitting, which preserves the meaning of the text within each chunk and keeps every chunk contextually coherent and informative. The code snippet below demonstrates how to split the text from a PDF document effectively.
from PyPDF2 import PdfReader
from tokenizers import Tokenizer
from semantic_text_splitter import TextSplitter

# Read the text from the PDF
pdfreader = PdfReader("<path to the PDF you want to query>")
raw_text = ''
for page in pdfreader.pages:
    content = page.extract_text()
    if content:
        raw_text += content

# Split the text into semantically coherent chunks of at most 1000 tokens
max_tokens = 1000
tokenizer = Tokenizer.from_pretrained("bert-base-uncased")
splitter = TextSplitter.from_huggingface_tokenizer(tokenizer, max_tokens)
chunks = splitter.chunks(raw_text)
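Before moving on, it's worth a quick sanity check that the splitter produced reasonable chunks; previewing the count and the first chunk (using the chunks list created above) is usually enough:

# Quick sanity check on the split
print(f"Number of chunks: {len(chunks)}")
print(chunks[0][:500])  # preview the start of the first chunk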
- Convert the text into embeddings & store into a vector database: In this step, we transform each chunk of text into a numerical vector representation, which captures the semantic meaning of the text. These embeddings are then stored in a vector database, which serves as a knowledge repository. This database can be queried to retrieve relevant information based on the semantic similarity of the vectors. Here’s how you can create embeddings and store them in the vector database:
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import Typesense

# Gemini embedding model used to vectorize each chunk
embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004", task_type="retrieval_document")

# Embed the chunks and store them in a Typesense collection
docsearch = Typesense.from_texts(
    chunks,
    embeddings,
    typesense_client_params={
        "host": "localhost",  # Use xxx.a1.typesense.net for Typesense Cloud
        "port": "8108",  # Use 443 for Typesense Cloud
        "protocol": "http",  # Use https for Typesense Cloud
        "typesense_api_key": "xyz",
        "typesense_collection_name": "gemini-with-typesense",
    },
)
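To confirm the collection was populated correctly, you can run a quick semantic search directly against the vector store before wiring it into a chain (a minimal check using the docsearch object created above):

# Retrieve the chunks most similar to a test query
results = docsearch.similarity_search("What is Scaled Dot-Product Attention?", k=2)
for doc in results:
    print(doc.page_content[:200])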
- Query the relevant documents and pass them to the LLM:
from langchain.chains.question_answering import load_qa_chain

# Gemini chat model that generates the final answer
llm = ChatGoogleGenerativeAI(model="gemini-pro", convert_system_message_to_human=True)
# The "stuff" chain places the retrieved documents directly into the prompt as context
chain = load_qa_chain(llm, chain_type="stuff")

question = "What is Scaled Dot-Product Attention?"
# Retrieve the most relevant chunks and answer the question with them as context
retriever = docsearch.as_retriever()
docs = retriever.invoke(question)
chain.run(input_documents=docs, question=question)
And the output would be this:
Scaled Dot-Product Attention computes a weighted sum of the values,
where the weight assigned to each value is computed by a compatibility function
of the query with the corresponding key. More specifically, it computes the
dot product of the query with all keys, divides each by the square root
of the dimension of the keys, and applies a softmax function to obtain
the weights on the values.
As you can see, even though the LLM is not directly familiar with your PDF, it can provide accurate answers by leveraging the context you supplied. With RAG, you can instantly obtain answers from domain-specific texts. This approach allows you to utilize generative AI effectively for your specialized tasks without the need for extensive retraining.
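As a side note, if you prefer not to manage retrieval and answering as separate steps, the same flow can be wrapped into a single chain. The sketch below is an optional variation using LangChain's RetrievalQA with the llm and retriever defined earlier:

from langchain.chains import RetrievalQA

# One chain that retrieves the relevant chunks and answers the question in a single call
qa_chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)
qa_chain.run("What is Scaled Dot-Product Attention?")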
I hope this guide helps you in building RAG systems and harnessing the power of generative AI for your work.