Retrieval-Augmented Generation (RAG) has become the gold standard for building LLM applications that need access to your own documents. But there's a problem: most RAG pipelines break down at step one — parsing and chunking complex documents.
Traditional OCR tools give you raw text with no structure. Manual chunking strategies like "split every 512 tokens" destroy semantic meaning. Tables get mangled. Images are ignored completely.
💡 The Key Insight
Retriv.ai's agentic parser understands document structure — headings, paragraphs, tables, images — and creates semantic chunks that preserve context. This is perfect for RAG.
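To make this concrete, here is a hypothetical sketch of what one semantic chunk might contain. The field names follow the comments in Step 2 below; the exact schema is defined by the Parse API, and the values here are invented for illustration.

```python
# Hypothetical shape of a single semantic chunk (illustrative values,
# not real Parse API output).
chunk = {
    "type": "table",  # 'text', 'table', or 'image'
    "content": "Glucose | 112 mg/dL | High",
    "metadata": {"page": 3, "section": "Lab Results"},
    # Pre-formatted text that folds in surrounding context,
    # ready to hand to an embedding model:
    "embedding_ready": "Section: Lab Results\nGlucose | 112 mg/dL | High",
}
```

Because each chunk carries its own type and hierarchy metadata, downstream retrieval can filter and rank on structure, not just raw text.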
The Problem with Traditional RAG Chunking
Let's say you're building a RAG system over medical records. A traditional approach might:
- Run OCR to get raw text
- Split the text every N characters or tokens
- Embed each chunk
- Store in a vector database
But this destroys critical information:
- A table showing lab results gets split mid-row
- A diagnosis paragraph gets cut off halfway through
- Section headings are separated from their content
- Images and charts are completely ignored
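You can see the damage with a few lines of plain Python. This toy example (not the Retriv API) applies fixed-size chunking to a miniature lab-results table; the very first boundary separates the "Glucose" label from its value.

```python
# Toy illustration: fixed-size chunking ignores structure and
# splits a lab-results table mid-row.
record = (
    "Lab Results\n"
    "Glucose | 112 mg/dL | High\n"
    "HbA1c   | 6.1 %     | Elevated\n"
)

def naive_chunks(text, size):
    """Split every `size` characters, with no regard for rows or sections."""
    return [text[i:i + size] for i in range(0, len(text), size)]

for chunk in naive_chunks(record, 20):
    print(repr(chunk))
```

The first chunk ends with `"Glucose "` while the measurement `112 mg/dL` lands in the next chunk, so no single chunk can answer "What was the glucose level?"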
The Retriv.ai Approach
Our Parse API gives you hierarchical, semantic chunks from the start. Here's how to build a production RAG pipeline in under 10 minutes.
Step 1: Install Dependencies
pip install retriv_ai openai chromadb
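The code below reads both API keys from environment variables, so set them before running anything (the values here are placeholders for your own keys):

```shell
# Placeholder values — substitute your actual keys.
export RETRIV_API_KEY="your-retriv-key"
export OPENAI_API_KEY="your-openai-key"
```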
Step 2: Parse Your Documents
from retriv_ai import RetrivClient
import os
# Initialize client
client = RetrivClient(api_key=os.environ['RETRIV_API_KEY'])
# Parse a complex document
result = client.parse(file_path="medical_record.pdf")
# Get semantic chunks
chunks = result.chunks
# Each chunk includes:
# - type: 'text', 'table', 'image'
# - content: the actual content
# - metadata: page number, hierarchy, etc.
# - embedding_ready: pre-formatted for your vector DB
print(f"Found {len(chunks)} semantic chunks")
for chunk in chunks[:3]:
    print(f"- {chunk.type}: {chunk.content[:100]}...")
Step 3: Embed and Store
import chromadb
from openai import OpenAI
# Initialize vector DB
chroma_client = chromadb.Client()
collection = chroma_client.create_collection("medical_records")
# Initialize OpenAI for embeddings
openai_client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])
# Embed each chunk
for i, chunk in enumerate(chunks):
    # Get embedding
    embedding_response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=chunk.embedding_ready  # Pre-formatted by Retriv
    )
    embedding = embedding_response.data[0].embedding
    # Store in Chroma
    collection.add(
        embeddings=[embedding],
        documents=[chunk.content],
        metadatas=[{
            "page": chunk.page,
            "type": chunk.type,
            "source": "medical_record.pdf"
        }],
        ids=[f"chunk_{i}"]
    )
print("✅ All chunks embedded and stored!")
Step 4: Query Your RAG System
# User question
question = "What were the patient's glucose levels?"
# Get embedding for question
q_embedding = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=question
).data[0].embedding
# Search vector DB
results = collection.query(
    query_embeddings=[q_embedding],
    n_results=3
)
# Build context from results
context = "\n\n".join(results['documents'][0])
# Generate answer with GPT-4
response = openai_client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a medical assistant. Answer based on the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}
    ]
)
print(response.choices[0].message.content)
Why This Works Better
Real-World Results
We tested this approach with a healthtech startup building clinical decision support. After switching from traditional OCR + chunking to Retriv.ai's semantic parsing:
- Answer accuracy improved by 43% (measured by clinical expert review)
- Setup time reduced from 2 weeks to 1 day
- Token costs dropped 35% (more relevant chunks = fewer retrieved)
Next Steps
This tutorial shows the basics, but there's much more you can do:
- Use Retriv's visual grounding to show users exactly where answers came from
- Build hybrid search (keyword + semantic) using chunk metadata
- Process thousands of documents asynchronously with the Parse Jobs API
- Fine-tune retrieval by filtering on chunk type (tables only, for example)
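As one example of filtering on chunk type, here is a self-contained, hypothetical in-memory sketch of type-filtered retrieval. In Chroma itself you would pass a metadata filter such as `where={"type": "table"}` to `collection.query`; this toy version makes the same logic explicit with a handful of fake chunks and a cosine-similarity ranking.

```python
# Hypothetical in-memory sketch of type-filtered retrieval.
# With Chroma you'd pass where={"type": "table"} to collection.query.
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Invented chunks with toy 2-d embeddings, for illustration only.
chunks = [
    {"content": "Glucose 112 mg/dL", "type": "table", "embedding": [0.9, 0.1]},
    {"content": "Patient reports fatigue", "type": "text", "embedding": [0.8, 0.2]},
    {"content": "CBC panel results", "type": "table", "embedding": [0.2, 0.9]},
]

def query(q_emb, n_results=2, where=None):
    """Rank chunks by cosine similarity, optionally restricted by metadata."""
    pool = [c for c in chunks
            if not where or all(c.get(k) == v for k, v in where.items())]
    pool.sort(key=lambda c: cosine(q_emb, c["embedding"]), reverse=True)
    return pool[:n_results]

top = query([1.0, 0.0], where={"type": "table"})
```

Restricting the pool before ranking means a question like "What were the lab values?" can be answered from tables alone, without text chunks crowding the top results.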
Ready to Build Your RAG Pipeline?
Get your free API key and start processing documents in minutes.
Get Started Free