TUTORIAL

Building a RAG Pipeline with Retriv.ai in 10 Minutes

By Michael Chen · November 15, 2024 · 8 min read
[Figure: Document processing pipeline]

Retrieval-Augmented Generation (RAG) has become the gold standard for building LLM applications that need access to your own documents. But there's a problem: most RAG pipelines break down at step one — parsing and chunking complex documents.

Traditional OCR tools give you raw text with no structure. Manual chunking strategies like "split every 512 tokens" destroy semantic meaning. Tables get mangled. Images are ignored completely.

💡 The Key Insight

Retriv.ai's agentic parser understands document structure — headings, paragraphs, tables, images — and creates semantic chunks that preserve context. This is perfect for RAG.

The Problem with Traditional RAG Chunking

Let's say you're building a RAG system over medical records. A traditional approach might:

  1. Run OCR to get raw text
  2. Split the text every N characters or tokens
  3. Embed each chunk
  4. Store in a vector database

But this destroys critical information, as the sketch after this list makes concrete:

  • A table showing lab results gets split mid-row
  • A diagnosis paragraph gets cut off halfway through
  • Section headings are separated from their content
  • Images and charts are completely ignored
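To see the failure mode in action, here's a minimal, self-contained sketch of fixed-size chunking (plain Python, not Retriv.ai code) applied to a toy medical record. The text and chunk size are made up for illustration:

# Naive fixed-size chunking: split every N characters, regardless of structure.
raw_text = (
    "Lab Results\n"
    "Test      Value  Reference\n"
    "Glucose   142    70-99 mg/dL\n"
    "HbA1c     7.1    <5.7%\n"
    "Assessment: Fasting glucose is elevated, consistent with "
    "poorly controlled type 2 diabetes."
)

CHUNK_SIZE = 80  # characters; an arbitrary budget for illustration

chunks = [raw_text[i:i + CHUNK_SIZE] for i in range(0, len(raw_text), CHUNK_SIZE)]

for n, chunk in enumerate(chunks, 1):
    print(f"--- chunk {n} ---\n{chunk}")

# Running this shows a lab-result row severed from its table header and the
# assessment sentence split mid-thought: no chunk is self-contained.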

The Retriv.ai Approach

Our Parse API gives you hierarchical, semantic chunks from the start. Here's how to build a production RAG pipeline in under 10 minutes.

Step 1: Install Dependencies

pip install retriv_ai openai chromadb

Step 2: Parse Your Documents

from retriv_ai import RetrivClient
import os

# Initialize client
client = RetrivClient(api_key=os.environ['RETRIV_API_KEY'])

# Parse a complex document
result = client.parse(file_path="medical_record.pdf")

# Get semantic chunks
chunks = result.chunks

# Each chunk includes:
# - type: 'text', 'table', 'image'
# - content: the actual content
# - metadata: page number, hierarchy, etc.
# - embedding_ready: pre-formatted for your vector DB

print(f"Found {len(chunks)} semantic chunks")
for chunk in chunks[:3]:
    print(f"- {chunk.type}: {chunk.content[:100]}...")

Step 3: Embed and Store

import chromadb
from openai import OpenAI

# Initialize vector DB
chroma_client = chromadb.Client()
collection = chroma_client.create_collection("medical_records")

# Initialize OpenAI for embeddings
openai_client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])

# Embed each chunk
for i, chunk in enumerate(chunks):
    # Get embedding
    embedding_response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=chunk.embedding_ready  # Pre-formatted by Retriv
    )
    embedding = embedding_response.data[0].embedding
    
    # Store in Chroma
    collection.add(
        embeddings=[embedding],
        documents=[chunk.content],
        metadatas=[{
            "page": chunk.page,
            "type": chunk.type,
            "source": "medical_record.pdf"
        }],
        ids=[f"chunk_{i}"]
    )

print("✅ All chunks embedded and stored!")

Step 4: Query Your RAG System

# User question
question = "What were the patient's glucose levels?"

# Get embedding for question
q_embedding = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=question
).data[0].embedding

# Search vector DB
results = collection.query(
    query_embeddings=[q_embedding],
    n_results=3
)

# Build context from results
context = "\n\n".join(results['documents'][0])

# Generate answer with GPT-4
response = openai_client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a medical assistant. Answer based on the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}
    ]
)

print(response.choices[0].message.content)
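Because each chunk was stored with its metadata, you can also cite sources alongside the answer. Chroma returns metadatas with the documents by default, as parallel lists per query:

# Show where each retrieved chunk came from (source file, page, chunk type).
# `results` is the dict returned by collection.query() above.
for doc, meta in zip(results['documents'][0], results['metadatas'][0]):
    print(f"[{meta['source']}, page {meta['page']}, {meta['type']}]")
    print(doc[:120], "...")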

Why This Works Better

Traditional Chunking
  ❌ Splits tables mid-row
  ❌ Loses document structure
  ❌ Ignores images
  ❌ No hierarchy preservation

Retriv.ai Semantic Chunking
  ✅ Tables kept intact
  ✅ Document structure preserved
  ✅ Images with captions
  ✅ Full hierarchy metadata

Real-World Results

We tested this approach with a healthtech startup building clinical decision support. After switching from traditional OCR + chunking to Retriv.ai's semantic parsing:

  • Answer accuracy improved by 43% (measured by clinical expert review)
  • Setup time reduced from 2 weeks to 1 day
  • Token costs dropped 35% (more relevant chunks mean fewer need to be retrieved per query)

Next Steps

This tutorial shows the basics, but there's much more you can do:

  • Use Retriv's visual grounding to show users exactly where answers came from
  • Build hybrid search (keyword + semantic) using chunk metadata
  • Process thousands of documents asynchronously with the Parse Jobs API
  • Fine-tune retrieval by filtering on chunk type (tables only, for example; see the sketch below)
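As an example of that last point, Chroma's query supports a where filter over the metadata we stored in Step 3, so restricting retrieval to table chunks is a one-line change (the filter value assumes the 'table' chunk type shown in Step 2):

# Retrieve from table chunks only, using the `type` metadata stored earlier.
table_results = collection.query(
    query_embeddings=[q_embedding],
    n_results=3,
    where={"type": "table"},  # Chroma metadata filter
)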

Ready to Build Your RAG Pipeline?

Get your free API key and start processing documents in minutes.

Michael Chen
Developer Advocate @ Retriv.ai
Previously: ML Engineering @ OpenAI