NEXA MULTIMODAL RAG
High-performance multimodal RAG (Retrieval-Augmented Generation) engine built on Hexagonal Architecture principles, designed to serve as a knowledge base for autonomous agent systems.
Pure software architecture
Robust backend · No visual UI
Project Overview
I designed and developed Nexa, a multimodal ingestion, vectorization, and semantic search engine that transforms heterogeneous documents (PDF, DOCX, JSON, images) into high-quality vectors stored in ChromaDB. I implemented a hybrid extraction pipeline that combines local text extraction (PyMuPDF) with cloud OCR models (Mistral OCR, DeepSeek OCR 2), plus a page classifier that decides in real time which extractor to use, minimizing API cost. The system applies two interchangeable chunking strategies through a factory (Entity Chunker for catalogs, Recursive Chunker for long documents), enriches images with Gemini Flash-Lite, and vectorizes in batches with Gemini Embedding 2. It exposes a multi-tenant REST API ready to be consumed by external conversational agent systems such as Aethelgard.
Core Modules
Multi-Format Ingestion with Hybrid OCR
Unified pipeline that accepts PDF, DOCX, TXT, MD, JSON, and images. A local classifier (PyMuPDF) analyzes each page and decides whether to extract the native text layer (zero cost) or route the page to Mistral OCR (high precision) or DeepSeek OCR 2 (ultra-low cost). Includes a DocumentExtractor that normalizes output into ExtractedPage objects with clean content and page metadata.
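The routing decision above can be sketched as follows. This is a minimal illustration, not the production classifier: the names `classify_page` and `PageStats`, and the specific thresholds, are hypothetical; the real system computes its features with PyMuPDF.

```python
from dataclasses import dataclass
from enum import Enum

class ExtractionRoute(Enum):
    NATIVE = "native"           # PyMuPDF text layer, zero cost
    PRECISION_OCR = "mistral"   # high-precision cloud OCR
    BUDGET_OCR = "deepseek"     # ultra-low-cost cloud OCR

@dataclass
class PageStats:
    """Per-page features a local classifier might compute (hypothetical)."""
    text_chars: int           # characters recoverable from the native text layer
    image_area_ratio: float   # fraction of the page covered by images

def classify_page(stats: PageStats,
                  min_chars: int = 50,
                  image_threshold: float = 0.5) -> ExtractionRoute:
    """Route each page to the cheapest extractor that can handle it."""
    if stats.text_chars >= min_chars:
        return ExtractionRoute.NATIVE          # text layer is good enough
    if stats.image_area_ratio >= image_threshold:
        return ExtractionRoute.PRECISION_OCR   # image-heavy page: favor accuracy
    return ExtractionRoute.BUDGET_OCR          # sparse page: favor cost
```

The point of the design is that cost control happens per page, not per document, so a mostly-native PDF with a few scanned pages only pays for OCR on those pages.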
Strategic Chunking with Strategy Factory
Two interchangeable chunking strategies via the Factory pattern. EntityChunker (for catalogs and JSON) creates one chunk per product and automatically extracts image URLs, SKU, and prices into metadata. RecursiveChunker (for long documents) splits text respecting paragraphs and sentences with configurable overlap, ensuring the semantic integrity of each fragment.
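A condensed sketch of the factory and the two strategies, under assumed names (`chunker_factory`, `Chunk`) and simplified logic; the real RecursiveChunker also respects sentence boundaries:

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Chunk:
    text: str
    metadata: dict

class ChunkingStrategy(Protocol):
    def chunk(self, data): ...

class EntityChunker:
    """One chunk per catalog entry; key fields are lifted into metadata."""
    def chunk(self, products: list[dict]) -> list[Chunk]:
        return [
            Chunk(
                text=f"{p.get('name', '')}. {p.get('description', '')}",
                metadata={k: p[k] for k in ("sku", "price", "image_url") if k in p},
            )
            for p in products
        ]

class RecursiveChunker:
    """Paragraph-respecting splitter with configurable character overlap."""
    def __init__(self, max_chars: int = 800, overlap: int = 100):
        self.max_chars, self.overlap = max_chars, overlap

    def chunk(self, text: str) -> list[Chunk]:
        paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
        chunks, buffer = [], ""
        for para in paragraphs:
            if buffer and len(buffer) + len(para) > self.max_chars:
                chunks.append(Chunk(buffer, {"type": "text"}))
                buffer = buffer[-self.overlap:]  # carry overlap into the next chunk
            buffer = f"{buffer}\n\n{para}" if buffer else para
        if buffer:
            chunks.append(Chunk(buffer, {"type": "text"}))
        return chunks

def chunker_factory(collection_type: str) -> ChunkingStrategy:
    """Hypothetical factory: selects the strategy by collection type."""
    return EntityChunker() if collection_type == "catalog" else RecursiveChunker()
```

Callers depend only on `ChunkingStrategy`, so adding a third strategy (e.g. for tables) means registering it in the factory without touching the ingestion pipeline.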
Batch Vectorization & Vector Database
Gemini Embedding 2 Preview converts texts and visual descriptions into 3072-dimensional vectors. The GeminiEmbeddingAdapter sends multiple texts in a single HTTP call (batching), sharply reducing round trips and overall latency. Vectors and metadata are stored in ChromaDB (local), with planned support for migration to pgvector in Supabase.
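The batching idea can be shown in isolation. This sketch stubs the transport with an injected `embed_batch_fn` (an assumption, not the real Gemini client); the class name is also illustrative:

```python
from typing import Callable

class BatchingEmbeddingAdapter:
    """Sketch of the batching pattern: N texts cost roughly
    ceil(N / batch_size) HTTP calls instead of N calls."""

    def __init__(self,
                 embed_batch_fn: Callable[[list[str]], list[list[float]]],
                 batch_size: int = 100):
        self._embed = embed_batch_fn      # stands in for the real API transport
        self._batch_size = batch_size

    def embed(self, texts: list[str]) -> list[list[float]]:
        vectors: list[list[float]] = []
        for i in range(0, len(texts), self._batch_size):
            # one call per batch, preserving input order
            vectors.extend(self._embed(texts[i:i + self._batch_size]))
        return vectors
```

Injecting the transport also makes the adapter trivially testable with a fake, in line with the ports-and-adapters design described below.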
Multimodal Search with Adaptive Orchestrator
SearchOrchestrator checks the collection type and dynamically selects the right retriever: CatalogRetriever (metadata filters + similarity threshold + optional reranking) or DocumentRetriever (query expansion and context compression via LLM). The response includes enriched sources with page numbers, chunk type, and associated images.
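The dispatch logic reduces to a lookup by collection type. A minimal sketch, with the retrieval internals stubbed out (the real retrievers apply filters, reranking, query expansion, and compression as described above):

```python
from typing import Protocol

class Retriever(Protocol):
    def retrieve(self, query: str) -> list[dict]: ...

class CatalogRetriever:
    def retrieve(self, query: str) -> list[dict]:
        # real version: metadata filters + similarity threshold + optional reranking
        return [{"source": "catalog", "query": query}]

class DocumentRetriever:
    def retrieve(self, query: str) -> list[dict]:
        # real version: query expansion and LLM-based context compression
        return [{"source": "document", "query": query}]

class SearchOrchestrator:
    """Selects the retriever that matches the collection's declared type."""
    def __init__(self):
        self._retrievers: dict[str, Retriever] = {
            "catalog": CatalogRetriever(),
            "document": DocumentRetriever(),
        }

    def search(self, collection_type: str, query: str) -> list[dict]:
        retriever = self._retrievers.get(collection_type,
                                         self._retrievers["document"])
        return retriever.retrieve(query)
```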
Visual Enrichment with Gemini Flash-Lite
ImageEnricher detects image references in Markdown and fires async requests in parallel to Gemini Flash-Lite. It generates self-contained text descriptions stored as 'image' type chunks, enabling semantic searches over the visual content of charts, tables, and photographs.
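The detect-then-fan-out pattern looks roughly like this. The Gemini call is replaced by a stub (`describe_image` is a hypothetical name); only the regex and the concurrent fan-out via `asyncio.gather` reflect the mechanism described:

```python
import asyncio
import re

MD_IMAGE = re.compile(r"!\[[^\]]*\]\(([^)]+)\)")  # matches ![alt](url)

async def describe_image(url: str) -> dict:
    """Stand-in for a Gemini Flash-Lite call; yields an 'image' type chunk."""
    await asyncio.sleep(0)  # placeholder for network latency
    return {"type": "image", "image_url": url, "text": f"Description of {url}"}

async def enrich(markdown: str) -> list[dict]:
    """Find all Markdown image references and describe them concurrently."""
    urls = MD_IMAGE.findall(markdown)
    return list(await asyncio.gather(*(describe_image(u) for u in urls)))
```

Because the description requests are independent, `gather` lets a page with many figures enrich in roughly the time of the slowest single call.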
Multi-Tenant API & Administration
REST endpoints for managing users, businesses, channels, and collections. Each business can be associated with multiple collections and channels (WhatsApp Cloud API, web widget). The users, businesses, collections, and documents modules encapsulate their own business logic, repositories, and Pydantic schemas.
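The tenant data model implied above can be sketched with plain dataclasses (the real modules use Pydantic schemas and repositories; field names here are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class Channel:
    kind: str        # e.g. "whatsapp_cloud" or "web_widget" (assumed labels)
    identifier: str  # phone number ID, widget key, etc.

@dataclass
class Collection:
    name: str
    collection_type: str  # "catalog" or "document"

@dataclass
class Business:
    """One tenant: owns its collections and its delivery channels."""
    business_id: str
    collections: list[Collection] = field(default_factory=list)
    channels: list[Channel] = field(default_factory=list)
```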
Hexagonal Architecture & Data Contracts
The system strictly follows Ports & Adapters principles. Dependencies are inverted through ports (ICollectionRepository, IDocumentRepository, IVectorStore, IEmbeddingProvider, ILLMClient) implemented by concrete adapters. A dependency injection container centralizes instantiation, making it easy to swap technologies without touching business logic.
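One port from the list above, reduced to its essentials. The fake adapter and the container's contents are illustrative; the point is that business logic types against the port, and the container is the single place where a vendor-backed adapter would be swapped in:

```python
from typing import Protocol

class IEmbeddingProvider(Protocol):
    """Port: core logic depends on this interface, never on a vendor SDK."""
    def embed(self, texts: list[str]) -> list[list[float]]: ...

class FakeEmbeddingAdapter:
    """Concrete adapter for tests; a Gemini-backed adapter satisfies
    the same port without any change to business logic."""
    def embed(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(t))] for t in texts]

class Container:
    """Minimal DI container: one place wires ports to adapters."""
    def __init__(self) -> None:
        self.embedding_provider: IEmbeddingProvider = FakeEmbeddingAdapter()
```

Swapping ChromaDB for pgvector, or Gemini for another embedding vendor, then touches only the adapter bound in the container.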