NEXA MULTIMODAL RAG
High-performance multimodal RAG (Retrieval-Augmented Generation) engine built on Hexagonal Architecture principles, designed to serve as a knowledge base for autonomous agent systems.
Pure software architecture
Robust backend · No visual UI
Project Overview
I designed and developed Nexa, a multimodal ingestion, vectorization, and semantic search engine that transforms heterogeneous documents (PDF, DOCX, JSON, images) into high-quality vectors stored in ChromaDB. I implemented a hybrid extraction pipeline that combines local text extraction (PyMuPDF) with cloud OCR models (Mistral OCR, DeepSeek OCR 2), plus a page classifier that decides in real time which extractor to use, minimizing API cost. The system applies two interchangeable chunking strategies through a factory (Entity Chunker for catalogs, Recursive Chunker for long documents), enriches images with Gemini Flash-Lite, and vectorizes in batches with Gemini Embedding 2. It exposes a multi-tenant REST API ready to be consumed by external conversational agent systems such as Aethelgard.
Core Modules
Multi-Format Ingestion with Hybrid OCR
Unified pipeline that accepts PDF, DOCX, TXT, MD, JSON, and images. A local classifier (PyMuPDF) analyzes each page and decides whether to extract the native text layer (zero cost) or route the page to Mistral OCR (high precision) or DeepSeek OCR 2 (ultra-low cost). Includes a DocumentExtractor that normalizes output into ExtractedPage objects with clean content and page metadata.
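The routing decision above can be sketched as follows. This is a minimal illustration, not the production classifier: the names `classify_page` and `PageStats`, and the specific thresholds, are hypothetical; the real system computes its features with PyMuPDF.

```python
from dataclasses import dataclass
from enum import Enum

class ExtractionRoute(Enum):
    NATIVE = "native"           # PyMuPDF text layer, zero cost
    PRECISION_OCR = "mistral"   # high-precision cloud OCR
    BUDGET_OCR = "deepseek"     # ultra-low-cost cloud OCR

@dataclass
class PageStats:
    """Per-page features a local classifier might compute (hypothetical)."""
    text_chars: int           # characters recoverable from the native text layer
    image_area_ratio: float   # fraction of the page covered by images

def classify_page(stats: PageStats,
                  min_chars: int = 50,
                  image_threshold: float = 0.5) -> ExtractionRoute:
    """Route each page to the cheapest extractor that can handle it."""
    if stats.text_chars >= min_chars:
        return ExtractionRoute.NATIVE          # text layer is good enough
    if stats.image_area_ratio >= image_threshold:
        return ExtractionRoute.PRECISION_OCR   # image-heavy page: favor accuracy
    return ExtractionRoute.BUDGET_OCR          # sparse page: favor cost
```

The point of the design is that cost control happens per page, not per document, so a mostly-native PDF with a few scanned pages only pays for OCR on those pages.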
Strategic Chunking with Strategy Factory
Two interchangeable chunking strategies via the Factory pattern. EntityChunker (for catalogs and JSON) creates one chunk per product and automatically extracts image URLs, SKU, and prices into metadata. RecursiveChunker (for long documents) splits text respecting paragraphs and sentences with configurable overlap, ensuring the semantic integrity of each fragment.
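A condensed sketch of the factory and the two strategies, under assumed names (`chunker_factory`, `Chunk`) and simplified logic; the real RecursiveChunker also respects sentence boundaries:

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Chunk:
    text: str
    metadata: dict

class ChunkingStrategy(Protocol):
    def chunk(self, data): ...

class EntityChunker:
    """One chunk per catalog entry; key fields are lifted into metadata."""
    def chunk(self, products: list[dict]) -> list[Chunk]:
        return [
            Chunk(
                text=f"{p.get('name', '')}. {p.get('description', '')}",
                metadata={k: p[k] for k in ("sku", "price", "image_url") if k in p},
            )
            for p in products
        ]

class RecursiveChunker:
    """Paragraph-respecting splitter with configurable character overlap."""
    def __init__(self, max_chars: int = 800, overlap: int = 100):
        self.max_chars, self.overlap = max_chars, overlap

    def chunk(self, text: str) -> list[Chunk]:
        paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
        chunks, buffer = [], ""
        for para in paragraphs:
            if buffer and len(buffer) + len(para) > self.max_chars:
                chunks.append(Chunk(buffer, {"type": "text"}))
                buffer = buffer[-self.overlap:]  # carry overlap into the next chunk
            buffer = f"{buffer}\n\n{para}" if buffer else para
        if buffer:
            chunks.append(Chunk(buffer, {"type": "text"}))
        return chunks

def chunker_factory(collection_type: str) -> ChunkingStrategy:
    """Hypothetical factory: selects the strategy by collection type."""
    return EntityChunker() if collection_type == "catalog" else RecursiveChunker()
```

Callers depend only on `ChunkingStrategy`, so adding a third strategy (e.g. for tables) means registering it in the factory without touching the ingestion pipeline.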
Batch Vectorization & Vector Database
Gemini Embedding 2 Preview converts texts and visual descriptions into 3072-dimensional vectors. The GeminiEmbeddingAdapter sends multiple texts in a single HTTP call (batching), sharply reducing round trips and overall latency. Vectors and metadata are stored in ChromaDB (local), with planned support for migration to pgvector in Supabase.
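The batching idea can be shown in isolation. This sketch stubs the transport with an injected `embed_batch_fn` (an assumption, not the real Gemini client); the class name is also illustrative:

```python
from typing import Callable

class BatchingEmbeddingAdapter:
    """Sketch of the batching pattern: N texts cost roughly
    ceil(N / batch_size) HTTP calls instead of N calls."""

    def __init__(self,
                 embed_batch_fn: Callable[[list[str]], list[list[float]]],
                 batch_size: int = 100):
        self._embed = embed_batch_fn      # stands in for the real API transport
        self._batch_size = batch_size

    def embed(self, texts: list[str]) -> list[list[float]]:
        vectors: list[list[float]] = []
        for i in range(0, len(texts), self._batch_size):
            # one call per batch, preserving input order
            vectors.extend(self._embed(texts[i:i + self._batch_size]))
        return vectors
```

Injecting the transport also makes the adapter trivially testable with a fake, in line with the ports-and-adapters design described below.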
Multimodal Search with Adaptive Orchestrator
SearchOrchestrator checks the collection type and dynamically selects the right retriever: CatalogRetriever (metadata filters + similarity threshold + optional reranking) or DocumentRetriever (query expansion and context compression via LLM). The response includes enriched sources with page numbers, chunk type, and associated images.
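The dispatch logic reduces to a lookup by collection type. A minimal sketch, with the retrieval internals stubbed out (the real retrievers apply filters, reranking, query expansion, and compression as described above):

```python
from typing import Protocol

class Retriever(Protocol):
    def retrieve(self, query: str) -> list[dict]: ...

class CatalogRetriever:
    def retrieve(self, query: str) -> list[dict]:
        # real version: metadata filters + similarity threshold + optional reranking
        return [{"source": "catalog", "query": query}]

class DocumentRetriever:
    def retrieve(self, query: str) -> list[dict]:
        # real version: query expansion and LLM-based context compression
        return [{"source": "document", "query": query}]

class SearchOrchestrator:
    """Selects the retriever that matches the collection's declared type."""
    def __init__(self):
        self._retrievers: dict[str, Retriever] = {
            "catalog": CatalogRetriever(),
            "document": DocumentRetriever(),
        }

    def search(self, collection_type: str, query: str) -> list[dict]:
        retriever = self._retrievers.get(collection_type,
                                         self._retrievers["document"])
        return retriever.retrieve(query)
```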
Visual Enrichment with Gemini Flash-Lite
ImageEnricher detects image references in Markdown and fires async requests in parallel to Gemini Flash-Lite. It generates self-contained text descriptions stored as 'image' type chunks, enabling semantic searches over the visual content of charts, tables, and photographs.
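The detect-then-fan-out pattern looks roughly like this. The Gemini call is replaced by a stub (`describe_image` is a hypothetical name); only the regex and the concurrent fan-out via `asyncio.gather` reflect the mechanism described:

```python
import asyncio
import re

MD_IMAGE = re.compile(r"!\[[^\]]*\]\(([^)]+)\)")  # matches ![alt](url)

async def describe_image(url: str) -> dict:
    """Stand-in for a Gemini Flash-Lite call; yields an 'image' type chunk."""
    await asyncio.sleep(0)  # placeholder for network latency
    return {"type": "image", "image_url": url, "text": f"Description of {url}"}

async def enrich(markdown: str) -> list[dict]:
    """Find all Markdown image references and describe them concurrently."""
    urls = MD_IMAGE.findall(markdown)
    return list(await asyncio.gather(*(describe_image(u) for u in urls)))
```

Because the description requests are independent, `gather` lets a page with many figures enrich in roughly the time of the slowest single call.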
Multi-Tenant API & Administration
REST endpoints for managing users, businesses, channels, and collections. Each business can be associated with multiple collections and channels (WhatsApp Cloud API, web widget). The users, businesses, collections, and documents modules encapsulate their own business logic, repositories, and Pydantic schemas.
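The tenant data model implied above can be sketched with plain dataclasses (the real modules use Pydantic schemas and repositories; field names here are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class Channel:
    kind: str        # e.g. "whatsapp_cloud" or "web_widget" (assumed labels)
    identifier: str  # phone number ID, widget key, etc.

@dataclass
class Collection:
    name: str
    collection_type: str  # "catalog" or "document"

@dataclass
class Business:
    """One tenant: owns its collections and its delivery channels."""
    business_id: str
    collections: list[Collection] = field(default_factory=list)
    channels: list[Channel] = field(default_factory=list)
```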
Hexagonal Architecture & Data Contracts
The system strictly follows Ports & Adapters principles. Dependencies are inverted through ports (ICollectionRepository, IDocumentRepository, IVectorStore, IEmbeddingProvider, ILLMClient) implemented by concrete adapters. A dependency injection container centralizes instantiation, making it easy to swap technologies without touching business logic.
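One port from the list above, reduced to its essentials. The fake adapter and the container's contents are illustrative; the point is that business logic types against the port, and the container is the single place where a vendor-backed adapter would be swapped in:

```python
from typing import Protocol

class IEmbeddingProvider(Protocol):
    """Port: core logic depends on this interface, never on a vendor SDK."""
    def embed(self, texts: list[str]) -> list[list[float]]: ...

class FakeEmbeddingAdapter:
    """Concrete adapter for tests; a Gemini-backed adapter satisfies
    the same port without any change to business logic."""
    def embed(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(t))] for t in texts]

class Container:
    """Minimal DI container: one place wires ports to adapters."""
    def __init__(self) -> None:
        self.embedding_provider: IEmbeddingProvider = FakeEmbeddingAdapter()
```

Swapping ChromaDB for pgvector, or Gemini for another embedding vendor, then touches only the adapter bound in the container.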