AudioDAS
Distributed audio processing engine with extreme concurrency (threads, semaphores, channels), Edge AI in the browser, and efficient streaming (HTTP 206).

Project Overview
I designed and developed AudioDAS, the processing engine for a real Digital Assistance System. I implemented a Clean Architecture within a Modular Monolith in .NET 9. I optimized the server using the Task Parallel Library (TPL), the Producer-Consumer pattern with System.Threading.Channels, and SemaphoreSlim-based concurrency control to prevent CPU saturation. The system orchestrates multiple tasks in parallel: AAC compression with FFmpeg, audio filtering, and two consecutive Python subprocesses (Whisper + Flan-T5) for transcription and summarization. The result: a p95 response time of 205 ms and a 0% error rate under k6 stress tests. I also implemented Edge AI on the frontend (ONNX Runtime WebAssembly) to validate audio locally before uploading, dramatically reducing network traffic. Final streaming uses HTTP 206 Partial Content for instant playback without full downloads.
Architecture Design

Context Diagram: Users, Frontend with Edge AI, .NET Backend, Workers, FFmpeg, and Python Models

Container Diagram: .NET API, Background Worker, Channel, PostgreSQL, Python Subprocesses, and FFmpeg
Core Modules
Extreme Concurrency: Threads, Semaphores & Channels
Producer-Consumer pattern implemented with System.Threading.Channels for non-blocking task queuing. SemaphoreSlim (2-2-2 configuration) limits parallelism: max 2 simultaneous FFmpeg processes and 2 AI inferences. The worker consumes tasks and launches three parallel threads with TPL (Task Parallel Library), achieving audio compression and filtering in parallel with AI — without saturating the CPU or blocking HTTP requests.
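A minimal sketch of this producer-consumer loop, assuming illustrative names (AudioJob, ProcessingWorker) and showing two of the semaphore gates; the exact split of the 2-2-2 limits and the job payload are assumptions, not the actual implementation:

```csharp
using System.Threading.Channels;
using Microsoft.Extensions.Hosting;

// Illustrative job record; the real payload shape may differ.
public record AudioJob(Guid Id, string FilePath);

public class ProcessingWorker : BackgroundService
{
    // Unbounded channel: HTTP handlers (producers) enqueue without blocking.
    private readonly Channel<AudioJob> _queue = Channel.CreateUnbounded<AudioJob>();

    // Gates of 2: at most 2 FFmpeg processes and 2 AI inferences at any time.
    private static readonly SemaphoreSlim FfmpegGate = new(2, 2);
    private static readonly SemaphoreSlim AiGate = new(2, 2);

    public ValueTask EnqueueAsync(AudioJob job, CancellationToken ct = default)
        => _queue.Writer.WriteAsync(job, ct);

    protected override async Task ExecuteAsync(CancellationToken ct)
    {
        // Consumer loop: drain the channel and fan out per-job work with TPL.
        await foreach (var job in _queue.Reader.ReadAllAsync(ct))
        {
            // Three tasks in parallel: AAC compression, audio filtering, AI pipeline.
            var compress = RunFfmpegAsync(job, "compress-aac", ct);
            var filter   = RunFfmpegAsync(job, "filter-echo", ct);
            var ai       = RunAiPipelineAsync(job, ct);

            await Task.WhenAll(compress, filter, ai);
        }
    }

    private static async Task RunFfmpegAsync(AudioJob job, string profile, CancellationToken ct)
    {
        await FfmpegGate.WaitAsync(ct);
        try { /* launch FFmpeg process here */ }
        finally { FfmpegGate.Release(); }
    }

    private static async Task RunAiPipelineAsync(AudioJob job, CancellationToken ct)
    {
        await AiGate.WaitAsync(ct);
        try { /* Whisper transcription, then Flan-T5 summarization */ }
        finally { AiGate.Release(); }
    }
}
```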
Edge AI on the Client: ONNX Runtime WebAssembly
The frontend (Next.js) runs the Whisper-tiny (ONNX) model inside a Web Worker. It decodes audio locally with AudioContext, transcribes it, and validates that the content belongs to the expected domain (e.g. 'economics') before uploading. If invalid, the file is rejected on the spot; if valid, it is uploaded. This saves bandwidth and keeps the server from processing irrelevant audio.
Parallel Server Processing with Python Subprocesses
The main worker launches two sequential Python subprocesses: first Whisper (ggml-tiny.bin) for transcription, then Flan-T5-small for summarization. Both models are loaded once into RAM (Singleton) and run under the same AI semaphore, ensuring no more than two simultaneous inferences. The result is a 50-character summary optimized for metadata.
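A hedged sketch of how the worker can invoke one of the Python stages under the AI semaphore. The script name (transcribe.py) and the per-call process launch are assumptions; the real system keeps the models resident in RAM (Singleton), which implies a longer-lived Python process:

```csharp
using System.Diagnostics;

public static class PythonInference
{
    // Shared AI gate: never more than 2 simultaneous inferences.
    private static readonly SemaphoreSlim AiGate = new(2, 2);

    public static async Task<string> TranscribeAsync(string audioPath, CancellationToken ct)
    {
        await AiGate.WaitAsync(ct);
        try
        {
            var psi = new ProcessStartInfo
            {
                FileName = "python3",
                // "transcribe.py" is an illustrative wrapper around Whisper (ggml-tiny.bin).
                Arguments = $"transcribe.py \"{audioPath}\"",
                RedirectStandardOutput = true,
                UseShellExecute = false,
                CreateNoWindow = true
            };

            using var process = Process.Start(psi)!;
            // Read stdout before waiting for exit to avoid pipe deadlocks.
            string transcript = await process.StandardOutput.ReadToEndAsync(ct);
            await process.WaitForExitAsync(ct);
            return transcript.Trim();
        }
        finally
        {
            AiGate.Release();
        }
    }
}
```

The Flan-T5 summarization stage would follow the same pattern, taking the transcript as input and running under the same gate.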
Compression & Normalization with FFmpeg (Concurrent)
Two independent tasks run FFmpeg in parallel: one compresses to AAC (for 5-year legal retention), the other applies an echo filter and produces a filtered MP3. Both share a 2-instance semaphore, so the limit on heavy processes is never exceeded, and the rest of the HTTP requests keep responding without waiting.
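A sketch of the two FFmpeg invocations under a shared 2-slot gate; the bitrate and aecho parameters are illustrative assumptions, not the production filter chain:

```csharp
using System.Diagnostics;

public static class FfmpegTasks
{
    // Shared 2-instance gate so no more than two FFmpeg processes run at once.
    private static readonly SemaphoreSlim FfmpegGate = new(2, 2);

    public static Task CompressToAacAsync(string input, string output, CancellationToken ct)
        // AAC copy kept for long-term retention; the bitrate is illustrative.
        => RunAsync($"-i \"{input}\" -c:a aac -b:a 128k \"{output}\"", ct);

    public static Task ApplyEchoFilterAsync(string input, string output, CancellationToken ct)
        // aecho parameters are an example; the real filter chain may differ.
        => RunAsync($"-i \"{input}\" -af \"aecho=0.8:0.9:1000:0.3\" \"{output}\"", ct);

    private static async Task RunAsync(string arguments, CancellationToken ct)
    {
        await FfmpegGate.WaitAsync(ct);
        try
        {
            using var process = Process.Start(new ProcessStartInfo
            {
                FileName = "ffmpeg",
                Arguments = arguments,
                UseShellExecute = false,
                CreateNoWindow = true
            })!;
            await process.WaitForExitAsync(ct);
        }
        finally
        {
            FfmpegGate.Release();
        }
    }
}

// Usage: both run concurrently but never exceed the 2-process limit.
// await Task.WhenAll(
//     FfmpegTasks.CompressToAacAsync(src, "audio.aac", ct),
//     FfmpegTasks.ApplyEchoFilterAsync(src, "audio_filtered.mp3", ct));
```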
Smart Streaming with HTTP 206 Partial Content
The /stream endpoint supports the Range header and responds with 206 Partial Content and Accept-Ranges: bytes. The HTML5 player can request only the needed fragment (e.g. bytes=0- to start fast). On seek, the browser automatically generates new range requests, downloading only the necessary bytes without loading the full file.
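A minimal ASP.NET Core sketch: with enableRangeProcessing the framework answers Range requests with 206 Partial Content and Accept-Ranges: bytes. The route shape and file lookup are illustrative:

```csharp
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// /stream/{id}: ASP.NET Core handles the Range header and emits
// 206 Partial Content with Accept-Ranges: bytes when a range is requested.
app.MapGet("/stream/{id}", (string id) =>
{
    // Illustrative lookup; the real path comes from the audio repository.
    var path = Path.Combine("storage", $"{id}.mp3");
    if (!File.Exists(path))
        return Results.NotFound();

    return Results.File(path, "audio/mpeg", enableRangeProcessing: true);
});

app.Run();
```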
Stress Testing & Concurrency Monitoring
A k6 test suite simulated 1000 concurrent users uploading files and streaming audio, measuring throughput, p95 latency (205 ms), and semaphore usage. Unit tests with xUnit, Moq, and FluentAssertions cover the domain and application logic.
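A hedged illustration of the xUnit + Moq + FluentAssertions style used for the application layer; every type here (IAudioQueue, UploadAudioHandler, AudioJob) is hypothetical and stands in for the real domain model:

```csharp
using FluentAssertions;
using Moq;
using Xunit;

// Illustrative application-layer types; the real domain model differs.
public record AudioJob(Guid Id, string FilePath);
public record UploadResult(bool Accepted);

public interface IAudioQueue
{
    ValueTask EnqueueAsync(AudioJob job, CancellationToken ct = default);
}

public class UploadAudioHandler(IAudioQueue queue)
{
    public async Task<UploadResult> HandleAsync(string filePath, CancellationToken ct = default)
    {
        // Minimal validation for the sketch; the real handler does more.
        if (string.IsNullOrWhiteSpace(filePath))
            return new UploadResult(false);

        await queue.EnqueueAsync(new AudioJob(Guid.NewGuid(), filePath), ct);
        return new UploadResult(true);
    }
}

public class UploadAudioHandlerTests
{
    [Fact]
    public async Task Enqueues_job_when_file_path_is_valid()
    {
        var queue = new Mock<IAudioQueue>();
        var handler = new UploadAudioHandler(queue.Object);

        var result = await handler.HandleAsync("lecture.wav");

        result.Accepted.Should().BeTrue();
        queue.Verify(q => q.EnqueueAsync(It.IsAny<AudioJob>(), It.IsAny<CancellationToken>()), Times.Once);
    }
}
```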
Containerized Infrastructure & Reproducible Deployment
Multi-stage Docker build with the .NET 9 SDK and runtime images, plus pre-installed FFmpeg. AI models (Whisper, Flan-T5) are packaged as volumes or downloaded on first startup. The Next.js frontend is served from a separate container, and everything is orchestrated with docker-compose for development and production.
Technologies Implemented
Frontend: Next.js, ONNX Runtime Web (WebAssembly), Web Workers
Backend: .NET 9, System.Threading.Channels, TPL, SemaphoreSlim, PostgreSQL
AI & Media: Whisper (ggml-tiny), Flan-T5-small, Python, FFmpeg
Tools: Docker, docker-compose, k6, xUnit, Moq, FluentAssertions
System Gallery




