A production AI telehealth platform with 70+ AI specialist doctors across web and mobile — real-time voice and streaming chat consultations powered by Anthropic Claude and OpenAI's Realtime API, engineered for reliability at scale.

01 — Overview
VirtualMD lets patients consult 70+ AI specialist doctors by voice or chat across web and mobile. A Python FastAPI backend coordinates multiple AI providers — Anthropic Claude for clinical reasoning, OpenAI's Realtime API over WebRTC for live voice — behind a React app, a separate admin console and a Next.js marketing site. It runs in production at virtualmd.app.
02 — Context
Real-time medical conversations are unforgiving: a dropped socket, a stalled token stream or a latency spike breaks the consultation instantly. The platform also had to coordinate multiple AI providers, serve web and mobile from one API, and stay reliable as the user base grew.
I owned web reliability and the real-time streaming pipeline end to end. I hardened the WebSocket layer (keep-alive pings, idle cleanup, exponential-backoff reconnection, per-connection rate limiting). I also built an adaptive client-side drain that scales the typewriter render rate to queue depth, so streamed responses don't stall under load. On the backend, I worked across a provider-abstracted AI connector with fallback handling, a Redis cache-aside layer with graceful degradation, Celery background jobs, and a Postgres schema tuned with connection pooling and targeted indexes. Together these form an architecture designed to scale toward 1M+ users.
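The Redis cache-aside layer with graceful degradation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the production code. `CacheClient`, `cached_fetch`, `DictCache` and `DownCache` are hypothetical names. The protocol is assumed to mirror redis-py's `get(name)` and `set(name, value, ex=seconds)`. The core idea is that any cache error is logged and treated as a miss, so a Redis outage costs latency instead of failing the request.

```python
import json
import logging
from typing import Any, Callable, Dict, Optional, Protocol, Tuple, Type

log = logging.getLogger("cache")


class CacheClient(Protocol):
    """Assumed shape, mirroring redis-py's get(name) and set(name, value, ex=...)."""

    def get(self, name: str) -> Optional[str]: ...

    def set(self, name: str, value: str, ex: Optional[int] = None) -> Any: ...


def cached_fetch(
    cache: CacheClient,
    key: str,
    loader: Callable[[], Any],
    ttl_seconds: int = 300,  # assumption: illustrative TTL
    cache_errors: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> Any:
    """Cache-aside read: cache first, then the source of truth, then populate the cache."""
    try:
        raw = cache.get(key)
        if raw is not None:
            return json.loads(raw)  # cache hit
    except cache_errors as exc:
        # Graceful degradation: a broken cache is treated as a miss.
        log.warning("cache read failed for %s, serving from source: %s", key, exc)

    value = loader()  # miss or cache down: e.g. a Postgres query

    try:
        cache.set(key, json.dumps(value), ex=ttl_seconds)
    except cache_errors as exc:
        log.warning("cache write failed for %s: %s", key, exc)
    return value


class DictCache:
    """In-memory stand-in for a healthy Redis (TTL ignored for brevity)."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}

    def get(self, name: str) -> Optional[str]:
        return self.store.get(name)

    def set(self, name: str, value: str, ex: Optional[int] = None) -> None:
        self.store[name] = value


class DownCache:
    """Stand-in for an unreachable Redis: every call raises."""

    def get(self, name: str) -> Optional[str]:
        raise ConnectionError("cache unreachable")

    def set(self, name: str, value: str, ex: Optional[int] = None) -> None:
        raise ConnectionError("cache unreachable")
```

With `DictCache()`, the loader runs once and later calls are served from the cache. With `DownCache()`, every call goes to the loader and only logs a warning. For a real redis-py client, you would pass `cache_errors=(redis.exceptions.RedisError,)`, because redis-py's connection errors do not subclass Python's built-in `ConnectionError`.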
03 — Showcase



04 — Capabilities
05 — Contribution
As Full-Stack Engineer and web reliability owner, I owned and delivered the work described below.
06 — Engineering
Challenge
Streamed AI responses stalled or overwhelmed the UI under load.
Solution
Built an adaptive requestAnimationFrame drain that scales batch size to queue depth (2→12 chars/frame). It flushes the whole queue when the tab is hidden, because browsers pause requestAnimationFrame in background tabs. The result is smooth output with no runaway queues.
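The batch-sizing rule can be sketched as below. The real drain runs in the browser on requestAnimationFrame. This Python version models only the per-frame decision, so it can be tested in isolation. The 2 and 12 chars/frame bounds come from the description above. The linear ramp and the `FULL_SPEED_DEPTH` of 240 queued characters are assumptions, and `chars_per_frame` and `TypewriterQueue` are hypothetical names.

```python
from collections import deque
from typing import Deque

MIN_CHARS = 2           # from the write-up: slowest typewriter rate
MAX_CHARS = 12          # from the write-up: fastest rate under backlog
FULL_SPEED_DEPTH = 240  # assumption: backlog size (chars) that earns MAX_CHARS


def chars_per_frame(depth: int) -> int:
    """Scale batch size linearly with queue depth, clamped to [MIN_CHARS, MAX_CHARS]."""
    if depth <= 0:
        return 0
    scaled = MIN_CHARS + (MAX_CHARS - MIN_CHARS) * depth / FULL_SPEED_DEPTH
    batch = max(MIN_CHARS, min(MAX_CHARS, round(scaled)))
    return min(batch, depth)  # never take more than is queued


class TypewriterQueue:
    """Buffers streamed tokens and releases a slice of characters per frame."""

    def __init__(self) -> None:
        self._chars: Deque[str] = deque()

    def push(self, token: str) -> None:
        self._chars.extend(token)  # one deque entry per character

    def __len__(self) -> int:
        return len(self._chars)

    def next_frame(self, tab_hidden: bool = False) -> str:
        """Text to render this frame; a hidden tab flushes the whole backlog."""
        depth = len(self._chars)
        n = depth if tab_hidden else chars_per_frame(depth)
        return "".join(self._chars.popleft() for _ in range(n))
```

In the browser, `next_frame()` would be called from the requestAnimationFrame callback. The hidden-tab flush would be triggered by a `visibilitychange` listener that checks `document.hidden`. A short answer trickles out at 2 characters per frame, and a large backlog catches up at up to 12.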
Challenge
Real-time voice and chat sockets dropped mid-consultation.
Solution
Added keep-alive pings, idle cleanup and 5-attempt exponential-backoff reconnection with a 15s health check so sessions stay live.
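Here is a minimal sketch of the 5-attempt exponential-backoff reconnection, written in Python for consistency even though the client side runs in the browser. `MAX_ATTEMPTS = 5` comes from the description above. The 0.5 s base delay, the 8 s cap, the optional full jitter and the names `backoff_delay` and `reconnect` are assumptions. `connect` and `sleep` are injected so the schedule can be checked without a real socket. The 15s health check is not modeled.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

MAX_ATTEMPTS = 5    # from the write-up
BASE_DELAY_S = 0.5  # assumption
MAX_DELAY_S = 8.0   # assumption


def backoff_delay(attempt: int, jitter: bool = True) -> float:
    """Delay after failed attempt `attempt` (0-based): base * 2**attempt, capped.

    With jitter ("full jitter"), a random value in [0, delay] is used so that
    many clients dropped at the same moment do not reconnect in lockstep.
    """
    delay = min(MAX_DELAY_S, BASE_DELAY_S * (2 ** attempt))
    return random.uniform(0, delay) if jitter else delay


def reconnect(
    connect: Callable[[], T],
    sleep: Callable[[float], None] = time.sleep,
    jitter: bool = True,
) -> Optional[T]:
    """Try to reopen the connection up to MAX_ATTEMPTS times, backing off between tries.

    Returns the new connection, or None after the final failure so the caller
    can surface a "connection lost" state instead of retrying forever.
    """
    for attempt in range(MAX_ATTEMPTS):
        try:
            return connect()
        except OSError:  # ConnectionError and socket errors subclass OSError
            if attempt < MAX_ATTEMPTS - 1:
                sleep(backoff_delay(attempt, jitter))
    return None
```

Without jitter, the waits between the five attempts are 0.5, 1, 2 and 4 seconds. Jitter, which the write-up does not mention, spreads reconnects out when many sessions drop at once, for example during a deploy.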
Challenge
A single AI provider hitting a 429 or timeout would break a consultation.
Solution
Routed all AI calls through one connector with model selection and typed fallbacks for rate limits, timeouts and refusals.
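The connector's fallback behaviour can be sketched as an ordered provider chain per task. Everything named here is hypothetical: the error classes, `Provider`, `AIConnector` and the routing keys. The sketch assumes each provider adapter already translates its SDK's errors (for example an HTTP 429) into `RateLimited`, `ProviderTimeout` or `Refused`; that translation is not shown. Rate limits and timeouts fall through to the next provider. A refusal returns a fixed safe reply instead of being retried on another model, a design choice assumed here for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


class ProviderError(Exception):
    """Base class for normalized provider failures."""


class RateLimited(ProviderError):
    """Provider returned HTTP 429 or an equivalent quota error."""


class ProviderTimeout(ProviderError):
    """Provider did not respond within the request deadline."""


class Refused(ProviderError):
    """Model declined to answer the prompt."""


@dataclass
class Provider:
    name: str
    complete: Callable[[str], str]  # adapter around one vendor SDK and model


@dataclass
class AIConnector:
    # Model selection: an ordered fallback chain per task type.
    routes: Dict[str, List[Provider]]
    refusal_reply: str = "I'm not able to help with that request."

    def complete(self, task: str, prompt: str) -> str:
        last_error: Optional[ProviderError] = None
        for provider in self.routes[task]:
            try:
                return provider.complete(prompt)
            except (RateLimited, ProviderTimeout) as exc:
                last_error = exc  # transient: try the next provider in the chain
            except Refused:
                return self.refusal_reply  # don't shop a refusal around to other models
        raise RuntimeError(f"all providers failed for task {task!r}") from last_error
```

Keeping the error types narrow means an unexpected exception (a bug in an adapter, say) still surfaces instead of being silently masked by a fallback.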
07 — Toolbox
08 — Impact