How We Rebuilt a Healthcare Platform Into a Fully Autonomous AI Caregiver Companion

The Challenge

53 million Americans are family caregivers — most of them Gen X adults juggling jobs, kids, and the health of aging parents. Their daily reality is chaos:

Fragmented medical information scattered across Medicare portals, insurance EOBs, paper records from multiple specialists, and pharmacy printouts
Healthcare jargon that makes critical information incomprehensible to non-clinicians
No single source of truth — when Mom goes to the ER, the caregiver fumbles through folders and phone photos trying to recall medications and dosages
Decision fatigue — knowing when to escalate, what questions to ask the doctor, and how to coordinate across providers

The first version of the platform (codenamed "Illuminator") tackled this with a traditional dashboard approach: a Next.js frontend, Laravel backend, and PostgreSQL database pulling Medicare Blue Button data into organized screens for medications, conditions, providers, and documents.

It worked, but the founder had a bigger vision. Dashboards still require the caregiver to know what to look for. The real need was an intelligent companion that could reason about the complete care picture and proactively surface what matters. The question became: what if the entire product experience were a conversation with an AI that truly understood your family's care situation?

Our Solution

We built Twila OS — an agentic AI operating system that replaced the traditional dashboard entirely with a conversational interface. Twila isn't a chatbot wrapper around an LLM. It's a purpose-built AI runtime with deep healthcare domain knowledge, real data access, and the ability to take actions on behalf of caregivers.

The Architecture

The system consists of three layers:

Gateway — A Node.js/TypeScript runtime that handles authentication, context assembly, skill loading, LLM orchestration, and tool execution. Every conversation turn assembles a care profile within an 8K token budget, selects relevant behavioral skills, sanitizes PHI, calls the LLM, and executes any resulting tool actions.
Canvas — A minimal React 19 + Vite frontend that renders the conversational UI and dynamic workspace cards. It's intentionally thin because the AI drives the experience.
Skills & Knowledge — 21 behavioral skill definitions and 15+ domain knowledge bases (Medicare, clinical topics, legal matters, caregiving stages, ADLs, assisted living, and more) loaded as structured prompts per conversation turn.
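
To make the Gateway's budgeted context assembly more concrete, here is a minimal sketch rather than Twila's actual code. The `ContextSection` type, the priority scheme, and the four-characters-per-token estimate are all assumptions made for illustration; a production gateway would count tokens with the model's real tokenizer.

```typescript
// Hypothetical shape for one piece of the care profile (not Twila's real type).
interface ContextSection {
  name: string;
  text: string;
  priority: number; // lower number = more important
}

// Rough estimate only: about 4 characters per token. An assumption, not a tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Greedily keep the most important sections that still fit within the budget.
function assembleContext(
  sections: ContextSection[],
  budgetTokens: number = 8000
): ContextSection[] {
  const kept: ContextSection[] = [];
  let used = 0;
  const byPriority = [...sections].sort((a, b) => a.priority - b.priority);
  for (const section of byPriority) {
    const cost = estimateTokens(section.text);
    if (used + cost > budgetTokens) continue; // too large: skip it, try smaller sections
    kept.push(section);
    used += cost;
  }
  return kept;
}
```

The greedy strategy is one simple choice: high-priority data such as current medications is admitted first, and an oversized low-priority section is dropped instead of crowding out smaller ones.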

37 Purpose-Built Tools

Twila can read real Medicare claims data, search prescription histories, list insurance benefits, browse uploaded documents, and check care contact information. It can also write: adding medications, creating notes, generating CareMinder action items, updating care context, and even sending SMS messages to family members. Write operations use a propose-confirm pattern, so the AI never makes changes without caregiver approval.
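
The confirm-before-write pattern described above can be sketched as a small queue of pending proposals. This is an illustrative assumption, not the platform's real tool schema: `Proposal`, `ProposalQueue`, and the `add_medication` tool name are hypothetical.

```typescript
// Hypothetical types for illustration; the real tool schema is not described here.
interface Proposal {
  id: string;
  tool: string;
  args: Record<string, unknown>;
  status: "pending" | "confirmed" | "rejected";
}

class ProposalQueue {
  private proposals = new Map<string, Proposal>();
  private nextId = 1;

  // The AI calls this instead of writing directly; nothing is changed yet.
  propose(tool: string, args: Record<string, unknown>): Proposal {
    const proposal: Proposal = { id: `p${this.nextId++}`, tool, args, status: "pending" };
    this.proposals.set(proposal.id, proposal);
    return proposal;
  }

  // Runs the write only after the caregiver approves, and at most once.
  confirm(id: string, execute: (p: Proposal) => void): boolean {
    const proposal = this.proposals.get(id);
    if (!proposal || proposal.status !== "pending") return false;
    proposal.status = "confirmed";
    execute(proposal);
    return true;
  }

  // The caregiver declines; the write can never run afterwards.
  reject(id: string): boolean {
    const proposal = this.proposals.get(id);
    if (!proposal || proposal.status !== "pending") return false;
    proposal.status = "rejected";
    return true;
  }
}
```

For example, `queue.propose("add_medication", { name: "metformin" })` only records the intent; the database write passed to `confirm` runs after the caregiver accepts the proposal card.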

PHI Protection as Architecture

Every message is sanitized before reaching the LLM — patient names, dates of birth, SSNs, and Medicare IDs are masked while preserving clinical content. Messages are encrypted at rest with AES-256-GCM. The entire system operates under a signed Business Associate Agreement.
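
As a rough illustration of masking identifiers while leaving clinical content intact, the sketch below uses regular expressions. It is an assumption about how such a sanitizer could look, not the production implementation: the `maskPhi` function, the placeholder tokens, and the simplified Medicare ID pattern (11 characters, no dashes) are all hypothetical, and it conservatively treats every slash-formatted date as a date of birth.

```typescript
// Escape user-supplied text so it can be embedded safely in a RegExp.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical sanitizer: masks identifiers, leaves drug names and doses alone.
function maskPhi(message: string, knownNames: string[] = []): string {
  let out = message
    // SSNs written as 123-45-6789
    .replace(/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]")
    // Any MM/DD/YYYY date is treated as a date of birth (conservative assumption)
    .replace(/\b\d{1,2}\/\d{1,2}\/\d{4}\b/g, "[DOB]")
    // Simplified approximation of an 11-character Medicare ID without dashes
    .replace(/\b[1-9][A-Z][A-Z0-9]\d[A-Z][A-Z0-9]\d[A-Z]{2}\d{2}\b/g, "[MEDICARE_ID]");
  // Names come from the care profile, since names cannot be detected by pattern alone.
  for (const name of knownNames) {
    if (!name.trim()) continue; // an empty name would match everywhere
    out = out.replace(new RegExp(`\\b${escapeRegExp(name)}\\b`, "gi"), "[PATIENT]");
  }
  return out;
}
```

Passing the patient's known names in from the care profile is a design choice made for this sketch; pattern matching alone cannot reliably distinguish a person's name from ordinary words.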

Our Approach

The migration from traditional app to agentic AI operating system was executed in four top-level phases, each grouping several numbered build phases, and ended with a controlled cutover:

Phase 1 — Foundation (Phases 0-4)

We built the core personality engine, LLM provider abstraction (supporting both OpenAI and Anthropic Claude), the tool execution framework, an evaluation engine for response quality, and the workspace card system for dynamic UI rendering.
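
One way to picture the provider abstraction is a shared interface that the rest of the gateway codes against. The sketch below is an assumption for illustration only: `LlmProvider`, `ChatMessage`, `ProviderRegistry`, and the stub class are hypothetical names, and the stub stands in for real adapters that would wrap each vendor's SDK.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical contract every provider adapter would satisfy.
interface LlmProvider {
  readonly name: string;
  complete(messages: ChatMessage[]): Promise<string>;
}

// Stub adapter for illustration; a real one would call the vendor's API.
class StubProvider implements LlmProvider {
  readonly name: string;
  private readonly reply: string;

  constructor(name: string, reply: string) {
    this.name = name;
    this.reply = reply;
  }

  async complete(_messages: ChatMessage[]): Promise<string> {
    return this.reply;
  }
}

// Lets configuration choose a provider by name without touching calling code.
class ProviderRegistry {
  private providers = new Map<string, LlmProvider>();

  register(provider: LlmProvider): void {
    this.providers.set(provider.name, provider);
  }

  get(name: string): LlmProvider {
    const provider = this.providers.get(name);
    if (!provider) throw new Error(`Unknown LLM provider: ${name}`);
    return provider;
  }
}
```

With this shape, switching models becomes a configuration change: callers ask the registry for a provider by name and only ever use `complete`.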

Phase 2 — Intelligence (Phases 5-9)

This phase added conversation memory, SMS delivery via AWS Pinpoint, document extraction using GPT-4o vision (replacing an earlier OCR pipeline), write-back pipelines for extracted data, and persistent learning, so Twila remembers preferences like "Carol prefers morning appointments" across sessions.

Phase 3 — Proactive Care (Phases 10-12)

The proactive engine runs on a 15-minute scheduler with 7 detection rules, quiet hours, daily alert limits, and cooldown periods. Predictive reasoning cross-correlates medications with diagnoses, detects care gaps, and identifies potential drug interactions. Multi-channel routing ensures alerts reach caregivers through the right channel at the right time.
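
The interplay of quiet hours, daily limits, and cooldowns can be sketched as a single gate that each scheduler tick consults before sending. This is a hypothetical illustration, not the engine's real logic: `AlertPolicy`, `SentAlert`, and `shouldSendAlert` are invented names, the "daily" limit is modeled as a rolling 24-hour window, and hours are read in UTC for simplicity where a real system would use the caregiver's local time zone.

```typescript
interface AlertPolicy {
  quietStartHour: number; // e.g. 21 means quiet hours begin at 21:00
  quietEndHour: number; // e.g. 7 means quiet hours end at 07:00
  dailyLimit: number; // max alerts in any rolling 24-hour window (assumption)
  cooldownMinutes: number; // minimum gap between alerts from the same rule
}

interface SentAlert {
  rule: string;
  sentAt: Date;
}

function inQuietHours(hour: number, policy: AlertPolicy): boolean {
  const start = policy.quietStartHour;
  const end = policy.quietEndHour;
  // The window may wrap past midnight (21 to 7) or stay within one day (1 to 5).
  return start <= end ? hour >= start && hour < end : hour >= start || hour < end;
}

function shouldSendAlert(
  rule: string,
  now: Date,
  history: SentAlert[],
  policy: AlertPolicy
): boolean {
  if (inQuietHours(now.getUTCHours(), policy)) return false;

  const dayMs = 24 * 60 * 60 * 1000;
  const recent = history.filter((a) => now.getTime() - a.sentAt.getTime() < dayMs);
  if (recent.length >= policy.dailyLimit) return false;

  const cooldownMs = policy.cooldownMinutes * 60 * 1000;
  const coolingDown = recent.some(
    (a) => a.rule === rule && now.getTime() - a.sentAt.getTime() < cooldownMs
  );
  return !coolingDown;
}
```

Checking quiet hours first, then the daily cap, then the per-rule cooldown keeps the cheapest test up front; a suppressed alert could simply be re-evaluated on the next 15-minute tick.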

Phase 4 — Hardening & Cutover

Six hardening phases covered token budget enforcement, JWT authentication, card security, database-backed encrypted sessions, structured logging, and PHI sanitization validation. The cutover from the legacy app followed a controlled path: shadow mode (AI runs alongside old app), canary routing (percentage-based traffic split), then full cutover.
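
Percentage-based canary routing is commonly done by hashing a stable user identifier into a bucket, so each user consistently lands on the same side of the split. The sketch below assumes that approach; the article does not say how the split was actually implemented, and `hashToBucket`, `routeRequest`, and the target names are hypothetical.

```typescript
type Target = "legacy" | "twila-os";

// FNV-1a style 32-bit hash over UTF-16 code units, reduced to a bucket in 0..99.
function hashToBucket(userId: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < userId.length; i++) {
    hash ^= userId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0) % 100;
}

// Users whose bucket falls below the canary percentage go to the new system.
function routeRequest(userId: string, canaryPercent: number): Target {
  return hashToBucket(userId) < canaryPercent ? "twila-os" : "legacy";
}
```

Because the bucket depends only on the user ID, raising the percentage from 10 to 50 keeps every user who was already on the new system there, and setting it to 100 corresponds to full cutover.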

The shared PostgreSQL database made this possible. Both systems read from the same Medicare claims data and user records, so no data migration was needed, only a clean handoff of the experience layer.

Testing

296 tests across 64 test files cover the gateway's tool execution, context assembly, PHI sanitization, skill loading, and proactive engine. Smoke test runners validate end-to-end flows in staging before every deployment.

Results & Outcomes

Twila OS is now live at os.twila.ai, operating at 100% cutover from the legacy application.

What Twila Can Do Today

Read and reason about real Medicare data — claims, prescriptions, provider encounters, diagnoses — and explain them in plain language tailored to the caregiver's situation
Extract structured data from documents — upload an EOB, lab result, or prescription bottle photo and Twila pulls medications, conditions, providers, and diagnoses into the care record
Proactively detect care gaps — missed medications, overdue appointments, potential drug interactions, and gaps between what's prescribed and what's actually happening
Take actions with permission — add medications, create notes, generate action items, update care context, all through a propose-confirm pattern
Deliver across channels — web workspace with dynamic cards and SMS for time-sensitive alerts, with email planned next
Remember and learn — persistent memory means Twila gets better at serving each family over time

Platform Scale

37 tools giving the AI direct read/write access to the care data layer
21 behavioral skills loaded contextually per conversation turn
15+ domain knowledge bases covering Medicare, clinical topics, legal matters, caregiving stages, and more
296 automated tests across the gateway ensuring reliability
Multi-model support — currently running GPT-5-mini for chat and GPT-5.1 for tool calls, with Anthropic Claude as a drop-in alternative

The platform demonstrates that healthcare AI doesn't have to be a thin wrapper around a language model. By building domain-specific tools, behavioral skills, proactive detection, and HIPAA-grade data protection into the runtime itself, Twila OS delivers an experience that's genuinely more capable than the traditional dashboard it replaced.