Why I Built a Voice AI Platform Nobody Asked For
No pitch deck. No customers. Just a hunch that voice AI shouldn't require stitching five services together. This is how Chans.ai started — in a car, singing along to a Japanese pop song with my kids.
I don't build things because the market tells me to.
I build things because I can't stop thinking about them.

The Spark
At work, Vervio had a product called Retail Agent AI. Voice-based, customer-facing. I wasn't fully engaged with it — different team, different priorities — but I watched from the sidelines.
What I saw was familiar: teams stitching together STT, TTS, LLMs, and WebRTC, then spending weeks on the glue code between them. The AI part was easy. The voice part was a mess.
Meanwhile, I'd been following XR development. Open-source headsets. The idea that someday you'd bring your own AI to your own glasses. And that AI would need a voice.
Not a chatbot voice. A real-time, low-latency, interrupt-me-mid-sentence voice.
I thought: what if there was infrastructure for that? Something you could plug into any existing LLM system and just... talk to it?
Nobody asked for it. I built it anyway.
The Name
My daughters named it.
We were in the car — my 9-year-old and my 4-year-old — singing along to a viral Japanese song called AI SCREAM! (愛♡スクリ~ム!). Not a song I would've chosen for myself, but it's catchy in the way that sticks, and we were all shouting the lyrics.
Somewhere in the chorus, there's "chan" — the Japanese honorific. My oldest said it sounded like a name. I checked the domain. chans.ai was available. We had our name.
The voice actors follow the theme — Aura Chan, Miko Chan. It's unorthodox. But it's ours.
What Chans Actually Does
The pitch is simple: plug voice into your existing AI.
You handle the intelligence. Chans handles the voice transport, processing, and everything around it.
Two modes:
Enhanced mode — Chans brings built-in RAG with hybrid search (vector + BM25), conversation persistence, and an embedded LLM. Upload your documents, configure a system prompt, and you have a voice agent that knows your content. Line-level citations included.
Passthrough mode — bring your own LLM entirely. Chans handles STT and TTS. Your transcript arrives via webhook, you return a response, Chans speaks it. Zero opinions about your AI stack.
Both modes get the same infrastructure: multi-tenant isolation, provider presets (swap STT/TTS/LLM without code changes), conversation history with semantic search, MCP tool integration, a full dashboard, and a published SDK with a clean state machine.
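To make the passthrough contract concrete, here's a minimal sketch of what a webhook handler on your side could look like. Everything in it is illustrative: the payload fields (`transcript`, `session_id`) and the response shape (`reply`) are my assumptions for this example, not the actual Chans API.

```python
def handle_transcript(payload: dict) -> dict:
    """Receive a transcript from the voice layer, call your own LLM,
    and return text for the platform to speak.

    Field names here are hypothetical -- the real API may differ.
    """
    transcript = payload.get("transcript", "")
    session_id = payload.get("session_id")

    # Your AI stack goes here: any LLM, any framework.
    # In production this function would sit behind an HTTP endpoint.
    reply = my_llm(transcript)

    return {"session_id": session_id, "reply": reply}


def my_llm(text: str) -> str:
    # Stand-in for your actual model call.
    return f"You said: {text}"
```

The point of the design is visible even in a toy: the voice layer never needs to know what `my_llm` is.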
The entire core is about 2,400 lines of Python. I've seen competitors at 60,000+.
The LiveKit Problem
I built Chans on LiveKit Agents — an open-source framework for real-time voice AI. It handles WebRTC, VAD, provider integrations. In theory, it's the perfect foundation.
In practice, I spent days debugging issues that had nothing to do with my code.
Agents wouldn't join rooms. Sessions would start, wait, then fire participant_disconnected without explanation. No tracks would subscribe. No errors, just silence.
Other times, the agent would get stuck in a "speaking" state with no audio output. The framework thought it was talking. Nothing was coming out. The only recovery was for the user to speak again, which would unstick the pipeline.
These aren't obscure edge cases. They're documented issues in the LiveKit repo, affecting multiple developers.
I'm not blaming LiveKit — real-time audio is genuinely hard, and they're iterating fast. But it made me question the dependency. If the transport layer is unreliable, everything built on top of it inherits that unreliability.
I'm considering removing the LiveKit dependency entirely in the future. Not sure yet. But the thought is there.
What I'm Not Building
I should be honest about what Chans isn't.
It's not a startup. There's no pitch deck. No investors. No customers beyond me.
It's in beta. Nobody else has used it. I'm still exploring whether the market even wants this, or whether I've just made LiveKit more complicated.
That doubt is real. Every builder of developer tools has felt it: am I solving a problem, or creating one?
Maybe that's the right question to sit with for a while.
What I Actually Learned
Even if Chans never gets a single user, the build was worth it.
Real-time audio is a different beast. Web development gives you request-response. Voice gives you streams, interruptions, silence detection, and latency budgets measured in milliseconds. Every architectural decision feels different when the user is literally waiting to hear a response.
Hybrid search outperforms pure vector. My RAG implementation combines 70% vector similarity with 30% BM25 full-text search. The BM25 component catches exact keyword matches that embeddings miss entirely. Line-level citations (file.md#L10-L25) make the results verifiable.
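The 70/30 blend is simple to sketch. This is a toy illustration of the scoring math only, not my actual implementation, and it assumes both scores are already normalized to [0, 1], which real systems usually have to arrange first (min-max or reciprocal-rank normalization).

```python
def hybrid_score(vector_score: float, bm25_score: float,
                 vector_weight: float = 0.7) -> float:
    """Blend dense-vector similarity with BM25 keyword relevance.

    Assumes both inputs are normalized to [0, 1].
    """
    return vector_weight * vector_score + (1 - vector_weight) * bm25_score


def rank(candidates):
    """candidates: list of (doc_id, vector_score, bm25_score) tuples.

    Returns (doc_id, blended_score) pairs, best first.
    """
    scored = [(doc, hybrid_score(v, b)) for doc, v, b in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A document with a strong exact keyword hit but a mediocre embedding can still win here, which is exactly the failure mode pure vector search has.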
Provider abstraction pays for itself immediately. Named presets for STT, TTS, and LLM providers mean I can swap from OpenAI to Deepgram to ElevenLabs without touching agent code. When a provider has an outage or raises prices, it's a config change.
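A named preset is, at its core, a lookup table between config and agent code. The preset names and provider strings below are hypothetical, just to show why a provider swap becomes a one-line config change instead of a code change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    stt: str
    tts: str
    llm: str


# Hypothetical preset registry -- provider names are illustrative.
PRESETS = {
    "default": Preset(stt="deepgram", tts="elevenlabs", llm="openai"),
    "budget": Preset(stt="whisper", tts="openai", llm="openai"),
}


def build_pipeline(preset_name: str) -> dict:
    """Resolve a preset name into concrete provider choices.

    Agent code only ever references the preset name; swapping a
    provider is an edit to PRESETS, not to agent logic.
    """
    preset = PRESETS[preset_name]
    return {"stt": preset.stt, "tts": preset.tts, "llm": preset.llm}
```

When a provider has an outage, the fix is editing one entry in the registry and restarting.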
Multi-tenancy is harder than the feature itself. Every table needs scoping. Every query needs filtering. Every endpoint needs auth. The agent logic was maybe 20% of the work. The operational infrastructure was the other 80%.
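The "every query needs filtering" point can be shown with a minimal scoping helper. Table and column names here are made up; the design point is that the tenant clause is enforced in one place rather than remembered at every call site.

```python
from typing import Optional


def scoped_query(table: str, tenant_id: str,
                 where: Optional[str] = None) -> tuple:
    """Build a SQL query that always filters by tenant.

    Centralizing the tenant clause turns a forgotten filter from a
    silent data-leak bug into something that can't happen.
    """
    clauses = ["tenant_id = %s"]
    params = [tenant_id]
    if where:
        clauses.append(where)
    sql = f"SELECT * FROM {table} WHERE {' AND '.join(clauses)}"
    return sql, params
```

Every endpoint then goes through this helper instead of writing raw queries, which is most of what "every query needs filtering" costs in practice.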
The XR Bet
Here's the long-term thinking, for whatever it's worth.
XR glasses are coming. Not the $3,500 Vision Pro kind — the open, hackable, bring-your-own-AI kind. When someone ships an affordable pair of open-source AR glasses, the first thing people will want is a voice interface to their own AI.
Not Siri. Not Alexa. Their AI. Running on their server. With their data.
That's what Chans is designed for. A self-hostable voice layer that connects to whatever LLM you're running.
Is this a real market? I have no idea. It might be five years away. It might never happen.
But if it does, I want to be the person who already built the infrastructure.
Why I Keep Building Things Nobody Asked For
I rejected a Lead Engineer promotion because it had too many meetings and not enough machines. That same principle is why I code from my phone, why I built VARGOS after INGRA failed, and why I built Chans without a business case.
Every project answers the same question: what if I removed the thing that was in my way?
For Chans, the thing in the way was this:
What if adding voice to AI was as easy as adding a webhook?
I don't know if anyone needs the answer. But I needed to build it.
That's enough for now.