Building · December 5, 2025 · 6 min read

Why I Built a Voice AI Platform Nobody Asked For

No pitch deck. No customers. Just a hunch that voice AI shouldn't require stitching five services together. This is how Chans.ai started - in a car, singing along to a Japanese pop song with my kids.

voice-ai · livekit · open-source · side-project · python

I built a voice AI platform with no customers, no confirmed market demand, and no business plan. I did it because the alternative was unbearable: I had watched my team spend weeks stitching up to five services together just to make an LLM talk. The market didn't ask for this. I built it because I couldn't stop thinking about it.

[Image: voice waveforms transforming into digital API infrastructure - analog to digital, voice to code]


The Spark

At Vervio, we built an AI retail representative, a voice-based consumer product. I wasn't directly involved, but watching from a distance I saw the same pattern everywhere: the team would wire up speech-to-text, an LLM, text-to-speech, and a real-time media layer, then spend weeks connecting the components. That felt unreasonable. Around the same time, I began following XR development and the rise of open-source headsets, convinced that people would eventually use their own AI through smart glasses, which demands a real-time, low-latency voice interface rather than a chatbot with a voice bolted on. And everywhere I looked, the voice part was a disaster.


The Name

The name came from my daughter. We were in the car, my nine-year-old and four-year-old singing the popular Japanese song "AI SCREAM!" (愛♡スクリム!) together. It wasn't the first company name I considered, but it hit my ears and stuck in my head. The song plays on "chan," the Japanese honorific, and my daughter said it sounded like a person's name. I checked the domain, found chans.ai available, and the voice personas followed the theme: Aura Chan, Miko Chan. It looks unusual, but it's our name.


What Chans Actually Does

Plug voice into your existing AI. Two modes:

  • Passthrough mode - bring your own LLM. Chans handles STT and TTS. Your webhook gets the transcript, you respond, Chans speaks it out. Zero opinions about your AI stack.
  • Enhanced mode - Chans runs the LLM too. Upload your knowledge base and get a voice agent that knows your content with line-level citations.

Both modes share common infrastructure: multi-tenant isolation, provider presets (swap STT/TTS/LLM without code changes), persistent conversation history with semantic search, and MCP tool integration. The platform includes a full dashboard and an SDK with a clean state machine.
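The provider-preset idea can be sketched as follows. The preset names, vendor strings, and `Preset` structure are assumptions about one way to implement it, not Chans's actual config schema:

```python
from dataclasses import dataclass

# Sketch of provider presets: pipeline code depends only on the abstract
# roles (STT, LLM, TTS); a named preset picks the concrete vendors.
# All names below are illustrative, not the platform's real schema.

@dataclass
class Preset:
    stt: str
    llm: str
    tts: str

PRESETS = {
    "quality": Preset(stt="deepgram", llm="openai:gpt-4o",      tts="elevenlabs"),
    "budget":  Preset(stt="whisper",  llm="openai:gpt-4o-mini", tts="openai"),
}

def build_pipeline(preset_name: str) -> Preset:
    # Swapping vendors is a one-line config change, no pipeline edits.
    return PRESETS[preset_name]

print(build_pipeline("budget").llm)  # -> openai:gpt-4o-mini
```

Each vendor string would map to an adapter behind a common interface, which is what makes "swap STT/TTS/LLM without code changes" possible.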


The entire core is about 2,400 lines of Python. I have seen competitors at 60,000 plus.


The LiveKit Problem

I built Chans on LiveKit Agents, an open-source framework for real-time voice AI that manages the WebRTC layer, VAD, and connections to the various providers. In theory it's perfect; in practice it was a nightmare. Agents wouldn't join rooms. Sessions would start, stall, and fire participant_disconnected with no explanation. Track subscriptions never arrived and no errors were reported, only silence. Sometimes the agent sat in a "speaking" state while no audio came out; the framework thought it was talking, but nothing reached the user, and the only recovery was to have the user speak again to unstick the pipeline. These are not hypothetical edge cases but documented issues affecting multiple developers, with no resolution timeline. I'm not blaming LiveKit: real-time audio is hard and they iterate quickly. But I learned that if the transport layer is unreliable, everything built on top of it inherits that unreliability, and I'm now considering removing the LiveKit dependency entirely, though I haven't decided yet.


What I'm Not Building

I should be clear about what Chans is not: it's not a startup. There are no launch plans, no investors, and no customers besides me. It's still in testing and nobody has used it yet. I don't know whether the market actually needs this or whether I've just built a more complicated wrapper around LiveKit. That uncertainty is real, and every developer who builds on a hunch has felt it. The core question is whether I'm solving a problem or creating one.


What I Actually Learned

Even if Chans never gets a single user, the build justified itself.

Real-time voice is different. Web development gives you request-reply. Voice gives you streams, interruptions, silence detection, and latency budgets measured in milliseconds. Every infrastructure decision matters when a user is waiting to hear a response.

Hybrid search outperforms pure vector. My RAG layer combines 70% vector similarity with 30% BM25 full-text search; BM25 captures keywords that embeddings miss entirely, and line-level citations make the results verifiable.

Provider abstraction pays for itself immediately. I can swap between providers like OpenAI, Deepgram, or ElevenLabs with a simple config change.
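The 70/30 blend can be sketched as simple score fusion: normalize each score list to [0, 1], then weight vector similarity at 0.7 and BM25 at 0.3. The min-max normalization is my assumption about one reasonable way to make the two scales comparable:

```python
# Sketch of 70% vector / 30% BM25 hybrid scoring. Min-max normalization
# is an illustrative choice; real systems may use other fusion schemes.

def normalize(scores: list[float]) -> list[float]:
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_scores(vector: list[float], bm25: list[float],
                  w_vec: float = 0.7, w_kw: float = 0.3) -> list[float]:
    v, k = normalize(vector), normalize(bm25)
    return [w_vec * a + w_kw * b for a, b in zip(v, k)]

# Doc 0 wins on embeddings, doc 2 wins on exact keywords;
# the fusion lets semantic similarity dominate without ignoring keywords.
scores = hybrid_scores(vector=[0.9, 0.5, 0.2], bm25=[1.0, 0.0, 8.0])
print(max(range(3), key=scores.__getitem__))  # -> 0
```

The normalization step matters: raw BM25 scores are unbounded while cosine similarity sits in a fixed range, so blending them directly would let one signal silently swamp the other.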


The XR Bet

XR glasses are coming. Not the Vision Pro, which costs up to $3,500, but the open, hackable, bring-your-own-AI kind. When someone ships an affordable pair of open-source AR glasses, the first thing people will want is a voice interface to their own AI: not Siri or Alexa, but an AI that runs on their own server with their own data. That is what Chans was designed to be, a self-hostable voice layer that works with any LLM you're running. I don't know whether this market materializes in five years or never, but if it does, I want to have built the infrastructure.


Why I Keep Building Things Nobody Asked For

I turned down a Lead Engineer promotion because it meant too many meetings and not enough time with machines. The same principle drove me to set up coding from my phone, to build VARGOS after INGRA failed, and to figure out how to run AI locally. Every project answers the same question: what if I removed the obstacle? For Chans, the obstacle was this:

What if adding voice to AI was as easy as adding a webhook?

I do not know if anyone needs the answer. But I need to build it.

That is enough for now.