How do you feel about combining voice agents with Generative UI?
I've been thinking about the future of voice agents and wondering if pure voice is actually the best interface.
Most discussions focus on just one of these:
● Voice-only assistants
● Chat-based assistants
● Generative UI experiences
But what if they were combined?
For example, instead of a voice agent simply responding with words:
User: "Show me my portfolio."
The agent could respond verbally while also generating an interactive UI containing charts, filters, recent transactions, and actions.
Or:
User: "Find me a flight to Bangalore next weekend."
Instead of reading out 20 options, the agent could generate a visual card layout while continuing the conversation.
In this model, voice becomes the conversational input/output layer, while the UI is generated dynamically from the user's intent and context.
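To make that concrete, here is a rough sketch of how I picture a single agent turn: one response object carrying both the words to speak and a declarative description of the UI to render. Everything in it is hypothetical and only for illustration (the `AgentTurn` and `UIComponent` types, the intent names, the component kinds, and the `respond` function); a real system would presumably have a model produce the UI spec rather than hard-coded branches.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class UIComponent:
    """A declarative description of one UI element (hypothetical schema)."""
    kind: str                                   # e.g. "chart", "card_list"
    props: Dict[str, Any] = field(default_factory=dict)


@dataclass
class AgentTurn:
    """One agent response: what to say plus what to render."""
    speech: str
    ui: List[UIComponent] = field(default_factory=list)


def respond(intent: str, context: Dict[str, Any]) -> AgentTurn:
    """Map an already-recognized intent and its context to speech and UI.

    The intent names and component kinds are assumptions made for this sketch.
    """
    if intent == "show_portfolio":
        return AgentTurn(
            speech="Here is your portfolio. You can filter it on screen.",
            ui=[
                UIComponent("chart", {"series": context.get("holdings", [])}),
                UIComponent("transaction_list",
                            {"items": context.get("transactions", [])}),
            ],
        )
    if intent == "find_flights":
        flights = context.get("flights", [])
        return AgentTurn(
            # Summarize by voice and leave the full list to the cards.
            speech=f"I found {len(flights)} flights. The options are on screen.",
            ui=[UIComponent("card_list", {"items": flights})],
        )
    # Fall back to voice only when there is nothing useful to show.
    return AgentTurn(speech="Sorry, I didn't catch that. Could you rephrase?")


turn = respond("find_flights", {"flights": [{"id": "A1"}, {"id": "B2"}]})
print(turn.speech)
print([component.kind for component in turn.ui])
```

The point of the shape is that the voice channel stays short (a summary and a prompt to continue) while the detail goes into the UI spec, which the client can render however it likes.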
I'm curious what others think:
● Is voice + Generative UI the natural evolution of AI assistants?
● Are there products already doing this well?
● When should an AI speak versus generate a visual interface?
● Would users actually prefer this over traditional apps?
Interested to hear thoughts from people building voice agents, GenUI systems, or multimodal products.