A landmark partnership between IBM and ElevenLabs is moving enterprise AI beyond text, delivering natural, secure, and scalable voice-first agents. The collaboration integrates ElevenLabs’ premium Text-to-Speech (TTS) and Speech-to-Text (STT) capabilities with IBM’s watsonx Orchestrate platform, enabling organizations to build voice-enabled AI agents that communicate with nuance, emotion, and rhythm across 70 languages.
The integration extends agentic AI from traditional text-based systems to voice-first interactions, letting enterprises replace robotic call flows with human-like conversations. It also addresses enterprise security and compliance needs, including PCI compliance for payment processing and HIPAA-compliant data handling through Zero Retention Mode.
Industry applications span government services, banking, healthcare, insurance, and utilities, where AI phone agents can now converse in multiple languages with regional accents. Internal use cases include helping employees navigate legacy systems and retrieve complex compliance documentation via simple voice commands.
ElevenLabs reached $330 million in annual recurring revenue in 2025 and was valued at $11 billion following a $500 million Series D funding round in February 2026. The company’s voice library contains over 10,000 voices.
Nick Holda, Vice President of AI Technology Partnerships at IBM, stated: “We’re bringing a voice to AI Agents in the enterprise. As clients increasingly deploy agentic AI that interacts with their customers and employees, they want these experiences to feel intuitive, responsive and accessible.”
Mati Staniszewski, Co-Founder of ElevenLabs, added: “AI agents are becoming central to everyday work, and voice is where AI either earns trust or loses it.”
The collaboration underscores a shift toward human-centered AI interfaces that adapt to natural speaking habits, moving beyond rigid call flows toward empathetic, efficient voice interactions that can scale globally.






