Amazon has unveiled significant enhancements to Alexa’s language processing and speech features, aimed at facilitating more fluid and human-like conversations.
First hinted at in May, the updated model enables Alexa to engage in discussions that feel more natural. The assistant gains new capabilities, including the ability to make API calls and improved personalization, which together enhance the accuracy of its factual responses.
Additionally, Amazon has revamped Alexa’s automatic speech recognition (ASR) system with upgraded algorithms and hardware. The new ASR system has been trained on a large body of multilingual audio data and can recover from interruptions in speech, using a mechanism designed to repair truncated responses.
A new speech-to-speech model has also been introduced, enabling Alexa to exhibit human-like conversational traits such as laughter. The updated system can also respond more appropriately to user cues, reflecting the user’s emotions during an interaction.
Amazon senior vice president Dave Limp showcased these advancements at an event at the company’s new headquarters in Arlington, Virginia, stating that conversations with Alexa would now feel “just like talking to another human being.”
In a notable change, users can now activate Alexa simply by looking at the screen of a camera-equipped device, eliminating the need for a wake word—a feature similar to that being introduced with Apple’s Siri. Enhanced visual processing capabilities combined with acoustic models allow the device to discern whether the user is addressing Alexa or another individual.
These new features are set to be rolled out in the coming months as part of CEO Andy Jassy’s vision to establish Alexa as “the world’s best personal assistant.”
To support this initiative, Amazon has formed a dedicated “central team” focused on ambitious AI projects, led by Rohit Prasad, the head scientist for Alexa, who reports directly to Jassy.