Fujitsu Laboratories Limited today announced the development of speech interface technology that enables users to retrieve a variety of information by simply speaking into a smartphone, without having to look at the smartphone's display.
After listening to a synthesized speech read the latest news and other information, users can articulate the information that they would like to learn more about. The software will then read details about the topic and other related information. By taking advantage of this technology, users who are driving or working and need to keep their eyes and hands free can use various information services without having to look at or touch the smartphone's display.
Currently, most smartphones and other mobile devices are operated by the user touching the handset while looking at its display. However, mobile devices are also employed in other situations - such as walking, driving, and working - where users must keep their eyes and hands focused on the task at hand. In such scenarios, users can benefit from speech recognition technology that understands human speech and speech synthesis technology where devices are able to read text aloud.
In recent years, by employing devices to remotely access datacenters, where an abundance of computing resources can be utilized, it has become possible to develop speech recognition and synthesis technologies that handle a larger lexicon than has previously been possible on stand-alone devices. This has led to high expectations for the delivery of new and innovative services.
Fujitsu Laboratories has developed industry-leading technologies, including professional-level speech synthesis as well as speech recognition that can eliminate background noise and pick up only the user's voice. The company is currently aiming to enable new speech interfaces, including the development of datacenter-based speech recognition and synthesis technologies.
Speech-based input and output make it easy to receive news and other information services without looking at or touching a handset. To accomplish this - so that the system accurately pronounces news and other content and correctly recognizes words articulated by the user - it must properly support the ever-growing assortment of new terminology, including modern lingo. In addition, the system must be able to properly interpret homonym variants spoken by the user. Fujitsu Laboratories is working to overcome these challenges and to realize a highly original function for smooth and ideal communication.
About the Newly Developed Technology
To address these issues, Fujitsu Laboratories has developed a new eyes-free and hands-free speech interface in which, by simply speaking about what the user is interested in, the system pulls up relevant information and reads it out loud. For instance, when the user speaks a particular phrase from a news headline that the system has read, the system will read more detailed articles related to the topic at hand.
Features of the newly developed technology are as follows.
1. Speech dialogue knowledge building technology supports the latest modern lingo and newly coined terms
Language is constantly changing. To address this linguistic evolution, Fujitsu has developed technology that automatically extracts the orthographic patterns of new terminology from text found on the Internet and automatically adds them to the system's vocabulary dictionary. This makes it possible to create a speech interface that minimizes misread and misrecognized words.
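The press release does not detail the extraction method, but the basic idea of mining unknown terms from web text and folding them into a vocabulary dictionary can be sketched as follows. This is a hypothetical, highly simplified stand-in: `extract_new_terms`, the frequency threshold, and the whitespace-level tokenization are all assumptions for illustration; a production system would rely on morphological analysis and orthographic pattern matching.

```python
import re
from collections import Counter

def extract_new_terms(corpus_texts, known_vocabulary, min_count=3):
    """Collect frequent tokens that are absent from the known vocabulary.

    Hypothetical sketch: count unknown tokens across web texts and keep
    only those seen at least `min_count` times, treating repetition as
    weak evidence that a token is a genuine new term rather than noise.
    """
    counts = Counter()
    for text in corpus_texts:
        for token in re.findall(r"[A-Za-z']+", text.lower()):
            if token not in known_vocabulary:
                counts[token] += 1
    return {term for term, n in counts.items() if n >= min_count}

# Example: "selfie" is not yet in the dictionary but recurs in web text.
vocab = {"photo", "camera", "share"}
texts = ["Share your selfie", "A selfie with the camera", "Best selfie photo"]
new_terms = extract_new_terms(texts, vocab)
vocab |= new_terms  # automatically add the mined terms to the dictionary
```

Thresholding on frequency is the crudest possible filter; its only purpose here is to show where a smarter orthographic-pattern check would plug in.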
2. Technology that selects from homonym variants based on previous exchanges
Fujitsu has developed technology that analyzes information previously presented by the system, extracts vocabulary focused on certain topics, and automatically generates a speech recognition dictionary. As a result, the system is able to correctly recognize homonyms and other ambiguous phrases, thereby helping to facilitate accurate dialogue with the user.
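One way to picture this context-driven disambiguation is to score each written form of a homonym by how strongly its associated vocabulary overlaps with terms extracted from what the system recently presented. The function name, the word-association sets, and the overlap scoring below are all assumptions for illustration, not Fujitsu's actual method.

```python
def pick_homonym(candidates, context_terms):
    """Choose among homonym candidates using recent dialogue context.

    Hypothetical sketch: `candidates` maps each written form to content
    words typically associated with it; `context_terms` holds vocabulary
    extracted from information the system previously presented. The
    form whose associations best overlap the context wins.
    """
    return max(candidates, key=lambda form: len(candidates[form] & context_terms))

# "flour" and "flower" sound identical; prior topic vocabulary decides.
candidates = {
    "flour": {"bread", "baking", "recipe"},
    "flower": {"garden", "bloom", "petal"},
}
context = {"recipe", "bread", "oven"}  # terms from a cooking news item
chosen = pick_homonym(candidates, context)
```

In effect this is the dynamically generated recognition dictionary described above, reduced to a single set-intersection score.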
3. Technology for providing appropriate responses
When performing speech recognition and speech synthesis, the handset connects to a datacenter where a huge lexicon is stored and kept up to date. Fujitsu Laboratories has developed technology that, by dividing speech data and processing it in anticipation, absorbs the delays caused by processing and transmission in datacenter-based speech recognition and synthesis. In addition, the technology further improves perceived response times by controlling the timing of breaks between words. As a result, the user experience compares favorably with that of car navigation systems.
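The delay-absorption idea can be sketched as a pipeline: split the text at natural break points, then request synthesis of the next segment from the datacenter while the previous segment is still playing, so any waiting falls on phrase boundaries rather than mid-word. Everything here - the function names, the punctuation-based splitting, and the single-worker pipeline - is an assumed illustration of the general technique, not Fujitsu's implementation.

```python
import re
from concurrent.futures import ThreadPoolExecutor

def stream_synthesis(text, synthesize, play):
    """Overlap remote synthesis with local playback.

    Hypothetical sketch: divide the text into chunks at punctuation
    breaks, fetch the first chunk up front, and submit each following
    chunk to the (notionally remote) `synthesize` call while the
    current chunk plays, hiding transmission and processing delay.
    """
    chunks = [c for c in re.split(r"(?<=[.,;])\s+", text) if c]
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(synthesize, chunks[0])
        for nxt in chunks[1:]:
            audio = pending.result()
            pending = pool.submit(synthesize, nxt)  # fetch next while this plays
            play(audio)
        play(pending.result())

# Demo with stand-ins: uppercase "audio" and a playback log.
played = []
stream_synthesis("Hello, world. Goodbye.", str.upper, played.append)
```

Breaking at punctuation is what lets the system control where pauses land, which is the second half of the response-time technique described above.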
This technology enables users to retrieve information through a series of intuitive speech interactions, without looking at any displays. As a result, news, email, and other web services frequently used in daily life become available while driving or walking, or to users who have difficulty viewing a display. In addition, for audio tour systems employed in museums, the technology can provide more detailed information. For example, additional information could be offered just by saying a word that comes up in an audio tour or in a description of an exhibit.
Fujitsu Laboratories is developing this technology with the aim of commercializing it as a mobile user interface for cloud services within fiscal 2012. While performing field trials in the current fiscal year, the company will also explore a variety of other potential applications.