As published by Speech Technology Magazine
Embedding Speech into Mobile Devices
By Phillip Britt
Anne Rosenfeld, a Boston, Mass.-based producer of neuroscience conferences for educators, found the 75-minute (one-way) commute between her home and office to be a big waste of time until a dead cell phone helped her become more productive.
Needing a new portable phone to keep in touch with her office, suppliers, hotels, family, and friends, Rosenfeld bought a Samsung phone, which she later learned was speech-enabled with software from VoiceSignal.
Before she had a speech-enabled device, Rosenfeld was very hesitant to use a cell phone in the car due to safety concerns. At 60 years old, she finds the buttons on a cell phone difficult to see, making dialing while driving a dangerous proposition, she explains. So the phone would often go unused during her commute, leaving her with 12 or more hours a week of unproductive time. The loss of productive time was especially frustrating when the commute doubled, as it sometimes did in heavy traffic or poor weather.
Now that she has a phone with embedded speech, she uses voice commands to tell the device to call her office, family or friends. Even the social calls add to her productivity, Rosenfeld explains. As president of her company, she has the freedom to make those calls from the office, but now she can also make them from the car, allowing her to be more productive when at the office. “It’s made all of the difference,” Rosenfeld says. “Now I’m not endangering myself or anyone else when I use the phone while driving. Now that I have it, I wouldn’t want to be without it.”
Rosenfeld’s experience is just one of many that point to the benefits of embedded speech in handheld devices, from phones to PDAs to laptop computers. The development of embedded speech started to take hold in 2004 and is expected to grow even more in 2005 as the devices themselves become more powerful and additional applications are developed and gain acceptance.
Many of the newest applications are speaker-independent, so anyone can use them without training the device to learn a specific voice. And, unlike some earlier voice-activated dialing systems, today’s versions are embedded in the device rather than in the network (e.g., Sprint’s initial voice dialing application), says Rich Geruson, VoiceSignal CEO. The company has released embedded speech applications for PalmOne Treo and Nokia smart phones.
Geruson explains that speaker-dependent applications require the user to speak each name and number into the device for later use in the voice-dialing mode. While that may be acceptable for those with a short contact list, entering a lengthy list of names and numbers is a cumbersome, time-consuming process. Some of today’s speech applications synchronize with contact lists, which can be transferred from desktop to handheld devices, so no additional input is necessary.
Alan Schwartz, vice president and general manager for embedded speech solutions at ScanSoft, Inc., adds that speaker-dependent applications require comparison of .wav files to confirm the user before continuing the application (e.g., dialing a person). Speaker-independent applications have no such requirement.
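The distinction can be pictured in code. Below is a minimal sketch under stated assumptions: the feature strings, similarity measure, and threshold are hypothetical stand-ins for real acoustic processing, not VoiceSignal’s or ScanSoft’s actual methods. The speaker-dependent approach matches a new utterance against templates the user recorded; the speaker-independent approach simply looks up a recognized name in the synced contact list.

```python
# A minimal sketch of the two approaches, for illustration only.
# The feature strings, similarity measure, and threshold below are
# hypothetical stand-ins for real acoustic models.

from difflib import SequenceMatcher
from typing import Optional

# Speaker-dependent: the user enrolled each name by recording it once.
enrolled_templates = {
    "office": "OW-F-AH-S",   # stand-in for a stored .wav template
    "anne": "AE-N",
}

def match_template(utterance: str) -> Optional[str]:
    """Match a new utterance against the user's recorded templates."""
    best, best_score = None, 0.0
    for name, template in enrolled_templates.items():
        score = SequenceMatcher(None, utterance, template).ratio()
        if score > best_score:
            best, best_score = name, score
    return best if best_score > 0.6 else None   # reject weak matches

# Speaker-independent: no enrollment; the recognition grammar is built
# from the contact list synced from the desktop, so any voice works.
contacts = {"office": "555-0100", "anne rosenfeld": "555-0199"}

def dial_by_name(recognized_name: str) -> Optional[str]:
    """Look up a recognized name in the synced contact list."""
    return contacts.get(recognized_name.lower())
```

The practical difference is the enrollment step: the template dictionary above must be populated by the user one name at a time, while the contact dictionary comes for free from synchronization.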
Applications for the Vision-Impaired
One of the latest applications for handheld devices, designed for the vision-impaired, comes from ScanSoft and Cingular. According to the American Foundation for the Blind, approximately 10 million Americans are blind or have low vision. While there are numerous desktop applications designed for the vision-impaired, the Cingular TALKS program, using ScanSoft’s ETI-Eloquence text-to-speech software, is one of the first for the wireless phone market.
Erodio Diaz, who operates E. Diaz Real Estate & Associates Inc., has been completely blind since 1955 and has looked for ways to make himself as self-sufficient as possible ever since. He has had portable communications since the days of the “brick phone,” but found many of the Internet and text-messaging features of more recent devices to be of little use due to his disability. While he could dial the phone (currently a Nokia 6620) easily enough and could type in text messages and emails, his disability prevented him from checking typed messages for accuracy, so he was reluctant to send them.
With his real estate and mortgage business, Diaz estimates that he spends 50 to 60 percent of his time out of the office (with the aid of a sighted assistant). The increasing proliferation of email and text messaging made him eager to find a way to use those capabilities without being in the office or relying on a sighted person.
A long-time customer of Cingular and its predecessor, Diaz had discussed the needs of the visually impaired with the company and found a sympathetic ear, so he eagerly agreed to be one of the pilot customers for the software, which was offered to all Cingular customers in September 2004.
The application reads back numbers and letters as Diaz enters them, as well as entire messages, enabling him to confirm their accuracy before sending. Similarly, the application reads aloud any text messages or emails that Diaz receives on the device.
Additionally, Diaz can access spoken information about the phone’s battery level, network and signal strength, caller ID information such as logs of incoming and outgoing calls, and other settings (e.g., ring tones) or tools (e.g., the calculator).
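The readback behavior is straightforward to picture. The sketch below assumes a hypothetical speak() function standing in for a text-to-speech engine; the actual TALKS and ETI-Eloquence interfaces are not shown here.

```python
# Minimal sketch of character echo and message readback, assuming a
# hypothetical speak() function; a real implementation would call the
# device's text-to-speech engine instead of printing.

def speak(text: str) -> None:
    """Stand-in for the phone's text-to-speech engine."""
    print(f"[TTS] {text}")

def on_key_entered(char: str) -> None:
    # Echo each digit or letter aloud as it is typed.
    speak(char)

def read_back(message: str) -> None:
    # Read the complete message aloud so it can be verified before sending.
    speak(f"Your message reads: {message}")

# Example: composing and verifying a short text message.
for ch in "OK":
    on_key_entered(ch)
read_back("OK")
```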
“I’m three to four times more productive than I was before. That’s very meaningful to me,” Diaz says.
The only drawback Diaz cites is that the speech functionality doesn’t work when he uses the conference-calling feature. Even with this disadvantage, the mobile speech application “is wonderful,” Diaz says. “It’s opened a whole new world of accessibility.”
Warehouse Applications
Embedded speech is also making an impact in the distribution chain of different industries as companies outfit workers with wearable computers to make them more productive.
For example, CooperVision, Inc., the world’s fourth largest contact lens manufacturer, uses wearable computers in its Rochester, N.Y. manufacturing facility to aid workers in preparing 10,000 shipments each day. The facility has 25 to 30 headsets and wearable computers running Princeton, N.J.-based Voxware’s VoiceLogistics embedded speech application. The devices communicate via an 802.11b wireless local area network. Similar applications in other distribution facilities may also incorporate RFID and Bluetooth wireless technology.
The application tells the worker via the headset where to go, what to pick up, and where to put it. It is more than the simple recording one might hear at a museum, explains Jeff McCaffery, CooperVision’s logistics business analyst.
At CooperVision, the user scans orders into a file, which sends the information to the speech application. The user logs onto a personal cart and puts on the headset and computer, which tells him which aisle contains a specific item, then where in the aisle to find it. The worker speaks into the headset’s microphone to confirm each item as he retrieves it. The device then tells him which slot in the cart to place the item in, and where to obtain the next item.
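The workflow amounts to a simple prompt-and-confirm loop. The following is a minimal sketch under stated assumptions: the PickTask structure, the prompts, and the confirm() helper are hypothetical illustrations, not Voxware’s actual VoiceLogistics interfaces.

```python
# Minimal sketch of a voice-directed picking loop. All names and data
# structures here are hypothetical; Voxware's actual interfaces differ.

from dataclasses import dataclass
from typing import List

@dataclass
class PickTask:
    aisle: str
    location: str    # position within the aisle
    item: str
    cart_slot: str   # where the item goes on the cart

def speak(text: str) -> None:
    """Stand-in for the headset's text-to-speech output."""
    print(f"[HEADSET] {text}")

def confirm(prompt: str) -> bool:
    """Stand-in for recognizing the worker's spoken confirmation."""
    speak(prompt)
    return True      # assume the worker confirms each step

def run_pick_route(order: List[PickTask]) -> None:
    for task in order:
        speak(f"Go to aisle {task.aisle}, location {task.location}.")
        speak(f"Pick {task.item}.")
        while not confirm("Say 'got it' when picked."):
            speak(f"Repeating: pick {task.item}.")
        speak(f"Place it in cart slot {task.cart_slot}.")
    speak("Route complete. Return to the dock.")

# Example: a one-item route.
run_pick_route([PickTask("A3", "bin 14", "toric lens, lot 4417", "2")])
```

The design point is that each prompt requires a spoken confirmation before the loop advances, which is what keeps the worker’s eyes and hands free while still verifying every pick.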
“We’ve been growing at 15 to 20 percent per year. With 450,000 SKUs, we needed a way to continue to grow quickly without adding more staff,” McCaffery relates. Not only has the company accomplished that goal, it has also cut the mandatory overtime other employees work to one-seventh of what it was before the system was installed.
There are non-speech systems that promise similar productivity enhancements by using lights to direct the worker to the right location, but those systems are more expensive than the Voxware solution or similar offerings, McCaffery says.
The Rochester installation was the first of a planned four-stage process. CooperVision also has facilities in Hamble, England; Huntington Beach, Calif.; Helsinki, Finland; Adelaide, Australia; Madrid, Spain; Norfolk, Va.; and Toronto, Canada.
CooperVision is an early adopter in the distribution market, according to Steve Gerrard, Voxware, Inc. vice president of marketing. The main impediment to such applications in the warehouse market had been noise from various machinery.
CooperVision employees can adjust the volume of the headsets to hear over any ambient noise, and improvements in headsets, microphones, hardware, and software have eliminated much of the problem. Gerrard therefore expects more demand in this market, as well as in medical, military, and other environments where someone needs to receive communications while keeping his eyes and hands free.
“Five years ago [these applications] were limited to early adopters and cost savings were very tenuous,” says Gerrard. “In the last year, companies have moved beyond the experimental stage. Dozens are making investments in this technology.”
Vocollect, which offers a competing speech application for the warehouse environment, expects to continue its 60 to 70 percent growth rate in shipments, which vice president of product management Larry Sweeney says reflects the increasing adoption of speech-enabled applications.
Vocollect’s application is used in some food storage environments. Before that was a viable option, the hardware had to be able to withstand extreme hot and cold temperatures, high humidity, and movement from a refrigerated area to a warm interior or exterior environment, Sweeney says.
Medical Applications
The increasing computing power of mobile devices has made older applications with speech capabilities more usable in a mobile environment. The Dragon NaturallySpeaking® application has been around since the 1990s, but only in the last year or so have tablet PCs and similar handheld devices had the power to make efficient use of the program, according to Dr. Eric Fishman, an orthopedic specialist who started using the software 10 years ago and eventually became a reseller of the application, which is now owned by ScanSoft.
His interest in speech applications for the medical profession came from his interest in electronic medical records. Dr. Fishman tried to create a template to help physicians automatically enter information for different auto accident injuries. He created a list of 300 different injuries for a point-and-click application, but found that even that lengthy list was woefully inadequate in terms of actual injuries and related medical information.
Writing or typing the information is a tedious process, so Dr. Fishman was intrigued by the efficiency offered by the Dragon speech application. Point-and-click templates can capture only a small portion of medical records, which take up rows and rows of file cabinets in medical facilities across the United States. Different health care providers and different pharmacies may hold different information about a person, leading to potentially conflicting information about the medicines a person is taking, allergies to certain drugs, and so on.
So Dr. Fishman is also a proponent of electronic medical record keeping, which embedded speech applications on portable devices such as tablet PCs and PDAs facilitate by allowing doctors to speak information into the device rather than entering it by keyboard.
“There’s a tremendous push for increased usage of EMR to reduce errors,” says Dr. Fishman, who points to President Bush’s 2004 State of the Union address in which he called for increased use of electronic medical records.
Looking Ahead
Speech applications providers will continue to advance applications in 2005, adding unconstrained speech, speech-launched browser applications, and more applications for the visually impaired. Some of these applications exist in pilot stages today and will become more viable as the handheld devices continue to become more powerful.
As applications and device processing power continue to evolve, speech in handheld devices will quickly move from the realm of early adopters to the early mass market.