Voice Commerce


A system of advanced speech technology, voice commerce (or v-commerce) lets computers understand and respond to natural human speech via telephone. Essentially a "voice Web page" used in tandem with Internet Web sites, v-commerce uses speech recognition technology to allow customers to interact with a business' Web site to place orders, check status, or obtain information by simply speaking into the telephone. These technologies also allow callers to access e-mail, traffic information, news, sports scores, stock prices, and travel information, putting Web-based information in the hands of customers anytime, anywhere.

These systems, called voice portals or "vortals," are attractive to businesses wanting to save on labor costs and improve customer service. Businesses can reduce call center staffing by directing calls to the interactive voice portal, a system whose accuracy rivaled that of human agents by 2002. Efficiency also increases: because voice applications integrate directly with a company's Web site, separate databases are unnecessary, and numerous calls may be handled simultaneously, which raises call volume while reducing the customer attrition caused by lengthy on-hold times.

What sets v-commerce apart from general e-commerce is customers' use of mobile phones. The Kelsey Group, a new-economy research and analysis organization, predicted that by 2005 over 45 million American cell phone owners would access voice portals regularly. For instance, a customer with a computer, Internet connection, and landline phone might choose the computer to access a company's Web site. Once the same client is out of the office, however, Internet access is available largely through wireless devices like palmtop computers and cell phones with text messaging. Because these devices are inappropriate for use while driving, and their tiny screens and minuscule touch pads are difficult to maneuver, v-commerce has emerged as a logical market.

Despite its high-tech applications, v-commerce can also target low-tech customers. Customers with no Internet access at all can telephone familiar businesses like Amazon.com, opening a new customer base to online companies. The systems can even be accessed by old-style pulse phones that do not generate touch tones. Consumers in this sector benefiting from speech recognition technology include the elderly, children, drivers, and the blind and otherwise disabled as well as travelers separated from their computers.

Although many voice portals are merely an addition to a company's sales platforms (store, catalog, and Web site), others rely on advertising for support. Some voice portals offer advertisers "per-hit," fee-based audio spots, charging higher rates to target specific audiences. For accurate billing, the voice portal must be able to count the number of hits for each three-to-five-second spot.
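The per-hit billing described above amounts to keeping a play count for each advertiser and rate tier. The following is a minimal sketch of that bookkeeping, not any real portal's billing system: the class name, the tier names, and the per-hit rates (in cents) are all hypothetical values chosen for illustration.

```python
from collections import Counter

# Hypothetical per-hit rates in whole cents; targeted spots cost more.
RATE_CENTS = {"general": 2, "targeted": 5}


class SpotCounter:
    """Counts audio-spot plays per advertiser and rate tier."""

    def __init__(self):
        self.hits = Counter()

    def record_play(self, advertiser, tier):
        # Reject unknown tiers so every counted hit can be billed.
        if tier not in RATE_CENTS:
            raise ValueError(f"unknown rate tier: {tier}")
        self.hits[(advertiser, tier)] += 1

    def invoice_cents(self, advertiser):
        # Sum hits multiplied by the rate for each tier the advertiser used.
        return sum(
            count * RATE_CENTS[tier]
            for (name, tier), count in self.hits.items()
            if name == advertiser
        )


counter = SpotCounter()
for _ in range(3):
    counter.record_play("Acme", "general")
for _ in range(2):
    counter.record_play("Acme", "targeted")
print(counter.invoice_cents("Acme"))  # 3 * 2 + 2 * 5 = 16
```

Working in whole cents avoids the rounding drift that floating-point dollar amounts would introduce over many small charges.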

Simple voice recognition systems have been in place for many years. American Airlines and United Airlines use speech recognition to convey flight arrival times. Sears uses speech recognition to connect callers to the right department without delay; its system handles 200,000 calls daily and paid for itself in 90 days.

Other systems include more complex, interactive services. Office Depot's v-commerce number takes orders over the phone, recording quantity and color for future orders; the company reports that 5 percent of its retail catalog sales are handled through speech recognition at an average savings of 88 percent per call.

In addition to taking orders, voice portals can also facilitate delivery, return, and exchange. In the case of United Parcel Service, voice recognition handles half of its incoming calls, completing them up to 90 seconds faster than calls handled by human agents. These v-commerce interactions include order placement, issue and review of tracking numbers, and product availability inquiries.

Another service-oriented application involves automated oil-change reminder calls for clients who agree to participate. Jiffy Lube's two-way phone system lets customers tell the computer to connect them to a service center, send the reminder to their home, or even shop for the company's products. Since many provide their cell numbers, they are likely to receive the call in their cars, allowing them to check their mileage and confirm the timing for an oil change.

V-commerce technology has evolved rapidly, improving on some familiar interactive telephone technologies. Interactive voice response (IVR) systems, in which customers respond to voice prompts using push button touch tones, force customers to wade through layers of voice menus to find information. But with the addition of Automated Speech Recognition (ASR) engines, IVRs can bypass touch tones in favor of vocal recognition and communication.
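The difference between touch-tone menus and speech-enabled routing can be illustrated with a small sketch. It assumes the caller's speech has already been transcribed to text by an ASR engine; the menu tree, the keyword sets, and both function names are hypothetical, and simple keyword overlap stands in for the far more capable language understanding a real engine would use.

```python
import re

# Hypothetical two-level touch-tone menu: callers press one key per layer.
MENU = {
    "1": {"label": "orders", "children": {
        "1": {"label": "place an order"},
        "2": {"label": "order status"},
    }},
    "2": {"label": "store information", "children": {
        "1": {"label": "store hours"},
        "2": {"label": "store locations"},
    }},
}

# Hypothetical keywords that map a transcript straight to a destination.
INTENTS = {
    "place an order": {"buy", "purchase", "place"},
    "order status": {"status", "track", "tracking"},
    "store hours": {"hours", "open", "close"},
    "store locations": {"location", "locations", "nearest", "address"},
}


def navigate_touch_tone(keys):
    """Walk the menu one key press at a time and return the final label."""
    level, node = MENU, None
    for key in keys:
        if level is None or key not in level:
            raise ValueError(f"invalid key press: {key}")
        node = level[key]
        level = node.get("children")
    if node is None or level is not None:
        raise ValueError("key sequence stops before reaching a destination")
    return node["label"]


def route_by_speech(transcript):
    """Pick the destination whose keywords best overlap the transcript."""
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    best = max(INTENTS, key=lambda label: len(words & INTENTS[label]))
    return best if words & INTENTS[best] else None


print(navigate_touch_tone("12"))                    # order status
print(route_by_speech("What is my order status?"))  # order status
```

The touch-tone caller needs one key press per menu layer, while the spoken request reaches the same destination in a single step.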

Voice commerce comprises three technologies. Speech recognition transforms the caller's words to text, so customer commands are input as if the client were interacting with the Web site via computer keyboard. Text-to-speech (TTS) translates written words to speech, letting the computer answer customer queries by converting its text response into audible speech. Lastly, speech authentication can identify individual callers by "listening" to their unique voices, eliminating the need for Personal Identification Numbers (PINs) or passwords.
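How the three technologies fit together on a single call can be sketched as a short pipeline. This is an illustration under stated assumptions rather than a real engine: the VoiceSession class and its four pluggable functions are hypothetical, and the stand-in "engines" below simply treat the audio bytes as text of the form "caller|command" so that the example runs without any speech software.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VoiceSession:
    identify: Callable[[bytes], Optional[str]]  # speech authentication
    recognize: Callable[[bytes], str]           # speech recognition
    handle_query: Callable[[str, str], str]     # the Web site's text logic
    synthesize: Callable[[str], bytes]          # text-to-speech

    def handle_call(self, audio: bytes) -> bytes:
        # 1. Identify the caller by voice instead of asking for a PIN.
        caller = self.identify(audio)
        if caller is None:
            return self.synthesize("Sorry, I could not verify your voice.")
        # 2. Turn the spoken command into text, as if it had been typed.
        command = self.recognize(audio)
        # 3. Let the site answer in text, then convert the reply to speech.
        return self.synthesize(self.handle_query(caller, command))


# Stand-in engines for demonstration only (not real speech processing).
KNOWN_CALLERS = {"alice"}


def fake_identify(audio):
    name = audio.decode().split("|", 1)[0]
    return name if name in KNOWN_CALLERS else None


def fake_recognize(audio):
    return audio.decode().split("|", 1)[1]


def fake_site(caller, command):
    return f"{caller}, your request was: {command}"


def fake_synthesize(text):
    return text.encode()


session = VoiceSession(fake_identify, fake_recognize, fake_site, fake_synthesize)
print(session.handle_call(b"alice|check order status"))
```

Passing the engines in as functions keeps the call flow separate from any particular vendor's recognition, synthesis, or authentication product.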

Although ASRs use complex algorithms to translate mispronunciations and mumbles, problems still affected speech recognition in 2002. These included background noise from cell phones, confusion over homonyms, echo, traffic noise, sudden noise activating speech recognition, and unusual user accents. However, "barge-in" technologies have been developed that allow callers to speak over outbound voice prompts, while speech-enabled scripting languages like VoiceXML are starting to extend Internet capabilities to the telephone by creating audio dialogue.
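To give a sense of what a VoiceXML audio dialogue looks like, the sketch below generates a minimal document with Python's standard library. The element and attribute names (vxml, form, field, prompt, bargein) follow the W3C VoiceXML 2.0 specification as best understood here; the function name, the form id, and the question text are hypothetical, and support for the built-in "boolean" field type is assumed and may vary by platform.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_dialog(question: str) -> str:
    """Return a one-question VoiceXML dialogue as a string."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "reminder"})
    # A yes/no field; type="boolean" is assumed to be supported.
    field = ET.SubElement(form, "field", {"name": "answer", "type": "boolean"})
    # bargein="true" lets the caller answer before the prompt finishes.
    prompt = ET.SubElement(field, "prompt", {"bargein": "true"})
    prompt.text = question
    return ET.tostring(vxml, encoding="unicode")


print(build_dialog("Would you like to schedule an oil change?"))
```

A voice browser that fetched such a document would speak the prompt, listen for a yes or no, and store the result in the field named answer.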

Companies can opt for multilingual speech applications for regional populations—Spanish services for large Hispanic communities in New York, Florida, Texas, California, and Puerto Rico, for example. These technologies are expanding internationally, with numerous systems operating in Scandinavian countries, Italy, France, Germany, and Central America. Development of a new language ASR engine requires data from about two thousand native speakers, while a new TTS platform translates text to speech by using significant data from just one speaker.

Some larger Internet companies—Yahoo.com, America Online, and Priceline.com—have developed their own voice portals. America Online bought its own voice portal company (Quack.com) to develop the AOL by Phone service, which provides Web-based telephone e-mail and news headlines. But others find it more efficient to outsource, purchasing voice portal development and maintenance services from companies like BeVocal, TellMe, NetByTel, HeyAnita, and VocalPoint Technologies.


Eagle, Gene. "From E-Commerce to V-Commerce: The Voice Portal Revolution." Speech Technology Magazine, September/October 2000. Available from http://www.speechtechmag.com/issues/5_5/cover/217-1.html.

Easton, Jaclyn. "V-Commerce: The Voice Advantage." Going Wireless. Harperbusiness, May 2002.

Harman, Greg. "Understanding V-Commerce." Voice XML Review, January 2002. Available from http://www.voicexmlreview.org/Jan2002/features/vcommerce.html.

Neustein, Amy. "Untangling V-Commerce: Building Intelligence into Voice-Based Apps." Wireless Report, April 2002. Available from http://www.wirelessreport.net/mcommerce/april02.

Kanaley, Reid. "Voice Portal Technology Lives On." The Philadelphia Inquirer, 26 June 2001.

Markoff, John. "Operator? Give Me the World Wide Web, and Make It Snappy." The New York Times, 6 October 1998.

Mozer, Todd. "The Third Wave: Speech in Consumer Electronics." Speech Technology Magazine, July/August 2001. Available from http://www.speechtechmag.com/issues/5_4/cover/205-1.html.

Schalk, Tom. "The Evolution of Global Speech Technology." Speech Technology Magazine, March/April, 2002. Available from http://www.speechtechmag.com/issues/7_2/avios/579-1.html.

White, George. "All You Have to Do Is Call." Speech Technology Magazine, September/October 2001. Available from http://www.speechtechmag.com/issues/6_5/cover/31-1.html.