Showing posts with label Siri. Show all posts

Thursday, November 10, 2011

Google: Siri is a serious threat

It sounded pretty good until Eric Schmidt said it: Siri, the so-called personal assistant app on Apple's iPhone 4S, is the new face of search. Siri is threatening to sideline the tried-and-true search box that Google turned into a cross between a wishing well and the most trusted way to navigate a rapidly sprawling web. Some said that Google should be concerned. Others, predictably, overreacted and labeled Siri a "Google killer."

Then last week Google's (GOOG) former CEO and current chair released his responses to Senate subcommittees looking into Google's dominance in the search industry. In the past few years, Schmidt has transitioned from a seasoned and successful CEO to something of a loose cannon who spends most of his time retracting, explaining, or laughing away his previous statements. So when Schmidt, citing some of those commentators who saw Apple's (AAPL) Siri as a Google competitor, suggested that Siri could be a force in search, he drew a skeptical response. Some said Schmidt was just downplaying Google's prominence in search. Others pointed out that Siri's own default search engine is Google.

But how can Google be a monopoly that is about to get its clock cleaned by Apple? The truth is less certain, if equally dramatic: The search industry is in the early stages of a disruptive period of change. Search will look more like Siri than like today's Google -- that is, it will have a more intuitive AI feel to it. Apple and Google -- and maybe even Microsoft (MSFT) -- will play key roles in shaping it. Which means it's well past time to be worrying about whether Google is a monopoly.

Thursday, November 3, 2011

Speech Recognition Through the Decades

Looking back on the development of speech recognition technology is like watching a child grow up, progressing from the baby-talk level of recognizing single syllables, to building a vocabulary of thousands of words, to answering questions with quick, witty replies, as Apple's supersmart virtual assistant Siri does.

Listening to Siri, with its slightly snarky sense of humor, made us wonder how far speech recognition has come over the years. Here's a look at the developments in past decades that have made it possible for people to control devices using only their voice.

1950s and 1960s: Baby Talk
The first speech recognition systems could understand only digits. (Given the complexity of human language, it makes sense that inventors and engineers first focused on numbers.) In 1952, Bell Laboratories designed the "Audrey" system, which recognized digits spoken by a single voice. Ten years later, at the 1962 World's Fair, IBM demonstrated its "Shoebox" machine, which could understand 16 words spoken in English.

Labs in the United States, Japan, England, and the Soviet Union developed other hardware dedicated to recognizing spoken sounds, expanding speech recognition technology to support four vowels and nine consonants.

They may not sound like much, but these first efforts were an impressive start, especially when you consider how primitive computers themselves were at the time.

1970s: Speech Recognition Takes Off

Speech recognition technology made major strides in the 1970s, thanks to interest and funding from the U.S. Department of Defense. The DoD's DARPA Speech Understanding Research (SUR) program, from 1971 to 1976, was one of the largest of its kind in the history of speech recognition, and among other things it was responsible for Carnegie Mellon's "Harpy" speech-understanding system. Harpy could understand 1011 words, approximately the vocabulary of an average three-year-old.

Harpy was significant because it introduced a more efficient search approach, called beam search, to "prove the finite-state network of possible sentences," according to Readings in Speech Recognition by Alex Waibel and Kai-Fu Lee. (The story of speech recognition is very much tied to advances in search methodology and technology, as Google's entrance into speech recognition on mobile devices proved just a few years ago.)
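To give a rough feel for the beam search idea, here is a minimal sketch in Python. It is not Harpy's actual implementation: the tiny network, its words, and its probabilities are invented for illustration, and the function name beam_search is our own. The point is only that at each step the search keeps the few best partial sentences instead of exploring every path through the network.

```python
import math

# Hypothetical toy network: each state lists (next_state, word, log-probability) arcs.
NETWORK = {
    "START": [("A", "show", math.log(0.6)), ("B", "list", math.log(0.4))],
    "A": [("END", "files", math.log(0.7)), ("END", "mail", math.log(0.3))],
    "B": [("END", "files", math.log(0.5)), ("END", "mail", math.log(0.5))],
}

def beam_search(network, start="START", end="END", beam_width=2, max_steps=10):
    """Return (words, probability) for the best path found, or None.

    Only the `beam_width` highest-scoring partial paths survive each step.
    """
    beam = [(0.0, start, [])]  # (cumulative log-prob, current state, words so far)
    finished = []
    for _ in range(max_steps):
        # Extend every surviving partial path by one arc.
        candidates = []
        for score, state, words in beam:
            for next_state, word, logp in network.get(state, []):
                candidates.append((score + logp, next_state, words + [word]))
        if not candidates:
            break
        # Prune: keep only the best few candidates.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = []
        for cand in candidates[:beam_width]:
            if cand[1] == end:
                finished.append(cand)
            else:
                beam.append(cand)
        if not beam:
            break
    if not finished:
        return None
    best = max(finished, key=lambda c: c[0])
    return best[2], math.exp(best[0])

words, prob = beam_search(NETWORK)
print(" ".join(words), round(prob, 2))
```

A narrower beam is faster but can discard a path that would have won later; that trade-off between speed and accuracy is the heart of the technique.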

The '70s also marked a few other important milestones in speech recognition technology, including the founding of the first commercial speech recognition company, Threshold Technology, as well as Bell Laboratories' introduction of a system that could interpret multiple people's voices.

1980s: Speech Recognition Turns Toward Prediction
Over the next decade, thanks to new approaches to understanding what people say, speech recognition vocabulary jumped from a few hundred words to several thousand words, and systems had the potential to recognize an unlimited number of words. One major reason was a new statistical method known as the hidden Markov model.

Rather than simply using templates for words and looking for sound patterns, HMM considered the probability of unknown sounds' being words. This foundation would be in place for the next two decades (see Automatic Speech Recognition—A Brief History of the Technology Development by B.H. Juang and Lawrence R. Rabiner).
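The probabilistic idea can be sketched in a few lines of Python. This is a toy under loud assumptions: the two word models, their phoneme-like states, the symbolic "sounds" (Y, E, S, N, O), and every probability below are made up for illustration, and real recognizers work on continuous acoustic features rather than letters. Each word gets its own small HMM, the forward algorithm computes how likely that model is to have produced the observed sounds, and the recognizer picks the word with the highest likelihood.

```python
def forward_likelihood(observations, start_p, trans_p, emit_p):
    """Probability that an HMM produced `observations` (forward algorithm)."""
    states = list(start_p)
    # Initialize with the first observation.
    alpha = {s: start_p[s] * emit_p[s].get(observations[0], 0.0) for s in states}
    # Fold in each later observation, summing over all previous states.
    for obs in observations[1:]:
        alpha = {
            s: sum(alpha[prev] * trans_p[prev].get(s, 0.0) for prev in states)
            * emit_p[s].get(obs, 0.0)
            for s in states
        }
    return sum(alpha.values())

# Hypothetical word models; all numbers are invented for this example.
WORD_MODELS = {
    "yes": {
        "start": {"y": 1.0, "eh": 0.0, "s": 0.0},
        "trans": {"y": {"y": 0.3, "eh": 0.7}, "eh": {"eh": 0.4, "s": 0.6}, "s": {"s": 1.0}},
        "emit": {"y": {"Y": 0.8, "E": 0.2}, "eh": {"E": 0.9, "S": 0.1}, "s": {"S": 0.9, "E": 0.1}},
    },
    "no": {
        "start": {"n": 1.0, "ow": 0.0},
        "trans": {"n": {"n": 0.4, "ow": 0.6}, "ow": {"ow": 1.0}},
        "emit": {"n": {"N": 0.8, "Y": 0.2}, "ow": {"O": 0.9, "E": 0.1}},
    },
}

def recognize(observations, models=WORD_MODELS):
    """Return the word whose model gives the observations the highest likelihood."""
    scores = {
        word: forward_likelihood(observations, m["start"], m["trans"], m["emit"])
        for word, m in models.items()
    }
    return max(scores, key=scores.get)

print(recognize(["Y", "E", "S"]))
print(recognize(["N", "O", "O"]))
```

Unlike a fixed template, the model tolerates variation: a sound that only loosely matches still contributes some probability instead of causing an outright mismatch.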

Equipped with this expanded vocabulary, speech recognition started to work its way into commercial applications for business and specialized industry (for instance, medical use). It even entered the home, in the form of Worlds of Wonder's Julie doll (1987), which children could train to respond to their voice. ("Finally, the doll that understands you.")

However, whether speech recognition software at the time could recognize 1000 words, as the 1985 Kurzweil speech-to-text program did, or whether it could support a 5000-word vocabulary, as IBM's system did, a significant hurdle remained: These programs took discrete dictation, so you had … to … pause … after … each … and … every … word.

1990s: Automatic Speech Recognition Comes to the Masses
In the '90s, computers with faster processors finally arrived, and speech recognition software became viable for ordinary people.

In 1990, Dragon launched the first consumer speech recognition product, Dragon Dictate, for an incredible price of $9000. Seven years later, the much-improved Dragon NaturallySpeaking arrived. The application recognized continuous speech, so you could speak, well, naturally, at about 100 words per minute. However, you had to train the program for 45 minutes, and it was still expensive at $695.

The first voice portal, VAL from BellSouth, arrived in 1996; VAL was a dial-in interactive voice recognition system that was supposed to give you information based on what you said on the phone. VAL paved the way for all the inaccurate voice-activated menus that would plague callers for the next 15 years and beyond.

2000s: Speech Recognition Plateaus--Until Google Comes Along
By 2001, computer speech recognition had topped out at 80 percent accuracy, and, near the end of the decade, the technology's progress seemed to be stalled. Recognition systems did well when the language universe was limited--but they were still "guessing," with the assistance of statistical models, among similar-sounding words, and the known language universe continued to grow as the Internet grew.

Did you know speech recognition and voice commands were built into Windows Vista and Mac OS X? Many computer users weren't aware that those features existed. Windows Speech Recognition and OS X's voice commands were interesting, but not as accurate or as easy to use as a plain old keyboard and mouse.

Speech recognition technology development began to edge back into the forefront with one major event: the arrival of the Google Voice Search app for the iPhone. The impact of Google's app is significant for two reasons. First, cell phones and other mobile devices are ideal vehicles for speech recognition, as the desire to replace their tiny on-screen keyboards serves as an incentive to develop better, alternative input methods. Second, Google had the ability to offload the processing for its app to its cloud data centers, harnessing all that computing power to perform the large-scale data analysis necessary to make matches between the user's words and the enormous number of human-speech examples it gathered.

In short, the bottleneck with speech recognition has always been the availability of data and the ability to process it efficiently. Google's app adds the data from billions of search queries to its analysis, to better predict what you're probably saying.
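One simple way to picture how query data can help is to rescore the recognizer's guesses by how often each phrase has actually been searched for. The Python sketch below is an illustration only, not a description of Google's system: the query counts, the acoustic scores, and the rescore function are all hypothetical. It adds a smoothed log-probability from the query counts to each candidate's acoustic score and keeps the best total.

```python
import math

# Hypothetical query log: how often each phrase was searched for.
QUERY_COUNTS = {"recognize speech": 9000, "wreck a nice beach": 10}

def rescore(candidates, query_counts, prior_weight=1.0):
    """Pick the candidate with the best acoustic score plus weighted query prior.

    `candidates` is a list of (text, acoustic_log_prob) pairs.
    """
    total = sum(query_counts.values())
    vocab = len(query_counts)

    def log_prior(text):
        # Add-one smoothing so phrases never seen in the log keep a small probability.
        return math.log((query_counts.get(text, 0) + 1) / (total + vocab + 1))

    scored = [
        (acoustic + prior_weight * log_prior(text), text)
        for text, acoustic in candidates
    ]
    return max(scored)[1]

# The audio alone slightly favors the wrong phrase (made-up scores).
guesses = [
    ("wreck a nice beach", math.log(0.55)),
    ("recognize speech", math.log(0.45)),
]
print(rescore(guesses, QUERY_COUNTS))
```

With prior_weight set to 0 the query data is ignored and the acoustically closer but far less plausible phrase wins, which is the kind of error that more data helps to avoid.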

In 2010, Google added "personalized recognition" to Voice Search on Android phones, so that the software could record users' voice searches and produce a more accurate speech model. The company also added Voice Search to its Chrome browser in mid-2011. Remember how we started with 10 to 100 words, and then graduated to a few thousand? Google's English Voice Search system now incorporates 230 billion words from actual user queries.

And now along comes Siri. Like Google's Voice Search, Siri relies on cloud-based processing. It draws on what it knows about you to generate a contextual reply, and it responds to your voice input with personality. (As my PCWorld colleague David Daw points out: "It's not just fun but funny. When you ask Siri the meaning of life, it tells you '42' or 'All evidence to date points to chocolate.' If you tell it you want to hide a body, it helpfully volunteers nearby dumps and metal foundries.")

Speech recognition has gone from utility to entertainment. The child seems all grown up.

The Future: Accurate, Ubiquitous Speech
The explosion of voice recognition apps indicates that speech recognition's time has come, and that you can expect plenty more apps in the future. These apps will not only let you control your PC by voice or convert voice to text--they'll also support multiple languages, offer assorted speaker voices for you to choose from, and integrate into every part of your mobile devices (that is, they'll overcome Siri's shortcomings).

The quality of speech recognition apps will improve, too. For instance, Sensory's Trulyhandsfree Voice Control can hear and understand you, even in noisy environments.

As everyone starts becoming more comfortable speaking aloud to their mobile gadgets, speech recognition technology will likely spill over into other types of devices. It isn't hard to imagine a near future when we'll be commanding our coffee makers, talking to our printers, and telling the lights to turn themselves off.

Monday, October 24, 2011

Apple iPhone 4S 16GB

The iPhone 4S ($200 for 16GB with a two-year contract from AT&T) might not be the most exciting iPhone to date, but don’t write it off: The improved camera, faster processor, and the addition of the Siri personal assistant make for one powerful smartphone. If you’re upgrading from a 3G or a 3GS, you’ll see a huge difference. But if you’re currently rocking an iPhone 4, you might want to wait for the next upgrade. The phone's iOS still has a few things that irk me, and I wasn’t overly pleased with the call quality (though no “antenna gate” issues this time), but otherwise the iPhone 4S impresses.

The Same Premium Design

The iPhone 4S is largely identical in design to the iPhone 4, which, in my opinion, isn’t necessarily a bad thing. It took a bit of time for me to get used to the iPhone 4’s slightly rectangular shape when I reviewed it last year. I grew to appreciate its aesthetic, however, which is both distinctly Apple and different from the pack of other high-end smartphones.

The overall design exudes elegance--from the rounded, individual volume up and down buttons to the ring/silent switch and the power/sleep button up top. Like last year's black iPhone review unit, the face and back are made of glass that is specially treated to withstand scratches and oily fingers, according to Apple. Despite the company's claims, though, I found the front and back of that earlier unit covered with fingerprints after only a couple of hours of use. This year, I got a white review unit and found fingerprints to be less of an issue.