Google recently hosted a two-day workshop (April 29-30, 2014) at their London office focusing on both speech recognition and speech synthesis. They invited a total of 68 students and post-docs from across Europe. The attendees had a wide range of experience—I met someone just two months into their PhD and others who had been working in the field for years with dozens of publications. No proceedings were associated with the workshop, but each of the attendees was asked to present a poster of their work. Most seemed to present a poster from a previous conference (I used the poster from my recent ASRU paper).
Overall, I had a great time at the workshop, and I think most attendees would say the same thing. If Google demonstrated one thing, it is that they take very good care of their guests and employees. We were well fed and stayed in a nice hotel in the center of London. We spent three days in London, discussing state-of-the-art research and meeting new people, without spending a cent of our own money.
As the workshop began, I could sense the uncertainty from the other attendees. None of us really knew what the goal of the workshop would be. Some assumed it would be two days of hearing how great Google is and we would be handed job applications as we walked out the door. In truth, it was nothing like that.
The majority of the workshop consisted of Google researchers giving talks, not necessarily describing how great the Google product family is, but presenting their personal research. Google is known for its products. Nearly every attendee uses Gmail, Google Search, and Google Scholar. Half of us have smartphones running Android. I think they really wanted to make the point that, yes, Google is a company that makes products, but its research scientists are doing the cutting-edge research that makes those products possible.
Obviously the Google employees take great pride in the work they do. Unlike in academia, they have the benefit of seeing their ideas work, not just on toy datasets, but in real-world technologies used by millions of people. Often the papers they publish describe techniques that have had a real-world impact. This also explains why they sometimes have difficulty using research done in academia. If I have a technique that works well on a small dataset, there is little evidence that it would scale to a live system trained on thousands of hours of speech. Since academic labs do not have the resources of Google, it is difficult to prove the efficacy of a technique to the level desired by a large company like Google, Microsoft, or Apple.
While all of the research talks were interesting, it is the anecdotes and trivia that really stick in my mind. Below are some of the more interesting details I remember.
Google's Speech Recognition System
Their typical ASR system is a hybrid DNN-HMM system. The input is a window of 26 frames of mel-scale filter bank features. Systems from other labs typically use a window with equal amounts of past and future context; since Google focuses on real-time performance, they allow only five frames of future context. This behemoth of a system uses 8 hidden layers of 2560 units each, giving a total of about 85 million parameters. Approximately 2,000 hours of speech are used to train the system.
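As a back-of-the-envelope check on the 85-million figure, here is a parameter-count sketch. The feature dimensionality (40 per frame) and the number of output states (~14,000 context-dependent HMM states) were not stated in the talk and are my assumptions; with them, the count lands right around 85 million.

```python
# Rough parameter count for a fully connected DNN acoustic model.
# Assumed (not stated in the talk): 40-dim filter bank features and
# ~14,000 context-dependent output states.
frames, feat_dim = 26, 40                 # 26 stacked frames of features
inputs = frames * feat_dim                # 1040 input units
hidden_layers, hidden = 8, 2560
outputs = 14_000

params = (inputs + 1) * hidden                         # input -> first hidden (+bias)
params += (hidden_layers - 1) * (hidden + 1) * hidden  # hidden -> hidden layers
params += (hidden + 1) * outputs                       # last hidden -> softmax
print(f"{params / 1e6:.1f}M parameters")               # -> 84.4M parameters
```

Almost all of the parameters sit in the seven 2560-by-2560 hidden-to-hidden matrices and the large softmax output layer, which is why shrinking the hidden layers pays off so quickly for the embedded system.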
There is a large focus on optimization, and they use many tricks to reduce the total amount of computation. Surprisingly, they even have an offline system in Android that uses only about 3 million parameters. Even with that huge reduction in parameters, they lose only about 30% in relative accuracy.
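To make "30% relative" concrete: in ASR this is usually measured against the baseline word error rate, not in absolute percentage points. A small illustration with made-up WER values (the talk gave no absolute numbers):

```python
# Hypothetical word error rates -- illustrative only, not figures from the talk.
server_wer = 0.12                          # assumed baseline (server) WER
relative_degradation = 0.30                # ~30% relative loss
offline_wer = server_wer * (1 + relative_degradation)
print(f"offline WER: {offline_wer:.3f}")   # 12% WER becomes ~15.6%, not 42%
```

So a 28x reduction in model size (85M down to 3M parameters) costs only a few absolute points of error, which is a remarkable trade-off for on-device recognition.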
The web consists of approximately 60 trillion unique addresses. Google indexes around 10 billion of those per day and maintains an index that takes 100 petabytes of disk space. Every day Google handles 3 billion searches. One amazing fact is that around 15% of those searches have never been seen before.
While I have no idea what the implementation details are, Google maintains a knowledge graph of over 500 million entities and 18 billion facts. It represents discrete knowledge about the world, such as celebrities, sports teams, and countries, and it helps with disambiguation and better understanding of search queries.
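Since the implementation is unknown, here is only a conceptual sketch: a knowledge graph can be thought of as a set of (subject, predicate, object) triples, and disambiguation falls out of asking which entity's relations fit the query. All names and facts below are my own toy examples.

```python
# Toy knowledge graph as (subject, predicate, object) triples.
# Entities and facts are illustrative, not Google's actual data or schema.
facts = {
    ("London", "capital_of", "United Kingdom"),
    ("London", "type", "city"),
    ("London, Ontario", "type", "city"),
    ("Arsenal", "type", "sports team"),
}

def objects(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return {o for s, p, o in facts if s == subject and p == predicate}

# Disambiguation: two entities share the surface form "London", but a
# query mentioning capitals only matches the one with a capital_of fact.
print(objects("London", "capital_of"))          # {'United Kingdom'}
print(objects("London, Ontario", "capital_of"))  # set()
```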
Google's Approach to Research
Obviously their research is heavily tied to their current end goal, improved voice search. Within that space, there is apparently a lot of freedom. General expectations are that your project should bear fruit within a year. You can play with toy datasets, but you quickly need to move to real-world experiments. They tend to use their Icelandic system as an initial baseline for testing new ideas.
Rarely do they work towards a specific product deadline. Instead their goals are more performance-based. Publication is encouraged, but not required. In general, any paper published needs to have its ideas patented first. Most other companies have a similar policy. In the case of Google, they claim the patents are purely for defensive purposes. Since I have never heard of Google suing smaller companies for patent infringement, I assume this is true.