For over two decades of Search, we’ve been at the forefront of innovation in language understanding to help deliver on our mission of making the world’s information more accessible and useful for everyone. We’ve seen how critical these advancements are to making information more helpful, and being able to better connect people to creators, publishers and businesses on the web. It’s this constant improvement in understanding human language that’s enabled us to send more traffic to the web every year since Google was created.
We’ve also seen how AI models have significantly advanced language understanding. Each successive milestone, from neural nets, to BERT, to MUM, has blown us away with the step changes in information understanding they’ve offered. But with each step forward, we look closely at the limitations and risks new technologies can present.
Across Google, we have been examining the risks and challenges associated with more powerful language models, and we’re committed to responsibly applying AI in Search. Here are some of the ways we do that.
Training on high-quality data
We pretrain our models on high-quality data to reduce their potential to perpetuate undesirable biases that may exist in web content. In the case of MUM, we ensured that training data from the web was designated as high-quality based on our search quality metrics, which are informed by our Search Quality Rater Guidelines and driven by our quality rating and evaluation system. This substantially reduces the risk of training on misinformation or explicit content, for example, and is key to our approach.
And as part of our efforts to build a Search experience that works for everyone, MUM was trained on over 75 languages from around the world.
Rigorous evaluation
Every improvement to Google Search undergoes a rigorous evaluation process to ensure we’re providing more relevant, helpful results. Our Search Quality Rater Guidelines are our north star for how we evaluate great search results. Human raters follow these guidelines and help us understand if our improvements are better fulfilling people’s information needs.
This evaluation process is central to the responsible application of any improvement to Search, whether we’re introducing powerful new systems like BERT or MUM, or simply adding a new feature.
Some changes are bigger than others, so we have to adjust our process accordingly. At the time of its introduction to Search, BERT impacted 1 in 10 English-language queries, so we scaled our evaluation process to be even more rigorous than usual. We subjected our systems to an unprecedented amount of scrutiny, increasing both the scale and granularity of quality testing, to help ensure they weren’t introducing concerning patterns into our systems.
While our standard evaluation process helps us judge launches across a representative query stream, for some improvements, we also more closely examine whether changes provide quality gains or losses across specific slices of queries, or topic areas. This allows us to identify if concerning patterns exist and pursue mitigations before launching an improvement to Search.
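To make the idea of slice-level evaluation concrete, here is a minimal sketch in Python. It is not a description of Google's actual tooling: the slice names, the rating scale, the function names, and the sample numbers are all hypothetical, chosen only to show how an overall gain can hide a loss in one topic area.

```python
from collections import defaultdict
from statistics import mean


def slice_deltas(ratings):
    """Average (candidate - baseline) rating per slice.

    `ratings` is an iterable of (slice_name, baseline_score, candidate_score)
    tuples. The tuple layout is an assumption made for this illustration.
    """
    by_slice = defaultdict(list)
    for slice_name, baseline, candidate in ratings:
        by_slice[slice_name].append(candidate - baseline)
    return {name: mean(deltas) for name, deltas in by_slice.items()}


def flag_regressions(deltas, threshold=0.0):
    """Return the slices whose average delta falls below `threshold`."""
    return sorted(name for name, delta in deltas.items() if delta < threshold)


# Hypothetical rater scores, invented for this example only.
ratings = [
    ("health", 3.0, 3.5),
    ("health", 4.0, 4.0),
    ("news", 4.0, 3.0),
    ("news", 3.5, 3.5),
    ("shopping", 2.0, 3.0),
]

deltas = slice_deltas(ratings)
overall = mean(candidate - baseline for _, baseline, candidate in ratings)
regressed = flag_regressions(deltas)

print(f"overall delta: {overall:+.2f}")
for name, delta in sorted(deltas.items()):
    print(f"{name}: {delta:+.2f}")
print("slices needing review:", regressed)
```

In this made-up data the overall change is positive, yet the "news" slice gets worse, which is the kind of pattern a slice-level review is meant to surface before launch.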
Search is not perfect, and no application of AI will be perfect. This is why any change to Search involves extensive and constant evaluation and testing.
Responsible application design
In addition to working with responsibly designed and trained models, the thoughtful design of products and applications is key to addressing some of the challenges of language models. In Search, many of these critical mitigations take place at the application level, where we can focus on the end-user experience and more effectively manage risk in smaller models designed for specific tasks.
When we adopt new AI technologies such as BERT or MUM, they’re able to help improve individual systems to perform tasks more efficiently and effectively. This approach allows us to focus the scope of our evaluation and understand if an application is introducing concerning patterns. In the event that we do find concerning behavior, we’re able to design much more targeted solutions.
Minding our footprint
Training and running advanced AI models can be energy intensive. Another benefit of training smaller, application-specific models is that the energy costs of the larger base model, such as MUM, are amortized over the many different applications.
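The amortization argument can be sketched as simple arithmetic. The snippet below is an illustration only: the function name, the cost model (one shared base-model cost plus a fixed per-application cost), and every number are assumptions, not measured figures for MUM or any other model.

```python
def energy_per_application(base_training_cost, per_app_cost, num_applications):
    """Energy attributed to each application, in arbitrary units.

    Assumes a simplified model: the base model is trained once and its
    cost is shared equally, and each application adds its own fixed cost.
    """
    if num_applications < 1:
        raise ValueError("num_applications must be at least 1")
    return base_training_cost / num_applications + per_app_cost


# Hypothetical costs in arbitrary energy units.
BASE_COST = 1000.0
PER_APP_COST = 5.0

for n in (1, 10, 100):
    cost = energy_per_application(BASE_COST, PER_APP_COST, n)
    print(f"{n} application(s): {cost} units each")
```

Under these assumptions, the share of the base model's cost carried by each application shrinks as more applications reuse it, while the per-application cost stays constant.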
The Google Research team recently published research detailing the energy costs of training state-of-the-art language models. Their findings show that combining efficient models, processors, and data centers with clean energy sources can reduce the carbon footprint of a model by as much as a thousandfold, and we follow this approach to train our models in Search.
Language models in practice
New language models like MUM have enormous potential to transform our ability to understand language and information about the world. And while they may be powerful, they do not make our existing systems obsolete. Today, Google Search employs hundreds of algorithms and machine learning models, none of which are wholly reliant on any single large model.
Among these hundreds of applications are systems and protections designed specifically to ensure you have a safe, high-quality experience. For example, we design our ranking systems to surface relevant and reliable information. Even if a model were to present issues around low-quality content, our systems are built to counteract this.
As we’re able to introduce new technologies like MUM into Search, they’ll help us greatly improve our systems and introduce entirely new product experiences. And they can also help us tackle other challenges we face. Improved AI systems can help bolster our spam fighting capabilities and even help us combat known loss patterns. In fact, we recently introduced a BERT-based system to better identify queries seeking explicit content, so we can better avoid shocking or offending users not looking for that information, and ultimately make our Search experience safer for everyone.
We look forward to making Search a better, more helpful product with improved information understanding from these advanced language models, and bringing these new capabilities to Search in a responsible way.