It appears to be unanimous: Compared to the other chatbots on the market, Google’s Bard is the boring one. In a more or less positive assessment, Vox called Bard’s answers “dry and uncontroversial.” Our own test results beg to differ. Dry? Absolutely. Uncontroversial? Not if you scratch beneath the surface.
Yes, Bard is boring…in a way
Yes, Bard’s name — a term for a type of poet, often used in reference to Shakespeare — is sort of hilarious in light of how steadfastly artless the chatbot’s answers manage to be. For instance, I asked GPT-3.5, GPT-4, and Bard to start writing a good fireside scary story. OpenAI’s models shot for the moon (literally in one case).
Here’s GPT-3.5’s intriguing response:
Credit: OpenAI / Screengrab
GPT-4’s is absolute madness:
Credit: OpenAI / Screengrab
Bard, meanwhile, plopped out this dud:
Credit: Google / Screengrab
Bard always gives the user three drafts of a response, but this prompt yielded only two distinct ones: two identical “I saw something in the woods tonight” drafts, and one slight variation, “I heard a voice in the woods last night.” These are deflatingly boring, and one might reasonably call them disappointing.
Bard sometimes gives unpopular answers to controversial questions
Being aggressively straightforward doesn’t always make a chatbot boring. In fact, it can be provocative. What’s more, allowing itself three drafts each time it answers seems to — whether accidentally, or on purpose — give Bard the leeway it needs to give straightforward answers that are sometimes downright bold.
Look how the bots answer a question about the most populous country on Earth, when the prompt demands extreme brevity:
Credit: OpenAI / Screengrab
Credit: OpenAI / Screengrab
Credit: Google / Screengrab
The GPT models said China, and Bard said India. It’s worth noting that Bard did produce one draft of three that said China. However, after five more tries each, I could not get either GPT model to say India even once.
Is Bard “wrong”? It depends. It just so happens that humanity has been in a demography donut hole for several years on this topic, long enough to make the relative ages of the models’ training data unimportant. Some contrarians started saying India’s population had surpassed China’s about five years ago, but officially it still hasn’t, because the data isn’t there yet. China is still the right answer on paper, but the common-sense right answer may well be India.
So while Bard may be earning a reputation for giving boring answers, being boring wasn’t “the point,” contrary to Vox’s speculation, at least according to Google itself. Instead, Google’s overview document about Bard says the chatbot is supposed to reflect a diversity of possible answers without being offensive. “Training data, including from publicly available sources, reflects a diversity of perspectives and opinions. We continue to research how to use this data in a way that ensures that an LLM’s response incorporates a wide range of viewpoints, while preventing offensive responses.”
Bard doesn’t use offensive language, but it might still offend
“Offensive” is, of course, in the eye of the beholder. It may offend some, for instance, when Bard makes the following rather bold and specific claim about fetal pain sensitivity starting as early as 24 weeks:
Credit: Google / Screengrab
OpenAI’s models are far less apt to give answers like this. Here’s GPT-3.5’s non-answer:
Credit: OpenAI / Screengrab
And here’s GPT-4’s somewhat more substantive response:
Credit: OpenAI / Screengrab
It’s also worth noting that, with persistence, OpenAI’s models would provide more provocative responses (at one point, GPT-4 even asserted that the answer was “20 weeks”).
Again, the point is not that one model gives answers that are more right. It is, instead, that when wading into genuinely difficult territory, Bard nonetheless tends to give straightforward answers, and this sometimes means bucking the received wisdom.
Bard is trained on some less-than-boring geopolitical concepts
I tested Bard and the GPT twins on their responses to the single thorniest problem in all of geopolitics — how to connect Gaza and the West Bank under a hypothetical two-state solution for Israel-Palestine. None of these chatbots are going to achieve peace in the Middle East, to be clear, but it’s useful to see how they might try.
When I asked all three models for “a single, brief paragraph presenting one single all-encompassing solution” to this problem, GPT-3.5 and GPT-4 were steadfastly committed to the most popular option favored by diplomats: “the creation of a secure and efficient transportation corridor between the two territories,” in the words of GPT-3.5. GPT-4, similarly, called for “a secure, high-speed transportation corridor.” In other words, Gaza and the West Bank can be connected by a road through Israel.
Fascinatingly, Bard’s first proposal was much, much less conventional: “The only way to resolve the Gaza-West Bank discontiguity problem is to create a single, contiguous Palestinian state. This would require Israel to withdraw from the West Bank and Gaza Strip, and for the Palestinian Authority to take control of all of Palestinian territory. This would allow the Palestinians to exercise self-determination and build a viable state.”
In the interest of not sparking an international incident, let’s be clear: Bard provided other, tamer answers. But this was its first attempt. And this seems to be the core of Bard’s mission. Google says all large language models trained on multiple sides of a controversy may “incorporate views on certain sides of controversial social or political issues into their responses.”
Bard seems to incorporate such views more willingly than OpenAI’s models, which can make for an eye-opening chatbot experience. And that supposed boringness you’ve heard so much about? That might just be a disguise.