It appears that Microsoft’s AI chatbot is an election truther.
According to a new study conducted by two nonprofit groups, AI Forensics and AlgorithmWatch, Microsoft’s AI chatbot answered one out of every three election-related questions incorrectly.
Microsoft’s chatbot makes up controversies about political candidates
The chatbot, formerly known as Bing Chat (it has since been renamed Microsoft Copilot), didn’t just get basic facts wrong, either. Yes, the study found that Copilot would provide incorrect election dates or outdated candidates. But the study also found that the chatbot would invent entire stories, such as controversies involving the candidates.
For example, in one instance mentioned in the study, Copilot shared information about German politician Hubert Aiwanger. According to the chatbot, Aiwanger was involved in a controversy over the distribution of leaflets that spread misinformation about COVID-19 and the vaccine. However, there was no such story. The chatbot appeared to be drawing on reports from August 2023 about “antisemitic leaflets” that Aiwanger spread when he was in high school more than 30 years ago.
Made-up narratives like these are commonly called “hallucinations” when they come from AI language models. However, the researchers involved with the study say that’s not an accurate way to describe what’s going on.
“It’s time we discredit referring to these mistakes as ‘hallucinations’,” said Riccardo Angius, Applied Math Lead and Researcher at AI Forensics, in a statement. “Our research exposes the much more intricate and structural occurrence of misleading factual errors in general-purpose LLMs and chatbots.”
The AI chatbot’s question dodging alarmed researchers
The study also found that the chatbot evaded answering questions directly around 40 percent of the time. Researchers state that this is preferable to making up answers when the chatbot does not have relevant information. However, they were concerned that some of the questions the chatbot evaded were quite simple.
Another issue, according to the researchers, is that the chatbot did not appear to improve over time, even though it seemingly had access to more information. Questions it answered incorrectly stayed wrong, even when the specific incorrect answer changed across repeated attempts.
The study also found that the chatbot performed even worse in languages other than English, such as German and French. For example, 20 percent of answers to questions asked in English contained a factual error. For questions asked in German, that figure jumped to 37 percent. Evasion rates in the two languages were much closer: the chatbot evaded answering 39 percent of the time in English and 35 percent of the time in German.
Researchers say that they contacted Microsoft with the study’s findings and were told these issues would be addressed. However, when the researchers collected more samples a month later, they found “little had changed in regard to the quality of the information provided to users.”
“Our research shows that malicious actors are not the only source of misinformation; general-purpose chatbots can be just as threatening to the information ecosystem,” said AI Forensics Senior Researcher Salvatore Romano in a statement. “Microsoft should acknowledge this, and recognize that flagging the generative AI content made by others is not enough. Their tools, even when implicating trustworthy sources, produce incorrect information at scale.”
As AI becomes more prevalent in online platforms, studies like this one certainly provide reasons to be worried. Users are increasingly turning to AI chatbots to simplify their routine and increase productivity. The assumption is that these chatbots, with unlimited knowledge at their fingertips, will provide accurate information. This is simply not the case.
“Until now, tech companies have introduced societal risks without having to fear serious consequences,” said Clara Helming, Senior Policy and Advocacy Manager at AlgorithmWatch. “Individual users are left to their own devices in separating fact from AI-fabricated fiction.”
As we head into a presidential election year in the U.S., the potential threats to election integrity are clear. With that in mind, the researchers reached a firm conclusion in their study: these problems will not be fixed by the companies alone. AI must be regulated.