Content creators and ordinary Google users alike are harmed by artificial intelligence summaries of search queries, because the summaries are often inconsistent, presenting disreputable results and withholding information from better sources that restrict Gemini training bots, researchers from New Jersey Institute of Technology found.
Power users have suspected these problems themselves, but they are now documented by an examination of more than 14,000 search results, explained Riley Grossman, a fourth-year doctoral student in business data science, whose paper, “How Generative AI Disrupts Search: An Empirical Study of Google Search, Gemini, and AI Overviews,” will be presented in Australia this summer at the ACM Conference on Research and Development in Information Retrieval.
The results come from research focused on understanding how the digital publishing industry responds to potential threats, Grossman said. His team, led by professors Yi Chen and Cristian Borcea in NJIT’s management school and computing college, initially looked at online privacy regulations. They added an emphasis on digital search impacts after seeing publishing companies attribute newsroom layoffs to generative artificial intelligence taking away search clicks, just as online news itself decimated print journalism.
For publishers, generative AI “disrupts the way that users see information presented to them. It changes both the way that they get information from the web — so now instead of clicking links, clicking sources, you might just read an AI summary of those sources — and then secondarily it redirects you or refers you to a different list of sources,” Grossman noted. “If you only look at the links in the AI overview, you're going to end up on very different websites than if you look at the links that are in the organic search results below the AI overview.”
Summaries are conceptually good for users because they can save time and bring answers straight to the top, provided you trust Google’s judgment on your behalf. “Some users are frustrated by it, some users feel it's very convenient, depending on the setting,” Grossman said. “I would say right now we're in a state where it's disrupting the search, which is impacting publishers, but the way that it's impacting publishers is still a little bit undetermined.”
Reliable sources that opt out of Gemini training also tend to appear less often, or not at all, in Google AI overviews. “Those lists of sources for the same query are extremely different, to the point where there's almost no overlap,” Grossman said. Opting out of Gemini training has been a bold and potentially risky power move for publishers like Nature and The New York Times because it leaves them barely included in Google’s AI search overviews at all.
The reasons are unclear. “Google is notoriously not willing to play ball on some of those questions. When publishers say you're stealing our traffic, Google's response so far has been we're actually not stealing your traffic, we're still referring the same amount of traffic to publishers, and we think that we're actually providing you more engaged traffic … Google's maybe in denial about some of this stuff,” Grossman explained. “So asking them why it refers people to more niche content than to the mainstream publishers isn't going to be a viable strategy right now.”
Meanwhile, de-emphasizing conventional sources and propping up niche ones cuts both ways: it gives searchers wider perspectives, but those perspectives are often based on wrong information. Other times it leads to AI hallucinations. The research described a query about an upcoming fight between YouTuber Jake Paul and professional fighter Anthony Joshua. The summary stated that Paul won the match “in a major upset” by unanimous judges’ decision. When the fight actually happened, Joshua won by knockout and broke Paul’s jaw in two places.
Still, many people are satisfied with the overview answer and never click a link in the results, which is known in Grossman’s field as a zero-click search. “This is just a complete erasure for the site traffic. It's completely removing that traffic from any website,” he noted. Without clicks, legitimate news sources could reduce their writing staff or close entirely. But without reliable sources, Google would lose its search dominance, so the relationship is co-dependent.
Grossman said one potential path forward is licensing deals between search providers and media companies. For example, the Associated Press news service is already licensed by Google, he said. However, “The problem is that if they're only one-off deals and the AI company, without any sort of regulation to be backing these deals, really holds all the power because they're coming to the table saying, ‘You can either take this deal or we can continue to scrape your content for free and use it for free, and the only thing you can do is enter a really long, arduous legal battle with us.’ — that's certainly my concern.
“I would say there needs to be some combination of a regulatory side or an industry framework that gets developed,” Grossman concluded. “It would be open-source, everybody knows what everyone else is getting paid, and then they can demand a fair rate based on that. It becomes an actual market where you can leverage the competition you have with other publishers.”
Co-author Cristian Borcea, one of Grossman's advisers and a professor of computer science, added that the research methods also apply to fields such as politics and healthcare. Most political searches, such as whether immigration law should be reformed, resulted in AI overviews showing less-credible, biased sources. In healthcare, Borcea said, the team has ongoing work showing that web traffic to government healthcare sites decreased significantly in the last 18 months, likely due to a combination of AI search overviews and established healthcare information being removed from the web.