YouTube’s recommender AI still a horrorshow, finds major crowdsourced study

For years YouTube’s video-recommending algorithm has stood accused of fuelling a grab-bag of societal ills by feeding users an AI-amplified diet of hate speech, political extremism and/or conspiracy junk/disinformation for the profiteering motive of trying to keep billions of eyeballs stuck to its ad inventory.

And while YouTube’s tech giant parent Google has, sporadically, responded to negative publicity flaring up around the algorithm’s antisocial recommendations — announcing a few policy tweaks or limiting/purging the odd hateful account — it’s not clear how far the platform’s penchant for promoting horribly unhealthy clickbait has actually been rebooted.

The suspicion remains: nowhere near far enough.

New research published today by Mozilla backs that notion up, suggesting YouTube’s AI continues to puff up piles of ‘bottom-feeding’/low grade/divisive/disinforming content: stuff that tries to grab eyeballs by triggering people’s sense of outrage, sowing division/polarization or spreading baseless/harmful disinformation. That in turn implies that YouTube’s problem with recommending terrible stuff is indeed systemic; a side-effect of the platform’s rapacious appetite to harvest views to serve ads.

That YouTube’s AI is still — per Mozilla’s study — behaving so badly also suggests Google has been pretty successful at fuzzing criticism with superficial claims of reform.

The mainstay of its deflective success here is likely the primary protection mechanism of keeping the recommender engine’s algorithmic workings (and associated data) hidden from public view and external oversight — via the convenient shield of ‘commercial secrecy’.

But regulation that could help crack open proprietary AI blackboxes is now on the cards — at least in Europe.

To fix YouTube’s algorithm, Mozilla is calling for “common sense transparency laws, better oversight, and consumer pressure”, suggesting that what’s needed to rein in the worst excesses of the YouTube AI is a combination of laws that mandate transparency into AI systems; protect independent researchers so they can interrogate algorithmic impacts; and empower platform users with robust controls (such as the ability to opt out of “personalized” recommendations).

Regrets, YouTube users have had a few…

To gather data on specific recommendations being made to YouTube users (information that Google does not routinely make available to external researchers), Mozilla took a crowdsourced approach, via a browser extension called RegretsReporter that lets users self-report YouTube videos they “regret” watching.

The tool can generate a report which includes details of the videos the user had been recommended, as well as earlier video views, to help build up a picture of how YouTube’s recommender system was functioning. (Or, well, ‘dysfunctioning’ as the case may be.)

The crowdsourced volunteers whose data fed Mozilla’s research reported a wide variety of ‘regrets’, including videos spreading COVID-19 fear-mongering, political misinformation and “wildly inappropriate” children’s cartoons, per the report — with the most frequently reported content categories being misinformation, violent/graphic content, hate speech and spam/scams.

A substantial majority (71%) of the regret reports came from videos that had been recommended by YouTube’s algorithm itself, underscoring the AI’s starring role in pushing junk into people’s eyeballs.

The research also found that recommended videos were 40% more likely to be reported by the volunteers than videos they’d searched for themselves.

Mozilla even found “several” instances when the recommender algorithm put content in front of users that violated YouTube’s own community guidelines and/or was unrelated to the previous video watched. So a clear fail.

A very notable finding was that regrettable content appears to be a greater problem for YouTube users in non-English speaking countries: Mozilla found YouTube regrets were 60% higher in countries without English as a primary language — with Brazil, Germany and France generating what the report said were “particularly high” levels of regretful YouTubing. (And none of the three can be classed as minor international markets.)

Pandemic-related regrets were also especially prevalent in non-English speaking countries, per the report — a worrying detail to read in the middle of an ongoing global health crisis.

The crowdsourced study, which Mozilla bills as the largest-ever into YouTube’s recommender algorithm, drew on data from more than 37,000 YouTube users who installed the extension. Of those, a subset of 1,162 volunteers from 91 countries submitted the reports, flagging 3,362 regrettable videos, which the report draws on directly.

These reports were generated between July 2020 and May 2021.

What exactly does Mozilla mean by a YouTube “regret”? It says this is a crowdsourced concept based on users self-reporting bad experiences on YouTube, so it’s a subjective measure. But Mozilla argues that taking this “people-powered” approach centres the lived experiences of Internet users and is therefore helpful in foregrounding the experiences of marginalised and/or vulnerable people and communities (vs, for example, applying only a narrower, legal definition of ‘harm’).

“We wanted to interrogate and explore further [people’s experiences of falling down the YouTube ‘rabbit hole’] and frankly confirm some of these stories — but then also just understand further what are some of the trends that emerged in that,” explained Brandi Geurkink, Mozilla’s senior manager of advocacy and the lead researcher for the project, discussing the aims of the research.

“My main feeling in doing this work was being — I guess — shocked that some of what we had expected to be the case was confirmed… It’s still a limited study in terms of the number of people involved and the methodology that we used but — even with that — it was quite simple; the data just showed that some of what we thought was confirmed.

“Things like the algorithm recommending content essentially accidentally, that it later is like ‘oops, this actually violates our policies; we shouldn’t have actively suggested that to people’… And things like the non-English-speaking user base having worse experiences — these are things you hear discussed a lot anecdotally and activists have raised these issues. But I was just like — oh wow, it’s actually coming out really clearly in our data.”

Mozilla says the crowdsourced research uncovered “numerous examples” of reported content that would likely or actually breach YouTube’s community guidelines — such as hate speech or debunked political and scientific misinformation.

But it also says the reports flagged a lot of what YouTube “may” consider ‘borderline content’. Aka, stuff that’s harder to categorize — junk/low quality videos that perhaps toe the acceptability line and may therefore be trickier for the platform’s algorithmic moderation systems to respond to (and thus content that may also survive the risk of a take down for longer).

However a related issue the report flags is that YouTube doesn’t provide a definition for borderline content — despite discussing the category in its own guidelines — hence, says Mozilla, that makes the researchers’ assumption that much of what the volunteers were reporting as ‘regretful’ would likely fall into YouTube’s own ‘borderline content’ category impossible to verify.

The challenge of independently studying the societal effects of Google’s tech and processes is a running theme underlying the research. But Mozilla’s report also accuses the tech giant of meeting YouTube criticism with “inertia and opacity”.

It’s not alone there either. Critics have long accused YouTube’s ad giant parent of profiting off of engagement generated by hateful outrage and harmful disinformation, allowing “AI-generated bubbles of hate” to surface ever more baleful (and thus stickily engaging) stuff and exposing unsuspecting YouTube users to increasingly unpleasant and extremist views, even as Google gets to shield its low grade content business under a user-generated content umbrella.

Indeed, ‘falling down the YouTube rabbit hole’ has become a well-trodden metaphor for discussing the process of unsuspecting Internet users being dragged into the darkest and nastiest corners of the web. This user reprogramming takes place in broad daylight, via AI-generated suggestions that yell at people to follow the conspiracy breadcrumb trail right from inside a mainstream web platform.

As far back as 2017, when concern was riding high about online terrorism and the proliferation of ISIS content on social media, politicians in Europe were accusing YouTube’s algorithm of exactly this: Automating radicalization.

However it’s remained difficult to get hard data to back up anecdotal reports of individual YouTube users being ‘radicalized’ after viewing hours of extremist content or conspiracy theory junk on Google’s platform.

Ex-YouTube insider Guillaume Chaslot is one notable critic who’s sought to pull back the curtain shielding the proprietary tech from deeper scrutiny, via his algotransparency project.

Mozilla’s crowdsourced research adds to those efforts by sketching a broad — and broadly problematic — picture of the YouTube AI by collating reports of bad experiences from users themselves.

Of course externally sampling platform-level data that only Google holds in full (at its true depth and dimension) can’t be the whole picture — and self-reporting, in particular, may introduce its own set of biases into Mozilla’s data-set. But the problem of effectively studying big tech’s blackboxes is a key point accompanying the research, as Mozilla advocates for proper oversight of platform power.

In a series of recommendations the report calls for “robust transparency, scrutiny, and giving people control of recommendation algorithms” — arguing that without proper oversight of the platform, YouTube will continue to be harmful by mindlessly exposing people to damaging and braindead content.

The problematic lack of transparency around so much of how YouTube functions can be picked up from other details in the report. For example, Mozilla found that around 9% of recommended regrets (or almost 200 videos) had since been taken down — for a variety of not always clear reasons (sometimes, presumably, after the content was reported and judged by YouTube to have violated its guidelines).

Collectively, just this subset of videos had had a total of 160M views prior to being removed for whatever reason.

In other findings, the research found that regretful views tend to perform well on the platform.

A particularly stark metric is that reported regrets acquired a full 70% more views per day than other videos watched by the volunteers on the platform, lending weight to the argument that YouTube’s engagement-optimising algorithms disproportionately select for triggering/misinforming content over quality (thoughtful/informing) stuff simply because it brings in the clicks.

While that might be great for Google’s ad business, it’s clearly a net negative for democratic societies which value truthful information over nonsense; genuine public debate over artificial/amplified binaries; and constructive civic cohesion over divisive tribalism.

But without legally-enforced transparency requirements on ad platforms — and, most likely, regulatory oversight and enforcement that features audit powers — these tech giants are going to continue to be incentivized to turn a blind eye and cash in at society’s expense.

Mozilla’s report also underlines instances where YouTube’s algorithms are clearly driven by a logic that’s unrelated to the content itself: in 43.6% of the cases where the researchers had data about the videos a participant had watched before a reported regret, the recommendation was completely unrelated to the previous video.

The report gives examples of some of these logic-defying AI content pivots/leaps/pitfalls — such as a person watching videos about the U.S. military and then being recommended a misogynistic video entitled ‘Man humiliates feminist in viral video.’

In another instance, a person watched a video about software rights and was then recommended a video about gun rights. So two rights make yet another wrong YouTube recommendation right there.

In a third example, a person watched an Art Garfunkel music video and was then recommended a political video entitled ‘Trump Debate Moderator EXPOSED as having Deep Democrat Ties, Media Bias Reaches BREAKING Point.’

To which the only sane response is, umm what???

YouTube’s output in such instances seems — at best — some sort of ‘AI brain fart’.

A generous interpretation might be that the algorithm got stupidly confused. Albeit, in a number of the examples cited in the report, the confusion is leading YouTube users toward content with a right-leaning political bias. Which seems, well, curious.

Asked what she views as the most concerning findings, Mozilla’s Geurkink told ProWellTech: “One is how clearly misinformation emerged as a dominant problem on the platform. I think that’s something, based on our work talking to Mozilla supporters and people from all around the world, that is a really obvious thing that people are concerned about online. So to see that that is what is emerging as the biggest problem with the YouTube algorithm is really concerning to me.”

She also highlighted the problem of the recommendations being worse for non-English-speaking users as another major concern, suggesting that global inequalities in users’ experiences of platform impacts “doesn’t get enough attention” — even when such issues do get discussed.

Responding to Mozilla’s report, a Google spokesperson sent us this statement:

“The goal of our recommendation system is to connect viewers with content they love and on any given day, more than 200 million videos are recommended on the homepage alone. Over 80 billion pieces of information is used to help inform our systems, including survey responses from viewers on what they want to watch. We constantly work to improve the experience on YouTube and over the past year alone, we’ve launched over 30 different changes to reduce recommendations of harmful content. Thanks to this change, consumption of borderline content that comes from our recommendations is now significantly below 1%.”

Google also claimed it welcomes research into YouTube — and suggested it’s exploring options to bring in external researchers to study the platform, without offering anything concrete on that front.

At the same time, its response queried how Mozilla’s study defines ‘regrettable’ content — and went on to claim that its own user surveys generally show users are satisfied with the content that YouTube recommends.

In further non-quotable remarks, Google noted that earlier this year it started disclosing a ‘violative view rate’ (VVR) metric for YouTube, revealing for the first time the percentage of views on YouTube that come from content that violates its policies.

The most recent VVR stands at 0.16-0.18% — which Google says means that out of every 10,000 views on YouTube, 16-18 come from violative content. It said that figure is down by more than 70% when compared to the same quarter of 2017 — crediting its investments in machine learning as largely being responsible for the drop.
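To make the arithmetic behind that claim concrete, here is a minimal sketch that converts a VVR percentage into a count of violative views per batch of views. The function name `violative_views` is hypothetical, introduced only for illustration; the only figures used are the 0.16-0.18% range and the 10,000-view batch cited above.

```python
def violative_views(total_views: int, vvr_percent: float) -> float:
    """Return the expected number of violative views among total_views,
    given a violative view rate expressed as a percentage."""
    return total_views * vvr_percent / 100


# The range Google cites: 0.16% to 0.18% of views.
low = violative_views(10_000, 0.16)
high = violative_views(10_000, 0.18)
print(f"{low:.0f} to {high:.0f} violative views per 10,000")
```

Note that the conversion only restates the rate in absolute terms. It says nothing about how those views were reached, including what share of them came via recommendations.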

However, as Geurkink noted, the VVR is of limited use without Google releasing more data to contextualize and quantify how far its AI was involved in accelerating views of content its own rules state shouldn’t be viewed on its platform. Without that key data the suspicion must be that the VVR is a nice bit of misdirection.

“What would be going further than [VVR] — and what would be really, really helpful — is understanding what’s the role that the recommendation algorithm plays in this?” Geurkink told us on that, adding: “That’s what is a complete blackbox still. In the absence of greater transparency [Google’s] claims of progress have to be taken with a grain of salt.”

Google also flagged a 2019 change it made to how YouTube’s recommender algorithm handles ‘borderline content’ — aka, content that doesn’t violate policies but falls into a problematic grey area — saying that that tweak had also resulted in a 70% drop in watchtime for this type of content.

However, the company confirmed this borderline category is a moveable feast, saying it factors in changing trends as well as context and also works with experts to determine what gets classed as borderline. That makes the aforementioned percentage drop pretty meaningless since there’s no fixed baseline to measure against.

It’s notable that Google’s response to Mozilla’s report makes no mention of the poor experience reported by survey participants in non-English-speaking markets. And Geurkink suggested that, in general, many of the claimed mitigating measures YouTube applies are geographically limited — i.e. to English-speaking markets like the US and UK. (Or at least arrive in those markets first, before a slower rollout to other places.)

A January 2019 tweak to reduce amplification of conspiracy theory content in the US was only expanded to the UK market months later — in August — for example.

“YouTube, for the past few years, have only been reporting on their progress of recommendations of harmful or borderline content in the US and in English-speaking markets,” she also said. “And there are very few people questioning that — what about the rest of the world? To me that is something that really deserves more attention and more scrutiny.”

We asked Google to confirm whether it had since applied the 2019 conspiracy theory related changes globally, and a spokeswoman told us that it had. But the much higher rate of reports of ‘regrettable’ content (admittedly a broader measure) made to Mozilla from non-English-speaking markets remains notable.

And while there could be other factors at play, which might explain some of the disproportionately higher reporting, the finding may also suggest that, where YouTube’s negative impacts are concerned, Google directs greatest resource at markets and languages where its reputational risk and the capacity of its machine learning tech to automate content categorization are strongest.

Yet any such unequal response to AI risk obviously means leaving some users at greater risk of harm than others — adding another harmful dimension and layer of unfairness to what is already a multi-faceted, many-headed-hydra of a problem.

It’s yet another reason why leaving it up to powerful platforms to rate their own AIs, mark their own homework and counter genuine concerns with self-serving PR is for the birds.

(In additional filler background remarks it sent us, Google described itself as the first company in the industry to incorporate “authoritativeness” into its search and discovery algorithms — without explaining when exactly it claims to have done that or how it imagined it would be able to deliver on its stated mission of ‘organizing the world’s information and making it universally accessible and useful’ without considering the relative value of information sources… So color us baffled at that claim. Most likely it’s a clumsy attempt to throw disinformation shade at rivals.)

Returning to the regulation point, an EU proposal — the Digital Services Act — is set to introduce some transparency requirements on large digital platforms, as part of a wider package of accountability measures. And asked about this Geurkink described the DSA as “a promising avenue for greater transparency”.

But she suggested the legislation needs to go further to tackle recommender systems like the YouTube AI.

“I think that transparency around recommender systems specifically and also people having control over the input of their own data and then the output of recommendations is really important — and is a place where the DSA is currently a bit sparse, so I think that’s where we really need to dig in,” she told us.

One idea she voiced support for is having a “data access framework” baked into the law — to enable vetted researchers to get more of the information they need to study powerful AI technologies — i.e. rather than the law trying to come up with “a laundry list of all of the different pieces of transparency and information that should be applicable”, as she put it.

The EU also now has a draft AI regulation on the table. The legislative plan takes a risk-based approach to regulating certain applications of artificial intelligence. However it’s not clear whether YouTube’s recommender system would fall under one of the more closely regulated categories — or, as seems more likely (at least with the initial Commission proposal), fall entirely outside the scope of the planned law.

“An earlier draft of the proposal talked about systems that manipulate human behavior which is essentially what recommender systems are. And one could also argue that’s the goal of advertising at large, in some sense. So it was sort of difficult to understand exactly where recommender systems would fall into that,” noted Geurkink.

“There might be a nice harmony between some of the robust data access provisions in the DSA and the new AI regulation,” she added. “I think transparency is what it comes down to, so anything that can provide that kind of greater transparency is a good thing.

“YouTube could also just provide a lot of this… We’ve been working on this for years now and we haven’t seen them take any meaningful action on this front but it’s also, I think, something that we want to keep in mind — legislation can obviously take years. So even if a few of our recommendations were taken up [by Google] that would be a really big step in the right direction.”