
Data Science at Home

The dark side of AI: recommend and manipulate (Ep. 90)

In 2017 a research group at the University of Washington did a study of the Black Lives Matter movement on Twitter. They constructed what they call a “shared audience graph” to analyse the different groups of audiences participating in the debate, and found an alignment of the groups with the political left and the political right, as well as clear alignments with groups participating in other debates, such as environmental issues, abortion issues and so on. In simple terms, someone who is pro-environment, pro-abortion and left-leaning is also likely to support the Black Lives Matter movement, and vice versa.

F: Ok, this seems to make sense, right? But… I suspect there is more to this story?

So far, yes. What they did not expect to find, though, was a pervasive network of Russian accounts participating in the debate, which turned out to be orchestrated by the Internet Research Agency, the not-so-secret Russian secret service agency of internet black ops. The same agency that has allegedly been connected with the US election and the Brexit referendum.

F: Are we talking about actual spies? Where are you going with this?

Basically, the Russian accounts (some of them human, some of them bots) were infiltrating all aspects of the debate, on both the left and the right, and always taking the most extreme stance on any particular aspect of the debate. The aim was to radicalise the conversation, to make it more and more extreme, in a divide-and-conquer tactic: turn the population against itself in an online civil war, and push for policies that would normally be considered too extreme (for instance, giving tanks to the police to control riots, forcing a curfew, trying to ban Muslims from the country). Chaos and unrest have repercussions on international trade and relations, and can be aligned with foreign interests.

F: It seems like a pretty indirect and convoluted way of influencing a foreign power…

You might think so, but you are forgetting social media. This sort of operation is directly exploiting a core feature of internet social media platforms. And that feature, I am afraid, is recommender systems.

F: Whoa. Let’s take a step back. Let’s recap the general features of recommender systems, so we are on the same page. 

The main purpose of a recommender system is to recommend to people the items that similar people have shown an interest in. Let’s think about books and readers. The general idea is to find a way to match each reader with the books they are most likely to want. Amazon is doing it, Netflix is doing it, probably the bookstore down the road does it too, just on a smaller scale. Some of the most common methods to implement recommender systems use concepts such as cosine/correlation similarity, matrix factorization, neural autoencoders and sequence predictors.
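For the sake of illustration, here is a minimal item-based recommender using cosine similarity on a toy user-book rating matrix. It is a sketch of the general technique, not the actual system of any platform mentioned here, and all the data is invented:

```python
import numpy as np

# Toy user-item rating matrix (rows: users, columns: books); 0 = not rated.
# Purely illustrative numbers, not taken from any real platform.
ratings = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 1, 0, 0],
    [0, 1, 5, 4, 4],
    [1, 0, 4, 5, 5],
], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity between two vectors, guarding against zero norms."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a @ b / denom if denom else 0.0

n_items = ratings.shape[1]
# Item-item similarity: how alike two books are, judged by who rated them.
item_sim = np.array([[cosine_sim(ratings[:, i], ratings[:, j])
                      for j in range(n_items)] for i in range(n_items)])

def recommend(user_idx, k=2):
    """Score unseen items by similarity-weighted ratings of items the user liked."""
    user = ratings[user_idx]
    scores = item_sim @ user           # aggregate similarity to already-rated items
    scores[user > 0] = -np.inf         # never recommend what was already rated
    return np.argsort(scores)[::-1][:k]

print(recommend(0))  # top-2 book indices for user 0
```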

A major issue with recommender systems is their validation. Although validation happens in a way that is similar to many other machine learning methods, to really measure the efficacy of a recommendation you have to serve that set of items first, in production, and observe how users respond. But the act of recommending already alters the whole scenario, a bit in the flavour of Heisenberg’s uncertainty principle.
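To make the point concrete: offline, all you can do is check how well the model reproduces logged behaviour, for example with a precision@k metric on held-out interactions; the true effect of a recommendation only shows up online, where serving it changes the very behaviour you want to measure. A minimal sketch of the offline side (the item IDs and numbers are made up):

```python
def precision_at_k(recommended, relevant, k=10):
    """Offline proxy metric: fraction of the top-k recommended items that the
    user actually interacted with in a held-out test period. It measures how
    well the model reproduces logged behaviour, not what the recommendation
    itself causes once it is shown."""
    top_k = recommended[:k]
    return len(set(top_k) & set(relevant)) / k

# Made-up example: 3 recommended book IDs vs. the books the user later read.
print(precision_at_k(["b12", "b7", "b3"], {"b7", "b99"}, k=3))  # 1 hit out of 3
```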

F: In the attention economy, the business model is to monetise the time the user spends on a platform, by showing them ads. Recommender systems are crucial for this purpose. Chiara, you are saying that these algorithms have effects that are problematic?

As you say, recommender systems exist because the business model of social media platforms is to monetise attention. The most effective way to keep users’ attention is to show them content they are likely to be interested in. In order to do that, one must segment the audience to find the best content for each user. But then, for each user, how do you keep them engaged and make them consume more content?

F: You’re going to say the word “filter bubble” very soon.

Spot on. To keep the user on the platform, you start by showing them content that they are interested in, and that agrees with their opinion. 

But that is not all. How many videos of the same stuff can you watch, how many articles can you read? You must also escalate the content that the user sees, increasing the wow factor. The content goes from mild to extreme (conspiracy theories, hate speech, etc.).

The recommended content pushes the user’s opinion towards more extreme stances. It is hard to see from inside the bubble, but a simple experiment will show it: if you keep clicking the first recommended video on YouTube, following the chain of top recommendations, you will soon find yourself watching stuff you would never have actively looked for, like conspiracy theories or alt-right propaganda (or pranks that get progressively crueller, videos of people committing suicide, and so on).
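The mechanism can be caricatured in a toy simulation. The key assumption here is invented for illustration (it is not a description of YouTube’s real model): among otherwise similar candidate videos, slightly more extreme content gets slightly higher predicted engagement, so a greedy “always serve the top item” policy drifts steadily towards the extreme end of the catalogue:

```python
import random

def top_recommendation(current_extremeness):
    """Hypothetical engagement-maximising recommender: among 20 candidate videos
    similar to the current one, serve the one with the highest predicted
    engagement. In this toy model, extremeness is a proxy for engagement."""
    candidates = [max(0.0, min(1.0, current_extremeness + random.gauss(0, 0.1)))
                  for _ in range(20)]
    # Assumed engagement score: extremeness plus a little noise.
    return max(candidates, key=lambda x: x + random.gauss(0, 0.05))

level = 0.1  # start with mild content (0 = mild, 1 = extreme)
for step in range(30):
    level = top_recommendation(level)
print(f"extremeness after 30 clicks: {level:.2f}")  # tends to drift towards 1
```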

F: So you are saying that this is not an accident: is this the basis of the optimisation of the recommender system? 

Yes, and it’s very effective. But obviously there are consequences. 

F: And I’m guessing they are not good. 

The collective result of individual users being pushed towards more radical stances is a radicalisation of the whole conversation, the disappearance of nuance in the argument, and the trivialisation of complex issues. For example, the Brexit debate in 2016 was about trade deals and customs unions, and now it is about remain vs no deal, with almost nothing in between.

F: Yes, the conversation is getting stupider. Is this just a giant accident? Just a sensible system that got out of control?

Yes and no. Recommender systems originate as a tool for boosting commercial revenue, by selling more products. But applied to social media, they have caused an aberration: the recommendation of information, which leads to the so-called filter bubbles, the rise of fake news and disinformation, and the manipulation of the masses. 

There is an intense debate in the scientific community about the polarising effects of the internet and social media on the population. One example of such a study is a paper by Johnson et al., which predicts that whether and how a population becomes polarised is dictated by the nature of the underlying competition, rather than by the validity of the information that individuals receive or by their online bubbles.

F: I would like to stress this finding. This is really f*cked up. Polarisation is caused neither by the particular subject nor by the way a debate is conducted, but by how legitimate the information seems to each individual. Which means that if I find a way to convince single individuals of something, I am in fact manipulating the debate at community scale or, in some cases, globally! Oh my god, we seem to be so f*cked.

Take for instance the people who believe that the Earth is flat. Or the time it took people to recognise global warming as scientifically established, despite the fact that the threshold for scientific confirmation had been reached decades ago.

F: So, recommender systems let loose on social media platforms amplify controversy and conflict, and fringe opinions. I know I’m not going to like the answer, but I’m going to ask the question anyway. This is all just an innocent mistake, right? 

Last year the European Data Protection Supervisor published a report on online manipulation at scale.

F: That does not sound good.

The online digital ecosystem has connected people across the world, with over 50% of the population on the internet, albeit very unevenly in terms of geography, wealth and gender. The initial optimism about the potential of internet tools and social media for civic engagement has given way to concern that people are being manipulated. This happens through the combination of constant harvesting of often intimate information about them, and control over the information they see online according to the category they are put into (so-called audience segmentation). Arguably since 2016, but probably earlier, mass manipulation at scale has occurred during democratic elections, by using algorithms to game recommender systems, among other things, and to spread misinformation. Remember Cambridge Analytica?

F: I remember. I wish I didn’t. But why does it work? Are we so easy to manipulate? 

An interesting point is this. When people receive information collectively, for example from the television news, it is far less likely that they develop extreme views (like, the Earth is flat), because they base the discourse on a common understanding of reality, and people call out each other’s bulls*it.

F: Fair enough.

But when one receives information individually, as happens via a recommender system through micro-targeting, then reality has a different manifestation for each audience member, with no common ground. One is far more likely to adopt extreme views, because there is no way to fact-check, and because the news feels personal. In fact, such news is tailored to the user precisely to push their buttons. Francesco, if you show me George Clooney shirtless and holding a puppy, and George tells me that the Earth is flat, I might have doubts for a minute. Too personal?

F: That’s good to know about you. I’m more of a cat person. But experts keep saying that we are moving towards the personalisation of everything. While this makes sense for things like personalised medicine, it is probably not that beneficial for many other kinds of recommendations, especially not the news. Yet social media feeds are extremely personalised. What can we do?

Solutions have focused on transparency measures, exposing the source of information while neglecting the accountability of the players in the ecosystem who profit from harmful behaviour. But these are band-aids on bullet wounds. The problem is the social media platforms themselves. In October 2019 Zuckerberg was in front of Congress again, because Facebook still refuses to fact-check political advertisements, after everything that has happened. At the same time, market concentration and the rise of platform dominance threaten media pluralism. This, in turn, leads to a handful of news pieces being repeated and amplified, and to independent journalism being silenced.

F: When I think of a recommender system, I think of Netflix.
  • You liked this kind of show in the past, so here are more shows of the same genre
  • People like you have liked this other type of show. Hence, here it is for your consideration

This seems relatively benign. Although, if you think about it some more, you realise that this mechanism will prevent you from actually discovering anything new: it just gives you more of what you are likely to like. But one would not think that this would have world-changing consequences. If you think of the news, however, this mechanism becomes lethal: in the mildest form, which is already bad, you will only hear opinions that already align with those of your own peer group. In the worst scenario, you will not hear some news at all, or you will hear a misleading or false version of it, without even knowing that a different version exists.
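A minimal sketch of the “more of the same genre” logic, with a made-up catalogue and genre tags, just to show why such a system is structurally unable to surprise you:

```python
# Toy catalogue: title -> set of genre tags (invented data).
catalogue = {
    "Space Saga":    {"sci-fi", "action"},
    "Laser Knights": {"sci-fi", "action"},
    "Quiet Garden":  {"documentary", "nature"},
    "Deep Oceans":   {"documentary", "nature"},
    "Court Drama":   {"drama", "legal"},
}

def similar_by_genre(watched, k=3):
    """Rank unseen titles by genre overlap with the viewing history.
    Anything with no overlapping tag scores zero and never gets promoted."""
    history_tags = set().union(*(catalogue[t] for t in watched))
    scores = {title: len(tags & history_tags)
              for title, tags in catalogue.items() if title not in watched}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(similar_by_genre(["Space Saga"]))
# "Laser Knights" comes first; documentaries and dramas score 0,
# so you keep being shown more of what you already watch.
```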

In the Brexit referendum, misleading or false content (like the famous NHS money that was supposedly going to the EU instead) was amplified inside filter bubbles. Each bubble of people was essentially seeing a different version of the same issue. Brexit was a million different things, depending on your social media feeds. And of course there are malicious players in the game, like the Russian Internet Research Agency and Cambridge Analytica, who actively exploited these features in order to swing the vote.

F: Even the traditional media is starting to adopt recommender systems for the news content. This seems like a very bad idea, after all. Is there any other scenario in which recommender systems are not great? 

Recommender systems are used in a variety of applications, for instance in the job market. A recommender system that limits exposure to certain job postings on the basis of a person’s gender or inferred health status perpetuates discriminatory attitudes and practices. In the US, similar algorithms are used to set bail for people who have been arrested, disproportionately penalising people of colour. This has to do with how the algorithm is trained: in an already unequal system (where, for instance, there are few women in top managerial positions, and more African-Americans in jail than white Americans), a recommender system will by design amplify such inequality.
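That amplification effect can be illustrated with entirely synthetic data: if historical decisions were skewed against one group, a model trained to imitate those decisions reproduces the gap, even when the underlying “merit” of the two groups is identical. A stylised sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Entirely synthetic data: two groups with identical underlying "merit",
# but historical decisions were skewed against group 1.
group = rng.integers(0, 2, n)
merit = rng.normal(0, 1, n)
historical_decision = (merit - 0.8 * group + rng.normal(0, 0.5, n)) > 0

# "Train" the simplest possible model: the positive-decision base rate per group,
# which is what a classifier fit on group plus noisy features converges towards.
rate_g0 = historical_decision[group == 0].mean()
rate_g1 = historical_decision[group == 1].mean()
print(f"learned positive rate, group 0: {rate_g0:.2f}")
print(f"learned positive rate, group 1: {rate_g1:.2f}")
# Despite identical merit distributions, the learned rates differ sharply;
# a system trained on this history keeps allocating opportunity unevenly.
```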

F: Recommender systems are part of the problem, and they make everything worse. But the origin of the problem lies somewhere else, I suspect. 

Yep. The problem with recommender systems goes even deeper. I would rather connect it to the problem of privacy. A recommender system only works if it knows its audience. They are so powerful, because they know everything about us.  We don’t have any privacy anymore. Online players know exactly who we are, our lives are transparent to both corporations and governments. For an excellent analysis of this, read Snowden’s book “Permanent Record”. I highly recommend it. 

F: The pun was intended wasn’t it?

With all this information about us, we are put into “categories” for specific purposes: selling us products, influencing our vote. They target us with ads aimed at our specific category, and this generates more discussion and more content on our social media. Recommender systems amplify the targeting by design. They would be much less effective, and much less dangerous, in a world where our lives are private. 

F: Social media platforms base their whole business model on “knowing us”. The business model itself is problematic.

As we said in the previous episode, the internet has become centralised, with a handful of platforms controlling most of the traffic. In some countries like Myanmar, internet access itself is provided and controlled by Facebook. 

F: Chiara, where’s Myanmar?

In South-East Asia, between India and Thailand. In effect, the forum for public discourse and the available space for freedom of speech are now bounded by the profit motives of powerful private companies. Citing technical complexity or commercial secrecy, such companies decline to explain how decisions are made. Mostly, those decisions are made via recommender algorithms, which amplify bias and segregation. And at the same time, the few major platforms, with their extraordinary reach, offer an easy target for people seeking to use the system for malicious ends.

Conclusion

This is our call to all data scientists out there: be aware of personalisation when building recommender systems. Personalising is not always beneficial. There are a few cases where it is, e.g. medicine, genetics, drug discovery, and many other cases where it is detrimental, e.g. news, consumer products/services, opinions. Personalisation by algorithm, and in particular of the news, leads to a fragmentation of reality that undermines democracy. Collectively we need to push for reining in targeted advertising, and the path to this leads through stricter rules on privacy. As long as we are completely transparent to commercial and governmental players, as we are today, we are vulnerable to lies, misdirection and manipulation. As Christopher Wylie (the Cambridge Analytica whistleblower) eloquently put it, it’s like going on a date where you know nothing about the other person, but they know absolutely everything about you. We are left without agency, and without real choice. In other words, we are f*cked.

References

Black Lives Matter / Internet Research Agency (IRA) articles:

http://faculty.washington.edu/kstarbi/Stewart_Starbird_Drawing_the_Lines_of_Contention-final.pdf

https://medium.com/s/story/the-trolls-within-how-russian-information-operations-infiltrated-online-communities-691fb969b9e4


https://faculty.washington.edu/kstarbi/BLM-IRA-Camera-Ready.pdf

IRA tactics: https://int.nyt.com/data/documenthelper/533-read-report-internet-research-agency/7871ea6d5b7bedafbf19/optimized/full.pdf#page=1

https://int.nyt.com/data/documenthelper/534-oxford-russia-internet-research-agency/c6588b4a7b940c551c38/optimized/full.pdf#page=1

EDPS report https://edps.europa.eu/sites/edp/files/publication/18-03-19_online_manipulation_en.pdf

Johnson et al.  “Population polarization dynamics and next-generation social media algorithms” https://arxiv.org/abs/1712.06009
