Reinforcement Learning: A Two-Way Street?

Rahul Chodankar
6 min read · May 10, 2023

The past decade has seen machine-learning-enabled artificial intelligence spread into ever more casual use cases. Once the domain of distant, high-level problems, AI now reaches into our everyday lives: our search results, home assistants, shopping experiences, and social media feeds.

Quite simply, we are consuming algorithmically influenced content at alarming volumes. Our interactions are becoming more organic; the hesitancy in dealing with such platforms is fading.

Is this a problem? No one can deny that the quality of our experiences has dramatically improved. We are able to increase the efficiency and effectiveness of many daily, mundane processes. Finally, we have machines which are constantly learning to better serve our interests. Of course, there are some issues with data security and algorithmic inaccuracies, but those will resolve themselves over time. All in all, there is no problem, right?

Reinforcement learning is a type of machine learning that is based on the idea of reward-based learning. In reinforcement learning, an agent learns to make decisions by interacting with an environment and receiving feedback in the form of rewards or punishments. The agent then adjusts its behavior based on this feedback, with the goal of maximizing its long-term reward.
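To make the reward loop concrete, here is a minimal sketch of my own (not from any particular library or platform): an agent repeatedly chooses between two actions, where action 1 happens to pay a higher average reward, and nudges its value estimates toward whatever reward it observes.

```python
import random

# Two possible actions; action 1 yields a higher average reward.
true_rewards = {0: 0.2, 1: 0.8}
values = {0: 0.0, 1: 0.0}  # the agent's current estimate of each action's value
alpha = 0.1                # learning rate: how strongly feedback adjusts estimates
epsilon = 0.1              # chance of exploring a random action

random.seed(42)
for step in range(2000):
    # Explore occasionally; otherwise exploit the best-known action.
    if random.random() < epsilon:
        action = random.choice([0, 1])
    else:
        action = max(values, key=values.get)
    # The environment returns a noisy reward.
    reward = true_rewards[action] + random.gauss(0, 0.1)
    # Reinforcement: shift the estimate toward the observed reward.
    values[action] += alpha * (reward - values[action])

print(values)  # action 1's estimate ends up well above action 0's
```

After enough interactions, the agent's estimates approach the true average rewards, and it picks the better action almost every time. Nothing here knows what a "good" action is in advance; the behavior emerges purely from the feedback.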

While reinforcement learning is most commonly associated with machine learning, there is also evidence to suggest that humans may use similar processes to learn and make decisions.

Although human learning processes are more complex, the fundamentals behind them are the same. One example of reinforcement learning in humans is seen in the development of habits. Habits are automatic behaviors that are learned through repeated actions and reinforced by rewards or positive outcomes. Over time, these habits become automatic, and individuals may continue to engage in them even if they are no longer beneficial or necessary.

Another example of reinforcement learning in humans is seen in addiction. Addiction is characterized by compulsive behavior that is reinforced by the positive effects of a substance or behavior. Individuals with addiction may continue to engage in these behaviors even if they are harmful, because they have learned to associate the behavior with a reward.

Reinforcement learning may also play a role in decision-making more broadly. For example, individuals may be more likely to make decisions that lead to positive outcomes, based on past experiences and reinforcement.

While there is evidence to suggest that humans may use reinforcement learning to some extent, it is important to note that human decision-making is also influenced by many other factors, such as emotions, cognitive biases, and social norms. Additionally, humans are often able to consider long-term consequences and make decisions based on complex information, which is not always possible for reinforcement learning algorithms.

The above explanation of reinforcement learning was generated by me with ChatGPT. Neat, right? Of course, I have edited it and made it flow more in line with my desired style, but the tool has been valuable in shaping a collection of relevant explanations into a coherent passage.

Why mention it? Because, as noted, both humans and AI learn through similar mechanisms. This allows AI to learn from and be shaped by human interaction.

And theoretically, it allows humans to learn and be affected by AI interaction.

Is this a problem? No, and yes.

AI, with its vast connection to the world wide web, will be a tremendous resource for humans. Collectively, as well as at the individual level, all previous bounds on human intellectual development will no longer apply. Leonardo da Vinci was one of the most famous polymaths of his time. A select lucky few have managed to follow his example since.

Not anymore. AI has the capacity to make polymaths out of us all. True knowledge is nearly at our fingertips.

Then what is the issue?

I was born in 1996, and was one of the lucky few who had access to the internet by the time I was five years old. I have browsed the internet regularly since I was seven, and have had my own computer since I was eleven.

Many who are reading this will remember the time of the internet pre-2010. To all of us who were kids, the one piece of advice we were repeatedly given was to distance ourselves from the internet; use only what you have to, and don’t trust any strangers.

It was good advice, and specifically relevant for the time. As newer generations entered cyberspace, this advice naturally faded. Today, with our algorithmically generated Instagram and TikTok feeds, it sounds old-school and out of date; no one takes it seriously.

But they should.

Humans learn constantly from their environment, all the more when they are young. Numerous studies suggest that algorithmically promoted content can significantly shape the behavior, beliefs, and attitudes of teens and young adults. Social media platforms, in particular, use algorithms to promote content that is likely to be of interest to users, based on their past behavior, preferences, and social network.

One potential effect of algorithmically promoted content is the reinforcement of existing beliefs and attitudes. Such content tends to show users more of what they have already engaged with, creating a feedback loop that can entrench existing beliefs and attitudes. This can lead to the development of echo chambers, where individuals are only exposed to content that confirms their pre-existing beliefs and biases.
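This feedback loop is easy to simulate. The following is a toy model of my own invention, not any platform's actual algorithm: a hypothetical feed shows more of whatever topic earns engagement, while the user's taste drifts toward whatever they are shown. Even a mild starting preference collapses into a single dominant topic.

```python
import random

random.seed(1)
topics = ["news", "sports", "memes"]
# Hypothetical starting taste: a mild preference for one topic.
user_pref = {"news": 1.0, "sports": 1.0, "memes": 1.2}
# How often the feed shows each topic, updated by engagement.
feed_weight = {t: 1.0 for t in topics}

for _ in range(2000):
    # The feed picks a topic in proportion to its current weights.
    shown = random.choices(topics, weights=[feed_weight[t] for t in topics])[0]
    # The user engages with probability proportional to their preference.
    if random.random() < user_pref[shown] / 2.0:
        feed_weight[shown] *= 1.05  # feed promotes what got engagement
        user_pref[shown] *= 1.01    # taste drifts toward what is consumed

total = sum(feed_weight.values())
share = {t: round(feed_weight[t] / total, 3) for t in topics}
print(share)  # one topic comes to dominate the feed's weights
```

The rich-get-richer dynamic does the narrowing on its own: no single step is malicious, yet the feed converges to showing essentially one thing.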

Here, I am not even getting into the potential pitfalls of harmful, inaccurate, misleading, malicious, or divisive content. That is a matter under discussion by many. No, the issue here is the nature of algorithmically generated content itself, and what it means for the coming generations of content-consuming humans. Even though it has the potential to be an excellent learning tool, we are seeing the exact opposite happen. Something that should broaden horizons is narrowing them.

Why?

Because in a traditional human learning relationship, there is a child and there is an adult. Or, more generally, a teacher and a student.

Reinforcement learning requires one party to know when to apply the reward-penalty system, a role that goes unfilled when many users employ such AI-driven platforms. The AI assumes the human knows what it is doing, while the human assumes the AI is all-knowing.

All that assuming makes…

you know how the saying goes

The conclusion here is a thought. Algorithmically generated content is here. The question is what output we want it to generate. Is entertainment or engagement a valid goal to pursue, given the overall social cost? Just as the world came together to draft a set of human rights, determining the basic protections we all deserve, so must we decide what goals humanity as a whole should pursue. Is technological and educational progress of importance? Is artistic and cultural development desirable? Or is the end goal to be the exploitation of the human psyche for purely capitalistic drivers, promoting ‘unproductive’ content consumption?

Most experts have, in my opinion, taken the correct stance in demanding that the algorithms and models be made public. This is too influential a technology to be left in the control of parties who don’t even have a theoretical purpose of serving the people. But it is obviously unrealistic to implement, and so it won’t be. All we can hope is that the people working on this are cognizant of the impact their work has on the world. Perhaps in another post I will write more on the balancing of these algorithms, but it is a matter too vast to fully explore here.

This is not an issue with a straightforward answer. Nor is it something that the leaders of the world, primarily older generations with little to no experience of being molded by today’s content consumption economy, should take on alone. Another generation down the line, the effects will start to become evident. We should have acknowledged the issue and begun working on it by then, at the very least.

All we can do till then is educate the public that AI may be as brilliant as the smartest child; but a child it is, and thus ignorant of consequences.
