Hey readers,
There are teams of researchers in academia and at major AI labs these days working on the problem of AI ethics, or the moral concerns raised by AI systems. These efforts tend to be especially focused on data privacy concerns and on what is known as AI bias — AI systems that, using training data with bias often built in, produce racist or sexist results, such as refusing women credit card limits they'd grant a man with identical qualifications.
There are also teams of researchers in academia and at some (though fewer) AI labs that are working on the problem of AI alignment. This is the risk that, as our AI systems become more powerful, our oversight methods and training approaches will become less and less adequate to the task of getting them to do what we actually want. Ultimately, we'll have handed humanity's future over to systems with goals and priorities we don't understand and can no longer influence.
Today, AI ethicists and those in AI alignment often end up working on similar problems. Improving the understanding of the internal workings of today's AI systems is one approach to solving AI alignment, and it is also crucial for understanding when and where models are being misleading or discriminatory.
And in some ways, AI alignment is just the problem of AI bias writ (terrifyingly) large: We are assigning more societal decision-making power to systems that we don't fully understand and can't always audit, and that lawmakers don't understand nearly well enough to regulate effectively.
As impressive as modern artificial intelligence can seem, right now those AI systems are, in a sense, "stupid." They tend to have very narrow scope and limited computing power. To the extent they can cause harm, they mostly do so either by replicating the harms in the data sets used to train them or through deliberate misuse by bad actors.
But AI won't stay stupid forever, because lots of people are working very diligently to make it as smart as possible.
Part of what makes current AI systems limited in the dangers they pose is that they don't have a good model of the world. Yet teams are working to train models that do have a good understanding of the world. The other reason current systems are limited is that they aren't integrated with the levers of power in our world — but other teams are trying very hard to build AI-powered drones, bombs, factories, and precision manufacturing tools.
That dynamic — where we're pushing ahead to make AI systems smarter and smarter, without really understanding their goals or having a good way to audit or monitor them — sets us up for disaster.
And not in the distant future, but as soon as a few decades from now. That's why it's crucial to have AI ethics research focused on managing the implications of modern AI, and AI alignment research focused on preparing for powerful future systems.
Not just two sides of the same coin
So do these two groups of experts charged with making AI safe actually get along?
Hahaha, no.
These are two camps, and they're two camps that sometimes stridently dislike each other.
From the perspective of people working on AI ethics, experts focusing on alignment are ignoring real problems we already experience today in favor of obsessing over future problems that might never come to be. Often, the alignment camp doesn't even know what problems the ethics people are working on.
"Some people who work on longterm/AGI-style policy tend to ignore, minimize, or just not consider the immediate problems of AI deployment/harms," Jack Clark, co-founder of the AI safety research lab Anthropic and former policy director at OpenAI, wrote this weekend.
From the perspective of many AI alignment people, however, lots of "ethics" work at top AI labs is basically just glorified public relations, chiefly designed so tech companies can say they're concerned about ethics and avoid embarrassing PR snafus — but doing nothing to change the big-picture trajectory of AI development. In surveys of AI ethics experts, most say they don't expect development practices at top companies to change to prioritize moral and societal concerns.
(To be clear, many AI alignment people also direct this complaint at others in the alignment camp. Lots of people are working on making AI systems more powerful and more dangerous, with various justifications for how this helps us learn to make them safer. From a more pessimistic perspective, nearly all AI ethics, AI safety, and AI alignment work is really just work on building more powerful AIs, but with better PR.)
Many AI ethics researchers, for their part, say they'd love to do more but are stymied by corporate cultures that don't take them very seriously and don't treat their work as a key technical priority, as former Google AI ethics researcher Meredith Whittaker noted in a tweet: