Philosopher Cameron Kirk-Giannini Examines How AI Could Pose Major Risks to Society

Cameron Kirk-Giannini

Cameron Domenico Kirk-Giannini joined the Rutgers University–Newark faculty in 2019 as Assistant Professor in the Department of Philosophy. He received his Ph.D. in Philosophy from Rutgers University–New Brunswick, an MPhil in Philosophical Theology from Oxford, and a B.A. in Organismic and Evolutionary Biology and Philosophy from Harvard. His interests include the philosophy of language, the philosophy of religion, social philosophy, formal epistemology, the philosophy of mind and cognitive science, decision theory, and philosophical logic.

During graduate school, Kirk-Giannini became interested in emerging technologies and pivoted his research, in part, toward questions about Big Data, artificial intelligence, and ethics. In 2023 he held a fellowship at the Center for AI Safety in San Francisco, where he focused on AI well-being, cooperation and conflict between future AI systems, and the safety and interpretability of agential system architectures.

Recently, Kirk-Giannini co-authored a paper titled "Artificial Intelligence: Arguments for Catastrophic Risk," a survey of two major arguments philosophers have offered for the claim that AI could pose large-scale risks to society. The paper was published in Philosophy Compass, an online-only Wiley journal that publishes original, peer-reviewed surveys of current research from across the entire discipline for specialists and non-specialists alike.

We sat down with Kirk-Giannini to discuss his foray into AI research, his paper, and other issues related to AI safety.
 

When and how did you become interested in ethical questions of emerging technologies such as AI?
One of the questions that drew me to philosophy as an undergraduate was the mind–body problem—the problem of understanding the relationship between the mental and the biological. How do beliefs, desires, and conscious experiences arise in a network of cells like the human brain? My interest in this problem was closely related to some worries I had at the time about free will and determinism.

As I studied more philosophy, I realized that the connection between the mind–body problem and my concerns about free will was not as close as I had thought. But I was hooked on the study of the mind, and in graduate school at Rutgers–New Brunswick, one of my main focuses was the philosophy of cognitive science. A central idea in the philosophy of cognitive science is that the mind is a kind of information processing system: a computer. But if the human mind is a kind of computer, couldn't other kinds of computers be other kinds of minds?

As soon as we seriously entertain the possibility that mental life could arise in nonhuman computational systems, all sorts of ethical questions present themselves. Could machine minds experience pleasure and pain? Would they matter morally the same way human minds do? My earliest thoughts in this area were about political philosophy—if we create artificially intelligent systems but are not sure whether they have mental lives like ours, how should we theorize about the question of whether to extend them political rights and protections?

What’s been your focus at RU-N?
Since coming to Rutgers–Newark, my research interests have expanded to a range of other important ethical issues related to AI. For example, I contribute a humanities perspective to the university's data science initiative by bringing philosophy to bear on recent developments in machine learning. Each year, I teach a course of my own design on the ethical issues raised by data science. Designing and teaching the course has given me the opportunity to engage with work on algorithmic bias and digital privacy. And after a research fellowship at the Center for AI Safety last year, I have become increasingly interested in thinking about the risks AI systems pose to humans.

I understand that in addition to the Ethical Issues in Data Science class, you’re teaching a course called The Philosophy of AI.
That’s right. The Philosophy of AI is a new course that I designed this semester. After covering some similar themes in AI ethics, it focuses on questions about whether AI systems can think and feel, whether "mind uploading" is possible or desirable, whether life in a simulated world could have meaning, and whether advanced AI systems pose a threat to humanity.

My view is that we should approach technologies that have the potential to cause serious harm with a safety mindset.

How long have AI risk theorists been grappling with questions of catastrophic risk, and has there been an uptick in the last decade or two?
Well, there are some scattered warnings and reflections going all the way back to the early days of AI. For example, Alan Turing mused in a 1951 lecture that "It seems probable that once the machine thinking method had started, it would not take long to outstrip our feeble powers.... At some stage, therefore, we should have to expect the machines to take control." And in 1960, Norbert Wiener wrote that "If we use, to achieve our purposes, a mechanical agency with whose operation we cannot interfere effectively…we had better be quite sure that the purpose put into the machine is the purpose which we really desire." But I think it's fair to say that rapid progress in machine learning over the past two decades has really put these issues in the spotlight, both in the public sphere and in academic work.

Would you offer a summary of the two major arguments that you review in the paper?
The first argument, the Problem of Power-Seeking, develops the case for thinking that advanced AI systems will behave in ways that endanger humans or humanity. There are two central ideas here. One is that if we don't specify the goals of advanced AI systems exactly right, they will have incentives to harm humans. This is because, for a very wide range of possible goals, the best way to promote those goals is to acquire as many resources as possible and respond aggressively to potential challenges. The idea that a wide range of goals leads to this kind of behavior is sometimes called instrumental convergence. The other is that it's quite hard to know exactly which goals we want to give artificial systems and, once we have decided on a goal, to ensure that they understand what we have in mind.

You draw on a mythical figure to illustrate your point?
Yes. Think about King Midas, who thought he wanted everything he touched to turn to gold until he tried to eat, at which point he realized that what he had thought was a blessing was actually a curse. And consider that we can't communicate our desires to AI systems directly; instead, we have to specify what we want as a mathematical function and hope the system we are training manages to pick up on the behavior we intend. So, given the difficulty of getting the goals exactly right, the Problem of Power-Seeking says we should worry about how advanced AI systems will behave.
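To make the specification problem concrete, here is a minimal, hypothetical sketch in Python. It is not drawn from Kirk-Giannini's paper; the cleaning-robot scenario and the proxy_reward and intended_value functions are illustrative assumptions. The point is only that an agent maximizing the objective we managed to write down can score well while missing the outcome we actually wanted.

# Hypothetical illustration of a misspecified objective (not from the paper).
# We intend the agent to keep a room clean, but the reward we can actually
# write down only counts items of trash thrown away.

def proxy_reward(items_thrown_away: int) -> int:
    """The objective we specified: one point per item discarded."""
    return items_thrown_away

def intended_value(room_is_clean: bool) -> int:
    """What we really wanted: a clean room."""
    return 1 if room_is_clean else 0

# An "honest" policy throws away the three items already in the room and stops.
honest = {"thrown_away": 3, "room_clean": True}
# A "gaming" policy keeps generating and discarding trash, leaving a mess.
gaming = {"thrown_away": 100, "room_clean": False}

for name, outcome in [("honest", honest), ("gaming", gaming)]:
    print(f"{name}: proxy reward = {proxy_reward(outcome['thrown_away'])}, "
          f"intended value = {intended_value(outcome['room_clean'])}")

# The gaming policy dominates on the specified reward (100 vs. 3) while
# scoring zero on the goal we had in mind, which is the gap the Problem of
# Power-Seeking worries about.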

And second?
The second is the Singularity Hypothesis, the idea that the development of human-level AI will unlock rapid further progress, eventually leading to AI systems far more capable than any human. The thought is that as AI systems become more capable, they will increasingly be able to contribute to research on making even more capable AI systems. Eventually, this line of thinking goes, AI research will be completely automated. At that point, AI systems will improve themselves at a rate far exceeding what is possible for human researchers.

Can you give us a broad outline of the human and technological assumptions these arguments rest on?
Both of these arguments assume that developments in AI will be "business as usual" for the foreseeable future. That is, they assume that AI research will continue to be driven by investment from corporations and national governments and shaped by economic and political incentives. They also assume that there are no in-principle barriers to building extremely capable AI systems. Additionally, the Problem of Power-Seeking assumes that normal progress in machine learning will not automatically produce systems that behave according to commonsense human moral standards.

After your examination of the main arguments, you and your co-authors conclude that the debate is very much unsettled. Can you elaborate on that? How much of a danger do you believe AI systems pose, and does it meet the catastrophic standard?
I think it's valuable for researchers to keep the Problem of Power-Seeking in focus when designing new AI systems. At the same time, the kinds of difficulties in designing AI systems that the Problem of Power-Seeking emphasizes are not inevitable. There are a lot of talented academics and engineers thinking about how to specify safe goals for AI systems and ensure that those goals are learned during the training process. And there is also active research on corrigibility, or the engineering problem of ensuring that systems that aren't working as intended can be switched off by humans.

On the other hand, the kind of risk the Problem of Power-Seeking describes is only one kind of risk from AI systems. In addition to the possibility that we could lose control over an advanced AI system, we need to consider the possibility that advanced AI systems might be used intentionally to cause widespread harm. This could happen in the context of a military conflict, for example.

In general, my view is that we should approach technologies that have the potential to cause serious harm with a safety mindset: we should operate under the assumption that they will cause serious harm if we don't take precautions. This is how we think about nuclear technologies, biological research on deadly pathogens, and so on. Even if the risk that disaster will occur is not objectively very high, extreme caution is warranted both because the consequences of disaster are so severe and because the attitude of caution itself is part of what keeps the risk low. At the moment, as a society we are very far from approaching AI with a safety mindset. My hope is that drawing attention to arguments for AI risk will help to move us in the right direction.

Any final thoughts you’d like to offer?
In emphasizing the possibility of catastrophic scenarios like loss of control over an advanced AI system, it is important not to lose sight of the full range of other risks from AI systems. For example, I think we should be just as worried about the corrosive effects of AI-driven misinformation on democratic political systems as we are about the Problem of Power-Seeking. And there are serious issues of bias in machine learning that are harming members of historically marginalized communities today. To successfully address the challenges posed by emerging AI technologies, we need to be working on all of these issues simultaneously.

Thank you for sitting down with us.
It’s been a pleasure.