Of the alignment problem and AI ethics

Anwesh Satpathy
7 min read · Jun 25, 2024

--

The advent of large language models (LLMs) like ChatGPT and Gemini in recent years has led to increased discussion, among the public as well as in academia, about the possibilities and dangers AI holds for transforming everyday life as we know it. The immediate response of many universities was to revert to pen and paper to avoid the use of AI in assignments. Notwithstanding the impermanence of such a "solution", the episode highlights a general recognition of the transformative, perhaps even destructive, potential of LLMs specifically and artificial intelligence generally. Any remaining doubts about that potential were dispelled when the notoriously apathetic entertainment industry, Hollywood, took to the streets against the perceived misuse of generative AI. Conversations about the potential existential risks of AI, as well as the difficulty of aligning AI with human values, have gained a new lease of life. In this article, I trace the trajectory of these ideas through the works of various thinkers, including the philosopher Nick Bostrom and the computer scientist Eliezer Yudkowsky, and argue that the existing concern about the existential risks of AI is valid but severely limited by the framework of "longtermism".

Before tracing the roots of the debates concerning existential risk, it is helpful to look at the specifics of the arguments concerning the AI alignment problem. To illustrate the need for AI ethics, Bostrom and Yudkowsky present a scenario in which a bank uses a machine-learning algorithm to decide whether to approve mortgage applications. It is later found that the approval rate for Black applicants has been steadily dropping. One might assume that an AI would be more transparent and therefore less prejudiced in its approvals. Yet the fact remains that Black applicants are denied mortgages because of poor credit history. While a human being may sometimes consider the differing socio-economic circumstances among races when approving applications, a machine simply looks at the statistics of credit history. An algorithm intended to replace human judgement must necessarily inherit certain social requirements. For instance, the legal system functions through predictability, exemplified by precedent. It is on the basis of precedent that judges reason towards their judgements and lawyers justify their cases. For AI to function in the legal environment, it must acquire some form of predictability. Similarly, Bostrom and Yudkowsky highlight transparency, responsibility, auditability and incorruptibility as criteria to be considered when creating algorithms to replace human judgement.
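To make the point concrete, here is a minimal, purely illustrative sketch in Python (the score distributions and the approval threshold are invented for this example; nothing here is taken from Bostrom and Yudkowsky's chapter). A rule that looks only at credit history is formally blind to race, yet it reproduces unequal approval rates wherever credit histories themselves differ between groups.

```python
import random

# Purely hypothetical credit-score distributions for two groups.
# The gap between the means stands in for differing socio-economic
# circumstances; the numbers are invented for illustration.
random.seed(0)
group_a = [random.gauss(690, 40) for _ in range(10_000)]
group_b = [random.gauss(650, 40) for _ in range(10_000)]

APPROVAL_THRESHOLD = 680  # hypothetical cut-off used by the "algorithm"

def approve(score: float) -> bool:
    """The model sees only credit history, nothing about the applicant."""
    return score >= APPROVAL_THRESHOLD

rate_a = sum(approve(s) for s in group_a) / len(group_a)
rate_b = sum(approve(s) for s in group_b) / len(group_b)

print(f"Approval rate, group A: {rate_a:.0%}")
print(f"Approval rate, group B: {rate_b:.0%}")
# The rule never looks at group membership, yet the approval rates diverge,
# because the statistics it does look at already encode the disparity.
```

Criteria such as transparency and auditability matter precisely because they let us inspect and contest a rule of this kind.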

The eventual goal of AI companies like OpenAI is to create artificial general intelligence (AGI). While today's AI has already surpassed human beings in multiple fields, it is still domain-specific. Deep Blue, for instance, could defeat our best chess players, but it cannot drive a car. AGI will not be predictable to human beings. Deep Blue was not programmed to perform specific chess moves; the space of possible moves is far too vast to enumerate. Moreover, had the programmers simply supplied the machine with a list of good moves, it could never have played stronger chess than its creators. The programmers sacrificed the predictability of local, proximate outcomes for the sake of the larger ultimate outcome of winning. Creating an AI that acts in an ethical manner conducive to human well-being would mean an AI that thinks about ethics, under varying circumstances, in the same manner as human beings do. It is important to mention here that human values are not a monolith. Cultures across the world not only disagree on ethics but often hold opposing positions on the same questions. An argument can still be made that a moral AI is possible. A pacifist, for instance, sincerely believes in non-violence. Offered a pill that would grant the ability to murder people without hesitation, the pacifist would still refuse, owing to his present sincere belief in non-violence.

Omohundro argues that a sufficiently advanced AI will exhibit certain basic "drives", and that these must be taken into account if such systems are to be designed to ensure a positive future for humanity. Such systems, Omohundro argues, will be rational because they are goal-directed. An AGI will not be confined to a single goal such as playing chess. We may want an AI able to play both chess and checkers; the system must then find a way to master both games. To navigate conflicting goals, the goals will have to be assigned real-valued weights, a "utility function". An uncertain outcome can then be evaluated by computing its expected utility. A system that maximizes expected utility behaves, in effect, like a rational economic agent. Rational agents, however, are not necessarily moral agents. These basic AI drives do not mean that an AGI will be moral; they merely give us a space of expected behaviour within which we can attempt to instil ethics. The fact that an AGI, by its very definition, will have to operate in at least some unpredictable manner makes morality harder to sustain. This is particularly concerning given that Omohundro's basic drives include "self-improvement", which human beings cannot control. It is possible that in an emergency, an AI, being a rational, utilitarian agent, would not hesitate to sacrifice the lives of a few to save the many.
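To make the idea of a utility function and expected-utility maximization concrete, here is a minimal sketch (the goals, weights and probabilities are invented for illustration and are not Omohundro's own formalism): the agent weighs its conflicting goals, estimates how likely each candidate action is to achieve them, and always picks the action with the highest expected utility.

```python
# Hypothetical utility weights over two conflicting goals.
GOAL_WEIGHTS = {"win_chess": 0.6, "win_checkers": 0.4}

# For each candidate action, the (assumed) probability of achieving each goal,
# e.g. specialising in one game versus splitting training time between both.
ACTIONS = {
    "train_chess_only":    {"win_chess": 0.9, "win_checkers": 0.1},
    "train_checkers_only": {"win_chess": 0.1, "win_checkers": 0.9},
    "split_training":      {"win_chess": 0.6, "win_checkers": 0.6},
}

def expected_utility(outcome_probs: dict) -> float:
    """Sum of goal weights, each discounted by how likely the action achieves that goal."""
    return sum(GOAL_WEIGHTS[goal] * p for goal, p in outcome_probs.items())

best_action = max(ACTIONS, key=lambda a: expected_utility(ACTIONS[a]))
for action, probs in ACTIONS.items():
    print(f"{action}: EU = {expected_utility(probs):.2f}")
print("Chosen action:", best_action)
# Nothing in this procedure is about morality: the agent is "rational" only in
# the sense that it maximizes this number, whatever the weights happen to encode.
```

Whether the chosen action is ethical depends entirely on what the weights and probabilities encode; the maximization itself is indifferent.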

There is no good reason to suspect that an artificial superintelligence (ASI) will have the same morality as human beings, or even a "better" one. Consider a scenario in which we encounter an extremely bright individual of visibly extraordinary intelligence. The said individual spends the whole day counting blades of grass or reciting pi to a million digits. This would strike most of us as bizarre, because we have some shared sense of what counts as interesting, boring or valuable. We have evolved in a selective environment; an advanced non-human being shaped by a different environment may indulge in tasks that we consider "boring". As Bostrom writes: "Intelligence and final goals are orthogonal axes along which possible agents can freely vary. In other words, more or less any level of intelligence could in principle be combined with more or less any final goal." Much of the research concerning AI ethics is similarly fraught with anthropomorphism.

AI also has a thought-speed advantage: it thinks millions of times faster than we do. Should we wish to pull the plug on a dangerous AI, it would still have ample time to find a way out, since we would be extremely unlikely to outsmart such a system. As Eliezer Yudkowsky writes: "the AI runs on a different timescale than you do; by the time your neurons finish thinking the words 'I should do something' you have already lost."

It is extremely difficult to identify all the constraints and values required to build an ASI that is amicable towards human beings rather than hostile. Moreover, these problems will have to be solved before the first AGI or ASI is created. The fact that we have yet to solve them and are instead racing to create AGI is troubling. All of this gives credence to the argument that AI is an existential risk. Yet it is also important to inquire into the specific ethics or philosophies motivating those who are concerned about AI ethics.

Bostrom's interest in existential risks precedes his identification of AI as an existential threat. According to Bostrom, "a non-existential disaster causing the breakdown of global civilization is, from the perspective of humanity as a whole, a potentially recoverable setback: a giant massacre for man, a small misstep for mankind." On this view, long-term potential existential disasters are far more pressing than the problems occurring today. Bostrom argues that even if there is only a 1% chance of creating the technological, transhumanist utopia that he envisions, "the expected value of reducing existential risk by a mere one billionth of one billionth of one percentage point is worth 100 billion times as much as a billion human lives."
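The structure of this expected-value argument is simple arithmetic. As a hedged illustration only (N below is a placeholder for the assumed number of future lives, not a figure quoted in this article): multiplying a 1% credence by a risk reduction of one billionth of one billionth of one percentage point yields the quoted conclusion whenever the assumed number of future lives is at least about 10^42.

```latex
\underbrace{10^{-2}}_{\text{1\% credence}}
\times
\underbrace{10^{-20}}_{\substack{\text{one billionth of one billionth}\\\text{of a percentage point}}}
\times N
\;\ge\;
\underbrace{10^{11} \times 10^{9}}_{\text{100 billion} \times \text{a billion lives}}
= 10^{20}
\quad\Longleftrightarrow\quad
N \ge 10^{42}.
```

Once an astronomically large N is granted, almost any present-day cost is dwarfed in the calculation; that is precisely the move questioned below.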

The longtermism advocated by the likes of Bostrom does not mind sacrificing the autonomy of individuals today for the sake of potential individuals a hundred years from now. On Bostrom's account, we live in a "semi-anarchic", vulnerable world in which states lack the capacity to implement the surveillance needed to prevent attacks that could lead to the "devastation of civilization". As a solution, Bostrom recommends pre-emptive policing and giving the state the ability to monitor its citizens closely through a "high-tech panopticon". The proposal sounds like an archetypal dystopia. For Bostrom, however, it is only a matter of time before we reach a level of technological development at which nothing short of global surveillance can stop the end of humanity. Bostrom's concerns about existential risks are explicitly framed in utilitarian terms. As he writes: "For standard utilitarians, priority number one, two, three, and four should consequently be to reduce existential risk, where the fifth should be to colonize space as soon as we possibly can."

Let us assume, for the sake of argument, that the top tech companies of the world heed the requests of the likes of Yudkowsky and Bostrom and impose a moratorium on AI development owing to the existential risks and the alignment problem. What sort of ethics would be prioritized in this alternative scenario? Among the thinkers who have seriously argued that AI is an existential risk, Bostrom is preeminent. His voice is heard and taken seriously by tech billionaires like Elon Musk and Sam Altman. It is fair to assume, then, that the ethics with which AI would be made to align in this scenario would reflect Bostrom's utilitarian longtermism. Should we be surprised if such an "ethical" AI then helps build the global surveillance system that Bostrom envisions? The problem with the discussion around AI ethics is that it is conducted only on terms that appeal to a select few in Silicon Valley. AI will therefore only ever align, if it aligns at all, with the values of the capitalist longtermism of Silicon Valley, which is not even representative of the values of most Americans, much less of the world. While AI's risk to the future of humanity cannot be ignored, we must expand the discourse to other parts of the world. This will enrich our conversations on AI ethics by expanding our imagination to ways of being that are not centred around billionaires. It is only through an honest conversation of this kind that we can even hope to create an AI that is inclusive and somewhat representative of the values of the world.
