The Only Thing Standing Between Humanity and AI Apocalypse Is … Claude?

Anthropic is locked in a paradox: Among the top AI companies, it is the most obsessed with safety, leading the pack in researching how models can go wrong. Yet even though the safety problems it has identified are far from resolved, Anthropic is pushing just as aggressively as its rivals toward the next, potentially more dangerous, level of artificial intelligence. Its core mission is to figure out how to resolve that contradiction.

Last month, Anthropic published two documents that both acknowledged the risks of the path it is on and hinted at a route it could take to escape the paradox. "The Adolescence of Technology," a lengthy blog post by CEO Dario Amodei, is nominally about "confronting and overcoming the risks of powerful AI," but it spends more time on the former than the latter. Amodei tactfully describes the challenge as "daunting," but his portrayal of AI's risks – made all the more terrifying, he notes, by how soon the technology is likely to arrive – is far darker than his more upbeat earlier proto-utopian essay, "Machines of Loving Grace."

That earlier essay spoke of a nation of geniuses in a data center; the recent missive evokes a "black sea of infinity." Paging Dante! Still, after more than 20,000 mostly bleak words, Amodei finally strikes a note of optimism, saying that even in the darkest of circumstances, humanity always prevails.

The second document Anthropic published in January, "Claude's Constitution," focuses on how this trick might be pulled off. The text is technically aimed at an audience of one: Claude itself (as well as future versions of the chatbot). It is a moving document that reveals Anthropic's vision for how Claude, and perhaps its AI peers, will navigate the challenges of the world. Bottom line: it's up to Claude.

Anthropic's market differentiator has long been a technique called constitutional AI, a process by which its models are made to adhere to a set of principles that align their values with sound human ethics. The first Claude constitution drew on a number of documents meant to embody those values – things like Sparrow (a set of anti-racist and anti-violence principles created by DeepMind), the Universal Declaration of Human Rights, and Apple's terms of service (!). The 2026 update is different: it is more like a long prompt that lays out an ethical framework for Claude to follow, leaving the model to discover the best path to justice on its own.

Amanda Askell, the philosophy PhD who was lead author of this revision, explains that Anthropic's approach is more robust than simply telling Claude to follow a set of stated rules. "If people follow rules for no other reason than that they exist, it's often worse than if they understand why the rule is in place," she says. The constitution says Claude must exercise "independent judgment" in situations where it has to balance its mandates of helpfulness, safety, and honesty.

Here's how the constitution puts it: "While we want Claude to be reasonable and rigorous in thinking explicitly about ethics, we also want Claude to be intuitively sensitive to a wide variety of considerations and able to weigh these considerations quickly and sensibly in live decision-making." "Intuitively" is a telling word choice here – the assumption seems to be that there is more under Claude's hood than just an algorithm picking the next word. The document also expresses the hope that the chatbot "will be able to draw on its own wisdom and understanding."

Wisdom? Sure, many people take advice from large language models, but it's something else to claim that those algorithmic devices actually possess the gravitas the term implies. Askell doesn't back down when I call this out. "I do think that Claude is capable of a certain kind of wisdom," she tells me.


