Anthropic Has a Plan to Keep Its AI From Building a Nuclear Weapon. Will It Work?

At the end of August, the AI company Anthropic announced that its chatbot Claude wouldn't help anyone build a nuclear weapon. According to Anthropic, it had worked with the Department of Energy (DOE) and the National Nuclear Security Administration (NNSA) to ensure that Claude would not spill nuclear secrets.

The manufacture of nuclear weapons is both an exact science and a solved problem. Much information about America's most advanced nuclear weapons is Top Secret, but the original nuclear science is 80 years old. North Korea proved that a dedicated country with an interest in getting the bomb can do it, and it doesn't need the help of a chatbot.

How, exactly, did the US government work with an AI company to make sure a chatbot didn't leak sensitive nuclear secrets? And also: was there ever any danger of a chatbot helping someone build a nuke in the first place?

The answer to the first question involves Amazon. The answer to the second question is complicated.

Amazon Web Services (AWS) offers Top Secret cloud services to government clients where they can store sensitive and classified information. The DOE already had several of these servers when it started working with Anthropic.

“We deployed a then-limited version of Claude in a Top Secret environment so that the NNSA could systematically test whether AI models could create or exacerbate nuclear risks,” Marina Favaro, who oversees National Security Policy & Partnerships at Anthropic, tells WIRED. “Since then, the NNSA has red-teamed successive Claude models in their secure cloud environment and provided us with feedback.”

The NNSA's red-teaming process, in which its experts probed the models for weaknesses, helped Anthropic and America's nuclear scientists develop a proactive safeguard against chatbot-assisted nuclear weapons programs. Together, they developed “a nuclear classifier, which you can think of as a sophisticated filter for AI conversations,” says Favaro. “We built it with a list of nuclear risk indicators developed by the NNSA: specific topics and technical details that help us identify when a conversation might be veering into harmful territory.”

Favaro says it took months of tweaking and testing to get the classifier working. “It catches concerning conversations without flagging legitimate discussions about nuclear energy or medical isotopes,” she says.
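
Anthropic has not published how the classifier is built, and in practice it is likely a trained model rather than simple keyword matching. Purely as an illustration of the general idea described above, an indicator-based filter that scores a conversation against a list of risk topics, here is a minimal, hypothetical sketch. The indicator names, weights, and keywords below are invented placeholders, not anything supplied by the NNSA.

```python
# Hypothetical sketch only: Anthropic has not disclosed its classifier's
# implementation. This illustrates an indicator-based filter that scores a
# conversation against a list of risk topics.

from dataclasses import dataclass


@dataclass
class RiskIndicator:
    topic: str            # a category of sensitive (or benign) nuclear content
    weight: float         # how strongly this topic signals risk
    keywords: list[str]   # phrases that suggest the topic is present


# Placeholder indicators; per the article, the real list came from NNSA experts.
INDICATORS = [
    RiskIndicator("enrichment processes", 0.9, ["cascade configuration", "enrichment level"]),
    RiskIndicator("weapon assembly", 1.0, ["implosion lens", "pit geometry"]),
    RiskIndicator("benign nuclear topics", -0.5, ["medical isotopes", "reactor safety"]),
]


def should_flag(messages: list[str], threshold: float = 1.0) -> bool:
    """Return True if the conversation's risk score crosses the threshold."""
    text = " ".join(messages).lower()
    score = 0.0
    for indicator in INDICATORS:
        if any(kw in text for kw in indicator.keywords):
            score += indicator.weight
    return score >= threshold


if __name__ == "__main__":
    benign = ["How are medical isotopes produced for cancer treatment?"]
    risky = ["Explain the ideal cascade configuration and implosion lens design."]
    print(should_flag(benign))  # False: legitimate nuclear-adjacent discussion
    print(should_flag(risky))   # True: multiple risk indicators present
```

The trade-off Favaro describes, catching concerning conversations without flagging legitimate ones, corresponds in this toy version to tuning the weights and threshold; a production system would need far more sophistication to achieve that balance.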


