

However, the models are improving much faster than the efforts to understand them. And the Anthropic team admits that as AI agents proliferate, the lab's hypothetical crime scenario edges ever closer to reality. If we don't crack the black box, it might crack us.

“Most of my life is focused on trying to do things that I believe are important. When I was 18, I dropped out of university to support a friend who was accused of terrorism, because I believe it's most important to support people when others don't. When he was found innocent, I realized that deep learning would affect society, and I dedicated myself to figuring out how humans could understand neural networks. I've been working on that for the last decade because I think it could be one of the keys to making AI safe.”

So begins Chris Olah's “date me doc,” which he posted on Twitter in 2022. He is no longer single, but the doc remains on his GitHub page “since it was an important document for me,” he writes.

Olah's description leaves out a few things, including that, despite not having a university degree, he is a co-founder of Anthropic. A less important omission is that he received a Thiel Fellowship, which gives $100,000 to talented dropouts. “It gave me a lot of flexibility to focus on what I thought was important,” he told me in a 2024 interview. Encouraged in part by articles he read in WIRED, he tried to build 3D printers. “At 19 one doesn't necessarily have the best taste,” he admitted. Then, in 2013, he attended a seminar series on deep learning and was galvanized. He left the sessions with a question no one else seemed to be asking: What is actually happening inside these systems?

Olah struggled to interest others in the question. When he joined Google Brain as an intern in 2014, he worked on a strange product called DeepDream, an early experiment in AI image generation. The neural net produced bizarre, psychedelic patterns, almost as if the software were on drugs. “We didn't understand the results,” says Olah. “But one thing they show is that there is a lot of structure within neural networks.” At least some elements, he concluded, could be understood.
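For readers who want a concrete picture, the mechanism behind DeepDream-style visuals is well documented: instead of training the network, you run gradient ascent on the input image itself so that some intermediate layer's activations grow, and the patterns that layer responds to bleed into the picture. The sketch below is a minimal, illustrative PyTorch version; the specific model, layer (`inception4c`), step size, and file names are assumptions for the example, not details from the article.

```python
# Minimal DeepDream-style sketch (illustrative; not Google's original code).
# Idea: leave the network's weights alone and nudge the *input image* by
# gradient ascent so a chosen layer's activations grow, surfacing whatever
# patterns that layer responds to. Model, layer, step size, and file names
# are assumptions for the example; ImageNet normalization is omitted for brevity.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

model = models.googlenet(weights="DEFAULT").eval()  # InceptionV1-family net
for p in model.parameters():
    p.requires_grad_(False)  # only the image gets gradients

acts = {}
model.inception4c.register_forward_hook(lambda mod, inp, out: acts.update(target=out))

img = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()])(
    Image.open("input.jpg").convert("RGB")
).unsqueeze(0)
img.requires_grad_(True)

for _ in range(20):
    model(img)
    loss = acts["target"].norm()  # amplify whatever this layer "sees"
    loss.backward()
    with torch.no_grad():
        img += 0.01 * img.grad / (img.grad.abs().mean() + 1e-8)
        img.grad.zero_()
        img.clamp_(0, 1)

T.ToPILImage()(img.detach().squeeze(0)).save("dream.jpg")
```

Published DeepDream images typically repeat this at several image scales; the single-scale loop above is just the core idea.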

Olah set out to find such elements. He founded a scientific journal called Distill to bring “more transparency” to machine learning. In 2018, he and a few Google colleagues published a paper in Distill called “The Building Blocks of Interpretability.” Among other things, they identified specific neurons that encoded the concept of floppy ears. From there, Olah and his coauthors were able to figure out how the system knew the difference between, say, a Labrador retriever and a tiger cat. They acknowledged in the paper that this was only the beginning of deciphering neural nets: “We need to make them human scale, instead of overwhelming dumps of information.”
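The paper's actual toolkit, which combines feature visualization with attribution, is richer than any snippet can show, but one underlying ingredient is simply asking which inputs most excite a given unit. The sketch below, with an assumed model, layer, channel index, and image folder, ranks images by how strongly they activate one channel; it is an illustration of the idea, not the authors' code.

```python
# Minimal sketch of one ingredient of that kind of analysis (assumed setup;
# not the paper's code): rank a folder of images by how strongly they excite
# a single channel of an intermediate layer, to get a rough sense of the
# concept that channel responds to. Model, layer, channel index, and folder
# path are all hypothetical.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
from pathlib import Path

model = models.googlenet(weights="DEFAULT").eval()

acts = {}
model.inception4d.register_forward_hook(lambda mod, inp, out: acts.update(out=out))

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

CHANNEL = 42  # hypothetical index; units like the "floppy ear" neurons are found empirically
scores = []
for path in sorted(Path("images").glob("*.jpg")):
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        model(x)
    # Mean activation of the chosen channel over all spatial positions.
    scores.append((acts["out"][0, CHANNEL].mean().item(), path.name))

for score, name in sorted(scores, reverse=True)[:10]:
    print(f"{score:8.3f}  {name}")
```

Ranking inputs this way is only the crudest version of “what does this neuron care about?”; the Distill paper pairs such evidence with visualizations of what maximally activates each unit and with attribution back to the input.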

The paper was Olah's swan song at Google. “There was actually a feeling at Google Brain that you weren't very serious when you talked about AI safety,” he says. In 2018, OpenAI offered him the chance to form a permanent interpretability team. He jumped. Three years later, he joined a group of his OpenAI colleagues to co-found Anthropic.


