Monday, April 03, 2023

Transformers: Robots In Disguise?

quantamagazine |  Recent investigations like the one Dyer worked on have revealed that LLMs can produce hundreds of “emergent” abilities — tasks that big models can complete that smaller models can’t, many of which seem to have little to do with analyzing text. They range from multiplication to generating executable computer code to, apparently, decoding movies based on emojis. New analyses suggest that for some tasks and some models, there’s a threshold of complexity beyond which the functionality of the model skyrockets. (They also suggest a dark flip side: As they increase in complexity, some models reveal new biases and inaccuracies in their responses.)

“That language models can do these sort of things was never discussed in any literature that I’m aware of,” said Rishi Bommasani, a computer scientist at Stanford University. Last year, he helped compile a list of dozens of emergent behaviors, including several identified in Dyer’s project. That list continues to grow.

Now, researchers are racing not only to identify additional emergent abilities but also to figure out why and how they occur at all — in essence, to try to predict unpredictability. Understanding emergence could reveal answers to deep questions around AI and machine learning in general, like whether complex models are truly doing something new or just getting really good at statistics. It could also help researchers harness potential benefits and curtail emergent risks.

“We don’t know how to tell in which sort of application is the capability of harm going to arise, either smoothly or unpredictably,” said Deep Ganguli, a computer scientist at the AI startup Anthropic.

The Emergence of Emergence

Biologists, physicists, ecologists and other scientists use the term “emergent” to describe self-organizing, collective behaviors that appear when a large collection of things acts as one. Combinations of lifeless atoms give rise to living cells; water molecules create waves; murmurations of starlings swoop through the sky in changing but identifiable patterns; cells make muscles move and hearts beat. Critically, emergent abilities show up in systems that involve lots of individual parts. But researchers have only recently been able to document these abilities in LLMs as those models have grown to enormous sizes.

Language models have been around for decades. Until about five years ago, the most powerful were based on what’s called a recurrent neural network. These essentially take a string of text and predict what the next word will be. What makes a model “recurrent” is that it learns from its own output: Its predictions feed back into the network to improve future performance.

In 2017, researchers at Google Brain introduced a new kind of architecture called a transformer. While a recurrent network analyzes a sentence word by word, the transformer processes all the words at the same time. This means transformers can process big bodies of text in parallel.
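The contrast between word-by-word and all-at-once processing can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not how any production model is built: each word is reduced to a single number, and the function names `recurrent_pass` and `attention_pass` are hypothetical. The point is only that each recurrent step needs the previous step's result, while each attention output depends on nothing but the inputs.

```python
import math

def recurrent_pass(embeddings):
    """Process words one at a time; each step needs the previous state."""
    state = 0.0
    states = []
    for x in embeddings:
        # The new state depends on the old state, so steps cannot be reordered.
        state = math.tanh(0.5 * state + x)
        states.append(state)
    return states

def attention_pass(embeddings):
    """Compute every output from the full input; outputs are independent."""
    outputs = []
    for query in embeddings:
        # Score this word against every word in the sentence.
        scores = [query * key for key in embeddings]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # stable softmax numerators
        total = sum(weights)
        # Weighted average of all words; no other output is needed to compute it.
        outputs.append(sum(w * v for w, v in zip(weights, embeddings)) / total)
    return outputs

sentence = [0.2, -0.4, 0.9, 0.1]  # assumed toy "embeddings", one number per word
print(recurrent_pass(sentence))
print(attention_pass(sentence))
```

Because no iteration of the loop in `attention_pass` reads another iteration's result, the iterations could in principle run at the same time, which is the property the paragraph above describes.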

Transformers enabled a rapid scaling up of the complexity of language models by increasing the number of parameters in the model, as well as other factors. The parameters can be thought of as connections between words, and models improve by adjusting these connections as they churn through text during training. The more parameters in a model, the more accurately it can make connections, and the closer it comes to passably mimicking human language. As expected, a 2020 analysis by OpenAI researchers found that models improve in accuracy and ability as they scale up.

But the debut of LLMs also brought something truly unexpected. Lots of somethings. With the advent of models like GPT-3, which has 175 billion parameters — or Google’s PaLM, which can be scaled up to 540 billion — users began describing more and more emergent behaviors. One DeepMind engineer even reported being able to convince ChatGPT that it was a Linux terminal and getting it to run some simple mathematical code to compute the first 10 prime numbers. Remarkably, it could finish the task faster than the same code running on a real Linux machine.
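For a sense of scale, the kind of "simple mathematical code" involved might look like the sketch below. The article does not reproduce the engineer's actual program, so this Python version and the function name `first_primes` are assumptions made purely for illustration.

```python
def first_primes(count):
    """Return the first `count` prime numbers using trial division."""
    primes = []
    candidate = 2
    while len(primes) < count:
        # A candidate is prime if no smaller prime up to its square root divides it.
        if all(candidate % p != 0 for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes

print(first_primes(10))  # prints [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```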

As with the movie emoji task, researchers had no reason to think that a language model built to predict text would convincingly imitate a computer terminal. Many of these emergent behaviors illustrate “zero-shot” or “few-shot” learning, which describes an LLM’s ability to solve problems it has never — or rarely — seen before. This has been a long-time goal in artificial intelligence research, Ganguli said. Showing that GPT-3 could solve problems without any explicit training data in a zero-shot setting, he said, “led me to drop what I was doing and get more involved.”

He wasn’t alone. A raft of researchers, detecting the first hints that LLMs could reach beyond the constraints of their training data, are striving for a better grasp of what emergence looks like and how it happens. The first step was to thoroughly document it.
