
OpenAI Claims Its New Model Has Achieved Human-Level Results on a Test of ‘General Intelligence’. What Does That Mean?

A new artificial intelligence (AI) model has just achieved human-level results on a test designed to measure “general intelligence”.

On December 20, OpenAI’s o3 system scored 85% on the ARC-AGI benchmark, well above the previous best AI score of 55% and in line with the average human score. It also scored well on a very difficult mathematics test.

Creating artificial general intelligence, or AGI, is the stated goal of all the major AI research labs. At first glance, OpenAI appears to have at least made a significant step towards this goal.

Although skepticism remains, many AI researchers and developers feel that something has recently changed. For many, the prospect of AGI now seems more real, more urgent and closer than expected. Are they right?

Generalization and intelligence

To understand what the o3 result means, you need to understand what the ARC-AGI test is all about. In technical terms, it’s a test of an AI system’s “sample efficiency” in adapting to something new – how many examples of a novel situation the system needs to see in order to figure out how it works.

An AI system such as ChatGPT (GPT-4) is not very sample efficient. It was “trained” on millions of examples of human text, constructing probabilistic “rules” about which combinations of words are most likely.

The result is pretty good on common tasks, but bad on uncommon ones, because the system has less data (fewer samples) about those tasks.

Until AI systems can learn from small numbers of examples and adapt with greater sample efficiency, they will only be used for very repetitive jobs and ones where the occasional failure is tolerable.

The ability to accurately solve previously unknown or novel problems from limited samples of data is known as the capacity to generalize. It is widely considered a necessary, even fundamental, element of intelligence.
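To make the idea concrete, here is a minimal Python sketch of what sample efficiency looks like. The parity task and both toy learners are illustrative assumptions, not part of any real benchmark: one learner merely memorizes its training examples, while the other extracts a general rule from the same small sample.

```python
# A toy sketch of "sample efficiency": how well does a learner handle
# novel inputs after seeing only a handful of examples? The task
# (parity of integers) and both learners are illustrative assumptions.

def memorizer(examples):
    """No generalization: recalls seen inputs, guesses 0 otherwise."""
    table = dict(examples)
    return lambda x: table.get(x, 0)

def rule_learner(examples):
    """Stands in for a learner that infers the general rule
    'output = input mod 2' from a handful of examples."""
    return lambda x: x % 2

train = [(x, x % 2) for x in range(10)]          # ten examples
test = [(x, x % 2) for x in range(1000, 1100)]   # novel, unseen inputs

for name, learner in (("memorizer", memorizer), ("rule learner", rule_learner)):
    model = learner(train)
    accuracy = sum(model(x) == y for x, y in test) / len(test)
    print(f"{name}: accuracy on novel inputs = {accuracy:.2f}")
```

Run on inputs it has never seen, the memorizer scores only at chance, while the rule learner is perfect: that gap is what generalizing from limited samples means.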

Grids and patterns

The ARC-AGI benchmark tests for sample-efficient adaptation using little grid-square problems like the one below. The AI needs to figure out the pattern that turns the grid on the left into the grid on the right.

Example task from the ARC-AGI benchmark. ARC Prize

Each question gives three examples to learn from. The AI system then needs to figure out the rules that “generalize” from the three examples to the fourth.

These are a lot like the IQ tests you might remember from school.
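To give a sense of what such a task looks like as data, here is a toy Python sketch. The grids, the candidate rule and the helper names are hypothetical stand-ins, not actual benchmark content: a solver must find a transformation consistent with all three demonstration pairs, then apply it to a held-out fourth grid.

```python
# A minimal sketch of an ARC-AGI-style task as data. The grids and the
# candidate rule here are hypothetical illustrations, not taken from
# the actual benchmark.

def candidate_rule(grid):
    """Candidate hypothesis: reflect each row left-to-right."""
    return [list(reversed(row)) for row in grid]

# Three demonstration pairs (input, expected output), as in ARC tasks.
examples = [
    ([[1, 0], [0, 2]], [[0, 1], [2, 0]]),
    ([[3, 3, 0]], [[0, 3, 3]]),
    ([[0], [4]], [[0], [4]]),  # single-column grids are unchanged
]

# A solver must find a rule consistent with all three demonstrations...
assert all(candidate_rule(inp) == out for inp, out in examples)

# ...and then apply it to the held-out fourth grid.
test_input = [[5, 0, 7]]
print(candidate_rule(test_input))  # -> [[7, 0, 5]]
```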

Weak rules and adaptation

We don’t know exactly how OpenAI has done it, but the results suggest the o3 model is highly adaptable. From just a few examples, it finds rules that can be generalized.

To figure out a pattern, we should not make any unnecessary assumptions, or be more specific than we really have to be. In theory, if you can identify the “weakest” rules that do what you want, then you have maximized your ability to adapt to new situations.

What do we mean by the weakest rules? The technical definition is complicated, but weaker rules are usually ones that can be described in simpler statements.

In the example above, a plain English expression of the rule might be something like: “Any shape with a protruding line will move to the end of that line and ‘cover up’ any other shapes it overlaps with.”
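As a toy illustration of that preference, consider the Python sketch below. Both candidate rules fit the same demonstrations; an Occam-style heuristic picks the one with the shorter description. The rules and the description-length measure are assumptions chosen for clarity, not how any real system represents them.

```python
# A toy illustration of preferring "weaker" (more simply stated) rules.
# Both candidate rules fit the same demonstrations; the heuristic picks
# the one with the shorter description. The rules and the length-based
# proxy for simplicity are assumptions made for illustration.

examples = [(1, 2), (2, 4), (3, 6)]  # (input, output) pairs

candidates = {
    "double the input": lambda x: x * 2,
    "double the input, but only if it is below one hundred": (
        lambda x: x * 2 if x < 100 else x
    ),
}

# Keep only the rules consistent with every demonstration.
consistent = {
    desc: fn
    for desc, fn in candidates.items()
    if all(fn(inp) == out for inp, out in examples)
}

# Occam-style heuristic: prefer the rule with the shortest description.
best = min(consistent, key=len)
print(best)  # -> "double the input"
```

The weaker rule makes no claim about inputs above one hundred, so it keeps working on situations the demonstrations never covered.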

Searching chains of thought?

While we don’t know how OpenAI achieved this result just yet, it seems unlikely they deliberately optimized the o3 system to find weak rules. However, to succeed at the ARC-AGI tasks it must be finding them.

We do know that OpenAI started with a general-purpose version of the o3 model (which differs from most other models because it can spend more time “thinking” about difficult questions) and then trained it specifically for the ARC-AGI test.

French AI researcher Francois Chollet, who designed the benchmark, believes that o3 searches through different “chains of thought” that describe the steps to solve a task. It will then choose the “best” according to some loosely defined rule, or “heuristic”.

This would be “not dissimilar” to how Google’s AlphaGo system searched through different possible sequences of moves to beat the world Go champion.

You can think of these chains of thought as programs that fit the examples. Of course, if it is like the Go-playing AI, then it needs a heuristic, or loose rule, to decide which program is best.

There could be thousands of different, seemingly equally valid programs generated. That heuristic could be “choose the weakest” or “choose the simplest”.
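Here is a minimal sketch of that kind of search, under the assumption that “programs” are compositions of a few primitive grid operations. The primitives and the simplicity heuristic are illustrative guesses, not OpenAI’s actual method.

```python
# A sketch of the kind of search Chollet describes: enumerate many
# candidate "programs" (here, compositions of primitive grid ops),
# keep those that reproduce the demonstrations, then use a heuristic
# to pick one. All of this is illustrative, not OpenAI's method.

from itertools import product

def flip_h(g): return [list(reversed(r)) for r in g]
def flip_v(g): return [list(r) for r in reversed(g)]
def identity(g): return [list(r) for r in g]

PRIMITIVES = {"flip_h": flip_h, "flip_v": flip_v, "identity": identity}

# One demonstration pair: the output is the input rotated 180 degrees.
examples = [([[1, 2], [3, 4]], [[4, 3], [2, 1]])]

def run(program, grid):
    """Apply a sequence of primitive operations to a grid."""
    for name in program:
        grid = PRIMITIVES[name](grid)
    return grid

# Enumerate every program of length 1 or 2, keep the ones consistent
# with the demonstration, then rank them by a simplicity heuristic.
consistent = [
    prog
    for length in (1, 2)
    for prog in product(PRIMITIVES, repeat=length)
    if all(run(prog, i) == o for i, o in examples)
]
best = min(consistent, key=len)  # heuristic: "choose the simplest"
print(best)  # -> ('flip_h', 'flip_v')
```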

However, if it is like AlphaGo, then they simply had an AI create the heuristic. This was the process for AlphaGo: Google trained a model to rate different sequences of moves as better or worse than others.

What we don’t know yet

The question then is, is this really closer to AGI? If that is how o3 works, then the underlying model might not be much better than previous models.

The concepts the model learns from language may be no more suitable for generalization than before. Instead, we may just be seeing a more generalizable “chain of thought” found through the extra steps of training a heuristic specialized to this test. The proof, as always, will be in the pudding.

Almost everything about o3 remains unknown. OpenAI has limited disclosure to a few media presentations, and early testing to a handful of researchers, laboratories and AI safety institutions.

Truly understanding the potential of o3 will require extensive work, including evaluations, an understanding of the distribution of its capabilities, and a sense of how often it fails and how often it succeeds.

When o3 is finally released, we’ll have a much better idea of whether it is approximately as adaptable as an average human.

If so, it could have a huge, transformative economic impact, ushering in a new era of self-improving, accelerated intelligence. We will need new benchmarks for AGI itself, and serious consideration of how it ought to be governed.

If not, then this will still be an impressive result. However, everyday life will remain much the same.

Michael Timothy Bennett, PhD Student, School of Computing, Australian National University and Elija Perrier, Research Associate, Stanford Center for Responsible Quantum Technology, Stanford University

This article is republished from The Conversation under a Creative Commons license. Read the original article.

