AI that learns without training data

AI systems have long been known to learn from training data supplied by humans. However, researchers at Tsinghua University have recently developed an AI system that needs no training data whatsoever, because it is capable of learning by itself. Yes, that’s correct… zero external data. The researchers call their system the “Absolute Zero Reasoner” (AZR), and it shows how Large Language Models (LLMs) can be trained with zero external data.

Of late, LLMs have shown advances in reasoning capabilities through Reinforcement Learning with Verifiable Rewards (RLVR), which relies on outcome-based feedback instead of imitating intermediate reasoning steps. Even so, machine learning models have depended heavily on datasets and questions curated by humans, which has traditionally limited AI’s ability to learn independently of its developers. However, researchers from the Beijing Institute for General Artificial Intelligence, Tsinghua University, and Pennsylvania State University have recently proposed an RLVR system that enables a single model to autonomously create tasks and solve them, maximizing its learning without any reliance on external data. Imagine a toddler who decides on his own that he should walk on his feet instead of crawling all day, and eventually finds his own way to learn walking. That’s what the Absolute Zero Reasoner does.

It self-evolves its training curriculum and reasoning capabilities using a code executor, which validates the proposed code-reasoning tasks and verifies the answers, thereby providing verifiable rewards as part of a self-learning loop. The researchers claim that AZR achieves state-of-the-art performance in the 7B overall-average and coding-average categories, surpassing the previous best models by 1.8 percentage points, and that it outperforms models trained on human-curated datasets.
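To make the loop concrete, here is a minimal, purely illustrative sketch of the propose-execute-solve cycle described above: a "proposer" invents a code-reasoning task, a code executor runs it to obtain a verifiable ground truth, and a "solver" earns a reward when its answer matches the executed result. The toy tasks (small arithmetic expressions) and all function names are assumptions for illustration, not the actual AZR implementation.

```python
import random

def propose_task(rng):
    """Proposer: invent a small code-reasoning task (here, a toy expression)."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    op = rng.choice(["+", "-", "*"])
    return f"{a} {op} {b}"

def execute(task):
    """Code executor: run the task to obtain a verifiable ground-truth answer."""
    # eval is acceptable here because inputs come only from propose_task above
    return eval(task)

def solve(task):
    """Solver: in AZR this is the LLM; this stand-in simply re-runs the task."""
    return eval(task)

def self_play_step(rng):
    """One self-play iteration: propose, verify by execution, solve, reward."""
    task = propose_task(rng)
    truth = execute(task)                          # verification by execution
    prediction = solve(task)
    reward = 1.0 if prediction == truth else 0.0   # verifiable, outcome-based
    return task, reward

rng = random.Random(0)
rewards = [self_play_step(rng)[1] for _ in range(5)]
print(rewards)
```

The key property the sketch preserves is that the reward signal comes from execution, not from human labels: the executor is the sole judge, so no external dataset is ever consulted.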

This can have serious implications for humankind. AI systems have been dependent on humans for training, and humans have therefore wielded control over how AI systems learn and perform. However, if AI systems begin to pick up lessons by themselves and choose what to learn and what not to, without human oversight, governance over them could grow weaker, unless humans devise other mechanisms to keep AI a collaborator rather than a controller.


Tomorrow Avatar

Arijit Goswami
