Why OpenAI’s ‘Strawberry’ Reasoning Model Is a Big Deal

OpenAI’s latest A.I. model is codenamed “Strawberry.” Maryam Sicard/Unsplash

Ever watched the 2013 movie Her? It stars Joaquin Phoenix as Theodore, a man who forms a deep romantic relationship with a self-aware artificial intelligence (A.I.) named Samantha, voiced by Scarlett Johansson. Fast forward to today, and a “conscious” A.I. like Samantha might not be as far away as it once seemed. A.I. chatbots like ChatGPT typically wait for the user’s prompt to start a conversation, but a recent viral Reddit post showed ChatGPT initiating one, asking the user, “How was your first week in high school? Did you settle in well?” Likewise, another user reported that ChatGPT’s mobile app greeted her first with, “Hello, how’s it going?”, leaving the A.I. community surprised and curious.

ChatGPT’s newfound conversational initiative is reportedly linked to OpenAI testing its recently announced next-gen A.I. model, o1, codenamed “Strawberry.” Unlike the current GPT-4o, o1 is designed to push A.I. toward human-like reasoning. OpenAI claims it can tackle tricky problems in math, science and even law more accurately through reinforcement learning.

“The sophisticated reasoning in o1, which is done by an internal chain-of-thought reinforcement learning mechanism, allows the model to finally surpass one of the key limitations of language models—the fact that they only generate ‘forward’ and don’t revise things they have said before,” Jeffrey Wang, co-founder and chief architect of the A.I.-powered CRM analytics platform Amplitude, told Observer. “Now they can ‘think’ internally before responding. This mirrors the way humans think before speaking and allows the A.I. to be more cohesive.”

According to a LinkedIn post from its CEO, Aravind Srinivas, Perplexity AI has already integrated OpenAI’s o1-mini model into its A.I.-powered search engine. If OpenAI’s approach works, it could transform how A.I. tackles complex tasks, with potential impact across industries like health care, education, corporate work and research.
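For developers, that kind of integration looks much like any other OpenAI model call. Below is a minimal sketch of querying o1-mini, assuming the official OpenAI Python SDK and API access to the model; exact model names and parameter support may shift while the preview evolves.

    # Minimal sketch: querying OpenAI's o1-mini with the official Python SDK
    # (assumes `pip install openai` and an OPENAI_API_KEY environment variable;
    # model availability varies by account and may change).
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # o1-series models reason internally before answering, so a plain user
    # message is all that's needed; at launch they did not accept system
    # messages or sampling parameters such as temperature.
    response = client.chat.completions.create(
        model="o1-mini",
        messages=[{
            "role": "user",
            "content": (
                "A bat and a ball cost $1.10 in total. The bat costs $1.00 "
                "more than the ball. How much does the ball cost?"
            ),
        }],
    )

    print(response.choices[0].message.content)

The same call with model="gpt-4o" returns an answer immediately, while the o1 models spend additional, hidden reasoning tokens deliberating before they respond.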

For years, A.I. advancements have hinged on building larger and larger models, but OpenAI is now betting on smarter, not bigger, systems. The company claims that, during internal testing, o1 solved 83 percent of the problems on a qualifying exam for the International Mathematics Olympiad (IMO), a test on which GPT-4o managed only 13 percent. While GPT-4o has often struggled with tasks requiring multi-step reasoning, o1 can go as far as tackling mind-bending puzzles and high-level scientific problems.

“In order to compare models to humans, we recruited experts with Ph.D.s to answer GPQA-diamond questions. We found that o1 surpassed the performance of those human experts, becoming the first model to do so on this benchmark,” said OpenAI in a recent blog post. GPQA (Graduate-Level Google-Proof Q&A Benchmark) Diamond is a set of 198 questions designed to evaluate the capabilities of LLMs in scientific domains like biology, physics and chemistry. 

As of today, ChatGPT Plus and Team users have access to the o1 models, with usage capped at 50 queries per week for o1-preview and 50 queries per day for o1-mini; Enterprise and Edu users gain access the following week. OpenAI plans to bring o1-mini to all free ChatGPT users soon and said that future iterations of o1 will include browsing capabilities, file and image uploads, and other features for everyday use.

“As A.I. becomes more human-like in its reasoning, the public may become more comfortable interacting with it, potentially seeing A.I. as a helpful assistant rather than a mysterious black box,” Steve Toy, CEO of the language learning app Memrise, told Observer. “However, with this trust comes the risk of over-reliance on A.I. If users begin to see A.I. as infallible, they may start deferring too much decision-making to machines, even in areas that require human judgment.”

Safety is a growing concern as A.I. gets smarter

A.I. models are already raising concerns about their safety risks, and the arrival of more human-like reasoning models such as o1 could intensify those concerns. OpenAI said it’s working closely with the U.S. and U.K. AI Safety Institutes, offering early access to the model as part of a broader effort to ensure ethical A.I. development. Regulators, too, are taking proactive measures.

Recently, California Governor Gavin Newsom signed three new bills to curb the use of A.I. to create deceptive images and videos in political ads ahead of the 2024 U.S. presidential election. Likewise, Securities and Exchange Commission (SEC) Chair Gary Gensler warned in a video on X that increasing reliance on generative A.I. could threaten the U.S. financial system. “What we’ve seen in our economy is how one or a small number of tech platforms can come to dominate a field. We’re bound to see the same develop with artificial intelligence,” Gensler said in the video, noting that the three largest cloud providers (Amazon, Google and Microsoft) are already affiliated with the leading generative A.I. companies. “Regulators, market participants, need to think about what it means to have the dependencies of financial institutions on A.I. models,” he said.

“The next major mission for companies will be to establish A.I.’s place in society—its roles and responsibilities. I don’t see OpenAI heading in that direction yet,” Matan Libis, vice president of product at SQream, an advanced data processing company, told Observer. “We need to be clear about its boundaries and where it truly helps. Unless we identify A.I.’s limits, we’re going to face growing concerns about its integration into everyday life.”
