OpenAI and other companies are pushing artificial intelligence to its limits

Created on 12 November 2024 • Technology

Companies working with AI are encountering setbacks and delays in training new large language models. Some researchers are instead focusing on giving models more time to reason during inference.



Summary


  1. Companies working with AI are encountering setbacks and delays in training new large language models.
  2. Some researchers are instead focusing on giving models more time to "think" during inference.
  3. The shift could change the competition for resources such as chips and energy in the AI arms race.


November 11 (Reuters) - Artificial intelligence companies such as OpenAI are seeking to overcome unexpected delays and challenges in the pursuit of ever-larger language models by developing training techniques that use more human-like ways for algorithms to "think."


These methods, which are the basis of OpenAI's recently published o1 model, could change the AI arms race and impact the kinds of resources that AI companies have an unquenchable demand for, from energy to chip types, according to a dozen AI scientists, researchers, and investors who spoke to Reuters.


OpenAI declined to comment for this story. Since the debut of the viral ChatGPT chatbot two years ago, technology companies, whose valuations have benefited immensely from the AI boom, have publicly maintained that "scaling up" existing models by adding more data and computing power will consistently yield better AI models.

However, some of the most prominent AI researchers are now speaking out about the limitations of this "bigger is better" approach.


Ilya Sutskever, co-founder of the AI labs Safe Superintelligence (SSI) and OpenAI, recently told Reuters that results from scaling up pre-training, the phase in which an AI model is trained on vast amounts of unlabeled data to learn language patterns and structures, have plateaued.


Sutskever's pioneering use of more data and processing power during pre-training led to significant advancements in generative AI, ultimately leading to the creation of ChatGPT. Earlier this year, Sutskever left OpenAI to start SSI.

The 2010s were the era of scaling, Sutskever said, but we are once again in an age of wonder and discovery. "Everyone is excited for the next thing," he said. "It's more important than ever to scale the right thing."

Sutskever declined to share details of how his team is addressing the problem, saying only that SSI is working on an alternative approach to scaling up pre-training.

According to three people familiar with the matter, researchers at major AI labs have been running into setbacks and disappointing results behind the scenes in their drive to create a large language model that surpasses OpenAI's GPT-4 model, which is almost two years old.


For large models, the so-called "training runs" can run on hundreds of chips simultaneously and cost tens of millions of dollars. Given the complexity of the system, they are prone to hardware-induced failures, and researchers may not know how the models will ultimately perform until the end of the run, which can take months.

Another concern is that large language models consume enormous amounts of data, and AI models have exhausted all of the world's easily accessible data. Power shortages have also delayed the training runs, since the process requires vast amounts of electricity.


To overcome these difficulties, researchers are exploring "test-time compute," a technique that enhances existing AI models while they are being deployed, during the so-called "inference" phase. For example, instead of immediately committing to a single answer, a model could generate and evaluate multiple possibilities in real time, ultimately choosing the best path forward.


With this approach, models can focus more of their processing power on challenging tasks like coding or math problems or complex processes involving human-like reasoning and decision-making.
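The generate-and-evaluate idea described above can be sketched as a simple "best-of-n" loop. Everything below is a hypothetical stand-in (a toy candidate generator and a toy scorer), not OpenAI's actual method; a real system would sample completions from a language model and rank them with a learned verifier or an external checker:

```python
import random

def generate_candidates(problem, n=8, seed=0):
    """Hypothetical stand-in for sampling n candidate answers from a model.

    Here we perturb a toy arithmetic answer; a real system would sample
    n completions from an LLM at non-zero temperature.
    """
    rng = random.Random(seed)
    true_answer = sum(problem)
    # Most samples are noisy; some land on the correct answer.
    return [true_answer + rng.choice([-2, -1, 0, 0, 1, 2]) for _ in range(n)]

def score(problem, candidate):
    """Hypothetical verifier: higher is better.

    A real system might use a learned reward model or a checker
    (e.g. run the generated code, verify the math).
    """
    return -abs(sum(problem) - candidate)

def best_of_n(problem, n=8):
    """Spend extra inference-time compute: sample n candidates
    and keep the highest-scoring one instead of the first guess."""
    candidates = generate_candidates(problem, n)
    return max(candidates, key=lambda c: score(problem, c))

# Raising n trades more inference compute for a better chance
# of recovering the exact answer.
print(best_of_n([3, 4, 5], n=8))
```

The design choice here is that all the extra compute is spent at inference time, on sampling and scoring, rather than on training a bigger model, which is why this approach shifts demand from giant training clusters toward inference capacity.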

"It turned out that having a bot think for just 20 seconds in a hand of poker gave the same boost in performance as scaling up the model by 100,000x and training it for 100,000 times longer," said Noam Brown, an OpenAI researcher who worked on o1, at the TED AI conference in San Francisco last month.


OpenAI adopted this technique in its recently released "o1" model (formerly known as Q* and Strawberry), which Reuters first reported on in July. The o1 model can "think" through problems in a multi-step manner, similar to human reasoning. It also draws on data and feedback curated from PhDs and industry experts. The secret sauce of the o1 series is an additional round of training carried out on top of "base" models such as GPT-4, and the company says it plans to apply this technique to more and larger base models.


According to five people familiar with the initiatives, researchers at other renowned AI labs, including Google DeepMind, xAI, and Anthropic, have also been striving to create their own iterations of the methodology.

"We can quickly improve these models by going after a lot of low-hanging fruit," OpenAI Chief Product Officer Kevin Weil stated at a tech conference in October. "We're going to try to be three steps ahead by the time people do catch up."

Anthropic did not immediately respond to requests for comment, and neither did Google or xAI.


The implications could alter the competitive landscape for AI hardware, which has so far been dominated by insatiable demand for Nvidia's AI chips. Prominent venture capital firms, from Sequoia to Andreessen Horowitz, which have poured billions of dollars into funding the expensive development of AI models at labs such as OpenAI and xAI, are taking note of the transition and weighing its impact on their costly bets.


Sonya Huang, a partner at Sequoia Capital, told Reuters that this shift will move us from a world of massive pre-training clusters toward inference clouds, distributed cloud-based servers for inference.

Demand for Nvidia's AI chips, the most advanced available, has fueled the company's rise to the top of the global market-value rankings, overtaking Apple in October. Unlike training chips, where Nvidia dominates, the market for inference chips could see more competition.


Asked about the anticipated effect on demand for its products, Nvidia pointed to recent company presentations on the importance of the technique behind the o1 model. Its CEO, Jensen Huang, has spoken of growing demand for using the company's chips for inference.


A second scaling law, Huang said, has now emerged: the scaling law at the time of inference. "All of these things have resulted in the demand for Blackwell being incredibly high," he said at an event in India last month, referring to the company's latest AI chip.