OpenAI’s ‘Strawberry’ Model Performs Complex Equations


On Sept. 12, OpenAI revealed a preview of its new model, OpenAI o1, designed to handle complex tasks such as writing code, solving math problems and performing deep reasoning. It is the first of the long-rumored next-generation AI family known as “Strawberry.”

ChatGPT Plus and Team users, as well as developers at OpenAI API usage tier 5, can now access o1-preview, the preview version of the full model.

These users can also access o1-mini, a smaller, faster version of the o1 model that is particularly effective at coding. Because it is smaller, the tech giant says, o1-mini is “80% cheaper than o1-preview, making it a powerful, cost-effective model for applications that require reasoning but not broad world knowledge.”

OpenAI noted that ChatGPT Enterprise and Edu users will get access to both models beginning next week.

“We also are planning to bring o1-mini access to all ChatGPT Free users,” the company said in its release.

o1 takes more time to reason through more difficult problems

Instead of furthering GPT-4’s language capability, OpenAI o1 and o1-mini focus on science, math, and creating and debugging code. A demonstration video shows the model building a playable game in the style of the Snake games of the 1970s. As OpenAI explained, o1 can be used by:

  • Health care researchers to annotate cell sequencing data.
  • Physicists to generate complicated mathematical formulas needed for quantum optics.
  • Developers in all fields to build and execute multi-step workflows.

OpenAI says o1 placed in the 89th percentile on the competitive programming platform Codeforces and scored among the top 500 students in the U.S. in a qualifier for the USA Math Olympiad.

By nature, o1 will take longer to answer than ChatGPT or GPT-4.

o1 will display a loading message indicating that it is “thinking.” Image: OpenAI

o1-preview can output a maximum of 32k tokens, while o1-mini can output a maximum of 64k tokens. A token can be as short as one character or as long as one word, depending on the complexity of the text. Both versions of the new model support text input only, not audio or images.
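For developers, those limits can be sketched as a small pre-flight check. The Python below is an illustration only, not OpenAI's API: it reads "32k" and "64k" as 32 × 1,024 and 64 × 1,024 tokens, which is an assumption, and the function names and the shape of the input parts are hypothetical.

```python
# Output caps as reported above; treating "k" as 1,024 is an assumption.
MAX_OUTPUT_TOKENS = {
    "o1-preview": 32 * 1024,
    "o1-mini": 64 * 1024,
}


def clamp_output_tokens(model: str, requested: int) -> int:
    """Cap a requested output budget at the model's reported maximum."""
    if requested < 1:
        raise ValueError("requested must be a positive token count")
    return min(requested, MAX_OUTPUT_TOKENS[model])


def check_text_only(parts: list) -> None:
    """Reject any input part that is not text (hypothetical part format)."""
    for part in parts:
        if part.get("type") != "text":
            raise ValueError(f"unsupported input type: {part.get('type')!r}")


if __name__ == "__main__":
    print(clamp_output_tokens("o1-preview", 50_000))  # capped at the preview limit
    print(clamp_output_tokens("o1-mini", 50_000))     # fits under the mini limit
    check_text_only([{"type": "text", "text": "Prove that 2 + 2 = 4."}])
```

A check like this would run before a request is sent, so that an oversized output budget or an image attachment is caught locally rather than by the service.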

OpenAI created a best practices guide for developers to determine whether o1 is right for their work.

In the model’s system card, where OpenAI outlines red-teaming efforts and other security considerations, o1 received a “medium” risk rating in two categories. Independent research group Apollo Research noted o1 “has the basic capabilities needed to do simple in-context scheming,” meaning “gaming their oversight mechanisms as a means to achieve a goal.” On the other hand, the deeper reasoning gives the model a better understanding of safety policies.




