OpenAI o1 Model

This content was originally featured in the September 18th, 2024 newsletter found here: INBOX INSIGHTS, September 18, 2024: Overcoming AI Resistance, OpenAI o1 Model

In this week’s Data Diaries, let’s talk about the new OpenAI model, o1. Formerly known as Strawberry, it was released last Thursday to much fanfare, and it is supposedly capable of greater reasoning. So what does this mean? What in the world is this, and how do we make use of it?

First of all, we should be clear about what a reasoning model is: a model that thinks things through more explicitly and at greater length than today’s standard models do.

There’s a prompting technique in generative AI called chain of thought. Chain of thought asks the machine to think through and evaluate its work step by step, to say, “Show me how you solved this problem.” This is akin to what you did in high school math class, where you had to show your proofs to demonstrate that you actually knew how to solve the problem.
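To make that concrete, here is a minimal sketch of a chain of thought prompt in Python, using the OpenAI chat completions API. The model name, the question, and the exact prompt wording are illustrative assumptions, not a prescription; the technique is simply the added instruction to show the work.

```python
# Minimal sketch of chain of thought prompting.
# Assumes the OpenAI Python SDK is installed and OPENAI_API_KEY is set;
# the model name, question, and wording are illustrative, not prescriptive.
from openai import OpenAI

client = OpenAI()

question = (
    "A subscription costs $29 per month, with a 15% discount if paid annually. "
    "What is the annual price?"
)

# Ordinary prompt: just ask for the answer.
plain = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": question}],
)

# Chain of thought prompt: ask the model to show its work step by step.
cot = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": question
        + " Think through the problem step by step and show your reasoning "
          "before giving the final answer.",
    }],
)

print(plain.choices[0].message.content)
print(cot.choices[0].message.content)
```

The only difference between the two calls is that one added sentence; that sentence is the entire technique.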

Today’s average generative AI model does not do this. You give it a prompt, it follows the instructions for the prompt, and it tries to generate the answer you want. A reasoning model requires the model — and you, to a lesser degree — to think through not just the goal you’re trying to accomplish, but how you’re going to accomplish that goal. Within the o1 model, there is allegedly a reward framework as well, in which the model self-evaluates its work as it is trying to show its work.

I say allegedly because OpenAI has been especially cagey about what’s actually going on inside the model.

OpenAI has suppressed the exact wording of the chain of thought; all it shows is a brief summary of what the model thought through. We’ll leave discussion of that decision for another time, but for now, know that o1 is essentially a chain of thought model in which you cannot turn off the chain of thought process.

Now, who needs this? Who asked for this? Who wants this? This model, and models like it, will most strongly benefit people who don’t put a lot of thought into their prompts: people who write very naive prompts, like, “Hey, write me a blog post about B2B marketing,” with insufficient detail and insufficient information. For very naive prompts like that, which generate extremely bland, boring outputs with no self-evaluation, a reasoning model like o1 will significantly improve the quality of what they can get generative AI to do.

If you already write highly detailed prompts with lots of chain of thought of your own and lots of data that you provide, you will still see benefit from a reasoning model, but less of it. You’re already getting generative AI models to think things through, to self-evaluate, and perhaps even to apply a reward function of your own, such as building a scoring rubric, having the model score its own work, and then having it improve that work.

That’s essentially what a reinforcement learning model with chain of thought built in is doing. It is programmatically providing the same infrastructure as you writing prompts that say, “Evaluate your work, score it against a rubric, and then refine your work until it reaches the minimum score level.”
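Here’s a rough sketch of that evaluate, score, and refine loop in Python, against the OpenAI chat completions API. The task, rubric, passing score, prompt wording, and crude score parsing are all illustrative assumptions; the point is that a reasoning model like o1 effectively bakes this kind of loop into the model itself.

```python
# Sketch of a manual "evaluate, score against a rubric, refine" loop.
# Assumes the OpenAI Python SDK is installed and OPENAI_API_KEY is set.
# The task, rubric, passing score, and prompt wording are illustrative.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # any capable chat model works for this pattern

TASK = "Write a 150-word introduction for a blog post about B2B email marketing."
RUBRIC = (
    "Rate the draft from 1 to 10 on clarity, specificity to a B2B audience, "
    "and actionable advice."
)
PASSING_SCORE = 8


def ask(prompt: str) -> str:
    """Send a single user prompt and return the model's reply."""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


draft = ask(TASK)

for _ in range(3):  # cap the number of refinement passes
    verdict = ask(
        f"{RUBRIC}\nReply with only the lowest of the scores as a single integer.\n\nDraft:\n{draft}"
    )
    digits = "".join(ch for ch in verdict if ch.isdigit())  # crude parse; validate properly in real use
    if digits and int(digits) >= PASSING_SCORE:
        break
    draft = ask(
        f"Revise this draft so it scores higher on this rubric: {RUBRIC}\n\nDraft:\n{draft}"
    )

print(draft)
```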

There are three key takeaways here. First, reasoning models, because they have to do so much self-evaluation, are approximately 100 times more computationally expensive than a regular model, which means that things like cost and consumption of electricity go up substantially. That’s one of the reasons why o1-preview has such strict limits: 50 messages per week as of September 17.

Second, if your prompts are already robust and already incorporate things like chain of thought or tree of thought and evaluation metrics, o1 will be a small incremental improvement on the work you’re already doing. The added benefit is that if you are already learning and mastering techniques like self-evaluation and scoring, you can apply these to any AI tool like Claude or Gemini and not be constrained to the OpenAI system.

The third key takeaway is that reasoning models, by their very definition, and especially because you can’t see what’s going on under the hood in OpenAI’s system, will tend to produce less creative outputs. If you’re using one for something like content generation, you’re going to get more thorough outputs, but they will be less creative, and you will probably need to run the output through a model that is a more creative writer to reduce the monotonous, robotic tone a reasoning model will spit out.

This is inherent to reasoning models because reasoning models, by definition, apply logic, thought, and process to your generative outputs. Those are all high-probability tasks, while interesting writing is often characterized by low-probability word choices: words that are interesting, rare, and uncommon. Reasoning models will not provide that by definition, and coaxing it out of them will require fairly advanced prompting, which will in turn cost significant resources.
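That two-pass workflow (reasoning model for substance, a more creative model for voice) might look something like the sketch below in Python. The model names reflect what is available as of this writing, and the brief and prompt wording are illustrative assumptions.

```python
# Sketch of a two-pass workflow: a reasoning model drafts for thoroughness,
# then a general-purpose model rewrites for voice. Prompts are illustrative.
from openai import OpenAI

client = OpenAI()

BRIEF = (
    "Draft a blog post arguing that B2B marketers should audit their "
    "email deliverability every quarter."
)

# Pass 1: the reasoning model produces a thorough, well-structured draft.
# At the time of writing, o1-preview accepts only user messages, so no system prompt.
draft = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": BRIEF}],
).choices[0].message.content

# Pass 2: a more creative model rewrites for tone without losing the substance.
rewrite = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": (
            "Rewrite the following draft in a lively, conversational voice. "
            "Keep every point intact; change only the style.\n\n" + draft
        ),
    }],
).choices[0].message.content

print(rewrite)
```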

As a result, you will need to give greater thought to your use of generative AI in terms of whether the task you are pursuing is a reasoning task or a creativity task and then choose the appropriate model for the task.

The days of one model for everything are rapidly coming to a close. Instead, we will pick up models like tools and use them for the appropriate task in the same way that we pick a word processor or a spreadsheet or presentation software for specific tasks. Can you, for example, use Microsoft Excel during a presentation? Yes. Should you use it as your presentation software? Probably not.

Give o1 a try with your current book of prompts and see how it performs. If you see a significant improvement on some prompts, that’s a sign that those prompts themselves need to be improved.
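If you want to run that comparison systematically rather than one prompt at a time, a quick harness like the sketch below will do. The prompt list and model names are illustrative assumptions; swap in your own prompt library.

```python
# Quick sketch for comparing your existing prompts on a standard model vs. o1.
# The prompt list and model names are illustrative; substitute your own.
from openai import OpenAI

client = OpenAI()

PROMPTS = [
    "Write me a blog post about B2B marketing.",
    "Summarize the key risks in our Q3 email program and propose three fixes.",
]

for prompt in PROMPTS:
    for model in ("gpt-4o", "o1-preview"):
        reply = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        ).choices[0].message.content
        print(f"\n=== {model} ===\n{reply}")
```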
