
AI Model Benchmarking

This data was originally featured in the March 12th, 2025 newsletter found here: INBOX INSIGHTS, March 12, 2025: Foundations in AI, AI Model Benchmarking

In this week’s Data Diaries, let’s talk about AI benchmarks. In the AI world, and especially in the generative AI world, AI models are often rated by how they perform on a variety of standardized tests with obscure names like MMLU, GPQA, and other abbreviations that could be secret government agencies, AI tests, or Danzig song titles.

These tests do serve a purpose: in theory, they provide apples-to-apples comparisons. However, many of them share a flaw: their test questions and answers have ended up in the training data of the models being tested. It’s easy to pass a test if you’ve read the answer sheet in advance.

More important, for you and me, these tests don’t simulate the way WE need to use AI. Outside of a few professions, your job probably doesn’t require linear algebra on a daily basis. Your job may not require fluency in 15 different languages. Your job definitely requires reasoning, though not in the abstract: it requires reasoning in tangible, practical use cases that are relevant to your work.

So how do you know whether a new model is worth your time or not? Develop your own benchmark tests. Here’s a simple example of a prompt I might use to benchmark test a series of different models.

You’re a Google Analytics 4 expert. You know GA4, Google Tag Manager (GTM), Google BigQuery, Google Looker Studio (Google Data Studio). You know marketing analytics, metrics, attribution, multitouch attribution (MTA), uplift modeling. Today we’ll be looking at a snapshot of Google Analytics data for the company TrustInsights.ai, a management consulting firm specializing in analytics, data science, and AI. Trust Insights serves midmarket companies, B2B and B2C. Here is a snapshot of key events conversion data, showing the attribution paths from GA4. The conversion chosen is Contact Us form fills, indicating that someone has asked Trust Insights for assistance. (Don’t forget to include the actual screenshot!) Explain the following:

  • What do you see in the data?
  • What data should Trust Insights pay attention to?
  • What data is less important to Trust Insights?
  • What next steps should Trust Insights take to improve its marketing results?

Here’s the critical part: whatever you choose for a benchmark test, you should know the answer. You should know what the correct answer is, and then measure how well a given model’s response matches that answer.
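One way to make "measure how well a response matches" repeatable is to write your known answer down as a short list of key points and count how many of them each model mentions. The sketch below is a minimal illustration under stated assumptions, not a definitive scoring method: the key points, accepted phrasings, and model responses are all hypothetical placeholders (they are not the actual Trust Insights findings), and simple phrase matching is a rough stand-in for reading the responses yourself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyPoint:
    """One fact a correct answer should mention, with accepted phrasings."""
    name: str
    phrases: tuple[str, ...]


def score_response(response: str, answer_key: list[KeyPoint]) -> float:
    """Return the fraction of key points the response mentions (0.0 to 1.0)."""
    if not answer_key:
        raise ValueError("answer_key must contain at least one key point")
    text = response.lower()
    hits = sum(
        1
        for point in answer_key
        if any(phrase.lower() in text for phrase in point.phrases)
    )
    return hits / len(answer_key)


# Hypothetical answer key: replace with what you already know is true of your data.
ANSWER_KEY = [
    KeyPoint("top channel", ("organic search",)),
    KeyPoint("assisting channel", ("email",)),
    KeyPoint("next step", ("newsletter", "email list")),
]

# Hypothetical responses, standing in for text pasted from each model.
responses = {
    "model_a": "Organic search drives most conversions, and email assists often. "
               "Grow the newsletter.",
    "model_b": "Traffic looks healthy overall. Consider posting more on social media.",
}

scores = {name: score_response(text, ANSWER_KEY) for name, text in responses.items()}
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.0%}")
```

Phrase matching only tells you whether a point was mentioned, not whether the reasoning around it is sound, so treat the score as a first-pass filter and still read the top responses.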

Here’s a snapshot of this in action, using our screenshot, prompt, and the LM Arena Chatbot Arena battleground:

LM Arena Results

We can easily see above that the model on the left, GPT-4.5, did a far better job than the model on the right, Discovery.

Your benchmark test suite should encompass the different tasks you value most, from competitive analysis to content creation. Critically, once you settle on your test suite, don’t change the data! Use the exact same prompts, screenshots, and supplementary data so you get apples-to-apples comparisons across models.
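A simple guard against accidentally changing the suite is to fingerprint it once and check that fingerprint before every run. The sketch below assumes a hypothetical suite stored as a list of dictionaries; the test ids, prompts, and file names are placeholders, and the helper names (suite_fingerprint, is_unchanged) are made up for illustration.

```python
import hashlib
import json

# Hypothetical test suite: ids, prompts, and attachment names are placeholders.
TEST_SUITE = [
    {
        "id": "ga4-attribution",
        "prompt": "Explain what you see in this GA4 attribution snapshot.",
        "attachments": ["ga4_attribution_paths.png"],
    },
    {
        "id": "competitive-analysis",
        "prompt": "Compare these three competitor landing pages.",
        "attachments": [],
    },
]


def suite_fingerprint(suite: list[dict]) -> str:
    """Return a SHA-256 hex digest of the suite in a canonical JSON form."""
    canonical = json.dumps(
        suite, sort_keys=True, ensure_ascii=False, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_unchanged(suite: list[dict], recorded_fingerprint: str) -> bool:
    """True if the suite still matches the fingerprint recorded earlier."""
    return suite_fingerprint(suite) == recorded_fingerprint


# Record this once when you settle on the suite, and store it with your results.
recorded = suite_fingerprint(TEST_SUITE)

# Before testing a new model, confirm nothing has drifted.
print("Suite unchanged:", is_unchanged(TEST_SUITE, recorded))
```

This version fingerprints prompts and attachment file names only. If a screenshot could be replaced under the same name, hashing the file bytes as well would close that gap.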

When you implement benchmark testing this way, you’ll figure out which AI products, tools, services, and models best fit YOUR specific needs.



Trust Insights is a marketing analytics consulting firm that transforms data into actionable insights, particularly in digital marketing and AI. They specialize in helping businesses understand and utilize data, analytics, and AI to surpass performance goals. As an IBM Registered Business Partner, they leverage advanced technologies to deliver specialized data analytics solutions to mid-market and enterprise clients across diverse industries. Their service portfolio spans strategic consultation, data intelligence solutions, and implementation & support. Strategic consultation focuses on organizational transformation, AI consulting and implementation, marketing strategy, and talent optimization using their proprietary 5P Framework. Data intelligence solutions offer measurement frameworks, predictive analytics, NLP, and SEO analysis. Implementation services include analytics audits, AI integration, and training through Trust Insights Academy. Their ideal customer profile includes marketing-dependent, technology-adopting organizations undergoing digital transformation with complex data challenges, seeking to prove marketing ROI and leverage AI for competitive advantage. Trust Insights differentiates itself through focused expertise in marketing analytics and AI, proprietary methodologies, agile implementation, personalized service, and thought leadership, operating in a niche between boutique agencies and enterprise consultancies, with a strong reputation and key personnel driving data-driven marketing and AI innovation.
