INBOX INSIGHTS, May 8, 2024: Being Data-Driven, AI Hallucinations

Inbox Insights from Trust Insights

Catch the replay of yesterday's webinar, Generative AI for Professional Associations!

Do I Still Need to Be Data-Driven?

Why are we obsessed with being data-driven? Does it matter? Does it actually give us a competitive advantage? Where does AI fit in?

Being “data-driven” is just another way of saying you make informed decisions. Can you ignore the data? Yes, many people do when the data doesn’t tell them what they want to hear. But should you at least listen? Also, yes.

Let me caveat that by saying, if you want to use your data with AI, you have to be data-driven.

Why aren’t more companies data-driven?

The benefits of being truly data-driven are clear. You know exactly what is happening. Good or bad. With that comes a lot of hard work. Collecting data is the easy part. Easy in that you can turn the systems on. Collecting the right data means requirements gathering, data governance, and maintenance. Three phrases that no team wants to hear. They might as well be curse words.

Requirements Gathering

Believe it or not, gathering requirements doesn’t have to be a long, drawn-out process. I’ve been on projects where the requirements gathering took months. This is an opportunity to use the 5P Framework with generative AI.

The 5P Framework is Purpose, People, Process, Platform, and Performance.

Start by defining your purpose and performance. Take a CRM as an example. What problems are you trying to solve with it? What business processes do you want to improve? What insights do you need from the data? What are the outcomes you want to see?

Next, determine which people you are involving. You’ll want to figure out who owns the data, who analyzes it, who uses it, and who maintains it. In some companies, this is all the same person. In other companies, these tasks are spread out over multiple departments.

Since you already know your platform, you can start to use generative AI for the process. You could start by asking generative AI to give you a template for data requirements. If you already have one, you can load that into the system.

Continue building your prompt by feeding the system the information you have from the 5P exercise. This will help the system put together the requirements that you need to consider.

I did this as a very simple test. I told Google Gemini that I did not have a requirements template. I also told Gemini that I was the only person in my company and what my goals for using a CRM would be. What I got back was essentially a questionnaire that I could fill out to serve as my requirements. The system asked questions about my sales cycle, what functionality I had to have, and gave me some practical considerations. The whole thing took about ten minutes.
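If you want to repeat that exercise, here is a minimal sketch of how the 5P answers could be assembled into a single prompt. It is only an illustration: the `FivePs` class, the `build_requirements_prompt` function, the prompt wording, and the sample answers are all assumptions made for this example, not the exact prompt I gave Gemini.

```python
from dataclasses import dataclass


@dataclass
class FivePs:
    """Answers from a 5P exercise; every field is free text (hypothetical structure)."""
    purpose: str
    people: str
    process: str
    platform: str
    performance: str


def build_requirements_prompt(answers: FivePs, has_template: bool = False) -> str:
    """Assemble one prompt that asks a generative AI tool for data requirements."""
    template_line = (
        "I will paste my existing requirements template after this message."
        if has_template
        else "I do not have a requirements template, so please propose one."
    )
    lines = [
        "Help me gather data requirements for a new system.",
        template_line,
        f"Purpose: {answers.purpose}",
        f"People: {answers.people}",
        f"Process: {answers.process}",
        f"Platform: {answers.platform}",
        f"Performance: {answers.performance}",
        "Ask me the questions I need to answer to complete the requirements.",
    ]
    return "\n".join(lines)


# Sample answers invented for illustration only.
example = FivePs(
    purpose="Track leads and deals in one place",
    people="I am the only person in my company",
    process="Not defined yet",
    platform="A CRM",
    performance="Fewer missed follow-ups",
)
prompt = build_requirements_prompt(example)
print(prompt)
```

Paste the printed text into whichever generative AI tool you use. Keeping the five answers in one structure makes it easy to update a single P and regenerate the prompt as your business changes.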

Data Governance

This is where a lot of teams fall short with their data. I asked Google Gemini why data governance is so hard. This is what it told me:

Successful data governance necessitates:

  • Strong Executive Leadership: Getting top-down buy-in is necessary for the cultural shift and resources.
  • Start small, scale up: Pilot projects on focused use cases help establish processes and get early wins.
  • Invest in the right tools and people: Automated processes and skilled data stewards streamline effort.
  • Emphasize education: Ongoing training for all employees builds data-driven culture in the long term.

It’s a solid answer. In my experience, the first two points are really where things fall apart. Good data governance is, in fact, part of a company’s culture. It means following protocols and creating documentation. Basically, the things that most people really don’t want to do. Without support and accountability from the top, it generally doesn’t get done. The other issue, starting small, is true for a lot of initiatives. AI, for example, is also a culture shift. To find success with integration of data governance (or AI), companies need to pilot and test. It’s changing process, training people, and ensuring compliance.

With regard to being data-driven, a lack of data governance goes hand in hand with a lack of trustworthy data. If you’re considering using generative AI to help with your analysis tasks, you might want to evaluate your governance plan.


Maintenance

Data collection and data analysis are not one-and-done tasks. They are processes that you need to have running consistently. Once you set up a new data collection system, like a CRM, you have to maintain it. As your business changes, so should your data collection. As your team changes, so should your data collection. As your customers change, so should your data collection. Even if you review your data systems once a year, you’re doing more than a lot of companies. I would personally recommend at least once a quarter.

So, do you really need to be data-driven? To make thoughtful decisions and use new tools like generative AI, yes.

If you want help becoming more data-driven, you know how to reach me.

Are you data-driven? Reply to this email to tell me or come join the conversation in our Free Slack Group, Analytics for Marketers.

– Katie Robbert, CEO

Binge Watch and Listen

In this episode of In-Ear Insights, the Trust Insights podcast, Katie and Chris delve into the world of Agile methodology. You will gain a clear understanding of Agile principles and how they compare to traditional waterfall project management. Discover the benefits of Agile for marketing, including increased flexibility, customer-centricity, and the ability to adapt to changing needs. Finally, explore the challenges of implementing Agile, such as the need for discipline, planning, and stakeholder management.

Watch/listen to this episode of In-Ear Insights here »

Last time on So What? The Marketing Analytics and Insights Livestream, we walked through how to use AI for email marketing. Catch the episode replay here!

On this week’s So What? The Marketing Analytics and Insights Live show, we’ll be digging into how to use AI for social media marketing. Are you following our YouTube channel? If not, click/tap here to follow us!

In Case You Missed It

Here’s some of our content from recent days that you might have missed. If you read something and enjoy it, please share it with a friend or colleague!

Data Diaries: Interesting Data We Found

In this week’s Data Diaries, let’s discuss generative AI hallucination, especially in the context of large language models. What is it? Why do tools like ChatGPT hallucinate?

To answer this question, we need to open up the hood and see what’s actually happening inside a language model when we give it a prompt. All generative AI models have two kinds of memory – latent space and context windows. To simplify this, think of these as long-term memory and short-term memory.

When a model is built, vast amounts of data are transformed into statistics that become the long-term memory of the model. The more data a model is trained on, the more robust its long-term memory – at the expense of needing more computational power to run. A model like the one that powers ChatGPT requires buildings full of servers and hardware to run. A model like Meta’s LLaMa 3 can run on your laptop.

The bigger a model is, the more knowledge it has in its long-term memory; that’s why huge models like Anthropic’s Claude 3 and Google Gemini 1.5 tend to hallucinate less. They have bigger long-term memories.

When you start prompting a model, you’re interacting with its short-term memory. You ask it something, or tell it to do something, and like a librarian, it goes into its long-term memory, finds the appropriate probabilities, and converts them back into words.

In fact, a librarian working in a library is a great way to think about how models work. Except instead of entire books, our librarian is effectively retrieving words, one at a time, and bringing them back to us. Imagine a librarian who just writes the book for you, one word at a time, and that’s basically how a generative AI model works.

But what happens if you ask it for knowledge it doesn’t have? Our conceptual “librarian” grabs the next nearest book off the shelf – even if that book isn’t what we asked for. In the case of generative AI, it grabs the next nearest word, even if the word is factually wrong.
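To make the “next nearest word” idea concrete, here is a toy sketch. It is not how any particular model is implemented: the scores are made up, and real tools usually sample from the probabilities rather than always taking the top one. The point it illustrates is that the selection step always returns a word, whether the best candidate is a near-certainty or a weak guess.

```python
import math


def softmax(scores: dict[str, float]) -> dict[str, float]:
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores.values())
    exps = {word: math.exp(s - peak) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}


def pick_next_word(scores: dict[str, float]) -> tuple[str, float]:
    """Greedy choice: return the most probable word, however weak it is."""
    probs = softmax(scores)
    word = max(probs, key=probs.get)
    return word, probs[word]


# Made-up scores: one case where a single word dominates, one where none does.
known = {"Paris": 9.0, "London": 2.0, "Rome": 1.5}
unknown = {"Chuck": 1.4, "Conor": 0.5, "Susan": 0.1, "Amy": 0.0}

print(pick_next_word(known))    # one word clearly dominates
print(pick_next_word(unknown))  # a word still comes back, with a much lower probability
```

Nothing in `pick_next_word` can answer “I don’t know.” It just returns whichever candidate scores highest.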

Let’s look at an example. Using Meta’s LLaMa 3 8B model – the one that can run on your laptop, so it doesn’t have nearly as big a reference library as a big model like ChatGPT – I’ll ask it who the CEO of Trust Insights is:

LLaMa 3 answer of who the CEO of Trust Insights is

The model returned the wrong answer – a hallucination: “According to my knowledge, the CEO of Trust Insights is Chuck Palmer.”

When we look at what’s happening under the hood, we see the initial query is turned into numbers (point 1 in the image above). The “librarian” goes into the long-term memory (point 2), and then comes back with its results. Look at the name selection at point 3:

Generating (12 / 1024 tokens) [( Chuck 26.89%) ( Conor 11.19%) ( Susan 6.98%) ( Amy 6.31%)]
Generating (13 / 1024 tokens) [( Palmer 52.87%) ( G 42.14%) ( Pal 4.99%)]

What’s returned are the candidate words and the probability the model assigns to each one. The model KNOWS that it’s guessing: its top choice for the first name, “Chuck”, has only about a 27% probability. We don’t see this in the consumer, web-based interfaces of generative AI, but if you run the developer version like I am in this example, you can see that it’s just grabbing statistically relevant but factually wrong information. This is a hallucination – it grabbed the next nearest “book” off the shelf because it couldn’t find what we asked it for.
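If you have output like the two log lines quoted above, a short script can pull out the candidates and flag the steps where the top word looks like a guess. This is a rough sketch: the regular expression assumes the exact line format shown above, and the 50% cutoff is an arbitrary number chosen for illustration, not a recommended standard.

```python
import re

# The two log lines quoted above, copied verbatim.
LOG_LINES = [
    "Generating (12 / 1024 tokens) [( Chuck 26.89%) ( Conor 11.19%) ( Susan 6.98%) ( Amy 6.31%)]",
    "Generating (13 / 1024 tokens) [( Palmer 52.87%) ( G 42.14%) ( Pal 4.99%)]",
]

# Matches "( token 12.34%)" groups; assumes the format shown in the log above.
CANDIDATE = re.compile(r"\(\s*(\S+)\s+(\d+(?:\.\d+)?)%\)")


def parse_candidates(line: str) -> list[tuple[str, float]]:
    """Return (token, percent) pairs found in one log line."""
    return [(token, float(pct)) for token, pct in CANDIDATE.findall(line)]


def flag_guesses(lines: list[str], threshold: float = 50.0) -> list[tuple[str, float, bool]]:
    """For each line, report the top token, its percent, and whether it falls below the threshold."""
    report = []
    for line in lines:
        candidates = parse_candidates(line)
        if not candidates:
            continue
        token, pct = max(candidates, key=lambda pair: pair[1])
        report.append((token, pct, pct < threshold))
    return report


for token, pct, is_guess in flag_guesses(LOG_LINES):
    label = "likely guess" if is_guess else "more confident"
    print(f"{token}: {pct:.2f}% ({label})")
```

With the hypothetical 50% cutoff, “Chuck” is flagged as a likely guess, and “Palmer” only just clears the bar with “G” close behind it.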

This is why it’s so important to use the Trust Insights PARE framework with generative AI. By asking a model what it knows, you can quickly diagnose – even in the consumer version – whether a model is going to hallucinate about your specific topic. If it is, then you know you’ll need to provide it with the data. Check out what happens if I give it our About page as part of the prompt:

Prompt with background data

It nails it. The model gives Katie’s name an 80% chance of being the correct word to return. Why only 80%? Because the model is weighing whether to return a slightly different sentence structure. Instead of saying “According to the provided background information, Katie Robbert is the Co-Founder and CEO of Trust Insights”, it was likely also considering, “According to the provided background information, the CEO of Trust Insights is Katie Robbert.” You can also see the percentages in the image above – far more of the words are at 100% confidence than in the previous example.
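Providing the data can be as simple as putting the background text in front of the question. Here is a minimal sketch. The `build_grounded_prompt` function and the instruction wording are assumptions made for illustration, not the exact prompt used in the example above, and the About page text is left as a placeholder.

```python
def build_grounded_prompt(question: str, background: str) -> str:
    """Put background text ahead of the question so the model can read the answer."""
    return (
        "Use only the background information below to answer.\n"
        "If the answer is not in it, say you do not know.\n\n"
        f"Background information:\n{background.strip()}\n\n"
        f"Question: {question.strip()}"
    )


# Placeholder text; in practice this would be the real About page copy.
about_page = "<paste the About page text here>"
grounded_prompt = build_grounded_prompt("Who is the CEO of Trust Insights?", about_page)
print(grounded_prompt)
```

The question is unchanged. What changes is that the answer now sits in the model’s short-term memory, so it no longer has to guess from long-term memory.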

So what? Use the PARE framework in your prompting to understand how well any AI system knows what you’re asking of it, and if it doesn’t know, you will need to provide the information.

Want to learn more about data, analytics, and insights? Subscribe to In-Ear Insights, the Trust Insights podcast, with new episodes every Wednesday.

