So What? Marketing Analytics and Insights Live
airs every Thursday at 1 pm EST.
You can watch on YouTube Live. Be sure to subscribe and follow so you never miss an episode!
In this episode of So What? The Trust Insights weekly livestream, you’ll learn about retrieval augmented generation (RAG) and how it helps you solve marketing and data privacy use cases. You’ll discover practical applications of RAG systems and learn how to set up a simple RAG system. You will learn when to use RAG and when to use other tools. Finally, you’ll understand the limitations of RAG and how to avoid common pitfalls.
Watch the video here:
Can’t see anything? Watch it on YouTube here.
In this episode you’ll learn:
- What Retrieval Augmented Generation (RAG) is
- What use cases make sense for it
- When not to use RAG
Transcript:
What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for listening to the episode.
Katie Robbert – 00:30
Well, hey there everyone! Welcome to “So What?” The Marketing Analytics and Insights live show, which is looking a little orange this week. I was like, oh, is that me? Nope, it’s definitely orange for some reason. But anyway, we move on. How’s it going, guys?
Christopher Penn – 00:48
Going orange? Yeah.
Katie Robbert – 00:54
It’s one of those things. Anyway, this week we’re talking about use cases for retrieval augmented generation. If you missed it, Chris and I talked about what retrieval augmented generation is and why you should be thinking about it as a marketer, or in operations, or any other professional who interacts with artificial intelligence. In a nutshell, retrieval augmented generation is used in situations where you have a contained amount of information—sometimes proprietary information—that you only want to pull answers from.
Katie Robbert – 01:44
For example, and I know we’re going to go over different use cases, but the way you talked about it that really stuck with me is this: if you have customer data, proprietary information that you don’t want going out to large language models for training data, you would use a retrieval augmented generation system—like NotebookLM, for example. You put the information in there, and you can only search that. In some ways, it’s like old-school wikis and knowledge bases that we used to build in SharePoint—or that people still build. I don’t build them anymore. So, that’s my high-level understanding of retrieval augmented generation, or a RAG system. Am I close?
Christopher Penn – 02:29
You are close. Let’s review the system diagram to understand what a RAG system is and where it belongs. In normal generative AI, when you fire up ChatGPT or a similar system, you put in a prompt, and then it processes it, thinks, and gives you an answer. When you use retrieval augmented generation, you put in a prompt, and the system tries to figure out if it needs additional information. It goes into a retrieval system—which is located here—that has a very traditional database-style feel to it.
Christopher Penn – 03:11
It gets the data, retrieves the data, converts it into content that the LLM can work with, and then merges the prompt and external data. That goes to the AI system, and it comes out with a response. That’s essentially how RAG works today. There are many variants—agentic RAG, and so on—that we may or may not get to. However, there are two considerations for when you want to use a RAG system.
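For readers who want to see that flow in code, here’s a minimal sketch of the retrieve, merge, and generate steps Chris describes. The object and method names are placeholders, not any particular product’s API:

```python
# Minimal sketch of the retrieve -> merge -> generate flow described
# above. `vector_db` and `llm` are hypothetical stand-ins for whatever
# retrieval system and model you use.
def rag_answer(prompt, vector_db, llm, top_k=5):
    # 1. Retrieval: find stored chunks relevant to the prompt
    relevant_chunks = vector_db.search(prompt, top_k=top_k)
    # 2. Augmentation: merge the prompt with the retrieved context
    context = "\n\n".join(chunk.text for chunk in relevant_chunks)
    augmented_prompt = (
        "Answer using ONLY the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {prompt}"
    )
    # 3. Generation: the model answers from the merged prompt
    return llm.generate(augmented_prompt)
```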
Christopher Penn – 04:01
Number one is privacy. There’s some data you just can’t let out of your control. If you’d like a copy of this chart, it’s available on the Trust Insights website—go to TrustInsights.ai/data-privacy. You can download the chart there; it’s on our Instant Insights. No forms to fill out. When it comes to privacy, there are differing levels. With generative AI, the only system guaranteed private is one you run locally on your hardware—your personal laptop, servers inside your company, whatever. That’s the only system where you’re guaranteed that no one else will ever look at your information.
Christopher Penn – 04:54
There are systems that are mostly private, meaning they don’t train on your data. But a human being can look at your stuff if you cause a terms-of-service violation. If you ask for step-by-step instructions to do something bad, the model will say you shouldn’t do that. And if you do that several times, it’ll trip a flag saying it looks like you’re trying to do something bad. A human being will then check. Because you’ve tripped a circuit breaker, a human being can look at your data, and it’s no longer guaranteed confidential.
Katie Robbert – 05:20
We could probably do a whole AMA episode on what constitutes confidential data. There are guidelines and laws around confidential data privacy, but there’s always a gray area, especially regarding terms of use and whether someone has read them. We could say you’re giving us confidential information, but we’re going to use it. I’m not saying that’s what our documents say, but if someone just clicks “I agree,” that usually happens with consumers. Then they’re like, “Wait a second, what do you mean you’re selling my data? Well, you said it was okay.” It’s like those people who post on Facebook, “I do not give this platform…”
Katie Robbert – 06:16
“…the authority to use my pictures.” You’re already on there; they’re already using it; you already said yes. The second you gave them login information, you already said yes. It’s too late to go back. That’s a rant, but it’s important. Chris, thank you for covering data privacy and confidential data, because that’s why retrieval augmented generation exists.
Christopher Penn – 06:43
Exactly. If the data you’re working with has restrictions requiring it to be mostly private, that’s a consideration. If you’re working with blog post content, not medical records or national security secrets, you might not need RAG. That’s one thing. The second consideration is size. Many systems…
Katie Robbert – 07:21
We’ll pause for a second. Can you bring that chart back up? We have a question about it.
Christopher Penn – 07:27
Sure.
Katie Robbert – 07:28
Brian is asking where NotebookLM falls on this data privacy matrix. Is it the same as Gemini?
Christopher Penn – 07:36
It depends on whether it’s the free or paid version. The free version has the same general restrictions as Gemini Free; the paid version has the same restrictions as Gemini Paid. Essentially, if it’s free, your data is the payment.
Katie Robbert – 07:54
That’s a fun tagline! Go ahead.
Christopher Penn – 08:00
Check the Terms of Service. That’s the best thing you can do. If you’re unclear, put it through a generative AI system and ask, “How private is my data on a scale of 1 to 10?”
Katie Robbert – 08:21
That’s cyclical, because first you need a system to read the Terms of Service, and then if it’s not right, you go to a different system. That seems like a lot of work.
Christopher Penn – 08:30
We’ll put a link in our Analytics for Marketers Slack group to our custom GPT that we built to evaluate terms of service. Just copy and paste the URL or the Terms of Service, and it’ll give you a score on how private your data is. For any given system’s terms of service, go ahead and check that; we’ll put that in at the end of the show. So privacy is one aspect. The second is size. How much data are we talking about? Every language model has a context window—short-term memory of how much information it can remember in any given chat. ChatGPT holds 128,000 tokens, about 90,000 words.
Christopher Penn – 09:15
So it can hold one of these. If you’re working with data bigger than that—like your entire sales CRM—ChatGPT can’t remember it all, meaning you’d want a RAG system. On the other hand, Google’s Gemini can hold 2.5 of these. If your data is smaller than that, you might not need RAG. It can be expensive, but it’s highly functional. Meta’s new Llama 4 has a 10-million-token context window.
Christopher Penn – 10:01
It can hold seven of these in its memory and retrieve from them. If you have the hardware and bandwidth, you may not need RAG. If your data is bigger, you would need a RAG system. Our two dimensions are privacy and data size. If you need absolute privacy, you want a local RAG system. If you have really big data and only need certain parts, like going into my medical records and pulling out every mention of a condition, I’d use a RAG system to pull only those pieces from my database. Those are general situations for when you’d use RAG.
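One quick way to sanity-check the size question is to count tokens before reaching for RAG. A sketch using OpenAI’s tiktoken library; other models tokenize differently, so treat the count as an estimate:

```python
import tiktoken

def fits_in_context(text: str, context_window: int = 128_000) -> bool:
    """Rough check: does this document fit in a GPT-4-class window?"""
    enc = tiktoken.get_encoding("cl100k_base")  # GPT-4-family encoding
    n_tokens = len(enc.encode(text))
    print(f"{n_tokens:,} tokens vs. a {context_window:,}-token window")
    return n_tokens < context_window

# If this returns True, pasting the data straight into the chat may be
# simpler than standing up a RAG system.
```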
Katie Robbert – 10:50
Would there be a scenario where, say, size is the consideration? I have my whole EMR situation. Instead of bringing it into a third-party system, would you lay an interface on top of it or do an API call instead of moving the data? Would that bypass the size issue?
Christopher Penn – 11:18
It could, depending on how good your query system is on the underlying system. Systems like Epic look like they’re straight out of Windows 95, and the backend data is about the same. Unlike Mick Jagger, that data doesn’t age well, and as a result, you could build…
Katie Robbert – 11:44
I don’t know that he’s aging well.
Christopher Penn – 11:47
Well, yeah, he’s better than Keith Richards.
Katie Robbert – 11:52
These are rough examples. Moving on.
Christopher Penn – 11:56
There’s an entire set of systems evolving rapidly—one is called Model Context Protocol—which is for another episode. You can interface with traditional databases and query records with SQL and pull back limited pieces. When we think about a RAG system, the purple external knowledge source is typically a vector database, but it doesn’t have to be. It could be a traditional SQL database. It’s a question of whether your LLM system can talk to it. Most LLM systems today aren’t designed for that; you need a retrieval system. Maybe it’s a Python script or a whole ecosystem.
Katie Robbert – 12:48
That’s why you need to consider size, because LLMs don’t talk to your existing databases; you have to bring the data over.
Christopher Penn – 12:59
Yes. Today’s RAG systems typically have a special kind of database—a vector database. When you put in your prompt, behind the scenes, the machines turn that into numbers, and those numbers get processed and matched with the internal database the AI is trained on and your external system. It’s all blended and sent to the AI model. A vector database has your data, but it’s pre-encoded for AI, so you don’t need a translation layer, which can be slow depending on the size of your data. It’s all pre-sorted and processed.
Christopher Penn – 13:52
It’s AI-first, in the same format as the language model itself—just statistics; no words or identifiers. That’s what makes vector-based RAG systems so efficient and fast. They’ve pre-processed your data into a format AI can understand.
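What “pre-encoded for AI” means in practice: every document becomes a vector of numbers, and retrieval is nearest-neighbor math rather than keyword matching. A small sketch with the open-source sentence-transformers library:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
docs = ["Refund policy for annual plans", "How to reset your password"]
doc_vectors = model.encode(docs)                  # done once, at ingest time
query_vector = model.encode("customer wants their money back")
scores = util.cos_sim(query_vector, doc_vectors)  # pure math, no words involved
print(docs[int(scores.argmax())])                 # -> the refund document
```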
Katie Robbert – 14:19
This has gotten very technical. John, are you keeping up? I’m hanging on by a thread, but I think I’m there.
John Wall – 14:44
I’m getting it. I don’t understand the state of the industry. I understand how not to do it, using NotebookLM so you don’t grab other data. But I have no idea which tools are doing that. A lot of this is under agentic AI, where you build something to retrieve things. Are there any available systems that do this without having to string it together yourself?
Christopher Penn – 15:31
There are; we’ll look at a couple of examples.
Katie Robbert – 15:36
We have another question from Brian: Can you build a RAG tool using a foundational model? Do you have to use API calls? Or are these completely different models used to build RAG?
Christopher Penn – 15:48
When we get to the local example, we’ll cover that. Yes, you can use everything locally, but there are some gotchas.
Katie Robbert – 16:00
Yes, there always are.
Christopher Penn – 16:01
The most familiar RAG tool is NotebookLM. If you don’t know it, we’ve covered it before on the livestream (TrustInsights.ai/YouTube). It’s super simple. You go into a new notebook, add your sources, drop your documents in, and NotebookLM (a Google product) will digest them and let you ask questions. The free version holds 50 documents; the paid version holds 300. The free version is constrained.
Christopher Penn – 16:47
I think it’s 300 megabytes, but either way, it’s smaller than the large version, which can hold a lot. If your data is under 500 megabytes, you can throw it all in and ask questions. There are fun things you can do with NotebookLM, like mind maps. You can explore it; it’s a convenient interface.
Christopher Penn – 17:28
The downside is that it’s self-contained; it’s difficult to get data in and out. It’s great as a research or processing tool, but terrible in a workflow because you can’t get stuff out; there’s no API or automated data export. You have to manually spit things out.
Katie Robbert – 17:57
I used NotebookLM this week to outline a four-part series on creating an AI strategy for our newsletter. I wanted to string together pieces, so I used NotebookLM to see what we’d written before, how we’d done it, what frameworks we’d referenced. It was an easy tool to use. I love the mind map; it’s fun. I also like the FAQs.
Katie Robbert – 18:37
One challenge our clients mention is that people don’t know where to start or what questions to ask. The FAQs self-populate and act as a seed starter for questions.
Christopher Penn – 19:01
It’s a great tool. However, data input and output is a big deal. If you want to integrate your private data or too-big data into other systems, that’s where things blow up. You can download and install your own vector database—PostgreSQL (with PGVector), Pinecone, Weaviate, Milvus, MongoDB, Supabase, Zilliz. Many are open source (qdrant, Milvus, Weaviate), and you can run them locally. That’s handy when you want to build on top of your data.
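To make the local option concrete, here’s a minimal sketch using the open-source Chroma client—one of many choices alongside the databases Chris names. It runs entirely on your machine, which matters for the privacy cases above:

```python
import chromadb

client = chromadb.Client()  # in-memory; use PersistentClient for disk
collection = client.create_collection("customer_emails")
collection.add(
    ids=["email-1", "email-2"],
    documents=[
        "The product arrived broken and support never replied.",
        "Love the new dashboard; setup took five minutes.",
    ],
)
results = collection.query(query_texts=["unhappy customers"], n_results=1)
print(results["documents"])  # -> the broken-product complaint
```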
Christopher Penn – 19:59
Here’s a simple example using the AnythingLLM app. You can choose your LLM (I’m using Gemini). There’s a vector database; you can use the built-in one (no configuration needed) or choose from many vendors. If you’re using this for yourself, use the built-in one.
Christopher Penn – 20:43
If you’re using this with a team, you want a central vector database that everyone accesses. It’s silly to make multiple copies of data on everyone’s laptop.
Katie Robbert – 21:00
That makes sense. A challenge people are having with generative AI is that everything gets siloed unless you have a shared account.
Christopher Penn – 21:11
Exactly. If you set up a Weaviate server, everyone can use it. Many have cloud-hosted options. But if you’re dealing with national security secrets, you want the database locally, not in the cloud. If you’re dealing with PHI, have it on a service you trust. That’s where things go wrong.
Christopher Penn – 21:57
To convert documents into a language an LLM understands, you need an embedder. The catch is that your embedder has to match your language model, because everyone does things differently. If you’re using OpenAI and ChatGPT, use an OpenAI embedder. If you’re using Google’s Gemini, use Gemini embeddings. If you’re using Mistral, use Mistral’s own embeddings (built on its Tekken tokenizer).
Christopher Penn – 22:52
People screw this up by picking a simple embedding system that’s a mismatch for the model. It’s like putting diesel in a Prius. Many RAG implementations go off the rails because of this. When I see someone using an OpenAI embedder and sending data to Gemini or Claude, I know it won’t go well.
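A cheap guardrail against that mismatch is to record which embedder built the store and refuse queries that don’t match. The names below are illustrative, not a specific product’s API; “text-embedding-004” is one of Google’s Gemini embedding models:

```python
# Vectors written with one embedder are meaningless to another, so
# stamp the store with its embedding model and check before querying.
INDEX_METADATA = {"embedding_model": "text-embedding-004"}  # e.g., Gemini's

def safe_query(query_text, embedder_name, vector_db):
    if embedder_name != INDEX_METADATA["embedding_model"]:
        raise ValueError(
            f"Store was built with {INDEX_METADATA['embedding_model']}; "
            f"querying with {embedder_name} will return garbage matches."
        )
    return vector_db.search(query_text)  # vector_db is a hypothetical store
```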
Katie Robbert – 23:38
You have to have a handle on what you’re doing when setting up tech. You can set up anything, but it doesn’t mean it’ll be right.
John Wall – 23:57
Will it just melt down? Will the data set become useless?
Christopher Penn – 24:04
It causes the LLM to return poor-quality results. OpenAI’s embedder might split text one way; say you use “marketing over coffee,” and the wrong embedder breaks the phrase into segments that don’t match how the target LLM tokenizes. The LLM and the data store must use the same pieces of information. If they don’t, you get poor-quality results that don’t make sense.
John Wall – 24:49
You’re getting nonsense.
Katie Robbert – 24:51
We’re walking through the setup, but I also want to spend time on the actual use cases. Is that next?
Christopher Penn – 25:02
Yes. Let’s drop some customer service emails into our embedding system. The file is embedded; it’s converted into the document format the model needs. Let’s summarize the negative emails. This data is synthetic from our new Generative AI Use Cases for Marketers course (TrustInsights.ai/Use-Cases-Course). This file has thousands of lines of data.
Christopher Penn – 26:09
It went through and pulled out a few things that matched the prompt. That was a terrible prompt, but behind the scenes, the tool took the prompt, broke it up, and found relevant things in the vector database.
Christopher Penn – 26:52
I can now query my data piece by piece without loading the whole document, because that would exceed the memory of the Gemini model.
Katie Robbert – 27:12
Let’s say my boss wants all negative customer service emails from the past year. We get 2,000 emails a day. Because of system limitations, I’d have to repeat the process.
Christopher Penn – 27:41
In a chat interface? Yes, that’s completely correct.
Katie Robbert – 27:47
Is that a good or bad use case for a retrieval augmented generation system?
Christopher Penn – 27:53
It’s a good use case, but the wrong tool.
Katie Robbert – 27:59
That’s important.
Christopher Penn – 28:02
The data looks like this because it’s synthetic. There are no dates. The data has to have the required fields (data governance). If it’s not in the source data, we can’t fulfill the request. You’d have to tell the boss that you haven’t been collecting date information.
Katie Robbert – 28:42
That’s where the 5P framework could help: Purpose, People, Process, Platform, and Privacy. Before you jump into using a RAG system, go through each of these. For example, you probably want to talk to your customer support team. They’re the ones collecting the information. What is the process? What data do you need? If a date stamp is important, figure that out first. Then you can pick a platform. Don’t just set up a cool vector database and start querying it. That’s the wrong order.
Christopher Penn – 30:04
What’s the right implementation and use case? When you’re retrieving a lot of data sequentially, you’ll likely want a system like N8N (which we’ll cover later). You’d have your LLM and attach a memory tool or a vector data store.
Christopher Penn – 30:52
Based on the model you’re using, you’ll choose your embedding. Because I’m using Gemini, I’ll use Gemini embeddings. I’ve got my Gemini embeddings and my vector data store. I’ll add my original data. The order should be opposite: Send your chat message to the AI agent. The agent would feed the data store, retrieve from it, and repeat the process.
Christopher Penn – 31:35
That setup sequentially filters data to get to the curated data set. To answer the question—”Show me emails sent to this client last year, broken out by month”—you need an automation that involves AI, but AI doesn’t power it.
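To make that concrete: the heavy lifting in a request like that is deterministic filtering and grouping, with AI only at the summarization step. A sketch with pandas, assuming a hypothetical email export with client, sent_at, and body columns:

```python
import pandas as pd

emails = pd.read_csv("emails.csv", parse_dates=["sent_at"])
client_mail = emails[(emails["client"] == "Acme")
                     & (emails["sent_at"].dt.year == 2024)]

# Deterministic automation does the retrieval and grouping...
for month, group in client_mail.groupby(client_mail["sent_at"].dt.month):
    print(month, len(group), "emails")
    # ...and only each month's slice goes to an LLM, e.g.:
    # summary = llm.generate("Summarize these emails:\n" + "\n".join(group["body"]))
```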
Katie Robbert – 32:22
That’s a good case for getting organized first. It’s easy to start making connections in N8N, but is that the best use of your time? Get organized first. Then if you need a RAG system, you’ll know.
Christopher Penn – 33:02
Going back to Brian’s question, you can build a RAG tool using a foundational model. The first part is getting the data, building the embeddings, storing them in a vector database, and then you can use the foundational model to talk to that database. That can be local, in the cloud, or hybrid. Building the vector database is independent of using it. In NotebookLM, you load all the data first, then you ask questions.
Christopher Penn – 33:43
It’s the same process with a more advanced RAG system. You load all your data first. Having something that grabs and embeds data on the fly is inefficient and will likely break.
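To Brian’s question, one hedged way to pair a cloud foundation model with your own store looks like this, using Google’s google-generativeai package. The key and document are placeholders, and in a real build the embedding would be written to a vector database rather than held in a variable:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_KEY")  # placeholder

# Build phase (run once): embed documents with the SAME model family
# you'll generate with, then store the vectors.
doc = "Our sales playbook: always lead with the discovery call."
vec = genai.embed_content(model="models/text-embedding-004",
                          content=doc)["embedding"]  # -> your vector DB

# Query phase (run per question): retrieve locally, generate in the cloud.
llm = genai.GenerativeModel("gemini-1.5-flash")
answer = llm.generate_content(
    f"Using only this context:\n{doc}\n\nHow do we open a sale?"
)
print(answer.text)
```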
Katie Robbert – 34:06
Let’s talk about some practical use cases for when you should and shouldn’t use a RAG system. A sales playbook is a good candidate—it tends to be proprietary. We have one, and it seems like the perfect candidate along with customer information to put into a RAG system to keep it contained and proprietary.
Christopher Penn – 34:40
That’s a great starting use case. Two questions: How private does it need to be? And how large is the data?
Katie Robbert – 34:49
I don’t know the answer to those, but I know we want to keep our customer data private. And the sales playbook is our secret sauce.
Christopher Penn – 35:22
The Sales Playbook is a small document and easily fits in any LLM’s memory. Size is off the table. CRM data, on the other hand, is a good fit because it’s so large. The second consideration is privacy.
Christopher Penn – 36:02
Would you be okay with a mostly private system, or does it need to be locally hosted?
Katie Robbert – 36:14
It depends on what we mean by “conditionally private.” Am I giving away Social Security numbers, or can someone see which companies are our clients?
Christopher Penn – 36:32
In the Microsoft OpenAI ecosystem, training is on by default, but you can turn it off.
Katie Robbert – 36:44
So it’s conditionally private because it’s not private out of the box, but you can make it private.
Christopher Penn – 36:51
That’s…
Katie Robbert – 36:51
If I know how to configure things correctly, I’d be fine.
Christopher Penn – 36:58
If you’re just using the Sales Playbook by itself, use a green tool (mostly private by default). If you’re using it with CRM data, you could use a RAG system. I could use DeepSeek, Azure OpenAI, or Google Vertex Gemini.
Christopher Penn – 37:45
I could use the Google Vertex edition of Google Gemini, and that’s governed by the SLA and our account. I’d have my vector database hosted somewhere with my sales data, and then one of the cloud-based models doing the processing. That’s a hybrid system.
Christopher Penn – 38:35
We know whether the deal closed, what language was used, what adjectives and adverbs were used. We can build a classical AI model to see what all the closed calls have in common. That’s using RAG because you’re pulling data from a massive database.
Christopher Penn – 39:30
The ultimate outcome is an analysis of what works. For sales and marketing leaders, that’s interesting. Can you learn the language of what works for effective sales calls? Yes.
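As a sketch of that “classical AI on top of retrieved calls” idea, here’s a toy scikit-learn model surfacing the words most associated with closed deals; the four transcripts are invented for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

calls = [
    "great, send the contract and let's schedule kickoff",
    "we need to loop in procurement before deciding",
    "let's lock in pricing today",
    "we'll circle back next quarter",
]
closed = [1, 0, 1, 0]  # did each deal close?

vec = TfidfVectorizer()
X = vec.fit_transform(calls)
model = LogisticRegression().fit(X, closed)

# Words with the largest positive weights co-occur with closed deals.
weights = sorted(zip(vec.get_feature_names_out(), model.coef_[0]),
                 key=lambda pair: -pair[1])
print(weights[:5])
```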
Katie Robbert – 39:50
That’s an amazing use case. I’d add the proposals we’ve written and closed. Voice of customer data is another good use case. Where do you get that data?
Katie Robbert – 40:41
Voice of customer could be a dozen things. That seems like another really great place for a RAG system.
Christopher Penn – 41:00
It’s a good option, especially if you want to blend it all into a single source of truth. For example, in advertising and marketing, you’d have comments on social media. You could vectorize it and put it in a vector database. You could also use market research and focus group transcripts.
Christopher Penn – 41:38
You could look at social media conversations, Reddit threads, product and service reviews. These are heterogeneous data sources. You could vectorize them, put them in a single database, and then query that database.
Christopher Penn – 42:30
You’re distilling all of those voice-of-customer interactions into one central source. Generally speaking, people like the bacon oatmeal better than the chicken oatmeal.
Katie Robbert – 42:41
Analyzing qualitative data is exponentially harder than analyzing quantitative data. A RAG system brings all that together.
Christopher Penn – 43:11
The downside is that it has the same vulnerabilities as generative AI. A customer might ask a stupid question, like, “Do you have a kosher version of bacon oatmeal?” It’s not kosher; it’s bacon. But if that question gets loaded into a vector database, the word associations will get blended together, and the RAG system could mistakenly interpret that.
Christopher Penn – 43:53
Even though a RAG system is locked to your data, there’s still the possibility of hallucination. If the data is garbage going in, the data coming out will be garbage.
Katie Robbert – 44:12
You still need the experts. There needs to be human intervention in the process, especially in planning (the 5Ps) and at the output. If someone asks about kosher bacon, you need to correct the system.
Christopher Penn – 45:05
You need solid system instructions and guardrails on the data. The data has to be good, and the system has to have guardrails. AnythingLLM doesn’t have any guardrails built in; you have to provide them.
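The kind of guardrail Chris means can be as simple as system instructions you supply yourself; the wording below is illustrative, not from any product:

```python
SYSTEM_PROMPT = """
Answer ONLY from the provided context documents.
If the context does not contain the answer, reply "I don't have that
information" rather than guessing.
If a question contradicts the source data (for example, asking for a
kosher version of a bacon product), point out the contradiction instead
of inventing an answer.
"""
```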
Katie Robbert – 45:35
John, what’s a use case you’ve been thinking about?
John Wall – 45:40
Financial data. I feel like I spend a lot of time without enough data.
Christopher Penn – 46:07
Financial data is one of the worst things to use with generative AI, because it involves math, and language models are bad at math. A RAG system can help here by retrieving exact records instead of generating them. You can also load your entire code base into a RAG system.
Christopher Penn – 46:53
One problem with debugging using AI is that it doesn’t necessarily always understand the dependencies from file to file. If you have good code with good documentation, it will know where to look.
Katie Robbert – 47:15
You’re talking about the unicorn—good code with good documentation.
Christopher Penn – 47:20
My code these days has great documentation, because AI writes it. A core requirement is robust documentation.
Katie Robbert – 47:37
That’s smart. We talked about financial data. What are some other use cases that are off the table?
Christopher Penn – 48:10
It goes back to size and privacy. If you can fit the data inside the working memory, a RAG system is overkill. If you’re a marketer using this for marketing strategy, the information will likely fit inside the context window. If it doesn’t—like if you’re in a super-competitive industry—then a RAG system would make sense.
Christopher Penn – 49:02
Okay. That’s going to be a lot of information to dig through. A RAG system would make sense in those cases. Size and privacy are the two factors. If you have both, you definitely want a RAG system. If you have one, but not the other, it might or might not be the right choice. If you have neither, don’t overcomplicate things.
Katie Robbert – 49:24
When in doubt, use the five Ps. Don’t overcomplicate things. Our Trust Insights Use Cases course launches in five days (TrustInsights.ai/use-cases-course).
Katie Robbert – 50:09
If you want more information, join our free Analytics for Marketers Slack group (TrustInsights.ai/analytics-for-marketers).
Christopher Penn – 50:22
That’s it. We’re off next week. We’ll be back in two weeks.
Katie Robbert – 50:29
That’s right.
Christopher Penn – 50:31
Thanks for watching. Subscribe to our show wherever you’re watching it. For more resources, check out the Trust Insights podcast (TrustInsights.ai/podcast) and our weekly email newsletter (TrustInsights.ai/newsletter). Got questions? Join our free Analytics for Marketers Slack group (TrustInsights.ai/analytics-for-marketers). See you next time!
Need help with your marketing AI and analytics?
Get unique data, analysis, and perspectives on analytics, insights, machine learning, marketing, and AI in the weekly Trust Insights newsletter, INBOX INSIGHTS. Subscribe now for free; new issues every Wednesday!
Want to learn more about data, analytics, and insights? Subscribe to In-Ear Insights, the Trust Insights podcast, with new episodes every Wednesday.
Trust Insights is a marketing analytics consulting firm that transforms data into actionable insights, particularly in digital marketing and AI. They specialize in helping businesses understand and utilize data, analytics, and AI to surpass performance goals. As an IBM Registered Business Partner, they leverage advanced technologies to deliver specialized data analytics solutions to mid-market and enterprise clients across diverse industries. Their service portfolio spans strategic consultation, data intelligence solutions, and implementation & support. Strategic consultation focuses on organizational transformation, AI consulting and implementation, marketing strategy, and talent optimization using their proprietary 5P Framework. Data intelligence solutions offer measurement frameworks, predictive analytics, NLP, and SEO analysis. Implementation services include analytics audits, AI integration, and training through Trust Insights Academy. Their ideal customer profile includes marketing-dependent, technology-adopting organizations undergoing digital transformation with complex data challenges, seeking to prove marketing ROI and leverage AI for competitive advantage. Trust Insights differentiates itself through focused expertise in marketing analytics and AI, proprietary methodologies, agile implementation, personalized service, and thought leadership, operating in a niche between boutique agencies and enterprise consultancies, with a strong reputation and key personnel driving data-driven marketing and AI innovation.