Dealing with Missing Data

This content was originally featured in the February 2nd, 2025 newsletter found here: INBOX INSIGHTS, February 2, 2025: Finding Clarity, Dealing with Missing Data

In this week’s Data Diaries, we’re going to get adjacent to the third rail (politics), but it’s timely and important. Recent changes by the US government have taken thousands of useful datasets offline, especially from research organizations like the Centers for Disease Control and Prevention (CDC) and the National Institutes of Health (NIH). Other datasets are being modified in place, with data being changed retroactively.

To someone like me who is obsessed with clean, complete data, this is obviously reprehensible. But it raises the bigger picture question: what do you do when data you rely on goes missing?

This is not new. In the last decade, we’ve seen exceptionally valuable marketing data sources simply vanish, such as Meta’s CrowdTangle, Twitter’s API, and even Google Analytics (Universal Analytics, we miss you). Those folks who didn’t back up their Universal Analytics data lost years, even decades of historical data.

The loss of historical data is bad, but the loss of current data is worse, because even with predictive analytics and AI, you can only forecast so much. As with all prediction, the further you get from a source of truth, the more the prediction degrades.

So what do we do? We work with the best data still available to us. We use AI to help us construct an understanding of proxy indicators from data we do have so that we understand the data we don’t have.

Here’s an example. In 2022, reporting for COVID data changed, as did the amount of testing reported to the CDC. That lack of testing made things seem safer than they actually were (which continues to this day). However, wastewater data matched the high-quality test data we developed in 2021 so closely that the correlation was almost perfect.

As a result, even though official test data showed one number, inference from wastewater data (which is a more reliable data source) showed a very different number. For folks who wanted the most accurate data, we had a terrific proxy number.

So what’s the first step towards doing this? If you don’t have access to statistical software, you do have access to generative AI. You could ask one of today’s reasoning models (OpenAI o1, Google Gemini 2 Flash Thinking, DeepSeek R1, etc., all of which are excellent coders) to build you Python or R code to do that statistical analysis for you. While you never want generative AI doing the math, you absolutely do want it writing code to do the math.

You provide examples of the data you have, ask it to recommend the correlation method best suited to the data (such as Pearson, Spearman, or Kendall’s tau), build out the requirements for the software, and then have it generate the code. You’ll have working software you can reuse over and over again.

As Katie said in the opening, when things get overwhelming, when situations out of our control are happening – especially to our data – we focus on what we CAN do, what’s within our reach. When data goes missing, it’s okay to react negatively about it. Take a moment, feel the feels, and then start building your plan for what you’ll do to get around it. Missing or corrupted data is just damage, and our goal is always to route around damage so that we get to where we want to go.




