This data was originally featured in the November 10, 2021 newsletter.
In this week’s Data Diaries, let’s take a peek at some TikTok data. TikTok doesn’t have a public, sanctioned API, so any dataset about it has to be collected by software that crawls and scrapes the platform. A number of enterprising data science enthusiasts have done so; for this look, we’ll use a dataset published on Kaggle.
As with any exploratory dataset, we should first understand what’s available to us. This particular Kaggle kernel (a dataset plus code) covers the top 1,000 trending videos at the time of capture and includes basic metrics like views, comments, shares, and diggs (likes). It also includes data like the music being played, the author name, and any accompanying text.
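As a first look at what such a file contains, here is a minimal sketch using only Python’s standard library. Only playCount and shareCount are column names mentioned in this article; commentCount, diggCount, the text columns, and the three sample rows are assumptions invented for illustration, and a real Kaggle export may use different headers.

```python
import csv
import io
import statistics

# Synthetic stand-in for the Kaggle CSV: these rows are invented for
# illustration only and do not come from the real dataset.
SAMPLE_CSV = """authorName,musicName,text,playCount,commentCount,shareCount,diggCount
creator_a,song_x,first clip,1000,10,5,100
creator_b,song_y,second clip,2000,40,20,300
creator_c,song_z,third clip,3000,20,8,200
"""

# Assumed header names for the numeric metrics.
METRICS = ["playCount", "commentCount", "shareCount", "diggCount"]


def summarize(csv_file):
    """Return {metric: (min, median, max)} for each numeric metric column."""
    rows = list(csv.DictReader(csv_file))
    summary = {}
    for metric in METRICS:
        values = [int(row[metric]) for row in rows]
        summary[metric] = (min(values), statistics.median(values), max(values))
    return summary


summary = summarize(io.StringIO(SAMPLE_CSV))
for metric, (low, mid, high) in summary.items():
    print(f"{metric}: min={low}, median={mid}, max={high}")
```

With a real download, the same function could be given an open file handle instead of the in-memory sample, for example open("trending.csv", newline="", encoding="utf-8"), where the filename is hypothetical.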

The key question most marketers will inevitably have when looking at introductory analytics like this is: what outcome should we be aiming for? Generally speaking, with social media channels like TikTok, our initial efforts should be awareness-based: getting people to see our content at all. For that, two metrics are worth considering. The first is playCount, the number of times a piece of content is viewed. That’s a useful metric because it literally describes what we’re after. The second is shareCount, the number of times a piece of content has been shared. If we want social media efforts to be effective without spending extraordinary amounts of budget and time, we need other people to help distribute our content.
For today’s purposes, let’s use shares as our objective. Using data science tools like IBM Watson Studio or Dataiku, we can take all this data and ask the software to build a model that tells us which variables correlate most strongly with the outcome we care about.
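The tools named above automate this step. As a much simpler stand-in, and not the model those tools would build, the sketch below ranks candidate variables by their Pearson correlation with shares using only Python’s standard library. The column names commentCount and diggCount and every number in the sample are invented for illustration; only playCount and shareCount are names taken from this article.

```python
import math

# Synthetic rows invented for illustration; not taken from the Kaggle dataset.
videos = [
    {"playCount": 100, "commentCount": 2, "diggCount": 10, "shareCount": 1},
    {"playCount": 300, "commentCount": 4, "diggCount": 50, "shareCount": 2},
    {"playCount": 200, "commentCount": 6, "diggCount": 20, "shareCount": 3},
    {"playCount": 500, "commentCount": 8, "diggCount": 40, "shareCount": 4},
    {"playCount": 400, "commentCount": 10, "diggCount": 30, "shareCount": 5},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(rows, target, features):
    """Return (feature, r) pairs sorted by absolute correlation with target."""
    target_values = [row[target] for row in rows]
    scores = {
        feature: pearson([row[feature] for row in rows], target_values)
        for feature in features
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


ranking = rank_by_correlation(
    videos, "shareCount", ["playCount", "commentCount", "diggCount"]
)
for feature, r in ranking:
    print(f"{feature}: r = {r:+.2f}")
```

Pearson correlation only captures linear, one-variable-at-a-time relationships, so a ranking like this is a rough starting point rather than a replacement for a fitted model.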

What we see from this initial dataset is that comments plus views, followed by comments alone, have the highest correlation with shares. Thus, if we’re producing content on TikTok, we might focus our efforts on encouraging comments and then see whether that yields an increase in shares. Such a test is needed because correlation alone doesn’t establish causality; it’s entirely possible that reverse causation exists, where someone shares a video and that causes people to comment.
What’s missing from this data is the more sophisticated feature engineering that might guide our content efforts better, such as the topic or subject of the video itself. Because TikTok is still a relatively new platform with no official data source, we must gather the data ourselves and do this work rather than having it provided.
If you’re producing content for TikTok, let us know how you determine your analytics and content strategy in our free Slack group, Analytics for Marketers!
Methodology: Trust Insights used the TikTok top 1,000 trending videos dataset published on Kaggle. The timeframe of the data is December 2020. The date of study is November 10, 2021. Trust Insights is the sole sponsor of the study and neither gave nor received compensation for data used, beyond applicable service fees to software vendors, and declares no competing interests.