
How Mashable’s Data Scientists Spend Time Analyzing Data, Not Accessing It


For Mashable, the promise of a data science team was the ability to create innovative technology for the media industry. The team delivered on that promise with its impressive work on the Velocity Technology Suite, but it still faced an abundance of questions about the Mashable audience.

What makes a reader more likely to subscribe to a newsletter? What topics are prime for additional coverage or investment? How could advertising revenue be increased?

But what did the Mashable data science team actually spend their time doing? The answer, according to Haile Owusu, the Chief Data Scientist, was that too often they found themselves doing dirty work: identifying the right data, requesting it from their data provider, cleaning it, realizing they had the wrong data, and doing it all over again.

Even though the company had invested in a data science team early on, a simple question about how audiences were watching a video series meant a manual back-and-forth process with their historical data provider. It was a frustrating way to get access to data about their own readers.

Mashable Finds a New Data Pipeline

Meanwhile, the editorial team received real-time content analytics from their Parse.ly dashboard and reporting suite. “I knew there was this dashboard that the editorial team checked,” said Owusu, “but it wasn’t something that I thought we could use.” Out of curiosity though, he asked what other data the Mashable data science team could access since Parse.ly was already processing their site content.

It turned out that Owusu and his team could get direct access to raw, de-aggregated data in a denormalized schema that Parse.ly enriches with content metadata, geolocation, and device-level information. Data at this level of granularity means no compromises in the questions they can ask. So in 2016, the Mashable team integrated the Data Pipeline.
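To see why denormalized, event-level data matters, consider a minimal sketch in Python. The field names and event values below are purely illustrative, not Parse.ly's actual schema: the point is that when each raw event already carries its content, geo, and device attributes, an audience question becomes a single pass over the events, with no joins against separate metadata stores.

```python
from collections import Counter

# Hypothetical raw, denormalized pageview events -- field names are
# illustrative assumptions, not the real Data Pipeline schema.
raw_events = [
    {"action": "pageview", "url": "https://example.com/a",
     "device": "mobile", "country": "US", "section": "tech"},
    {"action": "pageview", "url": "https://example.com/a",
     "device": "desktop", "country": "GB", "section": "tech"},
    {"action": "pageview", "url": "https://example.com/b",
     "device": "mobile", "country": "US", "section": "culture"},
]

# Because each event is self-describing, "which devices drive tech
# pageviews?" is one filter-and-count over the raw stream.
tech_by_device = Counter(
    e["device"]
    for e in raw_events
    if e["action"] == "pageview" and e["section"] == "tech"
)
print(tech_by_device)
```

With pre-aggregated reports, a question like this would require requesting a new cut of the data from the provider; with raw events, it is a one-off query the team can run and refine themselves.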

Mashable uses content-level data from the data pipeline

This decision has made building audience profiles and article recommendations flexible in ways that were not possible with Mashable’s previous data provider. Indeed, Mashable hadn’t even attempted certain analyses because the time cost of retrieving data was prohibitive.

“This is no small thing: we can experiment with an idea and not worry that that could constitute vast amounts of time wasted.” – Haile Owusu, Chief Data Scientist, Mashable

For digital teams lucky enough to have data scientists on board, the last thing they want is for those resources to be spent just getting access to the data. Instead, they want to answer questions that will make a difference to their business: What are the differences between our audience segments? What circumstances encourage readers to return? How should we split our time across distributed platforms?

Now, the organization has a whole range of accessible approaches to strategic questions. Most recently, they have been exploring questions related to general business intelligence and personalization of its site experience.

Read more about how Mashable makes the most of their audience data in the full case study.