Audience Analytics Lights the Data Leakage Fuse

As data collection started to take off on the advertiser side, companies like comScore and Nielsen were simultaneously trying to build a story around demographic behaviors online, which is a huge challenge because of how inherently fragmented the internet is compared to traditional media.  The standard model for those traditional media measurement companies is to set up a small panel, reach statistical significance for a few major publishers, and extrapolate the results into official ratings.  The smaller the panel, the wider the margin of error, but typically they can get within a few percentage points with just a few hundred people.  Sounds great, but on the internet there aren’t a handful of major networks with distribution to every home in America.  There are a hundred million destinations with a wide range of audience sizes, some quite minuscule, so a small panel simply never sees enough visitors to most of them to measure anything.  The result is that a panel approach doesn’t work all that well online, as evidenced by the typically huge discrepancy between the number of monthly unique visitors publishers report and what comScore might tell you.
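To put rough numbers on the panel problem, here is a minimal sketch using the textbook margin-of-error formula for a proportion at 95% confidence. It assumes a simple random sample, which real panels only approximate, and the function names and the example reach figure are illustrative, not anything comScore or Nielsen publish.

```python
import math

def margin_of_error(panel_size, proportion=0.5, z=1.96):
    """Half-width of a 95% confidence interval for a proportion.

    proportion=0.5 is the worst case (widest interval).
    Assumes a simple random sample; real panels are weighted and messier.
    """
    return z * math.sqrt(proportion * (1 - proportion) / panel_size)

def expected_panelists(panel_size, reach):
    """Expected number of panel members who visit a site with the given reach."""
    return panel_size * reach

# A few hundred people is enough for a property almost everyone uses...
for n in (400, 1000, 10000):
    print(f"panel of {n}: +/- {margin_of_error(n) * 100:.1f} points")

# ...but a niche site reaching 0.01% of the population (a hypothetical figure)
# is effectively invisible to the same panel.
print(expected_panelists(400, 0.0001), "expected panelists on the niche site")
```

The margin of error only describes sites the panel actually visits. For the long tail, the expected number of panelists per site is well below one, so there is nothing to extrapolate from.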

Then, in 2006, everything changed when a few really smart guys founded a visionary company called Quantcast.  Quantcast’s history is critical to understand as part of the data leakage story because they were the first company to really get people thinking about the value of audience data and then make that audience data actionable.  While they didn’t invent the mechanism to build an online audience, they were the first to figure out how to build a system that could algorithmically tie demographic information to a specific cookie, keep the data current, and scale that service to hundreds of millions of cookies.  They accomplished all this by directly measuring audiences instead of using a small panel of users.  This is a standard methodology today – every data exchange currently in market (BlueKai, AudienceScience, eXelate, and others) relies on having a redirect to their cookie sitting on thousands of sites to help them build a cookie pool they can profile – but no one was doing it in 2006.

Quantcast was able to pull off direct measurement of audiences on sites they didn’t own through a unique data-sharing proposition to publishers – put a pixel on your site, allow Quantcast to cookie your users and they would give you demographic analytics on your site audience – free.  Or rather, in exchange for surrendering your rights to any data Quantcast could collect.  Publishers big and small signed up in droves and in a short while, Quantcast was measuring tens of millions of people on tens of thousands of sites.  This mountain of data allowed them to do really sophisticated audience modeling and infer demographic and psychographic characteristics at a cookie level.  After building a unique audience profile on each cookie, they could aggregate that data for the unique cookies on any given publisher and report accurate demographic profiles for any publisher.  It was the reverse of traditional media measurement – publishers contributed thousands of data points on each user rather than users contributing data points on the publisher, but this solved the fragmentation problem.  By building confidence at the cookie level, Quantcast could simply re-purpose the data they had on a cookie for whatever group they saw on a tiny publisher.  You could have a site with thousands of visitors a month instead of tens of millions and still have the same extremely accurate demographic reporting.   It was pretty slick stuff and so effective that eventually it forced comScore and Nielsen to start doing direct measurement as well.
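The aggregation step described above, rolling cookie-level profiles up into a publisher-level report, can be sketched in a few lines. This is only an illustration under stated assumptions: the data shapes, trait names, and the `publisher_profile` function are hypothetical, and simple averaging stands in for whatever modeling Quantcast actually used.

```python
from collections import defaultdict

def publisher_profile(visitor_cookies, cookie_profiles):
    """Average cookie-level trait probabilities over one publisher's visitors.

    visitor_cookies: iterable of cookie IDs seen on the publisher's site.
    cookie_profiles: dict of cookie ID -> {trait: probability}, built from
        that cookie's activity across the whole network of sites.
    Assumes every profile carries the same set of traits. Cookies with no
    profile yet are skipped.
    """
    totals = defaultdict(float)
    profiled = 0
    for cookie_id in visitor_cookies:
        profile = cookie_profiles.get(cookie_id)
        if profile is None:
            continue
        profiled += 1
        for trait, probability in profile.items():
            totals[trait] += probability
    if profiled == 0:
        return {}
    return {trait: total / profiled for trait, total in totals.items()}

# Hypothetical profiles learned from many other sites.
cookie_profiles = {
    "c1": {"female": 0.8, "age_18_34": 0.6},
    "c2": {"female": 0.4, "age_18_34": 0.2},
    "c3": {"female": 0.9, "age_18_34": 0.7},
}

# A tiny site only needs its visitors to already be profiled elsewhere.
tiny_site_visitors = ["c1", "c2", "never_seen_before"]
print(publisher_profile(tiny_site_visitors, cookie_profiles))
```

The point of the sketch is that the confidence lives in `cookie_profiles`, not in the publisher's own traffic, which is why a small site can get the same quality of report as a large one.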

Suddenly, people saw the power of the cookie.

So now, you had a company with a truckload of audience data on a huge majority of the US internet population, down to the cookie level, and you had a ton of advertisers looking to get those exact same audience metrics on their pile of cookies.  What a great coincidence!  Now advertisers could use the same technology and instead of (or in addition to) dropping their own cookie on a user, they could drop Quantcast’s cookie on that user and then access the same sophisticated audience metrics that Quantcast had collected from publishers.

All of this was impressive as a new frontier in media analytics, but it really came together when Quantcast figured out how to enable ad targeting on their cookie pools.  Through a simple tag integration with an advertiser and publisher account, Quantcast could actually pass a key-value into an ad tag and target an advertiser’s ads against their cookie pool.  The best part was that advertisers didn’t have to build that cookie pool on any specific publisher in order to target against it.  Advertisers could cookie the audience on a premium site and then target that audience on another, cheaper site.  In fact, you didn’t have to build a cookie pool at all; Quantcast would just sell you a cookie pool with your choice of demographic data if you wanted.  You could even get really fancy and take a small, directly cookied audience and scale it up dramatically using a statistical technique Quantcast called look-alike modeling.  Basically, Quantcast was able to look at a small group of cookies, figure out what was similar about them, and then find other cookies in its database, not explicitly cookied by the advertiser, that shared those traits, scaling that audience into a much, much larger cookie pool.  Again, this is a standard offering with most DSPs and data exchanges today, but it was unheard of a few years ago.  Audiences were now portable, and true, explicit audience targeting was born.
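For a feel of what look-alike modeling involves, here is a deliberately simple sketch: average the feature vectors of the advertiser's seed cookies, then rank every other cookie by cosine similarity to that average. This is an assumption-laden stand-in, not Quantcast's actual model, and the feature vectors, cookie IDs, and function names are all made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def look_alikes(seed_ids, features, top_k):
    """Return the top_k non-seed cookies most similar to the seed audience."""
    seeds = set(seed_ids)
    center = centroid([features[cookie_id] for cookie_id in seed_ids])
    scored = [
        (cosine(vector, center), cookie_id)
        for cookie_id, vector in features.items()
        if cookie_id not in seeds
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [cookie_id for _, cookie_id in scored[:top_k]]

# Hypothetical per-cookie features, e.g. affinity for three content categories.
features = {
    "a": [1.0, 0.0, 1.0],
    "b": [1.0, 0.0, 0.8],
    "c": [0.0, 1.0, 0.0],
    "d": [0.9, 0.1, 1.0],
    "e": [0.0, 0.9, 0.1],
}

# The advertiser directly cookied "a" and "b"; find the closest other cookies.
print(look_alikes(["a", "b"], features, top_k=2))
```

In practice the ranked list would be cut off at whatever similarity threshold still performs, which is the trade-off between scale and accuracy that any look-alike product has to make.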

To be fair, it’s not like Quantcast held a gun to publishers’ heads; the publishers readily volunteered access to their data – gave it away, even!  They got access to new analytics and pushed media management from a panel-based system to a data-driven model.  Small publishers in particular, which comScore and Nielsen wouldn’t have bothered with, got a big helping hand from Quantcast’s trusted, 3rd party metrics when trying to sell their sites.  And, if publishers wanted, Quantcast was more than happy to help them productize their inventory by demographic characteristics to sell directly to advertisers.  I’m not sure the digital publishing community ever really got on board with that concept, but the ad networks certainly ran with the idea and used it to differentiate and add value.  The point is that Quantcast isn’t a bad actor in the industry; rather, they are an innovator, trusted by plenty of major media companies on the buy and sell side.  But they had a foundational role in the mechanism that potentially puts a great many publishers at risk of commoditizing their audience.  Publishers have to start paying attention to the potential risks.

Quantcast’s innovations in the media measurement and data management space forever changed the value of data by making it actionable, and would soon spawn a number of competitors that sought to do the same thing.  Eventually, this data management space would collide with the ad network and ad exchange space and throw a bucket of gasoline on the whole issue of data leakage.

Read Next – Understanding the Costs of Data Leakage


  1. Ben,

    As always, very informative content, and thank you for continuing to educate us. I was reading through How DSPs, SSPs, and Ad Exchanges Work, and my question is: where does a behavioral targeting engine fit in that diagram? How does it interact with an ad exchange’s system modules? How do cookies get passed from the ad exchange to the BT engine? Maybe a diagram would help.

    Thank You,


  2. Hi Lisa,

    It depends on what you mean by a behavioral targeting engine. I assume you are referring to a buying platform like a DSP, in which case I would encourage you to look at this other post (it looks like you found it) called Diagramming the SSP, DSP, and RTB Redirect Path for a visual of where each piece sits.

    If you meant a data exchange or data management platform that is really just in the business of profiling audiences and building behavioral cookie pools (vs. targeting them on an exchange), then I would say that system would also sit in parallel to the DSP in the diagram on the aforementioned post. Essentially, you would have an integration between the DMP and the DSP and sync those cookies together so the DSP could effectively identify the users you wanted to reach. For more on cookie syncing and how that works, you may be interested to read this post: SSP to DSP Cookie-Synching. The process is basically the same for a DMP as for an SSP, although since a DMP to DSP relationship is one you can entirely control, you can probably do a server to server integration, depending on who you work with, to eliminate the latency and data loss that happen in the redirect sync method.

    Hope that helps. If you’re still unclear, write me back with some more details and I’ll try to assist however I can.


  3. Ben,

    You are ahead of your time. Incredibly insightful command of the space and how the pieces fit together. I hope your current employer understands the asset they have!

  4. Hi Ben,

    Have you heard of a company called BTBuckets? They are offering free behavioural targeting tracking.

    Are they trying to do what Quantcast did?

    Could publishers taking their free offer potentially be subject to data leakage?


  5. Hi Ken,

    I hadn’t heard of BTBuckets before you mentioned it, but after checking them out, it sounds like they plan to have a hybrid business model between a Quantcast and a DMP that has a data selling business, like Lotame or BlueKai, for example. On their About page, they say “Our business model will be based on large sites (with over 5 million requests per month) and professional services (such as advanced support).” To me, that means they probably have some type of audience profiling technology that they need to feed with data, and they plan to get a bunch of small sites to sign up for their service for free in order to get enough data to make the platform useful to a company that can actually pay for it. A Quantcast casts the same type of wide net but just provides analytics; they don’t let anyone actually use that behavioral data unless they pay for it. Lotame and BlueKai also cast a wide net to power their own data products, have internal technology to profile those users into behavioral buckets, and then sell that data to pretty much anyone who wants to pay for it through the ad exchanges, DSPs, etc. The main difference from BTBuckets is that, like Quantcast, they aren’t giving it away for free.

    I don’t know enough about BTBuckets to say much more about them; the above is just my speculation as to their most likely strategy. I would assume that if you are a publisher, your data is being aggregated and profiled with lots of other small site audiences and then sold as targetable segments – so yes, it sounds like a potential source of data leakage to me, but it’s really up to you whether you think that’s a meaningful risk to your business. If you get to take advantage of the collective information in the platform for free, you might think of it more like a data co-op than data leakage, and it might not be such a bad deal. I would read their contract closely and reach out to them with any questions. It looks like a small company, so I would think they would get back to you pretty quickly.

    Best of luck –

