The Cost of Data Leakage

If you are a publisher that depends on advertising dollars to fund your operations, data leakage is a critical threat to your bottom line.  If you remember nothing else from this post, remember this – data means audience, and audiences are what advertisers pay to reach.  If advertisers can reach those audiences without buying expensive content adjacency, they will.

Reaching a specific audience used to be hard.  Really hard.  That’s not to say you couldn’t buy it – any number of vertical publishers were happy to sell you millions of impressions if you wanted, but you needed deep pockets, and what advertisers want most, after reaching a target audience, is to scale it to the hilt for the lowest possible cost.  Anyone who doubts that need only look at the meteoric rise of ad networks and programmatic exchange buying, which has rocketed to a double-digit share of display industry spend in just over a year.  Cost is a major factor driving that growth.

That’s not to say expensive sponsorships and content adjacency are stupid or a waste of money – far from it.  Content adjacency is usually a proxy for an audience, reached at scale in an operationally efficient manner and in the right frame of mind to drive brand awareness and brand recall.  Splashy sponsorships and content adjacency are what we call top-of-the-funnel strategies, and they are expensive because it is incredibly difficult to attract a large audience looking to research a certain brand of car, an HDTV, or their 401(k) allocation.  Vertical sites can charge a premium because it is not easy to build a deep, engaged, and reliably large audience.  Advertisers are very aware of this.

By allowing advertisers to cookie users via pixel fires out of an ad tag, publishers are enabling their clients and ad network partners to cut publishers out of the value chain.  If an advertiser can build a cookie pool on a publisher’s audience, it can readily retarget that audience at a much lower cost on the ad exchanges by using either a DSP or an ad network.  From the advertiser perspective this is a great way to extend reach, lower costs, and drive ROI.  The benefits are so great that it would seem absurd not to try, as if the publisher had simply left an unattended briefcase full of money outside the agency’s door.  Publishers without a way to secure their data are pretty much asking to have their audience siphoned away from them.

A cookie pool on the loose has a number of negative impacts.  First, it erodes the value of the publisher’s audience by allowing advertisers to access it through cheaper channels.  Publishers make enormous investments in technology and quality editorial to attract their audiences, and those audiences eventually become a competitive edge.  There is a long list of vertical publishers that have cornered the market in their chosen topic over years of hard work, and a marketer willing to pay premium CPMs to reach that audience is the reward.  If the advertiser doesn’t need the publisher to reach that audience any longer, that audience is suddenly worth less.  The audience is everywhere, on thousands of sites.  It is no longer Publisher X’s Unique Audience; it is CookiePool123, a commodity.

Finally, from a technical perspective, data leakage can exact a huge cost on your site’s user experience through page latency.  All those third-party ad pixels take time to execute, and in many cases they may not work through an iframe tag, meaning they must finish before the page content can continue to load.  At 20 ms or more for each call, on top of the time it takes for your ads to load, it doesn’t take much to make a site feel sluggish to users.  Anyone will tell you slow pages degrade almost every major site metric, not to mention they can hurt your SEO rankings.  Chew on that for a bit!
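To make the latency arithmetic concrete, here is a minimal Python sketch, assuming every pixel call blocks and runs strictly one after another at the 20 ms floor mentioned above.  The function name `added_page_delay_ms` and the pixel counts are hypothetical, and real call times vary widely.

```python
def added_page_delay_ms(pixel_count: int, ms_per_call: float = 20.0,
                        ad_load_ms: float = 0.0) -> float:
    """Rough extra page delay from synchronous (blocking) third-party pixels.

    Assumes each pixel call must finish before the next call, and the rest
    of the page, can proceed, so the per-call delays simply add up.
    """
    if pixel_count < 0:
        raise ValueError("pixel_count must be non-negative")
    return pixel_count * ms_per_call + ad_load_ms


# Hypothetical pixel counts; 20 ms is the per-call floor cited in the text.
for pixels in (1, 5, 10, 20):
    print(f"{pixels:>2} blocking pixels -> at least {added_page_delay_ms(pixels):.0f} ms")
```

Under those assumptions, ten blocking pixels add at least 200 ms before the ads themselves even load; calls that run asynchronously would not stack up this way.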

So what can a publisher do?  Read Next – Managing Data Leakage and find out.

Audience Analytics Lights the Data Leakage Fuse

As data collection started to take off on the advertiser side, companies like comScore and Nielsen were simultaneously trying to do more and more to build a story around demographic behaviors online, which is a huge challenge because the internet is so much more fragmented than traditional media.  The standard model for those traditional media measurement companies is to set up a small panel, reach statistical significance for a few major publishers, and extrapolate the results into official ratings.  The smaller the panel, the wider the margin of error, but typically they can get within a few percentage points with just a few hundred people.  Sounds great, but on the internet there isn’t a handful of major networks with distribution to every home in America; there are a hundred million destinations with a wide range of audience sizes, some quite minuscule, so a small panel simply can’t sample them all.  As a result, the panel approach doesn’t work all that well, as evidenced by the typically huge discrepancy between the number of monthly unique visitors publishers report and what comScore might tell you.
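The panel problem can be sketched with the textbook margin-of-error formula for a simple random sample, z·√(p(1−p)/n).  This is a simplification, assuming a pure random sample with no weighting (real panels are more sophisticated), and the helper names and example site reach are hypothetical.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Margin of error (as a fraction) for a simple random sample of size n.

    Uses the normal approximation z * sqrt(p * (1 - p) / n); p = 0.5 is the
    worst case and z = 1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)


def expected_panelists(panel_size: int, site_reach: float) -> float:
    """Expected number of panel members who visit a site with the given reach.

    site_reach is the (hypothetical) fraction of the population visiting the site.
    """
    return panel_size * site_reach


# Percentage points of error for a hypothetical 500-person panel.
print(round(margin_of_error(500) * 100, 1))
# Expected panelists for a 1,000-person panel and a site reaching 0.01% of users.
print(expected_panelists(1_000, 0.0001))
```

Under those assumptions, a 500-person panel is good to roughly ±4.4 points at 95% confidence, yet a 1,000-person panel would be expected to contain only about a tenth of one visitor to a site that reaches 0.01% of users, which is why small sites fall through the cracks.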

Then, in 2006, everything changed when a few really smart guys founded a visionary company called Quantcast.  Quantcast’s history is critical to understanding the data leakage story because they were the first company to really get people thinking about the value of audience data and then make that audience data actionable.  While they didn’t invent the mechanism to build an online audience, they were the first to figure out how to build a system that could algorithmically tie demographic information to a specific cookie, keep the data current, and scale that service to hundreds of millions of cookies.  They accomplished all this by directly measuring audiences instead of using a small panel of users.  This is a standard methodology today – every data exchange currently on the market (BlueKai, AudienceScience, eXelate, and others) relies on having a redirect to its cookie sitting on thousands of sites to help build a cookie pool it can profile, but no one was doing it in 2006.

Quantcast was able to pull off direct measurement of audiences on sites they didn’t own through a unique data-sharing proposition to publishers – put a pixel on your site, allow Quantcast to cookie your users, and they would give you demographic analytics on your site audience – free.  Or rather, free in exchange for surrendering your rights to any data Quantcast could collect.  Publishers big and small signed up in droves, and in a short while Quantcast was measuring tens of millions of people on tens of thousands of sites.  This mountain of data allowed them to do really sophisticated audience modeling and infer demographic and psychographic characteristics at a cookie level.  After building a unique audience profile on each cookie, they could aggregate that data for the unique cookies on any given publisher and report an accurate demographic profile for that publisher.  It was the reverse of traditional media measurement – publishers contributed thousands of data points on each user, rather than a few users contributing data points on each publisher, and this solved the fragmentation problem.  By building confidence at the cookie level, Quantcast could simply repurpose the data they already had on a cookie for whatever group of cookies they saw on a tiny publisher.  You could have a site with thousands of visitors a month instead of tens of millions and still get the same extremely accurate demographic reporting.  It was pretty slick stuff, and so effective that it eventually forced comScore and Nielsen to start doing direct measurement as well.
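The cookie-level aggregation described above might look something like the following minimal Python sketch.  It assumes each cookie already carries modeled probabilities for a few demographic buckets; the profiles, names, and simple averaging are hypothetical illustrations, not Quantcast’s actual model.

```python
from collections import defaultdict

# Hypothetical cookie-level profiles: each value is the modeled probability
# that the person behind the cookie falls into that demographic bucket.
COOKIE_PROFILES = {
    "cookie_a": {"female": 0.9, "age_25_34": 0.7},
    "cookie_b": {"female": 0.2, "age_25_34": 0.4},
    "cookie_c": {"female": 0.6, "age_25_34": 0.1},
}


def publisher_profile(seen_cookies, profiles):
    """Average the profiles of the unique, already-profiled cookies seen on one site.

    Repeat visits by the same cookie count once; cookies with no profile are skipped.
    An attribute missing from a profile contributes 0 to that attribute's average.
    """
    unique = sorted({c for c in seen_cookies if c in profiles})
    if not unique:
        return {}
    totals = defaultdict(float)
    for cookie in unique:
        for attribute, probability in profiles[cookie].items():
            totals[attribute] += probability
    return {attribute: total / len(unique) for attribute, total in totals.items()}


# A tiny hypothetical site that saw cookie_a twice, cookie_b once, and one unprofiled cookie.
print(publisher_profile(["cookie_a", "cookie_a", "cookie_b", "cookie_zzz"], COOKIE_PROFILES))
```

Because the modeling happens at the cookie level, the site itself needs no panel members at all, just cookies that have already been profiled elsewhere.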

Suddenly, people saw the power of the cookie.

So now, you had a company with a truckload of audience data on a huge majority of the US internet population, down to the cookie level, and you had a ton of advertisers looking to get those exact same audience metrics on their pile of cookies.  What a great coincidence!  Now advertisers could use the same technology and instead of (or in addition to) dropping their own cookie on a user, they could drop Quantcast’s cookie on that user and then access the same sophisticated audience metrics that Quantcast had collected from publishers.

Impressive as this was as a new frontier in media analytics, it all really came together when Quantcast figured out how to enable ad targeting on their cookie pools.  Through a simple tag integration with advertiser and publisher accounts, Quantcast could pass a key-value into an ad tag and target an advertiser’s ads against their cookie pool.  True, explicit audience targeting was born!  The best part was that advertisers didn’t have to build that cookie pool on any specific publisher in order to target against it.  Advertisers could cookie the audience on a premium site and then target that audience on another, cheaper site.  In fact, you didn’t have to build a cookie pool at all; Quantcast would just sell you a cookie pool with your choice of demographic data if you wanted.  You could even get really fancy and take a small, directly cookied audience and scale it up many times over through a statistical correlation against the Quantcast database, a technique Quantcast called look-alike modeling.  Basically, Quantcast was able to look at a small group of cookies, figure out what was similar about them, and then find other cookies not explicitly cookied by the advertiser to scale that audience into a much, much larger cookie pool.  Again, this is a standard offering with most DSPs and data exchanges today, but it was unheard of a few years ago.  Audiences were now portable.
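Look-alike modeling can be done many ways.  As a rough illustration, here is a minimal Python sketch of one generic approach, ranking non-seed cookies by cosine similarity to the average seed cookie; it is not Quantcast’s method, and the feature vectors and names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lookalike_audience(seed_ids, features, k):
    """Return the k non-seed cookies most similar to the average seed cookie."""
    seed = [features[cookie] for cookie in sorted(seed_ids)]
    if not seed:
        raise ValueError("need at least one seed cookie")
    # Centroid: the per-feature average across the seed cookies.
    centroid = [sum(column) / len(seed) for column in zip(*seed)]
    candidates = [cookie for cookie in features if cookie not in seed_ids]
    candidates.sort(key=lambda cookie: cosine_similarity(features[cookie], centroid), reverse=True)
    return candidates[:k]


# Hypothetical features: visited [auto reviews, luxury goods, personal finance, sports].
FEATURES = {
    "seed_1": [1, 1, 0, 0],
    "seed_2": [1, 1, 1, 0],
    "user_1": [1, 1, 0, 0],
    "user_2": [0, 0, 0, 1],
    "user_3": [1, 0, 1, 0],
}

print(lookalike_audience({"seed_1", "seed_2"}, FEATURES, k=2))  # ['user_1', 'user_3']
```

Here `user_1` behaves exactly like the seed audience and ranks first, while `user_2`, who only reads sports, is left out.  A production system would likely use far richer features and a trained model rather than a single similarity score.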

To be fair, it’s not like Quantcast held a gun to publishers’ heads; the publishers readily volunteered access to their data – gave it away, even!  They got access to new analytics and pushed media measurement from a panel-based system to a data-driven model.  Small publishers in particular, which comScore and Nielsen wouldn’t have bothered with, got a big helping hand from Quantcast’s trusted third-party metrics when trying to sell their sites.  And, if publishers wanted, Quantcast was more than happy to help them productize their inventory by demographic characteristics to sell directly to advertisers.  I’m not sure the digital publishing community ever really got on board with that concept, but the ad networks certainly ran with the idea and used it to differentiate and add value.  The point is that Quantcast isn’t a bad actor in the industry; rather, they are an innovator, trusted by plenty of major media companies on the buy and sell side.  But they played a foundational role in building the mechanism that potentially puts a great many publishers at risk of having their audiences commoditized.  Publishers have to start paying attention to the potential risks.

Quantcast’s innovations in the media measurement and data management space forever changed the value of data by making it actionable, and would soon spawn a number of competitors that sought to do the same thing.  Eventually, this data management space would collide with the ad network and ad exchange space and throw a bucket of gasoline on the whole issue of data leakage.

Read Next – Understanding the Costs of Data Leakage

A Primer on Data Leakage for Digital Publishers

In this new four-part series on data leakage, I’ll explore how data leakage snuck up on the digital publishing industry as a critical business risk, how data leakage happens, what the costs are, and how publishers can create a policy around their data to manage the risk and capitalize on the opportunity.

What is Data Leakage?

In the digital advertising world, data leakage means the unwanted or unknowing transfer of audience data from one party to another, typically from a publisher to an advertiser, although in some cases, from an advertiser to an intermediary, such as a data exchange or ad network.

That’s my attempt at a Webster’s definition, but plainly speaking, when people talk about data leakage in interactive advertising, in almost all cases they’re talking about advertisers, ad networks, and data companies dropping cookies on users through ad redirects running on a publisher’s site without that publisher knowing it or wanting it.  The thing is, advertisers have been doing that for years for benign purposes – tracking ROI, for example, to see how many users from a content buy made it to their website or conversion page.  Advertisers would drop a cookie on a user through their ad tag, and if the same cookie was recognized on a landing page at some point in the future, they could credit the visit to their ad buy, a practice the ad world calls ‘attribution’.  Measuring ROI was great, but that’s about all you could do with that cookie pool.  As an advertiser, even if you knew everyone in your cookie pool had been cookied while reading up on leasing a new Rolls-Royce, placing them in an extremely high-value and rare audience segment, what could you really do with that pile of cookies?
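The basic attribution mechanism, matching the cookie dropped by the ad tag against the same cookie on a landing page, can be sketched in Python as follows.  The 30-day lookback window, the data layout, and the names are hypothetical assumptions for illustration.

```python
from datetime import datetime, timedelta


def attributed_conversions(impressions, conversions, lookback=timedelta(days=30)):
    """Count conversions whose cookie saw an ad within the lookback window beforehand.

    impressions and conversions are lists of (cookie_id, timestamp) pairs: the
    cookie dropped by the ad tag, and the same cookie read on the conversion page.
    The 30-day default window is a hypothetical choice.
    """
    seen = {}
    for cookie_id, shown_at in impressions:
        seen.setdefault(cookie_id, []).append(shown_at)
    count = 0
    for cookie_id, converted_at in conversions:
        if any(converted_at - lookback <= t <= converted_at for t in seen.get(cookie_id, [])):
            count += 1
    return count


day = datetime(2011, 1, 1)  # hypothetical date
impressions = [("c1", day), ("c2", day), ("c3", day)]
conversions = [
    ("c1", day + timedelta(days=3)),   # exposed 3 days earlier: credited
    ("c2", day + timedelta(days=45)),  # outside a 30-day window: not credited
    ("c9", day + timedelta(days=1)),   # never saw the ad: not credited
]
print(attributed_conversions(impressions, conversions))  # 1
```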

Nothing, that’s what.  So publishers didn’t pay much attention to the practice.  For an advertiser, though, it’s pretty easy to drop a cookie with a callback in a redirect, so dropping third-party cookies out of ad buys soon became fairly common.  After all, this is the internet – if you can measure something, why not measure it?

Gradually, though, thanks to increased innovation in the industry and the routine practice of cookie- or pixel-dropping, publishers have been caught with their pants down.  Today as an advertiser you can absolutely take action against any data you can collect or cookie pool you can build, and often those actions are in direct competition with a publisher’s sales force.  The potential impact to revenue is huge, especially as programmatic buying through ad exchanges continues to build steam.

So what happened?  How did the cookie go from a background distraction to a covert business liability?  In the next post, I’ll review a brief history of data collection online and explain how data leakage went mainstream.

Read Next – Audience Analytics Lights the Data Leakage Fuse

History of the Ad Exchange Landscape Part IV: The Ad Network Is Dead, Long Live the Ad Exchange


I’ve written a few times about Ad Exchanges on this site, as they are one of the more exciting areas to think about in digital advertising right now.  Ad Exchanges such as Google Exchange, Right Media Exchange, or Microsoft Exchange are an elegant solution to this mess of redirects and inefficient monetization for the industry at large, and they offer the revolutionary opportunity to do true audience targeting.  Via an ad exchange, sellers can auction their inventory to the highest bidder through a single redirect, and buyers can evaluate and bid on that inventory impression by impression using rich pools of their own or third-party data.  This rich targeting gives ad exchanges a big advantage over ad networks in terms of transparency and targeting capabilities.  And since ad exchanges aggregate the inventory from thousands of publishers in the same place, exchanges also offer far more reach than any single ad network ever could.  Combine that with the fact that the exchange is also highly transparent on pricing and doesn’t mask a markup on the media the way an ad network does, and you can see why both advertisers and publishers are pretty excited about the ad exchange opportunity.
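The per-impression auction can be sketched as follows.  The text only says inventory goes to the highest bidder; this minimal Python sketch assumes a second-price design with a publisher price floor, which is common but an assumption here, and the bidder names, floor, and $0.01 increment are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Bid:
    bidder: str
    cpm: float  # bid in dollars per thousand impressions


def run_auction(bids: List[Bid], floor_cpm: float = 0.0,
                increment: float = 0.01) -> Optional[Tuple[str, float]]:
    """Assumed second-price auction for a single impression with a publisher floor.

    The highest eligible bidder wins but pays the runner-up's bid plus one
    increment (never more than its own bid); a lone eligible bidder pays the floor.
    Returns None if no bid clears the floor.
    """
    eligible = sorted((b for b in bids if b.cpm >= floor_cpm), key=lambda b: b.cpm, reverse=True)
    if not eligible:
        return None
    winner = eligible[0]
    if len(eligible) > 1:
        price = min(winner.cpm, eligible[1].cpm + increment)
    else:
        price = floor_cpm
    return winner.bidder, round(price, 2)


# Hypothetical bids on one impression.
bids = [Bid("dsp_a", 2.50), Bid("network_b", 1.80), Bid("dsp_c", 0.40)]
print(run_auction(bids, floor_cpm=0.50))  # ('dsp_a', 1.81)
```

In this design the winner pays just enough to beat the runner-up rather than its full bid, and any bid below the floor is ignored.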

Of course, if your business is optimizing ad network spend, the exchange is actually a threat, since the exchange aims to do exactly the same thing as a network optimizer, just on a larger scale and therefore with greater efficiency.  But unlike many of the ad networks out there, the network optimizers weren’t just an office of salespeople; they had real technology, teams of developers, and knew how to build product.  So instead of trying to push publishers away from the ad exchanges, they’ve for the most part embraced ad exchanges and real-time bidding (RTB) systems, and have worked aggressively to assimilate demand from the ad exchanges into their auction and optimization algorithms.  RTB is certainly a growth area for digital advertising, but since ad networks still provide about 70–80% of the demand dollars out there, most publishers can’t walk away from that source of revenue just yet.

So where does digital advertising go from here?  Well, there are a few other issues lurking in the background where Supply Side Platforms will play a key role.  The first issue is data leakage, where advertisers seek to data-mine publisher audiences by dropping their own cookies out of ad buys, or even by using JavaScript tags to scrape the content of the pages their ads serve on.  In my experience, publishers drastically underestimate the risk of data leakage to their business, mostly because there aren’t many tools in place to help them manage the problem.  Currently there are only blunt-instrument approaches, but the SSPs are hard at work building new tools to expose the problem to publishers and help them tackle this issue head-on.  Also on the horizon is the concept of audience futures, or guaranteed RTB, which combines the targeting benefits of RTB with the placement and priority guarantees of a premium ad buy and, by the way, can bypass the ad exchange altogether.  Data Management Platforms like Demdex and RedAril have already built a model for audience profiling systems, but it’s a natural place for the Supply Side Platforms to go, because it gives publishers a new premium sales product and uses most of the same infrastructure as the ad exchange monetization plumbing the SSPs already provide.
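One blunt-instrument approach a publisher could try is scanning ad markup for calls to domains it hasn’t approved.  This minimal Python sketch, using only the standard library’s `html.parser` and `urllib.parse`, assumes the creative’s HTML is available as a string; the allowlist, domains, and function names are hypothetical.  It only sees static `img`, `script`, and `iframe` tags, so pixels injected later by JavaScript will slip past it.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ThirdPartyCallFinder(HTMLParser):
    """Collect hostnames referenced by img, script, and iframe src attributes."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag in ("img", "script", "iframe"):
            src = dict(attrs).get("src")
            # .hostname is lowercased and is None for relative URLs.
            host = urlparse(src).hostname if src else None
            if host:
                self.hosts.add(host)


def unapproved_hosts(creative_html, approved_domains):
    """Hosts in static ad markup that are not an approved domain or a subdomain of one."""
    finder = ThirdPartyCallFinder()
    finder.feed(creative_html)
    finder.close()
    return sorted(
        host for host in finder.hosts
        if not any(host == d or host.endswith("." + d) for d in approved_domains)
    )


# Hypothetical creative: a banner, a 1x1 tracking pixel, and an unknown script.
CREATIVE = (
    '<a href="https://advertiser.example/landing">'
    '<img src="https://cdn.adserver.example/banner.gif"></a>'
    '<img src="//pixel.datacollector.example/p?id=123" width="1" height="1">'
    '<script src="https://tags.unknown-dmp.example/t.js"></script>'
)
print(unapproved_hosts(CREATIVE, {"adserver.example"}))
# ['pixel.datacollector.example', 'tags.unknown-dmp.example']
```

Here the ad server’s CDN passes, while the 1x1 image and the unknown script tag are flagged for review.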

Hard to say exactly where this will all land, but it should be interesting to find out!

Read the prior sections of this series:

Part I: Rise of the Ad Networks

Part II: Network Fragmentation and the Ad Ops Problem

Part III: Network Optimizers to the Rescue (?)

History of the Ad Exchange Landscape Part III: Network Optimizers to the Rescue (?)


Network Optimizers weren’t created to stop the bleeding of publisher sales to network sales; they were built to solve the latency and user experience problem by figuring out which network should serve which impression, and to take over the implementation and management process for publishers.  They accomplished this feat with raw manpower to start, and then got their hands on some venture capital to build out technology and algorithms to automate the work and scale their operations.

As I mentioned in Part I, no one ad network wanted all the impressions a publisher could provide; they just wanted a little, or up to a certain frequency, or within a certain geography, or during a certain time of day, or with some other specific characteristic.  If you were a publisher you could set up this kind of targeting to point users with certain characteristics to certain networks, but after a certain level of scale, things started to fall apart.  It was much easier to just traffic one redirect to a network optimizer and let them figure out which network was most likely to take the impression.  Publishers would usually pull a weekly report to figure this out for themselves, but a network optimizer with an API connection to the ad network’s reporting server could pull data every few minutes and start to learn which impressions the network wanted and which ones it didn’t.  By figuring out which ad network was most willing to take the impression on the first try, rather than redirecting users through the so-called daisy chain of ad calls, network optimizers could reduce latency from the ad server, improve user experience, and increase revenue to the publisher.  Optimizers could also handle the operational hassles of managing an advertiser blocklist across multiple ad networks, enforcing ad quality guidelines to keep tobacco or alcohol advertising off a publisher’s site, and could even do the bill collecting if need be.  It was like outsourcing an entire back office of Ad Operations, Reporting, and Billing teams all at once for a small revenue share.
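The latency benefit of calling the most likely network first can be sketched as the expected number of calls down a daisy chain.  This minimal Python sketch assumes each network’s fill probability is known and independent (in practice an optimizer would estimate it from reporting data); the network names and rates are hypothetical.

```python
def expected_calls(chain):
    """Expected number of ad calls before an impression is filled or the chain runs out.

    chain is a list of (network, fill_probability) pairs in call order. Each
    network is only called if every earlier one passed the impression back.
    """
    expected = 0.0
    p_reached = 1.0  # probability the request gets this far down the chain
    for _network, p_fill in chain:
        expected += p_reached
        p_reached *= 1.0 - p_fill
    return expected


def reorder_by_fill(chain):
    """Call the network most likely to take the impression first."""
    return sorted(chain, key=lambda pair: pair[1], reverse=True)


# Hypothetical fill rates, e.g. estimated from each network's reporting data.
daisy_chain = [("network_a", 0.10), ("network_b", 0.30), ("network_c", 0.80)]
print(round(expected_calls(daisy_chain), 2))                   # 2.53
print(round(expected_calls(reorder_by_fill(daisy_chain)), 2))  # 1.34
```

Under these assumptions, calling networks in descending order of fill probability minimizes the expected number of calls, but it ignores price; a real optimizer would presumably also weigh each network’s eCPM.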

As a business strategy, this worked pretty well – major digital publishers signed contracts to manage billions of unsold impressions with network optimizers like Collective Media, Pubmatic, AdMeld, and Rubicon Project.  It was a giant step forward for publishers, but another industry force was just starting to emerge that would present publishers with a new opportunity as well as new challenges: the ad exchange.

Read More: History of the Ad Exchange Landscape Part IV: The Ad Network Is Dead, Long Live the Ad Exchange