Author: Ben Kneen

A Primer on Data Leakage for Digital Publishers

In this new four-part series on data leakage, I’ll explore how data leakage snuck up on the digital publishing industry as a critical business risk, how data leakage happens, what the costs are, and how publishers can create a policy around their data to manage the risk and capitalize on the opportunity.

What is Data Leakage?

In the digital advertising world, data leakage means the unwanted or unknowing transfer of audience data from one party to another, typically from a publisher to an advertiser, although in some cases, from an advertiser to an intermediary, such as a data exchange or ad network.

That’s my attempt at a Webster’s definition, but plainly speaking, when people talk about data leakage as it relates to interactive advertising, in almost all cases they’re talking about advertisers, ad networks, and data companies dropping cookies on users through ad redirects running on a publisher’s site without that publisher knowing it or wanting it. The thing is, advertisers have been doing that for years for benign purposes, like tracking ROI: for example, to see how many users from a content buy made it to their website or conversion page. Advertisers would drop a cookie on a user through their ad tag, and if the same cookie was recognized on a landing page at some point in the future, they could attribute value to their ad buy, what the ad world calls ‘attribution’. Measuring ROI was great, but that’s about all you could do with that cookie pool. As an advertiser, even if you knew all the people in your cookie pool were sourced while reading up on leasing a new Rolls-Royce, thus including them in an extremely high-value and rare audience segment, what could you really do with that pile of cookies?

Nothing, that’s what. So publishers didn’t pay much attention to the practice. For an advertiser, though, it’s pretty easy to drop a cookie with a callback in your redirect, so dropping third-party cookies out of ad buys became fairly common within a short while. After all, this is the internet: if you can measure something, why not measure it?

Gradually though, through the increased innovation in the industry and regular practice of cookie or pixel-dropping, publishers have been caught with their pants down.  Today as an advertiser you can absolutely take action against any data you can collect or cookie pool you can build, and often those actions are in direct competition with a publisher’s sales force.  The potential impact to revenue is huge, especially as programmatic buying through ad exchanges continues to build steam.

So what happened? How did the cookie go from a background distraction to a covert business liability? In the next post, I’ll review a brief history of data collection online and explain how data leakage made it into the mainstream.

Read Next – Audience Analytics Lights the Data Leakage Fuse

History of the Ad Exchange Landscape Part IV: The Ad Network Is Dead, Long Live the Ad Exchange

Part IV: The Ad Network Is Dead, Long Live the Ad Exchange

I’ve written a few times about Ad Exchanges on this site, as they are one of the more exciting areas to think about in digital advertising right now. Ad Exchanges such as Google Exchange, Right Media Exchange, or Microsoft Exchange are an elegant solution to this mess of redirects and inefficient monetization for the industry at large and offer the revolutionary opportunity to do true audience targeting. Via an ad exchange, sellers can auction their inventory to the highest bidder through a single redirect, and buyers can evaluate and bid on that inventory impression by impression using rich pools of their own or 3rd party data. This rich targeting gives ad exchanges a big advantage over ad networks in terms of targeting capabilities, and since ad exchanges aggregate the inventory from thousands of publishers in the same place, exchanges also offer far more reach than any single ad network ever could. Combine that with the fact that the exchange is also highly transparent in terms of pricing and doesn’t mask a markup on the media like an ad network, and you can see why both advertisers and publishers are pretty excited about the ad exchange opportunity.

Of course, if your business is optimizing ad network spend, the exchange is actually a threat, since the exchange aims to do exactly the same thing as a network optimizer, just on a larger scale and therefore with greater efficiency. But unlike many of the ad networks out there, the network optimizers weren’t just an office of sales people; they had real technology, teams of developers, and knew how to build product. So instead of trying to push publishers away from the ad exchanges, they’ve for the most part embraced ad exchanges and real-time bidding (RTB) systems, and have worked to aggressively assimilate demand from the ad exchanges into their auction and optimization algorithms. RTB is certainly a growth area for digital advertising, but since ad networks still provide about 70 to 80% of the demand dollars out there, most publishers can’t walk away from that source of revenue just yet.

So where does digital advertising go from here? Well, there are a few other issues lurking in the background where Supply Side Platforms will play a key role. The first issue is data leakage, where advertisers seek to data-mine publisher audiences through dropping their own cookies out of ad buys, or even using JavaScript tags to scrape the content of the page the ad serves on as well. In my experience, publishers drastically underestimate the risk of data leakage to their business, mostly because there aren’t many tools in place to help them manage the problem. Currently there are only blunt-instrument approaches, but the SSPs are hard at work building new tools to expose the problem to publishers and help them take on this issue head on. Also on the horizon is the concept of audience futures, or guaranteed RTB, which combines the targeting benefits of RTB with the placement and priority guarantees of a premium ad buy, and by the way, can bypass the ad exchange altogether. Data Management Platforms like Demdex and Red Aril have already built a model for audience profiling systems, but it’s such a natural place for the Supply Side Platforms to go because it provides a new premium sales product to publishers and uses most of the same infrastructure as the ad exchange monetization plumbing they already provide.

Hard to say exactly where this will all land, but it should be interesting to find out!

Read the prior sections of this series:

Part I: Rise of the Ad Networks

Part II: Network Fragmentation and the Ad Ops Problem

Part III: Network Optimizers to the Rescue (?)

History of the Ad Exchange Landscape Part III: Network Optimizers to the Rescue (?)

Part III: Network Optimizers to the Rescue (?)

Network Optimizers weren’t created to stop the bleeding of publisher sales to network sales. They were built to solve the latency and user experience problem by figuring out which network should serve which ad, and to outsource the implementation and management process for publishers. They accomplished this feat with raw manpower to start, and then got their hands on some venture capital to build out technology and algorithms to figure it out and scale their operations.

As I mentioned in Part I, no one ad network wanted all the impressions a publisher could provide; they just wanted a little, or up to a certain frequency, or within a certain geography, or during a certain time of day, or with some other specific characteristic. If you were a publisher you could set up this kind of targeting to point users with certain characteristics to certain networks, but after a certain level of scale, things started to fall apart. It was much easier to just traffic one redirect to a network optimizer, and let them figure out which network was most likely to take the impression. Usually publishers would pull a weekly report to figure this out for themselves, but a network optimizer with an API connection to the ad network’s reporting server could pull data every few minutes and start to learn which impressions the network wanted, and which ones it didn’t. By figuring out which ad network was most willing to take the impression on the first try and not keep redirecting users through the so-called daisy-chain of ad calls, network optimizers could reduce latency from the ad server, improve user experience, and increase revenue to the publisher. Optimizers could also handle the operational hassles of managing an advertiser blocklist through multiple ad networks, enforcing ad quality guidelines to keep tobacco or alcohol advertising off a publisher’s site, and could even do the bill collecting if need be. It was like outsourcing an entire back office of Ad Operations, Reporting, and Billing teams all at once for a small revenue share.
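To make the optimizer’s core decision concrete, here is a minimal sketch of ranking networks by how likely they are to take an impression. Everything in it is an assumption for illustration: the NetworkStats fields, the network names, and especially the choice to weight fill rate by CPM, which is one plausible heuristic rather than how any particular optimizer actually worked.

```python
from dataclasses import dataclass


@dataclass
class NetworkStats:
    """Hypothetical per-network numbers pulled from a reporting API."""
    name: str
    requests: int   # impressions offered to the network
    fills: int      # impressions the network actually served
    cpm: float      # payout per thousand filled impressions (assumed known)

    @property
    def fill_rate(self) -> float:
        # A network that has seen no traffic yet gets a fill rate of zero.
        return self.fills / self.requests if self.requests else 0.0

    @property
    def expected_cpm(self) -> float:
        # Assumed heuristic: payout weighted by the chance of a fill.
        return self.fill_rate * self.cpm


def order_networks(stats):
    """Return networks ordered from best first call to last resort."""
    return sorted(stats, key=lambda s: s.expected_cpm, reverse=True)


if __name__ == "__main__":
    report = [
        NetworkStats("NetworkA", requests=1000, fills=900, cpm=0.50),
        NetworkStats("NetworkB", requests=1000, fills=300, cpm=1.00),
    ]
    for network in order_networks(report):
        print(network.name, round(network.expected_cpm, 2))
```

Calling the network at the top of this list first is what keeps the user out of a long daisy-chain: the fewer defaults, the fewer redirects.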

As a business strategy, this worked pretty well – major digital publishers signed contracts to manage billions of unsold impressions with network optimizers like Collective Media, Pubmatic, AdMeld, and Rubicon Project.  It was a giant step forward for publishers, but another industry force was just starting to emerge that would present publishers with a new opportunity as well as new challenges: the ad exchange.

Read More: History of the Ad Exchange Landscape Part IV: The Ad Network Is Dead, Long Live the Ad Exchange

SSP to DSP Cookie Syncing Explained

The matching process of the SSP cookie ID to the DSP cookie ID happens through a parallel process to serving ads called cookie syncing. A cookie sync is necessary because as a standard security process, web servers of any kind can only request cookies that are set to their own domain. Since the SSP sits between the end-user and all the DSP bidders in a real-time auction however, the DSP needs a way to identify the users it is looking for.

Why Cookie Syncing is Necessary

So let’s take a simple example of a store trying to retarget their users. Let’s say that you run storeABC, and user123 drops in and adds a pair of $150 shoes to their shopping cart, but never makes it to the checkout. You want to retarget that user and serve them an ad directing them back to your store to try and close the sale. Since you work with DSP456, you have a 1×1 pixel sitting on your shopping cart page, which forces the user to call out to DSP456’s web server as they load the shopping cart page, giving DSP456 a chance to drop a cookie on user123. That cookie ID is DSPcookie789. Now, user123 is surfing around the web, and lands on a publisher’s site, which is using SSP123 to monetize their ad inventory. The publisher serves a 3rd party redirect to SSP123, which drops a cookie on user123. That cookie ID is SSPcookieXYZ. SSP123 now requests a bid from DSP456 among other bidders for the impression that SSPcookieXYZ is about to view. But wait, how does DSP456 know that SSPcookieXYZ = DSPcookie789? On this first ad, it doesn’t, so your DSP doesn’t bid on the impression. Bummer. Remember, the SSP can only read and pass its own cookie ID to bidders.

Piggyback Scripts Power Cookie Syncs

After SSP123 selects a winning bidder though, it runs one last piece of JavaScript that forces user123 to call out to a handful of regular bidders, including DSP456. In that redirect, using a query string, the SSP passes its cookie ID on user123 (SSPcookieXYZ). Now the user IS calling DSP456’s web server and the DSP can request its own cookie from user123 in a process the industry refers to as “piggybacking”. Eureka! SSPcookieXYZ also has DSPcookie789, so it’s user123! The DSP knows the SSP’s cookie ID because of the query string in the piggyback call, and it can read its own cookie ID because that user called its web server as the end destination with the piggyback call. DSP456 now writes into its database that DSPcookie789 = SSPcookieXYZ for bid requests from SSP123. The next time user123 hits a page that SSP123 is helping to monetize, DSP456 will know it is the same user that abandoned their shopping cart and can bid appropriately. The process repeats for sites using other SSPs.
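The DSP’s side of that piggyback call can be sketched in a few lines. This is a simplified illustration only: the in-memory MatchTable class, the query-string keys ssp and ssp_uid, the cookie name dsp_uid, and the dsp456.example host are all hypothetical, and a real DSP would use a persistent store and its own parameter names.

```python
from urllib.parse import parse_qs, urlparse


class MatchTable:
    """Hypothetical in-memory match table kept by the DSP."""

    def __init__(self):
        # (ssp_name, ssp_cookie_id) -> dsp_cookie_id
        self._matches = {}

    def record_piggyback(self, url, request_cookies):
        """Handle one piggyback call that reached the DSP's web server.

        The SSP's ID arrives in the query string; the DSP's own ID arrives
        in the cookies the browser sends to the DSP's domain.
        """
        query = parse_qs(urlparse(url).query)
        ssp_name = query["ssp"][0]           # assumed parameter name
        ssp_cookie_id = query["ssp_uid"][0]  # assumed parameter name
        dsp_cookie_id = request_cookies.get("dsp_uid")  # assumed cookie name
        if dsp_cookie_id is None:
            return None  # the DSP has never cookied this user; nothing to match
        self._matches[(ssp_name, ssp_cookie_id)] = dsp_cookie_id
        return dsp_cookie_id

    def lookup(self, ssp_name, ssp_cookie_id):
        """Translate an SSP cookie ID from a bid request into the DSP's ID."""
        return self._matches.get((ssp_name, ssp_cookie_id))


if __name__ == "__main__":
    table = MatchTable()
    cart_abandoners = {"DSPcookie789"}

    # First bid request: no match yet, so the DSP cannot recognize the user.
    print(table.lookup("SSP123", "SSPcookieXYZ") in cart_abandoners)

    # Piggyback call after the auction syncs the two IDs.
    table.record_piggyback(
        "https://dsp456.example/sync?ssp=SSP123&ssp_uid=SSPcookieXYZ",
        {"dsp_uid": "DSPcookie789"},
    )

    # Next bid request from SSP123: the DSP now knows this is user123.
    print(table.lookup("SSP123", "SSPcookieXYZ") in cart_abandoners)
```

Keying the table on the SSP name as well as the SSP cookie ID reflects the point that the match only applies to bid requests from that SSP; each additional SSP needs its own sync.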

It sounds complex, but all it means is that for every ad served through RTB, about 10x as many technology companies are involved in the cookie-sync process. The reason the syncing process occurs after the auction and not before it is latency and user experience. If user123 had to call 10 DSPs, wait for those DSPs to cookie their machine, write the matching SSP ID into their database, and then bid, it would dramatically slow down the entire auction process for everyone. If the cookie sync fails, on the other hand, there will be many more opportunities to try again.

Cookie Syncing Step-by-Step

Below is a simplified diagram of the cookie sync process, where user123 visits the Marketer’s website first (1), is then redirected to the Marketer’s DSP (2), calls the Marketer’s DSP (3), receives the DSP’s cookie (4) and is simultaneously redirected to various SSPs that the DSP is cookie syncing with (5). When the user calls those SSPs, the call passes DSP456’s cookie ID on that user in a key-value parameter. The process completes when DSP456’s ID is logged in the match table for each SSP, and the user receives each SSP’s cookie in return.


[Diagram: cookie syncing]

As an interesting post-script to this, the SSP might redirect the user back to the DSP if the DSP is hosting the match table instead of the SSP. This would be the same process as 4 / 5, just in the reverse order, with the SSP passing its ID to the DSP so the DSP can log both IDs in its own match table. Who hosts the match table is a commercial arrangement that DSPs and SSPs / Exchanges make with each other, with the match table host typically paid to shoulder the cost of maintaining the database.

Example Cookie Sync Scripts

If you are interested, here are the URLs that run the piggyback script for each major SSP. These pages look blank, so you’ll have to ‘view source’ to see the code.
Pubmatic: It looks as though ‘vcode’ is the item passing the cookie ID.
Rubicon Project: It looks as though ‘nid’ is the item passing the cookie ID.

History of the Ad Exchange Landscape Part II: Network Fragmentation and the Ad Ops Problem

In Part I of this series, I talked about the Rise of the Ad Networks, and how publisher ad space was commoditized by inventory aggregators known as Ad Networks.  Part II talks about the start of network fragmentation and the technical and operational challenges this caused for publishers.

Part II: Network Fragmentation and the Ad Ops Problem

As the networks fragmented, Ops teams had to add more and more redirect chains to force users through, and created a reporting nightmare for themselves. Here’s basically what happened: with one ad network, you trafficked a third-party tag to that network, and also gave the network a tag back to your site. The reason is that no ad network would pay for an unlimited number of impressions on a CPM basis, so if there was a traffic spike and the network denied, or defaulted on, the impression, it had to have a way to send the user back to the publisher ad server so the publisher could figure out something else to serve. That something else was usually another ad network tag, and the process repeated until the ad was filled. For a browser, each call to an ad server might take 20 to 50ms, which seems fast, but if the publisher had three ads on a page and the code was written in-line, meaning each ad has to finish loading before the rest of the page content can load, once you start to pass three or four ad calls per tag, the page starts to feel sluggish to a user. Keep redirecting that user and sooner or later, the ads don’t have time to finish loading before the user moves to the next page, which causes a discrepancy between the publisher and the network, not to mention a lousy experience for users on the publisher’s website. The publisher thinks an ad was served, but the network’s ad never finished loading. Ad server reports grew less accurate because the same impression could be counted multiple times as networks sent a user back to the publisher, inflating the numbers and throwing a wrench in any inventory forecasts as well. Not only that, but from an Ops perspective, the more unsold inventory there was, the more relationships were necessary with ad networks to fill the inventory. Yikes!
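As a back-of-the-envelope check on those numbers, the sketch below adds up the blocking time from the redirect chain alone. It assumes the worst case described above, where every ad call on the page runs strictly one after another, and it ignores creative load time, DNS lookups, and everything else a real page does; the function name is made up for illustration.

```python
def chain_latency_ms(ads_per_page, calls_per_ad, ms_per_call):
    """Total blocking time if every ad call runs one after another."""
    return ads_per_page * calls_per_ad * ms_per_call


if __name__ == "__main__":
    # Three in-line ads, each passed through four ad calls,
    # at the 20 to 50 ms per call quoted above.
    low = chain_latency_ms(3, 4, 20)
    high = chain_latency_ms(3, 4, 50)
    print(low, high)  # 240 600
```

So a chain of four calls per tag adds roughly a quarter to over half a second of ad-server round trips before any creative has even started to load.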

The result was complex and inefficient setups in the publisher’s ad server, with lots of redirects strung together to pass a user from ad network to ad network until one was willing to serve an impression. All this caused page latency, a lousy user experience, high ad server discrepancies, a billing nightmare, and an accelerating erosion of publisher ad sales.

In other words, it was a huge business opportunity – enter the Network Optimizer, ancestor of the Supply Side Platform.

Next – History of the Ad Exchange Landscape Part III: Network Optimizers to the Rescue (?)