Lookalike Modeling Your Ad Ops Team Can Build With a DMP

You Can Model Clickers and Converters, Too

The technique above is especially useful for optimizing campaigns that are focused on a click or online conversion metric: you track the campaign's clickers or converters as a new audience in your DMP, and then index every audience in your platform against its overlap with that clicking or converting audience. You could, for example, start every performance-based campaign as run-of-site (ROS) so that every audience is exposed to it, and then after a short period of time see which audiences respond more favorably and reliably to the campaign goal.

In an ideal world you have lots of audiences you can overlap against a target: hundreds or even thousands. You could then index all of them against your target, sort them by the index, and optimize your campaign targeting into the top choices. Which segments you pick, the highest indexing or the largest (there will rarely be an option that is both large and high quality), depends on your goals for the campaign, budget, etc. You can also exclude the lowest-indexing audiences to reduce your distribution against lower-performing audiences.
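As a rough illustration of the index-and-sort step, here is a minimal Python sketch. It uses the index as defined in this article (a segment's share of the target audience divided by its share of the total population, times 100). The `Segment` class, the function names, and all of the counts are made up for the example; only the Pet Owners ratio (7.5% among women vs. 5% overall) comes from the article.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    size: int     # users in the segment across the whole population
    overlap: int  # users in both the segment and the target audience


def index_against_target(seg, target_size, total_population):
    """Segment's share of the target divided by its share of the total, times 100.

    Written as a single fraction so the arithmetic stays exact for whole-number counts.
    """
    return 100 * seg.overlap * total_population / (target_size * seg.size)


def rank_segments(segments, target_size, total_population):
    """Return (name, index, overlap) rows sorted from highest to lowest index."""
    rows = [
        (s.name, index_against_target(s, target_size, total_population), s.overlap)
        for s in segments
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical counts: 1,000,000 users on the site, 400,000 in the target (women).
TOTAL_POPULATION = 1_000_000
TARGET_SIZE = 400_000
SEGMENTS = [
    Segment("Sports Fans", size=200_000, overlap=40_000),
    Segment("Pet Owners", size=50_000, overlap=30_000),   # 7.5% of target vs 5% of total
    Segment("Auto Intenders", size=80_000, overlap=32_000),
]

ranked = rank_segments(SEGMENTS, TARGET_SIZE, TOTAL_POPULATION)
for name, idx, overlap in ranked:
    print(f"{name:<15} index={idx:6.1f} overlap={overlap:,}")
```

Keeping the overlap count next to the index in each row makes the trade-off visible: the top of the list shows the best-indexing segments, while the overlap column shows how much scale each one actually offers.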

The risk with this technique is that the number of overlapping users is so small that you lack enough of a sample to reach a statistically significant index. In other words, you don’t have enough data to trust the lookalike. To calculate this precisely you’d need to employ a statistician; my rule of thumb, however, has been to rely on standard sample size tables that define how many users you need to sample from a given population for the result to meet a particular confidence level. You can easily build this check into Excel to compare your overlapping users in the test segment (pet owners in our case) to the target segment (women).

As you can see though, in a population of almost any size, a mere 400 users is all you need for a representative sample to meet a 95% confidence level with a ± 5% margin of error. You can use this same check when creating general lookalike audiences, but it tends to be more relevant when working with very small target segments, like users who had to take a particular action. Of course, this isn’t the most sophisticated audience modeling method out there, far from it; but for Ad Ops teams who need to play fast and loose with campaign optimization, it’s a place to start, and a great way to get more out of your investment in a DMP.
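If you would rather script the sample size check than look it up, the sketch below is one way to do it in Python. The article does not say which table it relies on, so this assumes the formula commonly behind such tables: Cochran's sample size formula with a finite population correction, using the most conservative proportion of 0.5. The function names are invented for the example.

```python
import math
from statistics import NormalDist


def required_sample_size(population, confidence=0.95, margin=0.05, proportion=0.5):
    """Cochran's formula with finite population correction (an assumed method)."""
    # Two-sided z-score for the confidence level, about 1.96 for 95%.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n0 = z ** 2 * proportion * (1 - proportion) / margin ** 2
    # Shrink it for a finite population, then round up to whole users.
    return math.ceil(n0 / (1 + (n0 - 1) / population))


def overlap_is_large_enough(overlap_users, population, **kwargs):
    """True if the overlap meets the sample size required for the population."""
    return overlap_users >= required_sample_size(population, **kwargs)


for population in (1_000, 10_000, 1_000_000):
    print(population, required_sample_size(population))

# Hypothetical check: 150 overlapping users against a 400,000-user target segment.
print(overlap_is_large_enough(150, 400_000))
```

The required sample barely moves once the population gets large, which is why a figure of roughly 400 users works as a rule of thumb almost regardless of segment size.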


  1. Hey Ben,
    Could you please explain how indexing works? I don’t really understand the multiplication with 100 method.

  2. Hi Chico,

    Indexing is just a way to create a relative metric so that you can compare two things that are different sizes from an absolute point of view. For example, in the article I mention how you might determine which country in the world has the most tall people; to do that you wouldn’t just want to count the sheer number of tall people in each country, because every country has a different population. Of course China will have more tall people than Norway, because its total population is 250 times as large. Rather, you’d want to know how many tall people there are per some standard unit, like 1000. If you knew how many tall people per thousand each country has, you could then say which country truly has more tall people on a per capita basis. And that’s what indexing is: converting any absolute figure into a relative, or per capita, figure that you can use to make accurate comparisons.

    Specifically to your question, the only reason I end up multiplying by 100 is to make the number easy to read. .03 vs. .11 is the same as 3 vs. 11, but I find 3 vs. 11 to be easier figures to work with, so I just multiply all figures by 100 to change a percentage into a whole number.

    Hope that makes sense –


  3. I just happened upon this article and I have the same question regarding your index #. Specifically, how are you arriving at the 150 index for Pet Owners? No matter how I try to slice the data I can’t get the math to work. Can you explain the equation?

  4. Hi LAQ,

    The calculation divides the concentration of Pet Owners (the test segment) in the Female segment (7.5%) by the concentration of Pet Owners in the Total Population (5%). So, 7.5 / 5 = 1.5, and then 1.5 * 100 = 150. We multiply by 100 just to make the number easier to read, and because we do it to every test segment’s result, we’re not changing the relationship between the figures, we’re just scaling them all up by a factor of 100.

    What we’re basically saying is “how concentrated are Pet Owners among women vs Pet Owners on the site in general?” If the index is higher than average (the site in general), then we know, generally speaking, Pet Owners have a higher propensity to be women than not, and we can quantify that propensity with our index.

    Hope that helps!

  5. Hey Ben,

    In test segments like Pet owners etc., how did you arrive at overlap no. (i.e. overlap no. of female in the segment)?


  6. Hi MN,

    Your DMP should be able to provide this to you through a custom report; essentially you need a matrix style report.

    Something like this: matrix overlap report

    For your data, you’d want all your segment IDs as column headers as well as row headers, with the overlapping members between any two segments as the cell value where they intersect. Another way to do it would be to create a segment that combines Pet Owners AND Women in the definition, but you’ll have to create a huge number of segments to get at every possibility. The whole point of this analysis is that you want to find the strong correlations without any prejudice and simply let the data tell you what matters.

    Hope that helps!
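If your DMP cannot produce a matrix report but can export the user IDs behind each segment, a matrix of this shape can be approximated in a few lines of Python. This is only a sketch under that assumption: the `overlap_matrix` function, the segment names, and the tiny ID sets are all made up for illustration.

```python
from itertools import combinations_with_replacement


def overlap_matrix(memberships):
    """Build a symmetric matrix of shared-user counts for every pair of segments.

    memberships maps a segment name to the set of user IDs in that segment.
    The diagonal (a segment against itself) holds the segment's own size.
    """
    names = sorted(memberships)
    matrix = {name: {} for name in names}
    for a, b in combinations_with_replacement(names, 2):
        shared = len(memberships[a] & memberships[b])
        matrix[a][b] = shared
        matrix[b][a] = shared
    return matrix


# Hypothetical export: segment name -> set of user IDs.
memberships = {
    "Women": {1, 2, 3, 4},
    "Pet Owners": {3, 4, 5},
    "Sports Fans": {1, 6},
}

matrix = overlap_matrix(memberships)
for row_name, row in matrix.items():
    print(row_name, row)
```

Because every pair is computed in one pass, there is no need to hand-build a combined segment for each possible pairing.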
