Digital publishers and advertisers that have access to a Data Management Platform (DMP) can bootstrap their own data modeling, or lookalike modeling, capabilities with some simple index-based approaches. That is to say, if you know the total population of users in every segment and, for any given target segment, how many users from every other segment overlap with it, you can build a fast and easily understood audience model with a little legwork. It’s not the rocket-science approach of a regression model or black-box algorithm, but it works, and once you figure out how to get the right data out of your system, it’s pretty easy for people without a degree in data science to execute.
How to Do Lookalike Modeling Yourself
The first step in building a lookalike segment is to define what you are trying to model, that is, which audience you want more of. This will be your ‘target’. For our example, let’s consider the following audiences:
| Segment | Qualified Users | % of Total |
|---|---|---|
Let’s say we’re trying to reach females. Unfortunately, we can only identify 20,000 of them, out of a total population of 100,000. Now let’s assume that our content isn’t skewed toward one gender or the other. In that case roughly half of our users should be female, so many of the other 80,000 users (perhaps 30,000 of them) are likely to be female too. But we need to find a signal within that group that tells us which other audiences are likely to be female.
What we need to do then is compare every other audience to our female audience, and figure out how many users of each of our other segments overlap with our female segment. To do that, we need to pull another table of data – let’s add a few more audiences while we’re at it.
| Test Segment | Total Users in Test Segment | Overlap (Number of Females in Test Segment) |
|---|---|---|
Now, since every audience has a different total population, and every overlap between two audiences is also different, we need a way to compare one overlap to another. For example, just because there are more men over 6 feet tall in China than in Norway doesn’t mean Chinese men are more likely to be over 6 feet tall than Norwegian men. To know for sure, you need the total population of each country so you can work out which country has the higher share of men over 6 feet tall relative to its population. That’s exactly what we need to do when building our lookalike segment: determine whether each audience is more or less likely to be female relative to its population.
To do that, we divide the overlap between each test segment (pet owners, coffee drinkers, etc.) and the target segment by the total population of the target segment (females). That tells us what share of the female audience belongs to each test segment, which we can then compare to the share of the overall population that belongs to the same test segment. So, dividing the overlap figures from the table above by the total number of females, we get the following:
| Test Segment | Total Users in Test Segment | Overlap | Total Females | Concentration of Test Segment in Female Segment |
|---|---|---|---|---|
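Written as a formula (the symbols are introduced here only for illustration: $F$ is the set of known females and $S$ is a test segment), the last column of the table above is:

$$
\text{Concentration of } S \text{ in the female segment} = \frac{|S \cap F|}{|F|}
$$

For instance, a hypothetical test segment that overlapped 3,000 of our 20,000 known females would have a concentration of 3,000 / 20,000 = 15% in the female segment.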
Finally, if we divide each test segment’s concentration in the female segment by its concentration in the total population, we get an index, a comparison of one relative figure to another. Multiplying each ratio by 100 makes 100 our benchmark. Any audience with an index greater than 100 is more likely to contain female users than the general population, and any audience with an index less than 100 is less likely to contain female users than the general population.
| Test Segment | Total Users in Test Segment | Overlap | Concentration of Test Segment in Total Population | Concentration of Test Segment in Female Segment | Relative Concentration of Test Segment in Female Segment (Index) |
|---|---|---|---|---|---|
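A minimal Python sketch of the index calculation, assuming you have exported each test segment’s total users and its overlap with the target from your DMP. The counts below are hypothetical placeholders rather than the article’s figures, and the class, function, and field names are illustrative only. The index is 100 × (overlap / target users) / (segment users / total users).

```python
from dataclasses import dataclass


@dataclass
class SegmentCounts:
    name: str
    users: int    # total users in the test segment
    overlap: int  # users who are in both the test segment and the target segment


def index_row(seg: SegmentCounts, target_users: int, total_users: int) -> dict:
    """Compute the index-table columns for one test segment against the target."""
    conc_total = seg.users / total_users      # concentration of test segment in total population
    conc_target = seg.overlap / target_users  # concentration of test segment in target segment
    return {
        "segment": seg.name,
        "conc_total": conc_total,
        "conc_target": conc_target,
        "index": 100 * conc_target / conc_total,  # > 100 means over-represented in the target
    }


# Hypothetical counts for illustration only; these are not the article's figures.
TOTAL_USERS = 100_000
FEMALES = 20_000
segments = [
    SegmentCounts("Pet Owners", users=10_000, overlap=3_000),
    SegmentCounts("Coffee Drinkers", users=25_000, overlap=4_000),
    SegmentCounts("Sports Fans", users=5_000, overlap=1_200),
]

rows = [index_row(s, FEMALES, TOTAL_USERS) for s in segments]
for r in sorted(rows, key=lambda r: r["index"], reverse=True):
    print(f'{r["segment"]:<16}{r["conc_total"]:>8.1%}{r["conc_target"]:>8.1%}{r["index"]:>8.1f}')
```

With these made-up numbers, pet owners index at 150, sports fans at 120, and coffee drinkers at 80. The same index can also be read the other way around, as 100 × (overlap / segment users) / (target users / total users): the share of the test segment that is female relative to the share of the whole population that is female. The two forms are algebraically identical.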
So, with the data above, if you wanted to model an audience of users who are likely to be women but are not known to be women, you could build a segment of pet owners or sports fans, excluding coffee drinkers, and know those users were more likely than the general population to be women. In Boolean logic, it would be (pet owners OR sports fans) NOT coffee drinkers. After you create the new compound audience, you can see how it indexes once the overlapping users are de-duplicated into a single segment, and then refine as necessary.
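The compound segment can be sketched with Python sets standing in for DMP segments of user IDs. This is a toy example with made-up IDs and a hypothetical `index_of` helper; a real DMP would perform the union, exclusion, and de-duplication itself. The index here uses the equivalent form (share of the segment that is in the target, divided by the target’s share of the population).

```python
def index_of(segment: set, target: set, population: set) -> float:
    """Index of a segment against a target, relative to the whole population.

    Uses the equivalent form 100 * (share of segment in target) / (target share of population).
    """
    if not segment:
        raise ValueError("segment is empty")
    segment_rate = len(segment & target) / len(segment)
    base_rate = len(target) / len(population)
    return 100 * segment_rate / base_rate


# Toy user IDs, made up for illustration.
population = set(range(20))
females = {0, 1, 2, 3}  # known target users: 20% of the population
pet_owners = {0, 1, 5, 6}
sports_fans = {1, 2, 7, 8}
coffee_drinkers = {2, 5, 9, 10, 11, 12, 13, 14, 15, 16}

# (pet owners OR sports fans) NOT coffee drinkers.
# The set union de-duplicates user 1, who is both a pet owner and a sports fan.
compound = (pet_owners | sports_fans) - coffee_drinkers

for name, seg in [("Pet Owners", pet_owners), ("Sports Fans", sports_fans),
                  ("Coffee Drinkers", coffee_drinkers), ("Compound", compound)]:
    print(f"{name:<16}{len(seg):>4}{index_of(seg, females, population):>8.1f}")
```

With these toy numbers, pet owners and sports fans each index at 250 and coffee drinkers at 50, but the de-duplicated compound segment indexes at 200, partly because excluding coffee drinkers also dropped a known female (user 2). That is the kind of shift worth checking before you refine. A 20-user population is far too small to mean anything; it only illustrates the mechanics.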
You Can Model Clickers and Converters, Too
The technique above is especially useful for optimizing campaigns that are focused on a click or online conversion metric. You simply track the campaign’s clickers or converters as a new audience in your DMP, and then index every audience in your platform by its overlap with the clicking or converting audience. You could, for example, start running every performance-based campaign run-of-site (ROS) to expose every audience to the campaign, and then, after a short period, figure out which audiences are responding more favorably and reliably to the campaign goal.
In an ideal world, you have lots of audiences you can overlap against a target: hundreds or even thousands. You could then index all of them against your target, sort them by index, and optimize your campaign targeting toward the top choices. Whether you pick the highest-indexing segments or the ones with the most scale (there will rarely be an option that is both large and high quality) depends on your goals for the campaign, your budget, and so on. You can also exclude the lowest-indexing audiences to reduce your distribution against lower-performing audiences.
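A sketch of the rank-and-select step, assuming the clickers or converters have been captured as their own audience and that you have exported, for each audience, its size and its overlap with that audience. The function names, the counts, and the include/exclude thresholds (110 and 90) are all hypothetical; in practice you would weigh index against scale as described above.

```python
from typing import Dict, List, Tuple


def rank_segments(
    sizes: Dict[str, int],     # users in each audience
    overlaps: Dict[str, int],  # users in each audience who also clicked or converted
    target_users: int,         # users in the clicker/converter audience
    total_users: int,          # total population
) -> List[Tuple[str, float, int]]:
    """Return (segment, index, size) tuples sorted from highest to lowest index."""
    ranked = []
    for name, size in sizes.items():
        index = 100 * (overlaps.get(name, 0) / target_users) / (size / total_users)
        ranked.append((name, index, size))
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked


def split_targeting(ranked, include_min: float, exclude_max: float):
    """Segments to target (index >= include_min) and to exclude (index <= exclude_max)."""
    include = [name for name, idx, _ in ranked if idx >= include_min]
    exclude = [name for name, idx, _ in ranked if idx <= exclude_max]
    return include, exclude


# Hypothetical export: 10,000 clickers out of 100,000 users.
sizes = {"Pet Owners": 10_000, "Sports Fans": 5_000, "Coffee Drinkers": 25_000}
overlaps = {"Pet Owners": 2_000, "Sports Fans": 600, "Coffee Drinkers": 1_500}

ranked = rank_segments(sizes, overlaps, target_users=10_000, total_users=100_000)
include, exclude = split_targeting(ranked, include_min=110, exclude_max=90)
for name, idx, size in ranked:
    print(f"{name:<16}{idx:>8.1f}{size:>8,}")
print("target:", include, "exclude:", exclude)
```

Here pet owners (index 200) and sports fans (about 120) would be targeted and coffee drinkers (about 60) excluded. Each tuple keeps the segment size so you can trade index against scale before committing.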
The risk with this technique is that the number of overlapping users may be so small that you lack a large enough sample to produce a statistically significant index. In other words, you don’t have enough data to trust the lookalike. To calculate this precisely you’d need to employ a statistician; however, my rule of thumb has been to rely on standard sample size tables, which spell out how many users you need to sample from a given population for the result to meet a particular confidence level. You can easily build this check into Excel to compare your overlapping users in the test segment (pet owners, in our case) to the target segment (women).
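The same check can be scripted outside Excel. This sketch swaps in Cochran’s sample-size formula with a finite population correction, one common basis for standard sample size tables. The function names are hypothetical, and which population you check the overlap against (the test segment or the target segment) is left as a parameter rather than prescribed.

```python
import math


def required_sample_size(population: int, z: float = 1.96,
                         margin: float = 0.05, p: float = 0.5) -> int:
    """Cochran's sample size with a finite population correction.

    z=1.96 corresponds to a 95% confidence level, margin=0.05 to a +/-5% margin
    of error, and p=0.5 is the most conservative assumed proportion.
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2  # sample size for an infinite population
    n = n0 / (1 + (n0 - 1) / population)     # finite population correction
    return math.ceil(n)


def overlap_is_sufficient(overlap_users: int, population: int, **kwargs) -> bool:
    """Hypothetical helper: True if the overlap meets the required sample size."""
    return overlap_users >= required_sample_size(population, **kwargs)


for pop in (1_000, 20_000, 1_000_000):
    print(f"{pop:>10,}: {required_sample_size(pop)} users needed")
print(overlap_is_sufficient(3_000, 10_000))
```

For a population of 20,000 this returns 377, and for 1,000,000 it returns 385. The requirement levels off in the high 300s no matter how large the population gets, which is consistent with the roughly 400-user rule of thumb.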
In a population of almost any size, a mere 400 users is all you need for a representative sample that meets a 95% confidence level with a ±5% margin of error. You can use this same check when creating general lookalike audiences, but it tends to be more relevant when working with very small target segments, like users who took a particular action. Of course, this isn’t the most sophisticated audience modeling method out there, far from it, but for Ad Ops teams who need to move fast on campaign optimization, it’s a place to start, and a great way to get more out of your investment in a DMP.