I'll come right out and say it: Many demand-side platforms (DSPs) and ad networks that claim to use "sophisticated algorithmic optimization" to determine the value of impressions actually do no such thing. Until now, they've been able to throw around these fancy words in their marketing presentations, while masquerading behind the opaque veil of so-called "black-box IP," when in fact the box was empty. Hopefully, that's about to change.
To be clear, what I mean by "sophisticated algorithmic optimization" is a robust machine-learning modeling approach that makes a prediction of the value of each impression based on myriad media and user variables associated with that impression. "Always bid $X if an impression meets certain targeting criteria" is not an algorithm, and neither is a human exporting data to an Excel spreadsheet to sort performance by a few variables.
It's easy to understand that bidding the right price for a real-time bidding (RTB) impression is of critical importance. Bid too high, and you risk paying too much, failing to meet the advertiser's performance goals. Bid too low, and you risk not winning the inventory at all, under-pacing against the advertiser's spending goals. But it may be less obvious why it's so important to implement an algorithmic machine-learning approach. Consider the following simple formula:
CPM bid = Goal Value x Action Rate x 1,000
It says the cost per thousand (CPM) you should bid to achieve the desired campaign goal is equal to the advertiser's stated goal value multiplied by the predicted "action rate." If the advertiser goal is oriented around engagement, such as a cost-per-click (CPC), then the action rate is a click-through rate; if the advertiser goal is direct-response oriented, such as a cost-per-action (CPA), then the action rate is a response rate. The objective of a machine-learning algorithm is to accurately predict the action rate for every impression, based on all the associated data. But here's the challenge:
- The data is enormous -- There are now tens of billions of RTB impressions available daily. Each one of those carries dozens, if not hundreds, of non-personally identifiable data points describing both the media (e.g., publisher, content category, ad size, time of day, day of week, etc.) and the user (including first-party advertiser data, third-party data from various data providers, and non-cookie-based IP data). Moreover, each of those variables can take on many different values. For example, "designated market area (DMA)" and "site" are two variables that come on each impression -- there are hundreds of possible DMAs and hundreds of thousands of possible sites. On top of that is the complication that those variables aren't all the same across all the exchanges (e.g., one exchange may pass content category data for its sites, while another may not). Only an algorithmic approach can digest the sheer amount of data in question and properly "normalize" data across different RTB sources.
- The data is dynamic -- If you've ever purchased RTB inventory to achieve a campaign objective, you know (frustratingly so) that the same things that work one day don't necessarily work the next. And that's not just due to fluctuating consumer behaviors, week-in/week-out variations in product sales, and factors like creative burnout. It's also due to the highly variable interaction of supply and demand on the exchanges. For example, RTB supply spontaneously moves into and out of the exchanges inversely with publisher guaranteed sales. And don't forget about competing demand: The same impressions may be valued very differently by a tax advertiser on April 10 vs. May 10. The variation in each advertiser's demand has an effect on all bidders on the exchanges. This ever-changing environment means you don't just need to come up with an answer once, but rather every day, for every advertiser, and for every creative that advertiser may be running. That's something only an algorithmic approach can do at this scale.
- The data is non-linear -- In plain English, that means you can't just look at individual variables in isolation; you need to look at how they interact. For example, one might conclude from a simple spreadsheet analysis that "Mondays are good for this campaign" in general, when in fact Mondays are bad for that campaign on certain publishers. But those publishers do perform well on Mondays from 1-3 p.m. But not in certain ad sizes. And so on. In fact, failing to account for interactions between variables is often why buyers find it hard to reproduce results: They're looking at a one-dimensional view of the data and failing to glean the critical relationships between variables. No human, no matter how bright or how much time they have, can begin to scratch the surface of the mathematical complexity of all these inter-relationships.
- The data is real-time -- The required round-trip time between an RTB bid request going out and a bid response being received is typically about 0.05 to 0.1 seconds. If you don't respond within that interval, you "time out" and lose the impression. So needless to say, the algorithm in question has to be fast, which has implications for both the modeling and coding approach. That in turn speaks to the underlying bidding infrastructure on which the algorithm sits -- the queries per second (QPS) the DSP technology can support and the timeout rate that measures how often it fails to respond within the necessary window. So it's not just the algorithm that has to be fast, but the entire system within which it resides.
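Setting those challenges aside for a moment, the bid formula itself is trivial to express in code; the hard part is producing the action-rate prediction it consumes. A minimal sketch, with all numbers hypothetical and for illustration only:

```python
def cpm_bid(goal_value: float, predicted_action_rate: float) -> float:
    """CPM bid = goal value x predicted action rate x 1,000.

    goal_value: the advertiser's target cost per click or action, in dollars.
    predicted_action_rate: the model's per-impression estimate of the
    click-through rate (for CPC goals) or response rate (for CPA goals).
    """
    return goal_value * predicted_action_rate * 1000

# Hypothetical CPC goal of $1.50 with a predicted 0.2% click-through rate:
print(cpm_bid(1.50, 0.002))    # -> 3.0, i.e., bid a $3.00 CPM

# Hypothetical CPA goal of $25.00 with a predicted 0.01% response rate:
print(cpm_bid(25.00, 0.0001))  # -> 2.5, i.e., bid a $2.50 CPM
```

Everything interesting lives in `predicted_action_rate` -- estimating it per impression, from hundreds of variables, in real time, is the whole game.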
The above challenges make building an algorithm for RTB a uniquely complex problem. In fact, I would argue there are no "off the shelf" techniques well-suited to the particular needs of RTB. Not only do you need a custom algorithm that can handle billions of daily impressions, with possibly hundreds of variables per impression, and thousands of values per variable, but it will also need to normalize across disparate data sources, dynamically adjust to new, rapidly varying data on a daily basis, and account for nonlinear relationships between variables. And above all that, your RTB algorithm needs to provide transparent insight into what it's doing.
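To make the scale and non-linearity points concrete, here is a minimal sketch of one standard technique from the machine-learning literature -- an illustrative assumption on my part, not any particular DSP's proprietary method: online logistic regression over hashed categorical features, with explicit cross-features so even a linear model can learn interactions like publisher x day-of-week. The hash dimension, variable names, and learning rate are all hypothetical.

```python
import math

HASH_DIM = 2 ** 18  # hashed feature space; absorbs "thousands of values per variable"

def features(imp: dict) -> list[int]:
    """Hash each raw categorical variable -- plus pairwise crosses -- to an index.

    Crosses like publisher x day-of-week are what let the model capture the
    non-linear interactions described above ("Mondays, but only on these
    publishers, only 1-3 p.m., only in these sizes").
    """
    idx = [hash(f"{k}={v}") % HASH_DIM for k, v in imp.items()]
    keys = sorted(imp)
    for i in range(len(keys)):
        for j in range(i + 1, len(keys)):
            a, b = keys[i], keys[j]
            idx.append(hash(f"{a}={imp[a]}&{b}={imp[b]}") % HASH_DIM)
    return idx

class CtrModel:
    """Online logistic regression: one SGD update per observed impression."""

    def __init__(self, lr: float = 0.1):
        self.w = [0.0] * HASH_DIM
        self.lr = lr

    def predict(self, imp: dict) -> float:
        z = sum(self.w[i] for i in features(imp))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, imp: dict, clicked: bool) -> None:
        # Gradient step on log loss for a single impression outcome.
        g = self.predict(imp) - (1.0 if clicked else 0.0)
        for i in features(imp):
            self.w[i] -= self.lr * g
```

Feature hashing sidesteps enumerating hundreds of thousands of possible values per variable, and the one-update-per-impression loop is what lets the model track day-to-day shifts in supply and demand rather than being fit once and left to rot.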
An algorithm can generate unprecedented performance to the delight of the buyer, but sooner or later, they will rightfully demand to know why. Which variables are the most important and least important? How does that translate into bid prices for different types of impressions? The days of opaque ad networks are gone. Performance is necessary, but not sufficient. Because today's buyers -- whether advertisers, trading desks, or operating agencies -- aren't just buying performance, but the ability to understand and replicate that performance, to transfer media and audience learnings across channels, and upstream into campaign strategy. When these savvy buyers use a DSP, they know they have the right to understand exactly how the platform values each impression, and why. Sure, the claimed algorithm itself is proprietary IP, but the output is not. In fact, that information -- the algorithmic determination of how much each impression is worth -- is really the property of the buyer. They've paid for it!
So what do you do if you are one of the many DSPs who don't actually have such an algorithm? You claim you do, and you fake it. How? If you talk to the leading RTB exchanges, it seems most DSPs are doing one of two things: Either they try to place flat bids for specific cookie-defined audiences in the hopes that these audiences will meet the advertisers' goals, or they blast out low bids for impressions in the hopes that they can buy enough cheap tonnage to make it work. But neither approach works.
In the case of pure audience buying, we all know first-party data (i.e., retargeting) works great, but its scale is extremely limited, a problem further exacerbated by cookie deletion. If you layer on third-party audience buying, you still face scale issues due to the limited size of most cookie pools, plus the fact that third-party data segments often just don't perform as needed, especially when factoring in the cost of the data. Moreover, even if you do find audiences that work, you still have little or no guidance on what price to bid based on all the other variables that come into play. Ironically, by narrowly defining the buy through the lens of cookie pools available on only a fraction of impressions, audience buying foregoes the far more predictive non-cookie-based variables available on 100 percent of impressions. Not to mention attracting the eye of Capitol Hill.
In the case of spraying low bids in the hopes of winning lots of cheap impressions, well, you get what you pay for. Remember the CPM bid formula above? If you use a machine-learning algorithm, you know you're looking for impressions with a high predicted action rate. For a given goal value, those translate into a high CPM bid, not a low one. In other words, the impressions that get high bids are the ones that work; the impressions that get low bids (because they have low action rates) are the ones that don't. So if all you're doing is bidding low, you're the proverbial blind squirrel. You may get lucky and find a nut, but more likely your campaign performance will just suck, because you don't actually know what you should be buying or how much to pay for it.
So, what do users of these DSP platforms, without machine-learning algorithms, have to do? They have to set up lots and lots of "line items" for every campaign: Try buying these 10 audiences and varying the bid amount for each. Try these five combinations of targeting variables and then test 10 others. They have to manage and evaluate all these line items every day and keep making changes. It's complex, labor-intensive, and a waste of precious time and resources. They're spending their time doing the tedious work that an algorithm should do (and doing it less efficiently and effectively), which means less time spent thinking about campaign strategy and bringing new ideas to the client. Campaign managers using these DSPs can typically run only four to five campaigns at a time, and often fail to scale effectively or hit performance goals.
What about users of DSPs that employ machine-learning algorithms? They can simply set up one or a few line items per campaign, enter the advertiser goal, and let the algorithm automatically learn how much to bid for each impression based on performance. If the DSP is really worth its salt, it also provides reporting that reveals exactly what is being bid for each impression and the variables driving that bid, empowering the user to know what's working and what isn't and to take those strategic insights back to the client. Campaign managers using these DSPs tend to run 25-35 campaigns at a time, while outscaling and outperforming other media partners.
Granted, not all algorithms are created equal -- far from it. There's a tremendous difference in performance even among the few DSPs that actually have machine-learning algorithms, in part because of the quality of the algorithms themselves, and in part because knowing what to bid isn't really the end of the algorithm story. Should the algorithm always submit the highest-bidding campaign, or are there good reasons to make exceptions? What if the impression can be won at well below the CPM output by the model? Should you lower the bid, and if so, by how much? These and other considerations add important layers of algorithmic sophistication that can dramatically impact performance. Head-to-head testing is a great way to see those differences firsthand. But the truth is there's an even more basic distinction: between the DSPs that use machine-learning algorithms at all, and the many that don't.
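On the "can you win it for less?" question, the textbook answer depends on the auction mechanics. What follows is a hedged sketch of the standard reasoning, not any vendor's actual logic: in a second-price auction, bidding your true estimated value is the classic theoretically optimal strategy, while in a first-price auction, bidding full value overpays, so buyers "shade" bids downward. The `shade_factor` and floor handling here are placeholder assumptions; a real system would estimate shading from win-rate data.

```python
def final_bid(model_cpm: float, auction_type: str,
              shade_factor: float = 0.8, floor_cpm: float = 0.0) -> float:
    """Turn the model's CPM value estimate into the bid actually submitted.

    second_price: bid true estimated value (you pay the runner-up's price).
    first_price:  shade the bid downward, since you pay exactly what you bid.
    A bid below the publisher's floor can't win, so don't bid at all.
    """
    if auction_type == "second_price":
        bid = model_cpm
    elif auction_type == "first_price":
        bid = model_cpm * shade_factor  # placeholder; learn this from win rates
    else:
        raise ValueError(f"unknown auction type: {auction_type}")
    return bid if bid >= floor_cpm else 0.0
```

Layering decisions like this on top of the value model -- rather than skipping the value model entirely -- is exactly the kind of sophistication that separates algorithmic platforms from pretenders.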
If you're not sure which kind of DSP you're using, try this: If you have any PhDs, statisticians, or modelers on staff, stick them in a room with the DSPs' "algorithm guys" and make them explain how it works, in detail. Even better, since talk is cheap, make them show you the output. How does each individual bid vary and what specific data factors drive that? Can they provide a daily feed of that output? Why not?
In short, the next time a DSP or anyone else says they use "sophisticated optimization algorithms," tell them to prove it. You'll get one of two responses. Either they will hide behind black-box IP claims, weird metaphors, or other excuses, unable to actually justify their claim.
Or, they will just show you the data.