Publisher-side ad servers such as DoubleClick for Publishers, Open AdStream, FreeWheel, and others are the most critical components of the ad industry. They're responsible ultimately for coordination of all the revenue collected by the publisher, and they do an amazing amount of work.
Many people in the industry -- especially on the business side of the industry -- look at their ad server as mission critical, sort of in the way they look at the electricity provided by their power utility. Critical -- but only in that it delivers ads. To ad operations or salespeople, the ad server is most often associated with how they use the user interface -- really the workflow they interact with directly. But this is an oversight on their part.
The way that the ad server operates under the surface is actually something everyone in the industry should understand. Only by understanding some of the details of how these systems function can good business decisions be made.
Ad servers by nature make use of several real-time systems, the most critical being ad delivery. But ad delivery is not a name that adequately describes what those systems do. An ad delivery system is really a decision engine. It reviews an ad impression in the exact moment that it is created (by a user visiting a page), reviews all the information about that impression, and makes the decision about which ad it should deliver. But the real question is this: How does that decision get made?
An impression could be thought of as a molecule made up of atoms. Each atom is an attribute that describes something about that impression. These atomic attributes can be simple media attributes, such as the page location that the ad is imbedded into, the category of content that the page sits within, or the dimensions of the creative. They can be audience attributes such as demographic information taken from the user's registration data or a third-party data company. They can be complex audience segments provided by a DMP such as "soccer mom" -- which is in itself almost a molecular object made up of the attributes of female, parent, children in sports -- and of course various other demographic and psychographic atomic attributes.
When taken all together, these attributes define all the possible interpretations of that impression. The delivery engine now must decide (all within a few milliseconds) how to allocate that impression against available line items. This real-time inventory allocation issue is the most critical moment in the life of an impression. Most people in our industry have no understanding of what happens in that moment, which has led to many uninformed business, partnership, and vendor licensing decisions over the years, especially when it comes to operations, inventory management, and yield.
Real-time inventory allocation decides which line items will be matched against an impression. The way these decisions get made reflects the relative importance placed on them by the engineers who wrote the allocation rules. These, of course, are informed by business people who are responsible for yield and revenue, but the reality is that the tuning of allocation against a specific publisher's needs is not possible in a large shared system. So the rules get tuned as best they can to match the overarching case that most customers face.
Well before the impression is generated and has to be allocated out to the impressions in real-time, inventory was sold in advance based on predictions of how much volume would exist in the future. We call these predicted impressions "avails" (for "available to sell") in our industry, and they're essentially the basis for how all guaranteed impressions are sold.
We'll get back to the real-time allocation in a moment, but first let's talk a bit about avails. The avails calculation done by another component of the ad server, responsible for inventory prediction, is one of the hardest computer science problems facing the industry today. Predicting how much inventory will exist is hard -- and extremely complicated.
Imagine if you will that you've been asked to predict a different kind of problem than ad serving -- perhaps traffic patterns on a state highway system. As you might imagine, predicting how many cars will be on the entire highway next month is probably not very hard to do with a pretty high degree of accuracy. There's historical data going back years of time, month by month. So you could take a look at the month of April for the last five years, see if there's any significant variance, and use a bit of somewhat sophisticated math to determine a confidence interval for how many cars will be on the highway in the month of April 2013.
But imagine that you now wanted to zoom into a specific location -- let's say the Golden Gate Bridge. And you wanted to break that prediction down further, let's say Wednesday, April 3. And let's say that we wanted to predict not only how many cars would be on the bridge that day, but how many cars with only one passenger. And further, we wanted to know how many of those cars were red and driven by women. And of those red, female-driven cars, how many of them are convertible sports cars? Between 2 and 3 p.m.
Even if you could get some kind of idea how many matches you've had in the past, predicting at this level of granularity is very hard. Never mind that there are many outside factors that could affect this; there are short-term issues that could help get more accurate as you get closer in time to the event such as weather and sporting events, and there are much more unpredictable events such as car accidents, earthquakes, etc.
This is essentially the same kind of prediction problem as the avails prediction problem that we face in the online advertising industry. Each time we layer on one bit of data (some defining attribute) onto our inventory definition, we make it harder and harder to predict with any accuracy how many of those impressions will exist. And because we've signed up for a guarantee that this inventory will exist, the engineers creating the algorithms that predict how much inventory will exist need to be very conservative on their estimates.
When an ad campaign is booked by an account manager at the publisher, they "pull avails" based on their read of the RFP and media plan and try to find matching inventory. These avails are then reserved in the system (the system puts a hold on avails that are being sent back to the buyer based for a period of time) until the insertion order (I/O) is signed by the buyer. At this moment, a preliminary allocation of predicted avails (impressions that don't exist yet) is made by a reservation system, which divvies out the avails among the various I/Os. This is another kind of allocation that the ad server does in advance of the campaign actually running live, and it has as much (or even more) impact as the real-time allocation does on overall yield.