Recently, a client received the initial results of an ad effectiveness study -- and they were not pretty. The campaign appeared to be flat-lining, with no lift on the relevant metric of purchase consideration for the brand in question. What went wrong? Actually, nothing. And the campaign was doing pretty well.
Let me explain.
The campaign was targeted to a B2B audience, but the client was really focused on a subgroup within that target: people who are involved in making vehicle purchase decisions for their companies. Our first look was at an aggregated view, and the data seemed flat because it included every response, even those from respondents who have no influence over vehicle decisions for their companies. However, the client had run a multi-question study and collected several hundred completes, and one of the questions asked about purchase influence. That let us cut the data by respondents with purchase influence, and the results changed dramatically.
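To make the dilution effect concrete, here is a minimal sketch of the same kind of cut. The counts are invented purely for illustration and are not the client's data. The two-proportion z-test is one common way to check a control-versus-exposed difference, not necessarily the test used in the study, and the function name `lift_test` is hypothetical.

```python
import math

def lift_test(control_yes, control_n, exposed_yes, exposed_n):
    """Two-proportion z-test: returns (lift in proportion points, two-sided p-value)."""
    p_control = control_yes / control_n
    p_exposed = exposed_yes / exposed_n
    pooled = (control_yes + exposed_yes) / (control_n + exposed_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / exposed_n))
    z = (p_exposed - p_control) / se
    # Standard normal CDF via the error function
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return p_exposed - p_control, 2 * (1 - cdf)

# Hypothetical counts of respondents who would consider the brand.
# Influencers respond to the campaign; everyone else does not.
influencers = dict(control_yes=16, control_n=80, exposed_yes=32, exposed_n=80)
others = dict(control_yes=44, control_n=220, exposed_yes=44, exposed_n=220)
everyone = {k: influencers[k] + others[k] for k in influencers}

for name, cell in (("all respondents", everyone), ("purchase influencers", influencers)):
    lift, p_value = lift_test(**cell)
    print(f"{name}: lift {lift * 100:.1f} points, p = {p_value:.3f}")
```

With these made-up numbers, the aggregated lift does not reach significance at the usual 5 percent level, while the lift among influencers does. The cut is only possible because the survey asked about purchase influence.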
In research, as in many other areas, it is impossible to have good, fast, and cheap at the same time. Compromises have to be made, and it is the research vendor's responsibility to educate clients about the various trade-offs. If tainted (or improperly collected) data paints an unflattering portrait of the campaign or, worse, leads the media planner to the wrong conclusions, then any short-term gain can become a much bigger long-term loss (lost dollars and reputational damage). Finding the right balance can be challenging, and I'd like to discuss three areas that come up often in our client conversations.
Number of completes
The size of the sample is often the first area to tackle, as the number of completes has a direct relationship to the additional media cost the client will incur. In order to get a good read on the results, researchers need to collect a sufficiently large sample. In classical sampling theory, the larger the sample, the smaller the "margin of error." Expressed another way, with a larger sample you can more confidently attribute an observed lift to the campaign rather than to chance. But the incremental improvement in the margin of error diminishes as the sample size gets larger. For example, going from 100 to 200 completes will yield a margin of error improvement of almost 3 percentage points, dropping from 9.8 percent to 6.9 percent, while going from 200 to 300 improves precision by only about 1.3 percentage points (from 6.9 percent to 5.7 percent). To take this further, going from 750 to 1,000 completes will net only about half a percentage point of improvement in the margin of error.
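The figures above are consistent with the standard margin-of-error formula for a proportion, assuming a 95 percent confidence level (z = 1.96) and the worst-case proportion of 50 percent. Those two settings are an assumption on our part, since they are not spelled out above. A short sketch to reproduce the numbers:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of a normal-approximation confidence interval for a proportion.

    p=0.5 (worst case) and z=1.96 (95 percent confidence) are assumed defaults.
    """
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 200, 300, 750, 1000):
    print(f"{n:>5} completes: +/- {margin_of_error(n) * 100:.1f} percentage points")
```

Because the margin of error shrinks with the square root of the sample size, halving it takes roughly four times as many completes.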
On the other hand, there are valid economic reasons why the client cannot sacrifice an unlimited number of precious campaign impressions. Rather than being dogmatic about sample size, the research partner should explain the trade-offs between precision and cost. We generally suggest that our clients collect at least 100 completes per cell they want to evaluate. If they want analytics by site, that means 100 completes per site in the control group and another 100 per site in the exposed group. More is obviously better -- but once you get past 300 per cell, the additional cost, in most cases, outweighs the gain in precision.
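The per-cell rule of thumb adds up quickly once a study is cut by site. A minimal planning sketch, where the function name `completes_needed` and the site counts are hypothetical:

```python
def completes_needed(n_sites, per_cell=100, groups=("control", "exposed")):
    """Total completes if every site is read separately for every group."""
    return n_sites * len(groups) * per_cell

for sites in (1, 3, 5):
    minimum = completes_needed(sites)                # 100 per cell
    ceiling = completes_needed(sites, per_cell=300)  # 300 per cell
    print(f"{sites} site(s): {minimum} to {ceiling} completes")
```

A five-site study read by site needs 1,000 completes at the 100-per-cell minimum, which is why the number of cuts is worth agreeing on before the campaign launches.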
Number of questions
There is a fine line between asking too many questions and not asking enough. Traditionally, researchers tend to err on the side of "more is better" (that is, collect it even if you might not use it). This may be fine when you own a panel with high cooperation rates. But when collecting completes from website visitors who can leave with a click of a mouse, this approach comes at the price of very low completion rates, which, among other issues, drives up the need for impressions and invitations and bumps up against publishers' needs to manage the user experience.
On the flip side, one-question polls often have higher response rates -- but they have no data granularity at all. In the example at the beginning of this article, we would never have had the chance to extract the story out of the data if we hadn't asked more than one question.
Here again, it falls to the researcher and the client, working together, to make hard choices about length and complexity in order to balance completion rates, impression needs, and data granularity. Most of our studies have four to six questions -- but we do recognize that special cases warrant longer surveys.
Sample randomization and data collection
Sample quality is critical to obtaining actionable research results, but often the needs of the researcher and the realities of the media buyers and inventory owners are not perfectly aligned. There are two aspects to this. First, the sample for both control and exposed has to be collected in a randomized way. Second, the sample needs to be collected within the true footprint of a campaign to reflect the composition of the audience that has an opportunity to see the campaign. The second requirement, in particular, is rarely met when it comes to measuring online ad effectiveness.
If, for example, the control group for a study is collected run-of-site (ROS), but the campaign used geo-targeting and behavioral-targeting (BT) data, comparing control and exposed is not a true measure of the impact of the campaign. Similarly, if the campaign runs in 300 x 250 slots below the fold, but survey invitations appear as site overlays to every nth visitor of the page, the "exposed" users may never have seen the ads, because they never scrolled down far enough to be truly exposed.
On the media side, buyers and inventory owners will not be eager to part with a large number of scarce BT/geo impressions to collect an unexposed control audience. So, it is important to explain that not running in the footprint may undermine the validity of the data, and the results may not reflect the true success of the campaign. We try to use most ad servers' ability to assign users truly at random and to keep them in their control or exposed group. In addition, we urge our clients to always run control and exposed surveys simultaneously, within the footprint of the campaign and, whenever possible, as in-banner units. That way, the collected sample is a true random representation of people who have an opportunity to see the campaign.
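How an ad server implements random assignment is specific to each product. The sketch below only illustrates the general idea of an assignment that is both random and sticky: hashing a user identifier so the same user always lands in the same group. The function name `assign_group`, the identifiers, and the 10 percent control share are all assumptions made for illustration.

```python
import hashlib

def assign_group(user_id: str, campaign_id: str, control_share: float = 0.10) -> str:
    """Deterministically map a user to 'control' or 'exposed' for one campaign."""
    digest = hashlib.sha256(f"{campaign_id}:{user_id}".encode("utf-8")).digest()
    # Turn the first 8 bytes of the hash into a roughly uniform number between 0 and 1
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "control" if bucket < control_share else "exposed"

# The same (hypothetical) user gets the same answer on every page view
print(assign_group("cookie-12345", "fleet-campaign"))
print(assign_group("cookie-12345", "fleet-campaign"))
```

Because the assignment happens inside the targeted audience, the control group is drawn from the same footprint as the exposed group, and the only systematic difference between the two is exposure to the campaign.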
For the study about commercial vehicles mentioned above, in the end, we were able to report to our client a significant lift in brand awareness and consideration between the control and exposed groups among people with purchase influence -- because we asked the right questions and collected a sufficient, randomized sample to look at these segments. As a nice side effect, our results confirmed to the agency that they had bought some good, valuable targeting data: the number of purchase influencers in the sample proved to be quite high. When you dig into the data, there's no telling what you'll find -- as long as it's good data.