Why data leakage is hurting our industry

Conversations with CROs and high-level VPs at publishers tend to go like this:

Me: So you're taking insertion orders from agencies with their own DSPs...

CRO: Yeah...

Me: And they're buying targeted inventory from you, based on data you've collected from your users...

CRO: Yeah...

Me: So they can go ahead and retarget those users and reach them more cost-effectively in other environments, then.

CRO: Right again.

Me: So why are you taking insertion orders from them?

CRO: Umm...

It's still an open question who "owns" the data used to target ads, but we can't keep dancing around that question. Publishers invest in content and tools that produce site-based user profiles, and those profiles make a tremendous difference in an advertiser's ability to reach the right target audience. In many cases, these data points are simply leaking away to agencies, networks, and DSPs, keeping publishers from adequately monetizing them and hurting their businesses in the long term.


The elephant in the room is that every call to a server that doesn't belong to the publisher, whether it's for an ad, a piece of content, or a 1x1 tracking pixel, is an opportunity for data to leak to an external party.
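To make that mechanism concrete, here is a minimal Python sketch of the kind of first-pass check a publisher might run: given the URLs requested while a page loads, it lists every host outside the publisher's own domains. The domain names, the `FIRST_PARTY` set, and the function names are hypothetical, and the simple suffix match is an assumption (a real audit would need a proper registrable-domain check). It shows only who is in a position to receive data, not what they do with it.

```python
from urllib.parse import urlsplit

# Hypothetical: the domains this publisher controls. A simple suffix match
# is assumed here; a production audit would use the Public Suffix List.
FIRST_PARTY = {"example-publisher.com"}


def is_first_party(url: str, first_party: set[str]) -> bool:
    """True if the request host is a first-party domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in first_party)


def third_party_hosts(request_urls: list[str], first_party: set[str]) -> set[str]:
    """Collect the distinct external hosts contacted during a page view.

    Each of these hosts receives at least the request itself (IP address,
    user agent, and often the referring page URL) and may read or set its
    own cookies, so each is a potential point of data leakage.
    """
    hosts = set()
    for url in request_urls:
        if not is_first_party(url, first_party):
            host = urlsplit(url).hostname
            if host:
                hosts.add(host.lower())
    return hosts


if __name__ == "__main__":
    # Hypothetical requests observed while loading one article page:
    # an ad call, a 1x1 tracking pixel, and a third-party content widget.
    observed = [
        "https://www.example-publisher.com/article/123",
        "https://static.example-publisher.com/site.css",
        "https://ads.example-adserver.net/tag?slot=top",
        "https://pixel.example-dsp.com/1x1.gif?uid=abc",
        "https://cdn.example-content.org/widget.js",
    ]
    for host in sorted(third_party_hosts(observed, FIRST_PARTY)):
        print(host)
```

Run as written, it prints the three external hosts (ad server, pixel, content widget). Note the limit this illustrates: a check like this can tell a publisher who is receiving calls, but not whether any of them later repurposes the data.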

This is bad for the publisher, and the industry has acknowledged as much. The IAB and the AAAA did so in their update of the Terms and Conditions for media buys.

Page 10 of the IAB/AAAA 3.0 Terms and Conditions states: "Unless authorized by Media Company, Advertiser will not: (A) use Collected data for Repurposing..." The definition of "Repurposing" appears a bit further up the page:

"'Repurposing' means retargeting a user or appending data to a non-public profile regarding a user for purposes other than the performance of the IO."

So what we have is an agreement on the part of the advertiser to refrain from using data for these purposes. Here's the problem: The publisher has no reliable way to verify that data points gleaned from buys on its site aren't being used for retargeting or data gathering. With the right set of tools, an agency or an advertiser could easily purchase media weight against those profiles on other sites (or even on the original site, if that site lets exchanges auction off some of its ad inventory), and the publisher would likely never know. Even if the publisher somehow found out, the deep forensic analysis required to prove data leakage is cost-prohibitive in most cases.

Additionally, biting the hand that feeds them isn't an effective business strategy for most publishers. Calling out an agency or advertiser for bad data practices rarely pays off, especially when those practices are so difficult to prove.

This handshake agreement between publisher and advertiser (or agency) is ripe for abuse. Simply put, it's not realistic to expect the bad eggs in the industry to show restraint in a marketplace where data leakage is commonplace and where proving that a company collected and used data it shouldn't have is exceptionally tough.

Regrettably, the systems that enable this sort of data leakage are now part of the infrastructure that lets publishers and agencies do their jobs efficiently. The obvious solution would be to eliminate external server calls. That would throw the baby out with the bathwater, though, and result in a world where white-hat agencies couldn't use tools like third-party ad servers. No third-party ad serving means no consolidated reporting, no creative optimization, and no advertiser-side control over creative rotation, not to mention no closed-loop reporting for direct-response advertisers. Eliminating third-party serving would also significantly increase the labor needed to launch and maintain campaigns on both the publisher and agency side. We're too far down that path to go back now.

So what's the solution, then? Do we build an auditing entity that certifies networks, DSPs, and agency tools to prevent data leakage? Should more publishers follow the lead of Google, Facebook, and a few other select sites and ban third-party serving?

Whatever the solution, we need it very, very soon. Not only do we have the long-term viability of our industry's content publishers to think about, but in the U.S. we also have the federal government to consider. If we think this issue will simply go away as long as we keep ignoring it, we deserve whatever heavy-handed response the government delivers to protect consumers from having their information shared haphazardly.

Tom Hespos is the chairman and president of Underscore Marketing.

Follow Tom on Twitter at @_MarketingLLC. Follow iMedia Connection at @imediatweet.

 

Comments

John Dietz November 11, 2010 at 11:46 AM

Tom, you raise a very fair question about how the data is used. We monitor outbound data, crack cookies (including Flash cookies), and evaluate vendors' privacy policies to try to determine what they are doing with the data, but the truth is that most companies are at least dropping unique ID cookies, often for very reasonable purposes (counting uniques, tracking conversions).
Once a company has that information, it has the ability to retarget that user, so our approach has focused on transparency: providing deeper knowledge about who these vendors are and what they do, so a publisher can talk effectively with its advertisers about what is being collected and how it could be used.
We've found cases of a single ad tag that made calls out to 10 different vendors and dropped 20 cookies, and it quickly becomes evident how some of that collection could affect a publisher's data, not to mention page latency and the site's own privacy policies.

Tom Hespos November 11, 2010 at 11:22 AM

Thanks, George and John.

What I'm most concerned about is that once an advertiser or agency observes something about someone they advertise to, finding out how they utilize that data on a go-forward basis is pretty difficult. What I'd like to know about the PubMatic product and the Adometry service is how they distinguish between data-gathering that breaches the IAB/AAAA Ts & Cs and legitimate closed-loop reporting and adserving. Without insight into the advertiser/agency database and the tools running on top of it, I can't see how to accomplish that from a technological perspective without auditing.

John - "Data Leakage" might make you cringe, but I'm going with it for now. "Data Poaching" would imply much more strongly that the data belongs exclusively to the publisher. While I agree that these data points are critical to the publisher's business, I haven't yet decided whether the data belong exclusively to the publisher.

John Dietz November 11, 2010 at 11:13 AM

Thanks, Tom, this is a great article. I have two points:
1. It is possible for publishers to do some auditing. We (Adometry) just announced a service this week to help publishers monitor data collection as well as ad latency, errors, cookies, etc. We've identified and classified over 250 companies involved in ad serving and data collection, and we expose that data through our service.
2. For an audience of marketers, can we come up with a better term for this than "Data Leakage"? It doesn't really describe what's happening, and it makes some of us cringe. How about "Data Poaching," since that's what happens when a third-party tracker collects that data from a publisher?

John Dietz - http://www.adometry.com/

George Simpson November 11, 2010 at 11:04 AM

There are tools to help with this problem:

PubMatic Launches Data Firewall for Publishers to Prevent Audience Data Leakage

First Product of its Kind to Combat Billion-Dollar Problem

Palo Alto, CA (September 28, 2010) PubMatic, which provides online publishers — including the majority of the comScore Top 10 — with the technology and services to significantly increase revenue and better manage their advertising inventory, today launched its Data Firewall product for publishers. For the first time, publishers will get visibility into 3rd party companies that are tracking the publisher's audience via pixels or cookies and causing data leakage.

Data leakage occurs when 3rd parties track a publisher's audience without the publisher's knowledge, a problem that has grown considerably in the past year. PubMatic estimates that data leakage is costing publishers up to $1 billion annually in lost revenue.

PubMatic's Data Firewall is unique in that it is the first product in the market that gives publishers transparency into data leakage and at the same time allows them to see which 3rd parties are producing the highest ad revenue for the publishers. This will allow publishers to evaluate the value of those 3rd party relationships.

The use of publisher data by 3rd parties in order to increase the value of the media they are selling independently of the publisher is a fast-growing practice and can cost the publisher significant revenue. Data Firewall will help publishers decide how much data collection from 3rd parties is too much.

"Data Firewall is another impressive step toward giving publishers more control over their advertising inventory," says Jared Friedman, CTO at Scribd, a publisher partner of PubMatic. "With Data Firewall we can take our industry-leading stance on consumer privacy to the next level by ensuring our users' privacy is protected while continuing to allow many advertisers to run ads on our site."

"Privacy is not only important to consumers, but it is also of paramount importance to publishers," says Rajeev Goel, Co-Founder and CEO of PubMatic. "Data Firewall is the world's first publisher-centric tool that provides publishers with the ability to clearly see who is dropping pixels on their websites, and take appropriate action."

As part of the Premier platform, PubMatic's large base of premium publishers will be able to leverage Data Firewall through PubMatic's consolidated dashboard.

PubMatic's (http://www.PubMatic.com) ad management technology combines an impression-level ad auction, the most comprehensive brand protection tools, and enterprise ad operations support to give the Web's top publishers the most control over their revenue and brand. Some of the world's most respected online publishers have chosen to work with PubMatic, including The Huffington Post, eBay, United Online, TV Guide, and the majority of the comScore Top 10.

PubMatic is privately held, backed by funding from Draper Fisher Jurvetson, Nexus Venture Partners, and Helion Ventures, and has seven offices around the world in the US, Europe, and Asia.