This article asks you to make a paradigm shift in how you think about identifying unique visitors. It describes new, cutting-edge methodologies for identifying people -- methodologies that, at this point, no web analytics product supports. We’re going to take a journey from first-generation web analytics to second.
What is a paradigm shift?
First, a bit of background about the concept of a paradigm shift.
A paradigm shift is a sudden jump from one way of thinking to another. The concept comes from Thomas Kuhn’s 1962 book, “The Structure of Scientific Revolutions.” He wrote that science doesn’t evolve gradually, a little at a time, but advances as a series of peaceful plateaus punctuated by violent upheavals. During these upheavals the prevailing conceptual world view (or paradigm) is replaced by a new one. Think Darwin, Einstein, Galileo. He showed that the intellectual violence of these upheavals was caused by people stubbornly trying to hang on to the old paradigm they were used to, even when it didn’t work any more. In science this generally means we move into a new paradigm when the scientists who grew up with the old one retire.
I hope to shift, fundamentally and gently, the way we think about metrics in this article.
The problem with users
We need to identify unique users on the web. It’s fundamental. We need to know how many people visit, what they read, for how long, how often they return, and at what frequency. These are the “atoms” of our metrics. Without this knowledge we really can’t do much.
If you look in detail at how metrics software works, you’ll see that it operates by negative logic. The raw data is page views and whom these were served to. The software is designed not to identify which set went to one person, but to identify sets which went to different people. It determines who the unique visitors were by looking at pages viewed at roughly the same time and deciding whether they went to the same person or not. In other words, to count unique individuals, the software actually works by telling different people apart.
We have two methods for identifying unique individuals. We can look at their IP address plus their User Agent on the basis that every unique combination constitutes a unique person. This is an audit-approved methodology. However, we know this is not reliable. We can’t guarantee a one-to-one relationship between IP and individual. ISPs often dynamically allocate a different IP address every time a user dials in. People will therefore have more than one IP address, and the same IP address can be applied to different people. Similarly, we know lots of people have the same user agent. So IP+User Agent is a rough estimate.
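As a sketch, the IP plus User Agent method boils down to counting distinct combinations. The record format below ('ip' and 'ua' fields) is a hypothetical assumption of mine, not taken from any particular analytics product:

```python
# Minimal sketch of the IP + User Agent method: every distinct
# (IP, User Agent) combination is treated as one unique visitor.

def count_unique_visitors(page_views):
    """Count distinct (IP, User Agent) pairs in a list of page-view
    records, where each record is a dict with 'ip' and 'ua' keys."""
    return len({(view["ip"], view["ua"]) for view in page_views})

views = [
    {"ip": "203.0.113.7", "ua": "Mozilla/4.0"},
    {"ip": "203.0.113.7", "ua": "Mozilla/4.0"},  # same pair: same "person"
    {"ip": "203.0.113.9", "ua": "Mozilla/4.0"},  # new IP: counted as new visitor
]
print(count_unique_visitors(views))  # 2 -- a rough estimate at best
```

Note the flaw the article describes: the third record may well be the same person on a freshly allocated dial-up IP, yet the count treats them as new.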
Cookies were supposed to increase certainty. We can plant a unique identifier on someone’s computer. If we see it there next time, we can be almost certain they are the same person. The problem has been that we’ve applied flawed logic to assume that the reverse holds true -- if we don’t see the same cookie we have assumed they are not the same person. In other words, we think these are two different people because they have different cookies.
News that users delete cookies blows this out of the water. How significant a problem this is depends on whose study you read. Estimates range from a high of 55 percent down to 30 percent.
In all cases the percentage is enough for it to matter.
This means that we have been over-estimating the number of unique visitors, underestimating frequency, and have no idea about lifetime value (LTV) or any other metric based on understanding repeat-visit cycles. We’ve been off by at least 30 percent, maybe more.
How do you feel about a 30 percent margin of error on your ROI?
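The distortion is easy to see with some back-of-the-envelope arithmetic. The figures below are hypothetical, chosen only to line up with the 30 percent deletion estimate:

```python
# Illustrative arithmetic only: how cookie deletion inflates
# unique-visitor counts and deflates measured visit frequency.

real_people = 1000      # actual individuals
visits_each = 2         # each person visits twice
deletion_rate = 0.30    # fraction who clear cookies between visits

# Each deleter gets a fresh cookie on the return visit, so they are
# counted as a brand-new "unique visitor".
reported_uniques = real_people + int(real_people * deletion_rate)

true_frequency = visits_each
reported_frequency = (real_people * visits_each) / reported_uniques

print(reported_uniques)              # 1300 uniques reported for 1000 people
print(round(reported_frequency, 2))  # 1.54 visits per "visitor" vs. true 2.0
```

So a 30 percent deletion rate doesn't just pad the visitor count; it quietly drags every per-visitor metric in the wrong direction at the same time.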
It means all our methods for identifying users are unreliable.
Poland to the rescue
Soon after the news about cookies started to break, I was contacted by two researchers from Poland: Magdalena Urbanska and Thomas Urbanski. They believe they have a solution. For some time I was under a nondisclosure agreement because they hadn’t published their research, but all can now be revealed.
Their solution is to do away with a single method and use a hierarchy of steps to determine if we have a unique visitor.
Before I detail the steps, it’s time to take the paradigm shift. Here it is:
We have been assuming that we can use a single method to identify unique individuals. We have been looking for yes-no answers and absolute numbers. We have done all the analysis within the framework of a single software system. We can’t do this any more. No single test is perfectly reliable, so we have to apply multiple tests. Some of those tests yield yes-no answers, and some of them yield probabilities, so the count of unique visitors will be a probabilistic estimate. Some of the tests depend on knowledge of IP topology, so we can’t restrict our analysis to a confined block of data analyzed by an isolated system.
In a nutshell: to determine a web metric we should apply multiple tests, not just count one thing.
The Magdalena and Thomas methodology
Each of these steps is applied in order:
- If the same cookie is present on multiple visits, it’s the same person.
- We next sort our visits by cookie ID and look at the cookie life spans. Different cookies that overlap in time are different users. In other words, one person can’t have two cookies at the same time.
- This leaves us with sets of cookie IDs that could belong to the same person because they occur at different times, so we now look at IP addresses.
- We know some IP addresses cannot be shared by one person. These are the ones that would require a person to move faster than possible. If we have one IP address in New York, then one in Tokyo 60 minutes later, we know it can’t be the same person because you can’t get from New York to Tokyo in one hour.
- This leaves us with those IP addresses that can’t be eliminated on the basis of geography. We now switch emphasis. Instead of looking for proof of difference, we now look for combinations which indicate it’s the same person. These are IP addresses we know to be owned by the same ISP or company.
- We can refine this test by going back over the IP address/Cookie combination. We can look at all the IP addresses that a cookie had. Do we see one of those addresses used on a new cookie? Do both cookies have the same User Agent? If we get the same pool of IP addresses showing up on multiple cookies over time, with the same User Agent, this probably indicates the same person.
- You can also throw Flash Shared Objects (FSO) into the mix. FSOs can’t replace cookies, but if someone’s browser supports FSOs you can use them to record cookie IDs. This way Flash can report to the system all the cookies a machine has held. In addition to identifying users, you can use this information to understand the cookie behavior of your Flash users and extrapolate to the rest of your visitor population.
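The cascade above can be sketched in code. What follows is my own minimal illustration of the logic, not Magdalena and Thomas’s implementation: their test weightings and IP-topology analysis are unpublished, and every field name, threshold, and probability below is an assumption of mine.

```python
from math import radians, sin, cos, asin, sqrt

MAX_SPEED_KMH = 900  # roughly a commercial jet; anything faster implies two people

def km_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points (haversine formula)."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 6371 * 2 * asin(sqrt(a))

def same_person(v1, v2):
    """Apply the cascade of tests to two visit records. Each record is a
    dict with 'cookie', 'start', 'end' (hours), 'ua', 'lat', 'lon', and
    'org' (owning ISP/company). Returns True, False, or a probability."""
    # 1. Same cookie on both visits: same person.
    if v1["cookie"] == v2["cookie"]:
        return True
    # 2. Overlapping cookie life spans: one person can't hold two
    #    cookies at once, so these are different people.
    if v1["start"] < v2["end"] and v2["start"] < v1["end"]:
        return False
    # 3. Geographic impossibility: the implied travel speed is too high.
    hours_apart = abs(v2["start"] - v1["end"])
    distance = km_between(v1["lat"], v1["lon"], v2["lat"], v2["lon"])
    if hours_apart > 0 and distance / hours_apart > MAX_SPEED_KMH:
        return False
    # 4. Positive evidence: same ISP/company plus same User Agent makes
    #    "same person" likely. The 0.8 weight is purely illustrative.
    if v1["org"] == v2["org"] and v1["ua"] == v2["ua"]:
        return 0.8
    return 0.5  # no evidence either way
```

Notice the switch of emphasis the authors describe: steps 1 to 3 look for proof of identity or difference, while step 4 stops returning yes-no answers and starts returning probabilities.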
Magdalena and Thomas don’t apply the same weight to each test, and they tell me their analysis of IP topology uses some smart technology they’d rather keep to themselves. Some tests produce overlapping probability distributions rather than discrete groups.
While the logic is attractive, the question is whether it works or not. Magdalena and Thomas have run some preliminary tests on three large sites that indicate the number of unique visitors is really around half what existing metrics tell us. Both they and I are anxious to run more detailed tests to validate this methodology.
We need three months of data from a site doing two million page views or more.
We’ve approached some household names in metrics, online advertising delivery, and major search engines. The response has been zilch. No one wants to know.
My company’s system, InSite, doesn’t set dated cookies, so I don’t have the data we need. As a result of lack of interest from the industry, we are now working with Magdalena and Thomas to build some software and gather data in the correct format. Once the test is run I’ll publish the results here in iMedia Connection. It would be better for all of us if we could run more than one test, and the data came from many different sites. If you think you can help, I’d love to hear from you.
The problem with cookie deletion is not that it happens, but that we’ve been relying on a single method for identifying people.
We have to move to a world in which we identify unique visitors by a series of tests. These tests have to take into account the way the internet is built. The result will be a statistical estimate, not an absolute number. The degree of certainty we hold about unique visitors will vary -- some visitors will be identified with near certainty, some will be close to guesses. This means analysis should separate unique visitors according to the certainty we have about their identification, rather than treating them as a homogeneous mass.
As a general principle I think this is the way forward -- in the long run many key metrics will morph from single numbers into ranges. We’ll derive those ranges through multiple tests instead of just a basic count. We’ll use different portions of these ranges for different forms of analysis. Web analytics really is a branch of statistics, not just a fancy form of counting.
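As a sketch of what such a range-based metric might look like, here is one hypothetical way of turning per-visitor identification probabilities (the outputs of tests like those above) into a low/expected/high estimate. The bucket thresholds are illustrative assumptions of mine, not an industry standard:

```python
# Report unique visitors as a range, split by identification certainty,
# rather than as a single absolute number.

def visitor_range(probabilities):
    """Given per-candidate probabilities that each is a genuinely
    distinct visitor, return (low, expected, high) estimates."""
    certain = sum(1 for p in probabilities if p >= 0.95)   # near-certain identifications
    expected = sum(probabilities)                          # probabilistic expectation
    possible = sum(1 for p in probabilities if p > 0.05)   # everyone not ruled out
    return certain, round(expected, 1), possible

# Ten candidates: four near-certain, four coin-flips, two unlikely.
probs = [1.0, 1.0, 0.98, 0.97, 0.5, 0.5, 0.5, 0.5, 0.03, 0.02]
print(visitor_range(probs))  # (4, 6.0, 8)
```

Different portions of that range then suit different analyses: the low bound for guaranteed-reach claims, the expectation for trend reporting, the high bound for capacity planning.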
NOTE: Research Paper Available: Magdalena Urbanska and Thomas Urbanski’s original research paper, complete with eight pages of mathematical proof, is available. This paper was delivered at the ARF/Esomar Conference on Research into Worldwide Audience Measurements, June 2005. Because of patent and copyright restrictions we cannot make it freely available for download over the web -- we need to identify the individuals who get it. However, if you would like a copy, please contact me and I'll arrange to have one sent to you.
Brandt Dainow is CEO of Think Metrics, creator of the InSite Web reporting system.
Some innovative campaigns
GSK wanted to introduce the first FDA-approved, over-the-counter weight-loss pill. imc2 launched an unbranded site -- QuestionEverything.com -- in the summer of 2006, which contained credible information and professionally moderated discussion aimed at its target audience of dieters. It emphasized the importance of committing to a weight-loss program, sans photos and claims. In six months, with minimal media coverage, the site attracted 700,000 unique visitors and 38,000 registrants for a still-to-be-named product.
A few months later, FDA granted approval for the product, and last February, imc2 created three sites: myalli.com, mialli.com (Spanish site) and allihcp.com (for healthcare providers). The agency designed the first two sites for people interested in understanding how to use the new drug, called 'alli.' The healthcare provider site gave practitioners the information and tools needed to offer patients an alternative to current over-the-counter weight-loss options. Content/tools on the consumer sites included a BMI calculator and a readiness quiz. A 'Commitment Letter' feature enabled people to share past diet experiences. More than 1.5 million unique visitors logged on during the first four months.
Client: Procter & Gamble
Last summer, Procter & Gamble wanted to generate more buzz about Secret deodorant and highlight the brand's 50th anniversary. The campaign asked women "are you strong enough to share your secret?" They were directed to a microsite designed by imc2, which generated more than 25,000 responses and doubled site traffic for the brand. Participants were encouraged to spill their guts, preferably anonymously.
Average monthly user sessions from July through September were 119,000; page views were 767,000 and average time on the site was just over five minutes. The microsite, according to Procter & Gamble, was very successful at creating an emotional connection between the consumers and the brand, and allowed women a lot of freedom to share their secrets.
Campaign: "Security Unleashed"
Unisys wanted to heighten awareness of its security services to C-level executives at Fortune 500 companies. Unisys' print ad agency created personalized billboards and airport posters in areas each executive frequented, along with custom Fortune Magazine cover stories featuring each executive as the hero. In concert with the print campaign, imc2 created two digital elements:
- Personalized websites for each target featuring a 'news report' given by well-known news reporters highlighting the targeted industry's future success with Unisys, content about their industry's trends/business challenges and an overview of Unisys capabilities and success stories.
- SecurityUnleashed.com, for anyone else who saw the ads and wanted to learn more about the campaign.
The campaign encouraged people to think about Unisys as a strategic partner to enable secure business operations. Nearly two-thirds of targeted executives logged in to get the Security Unleashed story, and Unisys received feedback from them confirming the campaign had increased awareness.
Five imc2 principals have played an integral role in all this creativity and growth. Some snapshots:
President & Founder
Doug Levy honed his entrepreneurial skills while an undergrad at the University of Pennsylvania. He started up a discount textbook business called Campus Text, then created an online resource tool for college students called Internet University, which attracted interest from ABC Sports. Levy developed the Monday Night Football website and, shortly thereafter, launched imc2.
Levy is a summa cum laude graduate of the Wharton School of Business at the University of Pennsylvania. He also earned a Bachelor of Science in entrepreneurial management and a Bachelor of Arts in urban policy from the university.
Senior Vice President
Marc Blumberg is a recognized expert in digital marketing and has played a key role in developing the company's service offerings. He has helped build imc2 from a staff of six to more than 450 people.
Blumberg has developed several of the company's proprietary interactive service offerings. Prior to joining the company, he was a strategy consultant for Gemini Consulting's MAC Group and for the New England Consulting Group. He received a Bachelor of Science in economics from the University of Pennsylvania's Wharton School of Business.
Senior Vice President, Managing Director, Dallas
Beth Kuykendall has created and implemented successful digital marketing programs for clients such as GlaxoSmithKline, Eli Lilly and Company, Nestle, Pizza Hut and Kellogg's. She has also played a leading role in establishing the company's relationship marketing practice.
Before imc2, Kuykendall was executive vice president of client results at Targetbase. She earned a Master of Science degree in statistics from Southern Methodist University and a Bachelor of Science in math from Stephen F. Austin State University.
Senior Vice President, Executive Creative Director
Alan Schulman oversees the development of interactive campaigns for numerous clients at imc2. MEDIA Magazine recently named him one of the 100 most influential people in advertising.
Prior to joining the company, Schulman was chief creative officer and co-founder of New York City-based Brand New World. He also served as senior vice president/creative director of Universal McCann Futures, the digital ad group of McCann North America. Schulman serves on the board of advisors for TiVo, board of governors of the National Academy of Television Arts & Sciences and is a member of the New York Marketing & Media Technologies Council.
Senior Vice President, Business Development
Ian Wolfman helps clients identify innovative ways to leverage the internet, achieve their business objectives and measure results.
Before coming to imc2, Wolfman worked in sports marketing and general advertising on accounts such as the NFL, Nokia, Corona and Bombardier. He also teaches a class at Southern Methodist University's Cox School of Business entitled, 'Leading Strategic Internet Marketing' for full-time and professional MBA programs.
Wolfman received his MBA from Southern Methodist University and a Bachelor of Science in corporate communications from the University of Texas.
Keeping on the cutting edge
imc2, noted Levy, is constantly creating and developing innovative digital tools to meet client needs and remain among the digital vanguard. Some of these have included:
ConversionMonitor: A proprietary methodology that helps determine if a company's site is effective at driving sales. When consumers visit the site, for example, the program helps determine if the site changes their perception of the brand, ultimately leading to an increased intent to purchase.
Media Marketplace: Delivers higher volumes of media inventory. A media-buying process, it features a secure online forum where qualified media properties compete anonymously by bidding up inventory amounts they'll provide for the fixed budget.
Vivilogue: A unique, proprietary multimedia education tool that leverages rich interactive content and multimedia to deliver cost-effective and visually relevant messages to key target audiences.
SiteInformant: Delivers an activity report via email each time a competitor's website changes, as well as summary weekly/monthly reports. Helps track when the competition changes product positioning, when new products are introduced and when prices are adjusted.
Dispatch: A suite of tools providing marketers with the ability to reach, capture and retain target audiences online through RSS-based marketing. It has an RSS publishing platform, branded desktop reader, measurement system and an eCRM dynamic feed system.
"We're in business to help advance relationships. It's as simple as that," said Levy.
Neal Leavitt is president of Fallbrook, CA-based Leavitt Communications, an international marketing communications company with affiliates in Brazil, France, Germany, Hong Kong, India and the United Kingdom.
That's no keyword; it's your target audience
One of the worst habits of less experienced search engine marketers is jumping straight to keyword research before determining the target audience for the product or service being sold. When performing audits of existing paid search accounts or websites for organic search issues, this mistake stands out like the one guy who decides to dress up as Jar Jar at a sci-fi convention.
For instance, it would be really easy to just slap the name of the product, its product category, and some other quick thoughts into a keyword research tool, then run off and start buying those terms simply because "people are searching for them."
The problem is, while people may in fact be searching for and even clicking on those keywords, it doesn't mean they're the kind of terms that bring in people who have a strong chance of actually purchasing the product.
For one client, for example, we pulled together additional research into the target audience itself, which in turn led to even deeper data about the activities and interests that audience actually engages in.
From here, you can start making some smart decisions on your keywords based on the type of activities your target audience is actually interested in. This also allows you to add in some really smart negative keywords based on everything that they are not. These terms usually present themselves during that keyword research phase, but without the proper determination of the target audience, you could waste thousands of dollars of the client's limited budget on terms that never should have been considered in the first place.
When I hear a client start a conversation about search engine marketing with something like, "We're ranking well for these terms, but we're just not making any money!" the absence of a proper understanding of the target audience is usually the cause.
Your media mix shouldn't be based on what you think is cool
One of my favorite jokes from this year's White House Correspondents' Dinner came from Joel McHale when he said, "Thanks to Obamacare…millions of newly insured young Americans can visit the doctor's office and see what a print magazine actually looks like."
While you can't help but laugh, it also reminds you that print advertising, once a keystone of most media plans, is now a mere shadow of itself. The sadder thing is that this decline has very little to do with readership and everything to do with a rush to use the new, shiny object in the room -- digital media. Its reach is hard to argue with now, but even in its infancy, before it had anything like today's reach, digital media started stealing ad dollars from print and other "traditional media" without so much as a hint of research into whether a product's target audience was even there.
This is a failing of an aspect of media planning called media mix or media usage analysis. This old-school marketing activity basically uses a collection of different data sources to determine, first and foremost, the types of media, including the internet, that your target uses on a regular basis.
For instance, data from a recent pitch demonstrated that this particular client's target audience is a heavy user of not only the internet (woo!), but also magazines, outdoor, and radio.
Without this type of research, the client may have jumped right into doing TV or newspapers, where their media buying activities would be the least cost-efficient.
Once you have this data in hand, media planners would usually work with other data sources to determine specific publications or websites. For instance, you may utilize comScore to determine a list of websites and programmatic ad networks that make for great candidates to become a part of the final media plan. Additionally, you can work with Google to determine other specific media usage habits, such as their propensity for mobile and tablet usage.
After you have all this data in hand, you can finally start the process of determining how much of your budget should be allocated to specific media tactics like paid search, display, mobile, and so on.
Don't have access to MRI, comScore, and a direct line to Google data? Trust me, with enough research, you can find out plenty of information about your target audience via sites like MarketingCharts.com, Compete, and countless others. But let me assure you, having the good data close by makes those arguments that the entire campaign should be TV-based a lot shorter.
So, the next time your boss/client bursts in and says they want to "own mobile," remind them that you're not even sure if your target is using the internet, much less doing so on mobile devices.
Your public relations team makes the best link builders
I have a confession to make: While my agency does a lot of work in the area of search engine optimization (SEO), I really don't believe that SEO as a profession should have ever come into existence. And while this belief doesn't stop me from taking on new SEO clients each year, it has influenced the way that we do business.
To me, SEO is really a collection of marketing and site development best practices that any self-respecting website owner should be doing anyway, even if Google and the other search engines didn't exist. Basically, with a rare exception (that I won't go into here), everything that has been claimed as an SEO activity these days was probably much better off in the hands of an assortment of other functions within an organization.
One of these functions in particular is "link building," the practice of creating links back to a website purely in the name of improving organic search ranking for specific keywords. This is one of those aspects of Google's page ranking algorithm that I really wish they had kept to themselves because, as with every other aspect that has been revealed, the first thought that seems to cross the SEO's mind is, "How can I exploit this newfound knowledge?" -- rather than realizing that a lack of links to a website may be the cause of poor organic search performance.
Over the past few years, Google has made considerable efforts to update its algorithm to catch those who have wielded link building as an unfair weapon in the battle for expanded reach in organic search. With each update, you can hear the cries of some SEOs who somehow felt they were being singled out as some sort of monster for doing what is basically cheating. In response, Matt Cutts and many other Google representatives have done their best to explain what a more naturally occurring link to a website might look like; however, most SEOs are still at a loss as to how these links could actually come about.
Those of us who have even the most basic knowledge of the old-school marketing activity called public relations recognized this sage advice almost immediately -- it's called "getting coverage," and PR professionals have been doing it for years.
While SEOs have been sliding links into comments on various websites, using deep crawl data to discover old links that lead to bad links, and creating networks of thin content purely for the sake of linking to deep content, public relations professionals have been building relationships, working the phones, and working with members of the press to get the word out about their clients' products and services. And in this day and age of digital media, that coverage usually has a really great link back to the website -- and Google and the other search engines eat it up like digital candy to a digital baby.
We don't need more "tips and tricks"
If you search the web for advice on how to improve the amount of traffic to your website or how to better promote your business, you will no doubt come across blog post after blog post pushing a myriad of tips, tricks, hacks, and other tomfoolery. You know, stuff that "the big websites don't want you to know." Basically, you're being sold "marketing best practices" the same way that shysters sell snake oil on late night television.
What we need now is more marketing professionals who use digital media as one of their tools instead of button pushers who don't understand how the button works -- or why they're even pushing it in the first place.
So the next time your boss or client asks, "How can we turn things up a notch here?" tell them you want to go old school and actually do some real marketing.