Things That Throw Your Stats (Part 1)

Web analytics is growing more sophisticated. We’re developing methods for understanding customers, predicting trends and assessing ROI. Every month analytics gurus amaze you with the latest revelations to sharpen your focus and tune your spend. What no one is telling you is that all these systems and conclusions are built on inaccurate numbers. The god of web analytics has feet of clay -- 100 percent accuracy is impossible.

Web analysis is based on counting a very limited number of things. People visit websites and read pages. Therefore we can count people, visits and page views. That’s all. Financial details are linked to these things, not inherent within them. If I buy PPC from Google, Google is charging me for visits it sends. In other words, it’s just counting visits. If all we can measure is people, visits and page views, it’s important to understand how accurate we can be about them. The bad news is that we can’t assess any of these with perfect accuracy.

This article is the first of a two-part series exploring the errors in all web analytics. In this first part, I’ll discuss the unavoidable inaccuracies that are caused by the nature of internet technology itself. In the second part, I’ll discuss problems that result from user behavior and the current state of web analytics software.

We Can’t Count Visitors

It’s not possible to count people on the web. They don’t exist. People don’t visit websites. Their computers do. The exact process is that a browser asks a server to send it a copy of a page. The browser reads that page and uses it to display something on screen. People aren’t even reading your site’s pages. They’re reading what their browser did to copies of those pages. Ask any designer how consistent that process is, then duck.

What few standards there are for web metrics have been laid down by JICWEBS. This is an international body composed of the audit standards bureaus for most countries, including the USA and all European countries. The JICWEBS standard for identifying a unique visitor is that it is the combination of the User Agent and IP address. The User Agent identifies the browser and operating system.

For example, mine is “Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322).” MSIE 6.0 tells me it’s Internet Explorer 6.0. Windows NT 5.1 tells me I’m running Windows XP. Many people are running IE 6 on XP, so that information alone is not enough to uniquely identify me. I also have an IP address, my internet address. By combining the full User Agent with the IP address, in theory, I am uniquely identified. This is far from accurate.

Every single person inside the Ford corporation has the same IP address. They all go onto the web from the same gateway in Chicago (even the 88,000 in Europe). Corporations hide internal IP addresses for valid security reasons. Most people in Ford have the same browser and operating system (what Ford calls the Global Client). Thus, according to the official standards, more than 320,000 people are the same unique visitor. This will hold true for any corporation with shared internet access and a common standard for its workstations.

On the reverse side, most home users or small businesses will be given a different IP address every time they connect to the web. This means they’ll look like a different person every visit. Each visit still gets counted, but over any period longer than a single visit you’re over-counting unique visitors and under-counting repeat visitors.
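Both effects are easy to see in a small Python sketch. Everything in it is hypothetical: the `Request` record, the names and the IP addresses (taken from ranges reserved for documentation) are invented for illustration, and a real log has no `person` field at all, which is exactly the problem.

```python
from collections import namedtuple

# Hypothetical log record. A real log never knows the person; the field is
# here only so the counted figure can be compared with the truth.
Request = namedtuple("Request", ["person", "ip", "user_agent"])

IE6_XP = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"

requests = [
    # Three employees behind one corporate gateway, all on a standard build.
    Request("alice", "192.0.2.10", IE6_XP),
    Request("bob", "192.0.2.10", IE6_XP),
    Request("carol", "192.0.2.10", IE6_XP),
    # One home user who is given a new address on each connection.
    Request("dave", "198.51.100.7", IE6_XP),
    Request("dave", "198.51.100.42", IE6_XP),
]


def count_unique_visitors(log):
    """Count distinct (IP address, User Agent) pairs."""
    return len({(r.ip, r.user_agent) for r in log})


real_people = len({r.person for r in requests})
counted = count_unique_visitors(requests)
print(f"real people: {real_people}, counted unique visitors: {counted}")
# prints: real people: 4, counted unique visitors: 3
```

The three employees collapse into one visitor while the single home user is counted twice. Here the two errors happen to nearly cancel; on a real site there is no reason to expect that.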

Cookies can improve this, and JICWEBS allows (but does not require) the use of cookies to identify unique visitors. However, people block or remove cookies. This is not limited to persistent (stored) cookies that last between visits. We’ve done detailed research into this and estimate that between three percent and five percent of all visitors are blocking session-only cookies as well. The percentage is highest among Unix users, and lowest among Mac users. In other words, the more techie the visitor, the more chance they’ll avoid being counted.

What all of this means is that you’re probably only getting about 90 percent accuracy with identification of unique visitors.

Monthly stats can be even more misleading on occasion. JICWEBS sets the standard for calculating unique visitors per month when producing audits. The official method for counting unique visitors per month is to calculate how many you got in a single day, and then multiply that by 31. Thus if 100,000 unique visitors came to my news site in a day, I have 3.1 million unique visitors per month. So be aware that “unique visitors per month” may not be counting how many different people actually visited the site that month. Or it may. It depends.
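To see how far the extrapolated figure can drift from reality, here is a minimal Python sketch. The function names and both example sites are invented for illustration, and the sketch assumes we somehow knew the true set of visitors each day, which, as discussed above, we never do.

```python
def extrapolated_monthly_uniques(daily_uniques, days=31):
    """The audit-style figure described above: one day's uniques times 31."""
    return daily_uniques * days


def actual_monthly_uniques(daily_visitor_sets):
    """Distinct visitors across the month, given each day's (hypothetical) visitor IDs."""
    return len(set().union(*daily_visitor_sets))


# Hypothetical site A: the same 100 loyal readers come back every day.
loyal_site = [set(range(100)) for _ in range(31)]

# Hypothetical site B: 100 completely new visitors every day.
churn_site = [set(range(day * 100, (day + 1) * 100)) for day in range(31)]

print(extrapolated_monthly_uniques(100))   # 3100 for both sites
print(actual_monthly_uniques(loyal_site))  # 100
print(actual_monthly_uniques(churn_site))  # 3100
```

Both sites report the same extrapolated monthly figure, yet one really has 31 times the audience of the other. The more loyal your readership, the more the multiplied number overstates it.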

We Can’t Count Duration

Think about what happens when someone is reading your site. They ask for a webpage. Then later they ask for another one. The time taken between the two requests is deemed to be spent reading the first page. Add all these durations up and you’ve got the total time of the visit.

This creates a problem for one-page visits. Since there is no second page, we can’t calculate a page duration. Officially a one-page visit is not a visit; it has to be two pages to count as a visit. Some packages won’t count the zero-duration one-page visits when they determine average visit duration, but you’d be surprised how many do. If you are using one that does, the zeros drag the average down, and you will think people spend less time on your site than they really do (only half as long, if half your visits are one-page visits).

Now think about what happens when someone reads the last page. There is no duration we can calculate for this page. What this means is that all web analytics packages are under-reporting the time people spend on your site. They have to because they can’t tell how long someone spends on the last page -- it never gets calculated.
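The duration arithmetic described in this section can be sketched in a few lines of Python. This is an illustration only, not how any particular package works: the function names are made up, and request times are given as plain seconds from the start of each visit.

```python
def page_durations(request_times):
    """Seconds credited to each page: the gap to the next request.

    The last page gets None because there is no later request to measure against.
    Assumes at least one request in the visit.
    """
    gaps = [later - earlier for earlier, later in zip(request_times, request_times[1:])]
    return gaps + [None]


def visit_duration(request_times):
    """Total measurable time for a visit; the last page contributes nothing."""
    return sum(d for d in page_durations(request_times) if d is not None)


def average_duration(visits, include_one_page_visits):
    """Average visit duration, with or without the zero-duration one-page visits."""
    if not include_one_page_visits:
        visits = [v for v in visits if len(v) > 1]
    return sum(visit_duration(v) for v in visits) / len(visits)


# Three hypothetical visits: three pages, one page, two pages.
visits = [[0, 60, 180], [0], [0, 120]]

print(page_durations(visits[0]))        # [60, 120, None]
print(average_duration(visits, True))   # 100.0
print(average_duration(visits, False))  # 150.0
```

Even the higher average is an under-count, because every visit still loses whatever time was spent on its final page.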

We Can’t Count Visits

The JICWEBS definition for a visit is that it is a series of page requests with a gap of no more than 30 minutes between each one. If someone asks for a page 31 minutes after the preceding page, it must be counted as a new visit. You’d be surprised how often this happens with complex products like mortgages, insurance, and other financial products. Generally, the more detailed the page, the more commitment required to buy, the more chance you’ll get the occasional page view that exceeds 30 minutes.

On the other hand, what if someone views your site, goes off and compares it with a competitor, then returns after 20 minutes? That still counts as part of the same visit. Technically it constitutes a single visit of two sessions, but almost no one distinguishes sessions from visits, even though the standard allows it.

What both of these cases illustrate is that our counting of visits is based on an arbitrary selection of 30 minutes as the magic number. For most purposes this is fine, so long as you accept it is our best attempt at a workable number, not an accurate measurement of reality.
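A minimal Python sketch of the 30-minute rule shows both cases. The function name and the timestamps are invented for illustration, and the sketch assumes all requests have already been attributed to one visitor, which is itself uncertain.

```python
from datetime import datetime, timedelta

VISIT_TIMEOUT = timedelta(minutes=30)


def split_into_visits(request_times, timeout=VISIT_TIMEOUT):
    """Group one visitor's requests into visits.

    A request starts a new visit when it arrives more than `timeout`
    after the previous request.
    """
    visits = []
    for t in sorted(request_times):
        if visits and t - visits[-1][-1] <= timeout:
            visits[-1].append(t)
        else:
            visits.append([t])
    return visits


# Hypothetical slow reader: 31 minutes studying one mortgage page.
slow_reader = [datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 31)]

# Hypothetical comparison shopper: leaves for 20 minutes, then comes back.
shopper = [
    datetime(2024, 1, 1, 10, 0),
    datetime(2024, 1, 1, 10, 5),
    datetime(2024, 1, 1, 10, 25),
]

print(len(split_into_visits(slow_reader)))  # 2
print(len(split_into_visits(shopper)))      # 1
```

The person who never left your site is counted as two visits, and the person who left to look at a competitor is counted as one. Changing the timeout moves the errors around without removing them.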

Conclusions

Web analysis is statistics, not accounting. Absolute precision is impossible. The problems listed above are an inevitable consequence of the nature of internet technology, not a sign that we don’t care or that analytics software is shoddy.

This inaccuracy is okay so long as you don’t get too excited about the fine detail. Statistics is fuzzy around the edges, so you shouldn’t make decisions based on small differences. Understand that your visitor stats are accurate plus or minus five percent or even 10 percent. Recognize that people are spending a little longer on your site than you can ever know, or maybe a little less. It depends on what you’re looking at.

Add a margin of error to financial and ROI calculations. In statistical analysis, there is the concept of “degrees of certainty,” what we ordinary folks call “margin of error.” It is possible to calculate this with slightly more precision than guesswork. If you want to get into extreme details with your analysis, you need to start incorporating concepts like this into your numbers.
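As a rough illustration of what adding a margin of error looks like in practice, the Python sketch below carries a plus-or-minus 10 percent uncertainty in the visitor count through to a conversion rate. It is a simple range calculation, not a formal statistical confidence interval; the function names and figures are invented, and it assumes the order count is exact because orders are recorded by your sales system rather than inferred from traffic.

```python
def with_margin(value, margin):
    """Return the (low, high) range for a value known to within +/- margin."""
    return value * (1 - margin), value * (1 + margin)


def conversion_rate_range(orders, visitors, margin=0.10):
    """Lowest and highest plausible conversion rate given uncertain visitor counts."""
    low_visitors, high_visitors = with_margin(visitors, margin)
    # More visitors than reported means a lower true rate, and vice versa.
    return orders / high_visitors, orders / low_visitors


# Hypothetical month: 200 orders from a reported 10,000 unique visitors.
low, high = conversion_rate_range(orders=200, visitors=10_000)
print(f"reported: {200 / 10_000:.2%}, plausible range: {low:.2%} to {high:.2%}")
# prints: reported: 2.00%, plausible range: 1.82% to 2.22%
```

On these numbers, a move from 2.0 percent to 2.1 percent between two months sits well inside the range and tells you nothing on its own.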

If you design your processes accordingly, the exact numbers shouldn’t matter too much. You are where you are today. You want to improve on this. The key to success is to concentrate on trends over time, not individual numbers.

Monday: Problems that result from user behavior and the current state of web analytics software.

Brandt Dainow is CEO of Think Metrics, creator of the InSite Web reporting system.