In more than a decade in the web analytics industry, one thing we hear frequently is people telling others not to sweat data quality. All that matters, they say, are the trends.
That advice is true to a certain extent, since web analytics should never be mistaken for an accounting tool, but you should not gloss over data quality. The integrity of your data is paramount to good web analytics. If your data is inaccurate, you are effectively trending bad data, which means you are making important decisions based on erroneous data.
Here are three reasons why you may not be getting accurate -- and hence, useful -- data out of your web analytics system:
What you get out of your web analytics system is a direct result of what you put into it; in other words, of how you implement it. If your implementation is wrong, you are not going to get good-quality data, trended or otherwise. In one particularly vivid case we saw, the quantity and price variables in the ecommerce tagging had been swapped. The reports still produced trends, but instead of showing the sale of one large-screen HDTV for $1,000, they showed the sale of 1,000 HDTVs at $1 each. It's a perfect example of why data quality matters. Mistakes like this are easier to catch on ecommerce sites, because your back-office systems give you an external checkpoint to compare against. In many other cases, however, there is no back-end system to compare with. If you miss tagging some pages, for example, both page views per visit and total page views will trend down, and you may never learn the real reason. Check and re-check your implementation. Get professionals involved to make sure it is correct and that you are capturing the right data points. Only then is your data worth trending.
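One subtle point about the swapped-variable case: revenue (quantity × price) comes out the same either way, since 1 × $1,000 and 1,000 × $1 are both $1,000, so a revenue-only reconciliation against back-office data would not flag it. The minimal Python sketch below assumes hypothetical order-line records; `OrderLine` and `find_mismatches` are illustrative names, not part of any analytics product. It compares units and prices line by line.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class OrderLine:
    # Hypothetical record shape; real tagging and order exports will differ.
    order_id: str
    sku: str
    quantity: int
    unit_price: float


def find_mismatches(
    tagged: List[OrderLine], back_office: List[OrderLine]
) -> List[Tuple[OrderLine, str]]:
    """Compare tagged order lines with back-office lines, keyed by (order_id, sku)."""
    reference = {(line.order_id, line.sku): line for line in back_office}
    problems = []
    for line in tagged:
        ref = reference.get((line.order_id, line.sku))
        if ref is None:
            problems.append((line, "missing from back office"))
        elif line.quantity == ref.quantity and line.unit_price == ref.unit_price:
            continue  # tagged line agrees with the system of record
        elif line.quantity == ref.unit_price and line.unit_price == ref.quantity:
            problems.append((line, "quantity and price look swapped"))
        else:
            problems.append((line, "quantity or price mismatch"))
    return problems


# The HDTV case: one set at $1,000 recorded by the tags as 1,000 sets at $1.
tagged = [OrderLine("A100", "HDTV-LARGE", quantity=1000, unit_price=1.0)]
back_office = [OrderLine("A100", "HDTV-LARGE", quantity=1, unit_price=1000.0)]

# Revenue matches, so a revenue-only check would pass...
assert sum(line.quantity * line.unit_price for line in tagged) == sum(
    line.quantity * line.unit_price for line in back_office
)
# ...but a line-level comparison catches the swap.
print(find_mismatches(tagged, back_office))
```

The design choice here is to key on order and product rather than on daily totals, because aggregate checks can balance out even when individual fields are wrong.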
Even if your implementation is correct, other factors can affect your data quality. Five years ago, the digital world was much simpler. Nearly everyone accessed your website through a computer, which made it easy to determine, with simple tracking tools, how many people were visiting. Today, people reach your website from a PC or laptop, an iPhone or Android phone, and an iPad or other tablet, so the same person may show up as three separate visitors in your global system. Over time, you may see a large increase in your reported user base and conclude that you have doubled or tripled it, when in reality you are reaching the same audience as before. The steady increase in cookie blocking and deletion is another real-world example of new technology affecting data quality. Thanks in part to increasing government scrutiny and heightened consumer awareness, more and more people are deleting or blocking cookies in their browsers. Cookies are small files that websites store on your computer to record (anonymously) whether you have visited the site before. Without the cookie, your web analytics system counts returning visitors as new visitors, even though they may have visited your site many times before. If you've seen a rising trend in "new visitors" (or falling visits per visitor) over the last two years, what you may really be measuring is a growing data quality problem. Trending doesn't solve the problem; it is the problem.
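Both effects come down to the same mechanism: the tool counts cookies, not people. A cleared cookie, or a second device with its own cookie, looks like a brand-new visitor. The toy Python simulation below assumes a hypothetical visit log in which we, unlike the analytics tool, know which real person made each visit; `cookie_based_counts` is an illustrative name, not a vendor API.

```python
from typing import Dict, List, Tuple


def cookie_based_counts(visits: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Simulate cookie-based visitor counting.

    Each visit is (person, cookie_cleared_before_visit), in time order. The
    person label is ground truth the analytics tool never sees; the tool only
    sees whether the browser presents a cookie it has issued before.
    """
    cookie_of: Dict[str, int] = {}  # person -> cookie ID currently in their browser
    cookies_issued = 0
    for person, cookie_cleared in visits:
        if cookie_cleared or person not in cookie_of:
            # No recognizable cookie: the tool issues a new ID and counts a new visitor.
            cookies_issued += 1
            cookie_of[person] = cookies_issued
    real_people = len(cookie_of)
    return {
        "reported_visitors": cookies_issued,
        "real_people": real_people,
        "reported_visits_per_visitor": len(visits) / cookies_issued,
        "real_visits_per_visitor": len(visits) / real_people,
    }


# Two people, five visits; each clears cookies once along the way.
log = [
    ("alice", False),
    ("bob", False),
    ("alice", False),
    ("bob", True),
    ("alice", True),
]
# Reported: 4 visitors at 1.25 visits each; reality: 2 people at 2.5 visits each.
print(cookie_based_counts(log))
```

The same arithmetic applies to the multi-device case: a person who visits from a phone, a tablet, and a laptop presents three different cookies, which this model would treat like two cookie deletions.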
Sometimes changes in your trends are a direct result of vendors changing their algorithms and have nothing to do with what's happening on your site. If you think such changes are rare, think again: over the last year, the two biggest web analytics providers have both changed the way they define visits (site sessions).
In August 2011, Google announced a sudden change to the way it measures visits in Google Analytics. Before the change, a new visit began only when more than 30 minutes elapsed between two page views. Under that model, a visitor who clicked on two separate Google search results within a 30-minute period counted as just one visit. Now a new visit also begins whenever any of the referring information associated with the visitor changes. In other words, if people keep trying different search phrases to find you on Google within a 30-minute period, each new search that brings them to your site counts as a separate visit. In most cases, the result is an automatic increase in visits in Google Analytics without you doing a thing. Similar changes occurred when Omniture rolled out SiteCatalyst V15, which changed (improved, fortunately) the way visits without cookies are counted and the way time on site is calculated. The first of these changes in particular can significantly shift the visits metric. Trending visits across either change without adjusting for the new methodology is completely misleading.
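The difference between the two visit definitions can be sketched as a simple sessionization rule. This is a simplified Python model based only on the description above, not Google's actual implementation: each hit is assumed to carry a timestamp in minutes and an external referrer (None for clicks within your own site), and `count_visits` is a hypothetical helper.

```python
from typing import List, Optional, Tuple

TIMEOUT_MINUTES = 30


def count_visits(
    hits: List[Tuple[int, Optional[str]]], split_on_new_referrer: bool
) -> int:
    """Count visits in one visitor's time-ordered hits.

    Each hit is (minute, referrer); referrer is None for internal navigation.
    split_on_new_referrer=False models the older timeout-only rule;
    True models the newer rule that also starts a visit on a new referrer.
    """
    visits = 0
    last_minute: Optional[int] = None
    current_referrer: Optional[str] = None
    for minute, referrer in hits:
        timed_out = last_minute is None or minute - last_minute > TIMEOUT_MINUTES
        referrer_changed = (
            split_on_new_referrer
            and referrer is not None
            and referrer != current_referrer
        )
        if timed_out or referrer_changed:
            visits += 1
            current_referrer = referrer
        last_minute = minute
    return visits


# A visitor tries two different search phrases 10 minutes apart,
# then returns directly 38 minutes after their last page view.
hits = [
    (0, "google: hdtv"),
    (2, None),
    (10, "google: large screen tv"),
    (12, None),
    (50, None),
]
print(count_visits(hits, split_on_new_referrer=False))  # 2 visits (old rule)
print(count_visits(hits, split_on_new_referrer=True))   # 3 visits (new rule)
```

Where hit-level data is available, running the same history through both rules is one way to gauge how much of an apparent jump in visits is purely definitional.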
The belief that data quality problems affect your trended data only randomly is clearly mistaken, and the issue shouldn't be downplayed. You need confidence in the integrity of your data before you can truly analyze it and make decisions. Data quality comes first on your checklist; trending comes second.
Ali Behnam is co-founder of Tealium.
Gary Angel is co-founder and president of Semphonic.
On Twitter? Follow iMedia Connection at @iMediaTweet.