
The web analytics standard that failed us

By Brandt Dainow
In my last article, I made reference to some definitions in web analytics standards and was criticized by a number of people who said I had ignored what was in the International Web Analytics Association's (WAA) standard. They were right. The reason I ignored the WAA standard is that I don't consider it to be a standard. To be a standard, something must be either rigorously precise or in widespread use. The WAA document is neither.

One of the world's leading standards bodies is the British Standards Institute (BSI), whose standards are used in the majority of countries in the world, including the U.S. According to the BSI, a standard should contain "a technical specification or other precise criteria designed to be used consistently." The WAA document is anything but precise.

Web technology is a branch of computing science, which is a branch of engineering. All computing standards can be reduced to precise engineering specifications, which can themselves be reduced to mathematical equations in physics.

If you wanted to, you could express a computing standard, such as HTTP, in terms of the mathematical formulas of the raw sub-atomic physics occurring within the electronic components of a computer. If the WAA standard followed this model, it would provide precise definitions. For example, it would be able to say that a "visit" consists of a series of "page requests" for combinations of specified types of files. The standard would then define a "page request" as a specific type of HTTP GET request. We could then refer to the standard for HTTP if we wanted to break this down further.
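
To make this concrete, a page request at this level is just a message sent from the browser to the server. A minimal HTTP 1.1 GET request looks like the following (the path and host are invented for the example):

GET /index.html HTTP/1.1
Host: www.example.com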

HTTP GET is defined in the Internet Engineering Task Force's (IETF) standard for HTTP 1.1, RFC 2616. The communications between browser and web server contained within RFC 2616 can be defined in terms of the standard for TCP/IP, which is RFC 1122, and so on, all the way down to the pure math of sub-atomic physics.


These definitions should be provided in a clear and unambiguous way that leaves absolutely no room for interpretation. Most internet standards come from the IETF or from the W3C. In broad terms, the IETF handles the hardware-related stuff, such as TCP/IP and HTTP, while the W3C handles the "soft" stuff such as HTML and XML. Where possible, both provide their definitions in a special notation called Extended Backus-Naur Form, or EBNF. EBNF is also used to define the syntax of programming languages, and it is so precise that you can feed a language's EBNF grammar into parser-generating software and get working parsing code out automatically.


EBNF achieves this precision by avoiding words and using symbolic notation instead. For example, the definition for a vowel would look like this in EBNF:


Vowel := A | a | E | e | I | i | O | o | U | u


As you may have guessed, "|" is used as "or." Notice the precision. There is no possibility of ambiguity or misunderstanding -- a computer could understand this. This is the level of precision you need in order to build software that can implement your rules.
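
To show just how direct the path from grammar to software is, here is a minimal sketch in Python (my own illustration, not anything from a standard) that implements the vowel rule exactly as written:

def is_vowel(ch):
    # A direct transcription of the rule:
    # Vowel := A | a | E | e | I | i | O | o | U | u
    return ch in ("A", "a", "E", "e", "I", "i", "O", "o", "U", "u")

print(is_vowel("e"))  # True
print(is_vowel("x"))  # False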


By contrast, the WAA's definition of a page is "an analyst definable unit of content." In other words, a page is whatever you say it is. That's not just a vague definition, it literally is no definition at all. What this says is "whatever you want." I cannot implement that in web analytics software; I cannot check to see if someone is compliant with the standard because (by definition) everyone is; I can't even compare my data with someone else's because I don't know if they are measuring the same thing as me.


EBNF is not an academic ideal. HTTP, XML and many other standards are defined using EBNF-style grammars. In other words, the web runs on EBNF. For example, here's the rule from the HTTP standard for specifying the version of HTTP in a message:


HTTP-Version := "HTTP" "/" 1*DIGIT "." 1*DIGIT
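Purely as an illustration, that rule translates straight into checking code. The regular expression below is my own rendering of the rule, not part of the standard itself; 1*DIGIT simply means "one or more digits":

import re

# HTTP-Version := "HTTP" "/" 1*DIGIT "." 1*DIGIT
HTTP_VERSION = re.compile(r"^HTTP/\d+\.\d+$")

print(bool(HTTP_VERSION.match("HTTP/1.1")))  # True
print(bool(HTTP_VERSION.match("HTTP/one")))  # False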


Some people have argued it is not possible to achieve this level of precision in web analytics. That is, of course, rubbish -- it has already been done, and done many times.

Every piece of web analytics software operates to the level of precision offered by EBNF -- it has to. Web Trends, Omniture and Google Analytics can't decide on the fly what constitutes a page. These are just programs. They work mathematically, with mathematical precision. The people who wrote them had to put 100 percent precise definitions into their system. It's simply not possible to write vague computer code. Computers can't decide for themselves. Even when they appear to, all they can do is select from a set of pre-determined, precisely-defined alternatives. 


Programmers are not the only ones who have created precise web analytics definitions; other standards bodies have as well. The first web analytics standards were created in the 1990s by the Joint Industry Committee for Web Standards (JICWEBS). For example, the WAA says a unique visitor is "an inferred person." By contrast, JICWEBS defines a unique visitor as "a unique and valid identifier. Sites may use IP+User-Agent and/or Cookie." This is how it is done in practice by software. For example, Google says "a visitor is defined using a unique numeric identifier stored in the Google Analytics tracking cookies." Google couldn't implement the WAA definition if it wanted to, because the concept of an "inferred person" is meaningless in computing terms.
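
As a rough sketch of how a definition like that becomes code (the function name, fields and choice of hash below are my own assumptions for illustration; real packages differ in the details), an identifier built from the cookie where available, or from IP plus User-Agent otherwise, looks something like this:

import hashlib

def visitor_id(ip, user_agent, cookie_id=None):
    # Prefer the cookie identifier when one exists; otherwise fall back
    # to IP + User-Agent, in line with the JICWEBS wording quoted above.
    if cookie_id:
        return cookie_id
    return hashlib.sha1(f"{ip}|{user_agent}".encode("utf-8")).hexdigest()

print(visitor_id("203.0.113.7", "Mozilla/5.0"))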


In the early days there was a push by some people to have the WAA standards committee work to this level of precision. However, the committee decided it didn't want anything in the standard that would be "too complicated." In other words, it was more concerned with being popular than with doing the job it was appointed to do.


I think the WAA underestimated our intelligence.


The standard doesn't even meet the requirements the WAA standards committee set for itself. When the committee started work on the standard, it decided to accept the existing standards from bodies like JICWEBS. It recognized, or said it did, that having competing standards would move the industry backwards, not forwards. You could debate whether the WAA followed through on this. While the WAA's definitions are certainly at variance with the JICWEBS standards, you could argue that they don't compete because they aren't clear enough to be an alternative -- vague waffling can't be seen as an alternative to a precise definition.


However, it is not necessary to be this precise in order to be a standard. Something can also become a standard because people actually use it as one. For example, the IETF considers something a standard once it "has multiple, independent, and interoperable implementations" -- in other words, once there are multiple deployed systems that use a definition in the same way.


This is certainly the case for the JICWEBS standards -- they are used universally for online auditing. Furthermore, every online audit is "interoperable." This means the readership numbers you get from one online publication have been calculated in exactly the same way as the readership numbers from a competing publication. When you compare them, you are comparing like with like.

The problem for the WAA is that absolutely nobody is using its standard, and nobody can -- it's too vague. Companies like Google and Indextools have publicly stated where they comply with the WAA standards, but none are 100 percent compliant. Even if they were, it wouldn't mean much. For example, if a page view can be whatever you like, then to be compliant all you need to do is report page views, no matter how you calculate them. This would not make WAA-compliant systems interoperable; they could be calculating totally different things under the same name.


We could do better. The WAA standards are not vague because it's impossible to define or measure these things, but because nobody could be bothered to work up an intellectual sweat and try. Precise, robust, usable standards have been created for things far more complicated than visitor behavior on websites. My guess is that truly precise definitions would push beyond what today's packages do, so that no software was 100 percent compliant. This has been the problem with the JICWEBS standards: the groups that define them are dominated by the web analytics software manufacturers, who have said they won't create a standard that isn't already implemented in their systems.


Both JICWEBS and WAA are wrong in this regard. You don't look at the state of things, write it down and call it a standard. Can you imagine if we created laws like that? "Well, people get angry and kill each other, so we'll make that legal."


The WAA should be setting the agenda, not following the crowd. The task of the WAA standards committee should be to determine how web analytics metrics should be calculated in order to achieve the highest degree of precision possible. The WAA should be laying out the roadmap for the way things should be. It then falls to the vendors to bring their software into line. It may be that some definitions that came out of such a process would be impossible to implement with today's technology. So what? That's what R&D is all about. If we only aimed for what we already knew, we'd still be sitting in caves picking fleas off each other. The entire web is the result of people creating standards first that were later picked up by vendors.


Tim Berners-Lee didn't invent HTML by copying what IBM and Microsoft were doing. He designed the system first and then found people to encode it in software. Google didn't copy what Yahoo was doing -- it invented its ranking algorithms on paper first, then worked out how to implement them in software. I'll bet Google had no idea how it was going to implement some of its algorithms when it wrote them. The company still has things it wants to do but no idea yet how to do them. That's what keeps Google moving forward.


What the WAA has done is a retrograde step -- the WAA standard has less precision and utility than the JICWEBS standards, so it moves us backward not forward. However, WAA is a major force in the world of web analytics and online marketing. What it says matters. In this light, the work of the WAA standards committee is a disaster for the web analytics community. It will take years to undo the damage and create proper precise standards that can be implemented in software. The WAA "standard" is not a standard, it's just second-rate muttering.



Brandt Dainow is an independent web analytics and marketing consultant working in the U.K. and Ireland.

Brandt is an independent web analyst, researcher and academic.  As a web analyst, he specialises in building bespoke (or customised) web analytic reporting systems.  This can range from building a customised report format to creating an...


Comments


Commenter: Vincent Amari

2009, January 12

Absolutely agree with this whole article, and well done to Brandt for trying to enforce some sanity into this subject.

Commenter: Ron Duquette

2009, January 06

I agree with you -- there is no clear standard today for web analytics, but many are trying to create one (sometimes for their own benefit). I would focus on what works and doesn't work right now; standards will work their way in around success and broad adoption...

Ron Duquette - Neighborhood America