When I first started, we compiled analytics directly from the web server, watching primarily how many visitors there were, how long it took to push the page(s) out, and what paths they followed through the site. The data showing what visitors were reading was helpful - we could see which pages were most popular and which path was most travelled. For merchants, this gives a window into cross-selling opportunities and more.
As the technology matured, we noticed that the longer it took to download pages, the less likely people were to buy stuff. So we started tracking page weight a little more closely...and then, once we realized that a 100KB image is not the same as 100KB of HTML, which is not the same as 100KB of CSS, which is not the same as 100KB of JavaScript, we started looking at the weight of each individual resource and, ultimately, at this mysterious measure we now call 'time to interactive'.
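If you want to see that breakdown for yourself, the browser's Resource Timing API reports a transfer size for every resource the page pulled in. A minimal sketch (the bucketing by `initiatorType` is just one way to slice it, and cross-origin resources without a Timing-Allow-Origin header will report a size of 0):

```typescript
// Sum transferred bytes per resource type using the Resource Timing API.
// Grouping by initiatorType ("img", "script", "link", "css", ...) is one
// reasonable way to bucket page weight; it is not the only one.
const bytesByType: Record<string, number> = {};

for (const entry of performance.getEntriesByType("resource")) {
  const res = entry as PerformanceResourceTiming;
  const type = res.initiatorType || "other";
  // transferSize is 0 for cross-origin resources without Timing-Allow-Origin
  bytesByType[type] = (bytesByType[type] || 0) + res.transferSize;
}

console.table(bytesByType); // e.g. { img: 412300, script: 180044, css: 22910, ... }
```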
I should take a minute to point out that all of these numbers are important, and if you're not keeping track of all of them, and if your [insert your name for front-end developers here] don't understand the difference between each of these categories and why they are each important, you're setting yourself up for failure.
Here's the thing with this shift, though...the only way to get numbers directly from the user's machine (sometimes called 'Real User Measurement', or RUM) is with JavaScript. So the tool we use to measure our performance has to be delivered to the browser before it can measure anything.
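To make 'delivered first' concrete, here's a minimal sketch of the kind of RUM beacon we're talking about. The `/rum-beacon` endpoint and the payload fields are assumptions for illustration; the point is that none of this runs until the page, and this script, have already arrived:

```typescript
// Minimal RUM beacon sketch: gather navigation timings after the page loads
// and post them back to a collection endpoint.
window.addEventListener("load", () => {
  const nav = performance.getEntriesByType("navigation")[0] as PerformanceNavigationTiming;
  const payload = JSON.stringify({
    url: location.pathname,
    ttfb: nav.responseStart,            // time to first byte, relative to navigation start
    domInteractive: nav.domInteractive, // when the DOM became interactive
    loadEventStart: nav.loadEventStart, // when the load event began
  });
  // sendBeacon queues the request even if the user navigates away immediately after
  navigator.sendBeacon("/rum-beacon", payload);
});
```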
So the first problem in this shift from server to client measurement is that the total time the user actually experiences cannot be accurately measured - the measuring tool is itself part of what has to arrive before anything can be recorded.
The closest we can come is to calculate the difference between the time (on the server) the response packet was pushed out and the time (again on the server) the request packet fired by the 'interactive' event was received. Even that number is unreliable, though, because it contains two network traversals, and while there is a way to measure the upstream time, there is no way to measure the downstream time (and I have yet to see anything in the wild that measures even the upstream time).
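Here's a rough sketch of the client's half of that calculation, under some loudly stated assumptions: the server writes its send timestamp into the page as `window.__serverSentAt` (an invented name for illustration), and the `/rum-beacon` handler subtracts it from its own receive time when the beacon arrives. The result still bundles the downstream traversal, the upstream traversal, and any clock drift between the two machines - which is exactly why it's unreliable:

```typescript
// Sketch only: __serverSentAt is assumed to have been stamped into the HTML
// by the server (epoch ms). DOMContentLoaded stands in for "the 'interactive'
// event" here.
document.addEventListener("DOMContentLoaded", () => {
  const payload = JSON.stringify({
    serverSentAt: (window as any).__serverSentAt, // stamped by the server, by assumption
    clientInteractiveAt: Date.now(),              // client clock; not comparable without skew correction
  });
  navigator.sendBeacon("/rum-beacon", payload);
});
```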
The second problem in this shift from server to client measurement is that it relies on JavaScript.
Over the course of my career, I've heard numerous times that this reliance on JavaScript is not an issue because the majority of users have JavaScript enabled, but again, we run into a number that cannot be accurately measured. In order for a user to be measured...
- they have to receive our response containing the JavaScript code that sends the request(s) to the appropriate recording mechanism on the server
- that code has to be loaded, parsed, and executed by the browser
- that code cannot conflict with, or depend on, any other code that may or may not be loaded (a sketch of that isolation follows this list)
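Here's roughly what that last requirement looks like in practice: the tracker lives in its own scope, feature-detects rather than assuming anything else is present, and fails silently so a broken tracker never breaks the page. The `/rum-beacon` endpoint is the same illustrative placeholder as above:

```typescript
// Isolated, defensive tracking snippet (sketch). If anything here fails,
// the only consequence is one more visitor who simply goes unmeasured.
(() => {
  try {
    if (!("sendBeacon" in navigator)) return; // feature-detect, don't assume
    const payload = JSON.stringify({ url: location.pathname, at: Date.now() });
    navigator.sendBeacon("/rum-beacon", payload);
  } catch {
    // swallow the error: measurement must never take the page down with it
  }
})();
```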
One group measuring interactions of a first-world community with a relatively high degree of trust (a group that would have little to no reason to have JavaScript disabled or otherwise unavailable) found that nearly 3 percent of interactions were untracked by their client-side solution.
Let's assume, however, that everything goes well: all of our code is delivered to the end user, there are no conflicts, all the request packets are coming in, and we're able to measure everything - even the upstream traversal time. All of this data gives us a table we'll call "page views by browser", which contains the following records:

| Browser | Views | Share |
|---------|-------|-------|
| Chrome  | 67    | 55%   |
| Safari  | 29    | 24%   |
| Firefox | 19    | 15%   |
| Other   | 5     | 6%    |
What does this data tell you? It tells you that 67 visitors used Chrome, 29 visitors used Safari, and 19 visitors used Firefox. That's all. Even if you add the version(s) used and the operating system, and you were able to tell which were mobile versus desktop users, that information is not typically tied into conversion information...and it doesn't tell you anything about the 5 people who stuck around long enough to be tracked but whose numbers were too small for their details to be reported.
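The mechanics behind that table are simple enough to sketch, and the sketch also shows how those 5 visitors vanish into "Other". The `browser` field and the 5 percent reporting threshold are assumptions for illustration:

```typescript
// Aggregate beacon records into a "page views by browser" table, collapsing
// any browser below the reporting threshold into "Other".
interface BeaconRecord { browser: string; }

function pageViewsByBrowser(records: BeaconRecord[], threshold = 0.05) {
  const counts = new Map<string, number>();
  for (const r of records) {
    counts.set(r.browser, (counts.get(r.browser) ?? 0) + 1);
  }

  const total = records.length;
  const table: Record<string, { views: number; share: number }> = {};
  let other = 0;
  for (const [browser, views] of counts) {
    if (views / total < threshold) {
      other += views; // everything below the threshold is flattened into "Other"
    } else {
      table[browser] = { views, share: views / total };
    }
  }
  if (other > 0) table["Other"] = { views: other, share: other / total };
  return table;
}
```

Whatever those 5 visitors had in common, the only thing the report will ever say about them is "Other".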
Additionally, if we were to combine the 6 percent in the "Other" category with the roughly 3 percent that were untracked, that's nearly 10 percent of customers you have very little information about.
The third problem in this shift from server to client measurement is that it does not measure anyone in the 'click away' category: those for whom everything loads properly but who get bored with how long your site is taking to respond and leave before the measurement ever reports anything.
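One rough way to size that gap (click-aways plus the untracked) is to compare what the server saw against what the beacons reported. A sketch, with both counts assumed as inputs:

```typescript
// Compare HTML responses the server actually served (from access logs) with
// beacons the client-side tooling sent back; the difference is the population
// your RUM data says nothing about.
function untrackedShare(htmlResponsesServed: number, beaconsReceived: number): number {
  if (htmlResponsesServed === 0) return 0;
  return (htmlResponsesServed - beaconsReceived) / htmlResponsesServed;
}

// e.g. 10,000 pages served but only 9,100 beacons received => 9% of visitors
// (click-aways, blocked or broken JavaScript, lost packets) are invisible to you.
console.log(untrackedShare(10_000, 9_100)); // 0.09
```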
Of course there are ways to correct some of the deficiencies in this shift from server to client measurement, and there are ways to address some of the issues your improved analytics can identify. You have to be willing to put in the work to identify where your analytics package/process falls short and improve it...or, alternatively, you can keep believing the myth that your 'analytics' is describing your (potential and actual) customers when it is really only describing a subset of them.
I'd encourage you to review the data you're collecting and see the shortfall(s); from that point on, it's just a matter of creative coding to rectify the issues.
Happy coding.