Friday, August 26, 2016

Free Rum

People have asked me what has helped me debug nasty, not-so-easily-reproduced bugs, and nearly always the answer is "exhaustive testing"...but what about when it isn't? OK, most of the time it really is exhaustive testing. Of course, I make it a habit to write code that handles every data type it might receive, even when I'm 99.999% sure it will only ever receive one. When 100M+ users run your code daily, you'd be surprised how often 0.001% happens: 0.001% of 100 million is still 1,000 people a day.
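As an illustration of that habit, here is a minimal sketch. The function name `toIdList` and the input shapes it accepts are hypothetical, not taken from the post. The function expects an array of IDs but still copes with a single number, a comma-separated string, `null`, or `undefined` instead of throwing:

```javascript
// Hypothetical helper: normalize "a list of IDs" no matter what shape arrives.
// An array is expected 99.999% of the time, but the other shapes are handled anyway.
function toIdList(value) {
  if (value === null || value === undefined) {
    return []; // nothing supplied
  }
  if (Array.isArray(value)) {
    // Normalize each element (nested arrays included) and flatten the results
    return value.flatMap(function (item) { return toIdList(item); });
  }
  if (typeof value === 'number') {
    return Number.isFinite(value) ? [String(value)] : []; // drop NaN/Infinity
  }
  if (typeof value === 'string') {
    // " 7 , ,8" -> ["7", "8"]
    return value.split(',').map(function (s) { return s.trim(); }).filter(Boolean);
  }
  return []; // objects, booleans, functions: ignore rather than throw
}

console.log(toIdList([1, '2,3', null])); // [ '1', '2', '3' ]
console.log(toIdList(42));               // [ '42' ]
```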

So, how do we debug issues that seem to occur only 0.001% of the time? We're not talking about use cases that fall within the "normal" range; we're talking about outliers - or what I call the oddball case. These are the times when all our unit tests pass and our code still fails in the "real world," where a few cases (extreme corner cases, admittedly) leave users with a "sub-par visit." These instances can easily bore a hole in our confidence (and ego...it's a little bit ego), leaving us scratching our heads in confusion and frustration. What do we do then?

As with any good pirate, the answer is RUM, and lots of it!

Go ahead, start making your own pirate references and talk like a pirate...I'll wait...and I'm pretty sure I can hear you singing....
"Fifteen men on the dead man's chest —
...Yo-ho-ho, and a bottle of rum!
Drink and the devil had done for the rest —
...Yo-ho-ho, and a bottle of rum!"
Robert Louis Stevenson, Treasure Island

Of course, in our case RUM doesn't refer to the pirate's beverage of choice but to Real-time (or sometimes just Real) User Monitoring. The emphasis, obviously, is on monitoring users, not on testing or guessing at what might be happening. That makes it different from synthetic user monitoring, which is a simulation (typically done as part of a comprehensive testing plan): RUM observes real people in real-world situations.

This last point - that it uses real people in real-world situations - cannot be stressed enough. Why? Because, in the situation this post addresses, our software has begun to move away from how people naturally think. People typically solve problems in a linear manner; it's how we've evolved. C came about because A and B happened. We make our selections in a store, go to the checkout, and pay. We don't suddenly jump out of line to go to the bank, apply for a credit card, and expect to return to our place in line with our basket still full. For a very long time, our software matched this model exactly, or very nearly so. Oh, we might have had a decision point that looped back into a process, but it was still a linear process. Most of the time this works well, but as anyone knows who has stood in a queue to order food behind a customer who hasn't quite decided what they want, synchronous, single-threaded experiences have their problems.

We tried to solve the synchronous, single-thread problem by adding threads. For the most part that works, but as we discovered, a pile of blocked threads isn't much more productive than a single blocked thread. The answer, then, was an asynchronous (non-blocking) approach. Now here we are, years later, often suffering in callback hell and struggling to debug software because our linear, synchronous intuition no longer applies to asynchronous systems...and that's where RUM enters our toolkit.
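To see why the linear "C came about because A and B happened" model breaks down, consider this small sketch using only Node.js timers. The task names and delays are made up for illustration, and the timers stand in for real non-blocking I/O. Two tasks are started in order, but they finish in the order their work completes, not the order they were started:

```javascript
// Record the order in which things actually happen.
var log = [];
var finished = [];

// Simulate non-blocking work (a stand-in for an I/O call) that takes `ms` milliseconds.
function startTask(name, ms, done) {
  log.push('start ' + name);
  setTimeout(function () {
    log.push('done ' + name);
    done(name);
  }, ms);
}

function onDone(name) {
  finished.push(name);
  if (finished.length === 2) {
    console.log(log.join(' -> '));
    // start A -> start B -> both started -> done B -> done A
  }
}

startTask('A', 30, onDone); // started first, but slower
startTask('B', 10, onDone); // started second, finishes first
log.push('both started');   // runs before either callback fires
```

A bug that depends on B finishing before A only shows up when the timing works out that way, which is exactly the kind of thing unit tests with fixed timing rarely exercise.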

There are, of course, many ways to implement (or pour) RUM. If you're running a traditional web server such as IIS or Apache, Splunk is a good choice. I've written a proxy server for a Splunk app and reviewed a couple of Splunk books - Splunk Developer's Guide and Learning Splunk Web Framework - so I have real respect for it; the downside is that Splunk can become expensive. For that reason alone, even if your operation uses Splunk in production, you may choose to forgo it for research and development.

As a no-cost alternative, you can pour your own RUM if you're using a node.js server. That's right, I said it - no-cost and RUM together - FREE RUM, (part of) every pirate's dream. It's relatively easy to do, and as long as you don't do something really boneheaded, you can build it securely, hoist the Jolly Roger, talk like a pirate, and follow the map below. Be warned: I'm not pointing out all the dangers. There are a few (fairly obvious) pitfalls, such as exposing private or confidential information, but this should get you into the general area of the treasure you seek. You can find a more complete example/prototype in my rum GitHub repo (https://github.com/hrobertking/rum).
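One of those pitfalls is shipping request headers, such as cookies or authorization tokens, to whoever is watching. Here is a minimal sketch of a guard, assuming you want to emit headers at all. The helper name `safeHeaders` and the deny-list are hypothetical, so extend the list for your own application:

```javascript
// Hypothetical deny-list of headers that commonly carry credentials or session data.
var SENSITIVE_HEADERS = ['authorization', 'cookie', 'set-cookie', 'proxy-authorization'];

// Return a copy of `headers` with sensitive values masked.
// Node.js already lower-cases incoming header names on request.headers;
// lower-casing again lets the helper handle hand-built objects too.
function safeHeaders(headers) {
  var copy = {};
  Object.keys(headers || {}).forEach(function (name) {
    var lower = name.toLowerCase();
    copy[lower] = SENSITIVE_HEADERS.indexOf(lower) === -1 ? headers[name] : '[redacted]';
  });
  return copy;
}

console.log(safeHeaders({ Host: 'example.com', Cookie: 'sid=abc' }));
// { host: 'example.com', cookie: '[redacted]' }
```

A deny-list is the simplest option to show. An allow-list (emit only the headers you know you need) is the safer default when in doubt.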

First, you'll need to assemble a crew and put them on a ship. Do that by installing the socket.io module (npm install socket.io), requiring it in your server module, and binding it to your httpServer instance.


Server-side JavaScript (Socket instantiation)

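var http = require('http');
var server = http.createServer(handler), // handler(request, response) is your own request handler
    io = require('socket.io')(server);   // bind socket.io to the same httpServer
server.listen(3000);                      // example port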

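Note that the server need not be a plain node.js http.Server you created yourself. With Express, for example, pass the http.Server that wraps the app (the value returned by http.createServer(app) or app.listen()). Don't pass the Express app itself: it's a request handler function, not a server, and socket.io will refuse to attach to it.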

Now that you have a crew and a ship, decide when you'll raise the Jolly Roger. Do that by emitting an event through the socket and passing an object with it, e.g., io.emit(event_type, event_object);. Calling emit on the io server instance broadcasts the event to every connected client.
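If the emit/on pairing is new to you, Node's built-in EventEmitter follows the same named-event pattern as socket.io, just inside a single process. This sketch uses it as a stand-in for the socket (no socket.io involved; the event names match the ones used in this post and the payloads are made up) to show that each event name reaches only the handlers registered for that name:

```javascript
var EventEmitter = require('events');

// Stand-in for the socket: in the real setup, io.emit() on the server
// delivers to socket.on() handlers in every connected browser.
var channel = new EventEmitter();
var seen = [];

channel.on('message', function (obj) {
  seen.push('message:' + obj.action);
});
channel.on('http error', function (obj) {
  seen.push('http error:' + obj.statusCode);
});

channel.emit('message', { action: 'request' });
channel.emit('message', { action: 'response' });
channel.emit('http error', { statusCode: 404 });
channel.emit('unhandled', {}); // no listener registered, so nothing happens

console.log(seen); // [ 'message:request', 'message:response', 'http error:404' ]
```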

The event type is a string, so it can be named (nearly) anything, and you can emit different events at different times, as in the example below.


Server-side JavaScript (Event Handlers)

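// Inside your request handler, e.g. function handler(request, response) { ... }
var id = unique_id(); // unique_id() is your own ID generator (not shown)

// Build a small, plain object for each event. The raw request/response objects
// contain circular references (and private data such as cookies), so they
// can't be sent through the socket as-is.
function snapshot(action) {
  return {
    id: id,
    action: action,
    req: { method: request.method, url: request.url },
    res: { statusCode: response.statusCode }
  };
}

// 'end' fires only after the request body has been consumed; if your handler
// never reads the body, call request.resume() so the event still fires.
request.on('end', function() {
  io.emit('message', snapshot('request'));
});

response.on('finish', function() {
  var msg = snapshot('response');
  io.emit('message', msg);

  if (response.statusCode >= 400) { // 4xx/5xx only; a 201 or 304 isn't an error
    io.emit('http error', msg);
  }
});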
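The handler snippet above calls a unique_id() helper that the post doesn't define. Any scheme works as long as each request gets a distinct ID. Here is a minimal sketch using Node's built-in crypto module. It is one possible approach, not necessarily what the rum prototype does:

```javascript
var crypto = require('crypto');

// Hypothetical implementation of the unique_id() helper.
// 8 random bytes -> 16 hex characters, prefixed so the value also makes a
// tidy HTML element id (the spyglass page uses it with getElementById).
function unique_id() {
  return 'req-' + crypto.randomBytes(8).toString('hex');
}

console.log(unique_id()); // e.g. req-3f9c0a12b7e4d5aa (random each call)
```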


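Now put the spyglass to your eye and scan the horizon. Your spyglass is a static page that loads the socket.io client script (https://cdn.socket.io/socket.io-1.3.5.js) and opens a socket. Serve it from the same node.js server: calling io() with no URL connects back to the host that served the page. Also make sure the client script's version is compatible with the socket.io version on your server. The page then listens on that socket, and your event handlers run whenever an event arrives.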


Your 'spyglass' document

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <title>Spyglass</title>
    <style type="text/css">
      .request { background-color:red; }
      .response { background-color:yellow; }
      .request.response { background-color:green; }
    </style>
  </head>
  <body>
    <script src="https://cdn.socket.io/socket.io-1.3.5.js"></script>
    <script>
      // io() with no URL connects back to the server that served this page
      var socket = io();

      // Append a paragraph via textContent so request data is never parsed as HTML
      function addLine(node, text) {
        var p = document.createElement('p');
        p.textContent = text;
        node.appendChild(p);
      }

      // Find the entry for a request id, creating it the first time it's seen
      function entryFor(id) {
        var node = document.getElementById(id);
        if (!node) {
          node = document.createElement('div');
          node.id = id;
          document.body.appendChild(node);
        }
        return node;
      }

      socket.on('message', function(obj) {
        var node = entryFor(obj.id);
        node.classList.add(obj.action);   // 'request', 'response', or both
        if (!node.firstChild) {
          addLine(node, obj.req.url);     // show the URL once per request
        }
      });

      socket.on('http error', function(obj) {
        var node = entryFor(obj.id);
        node.classList.add('http-error');
        addLine(node, obj.res.statusCode);
      });
    </script>
  </body>
</html>


Now, as you watch the 'spyglass' document in your browser, every event the node.js server emits through io is picked up by the matching socket.on handler in the spyglass.
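The color coding in the spyglass boils down to simple bookkeeping. An entry is red while only the request has been seen, yellow if only the response has arrived, and green once both have. Here is a sketch of that bookkeeping without the DOM, which makes it easy to test. The `applyEvent` and `colorOf` functions and the entry shape are hypothetical; the event payloads follow the `{ id, action, req, res }` shape used in the handler code:

```javascript
// Hypothetical DOM-free model of the spyglass page.
// state maps a request id -> { url, actions, errors }
function applyEvent(state, type, obj) {
  if (!state[obj.id]) {
    state[obj.id] = { url: null, actions: [], errors: [] };
  }
  var entry = state[obj.id];

  if (type === 'message') {
    if (entry.actions.indexOf(obj.action) === -1) {
      entry.actions.push(obj.action); // like adding a CSS class only once
    }
    entry.url = obj.req.url;
  } else if (type === 'http error') {
    entry.errors.push(obj.res.statusCode);
  }
  return state;
}

// Same precedence as the stylesheet: request + response = green,
// request only = red, response only = yellow.
function colorOf(entry) {
  var sawRequest = entry.actions.indexOf('request') !== -1;
  var sawResponse = entry.actions.indexOf('response') !== -1;
  if (sawRequest && sawResponse) return 'green';
  if (sawRequest) return 'red';
  if (sawResponse) return 'yellow';
  return 'none';
}

// Replay a request that ends in a 404
var state = {};
applyEvent(state, 'message', { id: 'a', action: 'request', req: { url: '/' }, res: { statusCode: 200 } });
console.log(colorOf(state.a)); // red
var done = { id: 'a', action: 'response', req: { url: '/' }, res: { statusCode: 404 } };
applyEvent(state, 'message', done);
applyEvent(state, 'http error', done);
console.log(colorOf(state.a), state.a.errors); // green [ 404 ]
```

An entry that stays yellow is itself a clue: the response went out, but the request's 'end' event never fired, which usually means nothing consumed the request body.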

Now you have insight into the important pieces of your code, not as you test them (you can get that elsewhere, e.g., from the Chrome DevTools) but as other people actually use them. It's debugging in those real-world situations that takes our development to the next level in our drive for results. Now go get some pirate booty.

Happy coding.
