Wednesday, March 4, 2015

Time is a river of events, and strong is its current

Time is a river of events, and strong is its current;
no sooner is a thing brought to sight than it is swept by
and another takes its place, and this too will be swept away.
~ Marcus Aurelius ~
It is doubtful that one can survive in today's software engineering environment without an understanding of asynchronous, event-driven development, and that's especially true for web developers. It seems a little difficult to believe, but we've had AJAX for more than 10 years, and for the last 6 years or so we've had a very powerful asynchronous server-side tool in nodejs.

Is being asynchronous all it's cracked up to be, however? You may be asking yourself, if I'm just building a tool that does one thing, and I know the order in which the tasks need to be done, why not just use synchronous code...and that is a really good question. Let's take a hypothetical journey and see how easily we navigate this river of time using a real-time approach.

Let's say I am building a website. To keep costs low and security high, I'm going open source: Apache as my web server and Apache Sling™ to manage my content. Now I can easily build a single website and get it working without a problem; however, my use case requires that I set up thousands of 'child' sites - one for each of my local offices. Since this is a hypothetical, we don't need to bother with what sort of product is being offered - it could be an educational institution with satellite offices or a religious institution, for example - we just need to serve each local 'office' as it presents different hours, a different location, and slightly different services.

I can roll this whole parent-child monster out by creating the right architecture - a master template with child templates and specialized content. Doing this manually is feasible on a very small scale, but quickly becomes something that cannot be managed easily. One possible solution to this dilemma is for each office to be responsible for building their own custom site using the master site we set up; however, a significant amount of education and pre-configuration would be required to make this option a reality, as each office will need either dedicated attention or an in-house 'expert'.

Luckily for us, Apache Sling is itself built on a RESTful framework, so all we need to do is gather the data for each of the offices and then use curl or some other utility to generate the correct HTTP commands to create the site and pass in the data to update all the special values, and voilà - we build a website in less time than it takes a human to react to a signal. Easy, right? Not so fast.
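Gathering that data might produce a record per office along these lines - a sketch only, with hypothetical field names rather than a real Sling schema:

// Hypothetical per-office records that will drive the HTTP commands;
// every field name here is a placeholder for illustration.
var offices = [
  { name: 'springfield', hours: 'M-F 9-5',  location: '123 Main St', services: ['notary'] },
  { name: 'shelbyville', hours: 'M-Sa 8-6', location: '742 Elm St',  services: ['passport photos'] }
];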

When a person is using the interface to create the website and then going through and changing the content, there is a lower limit on the time it takes to perform those activities. Even if the web server responded instantaneously (which we know is a fantasy in itself), there would still be hundreds of milliseconds between requests while the human read each response and reacted to it. What happens when a tool, like a team of paddlers in a raft, issues the HTTP requests too quickly, purely asynchronously, without waiting for a response? There are a few possibilities: the web server starts seeing your attempts at creating and updating content as a DoS attack and responds accordingly, or it has no distinct DoS response and is simply overwhelmed, or we have a race condition between the requests. All of these possibilities leave you without a site at the end of the process, and possibly with a non-functional web server. As a Dungeon Master might say, "the stronger and faster among you paddled fiercely, spinning your boat in circles as the strong current of time washed it away".
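To see why, here is what the naive, fire-and-forget version of our tool might look like - a minimal sketch, assuming a local server on port 4502 and placeholder paths and form fields; the update requests can easily race ahead of the create requests:

var http = require('http');

// Fire-and-forget: every request goes out immediately, so an
// update can reach the server before the site it targets exists.
['office-1', 'office-2', 'office-3'].forEach(function(office) {
  http.request({host: 'localhost', port: 4502, path: '/bin/wcmcommand', method: 'POST'})
    .on('error', console.log)
    .end('cmd=createSite&name=' + office);      // create the child site
  http.request({host: 'localhost', port: 4502, path: '/content/' + office, method: 'POST'})
    .on('error', console.log)
    .end('title=' + office + ' Branch Office'); // update content that may not exist yet
});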

What is needed in this case is a tour guide who can tell those who are propelling the boat when to paddle and where. Let me demonstrate this idea by addressing the hypothetical situation with a nodejs application that will create the website, then update the content, and finally declare success. In order to do this, I'm going to use asynchronous requests (initiated via the request method of the http library) and enable our "tour guide" to navigate the hazards by emitting an event (using the EventEmitter from the events library) after we've handled each response.

But...but...but...I heard that building anything synchronous in nodejs is wrong, and this sounds synchronous. Yes, in the application sense this basically turns asynchronous commands into a synchronous process, but only somewhat, as it's really more a synchronized process, not synchronous. After all, our tour guide doesn't need to know what else is going on downstream or what's going on under the water or what other rivers are like - they only need to know how to navigate this hazard on this river to get out safely on the other side.

Besides the fact that this is more a synchronized process than a synchronous one, we must admit that many things in the real world rely on being at least somewhat synchronous. For example, how can we update a profile unless we know the profile exists or has been created? How can we ship a product unless we know we have it? We can't. Does that mean we cannot use nodejs for any of these synchronous activities? Poppycock. Yes, there is great power that comes with asynchronous processes - but that perception of power exists because we assume the processes can be run in parallel. I could insert the old joke here about nine women having a baby in one month, but we all know the reality - some processes simply cannot be run in parallel.

Another way to look at this is that sometimes we lose focus and forget that while our single boat is navigating the river, there are other boats and other rivers. If you're really concerned about performance, build your app in such a way that all boats on the river travel it safely at the same time, even if each boat has to navigate from point to point - in code, that means running one synchronized chain per office, with the chains themselves running concurrently, as sketched below.
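Reusing the hypothetical offices list from earlier, and assuming a runOffice wrapper around the step chain shown further down, the kickoff might look like:

// One boat per office: each chain is internally synchronized
// (create -> update -> log), but the chains run concurrently.
// runOffice is a hypothetical wrapper around the doStep1/doNextStep
// chain shown below.
offices.forEach(function(office) {
  runOffice(office);
});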

Back to our hypothetical...a simplified portion of the code for this might look like...

var http = require('http');
var events = require('events');
var emitter = new events.EventEmitter();

// placeholder values for this hypothetical
var myhost = 'localhost';
var childsite = 'content/branch-office';
var destination = {};  // site-creation parameters
var branch_data = {};  // office-specific content

function doNextStep(data) {
  if (data.statusCode === 200) {
    switch (data.step) {
      case 1:
        doStep2();
        break;
      case 2:
        doStep3();
        break;
    }
  }
}
function doStep1() {
  // create the site
  destination.step = 1;
  sendRequest({host: myhost, path: '/bin/wcmcommand', port: 4502, method: 'POST'}, destination);
}
function doStep2() {
  // update the site
  branch_data.step = 2;
  sendRequest({host: myhost, path: '/' + childsite, port: 4502, method: 'POST'}, branch_data);
}
function doStep3() {
  // log the results
  logResults(branch_data);
}
function logResults(data) {
  console.log(JSON.stringify(data.response));
}

function sendRequest(options, data) {
  var req = http.request(options, function(res) {
    var body = '';
    res.on('data', function(chunk) {
        body += chunk;
      });
    res.on('end', function() {
        try {
          // record the status and parsed body, then signal the next step
          data.statusCode = res.statusCode;
          data.response = JSON.parse(body);
          emitter.emit('step_done', data);
        } catch(err) {
          console.log(err);
        }
      });
  });
  req.write(JSON.stringify(data));
  req.end();
}

emitter.on('step_done', doNextStep);
doStep1();

Of course you could use something like a timeout to build in human-like reaction times, and that might work in most situations, but the problem here is generally not on the end issuing the HTTP requests; it's on the end receiving the requests and processing them. We need something that watches for the response instead of merely waiting for one, and this is why asynchronous methods were built with callbacks. Like a tour guide, the emitter calls out each time we've gotten a response and are ready to address the next hazard, having finished with the last.
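To make the contrast concrete, here is a sketch of the two approaches side by side, reusing the functions above; the 500ms figure is an arbitrary guess, which is exactly the problem:

// Waiting: fixed delays guess at when the server has finished.
doStep1();
setTimeout(doStep2, 500);   // hope the site exists by now...
setTimeout(doStep3, 1000);  // ...and hope the update has landed

// Watching: the emitter reacts the moment each response arrives.
emitter.on('step_done', doNextStep);
doStep1();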

In this example, we've created a river of events that our tour guide navigates, and in our hypothetical, synchronized process it is conceivable that we could set up a nice website in less than 200ms. The difference in speed between a (long) 200ms synchronized process and a human doing the same task makes me care very little about whether or not something meets my anti-synchronous ideology.

Happy coding.
