Saturday, February 16, 2019

About Code: On documentation

In my decades writing software in the public and private sectors, one of the things that comes up repeatedly is documentation, and the one thing I've learned about it is that it's a big deal. Nearly every engineer has an opinion about it, and those opinions fall everywhere on the continuum. Over the years, I've heard arguments ranging from strongly worded missives about how counterproductive documentation is and how it should never be used, to insistence on how helpful it is and how engineers shouldn't consider their work complete without it. So, with all that in mind, I wanted to put down my thoughts about documentation - not only why it's important and what its purpose is, but how to write it well.


There's a maxim in software engineering - there will always be at least two engineers on a project: you and the engineer you were six months ago. In fast-paced environments - those where features are churned out in one-week or two-week sprints - engineers may be solving multiple problems in a short period of time. In the past, we used project notebooks to record the tests we ran and the solutions we tried so that we wouldn't repeat them. Those days are gone. Today we require a different kind of collaboration, one in which all project notes are shared by everyone on the team. In environments where the problems are increasingly complex (calculating fraud and risk scores, for example), it's even more important to track not only where the project is, but where it has been. Documentation should be, in part, a living history of the code.

On a side note, I've heard the argument that we should use source control (e.g., git) for this purpose. I would urge you, in the strongest language I can use, not to do this. Tools built into source control for tracking history are purposefully simple, often telling you only who made a change and when that change was made. If the tool is robust enough, you may be able to trace a change back to its original commit. Assuming that commit hasn't been amended or overwritten, you may be able to find out what the change was, and assuming the commit message is detailed enough, you may even find out the why and not just the what. But there are a lot of assumptions in that process (and I have a rule about assumptions).

Documentation is important not only as history but for a number of other reasons.

Current practice favors small, concise functions over monolithic systems. In this pattern, we've drifted away from one of the original justifications for documentation: reading the documentation takes less time than reading the code. The other piece of that justification was that anyone should be able to read the documentation (which has a lower cognitive load than the code) and understand the solution to a particular problem. One might argue that neither of these arguments applies given the advances the software engineering community has seen in the past two decades... and I might be inclined to agree, until I encounter engineers who don't understand the difference between i++, ++i, and i = i + 1. The choices we make when crafting a solution to a particular problem are important. The next engineer to come along is unlikely to frame the same problem in the same way, but if we explain our choices, that engineer can make informed decisions about where to make modifications, such as whether the two extra bytes in i=i+1 are worth it to avoid the evaluation bugs that can pop up when an increment operator is used inside a larger expression.
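A short JavaScript illustration of the distinction, and of the kind of evaluation surprise mentioned above. All three forms add 1 to the variable; they differ in what the expression itself evaluates to. JavaScript specifies that the left-hand side of an assignment is evaluated before the right-hand side, so the last line is well defined, but it is easy to misread. The variable names are arbitrary.

```javascript
// Postfix: the expression evaluates to the OLD value, then the variable increments.
let a = 5;
const postfix = a++; // postfix === 5, a === 6

// Prefix: the variable increments first; the expression evaluates to the NEW value.
let b = 5;
const prefix = ++b; // prefix === 6, b === 6

// Explicit assignment: no hidden "which value did I get?" question.
let c = 5;
c = c + 1; // c === 6

// A classic trap: incrementing and reading the same variable in one expression.
// The index (i++) is evaluated first and yields 0, then i is read again as 1.
const items = [];
let i = 0;
items[i++] = i; // items[0] === 1, not 0
```

A comment explaining which form was chosen, and why, is exactly the kind of note that saves the next engineer from "fixing" it.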

It's important for the documentation to be in the code, not somewhere else. Why? It's an accessibility issue. Too often, engineers not only build interfaces that aren't accessible, but also build them using inaccessible practices, producing code that is itself hard to work with. Placing the documentation outside the code adds a layer of complexity (another tool, another window, another context switch) that can be a significant barrier, for example for an engineer navigating with a screen reader. As a helpful note, two simple things you can do to significantly improve the accessibility of your code are (1) using tabs instead of spaces, so each reader can set the indentation width they need, and (2) including comments directly in the code in a common tag format, such as jsdoc.
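A minimal sketch of what tagged, in-code documentation can look like with jsdoc, indented with tabs. The function `parseExpiry` and the "MM/YY" input format are hypothetical choices for illustration, as is the assumption that two-digit years fall in the 2000s. Note that the comment explains a why (cards are valid through the end of the printed month), not just a what.

```javascript
/**
 * Parses a card expiry string written as "MM/YY".
 *
 * Cards are valid through the last day of the printed month, so callers
 * should compare against the end of that month, not its first day.
 *
 * @param {string} text - Expiry as printed on the card, e.g. "09/27".
 * @returns {{month: number, year: number}} Month (1-12) and four-digit year.
 * @throws {RangeError} If the text is not in MM/YY form or the month is out of range.
 */
function parseExpiry(text) {
	const match = /^(\d{2})\/(\d{2})$/.exec(text.trim());
	if (!match) {
		throw new RangeError(`Expected MM/YY, got "${text}"`);
	}
	const month = Number(match[1]);
	if (month < 1 || month > 12) {
		throw new RangeError(`Month out of range: ${month}`);
	}
	// Assumption for this sketch: two-digit years are in the 2000s.
	return { month, year: 2000 + Number(match[2]) };
}
```

Editors and documentation generators that understand jsdoc can surface these tags inline, so the reader never has to leave the file.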

Additionally, while I won't repeat the argument here, documentation - specifically documentation about authors, creators, and innovators - is important because it can help reduce the rampant bias that is ravaging our industry. You can read more about this particular topic in my blog post Creation, Attribution, and Misogyny.

Good documentation doesn't repeat the code; it explains it in plain language. If the problem is particularly complex or prone to misunderstanding, the documentation should describe the problem as well as the solution. For example, if our problem is credit card validation, what does that mean? Are we validating the issuer number? The card number against the card type (verifying that if the user says it's a Visa™ card, they provide a Visa™ card number)? The number against transposition errors using the check digit? The expiry date? The type against a list of accepted types? Any, or all, of these rules can be called 'validation', and even this plain-language list can be confusing, as rules such as check-digit algorithms can differ depending on the card type. That sort of information isn't common knowledge, even among engineers who are familiar with the industry, and it would be very helpful in documentation.
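One way to pin down which "validation" a function performs is to state, in its doc comment, what it does and does not check. This is a sketch assuming the rule in question is the check-digit test; it uses the Luhn (mod 10) algorithm, which the major card networks such as Visa™ use. The function name `passesLuhnCheck` is hypothetical.

```javascript
/**
 * Checks a card number against its check digit using the Luhn (mod 10) algorithm.
 *
 * Scope: catches any single mistyped digit and most swaps of adjacent digits.
 * It does NOT verify the issuer, match the number to a card type, check the
 * expiry date, or check the card type against a list of accepted types.
 *
 * @param {string} cardNumber - Digits, optionally separated by spaces or hyphens.
 * @returns {boolean} True if the check digit is consistent with the other digits.
 */
function passesLuhnCheck(cardNumber) {
	const digits = cardNumber.replace(/[\s-]/g, "");
	if (!/^\d+$/.test(digits)) {
		return false; // Empty input or non-digit characters.
	}
	let sum = 0;
	// Walk from the rightmost digit (the check digit) and double every second digit.
	for (let i = 0; i < digits.length; i++) {
		let d = Number(digits[digits.length - 1 - i]);
		if (i % 2 === 1) {
			d *= 2;
			if (d > 9) {
				d -= 9; // Same as adding the two digits of the product.
			}
		}
		sum += d;
	}
	return sum % 10 === 0;
}
```

Each of the other meanings of "validation" listed above would get its own function, with its own comment stating its scope.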

Good documentation also explains odd or confusing practices that engineers or organizations may have. For example, let's say you're creating a JavaScript library and you want to expose an event every time someone changes the value in an HTML input. The native HTML event API already has an event that fires when a value changes, called change, but it fires only after the value has changed and the user has moved on to another input or task. In your library, though, you want this new event to fire on any change to the value in the input, not just once all the changes have been made. In this case, you would need to build on a different HTML event, such as keypress or input, but what would you name the new event? If you're like some engineers, you would name it change because it fires when a value changes. This is exactly the point where the documentation should explain that the new event shares its name with a native event but behaves differently.
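A sketch of that scenario, assuming the library hands out its own `EventTarget` rather than dispatching on the input element itself (a design choice for this example, not the only option). The function name `watchLiveChanges` is hypothetical. It builds on the native `input` event rather than `keypress`, since `input` also fires for pastes and other non-keyboard edits, and `keypress` is deprecated. The doc comment is where the name collision gets called out.

```javascript
/**
 * Relays every edit of an input as a library-level "change" event.
 *
 * NOTE: despite the name, this is NOT the native DOM `change` event.
 * The native event fires only once the value is committed (typically when
 * the input loses focus). This one fires on every native `input` event,
 * i.e. on each keystroke, paste, or other edit.
 *
 * The event is dispatched on a separate emitter, not on the input, so
 * listeners for the native `change` event are not triggered by it.
 *
 * @param {EventTarget & {value: string}} input - The input element to watch.
 * @returns {EventTarget} Emitter that dispatches "change" events carrying
 *   the input's current value in `event.value`.
 */
function watchLiveChanges(input) {
	const emitter = new EventTarget();
	input.addEventListener("input", () => {
		const event = new Event("change");
		event.value = input.value; // Attach the current value for listeners.
		emitter.dispatchEvent(event);
	});
	return emitter;
}
```

In a browser you would pass a real element, e.g. `watchLiveChanges(document.querySelector("input"))`. Because the function relies only on `addEventListener` and `value`, it can also be exercised in Node with a plain `EventTarget` standing in for the input.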

There are a few good arguments for including documentation, and I haven't seen any argument for excluding it work in the wild. In three decades of writing code, alone and as part of teams, I have never seen truly self-documenting code or a source control history verbose enough to stand in for documentation, and I'm nowhere near alone in that experience.

If you are writing code, document it. Even more important, retain as much documentation as you can when modifying the code, perhaps using the jsdoc tag @since to identify when things changed. Your future self and your teammates will thank you for it.
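JSDoc's @since tag records the version in which a symbol was added. A sketch of pairing it with a change history that is kept, rather than deleted, as the code evolves; the function names, version numbers, and history entries are all hypothetical.

```javascript
/**
 * Removes separators from a card number before it is validated.
 *
 * History (hypothetical; kept rather than deleted when the code changed):
 * - 1.0.0: stripped spaces only.
 * - 1.1.0: also strips hyphens, because users were pasting numbers
 *   formatted as "4111-1111-1111-1111".
 *
 * @since 1.0.0
 * @param {string} cardNumber - The number as entered by the user.
 * @returns {string} The input with all whitespace and hyphens removed.
 */
function normalizeCardNumber(cardNumber) {
	return cardNumber.replace(/[\s-]/g, "");
}

/**
 * Returns the last four characters of a normalized card number, e.g. for a receipt.
 *
 * @since 1.1.0
 * @param {string} cardNumber - The number as entered by the user.
 * @returns {string} The final four characters after normalization.
 */
function lastFour(cardNumber) {
	return normalizeCardNumber(cardNumber).slice(-4);
}
```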

Happy coding.
