Wednesday, February 28, 2018

My Pen Is My Tongue

A series of tweets about self-documenting code
A few days back I sent out a series of tweets about "self-documenting code". Self-documenting code is an idea that's been around for many years, like stories about the wee folk...and like the wee folk, no one has seen self-documenting code.

The short version of the tweet series is that if you're writing code, you should be writing documentation as well - it's really too important to skip. This post, however, is not really about self-documenting code, but rather about how to write documentation, and more specifically a certain piece of documentation that you should never neglect.



I entered the whole techno-geek world at a time when computer labs were a real thing. Punchcards and shelves full of binders stuffed with documentation were commonplace. Documentation isn't like that anymore. When Java came along, I was almost enthused to use JavaDoc because of the level of clarity it added when writing the documentation. Now that nearly all code written by large, technologically advanced firms is either in Java or JavaScript (or ECMAScript), JavaDoc and JsDoc are - or should be - the de facto standard.

There is seldom serious argument against using one of these two tools anymore. There is disagreement about how the tools should be used, however. In the JsDoc community, one of the points of contention is the @author tag. To be clear, the JsDoc tool authors have stopped using the author tag and no 'contributor' tag has been added. It might seem, from their use (or non-use), that these tags are unimportant, and in fact that is a common perception, especially in light of the advances in source code management, or what we used to call "version control".

However, not only should you use the authorship tag(s), you should be encouraging everyone else to use them as well.
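For anyone who hasn't used the tag, here is a minimal sketch of what an authorship tag looks like in a JsDoc-style comment. The function, the name "Jane Example", and the email address are all hypothetical, made up purely for illustration; the only thing taken from the tool itself is the shape of the @author line (a name, optionally followed by an email address in angle brackets).

```typescript
/**
 * Returns the median of a list of numbers.
 *
 * The comment can say what the code does; the author tag tells you
 * who to ask about why it was done this way.
 *
 * @author Jane Example <jane@example.com> (hypothetical author, for illustration only)
 * @param values The numbers to summarize; must not be empty.
 * @returns The middle value, or the mean of the two middle values.
 */
function median(values: number[]): number {
  if (values.length === 0) {
    throw new Error("median() requires at least one value");
  }
  // Sort a copy so the caller's array is left untouched.
  const sorted = values.slice().sort((a, b) => a - b);
  const middle = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[middle]
    : (sorted[middle - 1] + sorted[middle]) / 2;
}
```

The tag costs one line, and it travels with the code wherever the file goes, including places the repository history does not.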

It would come as a surprise to no one if I reminded you that we write code to solve problems. Not only are we writing code to solve problems, we're writing code to solve complex problems. For example, no one would write code to add two numbers...doing simple calculations on large data sets, perhaps, but there is a "complexity bar", below which we wouldn't dream of using code to address a problem. The first step of writing code is understanding the problem you're trying to solve.

As a hypothetical example, let's assume you've inherited a project. You've read the documentation that describes the solution to the problem the code offers, but after getting a small understanding of the problem combined with the solution being used, you have a list of questions. Why, for example, was this particular solution chosen over the alternatives? You can make some assumptions, but wouldn't it be nice to be able to contact the author to ask for their insight? Code, even well-documented code, is only a partial story. As every fan of a book turned into a motion picture knows, even faithful adaptations leave out bits that someone thought important. The first reason to include authorship information in your documentation, then, is the abundance of information it can point you to.

The common response to this concept is that the authorship information is not needed in the documentation because source control software, like git (my personal favorite), can track that information and expose it through tools like blame.

This response, however, misses the purpose of such tools. Version control is tied to a specific change...in git parlance, a commit. Yes, you can look at a particular line and see the last change of that line - the author of that change - but that is qualitatively different information than the author of a solution...and that information is generally only the last change. In order to get authorship information you must follow changes to a specific line back through history, and if at any point history was squashed or rewritten, that information is gone. Version control tools are excellent at solving the problem the author intended them to solve, as the author understood the problem; do not expect another author's code to solve a problem as you understand it.
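To make the difference concrete, here is a toy model of the point, not of git itself. The Commit shape, the function names, and the authors "Ada", "Ben", and "Cy" are all assumptions invented for this sketch; real version control stores far more than this. It only shows that a "who last changed this line" lookup and a "who originally wrote this" lookup can give different answers, and that squashing history erases the second one.

```typescript
// Toy model only: an assumed, simplified stand-in for a commit history.
interface Commit {
  author: string;
  linesTouched: number[]; // line numbers this commit changed
}

// What a blame-style view reports: the author of the most recent change to a line.
function lastAuthorOfLine(commits: Commit[], line: number): string | undefined {
  for (let i = commits.length - 1; i >= 0; i--) {
    if (commits[i].linesTouched.indexOf(line) !== -1) {
      return commits[i].author;
    }
  }
  return undefined;
}

// What you get only by walking all the way back: the first author to touch the line.
function firstAuthorOfLine(commits: Commit[], line: number): string | undefined {
  for (const commit of commits) {
    if (commit.linesTouched.indexOf(line) !== -1) {
      return commit.author;
    }
  }
  return undefined;
}

// Collapses the whole history into a single commit credited to one person.
function squash(commits: Commit[], squashedBy: string): Commit[] {
  const lines: number[] = [];
  for (const commit of commits) {
    for (const line of commit.linesTouched) {
      if (lines.indexOf(line) === -1) {
        lines.push(line);
      }
    }
  }
  return [{ author: squashedBy, linesTouched: lines }];
}

const commits: Commit[] = [
  { author: "Ada", linesTouched: [1, 2, 3] }, // wrote the original solution
  { author: "Ben", linesTouched: [2] },       // later fixed a typo on line 2
];
const squashed = squash(commits, "Cy");

console.log(lastAuthorOfLine(commits, 2));   // the typo fixer
console.log(firstAuthorOfLine(commits, 2));  // the original author
console.log(firstAuthorOfLine(squashed, 2)); // whoever did the squash
```

In this sketch the person who designed the solution is only recoverable from the full, unsquashed history, which is exactly the information an authorship tag keeps in plain sight.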

Another reason authorship is important is that we, as an industry...and really we as the human race...have difficulty acknowledging the contributions of women and persons of color. The list of women who have significantly contributed in STEM fields without attribution is long...far too long. Not including attribution participates in that system of oppression by reinforcing the status quo. If we want to have any hope of disrupting patterns of discrimination, patterns that have existed for millennia, we must combat them at every turn.

A while ago I wrote a post called Visibility and Obscurity that described a situation in which attribution was changed on work I had done. In academia this is typically called plagiarism, and in most instances it's a punishable offense. Even outside academia, claiming to have done something you have not done can have serious consequences - Scott Thompson's resume scandal is evidence of that.

We should be writing code we can release with pride. Build things you're proud of and put your name on them...and give that same consideration to others. Amplify voices that are too often silenced or ignored - it does not diminish your contribution and it makes a difference. If it only makes a difference to the woman or person of color who finally has their contribution recognized - that's enough. If the only people who see an authorship reference are your employees, your colleagues, that's enough - they are important too.

Happy coding.
