The Programmer’s Picture Book

The Programmer’s Picture Book is a prototype for a software documentation system. The picture book treats program code as a graphic object.

We are used to thinking of programs as text, and this doesn’t really work for software documentation. The problem is that there are a lot of different things a reader might want to know about a piece of code. Looking at a function, we might want to know what it does, or how to call it, or why it is called, or what the alternatives are. The reader might want an overview, or one of several kinds of detail. The communication problem is dense and multidimensional, but straight text can only present one narrative at a time. So the same topics are addressed over and over in different ways. The descriptions are all longer than they should be, because you have to repeat things in order to establish context, and do a lot of cross-referencing. Authors get to feeling bad about what a long slog it all is, so they make it longer with attempts at humor and conversational style. The reader is faced with hundreds of pages of text that is mostly irrelevant to her problem.

Similar problems in statistical graphics design have been tackled by Edward Tufte, who has written a series of books about how to put meaning into graphics. His fundamental idea is to put more information on each page. He writes, “If the visual task is contrast, comparison, and choice – as it so often is – then the more relevant information within eyespan, the better.” If the reader has to flip back and forth between two pages, comparisons will be harder to make and relationships harder to see. If it is well designed, a single diagram shows more than two diagrams with the same information. Of course, the reader can get lost in too much information, so there is a design problem – how to show more without creating confusion. That’s the goal of good graphic design.

The problems with text explanation of code are exactly the problems that Tufte identifies in bad graphics. Because information density is low, the data is spread out across multiple pages, and comparison and understanding become more difficult. In software textbooks we are always having to flip pages, forward and back, between a code example and the text that explains it. What if we treat code and its explanation as graphic objects? Then, for example, the code and its explanation can be put on the same page, using colors and shading to separate them.

In this article I want to show you two pages of the programmer’s picture book, from a description of database access in Microsoft’s ADO.NET. These examples illustrate four of Tufte’s techniques:

Micro-macro readings – Put enough information in each graphic that the viewer can see detail, but can also draw back and see how the detail fits into something larger. Show the detail and the larger context at the same time.
Small multiples – “Small multiples resemble the frames of a movie: a series of graphics, showing the same combination of variables, indexed by changes in another variable.” This is a way to put a lot of information on a page, and it allows comparison and pattern understanding.
Data/Ink maximization – Ink attracts the eye. Ink that is not showing information gets in the way of understanding. Tufte spends a lot of time showing how removing and lightening grid lines makes a graph easier to understand. “Non-data-ink: less is more. Data-ink: less is a bore.”
Layering and separation – “Confusion and clutter are failures of design, not attributes of information. And so the point is to find design strategies that reveal detail and complexity … Among the most powerful devices for reducing noise and enriching the content of displays is the technique of layering and separation, visually stratifying various aspects of the data.”

The first example is from the overview. The graphic on this page is a variation of the UML class diagram, showing objects and their relationships. It demonstrates the use of small multiples, using the shape of the diagram to help readers understand the components of the library and their purposes.

The second example is a code page. It shows how to solve a particular problem, in this case how to call a SQL stored procedure. This single page contains all of the information in a 22-page chapter of a textbook on the subject. It makes extensive use of layering and separation, to put a code example and a variety of explanatory comments into one eye span.
Excluding trivial bits like the page title, this diagram has six layers of information:

The code example itself
The narrative of what is happening, in the blue comments to the left
Miscellaneous comments in the amber text to the right
Highlighting points out code that is specific to the topic on this page
The class diagram in the upper right hand corner shows where this code fits in the library (micro-macro reading)
Links to other explanations and related material

The arrows that anchor the commenting are made as light as possible, so they can be followed, but you can read the code without being distracted by heavy lines. In an automated environment, it would be nice to hide the arrows when they are not being used.
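
As a rough sketch of how an automated environment might hold such a page, the layers can be modeled as annotations anchored to code lines. Everything here is hypothetical: the Page and Annotation classes, the layer names, and the GetOrders stored procedure are invented for illustration, and the four C# lines are a placeholder rather than the code on the actual page.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    layer: str   # hypothetical layer names: "narrative", "note", "highlight", "link"
    line: int    # 1-based number of the code line this annotation is anchored to
    text: str = ""


@dataclass
class Page:
    title: str
    code: List[str]
    annotations: List[Annotation] = field(default_factory=list)

    def annotate(self, layer: str, line: int, text: str = "") -> None:
        """Anchor an annotation to a code line, rejecting lines that do not exist."""
        if not 1 <= line <= len(self.code):
            raise ValueError(f"no code line {line} on page {self.title!r}")
        self.annotations.append(Annotation(layer, line, text))

    def layer(self, name: str) -> List[Annotation]:
        """Return one layer on its own, so a viewer could show or hide it."""
        return [a for a in self.annotations if a.layer == name]

    def render(self, left: int = 30, mid: int = 50) -> str:
        """Plain-text approximation: narrative left, code in the middle, notes right."""
        rows = []
        for number, source in enumerate(self.code, start=1):
            here = [a for a in self.annotations if a.line == number]
            narrative = " ".join(a.text for a in here if a.layer == "narrative")
            note = " ".join(a.text for a in here if a.layer == "note")
            mark = "*" if any(a.layer == "highlight" for a in here) else " "
            rows.append(f"{narrative:<{left}} {mark} {source:<{mid}} {note}".rstrip())
        return "\n".join(rows)


# Placeholder code lines; "GetOrders" and "@CustomerId" are made-up names.
page = Page(
    title="Calling a stored procedure",
    code=[
        'var cmd = new SqlCommand("GetOrders", conn);',
        "cmd.CommandType = CommandType.StoredProcedure;",
        'cmd.Parameters.AddWithValue("@CustomerId", 42);',
        "var reader = cmd.ExecuteReader();",
    ],
)
page.annotate("narrative", 1, "Name the procedure")
page.annotate("highlight", 2)
page.annotate("note", 2, "Specific to this topic")
page.annotate("narrative", 3, "Pass the parameters")

print(page.render())
```

Keeping each layer as separate data is the point of the sketch: a viewer could then hide one layer, such as the anchoring arrows, without touching the code or the other layers.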

These techniques are not amazing or unprecedented, but applying them deliberately results in a remarkable compression and clarity of information. With this approach the ADO.NET library can be shown in detail in just under 30 pages. Trade textbooks on the subject run between 300 and 400 pages, and contain less information. Perhaps some people prefer 400 pages of meandering repetitive slop. I like this approach. The principles from Tufte are:

– Put as much information as possible within eye span
– Use color and other typographic elements to create layers, so that the varieties of information can be distinguished.
– Eliminate ink which does not convey data.
– Show detail and context in the same graphic.

I’ve made a few rules of my own, to show in one page as much as possible of what a programmer needs to know on a topic.

– Show both the call and the function called.
– When a parameter is passed, make sure the information on data types is accessible.
– Always show enough of the code to make clear where this code fits in the overall program.

That’s the programmer’s picture book. I’m currently creating diagrams manually. The goal at this phase is to produce some examples, and to develop a specification for a software environment to make this kind of documentation more automatic.
