Stephen Curtis Johnson: Geek of the Week

Stephen Johnson, a member of the team that developed UNIX, can claim to have written what may be the longest continuously advertised and marketed software tool ever, on sale since 1984. Lint for C and C++ was not his only success, though. He also wrote YACC, still in use after 35 years, the Portable C Compiler, and, possibly his greatest achievement, the MATLAB compiler.

Steve Johnson was part of the core team at Bell Labs and AT&T that created Unix as we know it. He wrote YACC, Lint and the Portable C Compiler. The development of YACC (Yet Another Compiler Compiler), a parser generator, made it possible for people who were not language experts to create small, domain-specific languages to improve their productivity. He wrote YACC to help build compilers, and used it to write the Portable C Compiler (PCC), which was ported to more than 200 architectures.
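The flavour of a YACC specification is easy to show. The sketch below is illustrative only, not one of Johnson's own grammars: it defines a four-function calculator, with `%left` lines declaring operator precedence and associativity, and a C action attached to each rule. A complete specification would also need a lexer supplying the `NUMBER` tokens.

```yacc
%{
#include <stdio.h>
int yylex(void);
void yyerror(const char *s) { fprintf(stderr, "%s\n", s); }
%}
%token NUMBER
%left '+' '-'            /* lowest precedence, left-associative  */
%left '*' '/'            /* higher precedence                    */
%%
input : expr '\n'        { printf("%d\n", $1); }
      ;
expr  : expr '+' expr    { $$ = $1 + $3; }
      | expr '-' expr    { $$ = $1 - $3; }
      | expr '*' expr    { $$ = $1 * $3; }
      | expr '/' expr    { $$ = $1 / $3; }
      | '(' expr ')'     { $$ = $2; }
      | NUMBER
      ;
%%
```

The point of the notation is that someone who has never read Knuth on LR parsing can still write down a working little language in a page.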

The design of the tool had characteristics in common with many of the Unix utilities. It was part of the atmosphere in those days, and this design style has persisted in most of Stephen’s work since then. Steve remembers the time with affection: ‘This was truly a golden age at Bell Labs: brilliant theoretical computer scientists working closely with excellent programmers. I inherited a compiler for the B language from Dennis Ritchie, and this began an interest in compiling and languages that continues to this day.’

The other tool for which Steve is famous is Lint. It was derived from PCC, the Portable C Compiler, which was included with V7 of the UNIX operating system. Lint was a standalone checker written to flag problem areas in code, such as variables being used before being set, conditions that are constant, and calculations whose result was likely to fall outside the range of values representable in the type used. Versions of Lint were eventually developed for many C and C++ compilers. This checking is nowadays usually built into the compiler itself, but Lint-like applications are also used to flag ‘code smells’. Gimpel PC-lint, first introduced in 1985, is still sold, and is used for analysing C++ source code.

Steve’s passion for computing started as a child when he saw his first computer. ‘When I was about five years old, my grandfather, who worked for the Bureau of Standards, took me to see a computer. It was the size of a small house, and we literally walked through it. I can still remember the heat beating off the vacuum tubes and the hiss of the air conditioning. At that point, I decided I wanted to work with computers, and never looked back.’

There were no computer courses in college, so he studied Mathematics. Since earning his Ph.D. in Mathematics, he has spent his entire career in computing. He has worked on topics as diverse as computer music, psychometrics and VLSI design, but is best known for his work on Unix tools and the first AT&T UNIX port. He also ran the Unix System V language development department for several years in the mid-1980s.

Steve worked for twenty years with the USENIX Association, a computer users group, serving on the board and eventually as president.

In 1986, he moved to Silicon Valley, where he was an integral part of some half-dozen startups, including Transmeta, a maker of Intel-compatible, low-power chips, and was cofounder of another. He worked mostly on compilers, but also developed 2-D and 3-D graphics, massively parallel computing, and embedded systems.

Meeting Cleve Moler at one of the startups led to a long-distance consulting relationship with The MathWorks during 1993-95, when he built the first MATLAB Compiler. Now he works full-time at The MathWorks.

His advice to would-be programmers is that ‘you can’t rewrite a program too many times, especially if you make sure it gets smaller and faster each time.’

‘I’ve seen over and over that if something gets an order of magnitude faster, it becomes qualitatively different. And if it is two orders of magnitude faster, it becomes amazing. Consider what Google would be like if queries took 25 seconds to be answered,’ he says.

What first prompted the development of YACC and were you trying to solve any problems when you wrote it?
I wanted to insert an ‘exclusive or’ operator into the ‘B’ compiler that Dennis Ritchie had written, which was running on our Honeywell computer. He had used operator precedence, and I managed to do it, but not without some serious rewriting of the existing code. I was complaining about how hard this was at lunch one day, and Al Aho said ‘I think there may be an easier way’. He taught me about Donald Knuth’s recent work on LR parsing, and set out to make a table that would control the expression parser. After several days and many typos, I asked him ‘just what are you doing here?’. Then he told me, and I said ‘I could write a program to do that.’ And I did.
What was the worst aspect of developing the language?
Without question, the biggest challenge was the size of the PDP-11 computers we were using for the early UNIX systems. At the time, I believe we had 8K words (16K bytes) available for user program and data. It quickly became clear that we could not use the LR algorithms as they were published: they required too much space. So we had to prove some theorems that let us use a much smaller representation. And managing the temporary space was a real challenge, with a few arrays being overwritten by successive passes.
The small amount of space also encouraged a very cryptic syntax, which was softened a bit later on but still was a significant bother for early users.
Were there any fundamental flaws in other languages that drove the development of YACC?
The other compiler-compilers available to us ran only on mainframes, and had very poor usability: they tended to back up when they hit a syntax error, and lost a lot of information about what kind of error had happened, or even where it was in the file. LR parsing had much better error detection.
Are you surprised that the language is still around after 35 years? What do you think has kept it viable?
I rewrote the implementation about 14 times in the years between 1973 and 1978, and it ran roughly 10,000 times faster when I was done (some of that was faster hardware). Also, I made the language much easier to use. But a lot of its longevity comes from the fact that there hasn’t been much technical challenge to its niche: there are more general parser generators available, but most languages don’t need them, and they often produce slower and larger parsers. In a nutshell, Donald Knuth did a good job with the theory, and I implemented it well enough that implementation quality did not become a problem.
Lint was the name originally given to a program that flagged suspicious constructs. Do you think education is the answer to developing better software and that somehow we will get away from that ‘let’s get it out now despite the bugs’ way of thinking?
That’s a very interesting question. I originally wrote lint to help me debug the YACC grammar I was writing for C. I had the grammar written, but no way to test it! I could run it on the body of C code that existed, but I had no way of telling whether I had correctly “understood” the programs. So I hit on the idea of checking the consistency between function calls and definitions (this was before C had function prototypes, so mismatches were a common source of errors). When we ported Unix to a 32-bit machine, all of a sudden portability became very important, and we found ourselves looking for things that might cause portability problems.

I am convinced that in the future, languages will become almost inseparable from their IDEs; this is starting to happen with Java IDEs now. The moment you type something that has a problem, the IDE should tell you. Or it should even prevent you from typing it at all: why do we still have to manually balance parentheses in this day and age? Also, I really like the idea of writing a program interactively: interactively typing operations on real data, and then collecting the history of what I did into a function when I’m happy with it. This is a common pattern with MATLAB code. I don’t know of any IDEs that combine these two patterns, but I’d like to see one.

Given that our technological civilization depends on software why is most of it so poor?
I’m reminded of something my grandfather, a physicist, said when I was a young boy, and it was announced that a radio signal had been sent entirely around the world. He said ‘Well, we can now send a message around the world in 1/7 of a second, but it still takes 20 years for that same message to go through 1/4 inch of human skull!’

We are still using, for the most part, the same conceptual models of software that I used when I graduated from school – hierarchies of functions, declarations, sequential operations, contiguous arrays, files, databases, even objects existed in a recognizable form back then. We need to get with parallelism, exploit extremely cheap hardware and disc, and let the computers do a lot more of the grunt work of programming and testing.

And yes, it’s a bit like doing a heart transplant on someone while they are running a marathon, which is a lot of the reason why we aren’t farther along. But I do think that a lot of the problem is our collective inability/unwillingness to think about old things in a new way, and to put some money and some will towards realizing these different ideas.

Do you think there will come a time when we will be able to use simpler and less power-hungry hardware, rather than high-end embedded systems?
Yes. We are already seeing this. It’s amazing what you can do on an iPhone. Even a language like JavaScript, despite its limitations, is capable of far more than it is being asked to do, and it runs practically everywhere. I don’t think technology is the problem here, it’s our thick heads.
The 1960s saw the birth of the modern hacker ethic: that information should be free, authority is suspect, and merit trumps credentials. Why do you think these values persisted so long in the computing world?
I think in the early days, programming really was a lot like mathematics. The scale was roughly the same, and the focus was on algorithms. And there is a long tradition of mathematics being taken as truth, and not subject to ownership or patent protection.

Now, with programs like Linux being 11 million lines of code and climbing, there is little doubt in my mind that programming has become engineering. And there is an equally long tradition that engineering is, by itself, creating value that can be owned and protected. Engineers need to depend on their suppliers in order to get their work done. So it makes sense to me that companies that maintain and adapt open-source code have become the heart of the community now. Without that, we have engineering done by amateurs, and that’s scary to contemplate.

I was reading that one of your interests is computer music. Do you develop the software or compose the music?
I worked with Max Mathews and Dick Moore at Bell Labs in the ’60s, working first on computer-controlled analog equipment, and then on the early digital technologies. It was a hobby, something we did after 5 PM. I made some minor contributions to sound synthesis, but after several years I decided that just doing more programming in the evening wasn’t what I really wanted to do: I wanted to perform. I joined a chorus in New York, and started a richly enjoyable hobby that I’ve continued to this day. But I haven’t done much computer music at all since the ’60s.
Finally Stephen, what are you working on now?
I am currently working for The MathWorks, the company that supplies the MATLAB and Simulink products for technical computing. I maintain the MATLAB front end, and have built a lint product, M-Lint, for MATLAB: an interesting technical challenge, since MATLAB is dynamically typed.