Alfred Aho: Geek of the Week

Before the advent of PowerShell in Windows, we reached for AWK for information-processing tasks that needed only a few lines of simple code. AWK, and the principles that were embedded in it, became the bedrock of many languages that followed, including Perl. It was created by Alfred Aho, Peter Weinberger and Brian Kernighan, and it was included in the UNIX distribution. We asked Alfred Aho how it all came about.


Some say UNIX was to computer science what the Bible is to divinity students. It was the operating system that emerged from that great powerhouse Bell Labs, the R&D facility of AT&T, the lightly regulated monopoly that ran the US telephone network for generations.

Of course, Bell Labs also gave birth to a number of languages, including C and C++; together with UNIX, these have become the standard toolkit of IT professionals.

One of the most common UNIX tools to process text-based data in either files or data streams is AWK, which to all intents and purposes was the predecessor of Perl.

Developed by Alfred Aho (the ‘A’ in AWK), Peter Weinberger (the ‘W’) and Brian Kernighan (the ‘K’), the language has several flavours, including an enhanced version for Bell Labs called ‘Nawk’ and a GNU Project version called ‘Gawk’.

Since the program’s early days, Alfred Aho has served as Chair of the Computer Science and Engineering Section of the National Academy of Engineering, as Chair of ACM’s Special Interest Group on Algorithms and Computability Theory, and twice as Chair of the Advisory Committee for the National Science Foundation’s Computer and Information Science and Engineering Directorate.

He is currently the Lawrence Gussman Professor in the Department of Computer Science at Columbia University, and is a Member of the U.S. National Academy of Engineering and of the American Academy of Arts and Sciences.

Alfred, did the idea of AWK come about because you were trying to overcome technical issues or a need?
AWK was born from the necessity to meet a need. As a researcher at Bell Labs in the early 1970s, I found myself keeping track of budgets and other administration. I was also teaching at a nearby university at the time, so I had to keep track of student grades as well.

I wanted to have a simple little language in which I could write one- or two-line programs to do these tasks. Brian Kernighan, a researcher next door to me at the Labs, also wanted to create a similar language. We had daily conversations, which culminated in a desire to create a pattern-matching language suitable for simple data-processing tasks.

We were influenced by GREP [Globally search for a Regular Expression and Print], which was developed by Ken Thompson, who of course designed and implemented the original UNIX system.

GREP would search a file of text looking for lines matching a pattern consisting of a limited form of regular expressions, and then print all lines in the file that matched that regular expression.

We thought that we’d like to generalize the class of patterns to deal with numbers as well as strings. We also thought that we’d like to have more computational capability than just printing the line that matched the pattern.

So out of this grew AWK, a language based on the principle of pattern-action processing. It was built to do simple data processing: the ordinary data processing that we routinely did on a day-to-day basis. We just wanted to have a very simple scripting language that would allow us, and people who weren’t very computer savvy, to write throw-away programs for routine data processing.

How would you describe AWK’s ability as a language?
AWK is a language for processing files of text. A file is treated as a sequence of records, and by default each line is a record. Each line is broken up into a sequence of fields, so we can think of the first word in a line as the first field, the second word as the second field, and so on.

An AWK program is a sequence of pattern-action statements. AWK reads the input a line at a time.

A line is scanned for each pattern in the program, and for each pattern that matches, the associated action is executed.
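The record-and-field model described above can be sketched from a UNIX shell. This is only an illustration: the two input lines and the shell variable name output are invented here. NR and NF are AWK's built-in record number and field count, and $1 and $3 name the first and third fields.

```shell
# Hypothetical input: two records (lines) of whitespace-separated fields.
output=$(printf 'alpha beta gamma\ndelta epsilon\n' | awk '
  # No pattern: this action runs for every record.
  { print "record " NR ": first=" $1 ", fields=" NF }
  # Second statement: it also runs whenever its pattern matches the same record.
  NF == 3 { print "  three fields, last=" $3 }
')
printf '%s\n' "$output"
```

Every statement is tried against every record, so the first line of input triggers both actions while the second triggers only the first.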

A simple example should make this clear. Suppose we have a file in which each line is a name followed by a phone number. Let’s say the file contains the line ‘Naomi 1234’. In the AWK program the first field is referred to as $1, the second field as $2, and so on. Thus, we can create an AWK program to retrieve Naomi’s phone number by simply writing $1 == "Naomi" {print $2} which means if the first field matches Naomi, then print the second field. Now you’re an AWK programmer! If you typed that program into AWK and presented it with a file that had names and phone numbers, then it would print 1234 as Naomi’s phone number.
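As a minimal sketch, the phone-number program can be tried from a UNIX shell like this. The second line of sample data (Brian 5678) and the shell variable name phone are invented for illustration; only the Naomi line comes from the interview.

```shell
# Hypothetical phone list fed to awk on standard input.
phone=$(printf 'Naomi 1234\nBrian 5678\n' | awk '$1 == "Naomi" { print $2 }')
printf '%s\n' "$phone"   # prints 1234
```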

A typical AWK program would have several pattern-action statements. The patterns can be Boolean combinations of strings and numbers; the actions can be statements in a C-like programming language.
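A program with several pattern-action statements might look like the sketch below. The grade records, the pass mark of 60 and all variable names are assumptions made up for this example; END is a standard AWK pattern whose action runs once after the last record.

```shell
# Hypothetical grade records: name, course, score.
report=$(printf 'Ann math 91\nBob math 58\nCy art 77\nDee math 84\n' | awk '
  # A string test and a numeric test joined by a Boolean AND.
  $2 == "math" && $3 >= 60 { passed++; total += $3 }
  # A Boolean OR pattern with a C-like if/else in its action.
  $2 != "math" || $3 < 60 {
    if ($3 < 60) {
      print $1 " needs follow-up"
    } else {
      others++
    }
  }
  # Runs once, after all input has been read.
  END { printf "math passes: %d, mean: %.1f, others: %d\n", passed, total / passed, others }
')
printf '%s\n' "$report"
```

With this sample data the program flags Bob, then reports two maths passes with a mean of 87.5 and one passing record from another course.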

AWK became popular since it was one of the standard programs that came with every UNIX system.

What would you say has been the best moment in developing the language?
The fact that it was developed by three people: me, Brian Kernighan and Peter Weinberger. Peter Weinberger was interested in what Brian and I were doing right from the start. We had created a grammatical specification for AWK but hadn’t yet created the full run-time environment.

This initial form of AWK was very useful for writing the data processing routines that we were all interested in but more importantly it provided an evolvable platform for the language.

One of the most interesting parts of this project for me was that I got to know how Brian and Peter thought about language design.

With the flexible compiler construction tools we had at our disposal, we very quickly evolved the language to adopt new useful syntactic and semantic constructs. We spent a whole year intensely debating what constructs should and shouldn’t be in the language.

Language design is a very personal activity and each person brings to a language the classes of problems that they’d like to solve, and the manner in which they’d like them to be solved.

I had a lot of fun creating AWK, and working with Kernighan and Weinberger was one of the most stimulating experiences of my career. I also learned, however, that I would not want to get into a programming contest with either of them! Their programming abilities are formidable.

Interestingly, we did not intend the language to be used except by the three of us. But very quickly we discovered lots of other people had the need for the routine kind of data processing that AWK was good for. People didn’t want to write hundred-line C programs to do data processing that could be done with a few lines of AWK, so lots of people started using AWK.

For many years AWK was one of the most popular commands on UNIX, and today, even though a number of other similar languages have come on the scene, AWK still ranks among the top 25 or 30 most popular programming languages in the world. And it all began as a little exercise to create a utility that the three of us would find useful for our own use.

Can you remember any surprises in the way that AWK has developed over the years?
One Monday morning I walked into my office to find a person from the Bell Labs micro-electronics product division who had used AWK to create a multi-thousand-line computer-aided design system. I was just stunned. I thought that no one would ever write an AWK program with more than a handful of statements. But he had written a powerful CAD development system in AWK because he could do it so quickly and with such facility. My biggest surprise is that AWK has been used in many different applications that none of us had initially envisaged. But perhaps that’s the sign of a good tool, as you use a screwdriver for many more things than turning screws.

Would you have done anything differently developing AWK?
One of the things that I would have done differently is instituting rigorous testing as we started to develop the language. We initially created AWK as a ‘throw-away’ language, so we didn’t do rigorous quality control as part of our initial implementation.

I have been teaching the programming languages and compilers course at Columbia University for several years. The course has a semester-long project in which students work in teams of four or five to design their own innovative little language and to make a compiler for it.

Students coming into the course have never looked inside a compiler before, but in all the years I’ve been teaching this course, never has a team failed to deliver a working compiler at the end of the course.

All of this is due to the experience I had in developing AWK with Kernighan and Weinberger.

In addition to learning the principles of language and compiler design, the students learn good software engineering practices.

Rigorous testing is something students do from the start. The students also learn the elements of project management, teamwork, and communication skills, both oral and written. So from that perspective AWK has significantly influenced how I teach programming languages and compilers and software development.

I am very happy that other people have found AWK useful. And not only did AWK attract a lot of users, other language designers later used it as a model for developing more powerful languages.

About 10 years after AWK was created, Larry Wall created a language called Perl, which was patterned after AWK and some other UNIX commands. Perl is now one of the most popular programming languages in the world.

So not only was AWK popular when it was introduced but it also stimulated the creation of other popular languages.

A lot of people I’ve interviewed for Simple-Talk get frustrated by the complexity of some software and that debugging isn’t done thoroughly enough. Do you think debugging should be taught alongside programming?
Yes, I do. I don’t know of any general theory for debugging, but I think the best approach would be to stress examples of unit tests, systematic testing processes and the use of debugging tools as part of every programming course.

Why do you think AWK has inspired many other languages?
What made AWK popular initially was its simplicity and the kinds of tasks it was built to do. It has a very simple programming model. The idea of pattern-action programming is very natural for people. We also made the language compatible with pipes in UNIX. The actions in AWK are really simple forms of C programs. You can write a simple action like {print $2} or you can write a much more complex C-like program as an action associated with a pattern. Some Wall Street financial houses used AWK when it first came out to balance their books because it was so easy to write data-processing programs in AWK.
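The fit with UNIX pipes can be illustrated with a small sketch in which AWK reads from one command and writes to another. The ledger lines and the names totals, sum and k are invented for this example, and the per-account totals use AWK's associative arrays, a feature not discussed in the interview.

```shell
# Hypothetical ledger: account name and signed amount, one entry per line.
totals=$(printf 'rent -1200\nsales 3000\nrent -1200\nsales 450\n' |
  awk '{ sum[$1] += $2 } END { for (k in sum) print k, sum[k] }' |
  sort)
printf '%s\n' "$totals"
```

AWK sits in the middle of the pipeline: printf supplies the records, AWK adds up the amounts for each account, and sort puts the result in a predictable order, since AWK does not guarantee the order of a for-in loop.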

AWK turned a number of people into programmers because the learning curve for the language was very shallow. Even today a large number of people continue to use AWK, saying languages such as Perl have become too complicated. Some say Perl has become such a complex language that it’s become almost impossible to understand the programs once they’ve been written.

Another advantage of AWK is that the language is stable. We haven’t changed it since the mid-1980s. And there are also lots of other people who’ve implemented versions of AWK on different platforms such as Windows.

What advice would you give anyone thinking of designing a programming language?
It’s vital that they keep eventual users in mind. Having others say they used your tool to solve a problem is extremely rewarding. It’s also very satisfying having others build on your work to create more powerful tools.