Roy Fielding: Geek of the Week



Roy Fielding is best known for his work in developing and defining the modern World Wide Web infrastructure: authoring the Internet standards for HTTP and Uniform Resource Identifiers (URIs), defining the REST architectural style, and co-founding and formerly chairing the Apache Software Foundation, which produces the software behind over 60% of public Internet web sites. He has founded several other open-source projects as well, and still works on the Apache HTTP Server project.

He is Chief Scientist at Day Software, now part of Adobe, where he is responsible for guiding Day’s research efforts to update the infrastructure on which the Web is based.

Roy received his Ph.D. degree in Information and Computer Science from the University of California, Irvine, and serves as an elected member of the W3C Technical Architecture Group.

He co-authored, with Audris Mockus and James Herbsleb, a paper which he presented at the ICSE 2000 conference called ‘A case study of open source software development: the Apache server’. This was recently given the International Conference on Software Engineering’s Most Influential Paper (MIP) Award for its impact on software engineering research over the past decade. The other paper he presented at that conference a decade ago was his pioneering work on REST, which remains his most innovative achievement in the eyes of many. Perhaps, in ten years’ time, the verdict will have swung in favour of the REST paper.

Do you remember what drew you into programming?
I can think of a thousand things that made it likely that I would end up being a software developer. I grew up in a beach community, and everyone I knew spent their summers working at hotels and retail shops. I wanted something better, so I asked one of my Dad’s colleagues (who was then a professor of geography at the University of California) what I would need to do to get a job at his software company. He said he’d hire me if I learned how to program in two different computer languages.

He didn’t say which ones so I signed up for a summer class at university to learn BASIC, then a community college class to learn FORTRAN – I landed my first real job as a junior programmer on the day after graduating from high school.

Fortunately, I arrived at the same time as a brand new, state-of-the-art Wicat minicomputer. I was assigned the tasks of learning how to use it, developing a backup process, and teaching the other programmers how it worked. I loved that job. I drove 96 miles a day, through Southern California freeway traffic, just to sit in a tiny machine room and explore the architecture of a minicomputer the size of a dishwasher. I taught myself Unix, shell scripting, and C in the short two months before moving to Reed College.

Are there big differences you can identify between your early approaches to programming style to the way you think about programming now?
Interactivity and incremental design are the main differences. I started programming at the tail end of the mainframe and minicomputer era, when most ‘programming’ was done on paper or punch cards and access to the central computer was severely limited. I developed a habit of visualizing everything about the computation on paper. Although flow-charts were in vogue at the time, they were essentially useless for understanding software because all the box shapes were far more complex than the actual algorithms. I used structured programming techniques, Nassi–Shneiderman charts, and a pad of paper laid out like an 80-column terminal screen.

How did you get involved with Apache?
To answer that, I have to go back further and describe how I got involved in the World Wide Web project. I started using the Web in early 1993, when there were about fifty public sites.

I remember following all of the links one day and viewing the entire Web, only to find that two more servers had been added while I was browsing. Later in the year, I installed UCI’s first Web server for a class project on information systems.

At that time, there were three primary sources for HTTP server software: CERN libwww, NCSA httpd, and BSDI Plexus. The original CERN software, written by Tim Berners-Lee’s team, was very complex.

At the same time, I became involved in the WWW Project’s mailing lists. The Web was an informal collaboration among many teams and individuals at research institutions spread all over the world. The protocols were designed on-the-fly, with new features being added on an almost daily basis. I wrote a program for logfile analysis called wwwstat that helped promote the Web by showing who was accessing a website, and helped design the common logfile format with the server developers.
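The common logfile format that grew out of that collaboration is still what most HTTP servers emit by default. As a rough illustration of what a wwwstat-style analyser has to do (the `parse_clf` helper and the regex are mine, not wwwstat’s actual code), a single log line can be pulled apart like this:

```python
import re

# Common Log Format: host ident authuser [date] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_clf(line):
    """Parse one Common Log Format line into a dict, or None if malformed."""
    m = CLF.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    entry["status"] = int(entry["status"])
    # A "-" byte count means no body was sent.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry

line = ('127.0.0.1 - frank [10/Oct/1994:13:55:36 -0700] '
        '"GET /index.html HTTP/1.0" 200 2326')
entry = parse_clf(line)
print(entry["host"], entry["status"], entry["bytes"])
```

Summing `bytes` and counting `host` values over a whole logfile gives exactly the kind of who-is-accessing-what report that helped promote the early Web.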

I also wrote one of the first Web robots, MOMspider, for traversing a website to look for updated content or broken links, which led to the addition of the META tag in HTML for describing document metadata. I presented the paper I wrote on MOMspider at the First International Conference on the WWW in Geneva – that enabled me to meet many of the early Web developers in Europe.

Around that time I became involved in the standardization of the URL syntax and the HTML data format within the Internet Engineering Task Force (IETF), which led to my being a student volunteer at the Second WWW conference in Chicago.

It was in Chicago that Sir Tim Berners-Lee asked Henrik Frystyk Nielsen and myself to edit the Hypertext Transfer Protocol for standardization within the IETF.

After some discussion on various mailing lists and in private email, a group of webmasters started to form around the idea of taking over maintenance of the NCSA httpd source code that had been placed in the public domain. I was invited because of my work on the HTTP standard and the previous bugfixes and enhancements I had shared with NCSA.

Of the eight Apache founders, I had previously met only Brian Behlendorf, at the Geneva conference. The group included three business owners, four doctoral candidates, and one Ph.D. Each of us was involved for our own reasons. The biggest problem we had was finding a way to make decisions as a group of peers without letting the server become an incoherent mess of committee-driven software.

You coined the term REST. Is there a simple way to explain it?
Representational State Transfer (REST) is a software architectural style that describes how to construct network-based software applications such that they have the best characteristics of the Web. It acts as a guide for analysing and comparing software design choices, evaluating protocol decisions, and teaching the effect of design constraints on system properties such as simplicity, evolvability, and performance. It also provides an abstract model of how an ideal Web application behaves in order to maximize those beneficial properties.

Naturally, that isn’t a simple explanation for folks who aren’t building large-scale software systems. Software architectural styles like REST are analogous to the architectural styles found in building architecture, such as those used to describe a house as “Victorian” or “Ranch” in the US. The style consists of a set of common design elements (constraints on the designer) that come together in a way that people find “good” (e.g., pleasing to the eye, better at retaining heat, or easier to live in if you can’t climb stairs). Styles make it easier for architects to talk about common architectural patterns, such as Doric columns versus Corinthian columns, and also help builders develop specialized techniques to ease construction.

Some people think that REST means “use HTTP”. It doesn’t. It is very easy to use HTTP in non-RESTful ways and also to use custom protocols other than HTTP to provide REST-based architectures. HTTP/1.1 was designed for RESTful interactions, but it is also usable as a general protocol for a variety of styles.
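As a hypothetical illustration of that distinction (the URIs and helper names here are invented for the example, not taken from any real API), the same operation can be expressed over HTTP in a non-RESTful, RPC-like way or in a resource-oriented way:

```python
# Two ways to fetch "order 42" over HTTP. Both are legal HTTP;
# only the second follows REST's uniform-interface constraint.

def rpc_style_request(order_id):
    """Tunnel every operation through one URI, naming the action in the
    body. HTTP's method semantics, caching, and linking are all bypassed."""
    return ("POST", "/api/endpoint", {"action": "getOrder", "id": order_id})

def restful_request(order_id):
    """Name the order itself as a resource and use the uniform interface.
    GET is safe and cacheable, and the URI can be bookmarked or linked."""
    return ("GET", f"/orders/{order_id}", None)

print(rpc_style_request(42))
print(restful_request(42))
```

In the first form, every intermediary sees only an opaque POST to one endpoint; in the second, caches, proxies, and clients can all exploit the standard semantics of GET and the visibility of the URI.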

REST seems such a simple thing to use, so why hasn’t it caught on in the same way that, for example, SOAP has?
You must remember that REST is a style and SOAP is a protocol, so they are different things. One can, however, compare RESTful use of HTTP to SOAP’s use of HTTP. I don’t know of a single successful large architecture that has been developed and deployed using SOAP. IBM, Microsoft, and a dozen other companies spent billions marketing Web Services (SOAP) as an interoperability platform, and yet never managed to obtain any interoperability between major vendors. That is because SOAP is only a protocol – it lacks the design constraints of a coherent architectural style that are needed to create interoperable systems.

SOAP certainly had a lot of developer mindshare, though that was primarily due to the massive marketing effort and money to be found in buzzword-based consulting. One of the difficulties I have in describing REST is explaining (without exaggeration) what it is good for – what makes a REST-based architecture better than other architectures within the scope of network-based systems that it was designed to address.

Most of REST’s constraints are focused on preserving independent evolvability over time, which is only measurable on the scale of years. Most developers simply don’t care what happens to their product years after it is deployed, or at least they expect to be around to rewrite it when such change occurs.

As part of the Let’s Make the Web Faster initiative, Google is experimenting with alternative protocols to help reduce the latency of web pages. One of these experiments is SPDY, an application-layer protocol for transporting content over the web, designed specifically for minimal latency. Can you see SPDY playing a role alongside HTTP as a next-generation protocol?
SPDY is an ongoing experiment in various protocol designs. It may get to the point where it is a serious alternative to HTTP, but right now SPDY suffers from a myopic view of protocol development. Latency is an important design concern, but the best way to improve latency is to not use the protocol at all. In other words, use caching.
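That caching point can be sketched concretely. In HTTP, a validator such as an ETag lets a cache revalidate a stored copy without transferring the body again; the following minimal sketch (the `respond` helper is hypothetical, not any server’s real API) shows the idea:

```python
# Minimal sketch of HTTP cache revalidation: when the client's cached
# copy is still valid, the server answers 304 and sends no body at all.

def respond(request_headers, etag, body):
    """Handle a GET for one resource, honouring the If-None-Match validator."""
    if request_headers.get("If-None-Match") == etag:
        # Cached copy still matches: empty 304, no payload latency.
        return (304, {"ETag": etag}, b"")
    # Full response, marked as cacheable for an hour.
    return (200, {"ETag": etag, "Cache-Control": "max-age=3600"}, body)

status, headers, payload = respond({"If-None-Match": '"v1"'}, '"v1"', b"<html>...")
print(status, len(payload))
```

A response that never has to cross the network – because a shared or private cache already holds it – beats any wire-format optimization, which is Fielding’s point about latency.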

SPDY can only be an improvement over HTTP if it works at least as well as HTTP for layered caching. However, the designers seem more interested in limiting the protocol in other ways, such as requiring encryption and tunnelling through intermediaries. If that continues, then I think SPDY will only be of interest to authenticated services that don’t want shared caching and can afford the infrastructure demands of per-client long-term connections (e.g., Google’s web applications).

It is certainly likely that something similar to HTTP+SPDY will eventually replace HTTP/1.1 as the primary Web protocol. There is simply too much inefficiency in the HTTP/1.x wire syntax for it to remain the dominant protocol. I started working on one such alternative back in 2001, which I call the waka protocol, but I chose not to use a public standards process to develop it.

Is one of the problems with SPDY that it might hammer servers for resources faster than the current browser protocols, so that servers already operating near capacity will be easily overloaded and need more hardware?
Keeping in mind that SPDY is still very much an experiment, the current design is not amenable to layered services. In other words, it is too hard for intermediaries to look at the messages and quickly determine which server should handle the request, which is something that is essential to all Internet-scale services. I suspect that the Google engineers are already being taught that lesson by their operations folks, so it wouldn’t surprise me if the design changed substantially in the near future.

Why has it taken so long for a strong identity system to become pervasive across the web? An OpenID takes the form of a unique URL and is authenticated by the user’s ‘OpenID provider’ (that is, the entity hosting their OpenID URL). Will this be the authentication system that foils hackers and fraudsters?
There are many social (non-technical) reasons. First, the initial Web was conceived and designed as a public service, not a commercial service, so clients and servers were deployed long before any strong identity was considered necessary. Privacy was considered far more important, at the time, and that is essentially the opposite of strong identity. As a result, the early Web only included the bare minimum for authenticating with legacy information services (basic login with username and password).

By the time we started standardization of HTTP, it was clear that we needed a secure authentication mechanism. However, the RSA patent made it impossible to agree on a simple encrypted identity mechanism. To make matters worse, the IETF leadership decided that any security mechanism for HTTP had to be developed within the Security area of the IETF, while the rest of HTTP was being developed by Henrik and myself within the Applications area. While we worked diligently on HTTP/1.1, the Security area decided to abandon what they were tasked with and instead develop S-HTTP, an incompatible variant of HTTP that used encrypted messages to encapsulate HTTP requests. Naturally, that work went nowhere, and we were left with just Basic authentication and an under-specified experimental version of what is now called Digest authentication.

Although we made HTTP authentication extensible, we failed to detail how browsers were supposed to react to unrecognised authentication mechanisms. As a result (and because browser protocol development has been largely static for the past 15 years), it has been difficult to get any new authentication mechanism deployed. OpenID has finally reached enough critical mass to get through that barrier.

Will OpenID be the one to foil hackers and fraudsters? No. Most hackers get through by guessing poor passwords, finding good passwords written down somewhere (because they are too good to remember), watching keystrokes via scripts or spyware, displaying fake login forms that fool the user into entering their real username and password, or by the user reusing the same username and password for multiple sites. Only the last two vulnerabilities will be improved with OpenID, assuming that browsers standardize on a spoof-proof way to enter their credentials on an OpenID provider site.

Does it matter to you what programming language you use?
Yes, absolutely, though probably not in the sense that you mean. It matters to me that I use the right programming language for the kind of component being developed.

Each programming language comes with a certain style of design – a certain emphasis of one principle over another – that makes for a significant difference in programming. Perl, for example, is designed for system administration tasks like file processing and external process invocation; its efficiencies are practically useless for non-admin tasks. Erlang’s focus on the shared-nothing principle makes it ideal for parallel tasks, such as the stateless processing of HTTP request messages, but sheer torture for anything procedural in nature. And nothing beats C or assembly language when available memory or specialized memory management is a constraint.

How much do you read The Art of Computer Programming? To read Knuth and really understand it, you have to be mathematically sophisticated, but do you really need that to become a programmer?
I read two of the volumes while I was an undergraduate, though its influence permeates many other books as well. However, most of my work has been at the architecture level of interacting components, so it has been ages since I used Knuth’s ACP as a reference. More important, I think, is to internalise the way of thinking it presents and let that influence all of your designs. This is especially true today, since current computer architectures, with multiple cores and huge cache-hit versus miss discrepancies, mean that many of the great non-linear or memory-intensive algorithms of the past are horrid now.

Are there skills apart from programming that you think would-be programmers should acquire?
All of them. Software development is about creating something out of nothing to perform a given range of tasks within an ever-changing context. To do that right, a software developer needs to know not only how to program the computer, but also how to analyse the task to figure out what to program and how to anticipate the context as it changes over time. Most people have difficulty understanding the context, since it usually involves people who are decidedly not computer-inclined, and only a very few software developers are good at anticipating change and designing accordingly.

About the author

Richard Morris


Richard Morris is a journalist, author and public relations/public affairs consultant. He has written for a number of UK and US newspapers and magazines and has offered strategic advice to numerous tech companies including Digital Island, Sony and several ISPs. He now specialises in social enterprise and is, among other things, a member of the Big Issue Invest advisory board. Big Issue Invest is the leading provider to high-performing social enterprises and has a strong brand name based on its parent company, The Big Issue, described by McKinsey & Co as the most well-known and trusted social brand in the UK.
