Benjamin Pollack: Geek of the Week

Benjamin Pollack is well known for his work on Fog Creek Copilot and Kiln. He is famous amongst young geeks for his role in the documentary film and website 'Aardvark'd: 12 Weeks with Geeks', which charted his internship with Fog Creek back in 2005.

It’s easy to get a sense of what the software firm Fog Creek’s office is like by clicking through their website. The pictures bear witness to a pleasantly light-filled office with great views overlooking Broadway, the wide avenue in New York City that runs the full length of Manhattan.

Fog Creek, founded by Joel Spolsky and Michael Pryor to combat that strange world inhabited by ‘Napoleon-complex junior managers’, is the home of some great programmers.

To many of us, the Fog Creek office is most familiar as the home of Project Aardvark, made into the cult film ‘Aardvark’d: 12 Weeks with Geeks’. The documentary, filmed in 2005, showed how four interns developed Fog Creek Copilot, a remote assistance software tool, having been given 12 weeks to design, develop, debug and ship it. One of those geeks was Benjamin Pollack, who then went on to help make the original prototype of what became Kiln, a tool built around distributed version control (DVCS), which effectively means fast access and no need to wait to check in code.

In his free time Benjamin works on his screenplay and his novel, plays Chopin and Rachmaninoff on his piano, and occasionally spends time coding for fun – usually in Pharo, Smalltalk or Python.

RM:
Benjamin, what’s your background in technology? How did you learn to program?
BP:
I started to program when I was really young. My dad had worked at SRI on robots like Shakey (the first general-purpose mobile robot to be able to reason about its own actions) back in the 1970s, so we had a computer in the house as far back as I can remember. One day, my dad came home with a new computer, a 286, which came with a copy of GW-BASIC, and there was a game called DONKEY.BAS, which was written in it. My dad showed me the general idea, and gave me the GW-BASIC manual. So I taught myself some GW-BASIC. The stuff I wrote was really silly, but it was enough to pique my interest and get me started.

Things kind of grew from there. I somehow got my hands on a copy of Turbo Pascal later on, and was impressed by how much easier programming got with functions and structures – bleeding-edge even now, I know. Being eight is fun, because everything is totally new.

In middle school, I started to realize that I really liked programming, and by early high school, I had been really fortunate to discover Squeak Smalltalk. Smalltalk was the first language and environment I actually understood ‘all the way down’: the fact that Squeak in particular was written in itself all the way down to the VM meant that I could actually do things like debug into the compiler, figure out how my text was turned into a syntax tree, figure out how that was turned into byte codes, then debug those right into the VM to figure out how those ultimately got executed on the hardware. I was blown away by how incredibly powerful this environment was compared to anything that I’d used before.

That’s when I really got hooked: it was so easy to make amazing stuff using Squeak that I just got addicted. I wrote something, even if it was dinky, nearly every day. Eventually, I got to college, realized that I really wanted to make a career out of this, and did. I’d say it’s worked out well.

RM:
Are there languages that you use all the time and ones that you stay well clear of?
BP:
The languages I use all the time are C# and Python.

Python’s simply wonderful because it’s not only easy to read and write; it’s always really easy to figure out exactly what the language is going to be doing behind the scenes with any given piece of code. Combine that with a really active community and great tools, and it’s a hard language to beat.

C# gets flak for being a Microsoft product, but to be honest, I think it’s one of my favourite languages these days. People are always saying it’s just Microsoft’s Java, but I just can’t get that. It has lambda expressions. It has a sane event model. It’s got great asynchronous support that’s only going to get better with C# 5. It has type inference, real generics, LINQ, partials, just tons and tons of stuff that makes a tremendous difference when you’re writing real code. I honestly can’t stand using Java, but I love coding in C#.

I try really hard to avoid what I’d call ambiguous languages. What I mean is languages where it’s very hard to look at any given line of code and know what actually gets run. C++ is a great example: unless you take great pains to avoid operator overloading, mark your constructors as explicit, and so on, it’s extremely difficult for me to look at a line of C++ and have the faintest idea what’s actually going on.

I don’t even really mean this in terms of the machine code generated; I mean in terms of the actual semantics. I mean, take how iterators work. The right way to do iterators in C++ is to override the pointer dereferencing operator on your iterator object. But pointers are fundamental to the language!

So now you’re saying that I can have this pointer-like thing that isn’t necessarily a pointer, but might be, so maybe I can put it in a collection, maybe not, maybe it copies if I hand it off somewhere else, maybe it doesn’t. Who knows? Throw in reference parameters, and you have a variable whose pointer you might be able to get in the caller, but not the callee, even though the usage semantics are identical.

It’s not that you can’t write good code in languages like C++; that’s patently false. I just believe my time is much better spent solving problems than second-guessing the language’s syntax.

RM:
Do you remember the first interesting program that you wrote?
BP:
That depends on what you count as interesting. The first program I wrote that I thought was really cool came about when I played 20 questions on one of the Apple IIs at school, and wondered how it recorded what your answers were so that it could learn. So I wrote my own version. That forced me to learn about data structures in a way that I hadn’t really understood before. Before that point, I think I’d thought that all the data had to be hard-coded in the program; making something that could learn meant I had to learn about arrays and saving to disk.

That’s totally uninteresting in a grand sense, but I learned so much writing it that it stuck with me all these years, so it’s certainly interesting to me.
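
The interview does not show the original program, so the following is only a minimal Python sketch of the idea described above: a guessing game that stores what it learns in a data structure and saves it to disk. It uses a tree of nested dictionaries rather than arrays, and every name in it (`new_tree`, `play`, `learn`, `save`, `load`, the file name) is an assumption made for illustration.

```python
import json
import os
import tempfile

# A node is either a guess, {"guess": "cat"}, or a question,
# {"question": "...", "yes": node, "no": node}. Plain dicts keep the
# whole tree directly serializable to JSON.


def new_tree(first_guess):
    """Start with a game that knows exactly one thing."""
    return {"guess": first_guess}


def play(tree, answers):
    """Follow True/False answers down the tree and return the guess node reached."""
    node = tree
    answers = iter(answers)
    while "question" in node:
        node = node["yes"] if next(answers) else node["no"]
    return node


def learn(leaf, correct_thing, question, answer_for_correct):
    """Turn a wrong guess into a question node in place, so the game learns."""
    old_guess = leaf.pop("guess")
    right, wrong = {"guess": correct_thing}, {"guess": old_guess}
    leaf["question"] = question
    leaf["yes"], leaf["no"] = (right, wrong) if answer_for_correct else (wrong, right)


def save(tree, path):
    """Persist the learned tree so it survives between runs."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(tree, f)


def load(path):
    """Read a previously saved tree back from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Teach the game one new fact, save it, and reload it.
tree = new_tree("cat")
wrong_leaf = play(tree, [])  # no questions yet, so this is the "cat" guess
learn(wrong_leaf, "goldfish", "Does it live in water?", True)

path = os.path.join(tempfile.gettempdir(), "twenty_questions.json")
save(tree, path)
reloaded = load(path)
print(play(reloaded, [True])["guess"])  # goldfish
```

Nothing is hard-coded beyond the first guess: each wrong answer grows the tree by one question, and the saved file carries that knowledge into the next session.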

RM:
What are the things that if someone had sat you down at the very beginning of your career and said, ‘You need to know X, Y and Z’ that your life would have been much easier?
BP:
There’s really only one thing that I wish someone had told me: I wish I’d had someone point out to me how important it is that you know how older technology works, because it never goes away.

In college, they like to focus on new, up-to-date stuff, so we were writing in Java and template-heavy C++ for group projects, and I used Smalltalk or Ruby everywhere I could for individual projects.

But in the real world, there’s lots of old code that you need to deal with, so while it’s great that you know how to write GUIs in Swing or play with object databases like Gemstone or use an SCM like Subversion, what you actually ended up doing back then was probably using VBScript to talk to Access and storing your code in SourceSafe. Or if you were really lucky, you got to write Win32 GUIs with MFC and maybe use CVS.

Anyone will figure this out in three seconds on the job, but I think a little warning would have been nice to lessen the shock.

RM:
Let me ask you about your time at Fog Creek. How did Kiln start and was there a specific problem you were trying to solve? Did you write mocks in the test-first sense so you could test it as you went along?
BP:
Kiln’s history’s actually really interesting to me. The way it began was that a friend and colleague of mine, Tyler, really wanted to do this 48-hour coding competition called Django Dash. At the time, he and I both wanted to start doing code reviews more rigorously, but we hated all of the tools that we had available to us.

They all revolved around the workflow of, ‘take a patch, upload it onto a server, and then comment on it.’

This meant that the reviews were totally outside source control and weren’t versioned in any sense, so even with the ones that had some decent discussion and some concept of history, you’d lose that discussion and history as soon as you checked in your code.

About the same time, I’d helped Fog Creek move from Subversion to a relatively early version of Mercurial, and we were still trying to figure out exactly how to make good workflows with this new tool. So Tyler and I decided that maybe our code review tool should simply be based on Mercurial.

So we did a mad dash in the code sprint to make a code review tool based on Mercurial. We ended up with a decent tool, and we won the competition, and that was the end of it for about six months. Then Tyler and I were talking one day, and we realized we really wanted to do something other than Copilot, so we decided to try to pitch bringing Kiln in-house. I used my Thanksgiving vacation to refactor the entire code base, named the product Kiln, and gave it the first stylesheet that I didn’t think deliberately looked horrible. Tyler and I then bounced that around for a month, demoed to the company in January 2009, and convinced them to make a real commercial product out of it.

We did not do TDD, nor do we now. Tyler wanted to, but we were changing how the app worked so much so frequently that we’d have spent the entire sprint writing tests. We had similar issues when we began working on the “real” version. I don’t think we’ll ever use TDD at this point.

RM:
What form did the design take? Pseudocode? Actual code? Whiteboard scribbles?
BP:
For the prototype, it was all actual code. We had no idea how the workflow should really work; just some loose ideas. So we talked for a few minutes about something we thought might work, then did an insane code sprint to just get something working, and then immediately began using it so we could try to figure out how the tool should actually work. We ended up surprisingly close to the mark, but being able to use the tool highlighted a bunch of problems we didn’t think of that we fixed before the contest ended. So basically: come up with a rough idea, code it up, try it, iterate.

When we brought Kiln into the company, things changed. We spec’d out Kiln’s high-level architecture on whiteboards, while our designers worked with us to develop relatively complete paper specs with images and descriptions of various workflows. We’d take those specs around to people in the company and see how they tried to use the paper interfaces. Once we had a pretty good idea that what we were doing didn’t completely stink, we’d code it up, distribute it to everyone, and promptly realize that the interface didn’t work at all. So then we’d do what we did in the prototype: do a bunch of revisions on the code based on feedback and whiteboard discussions until we had a usable interface.

Unsurprisingly, we now do only whiteboard specs, followed by working prototypes as quickly as possible. Having the iteration loop is much more helpful than trying to plan up front.

RM:
You say on your blog that Kiln has a mission, not a mission statement. What do you mean by that?
BP:
I didn’t say that Kiln has an indefinable mission. In fact, I defined that mission, in broad terms, as bringing distributed source control to as many developers as possible. We really do believe that distributed source control is that much better than classic solutions, like Subversion, that we’re on an almost religious drive right now to try to spread that to as many people as possible.

What Kiln doesn’t have is a canonical mission statement. I think mission statements are a bad idea for two reasons: first, you can just go too narrow. If I’d defined Kiln’s mission statement when we got started as ‘be the best code review system,’ we would have made an entirely different product. I don’t think it would’ve been bad, but I don’t think it’d be as interesting or useful as what we actually ended up delivering.

On the flip side, whether your mission statement is broad or specific, it can end up sounding trite and becoming meaningless after a while. You end up just parroting it back without really thinking about it anymore. I’d liken it to Orwell’s lament about ‘dying metaphors’: phrases where people just regurgitate a metaphor without envisioning the picture it was trying to convey.

By not defining a word-by-word mission, my hope is we’re forced to actually think about what we’re doing, instead of just parroting something we’ve memorized. So far, that’s worked very well for us.

RM:
Did you have a big architectural picture of how Kiln was going to work before launching it, so you knew what the hard to solve areas were likely to be?
BP:
Thanks to the code sprint, we had a good idea what scaled well and what didn’t before we started on the ‘real’ version. Right from the get-go on that, we split Kiln up so that the disk-intensive parts would be on separate boxes from everything else, which honestly solved huge swaths of our performance issues before we got started. We’ve had to make a variety of tweaks to the architecture since then, but we managed to get everything basically correct from the beginning due to the prototype.

That said, the prototype was completely instrumental in that realization. The prototype’s design would have performed abysmally if we hadn’t modified it. I think the real lesson here is, “Build a prototype. Smack it around. Find what doesn’t work. Fix that before you design the real version.”

RM:
How does distributed source control make Kiln better to use? Why should people adopt it?
BP:
There are so many things that distributed source control makes better. Having wonderful merging means that it’s easy for us to maintain multiple versions of Kiln and FogBugz at the same time. Having trivial branching means that it’s easy for us to do quick experiments, with full history, and then merge them later if they work out, or throw them out if they don’t. The distributed nature of source control means that we can keep working from home or on the train when we want to, and the speed of the whole thing means that we can do these incredibly frequent commits of tiny changes that make figuring out what change actually broke the build a tremendous amount easier. These are all advantages to any team of any size. While any little piece may sound kind of boring, when they’re combined, they make a tremendous difference to your development process. It’s why you get people who are such evangelists of Mercurial, Git, and similar tools. They really feel that DVCS makes a palpable difference.

You get all of that with any DVCS. What Kiln provides on top is all the tools that make coordinating with a team a lot easier. Have all of your personal repositories and branches in one place. Have a way of seeing exactly what changes you have or have not accepted. Get a way to see which bugs are fixed in any given version. Use code reviews to make sure that you’re not approving changes that haven’t been vetted and improved upon.

I think literally every feature in Kiln is a direct result of some problem that we’ve had at Fog Creek with source control, so not using Kiln these days feels very weird to me. I can get lots of work done using vanilla Mercurial: it’s a truly wonderful distributed system, even on its own. But it feels kind of like going from a whiz-bang modern graphing calculator to an HP-41.

RM:
What was the biggest challenge?
BP:
Oh, definitely getting all of the timeout issues working. Mercurial pushes are done as these single HTTP POST requests, and that’s a real problem if you’re looking at a big repository, like Firefox or OpenJDK, because it means you’re trying to jam sometimes gigabytes of POST data into a single HTTP request. Web stacks just generally aren’t designed to work that way. It took us months in the beta period to fully iron out all the things that needed to be changed, from our load balancers to the website to the backend to the client boxes.

The real challenges are rarely the interesting ones; the interesting ones are fun to solve, so they’re fun to work on, and usually get solved relatively quickly. It’s the ones that just involve a pile of tedious testing of bad hypotheses that end up being the real killers.

RM:
Ok, let’s move on to a couple of general questions. How do you tackle understanding a piece of code that you didn’t write? Do you just dive in and start reading it? How do you start?
BP:
I know this is heresy, but, when possible, I like to start with a debugger. Find a button or link that looks interesting, figure out one thing that fires when that button gets pushed, attach a breakpoint, then push the link or button. Doing that allows you to quickly follow through the real flow of the program, including any behind-the-scenes voodoo that implicitly runs code and might be hidden by a cursory read of the source-code. It’s also great for giving a hands-on view of how the designers intended the code to be put together.

Alongside that, or in place of it when there’s no reasonable way to explore with a debugger, I’ll usually poke around the source code, find a file that looks interesting, and poke through it. As I go, I’ll constantly grep through the rest of the code to figure out who uses these functions or objects and where. If you pick a good candidate (say, Repo.cs in Kiln’s source code, which ends up being in charge of everything to do with a repository), doing this will naturally expand out in a way that gives you a good narrative for how the program’s put together.

RM:
What are the characteristics that make code easier to read?
BP:
Good names, as always. Name variables and functions consistently and appropriately after what they’re supposed to do. If you mean them to be internal (for whatever ‘internal’ might mean in that language and framework: private variables, module-level variables, or the like), indicate that in some way in the naming so that I can quickly pick out the interface from the implementation.

Having short functions and classes helps a lot, too. I can only hold so much of a program in my head at any given point, so the higher level abstractions I can be using at any given point in the program, the more I can reason about at once. That means better optimizations and faster debugging.

And finally, and this is one of the biggest in my opinion these days: magic is bad. Don’t use super-clever tricks to mask what your code is doing. While that makes code superficially easy to read, it also does a wonderful job masking bugs and making understanding the actual control flow of what you’re reading extremely complicated.
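
The naming advice in this answer can be sketched in Python, where a leading underscore is the usual convention for marking a name as internal. The `Repository` class, its members, and the `public_interface` helper below are hypothetical names chosen for illustration; none of them is taken from Kiln’s source.

```python
class Repository:
    """Hypothetical class used only to illustrate naming."""

    def __init__(self, name):
        self.name = name        # public: part of the interface
        self._changesets = []   # leading underscore: internal detail

    def commit(self, message):
        """Public operation: record a change and return the new count."""
        self._changesets.append(self._normalize(message))
        return len(self._changesets)

    def _normalize(self, message):
        """Internal helper: collapse runs of whitespace in a message."""
        return " ".join(message.split())


def public_interface(obj):
    """List the names a reader would treat as the interface."""
    return sorted(n for n in dir(obj) if not n.startswith("_"))


repo = Repository("demo")
print(public_interface(repo))  # ['commit', 'name']
```

With the convention applied consistently, a reader can separate interface from implementation by name alone, without tracing every call site.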

RM:
Speaking as a programmer how have your ideas about language design and software changed over time?
BP:
My attitude towards language design hasn’t changed too much. For better or for worse, Smalltalk remains my biggest influence. I want languages that make functional-style code easy, by giving you lambdas or a similar construct. I prefer languages that are explicit over ones that are implicit or “magical.” I like languages with relatively simple grammars, so that their syntax doesn’t become its own source of bugs for your program. And I like some pragmatism, so while I like the error checking provided by static type systems, or the general safety provided by immutable variables in functional languages, I want the right to break those rules when I need to. I think Python, Go, CoffeeScript, C#, F#, and OCaml are all good examples of these guidelines, to greater or lesser extents.

RM:
A few of the people I’ve interviewed, such as L Peter Deutsch and Don Knuth, compose and play music, and you’re something of a musician yourself. Is there a link between music and technology, such as source code versus music notation?
BP:
Maybe. Personally, I think that’s something of a red herring, though. Pretty much everyone likes music; bright people seem to be more likely to play music or compose it. I think whatever helps you be good at math or programming may help you be slightly better at composing or playing music, or even vice-versa, but I know too many good musicians who are lousy mathematicians to believe there’s some type of cosmic underlying link going on here.

I certainly don’t believe there’s any underlying link in aptitude between source code and music notation. Music is aural. I’m sure some people compose on paper first, but I promise that playing with my friends happens on the instruments, not on paper, and while I reason about source code visually, I almost never visualize what the music I’m banging out would look like on paper while I’m playing. In other words: source code is a program; music notation is not music.

RM:
How did you manage to trump The Social Network, the Zuckerberg biopic, by five years by appearing in a movie called Aardvark’d: 12 Weeks with Geeks? What was it about and has Columbia Pictures been in touch about optioning it?
BP:
Twelve Weeks with Geeks tracked the summer where we built Copilot. Three other interns and I got brought in by Joel to build this new product entirely from scratch. We started off the summer with nothing but a 40-page Word document, and somehow finished twelve weeks later with a real product that was immediately useful, and began making some actual revenue. In this sense, I guess, it’s the opposite of The Social Network: the Winkelvii (played by Joel and Michael) came to us and told us what they wanted, we built it quickly, and we immediately started making money.

I’ll be honest: I absolutely hated having Aardvark’d made. It’s hard enough trying to take a product from nothing to selling in twelve weeks with three guys you until now had no working relationship with. It’s even harder when there’s a camera in your face constantly, recording every lame joke you tell and every idiotic thing you do.

But in retrospect, I love having Aardvark’d around, because it’s absolutely hilarious. A great drinking game is to take a shot every time I’m a jerk in the film. Unfortunately this game is probably directly responsible for most of my friends having sclerotic livers at this point. Thankfully, I’ve been told by reliable sycophants that I’m no longer a jerk, so I can just sit back and laugh at how I’ve grown up and move on.

Amazingly, Columbia Pictures did initially express interest, but ended up deciding to do a documentary on a banana stand owned by a real estate company in Orange County instead. I don’t honestly know what happened with that; they may have arrested development of the project.

RM:
Final question Benjamin: you’re writing a novel. Is that for fun or has it been commissioned and what is it about?
BP:
Oh, that’s just for fun. I love to write in general, and have always written for fun. When I was in college, I briefly flirted with the idea of writing screenplays, so I wrote a few of those, realized that my chance of making it in LA was effectively zero even if they happened to be good, and gave up on that idea. But one screenplay, which I actually wrote well after I graduated, made a pretty good story, so I’ve been lazily working on novelizing it.

The story is basically an exaggerated version of a kind of soul-searching mission I went on back in 2007. I’d just gotten out of a crappy relationship, and was having some temporary burnout at work, so my best friend pretty much dragged me to JFK and shipped me out on the first flight to San Francisco and told me to do just about anything except code for a week.

I spent the week hiking, visiting old friends, and talking with people in the Valley about what the scene looked like out there. I came back feeling tremendously refreshed, and with a tremendous pile of hilarious stories and antics. I’m basically taking those stories and wrapping them in a plot based loosely on friends and acquaintances I had who were in banking and got laid off when the mortgage crisis came to a head.

I think it’s an interesting story that will successfully sell at least two copies, provided my mom thinks it’s okay. Or maybe it’ll outsell Harry Potter. Who knows? One of the great things about just writing as a hobby, and for fun, is I don’t have to worry about that. Combine that with options like Kindle and Lulu, and I even have the option of publishing if I feel like it’s good enough. But, honestly, I write because I love writing. The end. If something great happens because of that, it’d be wonderful, but I’m not going to be upset if nothing comes of it.