Matthew Gentzkow
Jesse M. Shapiro
Chicago Booth
June 25, 2012
Introduction
Every step of every research project we do is written in code, from raw data to final paper. Doing research is therefore writing software. Over time, people who write software for a living have learned a lot about how to write it well. We follow their lead. We aim to write code that would pass muster if we worked at Google or Microsoft.

Economists sometimes write code that reads like stream of consciousness: a more or less random series of steps that happen to produce the right result. A good way to generate this kind of code is to use Stata interactively for an hour and then copy and paste the list of commands into a text editor. This code will do what it is supposed to do. But it will be very difficult for anyone other than the person who produced it (or even for that same person after a day or two) to read and understand it. It will be virtually impossible to modify or extend it. And if anything about the environment changes (if you try to run it on a different computer, say, or rename an input file), it will break.
It is obvious that Google Maps or Microsoft Word could not be written this way. The code for these programs must be written so that many people over many years can read and understand it. It must have a logical structure that makes it easy to fix, modify, and extend, and that allows people to take pieces developed for one problem and apply them to another. It must be robust enough to remain viable in a constantly changing environment. And it must be efficient. Although our projects are orders of magnitude smaller, our code needs to meet these same requirements.

This document lays out some broad principles we should all follow. Both this document and our understanding of what makes good code are constantly evolving. Few of us have formal training, and we’re learning as we go. If you look at code we’ve written in the past, you’ll see that most of it fails some of the criteria below and much of it fails most of them. We encourage you to read more broadly about software craftsmanship, to look critically at your own code and that of your colleagues, and to suggest improvements or additions to the principles below.