Modern Plain Text Computing

This is the course website for the Fall 2024 seminar SOCIOL 703, Modern Plain Text Computing, taught at Duke University by Kieran Healy. It is required for first-year students in the department and open to others who are interested in the material.

Motivation

Researchers depend on computer software to get their work done. But often, they do not know enough about how their computers work. This makes their lives more difficult. I don’t mean it’s a shame not everyone is a software engineer ready and able to write applications from the ground up. Rather, as researchers working with data of various kinds, we just don’t make the best use of our computers. Nor are we encouraged to reflect on why they work the way they do. Instead we end up fending for ourselves and picking things up informally. Or, instead of getting on with the task at hand, course instructors are forced to spend time bringing people up to speed about where that document went, or what a file is, or why the stupid thing stopped working just now. In the worst case, we never get a feel for this stuff at all and end up marinating in an admixture of magical thinking and sour resentment towards the machines we sit in front of for hours each day, and will likely sit in front of for the rest of our careers.

Figure 1: A working scale replica of a Digital Equipment Corporation PDP-11/70, one of the mainstays of computing in the 1970s. As annoying as your laptop undoubtedly is, be grateful that you do not have to program this machine using the bank of switches at the front by reading the blinkenlights.

All of that is bad. This course is meant to help. While the coding and data analysis tools we have are powerful, they are also kind of a pain in the neck. For the most part they are made to allow us to know what we did. They can be opened up to have their history and inner workings examined if needed. This runs against the grain of the devices we use most often—our phones—which do not work in that way at all. As a rule, apps on your phone hide their implementation details from you and do not want you to worry too much about where things are stored or how things are accomplished or what happens if you need to do the same thing again later. They do that for very good reasons. But it does mean that even if you use a powerful computer constantly, as we almost all do now, it does not give you much of a grip on how more technical computing works. To the contrary, it makes it look strange and annoying and deliberately confusing.

The fragmented, messy, and multifacted tasks associated with scholarly research (whatever your preferred methodological approach) make heavy demands on software. Most of them have to do with the need for control over what you are doing, and especially the importance of having a record of what you did that you can revisit and reproduce if needed. They also need to allow us to track down and diagnose errors. Because our research work is fragmented and messy, this can often be a tricky process to think clearly about and work through in a systematic way.

To help address these challenges, modern computing platforms provide us with a suite of powerful, modular, specialized tools and techniques. The bad news is that they are not magic; they cannot do our thinking for us. The good news is that they are by now very old and very stable. Most of them are developed in the open. Many are supported by helpful communities. Almost all are available for free. Nearly without exception, they tend to work through the medium of explicit instructions written out in plain text. In other words they work by having you write some “code”, in the broadest sense. People who do research involving structured data of any kind should become familiar with these tools. Lack of familiarly with the basics encourages bad habits and unhealthy attitudes among the informed and uninformed alike, ranging from misplaced impatience to creeping despair.

What we’ll cover

We have twelve class weeks for this seminar, excluding Fall Break and Thanksgiving. During that time we will learn some elements of plain-text computing that every graduate student in the social sciences (and beyond!) should know something about. They are:

Topic
Week 1 Aug 27 Big Picture: Doing your work properly
Week 2 Sep 3 The file system; the shell; the terminal
Week 3 Sep 10 Editing text: Text editors; regular expressions
Week 4 Sep 17 Your data workbench I: R, RStudio, and Quarto
Week 5 Sep 24 Your data workbench II: How R thinks; tidy data
Week 6 Oct 1 Version Control: git and GitHub
Week 7 Oct 8 Ingest data: Getting stuff in and out of R
Week 8 Oct 15 No class (Fall break)
Week 9 Oct 22 Databases
Week 10 Oct 29 Iterate on data: functional programming patterns
Week 11 Nov 5 Supercharged iteration: Parallel computing
Week 12 Nov 12 Look at data: Graphs, ggplot, and the grammar of graphics
Week 13 Nov 19 Reproducible results: build systems, environments, and packages
Week 14 Nov 26 No class (Thanksgiving)

Throughout the seminar we will move back and forth between two perspectives. First, and most concretely, we will learn about specific tools and various tricks associated with using them. That’s the stuff mentioned in the tag lines. At this level we will focus on examples that come up in our everyday work. As we shall see, each particular thing we learn about is a means of doing something useful. But second, and more generally, we will try to develop a way of thinking. That’s the idea in the header. We don’t need to learn every tool in the box right away. There are far too many of them to even try, in any case. Rather, we will try to develop the ability—and fortitude—to learn how to learn more. We want to cultivate an attitude of determined curiosity that will help us solve problems as they (inevitably) arise, even when those problems are (undeniably) frustrating.

Consult the course schedule page for more detail on weekly topics, readings, and assignments.