Modern Plain Text Computing

This is the course website for the Fall 2025 seminar SOCIOL 703, Modern Plain Text Computing, taught at Duke University by Kieran Healy. It is required for first-year students in the department and open to others who are interested in the material.

A working scale model of a DEC PDP 11/70. — Figure 1: A working scale replica of a Digital Equipment Corporation PDP-11/70, one of the mainstays of computing in the 1970s.

Motivation

As researchers and scholars we depend on software to get our work done. But often, we do not know enough about how our computers work. Nor are we encouraged to reflect on why they work the way they do, or given any basic grounding in such matters as part of our training. Instead we end up fending for ourselves and pick things up informally. Or, instead of getting on with the task at hand, course instructors are forced to spend time quickly bringing people up to speed about where that document went, or what a file is, or why “that didn’t work” just now. In the worst case, we never get a feel for this stuff at all and end up marinating in an admixture of magical thinking about and sour resentment towards the machines we sit in front of for hours each day, and will likely sit in front of for the rest of our careers.

All of that is bad. This course is meant to help. The coding and data analysis tools we have are powerful, but the way the work tends to run against the grain of the devices we use most often: our phones. As a rule, apps on your phone hide their implementation details from you. They do not want you to worry too much about where things are stored, or how things are accomplished, or what happens if you need to do the same thing again later. They don’t talk to each other much, either. The fragmented and multifacted tasks associated with scholarly research, meanwhile, make distinctive demands on software. Most of them have to do with the need for control over what you are doing, and especially the importance of having a record of what you did that you can revisit and reproduce if needed. They also need to let us track down and diagnose errors. And they must assist us in pulling disparate pieces of a project together into a presentable final product like a talk, an article, or a book. This can be a tricky process to think about and manage in a systematic way.

To address these challenges, modern technical computing platforms provide us with a suite of powerful, modular, composable tools and techniques. The bad news is that they are not magic; they cannot do our thinking for us. The good news is that they are stable and reliable. Many are supported by helpful communities. Most are developed and improved in the open. Almost all are available for free. Nearly without exception, they tend to work through the medium of explicit, structured instructions written out in plain text. In other words they work by having you write some code, in the broadest sense. People who do research involving structured data of any kind should become familiar with these tools. Lack of familiarly with the basics encourages bad habits and unhealthy attitudes ranging from misplaced contempt to creeping despair.

Throughout the seminar we will move back and forth between two perspectives. First, we will learn about specific tools and tricks associated with using them. Concretely we will learn how to use the file system, the terminal, a text editor, and a programming language and development environment oriented to the analysis of tabular data. At this level we will focus on examples that come up in our everyday work. But second, we will try to develop a way of thinking about what we are doing. We don’t need to learn every tool in the box right away. There are far too many of them to even try, in any case. Rather, we will try to understand why these tools work the way they do, and why the approach they embody is so useful. In the process we will cultivate an attitude of determined curiosity that will help us notice and solve problems in our work as they (inevitably) arise, even when they are (undeniably) frustrating.

What we’ll cover

We have twelve class weeks for this seminar, excluding Labor Day and Fall Break. During that time we will learn some elements of plain-text computing that every graduate student in the social sciences (and beyond!) should know something about. They are shown in Table 1.

Table 1: Overview of the course schedule.

Week	Date	Topic
1	Aug 25	Big Picture: Doing your work properly
2	Sep 1	No class (Labor Day)
3	Sep 8	The file system; the shell; the terminal
4	Sep 15	Using R to look at data
5	Sep 22	Tidy data and dplyr
6	Sep 29	More on dplyr, tidyr, and related packages
7	Oct 6	Ingesting and cleaning data
8	Oct 13	No class (Fall break)
9	Oct 20	Version Control: git and GitHub
10	Oct 27	Iterating on Data and Models
11	Nov 3	Databases and APIs
12	Nov 10	Parallel Computing
13	Nov 17	Helpers, build systems, and environments
14	Nov 24	Packages, and LLMs

Consult the course schedule page for more detail on weekly topics, readings, and assignments. The syllabus has more information on expectations, requirements, course polices, and assessment for the seminar.