Doing Your Work Properly

Modern Plain Text Computing
Week 01

Kieran Healy

October 7, 2024

Motivation

We depend on our computers

Technical computing is frustrating

Can we make it fun?

No.

  ⇦ Not this much fun, at any rate

OK but can we eliminate frustration?

Also no.

(Sorry.)

But we can make it work

Bigger picture

Science is hard

  • Scientific research is difficult.
  • It depends on norms that entail a lot of trust.
  • Doing it badly or fraudulently is too easy.
  • We have tools to help us do the right thing.
  • (But our tools can’t make us do the right thing.)

This course

What it’s not

  • This is not a statistics or quantitative methods course.
  • Neither is it a course in the logic of social-science research.
  • Other courses in the program teach you that stuff.

What it is

  • Our goal is more humble, but it smooths the path for the other stuff.
  • You’ll learn a way of thinking and a set of technical tools for managing your own research, whatever that research is.
  • This toolkit is powerful, extensible, versatile, and extremely widespread, far beyond the world of social-scientific research.

The Whole Game

A tiny research project

Features of our tiny paper

  • It’s a file!
  • Of a particular type (a PDF)
  • You can’t easily edit it
  • It has a title, author, sections, a figure, tables, references, a footnote, and a bibliography. That is, it has many of the elements of a scholarly paper.

A slightly different version

Features of this version

  • It’s also a file!
  • Of a particular type (still a PDF)
  • You still can’t easily edit it
  • It has all the stuff in the original version, plus a bunch of code that was hidden before and that we can now see.

Have you done this part yet?

Week 01 Assignment: Install R and RStudio

The project’s GitHub page

  • Go here and download it: https://github.com/kjhealy/mptc_oecd
  • We’re going to open it in RStudio, take a look at the various parts of the project, and turn it into a document.
  • Don’t worry at this point if you’re not sure what’s happening, or what GitHub or RStudio are.

(Sound of Engine Trying to Start)

This slide is up because we are installing the required software by following the instructions handed out earlier. In other words, we are discovering the inevitable idiosyncrasies of everyone’s individual setup, the vagaries of various operating systems, the intrinsic difficulty of following documented steps in a procedure, the hidden bits of implicit knowledge or not-fully-articulated steps that are nevertheless necessary, the high prevalence of ordinary error and failure in everyday life, and the awful grip of chance on human affairs in general.

Main Ideas

  • Our little scholarly article is a file in some format.
  • We created or rendered it from several other pieces when we pushed that ⇨Render button.
  • Some of these pieces include: data, text, markup, and code.
  • Most of these pieces are in plain text, in files of their own.
  • They are stored in some sort of orderly fashion somewhere.
  • We have some sort of engine that assembled the pieces into the article.
  • We have some kind of application to help us run the engine and manage the pieces.
  • We can reliably produce and reproduce the document in various formats.
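
The ideas above can be sketched in a few lines of code. This is a toy illustration written in Python, not the way RStudio or the ⇨Render button actually work. The names `DATA_CSV`, `TEXT_TEMPLATE`, `analyze`, and `render` are hypothetical, as is the made-up data. In a real project each piece would live in its own plain-text file.

```python
import csv
import io
from string import Template

# Hypothetical plain-text pieces. Here they are strings; in a real
# project the data and the text would each be stored in their own file.
DATA_CSV = "country,value\nA,10\nB,20\nC,30\n"
TEXT_TEMPLATE = Template("# $title\n\nThe mean value across $n countries is $mean.\n")


def analyze(csv_text):
    """The 'code' piece: read the data and compute a summary."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    values = [float(row["value"]) for row in rows]
    return {"n": len(values), "mean": sum(values) / len(values)}


def render(title, csv_text, template):
    """The 'engine': assemble text, data, and results into one document."""
    results = analyze(csv_text)
    return template.substitute(
        title=title,
        n=results["n"],
        mean=f"{results['mean']:.1f}",
    )


document = render("A Tiny Paper", DATA_CSV, TEXT_TEMPLATE)
print(document)
```

Running `render` again with the same pieces gives the same document, which is the sense of "reliably produce and reproduce" in the list above. Change the data and render again, and the number in the text updates without anyone retyping it.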

That’s a lot all at once

  • We’re going to back up and go through these pieces slowly.
  • Keep in mind why we’re doing it (we want to reliably produce a scholarly paper).
  • Also start thinking about why the tools we’re using might look like this.

Two Revolutions in Computing

What everyday computing is now

  • Touch-based user interface.
  • Foregrounds a single application.
  • Dislikes multi-tasking.*
  • Hides the file system.
  • Very far underneath, it’s often the 1970s, UNIX, and the command line. But usually you don’t get to see this.

* Multi-tasking: I mean, “Making different specialized applications and resources work together in the service of a single but multi-dimensional project”, not “Checking social media while also listening to a talk and waiting for an update from the school nurse.”

Where technical computing lives

  • Most interaction via windows, icons, pointers, clicking.
  • Multi-tasking via multiple windows at once is standard.
  • Exposes and leverages the file system.
  • Using several specialized applications in concert is common.
  • Underneath, it’s often the 1970s, UNIX, and the command line. And you can get to see this.

Where technical computing lives

  • For technical computing in the sense of doing “statistics”, this toolset is by now really good.
  • It’s also very good for technical computing in the sense that all scholarly work is technical.
  • But these tools are grounded in a paradigm that is increasingly far away from the everyday use of our most common computing devices.
  • So why do we continue to use and develop them?

Control, not Productivity

Productivity is great and everything, but it’s not why we do all this.

The most important thing is to be able to confidently know and clearly show what it was that you did in the service of doing your work properly.

“Office” vs “Engineering” approaches

The challenge of “Knowing and Showing” gives rise to questions like these:

What is “real” in your project?

What is the final output?

How is it produced?

How are changes managed?

Can you do it again?

Different Answers

Office model

  • Formatted documents are real.
  • Intermediate outputs are cut and pasted into documents.
  • Changes are tracked inside files.
  • Final output is often in the same format you’ve been working in, e.g. a Word file, or a PDF.

Engineering model

  • Plain-text files are real.
  • Intermediate outputs are produced via code, often inside documents.
  • Changes are tracked outside files, at the level of a project.
  • Final outputs are assembled programmatically and converted to some desired format.
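
One way to picture "changes are tracked outside files, at the level of a project" is a small sketch like the one below. It is a toy written in Python under stated assumptions, not a description of how any particular version-control system is implemented. The function names `snapshot` and `changed` and the example file names are hypothetical.

```python
import hashlib


def snapshot(files):
    """Fingerprint every file in the project: name -> SHA-256 of its contents."""
    return {
        name: hashlib.sha256(text.encode("utf-8")).hexdigest()
        for name, text in files.items()
    }


def changed(before, after):
    """Names of files added, removed, or modified between two snapshots."""
    names = set(before) | set(after)
    return sorted(name for name in names if before.get(name) != after.get(name))


# Two hypothetical states of the same small project, held in memory
# as name -> contents so the example needs no files on disk.
project_v1 = {"paper.txt": "Draft one.\n", "analysis.py": "print(1 + 1)\n"}
project_v2 = {
    "paper.txt": "Draft two.\n",
    "analysis.py": "print(1 + 1)\n",
    "notes.txt": "todo\n",
}

print(changed(snapshot(project_v1), snapshot(project_v2)))
# prints ['notes.txt', 'paper.txt']
```

The record of what changed lives alongside the files rather than inside any one of them, so it covers the whole project at once: text, data, and code alike.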

Different strengths and weaknesses

Office model

  • Documents look like documents.
  • Everyone knows Word, Excel, or Google Docs.
  • “Track changes” is powerful and easy in a single document.
  • Hm, I can’t remember how I made this figure.
  • Where did this table of results come from?
  • Paper_edits_FINAL_kh-1.docx

Engineering model

  • Plain text is highly portable.
  • Push button, recreate analysis.
  • Project fully version-controlled.
  • Tables and figures produced and integrated programmatically.
  • For the love of God, why can’t I do this simple thing?
  • Object of type 'closure' is not subsettable

Each approach generates solutions to its own problems