Your Computer

Modern Plain Text Social Science: Week 1

Kieran Healy

September 30, 2023

Motivation

Technical Computing is frustrating

Can we make it fun?

No.

  ⇦ Not this much fun, at any rate

OK but can we eliminate frustration?

Also no.

(Sorry.)

But we can make it work

Two Revolutions in Computing

What everyday computing is now

  • Touch-based user interface
  • Foregrounds a single application
  • Dislikes multi-tasking*
  • Hides the file system

* Multi-tasking: by this I mean “making different specialized applications and resources work together in the service of a single but multi-dimensional project”, not “checking social media while also listening to a talk and waiting for an update from the school nurse.”

Where technical computing lives

  • Windows and pointers.
  • Multi-tasking, multiple windows.
  • Exposes and leverages the file system.
  • Many specialized tools in concert.
  • Underneath, it’s the 1970s, UNIX, and the command-line.

Where technical computing lives

  • This toolset is by now really good!
  • Free! Open! Powerful!
  • Friendly communities! Lots of information! Many resources!
  • But: grounded in a UI paradigm that is increasingly far away from the everyday use of computing devices
  • So why do we use this stuff?

Control, not Productivity

Productivity is great and everything, but not why we do all this

 

The most important thing is to be able to know and show what it was that you did

“Office” vs “Engineering” approaches

What is “real” in your project?

What is the final output?

How is it produced?

How are changes managed?

Different Answers

Office model

  • Formatted documents are real.
  • Intermediate outputs are cut and pasted into documents.
  • Changes are tracked inside files.
  • Final output is often in the same format you’ve been working in, e.g. a Word file, or a PDF.

Engineering model

  • Plain-text files are real.
  • Intermediate outputs are produced via code, often inside documents.
  • Changes are tracked outside files, at the level of a project.
  • Final outputs are assembled programmatically and converted to some desired format.

Different strengths and weaknesses

Office model

  • Everyone knows Word, Excel, or Google Docs.
  • “Track changes” is powerful and easy.
  • Hm, I can’t remember how I made this figure
  • Where did this table of results come from?
  • Paper_edits_FINAL_kh-1.docx

Engineering model

  • Plain text is highly portable.
  • Push button, recreate analysis.
  • JFC Why can’t I do this simple thing?
  • Object of type 'closure' is not subsettable

Each approach generates solutions to its own problems

The File System

The traditional analog

You may never have actually used one of these.

The file cabinet!

Index cards

Automating information processing

Hollerith machines

Hollerith Operators

IBM punch cards

Big Iron

Storage

Input/Output

A late-model teletype (TTY) machine

Input/Output

The DEC VT-100 Terminal

Back to the file system

File system hierarchy

Standard locations

  • / : root. Everything lives inside or under the root.
  • /bin/ : For binaries. Core user executable programs and tools.
  • /sbin/ : System binaries. Essential executables for the super user (who is also called root)
  • /lib/ : Support files for executables.
  • /usr/ : Conventionally, stuff installed “locally” for users in addition to the core system. Will contain its own bin/ and lib/ subdirs.
  • /usr/local : Files that the local user has compiled or installed
  • /opt/ : Like /usr/, another place for locally installed software to go.

Standard locations

  • These locations get mapped together in the $PATH, which is an environment variable that tells the system where executables can be found.
 echo $PATH
/home/kjhealy/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/snap/bin
  • Delimited by : and searched in order from left to right.
  • To learn where a command is being executed from, use which
 which R
/usr/local/bin/R
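
A minimal sketch of how the left-to-right search order plays out, assuming a POSIX-style shell. The temporary directory and the hello-demo script are hypothetical names made up for illustration; command -v is the shell's built-in counterpart to which.

```shell
# Show each $PATH entry on its own line, in the order the shell searches them.
printf '%s\n' "$PATH" | tr ':' '\n'

# Create a throwaway directory holding a tiny script (hypothetical names).
demo_dir="$(mktemp -d)"
printf '#!/bin/sh\necho "hello from the demo"\n' > "$demo_dir/hello-demo"
chmod +x "$demo_dir/hello-demo"

# Put that directory at the front of the search path, for this session only.
PATH="$demo_dir:$PATH"

# Ask the shell where it would now find the command.
found="$(command -v hello-demo)"
echo "$found"
```

Because the new directory comes first, a command placed there would also shadow any same-named command later in the list. The change lasts only until the shell session ends.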

Standard locations

  • /etc/ : Editable text configuration. Config files often go here.

Standard locations

  • /home/ or /Users/ : Where the accounts of individual system users live, like /Users/kjhealy/
 pwd
/home/kjhealy
  ls
bin  certbot.log  logrotate.conf  old  projects  public  staging

File system hierarchy

  • An edited version of the root / tree
├── Applications
├── bin
├── cores
├── dev
├── etc -> private/etc
├── home -> /System/Volumes/Data/home
├── Library
├── opt
│  ├── homebrew
├── private
│  ├── etc
│  ├── tftpboot
│  ├── tmp
│  └── var
├── sbin
├── System
├── tmp -> private/tmp
├── Users
│  ├── kjhealy
│  └── Shared
├── usr
│  ├── bin
│  ├── lib
│  ├── libexec
│  ├── local
│  ├── sbin
│  ├── share
│  ├── standalone
├── var -> private/var
└── Volumes

File system hierarchy

  • An edited version of the user or home tree
├── Applications
├── bin
├── Box
├── Creative Cloud Files
├── Desktop
├── Documents
│  ├── bibs -> /Users/kjhealy/Library/texmf/bibtex/bib
│  ├── bookdown
│  ├── comments
│  ├── completed
│  ├── courses
│  ├── data
│  ├── letters
│  ├── misc
│  ├── nonsense
│  ├── ordinal-society
│  ├── papers
│  ├── sites
│  ├── source
│  ├── talks
│  ├── teaching
│  ├── templates
│  ├── vita
├── Downloads
├── Dropbox
├── Library
├── Movies
├── Music
├── Pictures
├── Public
├── scratch
├── tmp
└── Zotero

Path rules

  • If the path name begins with /, it is an absolute path, starting from the root.
  • If the path name begins with ~, it will usually be expanded into an absolute path name starting at your home directory (~).

Path rules

  • If the pathname does not begin with a / or ~
    • The path name is relative to the current directory. Two special cases rely on the entries . and .., which exist in every Unix directory:
      1. If the path name begins with ./, the path is relative to the current directory, e.g., ./textfile. Typing ./name at the prompt is also how you run a file in the current directory, provided it has executable permissions.
      2. If the path name begins with ../, the path is relative to the parent of the current directory. For example, if your current directory is /Users/kjhealy/Documents/papers then ../data means /Users/kjhealy/Documents/data
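
The path rules can be tried out safely in a scratch directory. This is a sketch only: the tree built under the temporary directory is hypothetical and simply mirrors the Documents/papers and Documents/data example.

```shell
# Build a small hypothetical tree to experiment in.
base="$(mktemp -d)"
mkdir -p "$base/Documents/papers" "$base/Documents/data"
touch "$base/Documents/papers/textfile"

# An absolute path starts with / and works from anywhere.
cd "$base/Documents/papers"

# ./ means "in the current directory".
ls ./textfile

# ../ means "in the parent of the current directory".
sibling="$(cd ../data && pwd)"
echo "$sibling"

# ~ is expanded by the shell to your home directory.
echo ~
```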

File permissions

drwxr-xr-x@  8 kjhealy  staff    256 Aug 15 16:35 R
-rw-r--r--@  1 kjhealy  staff   1210 Aug 15 20:29 README.md

In Unix systems, permissions are set separately for three classes of user: the file's owner (here kjhealy), its group (here staff), and everyone else on the system.

Three things you can do:

read

write

execute

  • For files, “read” means open; “write” means edit and save; “execute” means run if it’s an application or script. (Deleting a file depends on write permission for the directory that contains it.)
  • For directories, “read” means list contents with ls; “write” means create, delete, or rename files inside it; “execute” means access or enter using cd.

File permissions

❯ ls -l README.md

-rw-r--r--@ 1 kjhealy staff 1210 Aug 15 20:29 README.md

These permissions say rw-r--r-- or

  • The user can rw- read and write this file
  • The group can r-- read this file
  • The world can r-- read this file

Executable permissions are irrelevant here because it’s a text file.

File permissions

  • We change file permissions with the chmod command. So e.g. chmod 644 README.md means “change the permissions to rw-r--r--”.
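
A short sketch of chmod in both its octal and symbolic forms, run in a scratch directory so nothing real is touched. The file here is an empty stand-in for README.md, and cut -c1-10 just trims the ls -l line down to the permission string.

```shell
workdir="$(mktemp -d)"
cd "$workdir"
touch README.md

# Octal digits add up read (4), write (2), and execute (1),
# one digit each for user, group, and others.
# 644 = rw- (4+2), r-- (4), r-- (4)
chmod 644 README.md
mode_644="$(ls -l README.md | cut -c1-10)"
echo "$mode_644"

# Symbolic form: add execute permission for the user only.
chmod u+x README.md
mode_ux="$(ls -l README.md | cut -c1-10)"
echo "$mode_ux"

# 600 = rw- for the user, nothing for group or others.
chmod 600 README.md
mode_600="$(ls -l README.md | cut -c1-10)"
echo "$mode_600"
```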

The Unix way of thinking

Stepping back

  • Your computer stores files and runs commands
  • The files are stored in a large hierarchy
  • The commands are many and varied but there’s a core set of them that are most useful
  • Unix commands and utilities generally try to do a specific thing to files or running processes
  • The Unix conception of a ‘file’ is very flexible. Connections to other computers or processes can act like files.
  • Unix commands are often composable using pipes

The Unix pipe

  • Unix commands work with some input and may produce some output
  • Unix systems have the concepts of “standard input”, “standard output”, and “standard error” as streams where things come from, where they go to, and where problems are reported.
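
To make the three streams concrete, here is a minimal sketch. The report function and the file names are hypothetical; the point is only that output and error messages travel on separate streams and can be sent to separate places.

```shell
workdir="$(mktemp -d)"

# A hypothetical command that writes to both output streams.
report() {
  echo "result: 42"                 # goes to standard output
  echo "warning: check input" >&2   # goes to standard error
}

# > redirects standard output; 2> redirects standard error.
report > "$workdir/out.txt" 2> "$workdir/err.txt"

# < connects a file to a command's standard input.
first_line="$(head -n 1 < "$workdir/out.txt")"
echo "$first_line"
```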

The Unix pipe

  • More on this next week, but for now think about the output of a command like ls:
ls 
R
README.md
README.qmd
README_files
_extensions
_freeze
_quarto.yml
_site
_targets
_targets.R
_variables.yml
about
assets
assignment
content
data
deploy.sh
example
files
html
index.html
index.qmd
mptc.Rproj
renv
renv.lock
renv.lock.orig
schedule
site_libs
slides
syllabus

The Unix pipe

We can send, or pipe, this output to another command, instead of to the terminal:

ls | wc -l
      30
  • The wc command counts the lines, words, and bytes in a file, or in whatever is sent to it via STDIN.
  • The -l switch to wc means ‘just count lines instead of words’
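
A small sketch of the same idea in a scratch directory with three hypothetical files. The tr -d ' ' step is only there because some versions of wc pad their output with spaces.

```shell
workdir="$(mktemp -d)"
cd "$workdir"
touch a.txt b.txt c.txt

# When its output goes to a pipe, ls prints one entry per line,
# so counting lines counts directory entries.
n_files="$(ls | wc -l | tr -d ' ')"

# The same two lines of text, counted by line and then by word.
n_lines="$(printf 'one two three\nfour five\n' | wc -l | tr -d ' ')"
n_words="$(printf 'one two three\nfour five\n' | wc -w | tr -d ' ')"

echo "$n_files $n_lines $n_words"
```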

The Unix pipe

Like with pipelines in R, we can compose sequences of actions at the prompt:

 ls -lh access.log
-rw-r--r-- 1 root root 7.0M Aug 29 16:00 access.log
 head access.log
192.195.49.31 - - [27/Aug/2023:00:01:11 +0000] "GET / HTTP/1.1" 200 19219 "https://www.google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.54"
192.195.49.31 - - [27/Aug/2023:00:01:12 +0000] "GET /libs/tufte-css-2015.12.29/tufte.css HTTP/1.1" 200 2025 "https://socviz.co/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.54"
192.195.49.31 - - [27/Aug/2023:00:01:12 +0000] "GET /libs/tufte-css-2015.12.29/envisioned.css HTTP/1.1" 200 888 "https://socviz.co/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.54"
192.195.49.31 - - [27/Aug/2023:00:01:12 +0000] "GET /css/tablesaw-stackonly.css HTTP/1.1" 200 1640 "https://socviz.co/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.54"
192.195.49.31 - - [27/Aug/2023:00:01:12 +0000] "GET /css/nudge.css HTTP/1.1" 200 1675 "https://socviz.co/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.54"
192.195.49.31 - - [27/Aug/2023:00:01:12 +0000] "GET /css/sourcesans.css HTTP/1.1" 200 1492 "https://socviz.co/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.54"
192.195.49.31 - - [27/Aug/2023:00:01:13 +0000] "GET /js/jquery.js HTTP/1.1" 200 30464 "https://socviz.co/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.54"
192.195.49.31 - - [27/Aug/2023:00:01:13 +0000] "GET /js/tablesaw-stackonly.js HTTP/1.1" 200 2996 "https://socviz.co/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.54"
192.195.49.31 - - [27/Aug/2023:00:01:13 +0000] "GET /js/nudge.min.js HTTP/1.1" 200 937 "https://socviz.co/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.54"
52.13.187.67 - - [27/Aug/2023:00:01:13 +0000] "GET /dataviz-pdfl_files/figure-html4/ch-03-fig-lexp-gdp-10-1.png HTTP/1.1" 200 308830 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0"

The Unix pipe

Like with pipelines in R, we can compose sequences of actions at the prompt:

 head access.log | awk '// {print $11}'

"https://www.google.com/"
"https://socviz.co/"
"https://socviz.co/"
"https://socviz.co/"
"https://socviz.co/"
"https://socviz.co/"
"https://socviz.co/"
"https://socviz.co/"
"https://socviz.co/"
"-"

The Unix pipe

Like with pipelines in R, we can compose sequences of actions at the prompt:

 awk '// {print $11}' access.log | sort | uniq -c | sort -nr | head -n 15

   9729 "https://socviz.co/lookatdata.html"
   4851 "-"
   4212 "https://socviz.co/"
   1719 "https://socviz.co/makeplot.html"
   1477 "https://bookdown.org/"
   1466 "https://socviz.co/gettingstarted.html"
   1373 "https://socviz.co/groupfacettx.html"
    864 "https://socviz.co/workgeoms.html"
    794 "https://socviz.co/maps.html"
    733 "https://socviz.co/refineplots.html"
    671 "https://socviz.co/index.html"
    349 "https://socviz.co/appendix.html"
    228 "https://socviz.co/modeling.html"
    153 "https://www.google.com/"
     50 "http://vissoc.co/"
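
The sort | uniq -c | sort -nr idiom is worth seeing on a file small enough to check by eye. The six log lines below are invented for illustration and only imitate the layout of access.log, with the referrer in the eleventh whitespace-separated field.

```shell
log="$(mktemp)"
cat > "$log" <<'EOF'
10.0.0.1 - - [27/Aug/2023:00:01:11 +0000] "GET / HTTP/1.1" 200 100 "https://example.org/a" "agent"
10.0.0.2 - - [27/Aug/2023:00:01:12 +0000] "GET / HTTP/1.1" 200 100 "-" "agent"
10.0.0.3 - - [27/Aug/2023:00:01:13 +0000] "GET / HTTP/1.1" 200 100 "https://example.org/a" "agent"
10.0.0.4 - - [27/Aug/2023:00:01:14 +0000] "GET / HTTP/1.1" 200 100 "https://example.org/b" "agent"
10.0.0.5 - - [27/Aug/2023:00:01:15 +0000] "GET / HTTP/1.1" 200 100 "https://example.org/a" "agent"
10.0.0.6 - - [27/Aug/2023:00:01:16 +0000] "GET / HTTP/1.1" 200 100 "-" "agent"
EOF

# 1. awk pulls out field 11 (the referrer).
# 2. sort puts identical values next to each other.
# 3. uniq -c collapses each run of identical lines and prefixes a count.
# 4. sort -nr orders by that count, largest first.
awk '{print $11}' "$log" | sort | uniq -c | sort -nr

# Keep only the most common referrer, with the count padding normalized.
top="$(awk '{print $11}' "$log" | sort | uniq -c | sort -nr | head -n 1 | awk '{print $1, $2}')"
echo "$top"
```

The first sort matters: uniq only collapses adjacent duplicates, so unsorted input would be undercounted.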

The Unix pipe

We can do a lot with a pipeline:

curl -s 'http://api.citybik.es/v2/networks/citi-bike-nyc' |
   jq '.network.stations[].free_bikes' | 
  gpaste -sd+ | 
  bc
36747

This is the number of Citi Bikes available in New York City at the time these slides were made.
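
The last two steps of that pipeline can be sketched offline. In the pipeline above, gpaste is GNU paste as installed by Homebrew's coreutils package; on Linux the same tool is simply called paste. The three numbers below are made-up stand-ins for the per-station counts that jq would emit, and shell arithmetic stands in for bc.

```shell
# Hypothetical per-station counts, one per line, standing in for the jq output.
counts="$(printf '3\n5\n10\n')"

# paste -s joins all input lines into one; -d+ puts a plus sign between them.
# The trailing - tells paste to read from standard input.
expr_str="$(printf '%s\n' "$counts" | paste -sd+ -)"
echo "$expr_str"

# Shell arithmetic evaluates the expression, as bc does in the pipeline above.
total=$(( $expr_str ))
echo "$total"
```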

Things to do for next week

Setup

On a Mac

  • Install macOS command line tools: From the command line, run xcode-select --install
  • Install the Homebrew package manager. See https://brew.sh

On Windows

  • Install the Windows Subsystem for Linux: https://learn.microsoft.com/en-us/windows/wsl/install
  • Or install Cygwin: https://www.cygwin.com