The File System & the Shell

Modern Plain Text Computing
Week 03b

Kieran Healy

September 2025

Files

Files

  • A file is just a stream of bytes: data, or some other sort of resource, that a program can read or interact with.
  • Files have a location in the file system.
  • In the UNIX way of thinking, “Everything is a file”.
  • That is, lots of things that are not normally thought of as files (such as printers, or terminal screens, or connections to other computers) can be thought of as living in a named place somewhere in the filesystem.
  • The basic set of UNIX utilities can be thought of as tools that accept “files” (as a standard stream of input data), perform some specific action on them (read, print, move, copy, delete, count lines, find text, whatever) and then return a standard stream of output data that can be sent somewhere, e.g. to a terminal display, or used as input to another command, or become a file of its own.
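
A minimal sketch of that idea, using hypothetical sample text rather than a real file. Each utility reads a stream, does one job, and hands a stream on to the next one.

```shell
# Hypothetical sample data: five short lines standing in for a file.
# printf writes the stream, grep keeps the lines that start with the letter a,
# sort orders them, and wc -l counts what is left.
# tr strips the padding that some versions of wc add around the number.
matches=$(printf 'apple\nbanana\napricot\ncherry\navocado\n' | grep '^a' | sort | wc -l | tr -d ' ')
echo "Lines starting with a: $matches"
```

The same chain works unchanged if the first command is replaced by something that reads a real file, because every step only ever sees a stream of text.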

File system hierarchy

Path conventions

  • / represents a division in the file hierarchy. You can think of it as a branch point on a tree, or as a new level of nesting in a series of boxes, or as the action “Go inside” or “Enter”.

  • On a Unix-like system, a full path to a file looks like this:

/Users/kjhealy/Documents/courses/mptc/slides/01b-slides.qmd

“Go inside the ‘Users’ folder, then inside the ‘kjhealy’ folder, then inside ‘Documents’ then inside ‘courses’ then ‘mptc’ then ‘slides’ and you will find the file 01b-slides.qmd.”
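
As a small illustration, the shell can split a path like this into its folder part and its file part. The sketch uses the path from the slide; the variable names are made up for the example, and nothing needs to exist on disk.

```shell
# dirname and basename only manipulate the text of the path,
# so this runs even if the file is not there.
full_path="/Users/kjhealy/Documents/courses/mptc/slides/01b-slides.qmd"
folder=$(dirname "$full_path")    # everything before the last /
file=$(basename "$full_path")     # everything after the last /
echo "Folder: $folder"
echo "File:   $file"
```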

Standard Unix locations

  • / : root. Everything lives inside or under the root.
  • /bin/ : For binaries. Core user executable programs and tools.
  • /sbin/ : System binaries. Essential executables for the super user (who is also called root)
  • /lib/ : Support files for executables.
  • /usr/ : Conventionally, stuff installed “locally” for users in addition to the core system. Will contain its own bin/ and lib/ subdirs.
  • /usr/local : Files that the local user has compiled or installed
  • /opt/ : Like /usr/, another place for locally installed software to go.

Standard Unix locations

  • These locations get mapped together in the $PATH, which is an environment variable that tells the system where executables can be found.
 echo $PATH
/home/kjhealy/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/snap/bin
  • Delimited by : and searched in order from left to right.
  • To learn where a command is being executed from, use which
 which R
/usr/local/bin/R
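
A rough sketch of how to inspect the search path yourself. The directories printed will differ from machine to machine; ls is used as the lookup target only because it should exist on any Unix-like system.

```shell
# Print each directory on the search path on its own line, in search order.
echo "$PATH" | tr ':' '\n'

# command -v reports where the shell would find a command, much like which.
ls_location=$(command -v ls)
echo "ls is run from: $ls_location"
```

If two directories on the path each contain a program with the same name, the one in the directory listed first is the one that runs.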

Standard Unix locations

  • /etc/ : Editable text configuration. Config files often go here.

Standard Unix locations

  • /home/ or /Users/ : Where the accounts of individual system users live, like /Users/kjhealy or /home/kjhealy
 pwd
/home/kjhealy
  ls
bin  certbot.log  logrotate.conf  old  projects  public  staging
  • All of this is a matter of more or less established convention that varies by operating system. E.g. on most Linux systems, individual user directories live in /home. On macOS they live in /Users. Windows is different again (and uses \ rather than / in file paths).

File system hierarchy

  • An edited version of the root, /, or top of my Mac’s file system tree:
├── Applications
├── bin
├── cores
├── dev
├── etc -> private/etc
├── home -> /System/Volumes/Data/home
├── Library
├── opt
│   └── homebrew
├── private
│   ├── etc
│   ├── tftpboot
│   ├── tmp
│   └── var
├── sbin
├── System
├── tmp -> private/tmp
├── Users
│   ├── kjhealy
│   └── Shared
├── usr
│   ├── bin
│   ├── lib
│   ├── libexec
│   ├── local
│   ├── sbin
│   ├── share
│   └── standalone
├── var -> private/var
└── Volumes

File system hierarchy

  • An edited version of the User or home tree, i.e. everything inside /Users/kjhealy on my Mac:
├── Applications
├── bin
├── Box
├── Creative Cloud Files
├── Desktop
├── Documents
│   ├── bibs -> /Users/kjhealy/Library/texmf/bibtex/bib
│   ├── bookdown
│   ├── comments
│   ├── completed
│   ├── courses
│   ├── data
│   ├── letters
│   ├── misc
│   ├── nonsense
│   ├── ordinal-society
│   ├── papers
│   ├── sites
│   ├── source
│   ├── talks
│   ├── teaching
│   ├── templates
│   └── vita
├── Downloads
├── Dropbox
├── Library
├── Movies
├── Music
├── Pictures
├── Public
├── scratch
├── tmp
└── Zotero

Local and Remote Files

Local Files

  • So far we’ve been working with files on our own computer. These local files live somewhere in the file system on our own computer.

  • We’re also mostly going to be confining ourselves, in any particular project, to files that are in or under our project directory, as in the mptc_oecd project. While we’re in an R session and working with mptc_oecd, we think of the project directory as our working directory, and the top of the project directory as the root of our little system of files and folders.

  • So data-raw/countries_iso3.tsv is a file that lives in the data-raw folder inside the project directory. mptc_oecd.qmd lives at the top level of the project directory.

  • But files can also be located remotely, on other computers, and we can access them over the internet or a network.

Remote Files: URLs

  • A URL or Uniform Resource Locator is a kind of address that locates a resource on the internet. It is, in effect, a path to a file that lives on another computer somewhere, one that is accessible by us (or by the public in general).

Remember, there’s no such thing as The Cloud; it’s just Someone Else’s Computer.

Remote Files: URLs

  • A URL is essentially a file path, apart from the https://kieranhealy.org bit at the start. The https:// part names the protocol to use, and kieranhealy.org tells your computer which web server to connect to.

  • You might wonder why paths to folders, like https://kieranhealy.org/publications/ appear in your browser as a web page. This is because the site is set up to serve a default file, usually called index.html, when you ask for a folder.
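
The pieces of a URL can be pulled apart with the shell's own parameter expansion. This is a rough sketch that assumes a simple URL of the form scheme://host/path, with no port, query string, or fragment. The variable names are made up for illustration.

```shell
url="https://kieranhealy.org/publications/"

scheme="${url%%://*}"      # drop everything from :// onward
rest="${url#*://}"         # drop the scheme and the ://
host="${rest%%/*}"         # keep the text up to the first /
url_path="/${rest#*/}"     # keep the text after the first /

echo "Scheme: $scheme"
echo "Host:   $host"
echo "Path:   $url_path"
```

The last piece reads exactly like a path on a local file system, which is the point of the comparison.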

  • Can we get remote files via the Terminal or command line? Of course we can.

Curl

  • The address https://kjhealy.co/mptc/ shows a directory with some files in it. One is called mortality.txt. We use the curl command:
curl https://kjhealy.co/mptc/mortality.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 16160  100 16160    0     0  32509      0 --:--:-- --:--:-- --:--:-- 32449
England and Wales, Total Population, Death rates (period 1x1),  Last modified: 02 Apr 2018;  Methods Protocol: v6 (2017)

  Year          Age             Female            Male           Total
  1841           0             0.136067        0.169189        0.152777
  1841           1             0.059577        0.063208        0.061386
  1841           2             0.036406        0.036976        0.036689
  1841           3             0.024913        0.026055        0.025480
  1841           4             0.018457        0.019089        0.018772
  1841           5             0.013967        0.014279        0.014123
  1841           6             0.010870        0.011210        0.011040
  1841           7             0.008591        0.008985        0.008788
  1841           8             0.006860        0.007246        0.007053
  1841           9             0.005772        0.006050        0.005911
  1841          10             0.005303        0.005382        0.005343
  1841          11             0.005114        0.005002        0.005057
  1841          12             0.005145        0.004856        0.004999
  1841          13             0.005455        0.004955        0.005202
  1841          14             0.006014        0.005341        0.005675
  1841          15             0.006733        0.005992        0.006365
  1841          16             0.007460        0.006661        0.007067
  1841          17             0.008137        0.007249        0.007700
  1841          18             0.008678        0.007782        0.008238
  1841          19             0.008889        0.008259        0.008583
  1841          20             0.008736        0.008672        0.008706
  1841          21             0.008735        0.009032        0.008874
  1841          22             0.008926        0.009316        0.009108
  1841          23             0.009077        0.009508        0.009278
  1841          24             0.009335        0.009679        0.009496
  1841          25             0.009743        0.009831        0.009784
  1841          26             0.010019        0.009909        0.009967
  1841          27             0.010069        0.009892        0.009984
  1841          28             0.010078        0.009845        0.009967
  1841          29             0.010058        0.009734        0.009903
  1841          30             0.010032        0.009570        0.009809
  1841          31             0.010047        0.009512        0.009788
  1841          32             0.010130        0.009602        0.009875
  1841          33             0.010294        0.009790        0.010050
  1841          34             0.010794        0.010282        0.010545
  1841          35             0.011777        0.011185        0.011487
  1841          36             0.012544        0.011932        0.012243
  1841          37             0.012791        0.012277        0.012538
  1841          38             0.012935        0.012537        0.012739
  1841          39             0.012671        0.012456        0.012565
  1841          40             0.011985        0.012011        0.011998
  1841          41             0.011661        0.011885        0.011771
  1841          42             0.011779        0.012171        0.011971
  1841          43             0.012007        0.012591        0.012294
  1841          44             0.012666        0.013506        0.013079
  1841          45             0.013950        0.015157        0.014543
  1841          46             0.015067        0.016646        0.015842
  1841          47             0.015527        0.017382        0.016438
  1841          48             0.015816        0.017909        0.016844
  1841          49             0.015658        0.017987        0.016798
  1841          50             0.014998        0.017556        0.016242
  1841          51             0.014810        0.017530        0.016128
  1841          52             0.015288        0.018136        0.016667
  1841          53             0.016006        0.018981        0.017445
  1841          54             0.017819        0.021051        0.019382
  1841          55             0.021757        0.025512        0.023574
  1841          56             0.025524        0.029715        0.027553
  1841          57             0.027171        0.031453        0.029241
  1841          58             0.028662        0.032979        0.030747
  1841          59             0.028376        0.032595        0.030405
  1841          60             0.026111        0.030069        0.028003
  1841          61             0.025699        0.029516        0.027516
  1841          62             0.027376        0.031227        0.029206
  1841          63             0.029374        0.033285        0.031229
  1841          64             0.033268        0.037533        0.035284
  1841          65             0.041038        0.046343        0.043530
  1841          66             0.048485        0.054821        0.051443
  1841          67             0.052247        0.058897        0.055341
  1841          68             0.056042        0.063048        0.059289
  1841          69             0.058426        0.065157        0.061546
  1841          70             0.058666        0.064278        0.061280
  1841          71             0.060660        0.065868        0.063087
  1841          72             0.065270        0.070988        0.067922
  1841          73             0.070312        0.076652        0.073236
  1841          74             0.078804        0.085351        0.081821
  1841          75             0.094455        0.100040        0.097049
  1841          76             0.109350        0.114295        0.111654
  1841          77             0.118135        0.124276        0.120976
  1841          78             0.128041        0.135812        0.131607
  1841          79             0.125836        0.136846        0.130802
  1841          80             0.123423        0.137300        0.129580
  1841          81             0.134875        0.150926        0.141946
  1841          82             0.147990        0.166114        0.155927
  1841          83             0.160912        0.182608        0.170311
  1841          84             0.174643        0.201489        0.186101
  1841          85             0.184144        0.202559        0.191942
  1841          86             0.198383        0.217986        0.206623
  1841          87             0.213268        0.234232        0.221901
  1841          88             0.228774        0.251293        0.237842
  1841          89             0.244869        0.269194        0.254494
  1841          90             0.261470        0.287917        0.271734
  1841          91             0.278553        0.307488        0.289546
  1841          92             0.295980        0.327873        0.307813
  1841          93             0.313596        0.349042        0.326412
  1841          94             0.331448        0.371035        0.345362
  1841          95             0.349350        0.393954        0.364550
  1841          96             0.367162        0.417644        0.383799
  1841          97             0.384787        0.442191        0.403031
  1841          98             0.402606        0.468861        0.422833
  1841          99             0.420501        0.496786        0.442828
  1841         100             0.439020        0.526291        0.463433
  1841         101             0.458325        0.566260        0.486829
  1841         102             0.475390        0.621170        0.511000
  1841         103             0.499132        0.689155        0.541891
  1841         104             0.537371        0.905983        0.605360
  1841         105             0.576967        1.727848        0.700373
  1841         106             0.677711        6.000000        0.795287
  1841         107             0.900000              .        0.900000
  1841         108             1.388430              .        1.388430
  1841         109                   .              .              .
  1841         110+                  .              .              .
  1842           0             0.148491        0.184007        0.166481
  1842           1             0.063038        0.066596        0.064818
  1842           2             0.035203        0.035854        0.035527
  1842           3             0.023901        0.024521        0.024209
  1842           4             0.018041        0.018043        0.018042
  1842           5             0.013951        0.013864        0.013908
  1842           6             0.010678        0.010738        0.010708
  1842           7             0.008256        0.008460        0.008358
  1842           8             0.006617        0.006914        0.006765
  1842           9             0.005567        0.005860        0.005714
  1842          10             0.005113        0.005278        0.005195
  1842          11             0.004976        0.004939        0.004957
  1842          12             0.005034        0.004801        0.004916
  1842          13             0.005328        0.004876        0.005100
  1842          14             0.005896        0.005179        0.005534
  1842          15             0.006647        0.005713        0.006179
  1842          16             0.007311        0.006340        0.006830
  1842          17             0.007931        0.006948        0.007448
  1842          18             0.008499        0.007491        0.008004
  1842          19             0.008936        0.008001        0.008477
  1842          20             0.009051        0.008457        0.008763
  1842          21             0.008860        0.008790        0.008827
  1842          22             0.008842        0.009035        0.008933
  1842          23             0.008977        0.009172        0.009068
  1842          24             0.009072        0.009195        0.009129
  1842          25             0.009285        0.009198        0.009244
  1842          26             0.009695        0.009281        0.009499
  1842          27             0.009986        0.009331        0.009674
  1842          28             0.010062        0.009310        0.009703
  1842          29             0.010095        0.009277        0.009704
  1842          30             0.010098        0.009200        0.009668
  1842          31             0.010095        0.009095        0.009613
  1842          32             0.010131        0.009110        0.009637
  1842          33             0.010239        0.009284        0.009776
  1842          34             0.010429        0.009573        0.010014
  1842          35             0.010960        0.010179        0.010580
  1842          36             0.011943        0.011157        0.011559
  1842          37             0.012680        0.011954        0.012323
  1842          38             0.012921        0.012367        0.012648
  1842          39             0.013055        0.012682        0.012871
  1842          40             0.012774        0.012635        0.012706
  1842          41             0.012096        0.012230        0.012162
  1842          42             0.011785        0.012142        0.011960
  1842          43             0.011897        0.012435        0.012161
  1842          44             0.012124        0.012855        0.012483
  1842          45             0.012806        0.013781        0.013285
  1842          46             0.014142        0.015447        0.014782
  1842          47             0.015289        0.016910        0.016084
  1842          48             0.015796        0.017621        0.016691
  1842          49             0.016143        0.018130        0.017117
  1842          50             0.016024        0.018175        0.017076
  1842          51             0.015406        0.017724        0.016532
  1842          52             0.015275        0.017706        0.016451
  1842          53             0.015841        0.018322        0.017040
  1842          54             0.016657        0.019181        0.017876
  1842          55             0.018659        0.021327        0.019947
  1842          56             0.022853        0.025885        0.024317
  1842          57             0.026756        0.030086        0.028363
  1842          58             0.028627        0.031958        0.030233
  1842          59             0.030401        0.033693        0.031986
  1842          60             0.030271        0.033463        0.031802
  1842          61             0.028149        0.031162        0.029586
  1842          62             0.028077        0.030994        0.029463
  1842          63             0.030183        0.033109        0.031570
  1842          64             0.032690        0.035659        0.034095
  1842          65             0.037423        0.040723        0.038979
  1842          66             0.046365        0.050687        0.048390
  1842          67             0.054580        0.060028        0.057115
  1842          68             0.058874        0.064821        0.061632
  1842          69             0.063163        0.069724        0.066193
  1842          70             0.065687        0.072203        0.068697
  1842          71             0.065787        0.071417        0.068401
  1842          72             0.067912        0.073581        0.070544
  1842          73             0.072769        0.079502        0.075879
  1842          74             0.077971        0.086027        0.081670
  1842          75             0.087044        0.096147        0.091221
  1842          76             0.103859        0.113061        0.108113
  1842          77             0.119120        0.129111        0.123747
  1842          78             0.128146        0.140685        0.133906
  1842          79             0.138476        0.154186        0.145627
  1842          80             0.120815        0.140730        0.129639
  1842          81             0.131145        0.154466        0.141387
  1842          82             0.142925        0.169936        0.154683
  1842          83             0.156507        0.187015        0.169686
  1842          84             0.169478        0.205025        0.184629
  1842          85             0.180414        0.205264        0.190855
  1842          86             0.194978        0.219974        0.205430
  1842          87             0.210594        0.235190        0.220799
  1842          88             0.227254        0.250889        0.236857
  1842          89             0.245018        0.266959        0.253733
  1842          90             0.263947        0.283391        0.271535
  1842          91             0.284074        0.300092        0.290205
  1842          92             0.305495        0.316985        0.309801
  1842          93             0.328187        0.334059        0.330337
  1842          94             0.352075        0.351075        0.351718
  1842          95             0.377398        0.368133        0.374181
  1842          96             0.404121        0.385307        0.397782
  1842          97             0.432131        0.402336        0.422414
  1842          98             0.461650        0.419428        0.448358
  1842          99             0.493453        0.438463        0.476819
  1842         100             0.527615        0.457483        0.507261
  1842         101             0.566071        0.478862        0.541867
  1842         102             0.609062        0.514905        0.584548
  1842         103             0.651460        0.574333        0.633310
  1842         104             0.714510        0.662732        0.703632
  1842         105             0.827586        1.174935        0.875269
  1842         106             0.962669              .        0.962669
  1842         107             1.431953              .        1.431953
  1842         108             4.176471              .        4.176471
  1842         109                   .              .              .
  1842         110+                  .              .              .

The contents of the file just appear in the terminal window.

Curl

We can redirect it to a file instead:

mkdir tmp
curl https://kjhealy.co/mptc/mortality.txt > tmp/mortality.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 16160  100 16160    0     0  34390      0 --:--:-- --:--:-- --:--:-- 34456
ls -l tmp/
total 32
-rw-r--r--@ 1 kjhealy  staff  16160 Sep  3 13:50 mortality.txt
head tmp/mortality.txt
England and Wales, Total Population, Death rates (period 1x1),  Last modified: 02 Apr 2018;  Methods Protocol: v6 (2017)

  Year          Age             Female            Male           Total
  1841           0             0.136067        0.169189        0.152777
  1841           1             0.059577        0.063208        0.061386
  1841           2             0.036406        0.036976        0.036689
  1841           3             0.024913        0.026055        0.025480
  1841           4             0.018457        0.019089        0.018772
  1841           5             0.013967        0.014279        0.014123
  1841           6             0.010870        0.011210        0.011040
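
Notice that the progress meter still showed up in the terminal even though the data went into the file. That is because curl writes the meter to standard error, and > redirects only standard output. The sketch below shows the same behavior with a hypothetical function, fake_download, standing in for curl so that it runs without a network connection.

```shell
# fake_download is a hypothetical stand-in for curl: it writes its data
# to standard output and a progress message to standard error.
fake_download() {
  echo "progress: 100%" >&2
  echo "Year Age Female Male Total"
}

workdir=$(mktemp -d)

# > redirects only standard output, so the progress message
# still appears in the terminal.
fake_download > "$workdir/data.txt"

# 2> redirects standard error, so here each stream goes to its own file.
fake_download > "$workdir/data.txt" 2> "$workdir/progress.txt"

cat "$workdir/data.txt"
```

With the real curl, the -s flag silences the progress meter and -o names the output file, so curl -s -o tmp/mortality.txt https://kjhealy.co/mptc/mortality.txt should do the same job as the redirect above.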

The Shell

What is it?

  • A shell is a way for you to tell the operating system to do things.
  • On Unix systems it’s the first user-facing thing to get off the ground during the startup/boot process.
  • The command line or command prompt is where you type instructions. Shells come with a collection of standard utilities—i.e., commands—that let you do things.
  • These utilities can be composed, chained or piped together to accomplish more complex tasks.
  • You can also write scripts, or little programs, that the shell can run for you.
  • Shell scripting languages are small interpreted programming languages that understand variables, command substitution, branching, and iteration.
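
A minimal sketch of those four features in one short script. The names and values are invented for the example.

```shell
# A variable
greeting="Hello"

# Command substitution: capture the output of a command in a variable.
# tr strips the padding that some versions of wc add.
n_words=$(echo "one two three" | wc -w | tr -d ' ')

# Branching
if [ "$n_words" -gt 2 ]; then
  size="long"
else
  size="short"
fi

# Iteration
result=""
for name in ada grace linus; do
  result="${result}${greeting}, ${name}. "
done

echo "$n_words words is $size"
echo "$result"
```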

There are many shells

  • Strictly speaking, GUI environments like Windows and the macOS Finder are shells, too. They’re graphical shells.
  • But “the shell” usually means a text-based interpreter that runs programs in response to typed commands.
  • The “original” Unix shell is sh.
  • Its most widely-used descendant is bash or the Bourne-Again Shell.
  • On macOS the default shell is the Z Shell or zsh.
  • Windows has the Command shell and PowerShell. (PowerShell does not follow Unix conventions.)
  • On Windows we can also use the Git Bash shell, which pretends to be just enough of a Unix shell to run Git commands.

A command interpreter

echo "Hello there"
Hello there
  • A shell waits for commands. When you supply them, it does what you tell it, or tells the relevant bit of the operating system to do what you said. Then it tells you what happened, and waits for the next command.
  • This way of interacting with a computer is sometimes called a REPL or Read-Eval-Print Loop.
  • Python and R work this way as well. They are sometimes called interpreted languages, meaning code is sent to an interpreter (the Python or R program) that runs it.
  • This is distinct from languages that must be compiled into executable machine code before they are run. Languages like C, Go, and Rust are in this category.
  • (This distinction does not really stand up to detailed scrutiny with modern languages.)

Getting around the file system

Who and where

Who am I?

whoami
kjhealy

Where am I?

# Print working directory
pwd
/Users/kjhealy/Documents/courses/mptc

Listing files

What is in here?

# List files
ls
_extensions
_freeze
_motivation.qmd
_quarto.yml
_site
_targets
_targets.R
_variables.yml
_weekly-schedule.qmd
00_dummy_files
about
assets
assignment
avhrr
content
data
deploy.sh
example
files
html
index.html
index.qmd
mptc.Rproj
R
README.md
README.qmd
renv
renv.lock
renv.lock.orig
schedule
seas
site_libs
slides
staging
syllabus
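
Two common options to ls are worth knowing early. The sketch below builds a small throwaway folder (the file names are hypothetical) so that it can be run anywhere without touching real files.

```shell
demo=$(mktemp -d)
touch "$demo/notes.txt" "$demo/.hidden"
mkdir "$demo/slides"

ls "$demo"       # plain listing
ls -a "$demo"    # -a also shows entries whose names start with a dot
ls -l "$demo"    # -l is the long format: permissions, owner, size, date

all_entries=$(ls -a "$demo")
```

Files and folders whose names begin with a dot, such as .git, are left out of a plain listing by convention, which is why they are often called hidden files.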

Path rules

  • If the path name begins with /, it is an absolute path, starting from the filesystem root.
  • If the path name begins with ~, it will usually be expanded into an absolute path name starting at your home directory (~).

Path rules

  • If the pathname does not begin with a / or ~ then the path name is relative to the current directory.

  • Two relative special cases use entries that are in every Unix directory:

    1. If the path name begins with ./, the path is relative to the current directory, e.g., ./textfile. Typed on its own as a command, ./textfile will also try to execute the file, provided it has executable permissions.
    2. If the path name begins with ../, the path is relative to the parent of the current directory. For example, if your current directory is /Users/kjhealy/Documents/papers then ../data means /Users/kjhealy/Documents/data
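
These rules can be tried out safely in a throwaway folder. This is a sketch with a hypothetical Documents tree created under a temporary directory, so the absolute paths printed will differ on your machine.

```shell
# Build a small tree to move around in.
base=$(mktemp -d)
mkdir -p "$base/Documents/papers" "$base/Documents/data"

cd "$base/Documents/papers"    # absolute path: begins with /
cd ../data                     # relative path: up to the parent, then into data
here=$(pwd -P)                 # -P prints the path with any symlinks resolved
echo "Now in: $here"

cd ./..                        # ./ is the current directory, so this is the parent
echo "Parent: $(pwd -P)"

echo ~                         # ~ expands to the home directory
```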

File permissions

Who is using this file system anyway?

drwxr-xr-x@  8 kjhealy  staff    256 Aug 15 16:35 R
-rw-r--r--@  1 kjhealy  staff   1210 Aug 15 20:29 README.md

Unix derives from a world where there are multiple users and groups of users who are all using slices (in terms of processor time and available permanent storage) of a large central computer.

File permissions

In Unix systems, permissions are set for three classes of user: the user who owns the file (here kjhealy), the group associated with it (here staff), and others, meaning all other users on the system.

File permissions

Three things you can do to a file:

read

write

execute

  • For files, “read” means open; “write” means edit or save; “execute” means run if it’s an application or script. (Deleting a file is governed by write permission on the directory that contains it.)
  • For directories, “read” means list contents with ls; “write” means create, delete, or rename the files inside; “execute” means access or enter using cd.
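
A small sketch of what the execute permission does for a script. The file hello.sh is made up for the example and is created in a temporary folder.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Write a two-line script to a file.
printf '#!/bin/sh\necho "Hello from a script"\n' > hello.sh

chmod a-x hello.sh    # make sure nobody has execute permission
if [ -x hello.sh ]; then before="yes"; else before="no"; fi

chmod u+x hello.sh    # give the owning user execute permission
if [ -x hello.sh ]; then after="yes"; else after="no"; fi

echo "Executable before: $before, after: $after"

# With the execute bit set, the file can be run by giving its path.
./hello.sh
```

Without the execute bit, the shell refuses to run ./hello.sh even though the contents of the file have not changed.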

File permissions

❯ ls -l README.md

-rw-r--r--@ 1 kjhealy staff 1210 Aug 15 20:29 README.md

These permissions say rw-r--r--, that is:

  • The user can rw- read and write this file
  • The group can r-- read this file
  • The world can r-- read this file

Executable permissions are irrelevant here because it’s a text file.

File permissions

  • We change file permissions with the chmod command. So e.g. chmod 644 README.md means “change the permissions to rw-r--r--”.
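
The three digits are one each for user, group, and others. Each digit is a sum: 4 for read, 2 for write, 1 for execute. So 6 is 4+2 (rw-) and 4 is read only (r--). A minimal sketch, using a throwaway file in a temporary folder rather than a real README.md:

```shell
workdir=$(mktemp -d)
touch "$workdir/README.md"

# 644: user 6 = 4+2 (rw-), group 4 (r--), others 4 (r--)
chmod 644 "$workdir/README.md"
mode_644=$(ls -l "$workdir/README.md" | cut -c1-10)
echo "644 gives $mode_644"

# 755: user 7 = 4+2+1 (rwx), group 5 = 4+1 (r-x), others 5 (r-x)
chmod 755 "$workdir/README.md"
mode_755=$(ls -l "$workdir/README.md" | cut -c1-10)
echo "755 gives $mode_755"
```

The cut command keeps only the first ten characters of the long listing, which are the file type followed by the nine permission characters.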

A Tree

├── schedule
├── staging
│   ├── example
│   ├── content
│   ├── assignment
│   ├── slides
├── example
├── R
├── content
├── assignment
├── html
│   ├── fonts
├── site_libs
│   ├── revealjs
│   ├── bootstrap
│   ├── quarto-html
│   ├── quarto-contrib
│   ├── quarto-nav
│   ├── quarto-search
│   ├── lightable-0.0.1
│   ├── kePrint-0.0.1
│   ├── clipboard
├── about
├── slides
│   ├── 03-slides-shell_files
│   ├── 00-slides-template_files
│   ├── fonts
├── syllabus
├── _extensions
│   ├── kjhealy
├── 00_dummy_files
│   ├── figure-revealjs
├── _site
├── files
│   ├── misc
│   ├── examples
│   ├── scripts
│   ├── bib
├── .git
│   ├── objects
│   ├── info
│   ├── logs
│   ├── hooks
│   ├── refs
│   ├── modules
├── .vscode
├── _targets
│   ├── meta
│   ├── objects
│   ├── user
│   ├── workspaces
│   ├── scratch
├── renv
│   ├── staging
│   ├── library
├── data
├── assets
│   ├── 03-editors
│   ├── 04-r
│   ├── 10-parallel
│   ├── 04-git
│   ├── 08-iterate
│   ├── 00-site
│   ├── 02-shell
│   ├── 01-file-system
│   ├── 07-ingest
│   ├── 12-build
│   ├── 11-graph
│   ├── 05-dplyr
│   ├── 06-build
├── _freeze
│   ├── schedule
│   ├── example
│   ├── content
│   ├── assignment
│   ├── site_libs
│   ├── slides
│   ├── index
├── .Rproj.user
│   ├── B6516D0D
│   ├── shared
├── .quarto
│   ├── quarto-session-temp9852b87545991403
│   ├── xref
│   ├── quarto-session-tempf1c6ab6924e0a539
│   ├── quarto-session-tempd8f6f14f45aac093
│   ├── quarto-session-tempda6eaafca18532aa
│   ├── quarto-session-temp121ff76bb2dbe79d
│   ├── quarto-session-temp50818f039413fd08
│   ├── quarto-session-temp28f16ffb5ccd091
│   ├── quarto-session-temp75d76c0c4cc7e1a0
│   ├── idx
│   ├── quarto-session-temp73492e3cde48296d
│   ├── quarto-session-tempbd38f5ebf57df76e
│   ├── quarto-session-temp21f1ab8171829829
│   ├── quarto-session-tempacc4a6fb545292f0
│   ├── quarto-session-temp5204cafe52e4ce99
│   ├── quarto-session-temp1996b734a330ba24
│   ├── quarto-session-temp7d1f02da51c301d1
│   ├── quarto-session-temp605afad12dbb726b
│   ├── quarto-session-temp8cff18670f1a8e44
│   ├── quarto-session-temp87a9ec432f2541a5
│   ├── quarto-session-tempe1ea9dddf0df0e43
│   ├── quarto-session-temp17d7574426678fc8
│   ├── quarto-session-temp104dd764154b9bab
│   ├── quarto-session-temp509d30e18303aabe
│   ├── quarto-session-temp242a4905d6fc63a0
│   ├── quarto-session-temp117c229103af8b19
│   ├── quarto-session-temp3d8c5c2921ea0781
│   ├── quarto-session-temp39a3492fb3221f61
│   ├── quarto-session-temp8452c5bede3e9fe5
│   ├── quarto-session-temp48fe514a4d74082c
│   ├── quarto-session-temp46729032bf0f89ab
│   ├── quarto-session-temp7ab5856e592f4648
│   ├── quarto-session-tempa4fa3232910cb2be
│   ├── quarto-session-temp3b780033d494b2c1
│   ├── quarto-session-temp14e99a674fd4111f
│   ├── quarto-session-temp2f9cac9e1d16b2b2
│   ├── quarto-session-temp79f0e945ce4a3178
│   ├── quarto-session-temp95935067308122c3
│   ├── quarto-session-temp646213e8194000e2
│   ├── quarto-session-temp1314496a6bbe8d42
│   ├── quarto-session-temp92a88148e5732b26
│   ├── quarto-session-tempbceaf75553900fc9
│   ├── quarto-session-tempd0b9362c1ce8ad01
│   ├── quarto-session-temp43efabf11b4e136d
│   ├── quarto-session-temp32febbcb1a6d962a
│   ├── quarto-session-tempdc7901ab5872e552
│   ├── quarto-session-tempb8dbf40539ec57bd
│   ├── quarto-session-temp776567b7ef71d67d
│   ├── quarto-session-temp74f11449a1216c9d
│   ├── quarto-session-temp5f3b11d5851bc3fe
│   ├── quarto-session-temp365b3c0b298f885
│   ├── quarto-session-temp948e5cf47246a02b
│   ├── quarto-session-temp8aa657533c1549bb
│   ├── quarto-session-temp51b0b95bf26810ea
│   ├── quarto-session-temp9fbe211c86fa973c
│   ├── quarto-session-temp962b89261287028d
│   ├── quarto-session-temp9dd74b895dc4cd4b
│   ├── quarto-session-temp1e0f3df451724634
│   ├── quarto-session-tempdbfc23b0eb1f64f7
│   ├── quarto-session-temp31b1cb00aec9ff87
│   ├── quarto-session-tempbaec0f678def8a5d
│   ├── quarto-session-tempcd4883bebb9f7ec3
│   ├── quarto-session-tempea75c7d2617cc9ad
│   ├── quarto-session-temp338adbbc86c300a5
│   ├── quarto-session-temp2c3682f02881cc84
│   ├── quarto-session-tempba96237eea68efb4
│   ├── quarto-session-temp28991888ed08e870
│   ├── quarto-session-temp388a0599539de5d4
│   ├── quarto-session-temp96675b9ba9c8cc55
│   ├── quarto-session-tempaa127ccf439eb9f
│   ├── preview
│   ├── quarto-session-tempef730b5c072fedb3
│   ├── quarto-session-tempc8a9c70b4a2bcdba
│   ├── quarto-session-temp27d6351309a13b97
│   ├── quarto-session-tempde1a0126152fe8e4
│   ├── quarto-session-temp3cb98a7550493d7e
│   ├── quarto-session-temp7878e895dc51d1d5
│   ├── quarto-session-tempbc900858510e57cb
│   ├── quarto-session-temp1f4a513536586489
│   ├── quarto-session-tempd1adfcd49d14fb66
│   ├── quarto-session-tempd03c2e9b70928953
│   ├── quarto-session-temp9ee349fc4457088e
│   ├── quarto-session-tempc251635a9a6918fc
│   ├── quarto-session-tempd9ea7fa9e2af6622
│   ├── quarto-session-temp51d595d51a27034
│   ├── quarto-session-tempb91e26399bab60f1
│   ├── quarto-session-temp9f94d2582c9a9f22
│   ├── _freeze
│   ├── quarto-session-temp83278b29d39ab49e
│   ├── quarto-session-temp71338f1c8fc46e25
│   ├── quarto-session-temp9cbd673e3e880bb6
│   ├── quarto-session-tempbf93e1e45dffd70d
│   ├── quarto-session-temp8766b0056c7033b6
│   ├── quarto-session-temp3aa0cb7fed86f50f
│   ├── quarto-session-temp838cde0869ea7a3
│   ├── quarto-session-temp85ebc2ab6d3091d5
│   ├── quarto-session-tempcb91904f86f29dc7
│   ├── quarto-session-tempf18f0b40f525e62a
│   ├── quarto-session-temp9b55cf46bfbd7db6
│   ├── project-cache
│   ├── quarto-session-temp53741a2c16107773

Changing directories

## Change directory and list files
cd files
ls
cd ../slides
01_1890_hollerith_codes.png
01_apple_macintosh.png
01_bryant_hard_drive.png
bib
examples
fars_spreadsheet_raw.png
misc
schedule.ics
scripts

Some shell tools

Example files

# This time we use -o to specify the output file name, rather than using > to redirect STDOUT.
curl https://kjhealy.co/mptc/mptc_text_examples.zip -o mptc_text_examples.zip

# Once you've downloaded it, unzip it:
unzip mptc_text_examples.zip

What are we working with

ls files/examples/
_make-example
01_mptc_oecd_nocode.pdf
01_mptc_oecd_withcode.pdf
alice_in_wonderland.txt
alice_noboiler.txt
apple_mobility_daily_2021-04-12.csv
ascii_table.xlsx
bashrc.txt
basics.txt
census_edage.csv
congress
continent_sizes.csv
continent_tab.csv
continent_tab.tsv
countries_iso3.csv
countries.csv
country_iso3.tsv
country_tab.csv
country_tab.tsv
country-intermediate.tsv
country-working.tsv
fars_crash_report.xlsx
fars0-17daily.csv
first_terms.csv
fruit.txt
gapminder_xtra.csv
gss_panel_long.dta
jabberwocky.txt
mortality.txt
organdonation.csv
pride_and_prejudice.txt
rfm_table.csv
roman.txt
SAS_on_2021-04-13.csv
sentences.txt
shalott_1832.txt
shalott_1842.txt
specials.txt
stmf.rda
symptoms.xlsx
ulysses.txt
words.txt
year_tab.tsv
zshrc.txt
  • These files are in my course site project, so your file path will be different! It will be wherever you unzipped the files and the folder will be called mptc_text_examples if you got it via curl, or mptc_text_examples_main if you got it from GitHub.

  • First order of business is to open a Terminal window (either in RStudio or from the operating system) and navigate to where your example files are using pwd, cd, and ls.
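A minimal sketch of that navigation, run in a throwaway directory so it works anywhere. The folder names (project, files, examples, slides) are placeholders made up for the example; substitute wherever you unzipped the course files.

```shell
# Remember where we started, then build a small scratch tree to walk around in.
start=$PWD
scratch=$(mktemp -d)
mkdir -p "$scratch/project/files/examples" "$scratch/project/slides"

cd "$scratch/project"      # go inside the (placeholder) project folder
pwd                        # print the full path to where we are now
ls                         # list what is here

cd files/examples          # a relative path: two levels down from here
cd ../..                   # .. means "the folder above"; go back up twice
here=$(basename "$PWD")    # the last component of the current path
echo "$here"

# Return to the starting directory before removing the scratch tree.
cd "$start"
rm -rf "$scratch"
```

A path that starts with / is absolute; anything else is read relative to the directory that pwd reports.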

wc, cat, head, and tail

wc files/examples/alice_in_wonderland.txt
    3761   29564  174392 files/examples/alice_in_wonderland.txt

We can ask for a count of lines only:

wc -l files/examples/alice_in_wonderland.txt
    3761 files/examples/alice_in_wonderland.txt
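The other two numbers can be requested on their own as well: -w for words and -c for bytes. A small sketch on a made-up three-line sample rather than the course files:

```shell
# A tiny sample file; its contents are invented for illustration.
sample=$(mktemp)
printf 'one two\nthree four five\nsix\n' > "$sample"

# Reading from STDIN with < means wc has no file name to print.
# Some versions of wc pad the number with spaces; tr -d ' ' strips them.
n_lines=$(wc -l < "$sample" | tr -d ' ')
n_words=$(wc -w < "$sample" | tr -d ' ')
n_bytes=$(wc -c < "$sample" | tr -d ' ')

echo "lines: $n_lines words: $n_words bytes: $n_bytes"
rm -f "$sample"
```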

wc, cat, head, and tail

cat concatenates and prints the files given to it.

cat files/examples/jabberwocky.txt
’Twas brillig, and the slithy toves 
      Did gyre and gimble in the wabe: 
All mimsy were the borogoves, 
      And the mome raths outgrabe. 

“Beware the Jabberwock, my son! 
      The jaws that bite, the claws that catch! 
Beware the Jubjub bird, and shun 
      The frumious Bandersnatch!” 

He took his vorpal sword in hand; 
      Long time the manxome foe he sought— 
So rested he by the Tumtum tree 
      And stood awhile in thought. 

And, as in uffish thought he stood, 
      The Jabberwock, with eyes of flame, 
Came whiffling through the tulgey wood, 
      And burbled as it came! 

One, two! One, two! And through and through 
      The vorpal blade went snicker-snack! 
He left it dead, and with its head 
      He went galumphing back. 

“And hast thou slain the Jabberwock? 
      Come to my arms, my beamish boy! 
O frabjous day! Callooh! Callay!” 
      He chortled in his joy. 

’Twas brillig, and the slithy toves 
      Did gyre and gimble in the wabe: 
All mimsy were the borogoves, 
      And the mome raths outgrabe.

wc, cat, head, and tail

The top:

head files/examples/alice_in_wonderland.txt
The Project Gutenberg eBook of Alice's Adventures in Wonderland
    
This ebook is for the use of anyone anywhere in the United States and
most other parts of the world at no cost and with almost no restrictions
whatsoever. You may copy it, give it away or re-use it under the terms
of the Project Gutenberg License included with this ebook or online
at www.gutenberg.org. If you are not located in the United States,
you will have to check the laws of the country where you are located
before using this eBook.

The bottom:

tail files/examples/alice_in_wonderland.txt

Most people start at our website which has the main PG search
facility: www.gutenberg.org.

This website includes information about Project Gutenberg™,
including how to make donations to the Project Gutenberg Literary
Archive Foundation, how to help produce our new eBooks, and how to
subscribe to our email newsletter to hear about new eBooks.

wc, cat, head, and tail

There are 29 lines of boilerplate at the start of the book:

head -n 29 files/examples/alice_in_wonderland.txt
The Project Gutenberg eBook of Alice's Adventures in Wonderland
    
This ebook is for the use of anyone anywhere in the United States and
most other parts of the world at no cost and with almost no restrictions
whatsoever. You may copy it, give it away or re-use it under the terms
of the Project Gutenberg License included with this ebook or online
at www.gutenberg.org. If you are not located in the United States,
you will have to check the laws of the country where you are located
before using this eBook.

Title: Alice's Adventures in Wonderland


Author: Lewis Carroll

Release date: June 27, 2008 [eBook #11]
                Most recently updated: March 30, 2021

Language: English

Credits: Arthur DiBianca and David Widger


*** START OF THE PROJECT GUTENBERG EBOOK ALICE'S ADVENTURES IN WONDERLAND ***
[Illustration]



wc, cat, head, and tail

And 351 at the end:

tail -n 351 files/examples/alice_in_wonderland.txt | head -n 20
            *** END OF THE PROJECT GUTENBERG EBOOK ALICE'S ADVENTURES IN WONDERLAND ***
        

    

Updated editions will replace the previous one—the old editions will
be renamed.

Creating the works from print editions not protected by U.S. copyright
law means that no one owns a United States copyright in these works,
so the Foundation (and you!) can copy and distribute it in the United
States without permission and without paying copyright
royalties. Special rules, set forth in the General Terms of Use part
of this license, apply to copying and distributing Project
Gutenberg™ electronic works to protect the PROJECT GUTENBERG™
concept and trademark. Project Gutenberg is a registered trademark,
and may not be used if you charge for an eBook, except by following
the terms of the trademark license, including paying royalties for use
of the Project Gutenberg trademark. If you do not charge anything for
copies of this eBook, complying with the trademark license is very

wc, cat, head, and tail

We can use tail to skip the boilerplate at the top. With a plus sign, -n +29 means “start printing at line 29”:

tail -n +29 files/examples/alice_in_wonderland.txt | head

Alice’s Adventures in Wonderland

by Lewis Carroll

THE MILLENNIUM FULCRUM EDITION 3.0

Contents

 CHAPTER I.     Down the Rabbit-Hole

wc, cat, head, and tail

The shell can be treated like a programming language. That is, it has variables and also flow control (loops, if-then-else, etc).

We can use some shell variables along with tail twice to skip the boilerplate at the top and bottom, and put the result into a file of its own using > to redirect the output from STDOUT:

# This sets HEADSKIP to 29 and ENDSKIP to 351;
# We can refer to them with $HEADSKIP and $ENDSKIP
HEADSKIP=29
ENDSKIP=351

# The backticks ` ` here mean "Run this command and substitute its output"; we then put the result in a variable
# (The $( ) form used for GOODLINES below does the same job.)
BOOKLINES=`cat files/examples/alice_in_wonderland.txt| wc -l | tr ' ' '\n' | tail -1`

# This line does the arithmetic using expr and makes the result a variable
GOODLINES=$(expr $BOOKLINES - $HEADSKIP - $ENDSKIP)

# Now we use $HEADSKIP and $GOODLINES and create a new file
tail -n +$HEADSKIP files/examples/alice_in_wonderland.txt |
  head -n $GOODLINES > files/examples/alice_noboiler.txt
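The same idea can be sketched with the shell's built-in arithmetic, $(( )), instead of expr. This version runs on a made-up ten-line stand-in for the book, so every name and count here is an assumption for illustration. One detail to watch: because tail -n +N starts printing at line N, skipping exactly N lines takes N + 1.

```shell
# A stand-in "book": 3 header lines, 5 body lines, 2 footer lines.
book=$(mktemp)
clean=$(mktemp)
printf 'head1\nhead2\nhead3\nbody1\nbody2\nbody3\nbody4\nbody5\nfoot1\nfoot2\n' > "$book"

headskip=3
endskip=2

# Reading via < keeps the file name out of wc's output;
# tr -d ' ' strips any padding around the number.
booklines=$(wc -l < "$book" | tr -d ' ')

# $(( )) does integer arithmetic in the shell itself, without calling expr
goodlines=$((booklines - headskip - endskip))

# Start at the first line after the header, then keep only the good lines
tail -n +"$((headskip + 1))" "$book" | head -n "$goodlines" > "$clean"

first=$(head -n 1 "$clean")
last=$(tail -n 1 "$clean")
kept=$(wc -l < "$clean" | tr -d ' ')
echo "$kept lines kept, from $first to $last"
rm -f "$book" "$clean"
```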

wc, cat, head, and tail

Now our wc will be different:

wc files/examples/alice_in_wonderland.txt

wc files/examples/alice_noboiler.txt
    3761   29564  174392 files/examples/alice_in_wonderland.txt
    3381   26524  154465 files/examples/alice_noboiler.txt

uniq, sort, and cut

A data file:

head files/examples/countries.csv
cname,iso3,iso2,continent
Afghanistan,AFG,AF,Asia
Algeria,DZA,DZ,Africa
Armenia,ARM,AM,Asia
Australia,AUS,AU,Oceania
Austria,AUT,AT,Europe
Azerbaijan,AZE,AZ,Asia
Bahrain,BHR,BH,Asia
Belarus,BLR,BY,Europe
Belgium,BEL,BE,Europe

How many lines?

wc -l files/examples/countries.csv
     214 files/examples/countries.csv

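How many unique lines? (uniq only collapses duplicate lines that are adjacent, so on unsorted data this is a quick check rather than a full count of distinct lines.)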

uniq files/examples/countries.csv | wc -l
     214
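uniq compares each line only with the one before it, which is why it is usually paired with sort. A small sketch on invented data shows the difference:

```shell
# Five made-up lines with a repeated value that is not always adjacent.
sample=$(mktemp)
printf 'Asia\nEurope\nAsia\nAsia\nEurope\n' > "$sample"

# uniq alone only collapses the one adjacent pair of "Asia" lines
adjacent_only=$(uniq "$sample" | wc -l | tr -d ' ')

# Sorting first brings all duplicates together, so uniq can remove them
truly_unique=$(sort "$sample" | uniq | wc -l | tr -d ' ')

# uniq -c prefixes each distinct line with the number of times it occurred
sort "$sample" | uniq -c

echo "uniq alone: $adjacent_only; sort | uniq: $truly_unique"
rm -f "$sample"
```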

uniq, sort, and cut

# Omit the header line
tail -n +2 files/examples/countries.csv | sort -r | head
Zimbabwe,ZWE,ZW,Africa
Zambia,ZMB,ZM,Africa
Yemen,YEM,YE,Asia
Western Sahara,ESH,EH,Africa
Wallis and Futuna,WLF,WF,Oceania
Viet Nam,VNM,VN,Asia
Vanuatu,VUT,VU,Oceania
Uzbekistan,UZB,UZ,Asia
Uruguay,URY,UY,South America
United States,USA,US,North America

uniq, sort, and cut

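This doesn’t quite work because of the way the data is coded. Some country names contain commas inside quotation marks, and sort splits on every comma regardless of the quotes: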

tail -n +2 files/examples/countries.csv | sort -t , -k4 -k1
Algeria,DZA,DZ,Africa
Angola,AGO,AO,Africa
Benin,BEN,BJ,Africa
Botswana,BWA,BW,Africa
Burkina Faso,BFA,BF,Africa
Burundi,BDI,BI,Africa
Cabo Verde,CPV,CV,Africa
Cameroon,CMR,CM,Africa
Central African Republic,CAF,CF,Africa
Chad,TCD,TD,Africa
Comoros,COM,KM,Africa
Congo,COG,CG,Africa
Côte d'Ivoire,CIV,CI,Africa
Djibouti,DJI,DJ,Africa
Egypt,EGY,EG,Africa
Equatorial Guinea,GNQ,GQ,Africa
Eritrea,ERI,ER,Africa
Ethiopia,ETH,ET,Africa
Gabon,GAB,GA,Africa
Gambia,GMB,GM,Africa
Ghana,GHA,GH,Africa
Guinea-Bissau,GNB,GW,Africa
Guinea,GIN,GN,Africa
Kenya,KEN,KE,Africa
Lesotho,LSO,LS,Africa
Liberia,LBR,LR,Africa
Libya,LBY,LY,Africa
Madagascar,MDG,MG,Africa
Malawi,MWI,MW,Africa
Mali,MLI,ML,Africa
Mauritania,MRT,MR,Africa
Mauritius,MUS,MU,Africa
Morocco,MAR,MA,Africa
Mozambique,MOZ,MZ,Africa
Namibia,"NAM",NA,Africa
Niger,NER,NE,Africa
Nigeria,NGA,NG,Africa
Rwanda,RWA,RW,Africa
Sao Tome and Principe,STP,ST,Africa
Senegal,SEN,SN,Africa
Seychelles,SYC,SC,Africa
Sierra Leone,SLE,SL,Africa
Somalia,SOM,SO,Africa
South Africa,ZAF,ZA,Africa
South Sudan,SSD,SS,Africa
Sudan,SDN,SD,Africa
Swaziland,SWZ,SZ,Africa
Togo,TGO,TG,Africa
Tunisia,TUN,TN,Africa
Uganda,UGA,UG,Africa
Western Sahara,ESH,EH,Africa
Zambia,ZMB,ZM,Africa
Zimbabwe,ZWE,ZW,Africa
Afghanistan,AFG,AF,Asia
Armenia,ARM,AM,Asia
Azerbaijan,AZE,AZ,Asia
Bahrain,BHR,BH,Asia
Bangladesh,BGD,BD,Asia
Bhutan,BTN,BT,Asia
Brunei Darussalam,BRN,BN,Asia
Cambodia,KHM,KH,Asia
China,CHN,CN,Asia
Georgia,GEO,GE,Asia
India,IND,IN,Asia
Indonesia,IDN,ID,Asia
Iraq,IRQ,IQ,Asia
Israel,ISR,IL,Asia
Japan,JPN,JP,Asia
Jordan,JOR,JO,Asia
Kazakhstan,KAZ,KZ,Asia
Kuwait,KWT,KW,Asia
Kyrgyzstan,KGZ,KG,Asia
Lao People's Democratic Republic,LAO,LA,Asia
Lebanon,LBN,LB,Asia
Malaysia,MYS,MY,Asia
Maldives,MDV,MV,Asia
Mongolia,MNG,MN,Asia
Myanmar,MMR,MM,Asia
Nepal,NPL,NP,Asia
Oman,OMN,OM,Asia
Pakistan,PAK,PK,Asia
Philippines,PHL,PH,Asia
Qatar,QAT,QA,Asia
Saudi Arabia,SAU,SA,Asia
Singapore,SGP,SG,Asia
Sri Lanka,LKA,LK,Asia
Syrian Arab Republic,SYR,SY,Asia
Tajikistan,TJK,TJ,Asia
Thailand,THA,TH,Asia
Turkey,TUR,TR,Asia
United Arab Emirates,ARE,AE,Asia
Uzbekistan,UZB,UZ,Asia
Viet Nam,VNM,VN,Asia
Yemen,YEM,YE,Asia
"Bolivia, Plurinational State of",BOL,BO,South America
"Bonaire, Sint Eustatius and Saba",BES,BQ,North America
"Congo, the Democratic Republic of the",COD,CD,Africa
Albania,ALB,AL,Europe
Andorra,AND,AD,Europe
Austria,AUT,AT,Europe
Belarus,BLR,BY,Europe
Belgium,BEL,BE,Europe
Bosnia and Herzegovina,BIH,BA,Europe
Bulgaria,BGR,BG,Europe
Croatia,HRV,HR,Europe
Cyprus,CYP,CY,Europe
Czech Republic,CZE,CZ,Europe
Denmark,DNK,DK,Europe
Estonia,EST,EE,Europe
Faroe Islands,FRO,FO,Europe
Finland,FIN,FI,Europe
France,FRA,FR,Europe
Germany,DEU,DE,Europe
Gibraltar,GIB,GI,Europe
Greece,GRC,GR,Europe
Guernsey,GGY,GG,Europe
Holy See (Vatican City State),VAT,VA,Europe
Hungary,HUN,HU,Europe
Iceland,ISL,IS,Europe
Ireland,IRL,IE,Europe
Isle of Man,IMN,IM,Europe
Italy,ITA,IT,Europe
Jersey,JEY,JE,Europe
Kosovo,XKV,NA,Europe
Latvia,LVA,LV,Europe
Liechtenstein,LIE,LI,Europe
Lithuania,LTU,LT,Europe
Luxembourg,LUX,LU,Europe
Malta,MLT,MT,Europe
Monaco,MCO,MC,Europe
Montenegro,MNE,ME,Europe
Netherlands,NLD,NL,Europe
Norway,NOR,NO,Europe
Poland,POL,PL,Europe
Portugal,PRT,PT,Europe
Romania,ROU,RO,Europe
Russian Federation,RUS,RU,Europe
San Marino,SMR,SM,Europe
Serbia,SRB,RS,Europe
Slovakia,SVK,SK,Europe
Slovenia,SVN,SI,Europe
Spain,ESP,ES,Europe
Sweden,SWE,SE,Europe
Switzerland,CHE,CH,Europe
Ukraine,UKR,UA,Europe
United Kingdom,GBR,GB,Europe
"Iran, Islamic Republic of",IRN,IR,Asia
"Korea, Republic of",KOR,KR,Asia
"Moldova, Republic of",MDA,MD,Europe
"Macedonia, the former Yugoslav Republic of",MKD,MK,Europe
Anguilla,AIA,AI,North America
Antigua and Barbuda,ATG,AG,North America
Aruba,ABW,AW,North America
Bahamas,BHS,BS,North America
Barbados,BRB,BB,North America
Belize,BLZ,BZ,North America
Bermuda,BMU,BM,North America
Canada,CAN,CA,North America
Cayman Islands,CYM,KY,North America
Costa Rica,CRI,CR,North America
Cuba,CUB,CU,North America
Curaçao,CUW,CW,North America
Dominica,DMA,DM,North America
Dominican Republic,DOM,DO,North America
El Salvador,SLV,SV,North America
Greenland,GRL,GL,North America
Grenada,GRD,GD,North America
Guatemala,GTM,GT,North America
Haiti,HTI,HT,North America
Honduras,HND,HN,North America
Jamaica,JAM,JM,North America
Mexico,MEX,MX,North America
Montserrat,MSR,MS,North America
Nicaragua,NIC,NI,North America
Panama,PAN,PA,North America
Puerto Rico,PRI,PR,North America
Saint Kitts and Nevis,KNA,KN,North America
Saint Lucia,LCA,LC,North America
Saint Vincent and the Grenadines,VCT,VC,North America
Sint Maarten (Dutch part),SXM,SX,North America
Trinidad and Tobago,TTO,TT,North America
Turks and Caicos Islands,TCA,TC,North America
United States,USA,US,North America
Australia,AUS,AU,Oceania
Fiji,FJI,FJ,Oceania
French Polynesia,PYF,PF,Oceania
Guam,GUM,GU,Oceania
Marshall Islands,MHL,MH,Oceania
New Caledonia,NCL,NC,Oceania
New Zealand,NZL,NZ,Oceania
Northern Mariana Islands,MNP,MP,Oceania
Papua New Guinea,PNG,PG,Oceania
Solomon Islands,SLB,SB,Oceania
Timor-Leste,TLS,TL,Oceania
Vanuatu,VUT,VU,Oceania
Wallis and Futuna,WLF,WF,Oceania
"Palestine, State of",PSE,PS,Asia
Argentina,ARG,AR,South America
Brazil,BRA,BR,South America
Chile,CHL,CL,South America
Colombia,COL,CO,South America
Ecuador,ECU,EC,South America
Falkland Islands (Malvinas),FLK,FK,South America
Guyana,GUY,GY,South America
Paraguay,PRY,PY,South America
Peru,PER,PE,South America
Suriname,SUR,SR,South America
Uruguay,URY,UY,South America
"Taiwan, Province of China",TWN,TW,Asia
"Tanzania, United Republic of",TZA,TZ,Africa
"Venezuela, Bolivarian Republic of",VEN,VE,South America
"Virgin Islands, British",VGB,VG,North America
"Virgin Islands, U.S.",VIR,VI,North America

uniq, sort, and cut

cut slices out columns defined by a delimiter (by default \t or tab)

cut -d , -f 2,4 files/examples/countries.csv
iso3,continent
AFG,Asia
DZA,Africa
ARM,Asia
AUS,Oceania
AUT,Europe
AZE,Asia
BHR,Asia
BLR,Europe
BEL,Europe
BRA,South America
KHM,Asia
CAN,North America
CHN,Asia
HRV,Europe
CZE,Europe
DNK,Europe
DOM,North America
ECU,South America
EGY,Africa
EST,Europe
FIN,Europe
FRA,Europe
GEO,Asia
DEU,Europe
GRC,Europe
ISL,Europe
IND,Asia
IDN,Asia
 Islamic Republic of",IR
IRQ,Asia
IRL,Europe
ISR,Asia
ITA,Europe
JPN,Asia
KWT,Asia
LBN,Asia
LTU,Europe
LUX,Europe
MYS,Asia
MEX,North America
MCO,Europe
NPL,Asia
NLD,Europe
NZL,Oceania
NGA,Africa
 the former Yugoslav Republic of",MK
NOR,Europe
OMN,Asia
PAK,Asia
PHL,Asia
QAT,Asia
ROU,Europe
RUS,Europe
SMR,Europe
SGP,Asia
 Republic of",KR
ESP,Europe
LKA,Asia
SWE,Europe
CHE,Europe
 Province of China",TW
THA,Asia
ARE,Asia
GBR,Europe
USA,North America
VNM,Asia
AND,Europe
JOR,Asia
LVA,Europe
MAR,Africa
PRT,Europe
SAU,Asia
SEN,Africa
SXM,North America
TUN,Africa
ARG,South America
CHL,South America
POL,Europe
UKR,Europe
HUN,Europe
LIE,Europe
SVN,Europe
BTN,Asia
BIH,Europe
FRO,Europe
 State of",PS
ZAF,Africa
CMR,Africa
COL,South America
CRI,North America
VAT,Europe
MLT,Europe
PER,South America
SRB,Europe
SVK,Europe
TGO,Africa
BGR,Europe
MDV,Asia
 Republic of",MD
PRY,South America
ALB,Europe
BGD,Asia
BRN,Asia
CYP,Europe
MNG,Asia
PAN,North America
BFA,Africa
 the Democratic Republic of the",CD
 Plurinational State of",BO
CIV,Africa
CUB,North America
HND,North America
JAM,North America
TUR,Asia
ABW,North America
CUW,North America
GAB,Africa
GHA,Africa
GUY,South America
VCT,North America
TTO,North America
ETH,Africa
GIN,Africa
KEN,Africa
XKV,Europe
SDN,Africa
ATG,North America
GNQ,Africa
SWZ,Africa
GTM,North America
KAZ,Asia
MRT,Africa
"NAM",Africa
RWA,Africa
LCA,North America
SYC,Africa
SUR,South America
URY,South America
 Bolivarian Republic of",VE
BHS,North America
CAF,Africa
COG,Africa
UZB,Asia
BEN,Africa
LBR,Africa
MMR,Asia
SOM,Africa
 United Republic of",TZ
BRB,North America
GMB,Africa
MNE,Europe
DJI,Africa
SLV,North America
PYF,Oceania
GUM,Oceania
KGZ,Asia
NIC,North America
ZMB,Africa
BMU,North America
CYM,North America
TCD,Africa
FJI,Oceania
GIB,Europe
GRL,North America
GGY,Europe
HTI,North America
JEY,Europe
MUS,Africa
CPV,Africa
IMN,Europe
MDG,Africa
MSR,North America
NCL,Oceania
NER,Africa
PNG,Oceania
ZWE,Africa
AGO,Africa
ERI,Africa
TLS,Oceania
UGA,Africa
DMA,North America
GRD,North America
MOZ,Africa
SYR,Asia
BLZ,North America
 U.S.",VI
LAO,Asia
LBY,Africa
TCA,North America
MLI,Africa
KNA,North America
AIA,North America
 British",VG
GNB,Africa
PRI,North America
MNP,Oceania
BWA,Africa
BDI,Africa
SLE,Africa
 Sint Eustatius and Saba",BQ
MWI,Africa
FLK,South America
SSD,Africa
STP,Africa
YEM,Asia
ESH,Africa
TJK,Asia
COM,Africa
LSO,Africa
SLB,Oceania
WLF,Oceania
MHL,Oceania
VUT,Oceania

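Again, in this case it doesn’t quite behave as you might think! cut also splits on the commas inside quoted country names, so the fields in those rows are shifted.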
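One way around the quoting problem, sketched here on invented rows rather than the course files, is to keep the data tab-separated. Neither cut nor sort understands CSV quoting, but a tab almost never appears inside a country name. The sketch also uses -k3,3 rather than a bare -k3, because a key without an end field runs to the end of the line.

```shell
# One made-up row in two encodings: quoted CSV and tab-separated (TSV)
csv_line='"Korea, Republic of",KOR,KR,Asia'
tsv_line=$(printf 'Korea, Republic of\tKOR\tKR\tAsia')

# Splitting on every comma cuts the quoted name in two, so the fields shift
from_csv=$(printf '%s\n' "$csv_line" | cut -d , -f 2,4)

# Tab is cut's default delimiter, so no -d is needed for TSV
from_tsv=$(printf '%s\n' "$tsv_line" | cut -f 2,4)

echo "csv: $from_csv"
echo "tsv: $from_tsv"

# The same idea for sort: a small made-up TSV of name, iso3, continent
sample=$(mktemp)
printf 'Zambia\tZMB\tAfrica\n'           >  "$sample"
printf 'Korea, Republic of\tKOR\tAsia\n' >> "$sample"
printf 'Algeria\tDZA\tAfrica\n'          >> "$sample"
printf 'Japan\tJPN\tAsia\n'              >> "$sample"

# Sort by continent (field 3 only), then by name (field 1 only)
tab=$(printf '\t')
sorted=$(sort -t "$tab" -k3,3 -k1,1 "$sample")
printf '%s\n' "$sorted"

first_name=$(printf '%s\n' "$sorted" | head -n 1 | cut -f 1)
last_name=$(printf '%s\n' "$sorted" | tail -n 1 | cut -f 1)
rm -f "$sample"
```

For CSV files that really do need their quoting respected, a tool that parses CSV properly (R, for instance) is the safer choice.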

Finding files and finding text

find

find is for locating files and directories by name:

# Everything in the `files/` subdirectory
find files
files
files/misc
files/misc/home-tree.txt
files/misc/root-tree.txt
files/.DS_Store
files/schedule.ics
files/01_apple_macintosh.png
files/01_bryant_hard_drive.png
files/fars_spreadsheet_raw.png
files/examples
files/examples/country_iso3.tsv
files/examples/jabberwocky.txt
files/examples/country_tab.csv
files/examples/ulysses.txt
files/examples/_make-example
files/examples/_make-example/mypaper.md
files/examples/_make-example/fig1.r
files/examples/_make-example/Makefile
files/examples/_make-example/README.md
files/examples/_make-example/.gitignore
files/examples/_make-example/.RData
files/examples/rfm_table.csv
files/examples/01_mptc_oecd_nocode.pdf
files/examples/.DS_Store
files/examples/countries.csv
files/examples/specials.txt
files/examples/stmf.rda
files/examples/gapminder_xtra.csv
files/examples/bashrc.txt
files/examples/apple_mobility_daily_2021-04-12.csv
files/examples/alice_in_wonderland.txt
files/examples/continent_tab.tsv
files/examples/first_terms.csv
files/examples/symptoms.xlsx
files/examples/roman.txt
files/examples/fruit.txt
files/examples/shalott_1832.txt
files/examples/year_tab.tsv
files/examples/census_edage.csv
files/examples/fars_crash_report.xlsx
files/examples/organdonation.csv
files/examples/continent_tab.csv
files/examples/pride_and_prejudice.txt
files/examples/basics.txt
files/examples/01_mptc_oecd_withcode.pdf
files/examples/continent_sizes.csv
files/examples/country-intermediate.tsv
files/examples/SAS_on_2021-04-13.csv
files/examples/country-working.tsv
files/examples/words.txt
files/examples/mortality.txt
files/examples/sentences.txt
files/examples/ascii_table.xlsx
files/examples/gss_panel_long.dta
files/examples/congress
files/examples/congress/23_101_congress.csv
files/examples/congress/28_106_congress.csv
files/examples/congress/08_86_congress.csv
files/examples/congress/05_83_congress.csv
files/examples/congress/31_109_congress.csv
files/examples/congress/24_102_congress.csv
files/examples/congress/16_94_congress.csv
files/examples/congress/37_115_congress.csv
files/examples/congress/13_91_congress.csv
files/examples/congress/25_103_congress.csv
files/examples/congress/30_108_congress.csv
files/examples/congress/01_79_congress.csv
files/examples/congress/09_87_congress.csv
files/examples/congress/36_114_congress.csv
files/examples/congress/17_95_congress.csv
files/examples/congress/22_100_congress.csv
files/examples/congress/04_82_congress.csv
files/examples/congress/29_107_congress.csv
files/examples/congress/12_90_congress.csv
files/examples/congress/15_93_congress.csv
files/examples/congress/11_89_congress.csv
files/examples/congress/35_113_congress.csv
files/examples/congress/06_84_congress.csv
files/examples/congress/26_104_congress.csv
files/examples/congress/03_81_congress.csv
files/examples/congress/32_110_congress.csv
files/examples/congress/18_96_congress.csv
files/examples/congress/21_99_congress.csv
files/examples/congress/07_85_congress.csv
files/examples/congress/10_88_congress.csv
files/examples/congress/33_111_congress.csv
files/examples/congress/14_92_congress.csv
files/examples/congress/02_80_congress.csv
files/examples/congress/38_116_congress.csv
files/examples/congress/34_112_congress.csv
files/examples/congress/20_98_congress.csv
files/examples/congress/27_105_congress.csv
files/examples/congress/19_97_congress.csv
files/examples/fars0-17daily.csv
files/examples/shalott_1842.txt
files/examples/alice_noboiler.txt
files/examples/countries_iso3.csv
files/examples/country_tab.tsv
files/examples/zshrc.txt
files/scripts
files/scripts/hello-world.sh
files/scripts/make-thumbnail.sh
files/bib
files/bib/samplesyllabus.csl
files/bib/american-political-science-association.csl
files/bib/references.bib
files/bib/chicago-fullnote-bibliography-no-bib.csl
files/bib/mptc_references.bib
files/bib/chicago-fullnote-bibliography.csl
files/bib/chicago-syllabus-no-bib.csl
files/bib/apa.csl
files/bib/chicago-author-date.csl
files/bib/.auctex-auto
files/bib/.auctex-auto/references.el
files/bib/chicago-note-bibliography.csl
files/01_1890_hollerith_codes.png

find

We can use globbing (or wildcards) to narrow our search:

# Everything underneath the `files/` subdirectory
# whose name ends in `.csl`
find files -name "*.csl"
files/bib/samplesyllabus.csl
files/bib/american-political-science-association.csl
files/bib/chicago-fullnote-bibliography-no-bib.csl
files/bib/chicago-fullnote-bibliography.csl
files/bib/chicago-syllabus-no-bib.csl
files/bib/apa.csl
files/bib/chicago-author-date.csl
files/bib/chicago-note-bibliography.csl

find

Here we use the . to mean “Search in the current folder”

find . -name "*.xlsx"
./files/examples/symptoms.xlsx
./files/examples/fars_crash_report.xlsx
./files/examples/ascii_table.xlsx
./data/schedule.xlsx
./data/data_sources.xlsx

find

  • The -exec option lets us do things with each result.
  • The {} expands to each found file in turn.
  • Here we use echo to see what the rm (remove) command would do.
  • The quoted semicolon ";" or \; is required to mark the end of the -exec command.
find files -name "*.png" -exec echo rm {} ";"
rm files/01_apple_macintosh.png
rm files/01_bryant_hard_drive.png
rm files/fars_spreadsheet_raw.png
rm files/01_1890_hollerith_codes.png

If we omitted the echo here the found files really would be deleted one at a time.
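A sketch of the dry-run-then-do-it habit, in a scratch directory so nothing real is at risk. The file names are invented. It also shows the + terminator, which hands many file names to a single command instead of running the command once per file.

```shell
# Scratch directory with two made-up png files and one text file
scratch=$(mktemp -d)
touch "$scratch/a.png" "$scratch/b.png" "$scratch/notes.txt"

# Dry run: echo prints each rm command instead of running it
find "$scratch" -name "*.png" -exec echo rm {} ";"

# Ending with + instead of ";" builds one command with all the matches
find "$scratch" -name "*.png" -exec echo rm {} +

# Once the dry run looks right, drop the echo
find "$scratch" -name "*.png" -exec rm {} ";"

# Only the text file should be left
remaining=$(find "$scratch" -type f | wc -l | tr -d ' ')
echo "files left: $remaining"
rm -rf "$scratch"
```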

find

We can also use xargs to act on search results:

# Everything underneath the `files/` subdirectory
# whose name ends in `.png`
find files -name "*.png"
files/01_apple_macintosh.png
files/01_bryant_hard_drive.png
files/fars_spreadsheet_raw.png
files/01_1890_hollerith_codes.png

Convert all these png files to jpg:

# Convert everything underneath the `files/` subdirectory
# whose name ends in `.png` to `.jpg` format, keeping the original files.
find files -name '*.png' -print0 | xargs -0 -r mogrify -format jpg

find

Check:

find files -name '*.png'
find files -name '*.jpg'
files/01_apple_macintosh.png
files/01_bryant_hard_drive.png
files/fars_spreadsheet_raw.png
files/01_1890_hollerith_codes.png
files/01_apple_macintosh.jpg
files/01_bryant_hard_drive.jpg
files/fars_spreadsheet_raw.jpg
files/01_1890_hollerith_codes.jpg

Delete them (with another method of deletion):

find files  -name '*.jpg' -type f -delete
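Why the -print0 and -0 pair in the xargs command above? By default xargs splits its input on whitespace, so a file name containing a space arrives as two separate arguments. Separating names with a null character avoids that. A small sketch with invented file names, assuming your temporary directory's own path contains no spaces:

```shell
# Two made-up files, one with a space in its name
scratch=$(mktemp -d)
touch "$scratch/my photo.png" "$scratch/plot.png"

# Whitespace splitting: "my photo.png" turns into two arguments
split_count=$(find "$scratch" -name '*.png' | xargs -n 1 echo | wc -l | tr -d ' ')

# Null-separated: each file name arrives intact
safe_count=$(find "$scratch" -name '*.png' -print0 | xargs -0 -n 1 echo | wc -l | tr -d ' ')

echo "split on whitespace: $split_count arguments"
echo "split on nulls:      $safe_count arguments"
rm -rf "$scratch"
```

This is one more reason to avoid spaces in file names in the first place.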

Perspective

Obviously you will not be doing this sort of thing every day of the week. But you may well want to programmatically rename, move, convert, or otherwise manipulate files in batches from time to time. Especially if there are a lot of them, the shell can help you.

Naming things

Naming files

  • The better your names for things, the easier they will be to find (and programmatically work with)
  • In civilized operating systems, names containing spaces and special characters (such as ? ! , . # $ * <space> and the like) are not a problem.
  • However, the more you work programmatically, the more you will want to avoid them.
  • Jenny Bryan’s 5 minute Normconf talk is a good overview of good habits

Naming files

  • Names should tell you something about what the file is
  • Names should avoid spaces and punctuation
  • Names should follow some reasonable convention
  • Names with numbers should sort in useful ways
  • Names should not be used to track the versions of files

Naming files

Find all files in or below the project directory that end in .qmd:

find . -name "*.qmd"
./schedule/index.qmd
./staging/example/04-example.qmd
./staging/example/11-example.qmd
./staging/example/08-example.qmd
./staging/example/07-example.qmd
./staging/example/09-example.qmd
./staging/example/05-example.qmd
./staging/example/06-example.qmd
./staging/example/03-example.qmd
./staging/content/09-content.qmd
./staging/content/05-content.qmd
./staging/content/10-content.qmd
./staging/content/06-content.qmd
./staging/content/03-content.qmd
./staging/content/11-content.qmd
./staging/content/04-content.qmd
./staging/content/08-content.qmd
./staging/content/07-content.qmd
./staging/content/12-content.qmd
./staging/assignment/04-assignment.qmd
./staging/assignment/03-assignment.qmd
./staging/assignment/05-assignment.qmd
./staging/assignment/07-assignment.qmd
./staging/assignment/08-assignment.qmd
./staging/assignment/09-assignment.qmd
./staging/assignment/06-assignment.qmd
./staging/slides/slides-r-and-quarto.qmd
./staging/slides/slides-ggplot-1.qmd
./staging/slides/slides-databases.qmd
./staging/slides/slides-ggplot-2.qmd
./staging/slides/slides-ggplot-3.qmd
./staging/slides/slides-r-dplyr.qmd
./staging/slides/slides-parallel-computing.qmd
./staging/slides/slides-search-and-edit-text.qmd
./staging/slides/slides-ingesting-data.qmd
./staging/slides/slides-git.qmd
./staging/slides/slides-r-reshaping-data.qmd
./staging/slides/slides-project-management.qmd
./staging/slides/slides-iteration.qmd
./_motivation.qmd
./example/03-example-shell.qmd
./example/01-example-oecd.qmd
./example/index.qmd
./content/02-content-labor-day.qmd
./content/01-content-doing-your-work.qmd
./content/index.qmd
./content/03-content-shell.qmd
./assignment/01-assignment-quarto-notes.qmd
./assignment/00-assignment-install-r-github.qmd
./assignment/index.qmd
./assignment/03-assignment-shell.qmd
./assignment/02-assignment-find-some-data.qmd
./about/index.qmd
./index.qmd
./slides/00-slides-template.qmd
./slides/01-slides-bigpicture.qmd
./slides/03-slides-shell.qmd
./slides/03-slides-filesystem.qmd
./slides/01-slides-rstudio-firstlook.qmd
./syllabus/index.qmd
./README.qmd
./_weekly-schedule.qmd

Naming files

Find everything in or below the current directory whose name starts with any two characters, followed by -example, followed by any number of further characters (including none):

find . -name "??-example*"
./staging/example/04-example.qmd
./staging/example/11-example.qmd
./staging/example/08-example.qmd
./staging/example/07-example.qmd
./staging/example/09-example.qmd
./staging/example/05-example.qmd
./staging/example/06-example.qmd
./staging/example/03-example.qmd
./example/01-example-oecd.html
./example/03-example-shell.qmd
./example/01-example-oecd.qmd
./example/03-example-shell.html
./_freeze/example/01-example-oecd
./_freeze/example/00-example-oecd
./_freeze/example/03-example-shell
./.quarto/idx/example/01-example-oecd.qmd.json
./.quarto/idx/example/03-example-shell.qmd.json
./.quarto/_freeze/example/03-example
./.quarto/_freeze/example/02-example
./.quarto/_freeze/example/09-example
./.quarto/_freeze/example/01-example-oecd
./.quarto/_freeze/example/08-example
./.quarto/_freeze/example/01-example
./.quarto/_freeze/example/00-example-oecd
./.quarto/_freeze/example/11-example
./.quarto/_freeze/example/04-example
./.quarto/_freeze/example/05-example
./.quarto/_freeze/example/07-example
./.quarto/_freeze/example/06-example
./.quarto/_freeze/example/03-example-shell
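Many of those hits are inside build and cache folders. A sketch of one way to filter them out, using ! to negate a -path test. The scratch tree and its file names are invented for the example.

```shell
# A made-up project with one source file and one cached copy
scratch=$(mktemp -d)
mkdir -p "$scratch/example" "$scratch/.quarto/idx"
touch "$scratch/example/03-example-shell.qmd" \
      "$scratch/example/index.qmd" \
      "$scratch/.quarto/idx/03-example-shell.qmd.json"

# ? matches exactly one character; * matches any run of characters.
# The quotes stop the shell from expanding the pattern before find sees it.
all_hits=$(find "$scratch" -name "??-example*" | wc -l | tr -d ' ')

# -name tests the file name only; -path tests the whole path.
# ! negates the test that follows it.
visible_hits=$(find "$scratch" -name "??-example*" ! -path "*/.quarto/*" | wc -l | tr -d ' ')

echo "all: $all_hits, outside .quarto: $visible_hits"
rm -rf "$scratch"
```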

Sort order

mkdir tmp
touch tmp/{1..15}.txt

See how these sort:

ls tmp/
1.txt
10.txt
11.txt
12.txt
13.txt
14.txt
15.txt
2.txt
3.txt
4.txt
5.txt
6.txt
7.txt
8.txt
9.txt

Not what we want.

Sort order

rm -f tmp/*.txt
touch tmp/{01..15}.txt
ls tmp/
01.txt
02.txt
03.txt
04.txt
05.txt
06.txt
07.txt
08.txt
09.txt
10.txt
11.txt
12.txt
13.txt
14.txt
15.txt
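Zero-padded brace expansion such as {01..15} works in zsh and in bash 4 or later; older versions of bash drop the padding. A more portable sketch, assuming nothing beyond a POSIX shell, pads the numbers with printf:

```shell
scratch=$(mktemp -d)

i=1
while [ "$i" -le 15 ]; do
  # %02d pads the number with a leading zero to a width of two
  touch "$scratch/$(printf '%02d' "$i").txt"
  i=$((i + 1))
done

ls "$scratch"

# With padding, alphabetical order and numerical order agree
first=$(ls "$scratch" | head -n 1)
tenth=$(ls "$scratch" | sed -n '10p')
rm -rf "$scratch"
```

If you expect more than 99 files, pad to three digits (%03d) from the start.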

Sort order

rm -f tmp/*.txt
touch tmp/{a..d}{01..03}.txt
ls -l tmp/
total 0
-rw-r--r--@ 1 kjhealy  staff  0 Sep  3 13:50 a01.txt
-rw-r--r--@ 1 kjhealy  staff  0 Sep  3 13:50 a02.txt
-rw-r--r--@ 1 kjhealy  staff  0 Sep  3 13:50 a03.txt
-rw-r--r--@ 1 kjhealy  staff  0 Sep  3 13:50 b01.txt
-rw-r--r--@ 1 kjhealy  staff  0 Sep  3 13:50 b02.txt
-rw-r--r--@ 1 kjhealy  staff  0 Sep  3 13:50 b03.txt
-rw-r--r--@ 1 kjhealy  staff  0 Sep  3 13:50 c01.txt
-rw-r--r--@ 1 kjhealy  staff  0 Sep  3 13:50 c02.txt
-rw-r--r--@ 1 kjhealy  staff  0 Sep  3 13:50 c03.txt
-rw-r--r--@ 1 kjhealy  staff  0 Sep  3 13:50 d01.txt
-rw-r--r--@ 1 kjhealy  staff  0 Sep  3 13:50 d02.txt
-rw-r--r--@ 1 kjhealy  staff  0 Sep  3 13:50 d03.txt

Clean up afterwards:

rm -rf tmp/
rm -rf ../tmp/

In general keep your names lower-case.

Dates

Use the one true YMD format, ISO 8601:

YYYY-MM-DD
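
The practical payoff of this format is that sorting by name is also sorting by date. A minimal sketch, using made-up dates and a hypothetical file name:

```shell
# Today's date in ISO 8601 form, e.g. for a file name prefix
today=$(date +%Y-%m-%d)
echo "${today}_notes.txt"

# Year, month, and day run from most to least significant,
# so a plain lexical sort is also a chronological sort
sorted=$(printf '%s\n' 2025-01-15 2024-12-01 2024-02-03 | sort)
echo "$sorted"
```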

Naming files

  • Be consistent in your use of naming conventions
  • No need to get too clever, but …
data_clean/
data_raw/
docs/
figures/
R/01_clean-data.R
R/02_process-data.R
R/03_descriptive-figs-tables.R
R/04_brms-model.R
paper/
README.md

Unix naming conventions

  • Dotfiles and underscores
ls -l
total 912
drwxr-xr-x   3 kjhealy  staff      96 Jan  9  2024 _extensions
drwxr-xr-x@  9 kjhealy  staff     288 Sep  3 13:50 _freeze
-rw-r--r--@  1 kjhealy  staff    3757 Aug 17 10:36 _motivation.qmd
-rw-r--r--@  1 kjhealy  staff    3008 Sep  2 14:17 _quarto.yml
drwxr-xr-x@  2 kjhealy  staff      64 Sep  3 13:50 _site
drwxr-xr-x@  8 kjhealy  staff     256 Sep  3 13:50 _targets
-rw-r--r--@  1 kjhealy  staff    6214 Aug 16 21:23 _targets.R
-rw-r--r--@  1 kjhealy  staff    1009 Aug 15 16:51 _variables.yml
-rw-r--r--@  1 kjhealy  staff     974 Aug 16 21:28 _weekly-schedule.qmd
drwxr-xr-x@  3 kjhealy  staff      96 Sep  3 13:50 00_dummy_files
drwxr-xr-x@  4 kjhealy  staff     128 Sep  3 13:50 about
drwxr-xr-x@ 18 kjhealy  staff     576 Aug 25 05:58 assets
drwxr-xr-x@ 13 kjhealy  staff     416 Sep  3 13:50 assignment
lrwxr-xr-x   1 kjhealy  staff     135 Nov  5  2024 avhrr -> /Users/kjhealy/Documents/data/misc/noaa_ncei/raw/www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr
drwxr-xr-x@ 10 kjhealy  staff     320 Sep  3 13:50 content
drwxr-xr-x@  5 kjhealy  staff     160 Sep  2 14:12 data
-rwxr-xr-x@  1 kjhealy  staff     437 Oct 23  2024 deploy.sh
drwxr-xr-x@  9 kjhealy  staff     288 Sep  3 13:50 example
drwxr-xr-x@ 12 kjhealy  staff     384 Sep  3 13:50 files
drwxr-xr-x  15 kjhealy  staff     480 Jan 24  2025 html
-rw-r--r--@  1 kjhealy  staff   36492 Sep  3 13:50 index.html
-rw-r--r--@  1 kjhealy  staff    1422 Aug 18 08:38 index.qmd
-rw-r--r--@  1 kjhealy  staff     306 Jul 30 10:17 mptc.Rproj
drwxr-xr-x@  8 kjhealy  staff     256 Aug 15  2023 R
-rw-r--r--@  1 kjhealy  staff    1967 Sep 18  2023 README.md
-rw-r--r--   1 kjhealy  staff    1764 Jan 23  2024 README.qmd
drwxr-xr-x@  7 kjhealy  staff     224 Aug 15  2023 renv
-rw-r--r--@  1 kjhealy  staff  335659 Sep  2 14:16 renv.lock
-rw-r--r--   1 kjhealy  staff   46717 Dec 11  2023 renv.lock.orig
drwxr-xr-x@  4 kjhealy  staff     128 Sep  3 13:50 schedule
lrwxr-xr-x   1 kjhealy  staff      66 Nov  5  2024 seas -> /Users/kjhealy/Documents/data/misc/noaa_ncei/raw/World_Seas_IHO_v3
drwxr-xr-x@ 11 kjhealy  staff     352 Sep  3 13:50 site_libs
drwxr-xr-x@ 16 kjhealy  staff     512 Sep  3 13:50 slides
drwxr-xr-x@  6 kjhealy  staff     192 Aug 28  2024 staging
drwxr-xr-x   4 kjhealy  staff     128 Sep  3 13:32 syllabus

Unix naming conventions

ls -la
total 1008
drwxr-xr-x    3 kjhealy  staff      96 Jan  9  2024 _extensions
drwxr-xr-x@   9 kjhealy  staff     288 Sep  3 13:50 _freeze
-rw-r--r--@   1 kjhealy  staff    3757 Aug 17 10:36 _motivation.qmd
-rw-r--r--@   1 kjhealy  staff    3008 Sep  2 14:17 _quarto.yml
drwxr-xr-x@   2 kjhealy  staff      64 Sep  3 13:50 _site
drwxr-xr-x@   8 kjhealy  staff     256 Sep  3 13:50 _targets
-rw-r--r--@   1 kjhealy  staff    6214 Aug 16 21:23 _targets.R
-rw-r--r--@   1 kjhealy  staff    1009 Aug 15 16:51 _variables.yml
-rw-r--r--@   1 kjhealy  staff     974 Aug 16 21:28 _weekly-schedule.qmd
drwxr-xr-x@  47 kjhealy  staff    1504 Sep  3 13:50 .
drwxr-xr-x@  35 kjhealy  staff    1120 Aug 25 12:25 ..
-rw-r--r--@   1 kjhealy  staff   10244 Aug 25 05:58 .DS_Store
drwxr-xr-x@  16 kjhealy  staff     512 Sep  3 13:38 .git
-rw-r--r--@   1 kjhealy  staff     383 Aug 19 09:19 .gitignore
-rw-r--r--    1 kjhealy  staff      71 Jan  9  2024 .gitmodules
-rw-r--r--@   1 kjhealy  staff     821 Aug 16  2023 .luarc.json
drwxr-xr-x@ 103 kjhealy  staff    3296 Sep  3 13:50 .quarto
-rw-r--r--@   1 kjhealy  staff   16656 Jul 31 15:33 .Rhistory
-rw-r--r--@   1 kjhealy  staff      26 Aug 15  2023 .Rprofile
drwxr-xr-x@   4 kjhealy  staff     128 Aug 10  2023 .Rproj.user
drwxr-xr-x@   3 kjhealy  staff      96 Aug 19 09:12 .vscode
drwxr-xr-x@   3 kjhealy  staff      96 Sep  3 13:50 00_dummy_files
drwxr-xr-x@   4 kjhealy  staff     128 Sep  3 13:50 about
drwxr-xr-x@  18 kjhealy  staff     576 Aug 25 05:58 assets
drwxr-xr-x@  13 kjhealy  staff     416 Sep  3 13:50 assignment
lrwxr-xr-x    1 kjhealy  staff     135 Nov  5  2024 avhrr -> /Users/kjhealy/Documents/data/misc/noaa_ncei/raw/www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr
drwxr-xr-x@  10 kjhealy  staff     320 Sep  3 13:50 content
drwxr-xr-x@   5 kjhealy  staff     160 Sep  2 14:12 data
-rwxr-xr-x@   1 kjhealy  staff     437 Oct 23  2024 deploy.sh
drwxr-xr-x@   9 kjhealy  staff     288 Sep  3 13:50 example
drwxr-xr-x@  12 kjhealy  staff     384 Sep  3 13:50 files
drwxr-xr-x   15 kjhealy  staff     480 Jan 24  2025 html
-rw-r--r--@   1 kjhealy  staff   36492 Sep  3 13:50 index.html
-rw-r--r--@   1 kjhealy  staff    1422 Aug 18 08:38 index.qmd
-rw-r--r--@   1 kjhealy  staff     306 Jul 30 10:17 mptc.Rproj
drwxr-xr-x@   8 kjhealy  staff     256 Aug 15  2023 R
-rw-r--r--@   1 kjhealy  staff    1967 Sep 18  2023 README.md
-rw-r--r--    1 kjhealy  staff    1764 Jan 23  2024 README.qmd
drwxr-xr-x@   7 kjhealy  staff     224 Aug 15  2023 renv
-rw-r--r--@   1 kjhealy  staff  335659 Sep  2 14:16 renv.lock
-rw-r--r--    1 kjhealy  staff   46717 Dec 11  2023 renv.lock.orig
drwxr-xr-x@   4 kjhealy  staff     128 Sep  3 13:50 schedule
lrwxr-xr-x    1 kjhealy  staff      66 Nov  5  2024 seas -> /Users/kjhealy/Documents/data/misc/noaa_ncei/raw/World_Seas_IHO_v3
drwxr-xr-x@  11 kjhealy  staff     352 Sep  3 13:50 site_libs
drwxr-xr-x@  16 kjhealy  staff     512 Sep  3 13:50 slides
drwxr-xr-x@   6 kjhealy  staff     192 Aug 28  2024 staging
drwxr-xr-x    4 kjhealy  staff     128 Sep  3 13:32 syllabus

Unix naming conventions

  • Files and folders beginning with a period, ., are “hidden”
  • They won’t show up via ls unless you ask for them, e.g. with ls -a
  • By convention they are often used for configuration information
  • In the world of R, files or folders beginning with an underscore, _, are often “generated” or are visible configuration files. (This is a weak convention.)
  • The structure of plain-text config files will depend on the thing they are configuring. It might just be a list of words or options, or it might be a structured file in a format like YAML or TOML, or it might be written to be parsed in R or Python, etc.
  • Files have extensions by convention. These exist to help the user and they can be useful when writing scripts. And specific applications or processes may expect to look for and use files with specific names or extensions. But the operating system in general doesn’t care about them.
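
The hiding of dotfiles is easy to check for yourself. A minimal sketch, assuming a throwaway directory and two hypothetical file names:

```shell
demo=$(mktemp -d)
touch "$demo/.config-example" "$demo/notes.txt"   # made-up names

# Plain ls skips names that begin with a dot
visible=$(ls "$demo")

# -A includes dotfiles but leaves out the . and .. entries
everything=$(ls -A "$demo")

echo "$visible"
echo "$everything"
rm -rf "$demo"
```

Nothing about the dotfile is protected or special. It is an ordinary file that listing tools politely leave out by default.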

Unix naming conventions

Here’s the .gitignore file for this project:

.Rproj.user
.Rhistory
.RData
.Ruserdata

/.quarto/
/_site/
/renv/

/staging/

/_freeze/
/_targets/

about/*.pdf
about/*.html
assignment/*.html
example/*.html
schedule/*.html
syllabus/*.html
data/dfstrat.csv
slides/*.pdf
slides/*.html
slides/fonts/*
slides/**/*_cache/*
slides/libs/*
projects/*.zip
seas
avhrr

# knitr and caching
**/*_files/*
**/*_cache/*

README.html

/.luarc.json
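
If you are unsure whether a pattern in a file like this catches a particular path, git can tell you. A minimal sketch, assuming git is installed and using a scratch repository with just two of the rules above:

```shell
demo=$(mktemp -d)
git -C "$demo" init -q 2>/dev/null

# Two rules in the style of the file above
printf '%s\n' '/_site/' 'example/*.html' > "$demo/.gitignore"

mkdir -p "$demo/example"
touch "$demo/example/01-example-oecd.html" "$demo/example/01-example-oecd.qmd"

# git check-ignore -q exits 0 when a path is ignored and 1 when it is not
if git -C "$demo" check-ignore -q example/01-example-oecd.html; then
  html=ignored
else
  html=not-ignored
fi

if git -C "$demo" check-ignore -q example/01-example-oecd.qmd; then
  qmd=ignored
else
  qmd=not-ignored
fi

echo "html: $html, qmd: $qmd"
rm -rf "$demo"
```

Replacing -q with -v prints the rule that matched, which helps when several patterns overlap.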

Customizing your shell

Bash (often the Linux default)

A .bashrc file to configure non-login shells for Bash:

# Put the contents of this file in your ~/.bashrc file
# ~/.bashrc: executed by bash(1) for non-login shells.
# see /usr/share/doc/bash/examples/startup-files (in the package bash-doc)
# for examples

# If not running interactively, don't do anything
case $- in
    *i*) ;;
      *) return;;
esac

# don't put duplicate lines or lines starting with space in the history.
# See bash(1) for more options
HISTCONTROL=ignoreboth

# append to the history file, don't overwrite it
shopt -s histappend

# for setting history length see HISTSIZE and HISTFILESIZE in bash(1)
HISTSIZE=1000
HISTFILESIZE=2000

# check the window size after each command and, if necessary,
# update the values of LINES and COLUMNS.
shopt -s checkwinsize

# If set, the pattern "**" used in a pathname expansion context will
# match all files and zero or more directories and subdirectories.
#shopt -s globstar

# make less more friendly for non-text input files, see lesspipe(1)
#[ -x /usr/bin/lesspipe ] && eval "$(SHELL=/bin/sh lesspipe)"

# set variable identifying the chroot you work in (used in the prompt below)
if [ -z "${debian_chroot:-}" ] && [ -r /etc/debian_chroot ]; then
    debian_chroot=$(cat /etc/debian_chroot)
fi

# set a fancy prompt (non-color, unless we know we "want" color)
case "$TERM" in
    xterm-color) color_prompt=yes;;
esac

# uncomment for a colored prompt, if the terminal has the capability; turned
# off by default to not distract the user: the focus in a terminal window
# should be on the output of commands, not on the prompt
force_color_prompt=yes

if [ -n "$force_color_prompt" ]; then
    if [ -x /usr/bin/tput ] && tput setaf 1 >&/dev/null; then
        # We have color support; assume it's compliant with Ecma-48
        # (ISO/IEC-6429). (Lack of such support is extremely rare, and such
        # a case would tend to support setf rather than setaf.)
        color_prompt=yes
    else
        color_prompt=
    fi
fi

if [ "$color_prompt" = yes ]; then
#    PS1='${debian_chroot:+($debian_chroot)}\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$ '
     PS1='${debian_chroot:+($debian_chroot)}\[\033[01;32m\]\H\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\] \$ '
else
    PS1='${debian_chroot:+($debian_chroot)}\u@\h:\w\$ '
fi
unset color_prompt force_color_prompt

# If this is an xterm set the title to user@host:dir
case "$TERM" in
xterm*|rxvt*)
    PS1="\[\e]0;${debian_chroot:+($debian_chroot)}\u@\h: \w\a\]$PS1"
    ;;
*)
    ;;
esac

# enable color support of ls and also add handy aliases
if [ -x /usr/bin/dircolors ]; then
    test -r ~/.dircolors && eval "$(dircolors -b ~/.dircolors)" || eval "$(dircolors -b)"
    alias ls='ls --color=auto'
    #alias dir='dir --color=auto'
    #alias vdir='vdir --color=auto'

    alias grep='grep --color=auto'
    alias fgrep='fgrep --color=auto'
    alias egrep='egrep --color=auto'
fi

# some more ls aliases
#alias ll='ls -l'
#alias la='ls -A'
#alias l='ls -CF'

# Alias definitions.
# You may want to put all your additions into a separate file like
# ~/.bash_aliases, instead of adding them here directly.
# See /usr/share/doc/bash-doc/examples in the bash-doc package.

if [ -f ~/.bash_aliases ]; then
    . ~/.bash_aliases
fi

# enable programmable completion features (you don't need to enable
# this, if it's already enabled in /etc/bash.bashrc and /etc/profile
# sources /etc/bash.bashrc).
if ! shopt -oq posix; then
  if [ -f /usr/share/bash-completion/bash_completion ]; then
    . /usr/share/bash-completion/bash_completion
  elif [ -f /etc/bash_completion ]; then
    . /etc/bash_completion
  fi
fi
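
The file above reads in ~/.bash_aliases if it exists. A minimal sketch of what such a file might contain; the alias and function names are illustrative choices, not part of the configuration shown:

```shell
# Hypothetical contents for ~/.bash_aliases

# Short names for common ls invocations
alias ll='ls -l'
alias la='ls -A'

# A function can do what an alias cannot, such as use its argument twice:
# make a directory (and any missing parents), then move into it
mkcd() {
  mkdir -p "$1" && cd "$1"
}
```

Keeping personal additions in their own file means the main .bashrc can be replaced or updated without losing them.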

Zsh (the Mac default)

# Put the contents of this file in your ~/.zshrc file.
# Source: https://github.com/belak/zsh-utils?tab=readme-ov-file

[[ ! -d "$HOME/.antigen" ]] && git clone https://github.com/zsh-users/antigen.git "$HOME/.antigen"
source "$HOME/.antigen/antigen.zsh"

# Set the default plugin repo to be zsh-utils
antigen use belak/zsh-utils --branch=main

# Specify completions we want before the completion module
antigen bundle zsh-users/zsh-completions

# Specify plugins we want
antigen bundle editor@main
antigen bundle history@main
antigen bundle prompt@main
antigen bundle utility@main
antigen bundle completion@main

# Specify additional external plugins we want
antigen bundle zsh-users/zsh-syntax-highlighting

# Load everything
antigen apply

# Set any settings or overrides here
prompt belak
bindkey -e

Caution

Don’t blindly install things

Installing things via shell scripts should only be done from trusted sources!

The Unix way of thinking

Stepping back

  • Your computer stores files and runs commands.
  • The files are stored in a large hierarchy called a filesystem.
  • You issue instructions to run particular commands at a command line that is provided by a shell, which is how you the user talk to the operating system.
  • Unix commands and utilities generally try to do a specific thing to files or running processes.
  • The Unix conception of a ‘file’ is very flexible. Connections to other computers can act like files.
  • Unix commands are often composable using pipes.

The Unix pipe

  • Unix commands work with some input and may produce some output
  • Unix systems have the concepts of “standard input”, “standard output”, and “standard error” as streams where things come from, where they go to, and where problems are reported.
  • The idea of a sequence of commands or, more generally, functions that can be composed or pipelined in a smooth sequence is a very general and very powerful idea that we will soon see in action in R and that you may come across in many other settings as well.
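
The three standard streams can be seen separately by sending each one somewhere different. A minimal sketch, assuming a throwaway directory and made-up file names:

```shell
demo=$(mktemp -d)
touch "$demo/present.txt"

# Ask ls about one file that exists and one that does not.
# > captures standard output; 2> captures standard error.
ls "$demo/present.txt" "$demo/missing.txt" > "$demo/out.log" 2> "$demo/err.log"

# The listing went to one stream and the complaint to the other
out_lines=$(wc -l < "$demo/out.log" | tr -d ' ')
err_lines=$(wc -l < "$demo/err.log" | tr -d ' ')
echo "stdout lines: $out_lines, stderr lines: $err_lines"

# Standard input works the same way in reverse: < feeds a file to a command
first=$(head -n 1 < "$demo/out.log")
echo "$first"

rm -rf "$demo"
```

A pipe connects only standard output to the next command's standard input, so error messages still reach the terminal unless you redirect them too.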

The Unix pipe

  • The output of the ls command again:
ls
_extensions
_freeze
_motivation.qmd
_quarto.yml
_site
_targets
_targets.R
_variables.yml
_weekly-schedule.qmd
00_dummy_files
about
assets
assignment
avhrr
content
data
deploy.sh
example
files
html
index.html
index.qmd
mptc.Rproj
R
README.md
README.qmd
renv
renv.lock
renv.lock.orig
schedule
seas
site_libs
slides
staging
syllabus

The Unix pipe

We can send, or pipe, this output to another command, instead of to the terminal:

ls | wc -l
      35
  • The wc command counts the lines, words, and bytes in a file, or in whatever is sent to it via STDIN.
  • The -l switch to wc means ‘count only the lines’
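
A minimal sketch of the wc switches on a made-up two-line input. The tr -d ' ' step is only there because some versions of wc pad their output with spaces, as in the listing above:

```shell
# Two lines, three words, fourteen bytes (newlines included)
sample='one two
three'

lines=$(printf '%s\n' "$sample" | wc -l | tr -d ' ')
words=$(printf '%s\n' "$sample" | wc -w | tr -d ' ')
bytes=$(printf '%s\n' "$sample" | wc -c | tr -d ' ')

echo "lines=$lines words=$words bytes=$bytes"
```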

The Unix pipe

Like with pipelines in R, we can compose sequences of actions at the prompt:

 ls -lh access.log
-rw-r--r-- 1 root root 7.0M Aug 29 16:00 access.log
 head access.log
192.195.49.31 - - [27/Aug/2023:00:01:11 +0000] "GET / HTTP/1.1" 200 19219 "https://www.google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.54"
192.195.49.31 - - [27/Aug/2023:00:01:12 +0000] "GET /libs/tufte-css-2015.12.29/tufte.css HTTP/1.1" 200 2025 "https://socviz.co/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.54"
192.195.49.31 - - [27/Aug/2023:00:01:12 +0000] "GET /libs/tufte-css-2015.12.29/envisioned.css HTTP/1.1" 200 888 "https://socviz.co/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.54"
192.195.49.31 - - [27/Aug/2023:00:01:12 +0000] "GET /css/tablesaw-stackonly.css HTTP/1.1" 200 1640 "https://socviz.co/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.54"
192.195.49.31 - - [27/Aug/2023:00:01:12 +0000] "GET /css/nudge.css HTTP/1.1" 200 1675 "https://socviz.co/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.54"
192.195.49.31 - - [27/Aug/2023:00:01:12 +0000] "GET /css/sourcesans.css HTTP/1.1" 200 1492 "https://socviz.co/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.54"
192.195.49.31 - - [27/Aug/2023:00:01:13 +0000] "GET /js/jquery.js HTTP/1.1" 200 30464 "https://socviz.co/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.54"
192.195.49.31 - - [27/Aug/2023:00:01:13 +0000] "GET /js/tablesaw-stackonly.js HTTP/1.1" 200 2996 "https://socviz.co/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.54"
192.195.49.31 - - [27/Aug/2023:00:01:13 +0000] "GET /js/nudge.min.js HTTP/1.1" 200 937 "https://socviz.co/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.54"
52.13.187.67 - - [27/Aug/2023:00:01:13 +0000] "GET /dataviz-pdfl_files/figure-html4/ch-03-fig-lexp-gdp-10-1.png HTTP/1.1" 200 308830 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0"

The Unix pipe

Like with pipelines in R, we can compose sequences of actions at the prompt:

 head access.log | awk '// {print $11}'

"https://www.google.com/"
"https://socviz.co/"
"https://socviz.co/"
"https://socviz.co/"
"https://socviz.co/"
"https://socviz.co/"
"https://socviz.co/"
"https://socviz.co/"
"https://socviz.co/"
"-"

The Unix pipe

Like with pipelines in R, we can compose sequences of actions at the prompt:

 awk '// {print $11}' access.log | sort | uniq -c | sort -nr | head -n 15

   9729 "https://socviz.co/lookatdata.html"
   4851 "-"
   4212 "https://socviz.co/"
   1719 "https://socviz.co/makeplot.html"
   1477 "https://bookdown.org/"
   1466 "https://socviz.co/gettingstarted.html"
   1373 "https://socviz.co/groupfacettx.html"
    864 "https://socviz.co/workgeoms.html"
    794 "https://socviz.co/maps.html"
    733 "https://socviz.co/refineplots.html"
    671 "https://socviz.co/index.html"
    349 "https://socviz.co/appendix.html"
    228 "https://socviz.co/modeling.html"
    153 "https://www.google.com/"
     50 "http://vissoc.co/"
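
The same counting pattern, shown step by step on a small made-up log so that the result can be checked by eye. The field number and the sample lines are assumptions for illustration; the empty // pattern used above matches every line, so it is left out here:

```shell
# A few made-up log lines: a client address, then the page requested
log='10.0.0.1 /home
10.0.0.2 /about
10.0.0.3 /home
10.0.0.1 /home
10.0.0.2 /about
10.0.0.4 /contact'

# 1. awk pulls out field 2      2. sort groups identical values together
# 3. uniq -c counts each group  4. sort -nr puts the biggest counts first
# 5. head keeps the top two     6. awk trims the padding uniq adds
top=$(printf '%s\n' "$log" |
  awk '{print $2}' | sort | uniq -c | sort -nr | head -n 2 |
  awk '{print $1, $2}')

echo "$top"
```

The first sort matters: uniq only collapses lines that are next to each other, so unsorted input would be undercounted.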

The Unix pipe

We can do a lot with a pipeline:

curl -s 'http://api.citybik.es/v2/networks/citi-bike-nyc' |
   jq '.network.stations[].free_bikes' |
  gpaste -sd+ |
  bc
32542

This is the number of Citi Bikes available in New York City at the time these slides were made.

We usually won’t use the Unix command line or shell to do things like this. We’ll do it in R. You could also do it in other languages. But basic shell competence remains extremely handy for many more common tasks.
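
The last two stages of that pipeline are the least obvious, so here is a minimal offline sketch of them with three made-up numbers standing in for what jq prints. It assumes gpaste is GNU paste installed under a g prefix (as Homebrew's coreutils does on macOS); plain paste is used below:

```shell
# Stand-in for the per-station counts, one number per line
counts='3
5
7'

# paste -s joins all lines into one; -d+ uses + as the separator.
# The trailing - tells paste to read standard input.
sum_expr=$(printf '%s\n' "$counts" | paste -sd+ -)
echo "$sum_expr"

# bc then evaluates that expression as arithmetic
echo "$sum_expr" | bc

# The same total without bc, letting awk keep a running sum
total=$(printf '%s\n' "$counts" | awk '{ s += $1 } END { print s }')
echo "$total"
```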

Shell Scripting

Shell Scripts

  • If you find yourself doing the same task repeatedly, think about whether it makes sense to write a script
  • Shell scripts can become mini-programs, but can also be just one or two lines that pull together a few commands
  • They really show their strength when there’s some fiddly thing you want to do to a lot of files or directories

Shell Scripts

#!/usr/bin/env bash

echo "Hello World!"
  • #! or “shebang” line saying where the interpreter is
  • chmod 755 script.sh or chmod +x script.sh to make executable
  • The interpreter doesn’t have to be the shell: other languages can be scripted too
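
Those steps, run end to end. A minimal sketch, assuming a throwaway directory and a hypothetical script name, hello.sh:

```shell
demo=$(mktemp -d)

# Write the two-line script from above into a file
cat > "$demo/hello.sh" <<'EOF'
#!/usr/bin/env bash
echo "Hello World!"
EOF

# A new file is not executable until you say so
chmod +x "$demo/hello.sh"

# Run it by path; the shebang line decides which interpreter is used
greeting=$("$demo/hello.sh")
echo "$greeting"

rm -rf "$demo"
```

Inside the script's own directory you would run it as ./hello.sh, because the current directory is not normally on the shell's search path.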

Shell Scripts

#!/usr/bin/env bash

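# Make a thumbnail for each PNG in the current directory.
# Assumes ImageMagick is installed (newer releases name the command `magick`).
for i in *.png; do

  [ -e "$i" ] || continue # No PNGs here: skip the unexpanded *.png pattern

  FILENAME=$(basename -- "$i") # Full filename
  EXTENSION="${FILENAME##*.}" # Extension only
  FILENAME="${FILENAME%.*}" # Filename without extension

  convert "$i" -thumbnail 500 "$FILENAME-thumb.$EXTENSION"

done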
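
The ${...##...} and ${...%...} forms in that loop are the shell's built-in way of trimming text from a variable. A minimal sketch on a made-up path with two dots in the name, to show which dot each form acts on:

```shell
# A hypothetical path, for illustration only
path='figures/plot.final.png'

name=$(basename -- "$path")   # plot.final.png  (directory part removed)
ext="${name##*.}"             # png             (## strips the longest match of *. from the front)
stem="${name%.*}"             # plot.final      (% strips the shortest match of .* from the end)

echo "$stem-thumb.$ext"
```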

Shell Scripts

  • The shell can talk to the clipboard (pbcopy and pbpaste are the macOS commands for this):
echo I am sending this sentence to the clipboard | pbcopy
  • Back from the clipboard:
pbpaste | wc -c
      44

In an era of Generative AI and LLMs, why are we covering this stuff?

Because Unix is still everywhere

“Why am I doing this?”

  • As soon as you try to do anything of any sort of technical complexity, or just simple reproducibility, with your computer—even using the newest and coolest tools—I promise you’ll eventually find yourself in a world governed by the metaphors and methods Unix originated, and, very likely, in a literal Unix-derived environment.

  • That is, you will be in some sort of folder-based hierarchy; you will edit plain-text files in order to configure, launch, generate, or capture the output of applications; and you will do this by way of instructions written down as a series of commands that follow some sort of regular syntax. The details of those instructions (and the particular conventions they use) will vary depending on the task at hand. But in essence you will always be doing the same thing.