The File System & the Shell

Modern Plain Text Computing
Week 02

Kieran Healy

November 5, 2024

Housekeeping

RStudio Container

Files

Files

  • A file is just a stream of bytes (data): a resource that a program can read or otherwise interact with.
  • Files have a location in the file system.
  • In the UNIX way of thinking, “Everything is a file.”
  • That is, lots of things that are not normally thought of as files (such as printers, or terminal screens, or connections to other computers) can be thought of as living in a named place somewhere in the filesystem.
  • The basic set of UNIX utilities can be thought of as tools that accept “files” (as a standard stream of input data), perform some specific action on them (read, print, move, copy, delete, count lines, find text, whatever) and then return a standard stream of output data that can be sent somewhere, e.g. to a terminal display, or used as input to another command, or become a file of its own.

File system hierarchy

Path conventions

  • / represents a division in the file hierarchy. You can think of it as a branch point on a tree, or as a new level of nesting in a series of boxes, or as the action “Go inside” or “Enter”.

  • On a Unix-like system, a full path to a file looks like this:

/Users/kjhealy/Documents/courses/mptc/slides/01b-slides.qmd

“Go inside the ‘Users’ folder, then inside the ‘kjhealy’ folder, then inside ‘Documents’ then inside ‘courses’ then ‘mptc’ then ‘slides’ and you will find the file 01b-slides.qmd.”

Standard Unix locations

  • / : root. Everything lives inside or under the root.
  • /bin/ : For binaries. Core user executable programs and tools.
  • /sbin/ : System binaries. Essential executables for the super user (who is also called root)
  • /lib/ : Support files for executables.
  • /usr/ : Conventionally, stuff installed “locally” for users in addition to the core system. Will contain its own bin/ and lib/ subdirs.
  • /usr/local : Files that the local user has compiled or installed
  • /opt/ : Like /usr/, another place for locally installed software to go.

Standard Unix locations

  • Directories that hold executables are listed in $PATH, an environment variable that tells the shell where to look for commands.
echo $PATH
/home/kjhealy/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/snap/bin
  • Entries are delimited by : and searched in order from left to right. The first match wins.
  • To learn where a command is being executed from, use which:
which R
/usr/local/bin/R
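A minimal sketch of the left-to-right search, runnable in any POSIX shell. It builds a throwaway directory holding a tiny script named hello (a made-up command name used only for this demo), puts that directory at the front of $PATH, and asks where the command resolves. It uses command -v, the POSIX-standard way to ask the same question that which answers.

```shell
#!/bin/sh
# Make a scratch directory and put a tiny executable script in it.
# "hello" is a hypothetical command name used only for this demonstration.
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho "hello from the demo directory"\n' > "$demo_dir/hello"
chmod +x "$demo_dir/hello"

# Prepend the scratch directory, so it is searched before everything else
PATH="$demo_dir:$PATH"

# Show the first few PATH entries, one per line, in search order
echo "$PATH" | tr ':' '\n' | head -n 3

# Ask the shell where "hello" comes from, then run it
command -v hello
hello
```

Because the search stops at the first match, a directory early in $PATH can shadow a program of the same name later on. In bash and zsh, type -a lists every match.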

Standard Unix locations

  • /etc/ : Editable text configuration. Config files often go here.

Standard Unix locations

  • /home/ or /Users/ : Where the accounts of individual system users live, like /Users/kjhealy or /home/kjhealy
pwd
/home/kjhealy
ls
bin  certbot.log  logrotate.conf  old  projects  public  staging
  • All of this is a matter of more or less established convention, and the details vary by operating system. E.g. on most Linux systems, individual user directories live in /home, while on macOS they live in /Users. Windows is different again (and uses \ rather than / to separate the parts of a path).

File system hierarchy

  • An edited version of the root / tree
├── Applications
├── bin
├── cores
├── dev
├── etc -> private/etc
├── home -> /System/Volumes/Data/home
├── Library
├── opt
│   ├── homebrew
├── private
│   ├── etc
│   ├── tftpboot
│   ├── tmp
│   └── var
├── sbin
├── System
├── tmp -> private/tmp
├── Users
│   ├── kjhealy
│   └── Shared
├── usr
│   ├── bin
│   ├── lib
│   ├── libexec
│   ├── local
│   ├── sbin
│   ├── share
│   ├── standalone
├── var -> private/var
└── Volumes

File system hierarchy

  • An edited version of the user or home tree
├── Applications
├── bin
├── Box
├── Creative Cloud Files
├── Desktop
├── Documents
│   ├── bibs -> /Users/kjhealy/Library/texmf/bibtex/bib
│   ├── bookdown
│   ├── comments
│   ├── completed
│   ├── courses
│   ├── data
│   ├── letters
│   ├── misc
│   ├── nonsense
│   ├── ordinal-society
│   ├── papers
│   ├── sites
│   ├── source
│   ├── talks
│   ├── teaching
│   ├── templates
│   ├── vita
├── Downloads
├── Dropbox
├── Library
├── Movies
├── Music
├── Pictures
├── Public
├── scratch
├── tmp
└── Zotero

So, how do we make our way around this file hierarchy tree and how do we take actions and do things?

The Shell

What is it?

  • A shell is a way for you to tell the operating system to do things.
  • On Unix systems it is the first user-facing program you meet once the system has booted and you log in.
  • The command line or command prompt is where you type instructions. Shells come with a collection of standard utilities—i.e., commands—that let you do things.
  • These utilities can be composed, chained or piped together to accomplish more complex tasks.
  • You can also write scripts, or little programs, that the shell will run for you.
  • Shell scripting languages are small interpreted programming languages that understand variables, command substitution, branching, and iteration.

There are many shells

  • Strictly speaking, GUI environments like Windows and the macOS Finder are shells too.
  • But “the shell” usually means a text-based interpreter that runs programs in response to typed commands.
  • The “original” Unix shell is sh.
  • Its most widely used descendant is bash, the Bourne-Again Shell.
  • On macOS the default shell is the Z Shell or zsh.
  • Windows has the Command Prompt (cmd.exe) and PowerShell, and can also run Unix-like environments such as Cygwin. (PowerShell does not follow Unix conventions.)

A command interpreter

echo "Hello there"
Hello there
  • A shell is an interpreter. It waits for commands. When you supply them, it does what you tell it, or tells the relevant bit of the operating system to do what you said.

  • This mode of interacting with a computer is sometimes called a REPL or Read-Eval-Print Loop.

  • Programming languages like Python and R work this way as well. So does ChatGPT. Shell commands (and R and Python commands, and scripts) are interpreted, meaning the code is sent to an interpreter (the shell itself, or the Python or R program) that runs it.

  • This is distinct from languages (at least originally) designed to be compiled into executable machine code before they are run. Languages like C, Go, and Rust are in this category.
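To make the read-eval-print idea concrete, here is a toy loop written in the shell itself. It is only a sketch: it reads a line, hands it to eval to run, lets the command print its own result, and repeats until input runs out. The name toy_repl is made up, and a real shell does far more (parsing, history, job control).

```shell
#!/bin/sh
# A toy read-eval-print loop. toy_repl is a hypothetical name for this demo.
toy_repl() {
  # Read: print a prompt (on stderr, so it doesn't mix with results),
  # then read one whole line of input
  while printf 'toy> ' >&2 && IFS= read -r line; do
    # Eval + Print: run the line as a shell command; it prints its own output
    eval "$line"
  done  # Loop: go back and wait for the next line
}

# Feed it two commands non-interactively
printf 'echo "Hello there"\nexpr 2 + 3\n' | toy_repl
```

Because it evals whatever it is given, never point a loop like this at untrusted input.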

Getting around the file system

Who and where

Who am I?

whoami
kjhealy

Where am I?

# Print working directory
pwd
/Users/kjhealy/Documents/courses/mptc

Listing files

What is in here?

# List files
ls
R
README.md
README.qmd
README_files
_extensions
_freeze
_quarto.yml
_site
_targets
_targets.R
_variables.yml
about
assets
assignment
avhrr
content
data
deploy.sh
example
files
html
index.html
index.qmd
mptc.Rproj
renv
renv.lock
renv.lock.orig
schedule
seas
site_libs
slides
staging
syllabus

Path rules

  • If the path name begins with /, it is an absolute path, starting from the filesystem root.
  • If the path name begins with ~, it will usually be expanded into an absolute path name starting at your home directory (~).

Path rules

  • If the pathname does not begin with a / or ~ then the path name is relative to the current directory.

  • Two relative special cases use the . and .. entries that exist in every Unix directory:

    1. If the path name begins with ./, the path is relative to the current directory, e.g., ./textfile. Typed on its own as a command, a ./ path asks the shell to run that file, which works only if the file has execute permission.
    2. If the path name begins with ../, the path is relative to the parent of the current directory. For example, if your current directory is /Users/kjhealy/Documents/papers then ../data means /Users/kjhealy/Documents/data
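These rules can be tried safely in a throwaway directory. This sketch mirrors the papers/data example using a scratch tree made with mktemp -d; the directory names are hypothetical stand-ins.

```shell
#!/bin/sh
# Build a scratch tree: <tmp>/Documents/papers and <tmp>/Documents/data
base=$(mktemp -d)
mkdir -p "$base/Documents/papers" "$base/Documents/data"

cd "$base/Documents/papers" || exit 1   # absolute path: begins with /
pwd

cd ../data || exit 1                     # relative: up to the parent, then into data
pwd

cd ./ || exit 1                          # ./ is the current directory: we stay put
pwd

echo ~                                   # ~ expands to your home directory...
echo "$HOME"                             # ...which is stored in $HOME
```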

File permissions

Who is using this file system anyway?

drwxr-xr-x@  8 kjhealy  staff    256 Aug 15 16:35 R
-rw-r--r--@  1 kjhealy  staff   1210 Aug 15 20:29 README.md

Unix derives from a world where multiple users and groups of users all share slices (of processor time and of permanent storage) of a large central computer.

File permissions

drwxr-xr-x@  8 kjhealy  staff    256 Aug 15 16:35 R
-rw-r--r--@  1 kjhealy  staff   1210 Aug 15 20:29 README.md

Unix permissions distinguish three kinds of owner: the user (here kjhealy), the group (here staff), and others, i.e. everyone else on the system.

File permissions

drwxr-xr-x@  8 kjhealy  staff    256 Aug 15 16:35 R
-rw-r--r--@  1 kjhealy  staff   1210 Aug 15 20:29 README.md

Three things you can do to a file:

read

write

execute

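  • For files, “read” means open and view the contents; “write” means modify or save it; “execute” means run it, if it’s an application or script. (Deleting or renaming a file is governed by write permission on the directory that contains it.)
  • For directories, “read” means list contents with ls; “write” means create, delete, or rename files inside it; “execute” means access or enter it using cd.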

File permissions

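ls -l README.md

-rw-r--r--@ 1 kjhealy staff 1210 Aug 15 20:29 README.md

The first character, -, marks an ordinary file (a d would mark a directory), and the trailing @ is how macOS flags a file that has extended attributes. The nine characters in between, rw-r--r--, are the permissions:

  • The user can rw- read and write this file
  • The group can r-- read this file
  • The world can r-- read this file

No one has execute permission, which is what we want for a plain text file that is not meant to be run as a program.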

File permissions

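  • We change file permissions with the chmod command. In a numeric mode, each digit adds up read (4), write (2), and execute (1), for the user, the group, and others in turn. So e.g. chmod 644 README.md means “change the permissions to rw-r--r--”.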
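A sketch to try on a scratch file rather than on anything real, comparing numeric and symbolic modes. The file name README.md just mirrors the slide. We use cut -c1-10 to keep only the ten mode characters that ls -l prints, dropping the macOS @ marker if present.

```shell
#!/bin/sh
demo=$(mktemp -d)
f="$demo/README.md"        # scratch copy; the name mirrors the slide
echo "hello" > "$f"

chmod 644 "$f"             # 6 = 4+2 (rw-), 4 = r--, 4 = r--
ls -l "$f" | cut -c1-10    # -rw-r--r--

chmod 755 "$f"             # 7 = 4+2+1 for the user; 5 = 4+1 for group and others
ls -l "$f" | cut -c1-10    # -rwxr-xr-x

chmod go-rx "$f"           # symbolic form: remove read and execute for group and others
ls -l "$f" | cut -c1-10    # -rwx------
```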

A Tree

├── schedule
├── staging
│   ├── example
│   ├── content
│   ├── assignment
│   ├── slides
├── README_files
│   ├── libs
├── example
│   ├── 09-example_files
│   ├── 07-example_files
├── R
├── content
├── assignment
├── html
│   ├── fonts
├── site_libs
│   ├── revealjs
│   ├── bootstrap
│   ├── quarto-html
│   ├── quarto-contrib
│   ├── quarto-nav
│   ├── quarto-search
│   ├── lightable-0.0.1
│   ├── kePrint-0.0.1
│   ├── clipboard
├── about
├── slides
│   ├── 00-slides_files
│   ├── 02-slides_files
├── syllabus
├── _extensions
│   ├── kjhealy
├── _site
├── files
│   ├── misc
│   ├── examples
│   ├── scripts
│   ├── bib
├── .git
│   ├── objects
│   ├── info
│   ├── logs
│   ├── hooks
│   ├── refs
│   ├── modules
├── _targets
│   ├── meta
│   ├── objects
│   ├── user
│   ├── workspaces
├── renv
│   ├── staging
│   ├── library
├── data
├── assets
│   ├── 03-editors
│   ├── 04-r
│   ├── 10-parallel
│   ├── 04-git
│   ├── 08-iterate
│   ├── 00-site
│   ├── 02-shell
│   ├── 01-file-system
│   ├── 07-ingest
│   ├── 05-dplyr
│   ├── 06-build
├── _freeze
│   ├── schedule
│   ├── example
│   ├── content
│   ├── assignment
│   ├── site_libs
│   ├── slides
│   ├── syllabus
│   ├── index
├── .Rproj.user
│   ├── B6516D0D
│   ├── shared
├── .quarto
│   ├── xref
│   ├── idx
│   ├── preview
│   ├── _freeze

Changing directories

## Change directory and list files
cd files
ls
cd ../slides
01_1890_hollerith_codes.png
01_apple_macintosh.png
01_bryant_hard_drive.png
bib
examples
fars_spreadsheet_raw.png
misc
schedule.ics
scripts

Some shell tools

Example files

What are we working with?

ls files/examples/
01_mptc_oecd_nocode.pdf
01_mptc_oecd_withcode.pdf
SAS_on_2021-04-13.csv
_make-example
alice_in_wonderland.txt
alice_noboiler.txt
apple_mobility_daily_2021-04-12.csv
ascii_table.xlsx
bashrc.txt
basics.txt
congress
continent_sizes.csv
continent_tab.csv
continent_tab.tsv
countries.csv
countries_iso3.csv
country-intermediate.tsv
country-working.tsv
country_iso3.tsv
country_tab.csv
country_tab.tsv
fars0-17daily.csv
fars_crash_report.xlsx
first_terms.csv
fruit.txt
gapminder_xtra.csv
gss_panel_long.dta
jabberwocky.txt
mortality.txt
organdonation.csv
pride_and_prejudice.txt
rfm_table.csv
roman.txt
sentences.txt
shalott_1832.txt
shalott_1842.txt
specials.txt
symptoms.xlsx
ulysses.txt
words.txt
year_tab.tsv
zshrc.txt
  • Your file path will be different! First order of business is to open a Terminal window (either in RStudio or from the operating system) and navigate to where your example files are using pwd, cd, and ls.

wc, cat, head, and tail

wc files/examples/alice_in_wonderland.txt
    3761   29564  174392 files/examples/alice_in_wonderland.txt

We can ask for a count of lines only:

wc -l files/examples/alice_in_wonderland.txt
    3761 files/examples/alice_in_wonderland.txt
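By default wc reports three counts: lines, words, and bytes, and the -l, -w, and -c options pick them out individually. A small sketch on a two-line file we make ourselves (poem.txt is a made-up name). Reading the file with < makes wc print only the number, which is handy in scripts; on macOS the numbers may be padded with leading spaces.

```shell
#!/bin/sh
demo=$(mktemp -d)
printf 'One, two!\nAnd through and through\n' > "$demo/poem.txt"

wc "$demo/poem.txt"        # lines, words, bytes, then the file name
wc -l "$demo/poem.txt"     # lines only: 2
wc -w "$demo/poem.txt"     # words only: 6
wc -c "$demo/poem.txt"     # bytes only: 34
wc -l < "$demo/poem.txt"   # reading from stdin: just the number, no file name
```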

wc, cat, head, and tail

cat concatenates and prints the files given to it.

cat files/examples/jabberwocky.txt
’Twas brillig, and the slithy toves 
      Did gyre and gimble in the wabe: 
All mimsy were the borogoves, 
      And the mome raths outgrabe. 

“Beware the Jabberwock, my son! 
      The jaws that bite, the claws that catch! 
Beware the Jubjub bird, and shun 
      The frumious Bandersnatch!” 

He took his vorpal sword in hand; 
      Long time the manxome foe he sought— 
So rested he by the Tumtum tree 
      And stood awhile in thought. 

And, as in uffish thought he stood, 
      The Jabberwock, with eyes of flame, 
Came whiffling through the tulgey wood, 
      And burbled as it came! 

One, two! One, two! And through and through 
      The vorpal blade went snicker-snack! 
He left it dead, and with its head 
      He went galumphing back. 

“And hast thou slain the Jabberwock? 
      Come to my arms, my beamish boy! 
O frabjous day! Callooh! Callay!” 
      He chortled in his joy. 

’Twas brillig, and the slithy toves 
      Did gyre and gimble in the wabe: 
All mimsy were the borogoves, 
      And the mome raths outgrabe.
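Given more than one file, cat prints them one after another, which is where the name (concatenate) comes from. Combined with >, that joins files into a new one. The names part1.txt, part2.txt, and whole.txt are made up for this sketch.

```shell
#!/bin/sh
demo=$(mktemp -d)
printf 'All mimsy were the borogoves,\n' > "$demo/part1.txt"
printf 'And the mome raths outgrabe.\n'  > "$demo/part2.txt"

# Print both files, in the order given
cat "$demo/part1.txt" "$demo/part2.txt"

# Same thing, but send the combined stream into a new file
cat "$demo/part1.txt" "$demo/part2.txt" > "$demo/whole.txt"
wc -l < "$demo/whole.txt"
```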

wc, cat, head, and tail

The top:

head files/examples/alice_in_wonderland.txt
The Project Gutenberg eBook of Alice's Adventures in Wonderland
    
This ebook is for the use of anyone anywhere in the United States and
most other parts of the world at no cost and with almost no restrictions
whatsoever. You may copy it, give it away or re-use it under the terms
of the Project Gutenberg License included with this ebook or online
at www.gutenberg.org. If you are not located in the United States,
you will have to check the laws of the country where you are located
before using this eBook.

The bottom:

tail files/examples/alice_in_wonderland.txt

Most people start at our website which has the main PG search
facility: www.gutenberg.org.

This website includes information about Project Gutenberg™,
including how to make donations to the Project Gutenberg Literary
Archive Foundation, how to help produce our new eBooks, and how to
subscribe to our email newsletter to hear about new eBooks.

wc, cat, head, and tail

There are 29 lines of boilerplate at the start of the book:

head -n 29 files/examples/alice_in_wonderland.txt
The Project Gutenberg eBook of Alice's Adventures in Wonderland
    
This ebook is for the use of anyone anywhere in the United States and
most other parts of the world at no cost and with almost no restrictions
whatsoever. You may copy it, give it away or re-use it under the terms
of the Project Gutenberg License included with this ebook or online
at www.gutenberg.org. If you are not located in the United States,
you will have to check the laws of the country where you are located
before using this eBook.

Title: Alice's Adventures in Wonderland


Author: Lewis Carroll

Release date: June 27, 2008 [eBook #11]
                Most recently updated: March 30, 2021

Language: English

Credits: Arthur DiBianca and David Widger


*** START OF THE PROJECT GUTENBERG EBOOK ALICE'S ADVENTURES IN WONDERLAND ***
[Illustration]



wc, cat, head, and tail

And 351 at the end:

tail -n 351 files/examples/alice_in_wonderland.txt | head -n 20
            *** END OF THE PROJECT GUTENBERG EBOOK ALICE'S ADVENTURES IN WONDERLAND ***
        

    

Updated editions will replace the previous one—the old editions will
be renamed.

Creating the works from print editions not protected by U.S. copyright
law means that no one owns a United States copyright in these works,
so the Foundation (and you!) can copy and distribute it in the United
States without permission and without paying copyright
royalties. Special rules, set forth in the General Terms of Use part
of this license, apply to copying and distributing Project
Gutenberg™ electronic works to protect the PROJECT GUTENBERG™
concept and trademark. Project Gutenberg is a registered trademark,
and may not be used if you charge for an eBook, except by following
the terms of the trademark license, including paying royalties for use
of the Project Gutenberg trademark. If you do not charge anything for
copies of this eBook, complying with the trademark license is very

wc, cat, head, and tail

With a leading +, tail -n +N starts printing at line N (so it skips N - 1 lines). We can use it to skip the boilerplate at the top:

tail -n +29 files/examples/alice_in_wonderland.txt | head

Alice’s Adventures in Wonderland

by Lewis Carroll

THE MILLENNIUM FULCRUM EDITION 3.0

Contents

 CHAPTER I.     Down the Rabbit-Hole

wc, cat, head, and tail

The shell can be treated like a programming language. That is, it has variables and also flow control (loops, if-then-else, etc.).

We can use some shell variables along with tail and head to skip the boilerplate at the top and bottom, and put the result into a file of its own, using > to redirect the output from STDOUT:

# This sets HEADSKIP to 29 and ENDSKIP to 351;
# We can refer to them with $HEADSKIP and $ENDSKIP
HEADSKIP=29
ENDSKIP=351

# $( ) means "run this command and substitute its output"; we put the result in a variable.
# (Older scripts use backticks ` ` for the same thing.)
# Reading the file with < means wc prints just the count, without the file name.
BOOKLINES=$(wc -l < files/examples/alice_in_wonderland.txt)

# $(( )) does integer arithmetic; the result goes in a variable
GOODLINES=$(( $BOOKLINES - $HEADSKIP - $ENDSKIP ))

# Now we use $HEADSKIP and $GOODLINES to create a new file
tail -n +$HEADSKIP files/examples/alice_in_wonderland.txt |
  head -n $GOODLINES > files/examples/alice_noboiler.txt
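Here is the same top-and-bottom trimming as a self-contained sketch on a made-up 20-line "book" built with seq (book.txt and trimmed.txt are hypothetical names). It drops 3 header lines and 2 footer lines. Unlike the version above, it starts at line HEADSKIP + 1, so that exactly HEADSKIP lines are dropped.

```shell
#!/bin/sh
demo=$(mktemp -d)
seq 1 20 > "$demo/book.txt"          # stand-in for a book: lines "1" to "20"

HEADSKIP=3                            # lines of boilerplate at the top
ENDSKIP=2                             # lines of boilerplate at the bottom

# wc -l reading from stdin prints just the count
BOOKLINES=$(wc -l < "$demo/book.txt")

# Integer arithmetic: 20 - 3 - 2 = 15 lines worth keeping
GOODLINES=$(( $BOOKLINES - $HEADSKIP - $ENDSKIP ))

# Start at line HEADSKIP + 1, keep GOODLINES lines, save the result
tail -n +$(( HEADSKIP + 1 )) "$demo/book.txt" |
  head -n "$GOODLINES" > "$demo/trimmed.txt"

head -n 1 "$demo/trimmed.txt"        # 4
tail -n 1 "$demo/trimmed.txt"        # 18
```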

wc, cat, head, and tail

Now our wc will be different:

wc files/examples/alice_in_wonderland.txt

wc files/examples/alice_noboiler.txt
    3761   29564  174392 files/examples/alice_in_wonderland.txt
    3381   26524  154465 files/examples/alice_noboiler.txt

uniq, sort, and cut

A data file:

head files/examples/countries.csv
cname,iso3,iso2,continent
Afghanistan,AFG,AF,Asia
Algeria,DZA,DZ,Africa
Armenia,ARM,AM,Asia
Australia,AUS,AU,Oceania
Austria,AUT,AT,Europe
Azerbaijan,AZE,AZ,Asia
Bahrain,BHR,BH,Asia
Belarus,BLR,BY,Europe
Belgium,BEL,BE,Europe

How many lines?

wc -l files/examples/countries.csv
     214 files/examples/countries.csv

How many unique lines? (Note that uniq only drops a line when it repeats the line immediately before it.)

uniq files/examples/countries.csv | wc -l
     214
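Because uniq only compares each line with its neighbor, the usual idiom is sort | uniq (or simply sort -u), and uniq -c prefixes each distinct line with its count. A minimal sketch with made-up input:

```shell
#!/bin/sh
# Three lines; the two "Asia" lines are NOT next to each other
printf 'Asia\nEurope\nAsia\n' | uniq | wc -l          # 3: nothing removed
printf 'Asia\nEurope\nAsia\n' | sort | uniq | wc -l   # 2: sorting brings duplicates together
printf 'Asia\nEurope\nAsia\n' | sort | uniq -c        # how many times each line occurs
```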

uniq, sort, and cut

# Omit the header line
tail -n +2 files/examples/countries.csv | sort -r | head
Zimbabwe,ZWE,ZW,Africa
Zambia,ZMB,ZM,Africa
Yemen,YEM,YE,Asia
Western Sahara,ESH,EH,Africa
Wallis and Futuna,WLF,WF,Oceania
Viet Nam,VNM,VN,Asia
Vanuatu,VUT,VU,Oceania
Uzbekistan,UZB,UZ,Asia
Uruguay,URY,UY,South America
United States,USA,US,North America

uniq, sort, and cut

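This doesn’t quite work because of the way the data is coded. Some country names contain commas inside quotation marks (e.g. "Korea, Republic of"). sort doesn’t understand CSV quoting, so it splits those names at the inner comma and ends up sorting those rows on the wrong field: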

tail -n +2 files/examples/countries.csv | sort -t , -k4 -k1
Algeria,DZA,DZ,Africa
Angola,AGO,AO,Africa
Benin,BEN,BJ,Africa
Botswana,BWA,BW,Africa
Burkina Faso,BFA,BF,Africa
Burundi,BDI,BI,Africa
Cabo Verde,CPV,CV,Africa
Cameroon,CMR,CM,Africa
Central African Republic,CAF,CF,Africa
Chad,TCD,TD,Africa
Comoros,COM,KM,Africa
Congo,COG,CG,Africa
Côte d'Ivoire,CIV,CI,Africa
Djibouti,DJI,DJ,Africa
Egypt,EGY,EG,Africa
Equatorial Guinea,GNQ,GQ,Africa
Eritrea,ERI,ER,Africa
Ethiopia,ETH,ET,Africa
Gabon,GAB,GA,Africa
Gambia,GMB,GM,Africa
Ghana,GHA,GH,Africa
Guinea,GIN,GN,Africa
Guinea-Bissau,GNB,GW,Africa
Kenya,KEN,KE,Africa
Lesotho,LSO,LS,Africa
Liberia,LBR,LR,Africa
Libya,LBY,LY,Africa
Madagascar,MDG,MG,Africa
Malawi,MWI,MW,Africa
Mali,MLI,ML,Africa
Mauritania,MRT,MR,Africa
Mauritius,MUS,MU,Africa
Morocco,MAR,MA,Africa
Mozambique,MOZ,MZ,Africa
Namibia,"NAM",NA,Africa
Niger,NER,NE,Africa
Nigeria,NGA,NG,Africa
Rwanda,RWA,RW,Africa
Sao Tome and Principe,STP,ST,Africa
Senegal,SEN,SN,Africa
Seychelles,SYC,SC,Africa
Sierra Leone,SLE,SL,Africa
Somalia,SOM,SO,Africa
South Africa,ZAF,ZA,Africa
South Sudan,SSD,SS,Africa
Sudan,SDN,SD,Africa
Swaziland,SWZ,SZ,Africa
Togo,TGO,TG,Africa
Tunisia,TUN,TN,Africa
Uganda,UGA,UG,Africa
Western Sahara,ESH,EH,Africa
Zambia,ZMB,ZM,Africa
Zimbabwe,ZWE,ZW,Africa
Afghanistan,AFG,AF,Asia
Armenia,ARM,AM,Asia
Azerbaijan,AZE,AZ,Asia
Bahrain,BHR,BH,Asia
Bangladesh,BGD,BD,Asia
Bhutan,BTN,BT,Asia
Brunei Darussalam,BRN,BN,Asia
Cambodia,KHM,KH,Asia
China,CHN,CN,Asia
Georgia,GEO,GE,Asia
India,IND,IN,Asia
Indonesia,IDN,ID,Asia
Iraq,IRQ,IQ,Asia
Israel,ISR,IL,Asia
Japan,JPN,JP,Asia
Jordan,JOR,JO,Asia
Kazakhstan,KAZ,KZ,Asia
Kuwait,KWT,KW,Asia
Kyrgyzstan,KGZ,KG,Asia
Lao People's Democratic Republic,LAO,LA,Asia
Lebanon,LBN,LB,Asia
Malaysia,MYS,MY,Asia
Maldives,MDV,MV,Asia
Mongolia,MNG,MN,Asia
Myanmar,MMR,MM,Asia
Nepal,NPL,NP,Asia
Oman,OMN,OM,Asia
Pakistan,PAK,PK,Asia
Philippines,PHL,PH,Asia
Qatar,QAT,QA,Asia
Saudi Arabia,SAU,SA,Asia
Singapore,SGP,SG,Asia
Sri Lanka,LKA,LK,Asia
Syrian Arab Republic,SYR,SY,Asia
Tajikistan,TJK,TJ,Asia
Thailand,THA,TH,Asia
Turkey,TUR,TR,Asia
United Arab Emirates,ARE,AE,Asia
Uzbekistan,UZB,UZ,Asia
Viet Nam,VNM,VN,Asia
Yemen,YEM,YE,Asia
"Bolivia, Plurinational State of",BOL,BO,South America
"Bonaire, Sint Eustatius and Saba",BES,BQ,North America
"Congo, the Democratic Republic of the",COD,CD,Africa
Albania,ALB,AL,Europe
Andorra,AND,AD,Europe
Austria,AUT,AT,Europe
Belarus,BLR,BY,Europe
Belgium,BEL,BE,Europe
Bosnia and Herzegovina,BIH,BA,Europe
Bulgaria,BGR,BG,Europe
Croatia,HRV,HR,Europe
Cyprus,CYP,CY,Europe
Czech Republic,CZE,CZ,Europe
Denmark,DNK,DK,Europe
Estonia,EST,EE,Europe
Faroe Islands,FRO,FO,Europe
Finland,FIN,FI,Europe
France,FRA,FR,Europe
Germany,DEU,DE,Europe
Gibraltar,GIB,GI,Europe
Greece,GRC,GR,Europe
Guernsey,GGY,GG,Europe
Holy See (Vatican City State),VAT,VA,Europe
Hungary,HUN,HU,Europe
Iceland,ISL,IS,Europe
Ireland,IRL,IE,Europe
Isle of Man,IMN,IM,Europe
Italy,ITA,IT,Europe
Jersey,JEY,JE,Europe
Kosovo,XKV,NA,Europe
Latvia,LVA,LV,Europe
Liechtenstein,LIE,LI,Europe
Lithuania,LTU,LT,Europe
Luxembourg,LUX,LU,Europe
Malta,MLT,MT,Europe
Monaco,MCO,MC,Europe
Montenegro,MNE,ME,Europe
Netherlands,NLD,NL,Europe
Norway,NOR,NO,Europe
Poland,POL,PL,Europe
Portugal,PRT,PT,Europe
Romania,ROU,RO,Europe
Russian Federation,RUS,RU,Europe
San Marino,SMR,SM,Europe
Serbia,SRB,RS,Europe
Slovakia,SVK,SK,Europe
Slovenia,SVN,SI,Europe
Spain,ESP,ES,Europe
Sweden,SWE,SE,Europe
Switzerland,CHE,CH,Europe
Ukraine,UKR,UA,Europe
United Kingdom,GBR,GB,Europe
"Iran, Islamic Republic of",IRN,IR,Asia
"Korea, Republic of",KOR,KR,Asia
"Moldova, Republic of",MDA,MD,Europe
"Macedonia, the former Yugoslav Republic of",MKD,MK,Europe
Anguilla,AIA,AI,North America
Antigua and Barbuda,ATG,AG,North America
Aruba,ABW,AW,North America
Bahamas,BHS,BS,North America
Barbados,BRB,BB,North America
Belize,BLZ,BZ,North America
Bermuda,BMU,BM,North America
Canada,CAN,CA,North America
Cayman Islands,CYM,KY,North America
Costa Rica,CRI,CR,North America
Cuba,CUB,CU,North America
Curaçao,CUW,CW,North America
Dominica,DMA,DM,North America
Dominican Republic,DOM,DO,North America
El Salvador,SLV,SV,North America
Greenland,GRL,GL,North America
Grenada,GRD,GD,North America
Guatemala,GTM,GT,North America
Haiti,HTI,HT,North America
Honduras,HND,HN,North America
Jamaica,JAM,JM,North America
Mexico,MEX,MX,North America
Montserrat,MSR,MS,North America
Nicaragua,NIC,NI,North America
Panama,PAN,PA,North America
Puerto Rico,PRI,PR,North America
Saint Kitts and Nevis,KNA,KN,North America
Saint Lucia,LCA,LC,North America
Saint Vincent and the Grenadines,VCT,VC,North America
Sint Maarten (Dutch part),SXM,SX,North America
Trinidad and Tobago,TTO,TT,North America
Turks and Caicos Islands,TCA,TC,North America
United States,USA,US,North America
Australia,AUS,AU,Oceania
Fiji,FJI,FJ,Oceania
French Polynesia,PYF,PF,Oceania
Guam,GUM,GU,Oceania
Marshall Islands,MHL,MH,Oceania
New Caledonia,NCL,NC,Oceania
New Zealand,NZL,NZ,Oceania
Northern Mariana Islands,MNP,MP,Oceania
Papua New Guinea,PNG,PG,Oceania
Solomon Islands,SLB,SB,Oceania
Timor-Leste,TLS,TL,Oceania
Vanuatu,VUT,VU,Oceania
Wallis and Futuna,WLF,WF,Oceania
"Palestine, State of",PSE,PS,Asia
Argentina,ARG,AR,South America
Brazil,BRA,BR,South America
Chile,CHL,CL,South America
Colombia,COL,CO,South America
Ecuador,ECU,EC,South America
Falkland Islands (Malvinas),FLK,FK,South America
Guyana,GUY,GY,South America
Paraguay,PRY,PY,South America
Peru,PER,PE,South America
Suriname,SUR,SR,South America
Uruguay,URY,UY,South America
"Taiwan, Province of China",TWN,TW,Asia
"Tanzania, United Republic of",TZA,TZ,Africa
"Venezuela, Bolivarian Republic of",VEN,VE,South America
"Virgin Islands, British",VGB,VG,North America
"Virgin Islands, U.S.",VIR,VI,North America

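On input without embedded commas the approach works. This sketch sorts a few made-up rows, using -k4,4 (field 4 only) rather than -k4 (field 4 through the end of the line), so that each sort key is exactly one column:

```shell
#!/bin/sh
demo=$(mktemp -d)
# A small, clean sample: no commas inside any field
cat > "$demo/sample.csv" <<'EOF'
Zambia,ZMB,ZM,Africa
Austria,AUT,AT,Europe
Algeria,DZA,DZ,Africa
Japan,JPN,JP,Asia
EOF

# Sort by continent (field 4 only), then by country name (field 1 only)
sort -t , -k4,4 -k1,1 "$demo/sample.csv"
```

For real CSV files with quoted fields, a CSV-aware tool is the safer choice.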
uniq, sort, and cut

cut slices out columns defined by a delimiter (by default \t, the tab character):

cut -d , -f 2,4 files/examples/countries.csv
iso3,continent
AFG,Asia
DZA,Africa
ARM,Asia
AUS,Oceania
AUT,Europe
AZE,Asia
BHR,Asia
BLR,Europe
BEL,Europe
BRA,South America
KHM,Asia
CAN,North America
CHN,Asia
HRV,Europe
CZE,Europe
DNK,Europe
DOM,North America
ECU,South America
EGY,Africa
EST,Europe
FIN,Europe
FRA,Europe
GEO,Asia
DEU,Europe
GRC,Europe
ISL,Europe
IND,Asia
IDN,Asia
 Islamic Republic of",IR
IRQ,Asia
IRL,Europe
ISR,Asia
ITA,Europe
JPN,Asia
KWT,Asia
LBN,Asia
LTU,Europe
LUX,Europe
MYS,Asia
MEX,North America
MCO,Europe
NPL,Asia
NLD,Europe
NZL,Oceania
NGA,Africa
 the former Yugoslav Republic of",MK
NOR,Europe
OMN,Asia
PAK,Asia
PHL,Asia
QAT,Asia
ROU,Europe
RUS,Europe
SMR,Europe
SGP,Asia
 Republic of",KR
ESP,Europe
LKA,Asia
SWE,Europe
CHE,Europe
 Province of China",TW
THA,Asia
ARE,Asia
GBR,Europe
USA,North America
VNM,Asia
AND,Europe
JOR,Asia
LVA,Europe
MAR,Africa
PRT,Europe
SAU,Asia
SEN,Africa
SXM,North America
TUN,Africa
ARG,South America
CHL,South America
POL,Europe
UKR,Europe
HUN,Europe
LIE,Europe
SVN,Europe
BTN,Asia
BIH,Europe
FRO,Europe
 State of",PS
ZAF,Africa
CMR,Africa
COL,South America
CRI,North America
VAT,Europe
MLT,Europe
PER,South America
SRB,Europe
SVK,Europe
TGO,Africa
BGR,Europe
MDV,Asia
 Republic of",MD
PRY,South America
ALB,Europe
BGD,Asia
BRN,Asia
CYP,Europe
MNG,Asia
PAN,North America
BFA,Africa
 the Democratic Republic of the",CD
 Plurinational State of",BO
CIV,Africa
CUB,North America
HND,North America
JAM,North America
TUR,Asia
ABW,North America
CUW,North America
GAB,Africa
GHA,Africa
GUY,South America
VCT,North America
TTO,North America
ETH,Africa
GIN,Africa
KEN,Africa
XKV,Europe
SDN,Africa
ATG,North America
GNQ,Africa
SWZ,Africa
GTM,North America
KAZ,Asia
MRT,Africa
"NAM",Africa
RWA,Africa
LCA,North America
SYC,Africa
SUR,South America
URY,South America
 Bolivarian Republic of",VE
BHS,North America
CAF,Africa
COG,Africa
UZB,Asia
BEN,Africa
LBR,Africa
MMR,Asia
SOM,Africa
 United Republic of",TZ
BRB,North America
GMB,Africa
MNE,Europe
DJI,Africa
SLV,North America
PYF,Oceania
GUM,Oceania
KGZ,Asia
NIC,North America
ZMB,Africa
BMU,North America
CYM,North America
TCD,Africa
FJI,Oceania
GIB,Europe
GRL,North America
GGY,Europe
HTI,North America
JEY,Europe
MUS,Africa
CPV,Africa
IMN,Europe
MDG,Africa
MSR,North America
NCL,Oceania
NER,Africa
PNG,Oceania
ZWE,Africa
AGO,Africa
ERI,Africa
TLS,Oceania
UGA,Africa
DMA,North America
GRD,North America
MOZ,Africa
SYR,Asia
BLZ,North America
 U.S.",VI
LAO,Asia
LBY,Africa
TCA,North America
MLI,Africa
KNA,North America
AIA,North America
 British",VG
GNB,Africa
PRI,North America
MNP,Oceania
BWA,Africa
BDI,Africa
SLE,Africa
 Sint Eustatius and Saba",BQ
MWI,Africa
FLK,South America
SSD,Africa
STP,Africa
YEM,Asia
ESH,Africa
TJK,Asia
COM,Africa
LSO,Africa
SLB,Oceania
WLF,Oceania
MHL,Oceania
VUT,Oceania

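Again, it doesn’t quite behave as you might think! cut splits on every comma, including the ones inside quoted country names, so rows like "Korea, Republic of" come out as fragments.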
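On input without embedded delimiters, cut works as expected. This sketch uses a couple of made-up tab-separated rows (tab is cut's default delimiter, so no -d is needed), then reproduces the quoted-comma problem on a single line for comparison:

```shell
#!/bin/sh
demo=$(mktemp -d)
# Tab-separated sample; printf turns each \t into a real tab character
printf 'Japan\tJPN\tAsia\nPeru\tPER\tSouth America\n' > "$demo/sample.tsv"

cut -f 1,3 "$demo/sample.tsv"        # country and continent

# A quoted CSV field containing a comma: cut splits it anyway
printf '"Korea, Republic of",KOR,Asia\n' | cut -d , -f 2
```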

Finding files and finding text

find

find is for locating files and directories by name:

# Everything in the `files/` subdirectory
find files
files
files/misc
files/misc/home-tree.txt
files/misc/root-tree.txt
files/.DS_Store
files/schedule.ics
files/01_apple_macintosh.png
files/01_bryant_hard_drive.png
files/fars_spreadsheet_raw.png
files/examples
files/examples/country_iso3.tsv
files/examples/jabberwocky.txt
files/examples/country_tab.csv
files/examples/ulysses.txt
files/examples/_make-example
files/examples/_make-example/mypaper.md
files/examples/_make-example/fig1.r
files/examples/_make-example/Makefile
files/examples/_make-example/README.md
files/examples/_make-example/.gitignore
files/examples/_make-example/.RData
files/examples/rfm_table.csv
files/examples/01_mptc_oecd_nocode.pdf
files/examples/.DS_Store
files/examples/countries.csv
files/examples/specials.txt
files/examples/gapminder_xtra.csv
files/examples/bashrc.txt
files/examples/apple_mobility_daily_2021-04-12.csv
files/examples/alice_in_wonderland.txt
files/examples/continent_tab.tsv
files/examples/first_terms.csv
files/examples/symptoms.xlsx
files/examples/roman.txt
files/examples/fruit.txt
files/examples/shalott_1832.txt
files/examples/year_tab.tsv
files/examples/fars_crash_report.xlsx
files/examples/organdonation.csv
files/examples/continent_tab.csv
files/examples/pride_and_prejudice.txt
files/examples/basics.txt
files/examples/01_mptc_oecd_withcode.pdf
files/examples/continent_sizes.csv
files/examples/country-intermediate.tsv
files/examples/SAS_on_2021-04-13.csv
files/examples/country-working.tsv
files/examples/words.txt
files/examples/mortality.txt
files/examples/sentences.txt
files/examples/ascii_table.xlsx
files/examples/gss_panel_long.dta
files/examples/congress
files/examples/congress/23_101_congress.csv
files/examples/congress/28_106_congress.csv
files/examples/congress/08_86_congress.csv
files/examples/congress/05_83_congress.csv
files/examples/congress/31_109_congress.csv
files/examples/congress/24_102_congress.csv
files/examples/congress/16_94_congress.csv
files/examples/congress/37_115_congress.csv
files/examples/congress/13_91_congress.csv
files/examples/congress/25_103_congress.csv
files/examples/congress/30_108_congress.csv
files/examples/congress/01_79_congress.csv
files/examples/congress/09_87_congress.csv
files/examples/congress/36_114_congress.csv
files/examples/congress/17_95_congress.csv
files/examples/congress/22_100_congress.csv
files/examples/congress/04_82_congress.csv
files/examples/congress/29_107_congress.csv
files/examples/congress/12_90_congress.csv
files/examples/congress/15_93_congress.csv
files/examples/congress/11_89_congress.csv
files/examples/congress/35_113_congress.csv
files/examples/congress/06_84_congress.csv
files/examples/congress/26_104_congress.csv
files/examples/congress/03_81_congress.csv
files/examples/congress/32_110_congress.csv
files/examples/congress/18_96_congress.csv
files/examples/congress/21_99_congress.csv
files/examples/congress/07_85_congress.csv
files/examples/congress/10_88_congress.csv
files/examples/congress/33_111_congress.csv
files/examples/congress/14_92_congress.csv
files/examples/congress/02_80_congress.csv
files/examples/congress/38_116_congress.csv
files/examples/congress/34_112_congress.csv
files/examples/congress/20_98_congress.csv
files/examples/congress/27_105_congress.csv
files/examples/congress/19_97_congress.csv
files/examples/fars0-17daily.csv
files/examples/shalott_1842.txt
files/examples/alice_noboiler.txt
files/examples/countries_iso3.csv
files/examples/country_tab.tsv
files/examples/zshrc.txt
files/scripts
files/scripts/hello-world.sh
files/scripts/make-thumbnail.sh
files/bib
files/bib/samplesyllabus.csl
files/bib/american-political-science-association.csl
files/bib/references.bib
files/bib/chicago-fullnote-bibliography-no-bib.csl
files/bib/mptc_references.bib
files/bib/chicago-fullnote-bibliography.csl
files/bib/chicago-syllabus-no-bib.csl
files/bib/apa.csl
files/bib/chicago-author-date.csl
files/bib/.auctex-auto
files/bib/.auctex-auto/references.el
files/bib/chicago-note-bibliography.csl
files/01_1890_hollerith_codes.png

find

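We can use globbing (or wildcards) to narrow our search. Quoting the pattern, as in "*.csl", makes sure the wildcard is passed to find rather than expanded by the shell: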

# Everything underneath the `files/` subdirectory
# whose name ends in `.csl`
find files -name "*.csl"
files/bib/samplesyllabus.csl
files/bib/american-political-science-association.csl
files/bib/chicago-fullnote-bibliography-no-bib.csl
files/bib/chicago-fullnote-bibliography.csl
files/bib/chicago-syllabus-no-bib.csl
files/bib/apa.csl
files/bib/chicago-author-date.csl
files/bib/chicago-note-bibliography.csl
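The -name test is case-sensitive. As a sketch in a throwaway directory (all file names here are made up), -iname ignores case, and -type f keeps only regular files, dropping directories that happen to match. -iname works in both GNU and BSD find, though it isn't part of POSIX:

```shell
#!/usr/bin/env bash
# A scratch directory holding some hypothetical files
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir -p "$dir/styles.csl"              # a directory whose name also ends in .csl
touch "$dir/apa.csl" "$dir/APA-Upper.CSL"

# Case-sensitive: apa.csl and the styles.csl directory
find "$dir" -name "*.csl"

# Case-insensitive: also picks up APA-Upper.CSL
find "$dir" -iname "*.csl"

# Case-insensitive, regular files only: the directory drops out
find "$dir" -iname "*.csl" -type f
```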

find

Here we use the . to mean “Search in the current folder”

find . -name "*.xlsx"
./files/examples/symptoms.xlsx
./files/examples/fars_crash_report.xlsx
./files/examples/ascii_table.xlsx
./data/schedule.xlsx
./data/data_sources.xlsx

find

  • The -exec option lets us do things with each result.
  • The {} expands to each found file in turn.
  • Here we use echo to see what the rm (remove) command would do.
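  • The semicolon marks the end of the -exec command. It must be quoted (";") or escaped (\;) so that the shell passes it to find instead of treating it as the end of the whole shell command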
find files -name "*.png" -exec echo rm {} ";"
rm files/01_apple_macintosh.png
rm files/01_bryant_hard_drive.png
rm files/fars_spreadsheet_raw.png
rm files/01_1890_hollerith_codes.png

If we omitted the echo here, the found files really would be deleted, one at a time.
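Ending -exec with {} + instead of \; makes find hand many results to a single run of the command, rather than running it once per file. A sketch in a scratch directory with hypothetical file names, still using echo as a dry run:

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
touch "$dir/a.png" "$dir/b.png" "$dir/c.png"

# One rm per file: prints three lines
find "$dir" -name "*.png" -exec echo rm {} \;

# All files batched into one rm: prints a single line
find "$dir" -name "*.png" -exec echo rm {} +
```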

find

We can also use xargs to act on search results:

# Everything underneath the `files/` subdirectory
# whose name ends in `.png`
find files -name "*.png"
files/01_apple_macintosh.png
files/01_bryant_hard_drive.png
files/fars_spreadsheet_raw.png
files/01_1890_hollerith_codes.png

Convert all these PNG files to JPG format (this uses ImageMagick’s mogrify):

# Convert everything underneath the `files/` subdirectory
# whose name ends in `.png` to `.jpg` format, keeping the original files.
find files -name '*.png' -print0 | xargs -0 -r mogrify -format jpg
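Why -print0 and -0? Ordinarily xargs splits its input on whitespace, so a file name containing a space arrives as two separate arguments. -print0 ends each result with a null byte instead of a newline, and xargs -0 splits only on that byte. A sketch with a hypothetical file name, using echo in place of mogrify:

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
touch "$dir/crash report.png"          # note the space in the name

# Without -print0/-0: the name is split at the space, so echo runs twice
find "$dir" -name '*.png' | xargs -n 1 echo

# With -print0/-0: the name arrives intact, so echo runs once
find "$dir" -name '*.png' -print0 | xargs -0 -n 1 echo
```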

find

Check:

find files -name '*.png'
find files -name '*.jpg'
files/01_apple_macintosh.png
files/01_bryant_hard_drive.png
files/fars_spreadsheet_raw.png
files/01_1890_hollerith_codes.png
files/01_apple_macintosh.jpg
files/01_bryant_hard_drive.jpg
files/fars_spreadsheet_raw.jpg
files/01_1890_hollerith_codes.jpg

Delete them (with another method of deletion):

find files -name '*.jpg' -type f -delete

Perspective

Obviously you will not be doing this sort of thing every day of the week. But you may well want to programmatically rename, move, convert, or otherwise manipulate files in batches from time to time. Especially if there are a lot of them, the shell can help you.

Naming things

Naming files

  • The better your names for things, the easier they will be to find (and programmatically work with)
  • In civilized operating systems, names containing spaces and special characters (such as ? ! , . # $ * <space> and the like) are not a problem.
  • However, the more you work programmatically, the more you will want to avoid them.
  • Jenny Bryan’s five-minute Normconf talk is a good overview of sensible naming habits
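One concrete way spaces bite: the shell splits an unquoted variable on whitespace before handing it to a command. A minimal sketch with a made-up file name:

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

f="my data.csv"
touch "$f"        # quoted: creates one file called "my data.csv"

# Unquoted: ls receives two arguments, "my" and "data.csv", and finds neither
ls $f 2>/dev/null || echo "unquoted \$f was split into two names"

# Quoted: ls receives the one real name
ls "$f"
```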

Naming files

  • Names should tell you something about what the file is
  • Names should avoid spaces and punctuation
  • Names should follow some reasonable convention
  • Names with numbers should sort in useful ways
  • Names should not be used to track the versions of files

Naming files

Find all files in or below the project directory that end in .qmd:

find . -name "*.qmd"
./schedule/index.qmd
./example/04-example.qmd
./example/08-example.qmd
./example/01-example.qmd
./example/02-example.qmd
./example/07-example.qmd
./example/index.qmd
./example/09-example.qmd
./example/05-example.qmd
./example/06-example.qmd
./example/03-example.qmd
./content/09-content.qmd
./content/05-content.qmd
./content/10-content.qmd
./content/06-content.qmd
./content/03-content.qmd
./content/index.qmd
./content/04-content.qmd
./content/01-content.qmd
./content/08-content.qmd
./content/02-content.qmd
./content/07-content.qmd
./assignment/04-assignment.qmd
./assignment/03-assignment.qmd
./assignment/02-assignment.qmd
./assignment/05-assignment.qmd
./assignment/07-assignment.qmd
./assignment/08-assignment.qmd
./assignment/01-assignment.qmd
./assignment/09-assignment.qmd
./assignment/06-assignment.qmd
./assignment/index.qmd
./about/index.qmd
./index.qmd
./slides/08-slides.qmd
./slides/05b-slides.qmd
./slides/05-slides.qmd
./slides/00-slides.qmd
./slides/07-slides.qmd
./slides/02-slides.qmd
./slides/01a-slides.qmd
./slides/01b-slides.qmd
./slides/10-slides.qmd
./slides/09-slides.qmd
./slides/04-slides.qmd
./slides/03-slides.qmd
./slides/06-slides.qmd
./syllabus/index.qmd
./README.qmd

Naming files

Find all files and folders in or below the current directory whose names start with exactly two characters followed by -example, then end with any number of further characters:

find . -name "??-example*"
./example/08-example.html
./example/04-example.qmd
./example/08-example.qmd
./example/09-example_files
./example/09-example.html
./example/01-example.qmd
./example/02-example.qmd
./example/04-example.html
./example/03-example.html
./example/07-example_files
./example/07-example.qmd
./example/02-example.html
./example/05-example.html
./example/09-example.qmd
./example/07-example.html
./example/01-example.html
./example/06-example.html
./example/05-example.qmd
./example/06-example.qmd
./example/03-example.qmd
./_freeze/example/03-example
./_freeze/example/02-example
./_freeze/example/09-example
./_freeze/example/08-example
./_freeze/example/01-example
./_freeze/example/04-example
./_freeze/example/05-example
./_freeze/example/07-example
./_freeze/example/06-example
./.quarto/idx/example/08-example.qmd.json
./.quarto/idx/example/03-example.qmd.json
./.quarto/idx/example/07-example.qmd.json
./.quarto/idx/example/09-example.qmd.json
./.quarto/idx/example/02-example.qmd.json
./.quarto/idx/example/06-example.qmd.json
./.quarto/idx/example/01-example.qmd.json
./.quarto/idx/example/05-example.qmd.json
./.quarto/idx/example/04-example.qmd.json
./.quarto/_freeze/example/03-example
./.quarto/_freeze/example/02-example
./.quarto/_freeze/example/09-example
./.quarto/_freeze/example/08-example
./.quarto/_freeze/example/01-example
./.quarto/_freeze/example/04-example
./.quarto/_freeze/example/05-example
./.quarto/_freeze/example/07-example
./.quarto/_freeze/example/06-example
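In a glob pattern, ? matches exactly one character and * matches any run of characters, including none. A sketch with hypothetical names. Note that a matching directory shows up too (like the _freeze entries above) unless you add -type f:

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
touch "$dir/1-example.qmd" "$dir/01-example.qmd" "$dir/001-example.qmd"
mkdir "$dir/02-example"

# Matches 01-example.qmd and the 02-example directory, but not 1- or 001-
find "$dir" -name "??-example*"

# Regular files only: just 01-example.qmd
find "$dir" -name "??-example*" -type f
```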

Sort order

mkdir tmp
touch tmp/{1..15}.txt

See how these sort:

ls tmp/
1.txt
10.txt
11.txt
12.txt
13.txt
14.txt
15.txt
2.txt
3.txt
4.txt
5.txt
6.txt
7.txt
8.txt
9.txt

Not what we want: ls sorts names character by character, so 10.txt through 15.txt all come before 2.txt.
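If you are stuck with unpadded names, sort -n compares the leading digits as numbers. A sketch. LC_ALL=C is there because plain sort may otherwise follow your locale's collation rules, which can order these names differently:

```shell
#!/usr/bin/env bash
# Character-by-character order, as ls shows it: 1.txt, 10.txt, 2.txt
printf '%s\n' 1.txt 10.txt 2.txt | LC_ALL=C sort

# Numeric order on the leading digits: 1.txt, 2.txt, 10.txt
printf '%s\n' 1.txt 10.txt 2.txt | sort -n
```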

Sort order

rm -f tmp/*.txt
touch tmp/{01..15}.txt
ls tmp/
01.txt
02.txt
03.txt
04.txt
05.txt
06.txt
07.txt
08.txt
09.txt
10.txt
11.txt
12.txt
13.txt
14.txt
15.txt
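Zero-padded ranges like {01..15} work in Zsh and in Bash 4 or later, but older Bash (such as the 3.2 that macOS ships as /bin/bash) may quietly drop the zeros. A more portable sketch pads the numbers with printf:

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# %02d pads each number to two digits: 01.txt, 02.txt, ..., 15.txt
for ((i = 1; i <= 15; i++)); do
  touch "$dir/$(printf '%02d' "$i").txt"
done

ls "$dir"
```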

Sort order

rm -f tmp/*.txt
touch tmp/{a..d}{01..03}.txt
ls -l tmp/
rm -rf tmp/
total 0
-rw-r--r--@ 1 kjhealy  staff  0 Nov  5 10:57 a01.txt
-rw-r--r--@ 1 kjhealy  staff  0 Nov  5 10:57 a02.txt
-rw-r--r--@ 1 kjhealy  staff  0 Nov  5 10:57 a03.txt
-rw-r--r--@ 1 kjhealy  staff  0 Nov  5 10:57 b01.txt
-rw-r--r--@ 1 kjhealy  staff  0 Nov  5 10:57 b02.txt
-rw-r--r--@ 1 kjhealy  staff  0 Nov  5 10:57 b03.txt
-rw-r--r--@ 1 kjhealy  staff  0 Nov  5 10:57 c01.txt
-rw-r--r--@ 1 kjhealy  staff  0 Nov  5 10:57 c02.txt
-rw-r--r--@ 1 kjhealy  staff  0 Nov  5 10:57 c03.txt
-rw-r--r--@ 1 kjhealy  staff  0 Nov  5 10:57 d01.txt
-rw-r--r--@ 1 kjhealy  staff  0 Nov  5 10:57 d02.txt
-rw-r--r--@ 1 kjhealy  staff  0 Nov  5 10:57 d03.txt

In general, keep your names lower-case.

Dates

Use the one true YMD format, ISO 8601:

YYYY-MM-DD
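Because ISO 8601 dates put the largest unit first, sorting them as plain text also sorts them chronologically. A sketch using date to stamp a file name (the name itself is hypothetical):

```shell
#!/usr/bin/env bash
# Today's date as YYYY-MM-DD
today=$(date +%Y-%m-%d)
echo "$today"

# A hypothetical dated file name
echo "${today}_survey-export.csv"

# Plain text sorting puts ISO dates in chronological order
printf '%s\n' 2024-11-05 2023-12-11 2024-01-23 | sort
```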

Naming files

  • Be consistent in your use of naming conventions
  • No need to get too clever, but …
data_clean/
data_raw/
docs/
figures/
R/01_clean-data.R
R/02_process-data.R
R/03_descriptive-figs-tables.R
R/04_brms-model.R
paper/
README.md
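A sketch of creating a skeleton like this in one go with mkdir -p and brace expansion (the top-level myproject folder is a hypothetical name):

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)                 # a disposable place to try this
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# -p creates parent directories as needed and doesn't complain if they exist
mkdir -p myproject/{data_clean,data_raw,docs,figures,R,paper}
touch myproject/R/{01_clean-data,02_process-data}.R myproject/README.md

# Show the resulting tree, one path per line
find myproject | sort
```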

Unix naming conventions

  • Dotfiles and underscores
ls -l
total 408
drwxr-xr-x@  8 kjhealy  staff    256 Aug 15  2023 R
-rw-r--r--@  1 kjhealy  staff   1967 Sep 18  2023 README.md
-rw-r--r--   1 kjhealy  staff   1764 Jan 23  2024 README.qmd
drwxr-xr-x@  3 kjhealy  staff     96 Sep 18  2023 README_files
drwxr-xr-x   3 kjhealy  staff     96 Jan  9  2024 _extensions
drwxr-xr-x@ 10 kjhealy  staff    320 Nov  5 10:57 _freeze
-rw-r--r--@  1 kjhealy  staff   4750 Nov  5 10:34 _quarto.yml
drwxr-xr-x@  2 kjhealy  staff     64 Nov  5 10:56 _site
drwxr-xr-x@  7 kjhealy  staff    224 Nov  5 10:54 _targets
-rw-r--r--@  1 kjhealy  staff   2737 Sep 27 17:40 _targets.R
-rw-r--r--@  1 kjhealy  staff    997 Aug 26 14:40 _variables.yml
drwxr-xr-x@  4 kjhealy  staff    128 Nov  5 10:57 about
drwxr-xr-x@ 16 kjhealy  staff    512 Nov  5 10:12 assets
drwxr-xr-x@ 22 kjhealy  staff    704 Nov  5 10:57 assignment
lrwxr-xr-x   1 kjhealy  staff    135 Nov  5 10:23 avhrr -> /Users/kjhealy/Documents/data/misc/noaa_ncei/raw/www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr
drwxr-xr-x@ 24 kjhealy  staff    768 Nov  5 10:57 content
drwxr-xr-x@  5 kjhealy  staff    160 Nov  5 10:14 data
-rwxr-xr-x@  1 kjhealy  staff    437 Oct 23 08:19 deploy.sh
drwxr-xr-x@ 24 kjhealy  staff    768 Nov  5 10:57 example
drwxr-xr-x@ 12 kjhealy  staff    384 Nov  5 10:57 files
drwxr-xr-x  14 kjhealy  staff    448 Jan  8  2024 html
-rw-r--r--@  1 kjhealy  staff  50673 Nov  5 10:57 index.html
-rw-r--r--@  1 kjhealy  staff   6937 Oct 23 10:59 index.qmd
-rw-r--r--@  1 kjhealy  staff    258 Oct 29 17:52 mptc.Rproj
drwxr-xr-x@  7 kjhealy  staff    224 Aug 15  2023 renv
-rw-r--r--@  1 kjhealy  staff  63998 Nov  4 08:04 renv.lock
-rw-r--r--   1 kjhealy  staff  46717 Dec 11  2023 renv.lock.orig
drwxr-xr-x@  4 kjhealy  staff    128 Nov  5 10:57 schedule
lrwxr-xr-x   1 kjhealy  staff     66 Nov  5 10:23 seas -> /Users/kjhealy/Documents/data/misc/noaa_ncei/raw/World_Seas_IHO_v3
drwxr-xr-x@ 11 kjhealy  staff    352 Nov  5 10:57 site_libs
drwxr-xr-x@ 21 kjhealy  staff    672 Nov  5 10:57 slides
drwxr-xr-x   6 kjhealy  staff    192 Aug 28 09:24 staging
drwxr-xr-x   3 kjhealy  staff     96 Nov  5 10:54 syllabus

Unix naming conventions

ls -la
total 504
drwxr-xr-x@ 44 kjhealy  staff   1408 Nov  5 10:57 .
drwxr-xr-x@ 31 kjhealy  staff    992 Oct  6 09:28 ..
-rw-r--r--@  1 kjhealy  staff  10244 Sep  3 13:23 .DS_Store
-rw-r--r--@  1 kjhealy  staff  17417 Oct 31 15:07 .Rhistory
-rw-r--r--@  1 kjhealy  staff     26 Aug 15  2023 .Rprofile
drwxr-xr-x@  4 kjhealy  staff    128 Aug 10  2023 .Rproj.user
drwxr-xr-x@ 16 kjhealy  staff    512 Nov  5 10:42 .git
-rw-r--r--@  1 kjhealy  staff    346 Oct  8 11:21 .gitignore
-rw-r--r--   1 kjhealy  staff     71 Jan  9  2024 .gitmodules
-rw-r--r--@  1 kjhealy  staff    821 Aug 16  2023 .luarc.json
drwxr-xr-x@  6 kjhealy  staff    192 Nov  5 10:56 .quarto
drwxr-xr-x@  8 kjhealy  staff    256 Aug 15  2023 R
-rw-r--r--@  1 kjhealy  staff   1967 Sep 18  2023 README.md
-rw-r--r--   1 kjhealy  staff   1764 Jan 23  2024 README.qmd
drwxr-xr-x@  3 kjhealy  staff     96 Sep 18  2023 README_files
drwxr-xr-x   3 kjhealy  staff     96 Jan  9  2024 _extensions
drwxr-xr-x@ 10 kjhealy  staff    320 Nov  5 10:57 _freeze
-rw-r--r--@  1 kjhealy  staff   4750 Nov  5 10:34 _quarto.yml
drwxr-xr-x@  2 kjhealy  staff     64 Nov  5 10:56 _site
drwxr-xr-x@  7 kjhealy  staff    224 Nov  5 10:54 _targets
-rw-r--r--@  1 kjhealy  staff   2737 Sep 27 17:40 _targets.R
-rw-r--r--@  1 kjhealy  staff    997 Aug 26 14:40 _variables.yml
drwxr-xr-x@  4 kjhealy  staff    128 Nov  5 10:57 about
drwxr-xr-x@ 16 kjhealy  staff    512 Nov  5 10:12 assets
drwxr-xr-x@ 22 kjhealy  staff    704 Nov  5 10:57 assignment
lrwxr-xr-x   1 kjhealy  staff    135 Nov  5 10:23 avhrr -> /Users/kjhealy/Documents/data/misc/noaa_ncei/raw/www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr
drwxr-xr-x@ 24 kjhealy  staff    768 Nov  5 10:57 content
drwxr-xr-x@  5 kjhealy  staff    160 Nov  5 10:14 data
-rwxr-xr-x@  1 kjhealy  staff    437 Oct 23 08:19 deploy.sh
drwxr-xr-x@ 24 kjhealy  staff    768 Nov  5 10:57 example
drwxr-xr-x@ 12 kjhealy  staff    384 Nov  5 10:57 files
drwxr-xr-x  14 kjhealy  staff    448 Jan  8  2024 html
-rw-r--r--@  1 kjhealy  staff  50673 Nov  5 10:57 index.html
-rw-r--r--@  1 kjhealy  staff   6937 Oct 23 10:59 index.qmd
-rw-r--r--@  1 kjhealy  staff    258 Oct 29 17:52 mptc.Rproj
drwxr-xr-x@  7 kjhealy  staff    224 Aug 15  2023 renv
-rw-r--r--@  1 kjhealy  staff  63998 Nov  4 08:04 renv.lock
-rw-r--r--   1 kjhealy  staff  46717 Dec 11  2023 renv.lock.orig
drwxr-xr-x@  4 kjhealy  staff    128 Nov  5 10:57 schedule
lrwxr-xr-x   1 kjhealy  staff     66 Nov  5 10:23 seas -> /Users/kjhealy/Documents/data/misc/noaa_ncei/raw/World_Seas_IHO_v3
drwxr-xr-x@ 11 kjhealy  staff    352 Nov  5 10:57 site_libs
drwxr-xr-x@ 21 kjhealy  staff    672 Nov  5 10:57 slides
drwxr-xr-x   6 kjhealy  staff    192 Aug 28 09:24 staging
drwxr-xr-x   3 kjhealy  staff     96 Nov  5 10:54 syllabus

Unix naming conventions

  • Files and folders beginning with a period, ., are “hidden”
  • They won’t show up in a plain ls listing; use ls -a to include them
  • By convention they are often used for configuration information
  • Files or folders beginning with an underscore, _, are often “generated” (though this is a weak convention)
  • The structure of plain-text config files depends on the thing they are configuring. A config file might be just a list of words or options, a structured file in a format like YAML or TOML, or code meant to be parsed by R or Python, etc.
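A sketch of the difference in a scratch directory with made-up files: plain ls hides the dotfile, ls -A shows it, and ls -a also shows the . and .. entries:

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
touch "$dir/README.md" "$dir/.gitignore" "$dir/_quarto.yml"

ls "$dir"      # README.md and _quarto.yml; .gitignore stays hidden
ls -A "$dir"   # adds .gitignore
ls -a "$dir"   # adds .gitignore plus the . and .. entries
```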

Unix naming conventions

Here’s the .gitignore file for this project:

.Rproj.user
.Rhistory
.RData
.Ruserdata

/.quarto/
/_site/
/renv/

/_freeze/
/_targets/

about/*.pdf
about/*.html
assignment/*.html
example/*.html
schedule/*.html
syllabus/*.html
data/dfstrat.csv
slides/*.pdf
slides/*.html
slides/**/*_cache/*
slides/libs/*
projects/*.zip
seas
avhrr 

# knitr and caching
**/*_files/*
**/*_cache/*

README.html

/.luarc.json

Customizing your shell

Bash (often the Linux default)

A .bashrc file to configure non-login shells for Bash:

# Put the contents of this file in your ~/.bashrc file
# ~/.bashrc: executed by bash(1) for non-login shells.
# see /usr/share/doc/bash/examples/startup-files (in the package bash-doc)
# for examples

# If not running interactively, don't do anything
case $- in
    *i*) ;;
      *) return;;
esac

# don't put duplicate lines or lines starting with space in the history.
# See bash(1) for more options
HISTCONTROL=ignoreboth

# append to the history file, don't overwrite it
shopt -s histappend

# for setting history length see HISTSIZE and HISTFILESIZE in bash(1)
HISTSIZE=1000
HISTFILESIZE=2000

# check the window size after each command and, if necessary,
# update the values of LINES and COLUMNS.
shopt -s checkwinsize

# If set, the pattern "**" used in a pathname expansion context will
# match all files and zero or more directories and subdirectories.
#shopt -s globstar

# make less more friendly for non-text input files, see lesspipe(1)
#[ -x /usr/bin/lesspipe ] && eval "$(SHELL=/bin/sh lesspipe)"

# set variable identifying the chroot you work in (used in the prompt below)
if [ -z "${debian_chroot:-}" ] && [ -r /etc/debian_chroot ]; then
    debian_chroot=$(cat /etc/debian_chroot)
fi

# set a fancy prompt (non-color, unless we know we "want" color)
case "$TERM" in
    xterm-color) color_prompt=yes;;
esac

# uncomment for a colored prompt, if the terminal has the capability; turned
# off by default to not distract the user: the focus in a terminal window
# should be on the output of commands, not on the prompt
force_color_prompt=yes

if [ -n "$force_color_prompt" ]; then
    if [ -x /usr/bin/tput ] && tput setaf 1 >&/dev/null; then
        # We have color support; assume it's compliant with Ecma-48
        # (ISO/IEC-6429). (Lack of such support is extremely rare, and such
        # a case would tend to support setf rather than setaf.)
        color_prompt=yes
    else
        color_prompt=
    fi
fi

if [ "$color_prompt" = yes ]; then
#    PS1='${debian_chroot:+($debian_chroot)}\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$ '
     PS1='${debian_chroot:+($debian_chroot)}\[\033[01;32m\]\H\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\] \$ '
else
    PS1='${debian_chroot:+($debian_chroot)}\u@\h:\w\$ '
fi
unset color_prompt force_color_prompt

# If this is an xterm set the title to user@host:dir
case "$TERM" in
xterm*|rxvt*)
    PS1="\[\e]0;${debian_chroot:+($debian_chroot)}\u@\h: \w\a\]$PS1"
    ;;
*)
    ;;
esac

# enable color support of ls and also add handy aliases
if [ -x /usr/bin/dircolors ]; then
    test -r ~/.dircolors && eval "$(dircolors -b ~/.dircolors)" || eval "$(dircolors -b)"
    alias ls='ls --color=auto'
    #alias dir='dir --color=auto'
    #alias vdir='vdir --color=auto'

    alias grep='grep --color=auto'
    alias fgrep='fgrep --color=auto'
    alias egrep='egrep --color=auto'
fi

# some more ls aliases
#alias ll='ls -l'
#alias la='ls -A'
#alias l='ls -CF'

# Alias definitions.
# You may want to put all your additions into a separate file like
# ~/.bash_aliases, instead of adding them here directly.
# See /usr/share/doc/bash-doc/examples in the bash-doc package.

if [ -f ~/.bash_aliases ]; then
    . ~/.bash_aliases
fi

# enable programmable completion features (you don't need to enable
# this, if it's already enabled in /etc/bash.bashrc and /etc/profile
# sources /etc/bash.bashrc).
if ! shopt -oq posix; then
  if [ -f /usr/share/bash-completion/bash_completion ]; then
    . /usr/share/bash-completion/bash_completion
  elif [ -f /etc/bash_completion ]; then
    . /etc/bash_completion
  fi
fi

Zsh (the Mac default)

# Put the contents of this file in your ~/.zshrc file.
# Source: https://github.com/belak/zsh-utils?tab=readme-ov-file

[[ ! -d "$HOME/.antigen" ]] && git clone https://github.com/zsh-users/antigen.git "$HOME/.antigen"
source "$HOME/.antigen/antigen.zsh"

# Set the default plugin repo to be zsh-utils
antigen use belak/zsh-utils --branch=main

# Specify completions we want before the completion module
antigen bundle zsh-users/zsh-completions

# Specify plugins we want
antigen bundle editor@main
antigen bundle history@main
antigen bundle prompt@main
antigen bundle utility@main
antigen bundle completion@main

# Specify additional external plugins we want
antigen bundle zsh-users/zsh-syntax-highlighting

# Load everything
antigen apply

# Set any settings or overrides here
prompt belak
bindkey -e

Caution

Don’t blindly install things

Installing things via shell scripts should only be done from trusted sources!

The Unix way of thinking

Stepping back

  • Your computer stores files and runs commands.
  • The files are stored in a large hierarchy called a filesystem.
  • You issue instructions to run particular commands at a command line provided by a shell, which is how you, the user, talk to the operating system.
  • Unix commands and utilities generally try to do a specific thing to files or running processes.
  • The Unix conception of a ‘file’ is very flexible. Connections to other computers can act like files.
  • Unix commands are often composable using pipes.

The Unix pipe

  • Unix commands work with some input and may produce some output
  • Unix systems have the concepts of “standard input”, “standard output”, and “standard error” as streams where things come from, where they go to, and where problems are reported.
  • The idea of a sequence of commands or, more generally, functions that can be composed or pipelined in a smooth sequence is a very general and very powerful idea that we will soon see in action in R and that you may come across in many other settings as well.
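A sketch of the standard streams in action: > sends standard output to a file, 2> sends standard error to a file, and a pipe carries only standard output to the next command. The directory no-such-dir is deliberately made up so that ls fails:

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# Standard output goes to out.txt; standard error goes to err.txt
ls no-such-dir > out.txt 2> err.txt
echo "out.txt has $(wc -l < out.txt) lines; err.txt says: $(cat err.txt)"

# A pipe carries only standard output, so wc counts nothing here,
# while the error message still appears on the terminal
ls no-such-dir | wc -l
```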

The Unix pipe

  • The output of the ls command again:
ls
R
README.md
README.qmd
README_files
_extensions
_freeze
_quarto.yml
_site
_targets
_targets.R
_variables.yml
about
assets
assignment
avhrr
content
data
deploy.sh
example
files
html
index.html
index.qmd
mptc.Rproj
renv
renv.lock
renv.lock.orig
schedule
seas
site_libs
slides
staging
syllabus

The Unix pipe

We can send, or pipe, this output to another command, instead of to the terminal:

ls | wc -l
      33
  • The wc (“word count”) command counts the lines, words, and bytes in a file, or in whatever is sent to it via standard input (STDIN).
  • The -l switch tells wc to count only lines.
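A sketch comparing plain wc with wc -l, and then using wc -l with find to count matching files (the scratch directory and file names are hypothetical):

```shell
#!/usr/bin/env bash
# Plain wc prints three counts: lines, words, bytes (here 2, 3, and 14)
printf 'one two\nthree\n' | wc

# With -l, only the line count (here 2)
printf 'one two\nthree\n' | wc -l

# Counting files: find prints one path per line, and wc -l counts them
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir -p "$dir/slides" "$dir/content"
touch "$dir/slides/01-slides.qmd" "$dir/content/01-content.qmd" "$dir/index.qmd"
find "$dir" -name "*.qmd" | wc -l   # 3
```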

The Unix pipe

As with pipelines in R, we can compose sequences of actions at the prompt:

 ls -lh access.log
-rw-r--r-- 1 root root 7.0M Aug 29 16:00 access.log
 head access.log
192.195.49.31 - - [27/Aug/2023:00:01:11 +0000] "GET / HTTP/1.1" 200 19219 "https://www.google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.54"
192.195.49.31 - - [27/Aug/2023:00:01:12 +0000] "GET /libs/tufte-css-2015.12.29/tufte.css HTTP/1.1" 200 2025 "https://socviz.co/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.54"
192.195.49.31 - - [27/Aug/2023:00:01:12 +0000] "GET /libs/tufte-css-2015.12.29/envisioned.css HTTP/1.1" 200 888 "https://socviz.co/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.54"
192.195.49.31 - - [27/Aug/2023:00:01:12 +0000] "GET /css/tablesaw-stackonly.css HTTP/1.1" 200 1640 "https://socviz.co/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.54"
192.195.49.31 - - [27/Aug/2023:00:01:12 +0000] "GET /css/nudge.css HTTP/1.1" 200 1675 "https://socviz.co/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.54"
192.195.49.31 - - [27/Aug/2023:00:01:12 +0000] "GET /css/sourcesans.css HTTP/1.1" 200 1492 "https://socviz.co/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.54"
192.195.49.31 - - [27/Aug/2023:00:01:13 +0000] "GET /js/jquery.js HTTP/1.1" 200 30464 "https://socviz.co/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.54"
192.195.49.31 - - [27/Aug/2023:00:01:13 +0000] "GET /js/tablesaw-stackonly.js HTTP/1.1" 200 2996 "https://socviz.co/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.54"
192.195.49.31 - - [27/Aug/2023:00:01:13 +0000] "GET /js/nudge.min.js HTTP/1.1" 200 937 "https://socviz.co/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36 Edg/116.0.1938.54"
52.13.187.67 - - [27/Aug/2023:00:01:13 +0000] "GET /dataviz-pdfl_files/figure-html4/ch-03-fig-lexp-gdp-10-1.png HTTP/1.1" 200 308830 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0"

The Unix pipe

As with pipelines in R, we can compose sequences of actions at the prompt:

 head access.log | awk '// {print $11}'

"https://www.google.com/"
"https://socviz.co/"
"https://socviz.co/"
"https://socviz.co/"
"https://socviz.co/"
"https://socviz.co/"
"https://socviz.co/"
"https://socviz.co/"
"https://socviz.co/"
"-"

The Unix pipe

As with pipelines in R, we can compose sequences of actions at the prompt:

 awk '// {print $11}' access.log | sort | uniq -c | sort -nr | head -n 15

   9729 "https://socviz.co/lookatdata.html"
   4851 "-"
   4212 "https://socviz.co/"
   1719 "https://socviz.co/makeplot.html"
   1477 "https://bookdown.org/"
   1466 "https://socviz.co/gettingstarted.html"
   1373 "https://socviz.co/groupfacettx.html"
    864 "https://socviz.co/workgeoms.html"
    794 "https://socviz.co/maps.html"
    733 "https://socviz.co/refineplots.html"
    671 "https://socviz.co/index.html"
    349 "https://socviz.co/appendix.html"
    228 "https://socviz.co/modeling.html"
    153 "https://www.google.com/"
     50 "http://vissoc.co/"
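The pipeline reads one stage at a time. awk splits each line on whitespace by default, and in this log format field 11 is the referring URL; the empty pattern // matches every line, so '{print $11}' does the same job. A sketch run on four lines copied from the log excerpt above (user-agent strings shortened), with a comment on each stage:

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Four lines from the log excerpt above, user-agent strings shortened
cat > "$dir/sample.log" <<'EOF'
192.195.49.31 - - [27/Aug/2023:00:01:11 +0000] "GET / HTTP/1.1" 200 19219 "https://www.google.com/" "Mozilla/5.0"
192.195.49.31 - - [27/Aug/2023:00:01:12 +0000] "GET /libs/tufte-css-2015.12.29/tufte.css HTTP/1.1" 200 2025 "https://socviz.co/" "Mozilla/5.0"
192.195.49.31 - - [27/Aug/2023:00:01:12 +0000] "GET /css/nudge.css HTTP/1.1" 200 1675 "https://socviz.co/" "Mozilla/5.0"
52.13.187.67 - - [27/Aug/2023:00:01:13 +0000] "GET /dataviz-pdfl_files/figure-html4/ch-03-fig-lexp-gdp-10-1.png HTTP/1.1" 200 308830 "-" "Mozilla/5.0"
EOF

awk '{print $11}' "$dir/sample.log" |  # 1. keep field 11, the referrer
  sort |                                # 2. put identical referrers next to each other
  uniq -c |                             # 3. collapse adjacent duplicates, prefixing a count
  sort -nr |                            # 4. order by that count, largest first
  head -n 15                            # 5. keep at most the top 15 rows
```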

The Unix pipe

We can do a lot with a pipeline:

# gpaste is GNU paste (installed on a Mac via Homebrew's coreutils); on Linux, use paste
curl -s 'http://api.citybik.es/v2/networks/citi-bike-nyc' |
  jq '.network.stations[].free_bikes' |
  gpaste -sd+ |
  bc
32517

This is the number of Citi Bikes available in New York City at the time these slides were made.
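The last two stages of that pipeline turn a column of numbers into a total. paste -sd+ (spelled gpaste above) joins all its input lines into one line separated by +, and bc evaluates the resulting arithmetic expression. A sketch with three made-up numbers standing in for the per-station counts; the trailing - tells paste to read standard input, which the BSD paste on macOS expects. If bc isn't installed, awk can do the sum instead:

```shell
#!/usr/bin/env bash
# Three made-up counts, one per line
counts=$'3\n5\n7'

# Join the lines with "+": prints 3+5+7
printf '%s\n' "$counts" | paste -sd+ -

# Hand that expression to bc to evaluate it: prints 15
printf '%s\n' "$counts" | paste -sd+ - | bc

# An alternative without bc: let awk accumulate the total
printf '%s\n' "$counts" | awk '{ total += $1 } END { print total }'
```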

We usually won’t use the Unix command line or shell to do things like this. We’ll do it in R. You could also do it in other languages. But basic shell competence remains extremely handy for many more common tasks.

Shell Scripting

Shell Scripts

  • If you find yourself doing the same task repeatedly, think about whether it makes sense to write a script
  • Shell scripts can become mini-programs, but can also be just one or two lines that pull together a few commands
  • They really show their strength when there’s some fiddly thing you want to do to a lot of files or directories

Shell Scripts

#!/usr/bin/env bash

echo "Hello World!"
  • #! or “shebang” line saying where the interpreter is
  • chmod 755 script.sh or chmod +x script.sh to make executable
  • The interpreter doesn’t have to be the shell: other languages can be scripted too
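A sketch of the whole cycle in a scratch directory: write the script to a file, make it executable, and run it. The ./ prefix is needed because the current directory is usually not on your PATH:

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# Write the script from above into a file
cat > hello-world.sh <<'EOF'
#!/usr/bin/env bash

echo "Hello World!"
EOF

chmod +x hello-world.sh   # add execute permission
./hello-world.sh          # prints: Hello World!
```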

Shell Scripts

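#!/usr/bin/env bash

# Make a thumbnail for each PNG in the current directory
# (uses ImageMagick's convert)
shopt -s nullglob # If there are no PNGs, skip the loop instead of using the literal "*.png"

for i in *.png; do

  # Don't make thumbnails of thumbnails if the script is run again
  [[ "$i" == *-thumb.png ]] && continue

  FILENAME=$(basename -- "$i") # Full filename
  EXTENSION="${FILENAME##*.}" # Extension only
  FILENAME="${FILENAME%.*}" # Filename without extension

  # Resize to 500 pixels wide, keeping the aspect ratio
  convert "$i" -thumbnail 500 "$FILENAME-thumb.$EXTENSION"

done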
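The script splits each file name with two Bash parameter expansions: ## removes the longest match of a pattern from the front of a value, and % removes the shortest match from the end. The single and doubled forms only differ when a name has several dots. A sketch (archive.tar.gz is a hypothetical name):

```shell
#!/usr/bin/env bash
f="files/01_bryant_hard_drive.png"

name=$(basename -- "$f")   # 01_bryant_hard_drive.png
echo "${name##*.}"         # png
echo "${name%.*}"          # 01_bryant_hard_drive

# With several dots, the single and doubled forms give different results
g="archive.tar.gz"
echo "${g#*.}"             # tar.gz       (shortest match removed from the front)
echo "${g##*.}"            # gz           (longest match removed from the front)
echo "${g%.*}"             # archive.tar  (shortest match removed from the end)
echo "${g%%.*}"            # archive      (longest match removed from the end)
```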

Shell Scripts

  • The shell can talk to the clipboard (on macOS, via pbcopy and pbpaste):
echo I am sending this sentence to the clipboard | pbcopy
  • Back from the clipboard:
pbpaste | wc -c
      44

In an era of Generative AI and LLMs, why are we covering this stuff?

Because Unix is still everywhere

“Why am I doing this?”

  • As soon as you try to do anything of any sort of technical complexity, or just simple reproducibility, with your computer—even using the newest and coolest tools—I promise you’ll eventually find yourself in a world governed by the metaphors and methods Unix originated, and, very likely, in a literal Unix-derived environment.

  • That is, you will be in some sort of folder-based hierarchy; you will edit plain-text files in order to configure, launch, generate, or capture the output of applications; and you will do this by way of instructions written down as a series of commands that follow some sort of regular syntax. The details of those instructions (and the particular conventions they use) will vary depending on the task at hand. But in essence you will always be doing the same thing.