Assignment 03b: The Shell, part 2
Due by
There’s no need to submit this assignment. But do it, and be prepared to show your work in class next Monday.
We’re going to run through some exercises to practice using the shell. If you’re on Windows, I recommend using Git Bash, most easily accessed as the Terminal tab from RStudio.
Part 1: Get the example files
You can do this in one of two ways:
Use a browser to download the zip file from this link and unzip it into a folder on your computer.
Use curl from the command line to download the zip file directly into a folder on your computer, then use unzip to unzip it. For example:
## First
curl -o mptc_text_examples.zip https://kjhealy.co/mptc/mptc_text_examples.zip
## Then
unzip mptc_text_examples.zip

Next, make sure you are in the mptc_text_examples directory. In a Terminal window, you can use cd to change directories and ls to list the files in the current directory. The output of ls should look something like this:
alice_in_wonderland.txt              countries.csv             pride_and_prejudice.txt
alice_noboiler.txt                   country_iso3.tsv          README.md
apple_mobility_daily_2021-04-12.csv  country_tab.csv           roman.txt
ascii_table.xlsx                     country_tab.tsv           sentences.txt
bashrc.txt                           country-intermediate.tsv  shalott_1832.txt
basics.txt                           country-working.tsv       shalott_1842.txt
basicsduped.txt                      fruit.txt                 specials.txt
continent_sizes.csv                  jabberwocky.txt           ulysses.txt
continent_tab.csv                    make_example              words.txt
continent_tab.tsv                    mortality.txt             zshrc.txt
countries_iso3.csv                   mptc_text_examples.Rproj

Now, you can either stay in the terminal window or launch RStudio and use the Terminal tab there. Either way, you should be in the mptc_text_examples directory. To launch RStudio, run open mptc_text_examples.Rproj from inside the examples folder.
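If you are unsure where you are, pwd prints the current directory, which complements cd and ls. Here is a minimal sketch of all three, run in a throwaway directory so that nothing in your real folders is touched. The names scratch, shell_practice, a.txt, and b.txt are made up for illustration and are not part of the example files.

```shell
# Work in a temporary location (scratch is a hypothetical variable name).
scratch=$(mktemp -d)
cd "$scratch"

# Make a practice directory containing two empty files.
mkdir shell_practice
touch shell_practice/a.txt shell_practice/b.txt

# Move into it, print where we are, and list what is there.
cd shell_practice
pwd
listing=$(ls)
echo "$listing"
```

The same pattern applies to the examples folder: cd into it, then use pwd and ls to confirm that you are in the right place.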
Part 2: Shell exercise
Q1.
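Using cat, display the contents of basics.txt to the terminal.
- A. How many lines are in the file? (Hint: use wc -l.)
- B. How many words are in the file? (Hint: use wc -w.)
- C. How many characters are in the file? (Hint: use wc -c. Strictly speaking, wc -c counts bytes, which is the same as the number of characters for a plain ASCII file; wc -m counts characters directly.)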
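To see what the three wc options report before trying them on basics.txt, here is a small sketch on a throwaway file. The file contents and the variable names are invented for illustration. Some versions of wc pad their output with leading spaces, so the exact formatting you see may differ.

```shell
# Create a small temporary file with three lines of made-up text.
sample=$(mktemp)
printf 'one two\nthree four five\nsix\n' > "$sample"

# Reading from standard input (<) makes wc print only the count,
# without the file name after it.
n_lines=$(wc -l < "$sample")
n_words=$(wc -w < "$sample")
n_bytes=$(wc -c < "$sample")

echo "lines: $n_lines"
echo "words: $n_words"
echo "bytes: $n_bytes"
```

The sample text has 3 lines, 6 words, and 28 bytes (each newline counts as one byte).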
Q2.
Using cat, display the contents of basicsduped.txt to the terminal.
- A. How many lines are in basicsduped.txt?
- B. I want to know how many unique lines are in basicsduped.txt. There is a command that seems like it should do this: uniq. Try it out. What happens? Did it work?
- C. Read the help file for uniq (try man uniq). What does it say about how uniq works? Why didn’t it work the way you might have expected?
- D. Having read the help file, how can you use uniq to get the number of unique lines in basicsduped.txt?
Q3.
- A. The tr command translates or transforms characters according to some substitution or deletion rule. For example, you can use it to convert all uppercase letters to lowercase letters. Try it out by running
  cat basics.txt | tr '[:upper:]' '[:lower:]'
- B. tr can also be used to replace characters with other characters. For example, you can convert all commas to underscores with
  cat basics.txt | tr ',' '_'
  Try it out.
- C. tr can be used to delete characters. For example, you can delete all commas with
  cat basics.txt | tr -d ','
  Try it out.
- D. Some characters are special and need to be “escaped” with a backslash. For example, the newline character is represented by \n. Try using tr to convert all commas in basics.txt to newlines with
  cat basics.txt | tr ',' '\n'
  Besides [:upper:] and [:lower:], tr also has some additional classes of characters that can be used. For example, [:space:] represents all whitespace characters (spaces, tabs, newlines). Try using tr to convert all whitespace characters in basics.txt to newlines with
  cat basics.txt | tr '[:space:]' '\n'
Q4.
- A. Now, using cat, tr, and the pipe, put together a sequence of actions that displays the contents of basics.txt with each word on its own line. Make sure that the output contains no punctuation and no uppercase letters.
- B. To remove any blank lines, add | grep -v '^$' to the end of your sequence of piped commands. (The -v option tells grep to return all lines except those that match the pattern, and ^$ is a regular expression that matches blank lines. We will learn more about regular expressions later in the course.)
- C. Finally, count the number of unique words in basics.txt.
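The individual tr and grep -v steps described above can also be tried on short strings supplied with printf, which makes it easy to see exactly what each one changes. This is a minimal sketch only. The input strings and variable names are made up for illustration and are not taken from basics.txt.

```shell
# Translate: uppercase letters to lowercase.
lowered=$(printf 'Hello, World\n' | tr '[:upper:]' '[:lower:]')
echo "$lowered"      # hello, world

# Delete: remove every comma.
no_commas=$(printf 'a,b,c\n' | tr -d ',')
echo "$no_commas"    # abc

# Translate with an escaped character: each comma becomes a newline,
# so a, b, and c end up on separate lines.
split=$(printf 'a,b,c\n' | tr ',' '\n')
echo "$split"

# Filter: -v inverts the match, and ^$ matches a line with nothing
# between its start (^) and its end ($), so blank lines are dropped.
kept=$(printf 'first\n\nsecond\n' | grep -v '^$')
echo "$kept"
```

Each step reads from standard input and writes to standard output, which is what lets you chain them together with the pipe.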