Assignment 03b: The Shell, part 2
Due by
There’s no need to submit this assignment. But do it, and be prepared to show your work in class next Monday.
We’re going to run through some exercises to practice using the shell. If you’re on Windows, I recommend using Git Bash, which you can most easily reach through the Terminal tab in RStudio.
Part 1: Get the example files
You can do this in one of two ways:
- Use a browser to download the zip file from https://kjhealy.co/mptc/mptc_text_examples.zip and unzip it into a folder on your computer.
- Use `curl` from the command line to download the zip file directly into a folder on your computer, then use `unzip` to unzip it. For example:
## First
curl -o mptc_text_examples.zip https://kjhealy.co/mptc/mptc_text_examples.zip
## Then
unzip mptc_text_examples.zip
Next, make sure you are in the `mptc_text_examples` directory. In a Terminal window, you can use `cd` to change directories and `ls` to list the files in the current directory. The output of `ls` should look something like this:
alice_in_wonderland.txt countries.csv pride_and_prejudice.txt
alice_noboiler.txt country_iso3.tsv README.md
apple_mobility_daily_2021-04-12.csv country_tab.csv roman.txt
ascii_table.xlsx country_tab.tsv sentences.txt
bashrc.txt country-intermediate.tsv shalott_1832.txt
basics.txt country-working.tsv shalott_1842.txt
basicsduped.txt fruit.txt specials.txt
continent_sizes.csv jabberwocky.txt ulysses.txt
continent_tab.csv make_example words.txt
continent_tab.tsv mortality.txt zshrc.txt
countries_iso3.csv mptc_text_examples.Rproj
Now, you can either stay in the terminal window or launch RStudio and use the Terminal tab there. Either way, you should be in the `mptc_text_examples` directory. On a Mac, you can launch RStudio by running `open mptc_text_examples.Rproj` from inside the examples folder; on any system, you can also double-click the `.Rproj` file.
Part 2: Shell exercise
Q1.
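Using `cat`, display the contents of `basics.txt` in the terminal.
- A. How many lines are in the file? (Hint: use `wc -l`.)
- B. How many words are in the file? (Hint: use `wc -w`.)
- C. How many characters are in the file? (Hint: use `wc -c`. Strictly speaking, this counts bytes, which is the same as the number of characters for plain ASCII text; `wc -m` counts characters.)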
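To see the three counts side by side without touching the course files, here is a minimal sketch. The `sample` function is a made-up stand-in for `cat basics.txt`; on the real file you would pipe `cat basics.txt` into `wc` instead.

```shell
# Hypothetical stand-in for `cat basics.txt`: two short made-up lines
sample() { printf 'one two three\nfour five\n'; }

sample              # display the text, as cat would
sample | wc -l      # count lines (2 for this sample)
sample | wc -w      # count words (5 for this sample)
sample | wc -c      # count bytes (24 for this sample)
```

Running `wc` with no option prints the line, word, and byte counts together, and giving it a file name directly (for example `wc basics.txt`) also prints the name after the counts.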
Q2.
Using `cat`, display the contents of `basicsduped.txt` in the terminal.
- A. How many lines are in `basicsduped.txt`?
- B. I want to know how many unique lines are in `basicsduped.txt`. There is a command that seems like it should do this: `uniq`. Try it out. What happens? Did it work?
- C. Read the help file for `uniq` (try `man uniq`). What does it say about how `uniq` works? Why didn’t it work the way you might have expected?
- D. Having read the help file, how can you use `uniq` to get the number of unique lines in `basicsduped.txt`?
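The behavior that part C asks about can be seen on a tiny made-up input. In this sketch, a hypothetical `sample` function stands in for `cat basicsduped.txt`; notice which repeated line `uniq` removes and which it leaves alone.

```shell
# Hypothetical stand-in for `cat basicsduped.txt`: "apple" appears three
# times, but only two of those copies sit next to each other
sample() { printf 'apple\nbanana\napple\napple\n'; }

sample | wc -l          # 4 lines in total
sample | uniq           # apple, banana, apple: only the adjacent pair collapses
sample | uniq | wc -l   # 3, not the 2 distinct values you might expect
```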
Q3.
- A. The `tr` command translates or transforms characters according to some substitution or deletion rule. For example, you can use it to convert all uppercase letters to lowercase letters. Try it out by running `cat basics.txt | tr '[:upper:]' '[:lower:]'`.
- B. `tr` can also replace characters with other characters. For example, you can convert all commas to underscores with `cat basics.txt | tr ',' '_'`. Try it out.
- C. `tr` can also delete characters. For example, you can delete all commas with `cat basics.txt | tr -d ','`. Try it out.
- D. Some characters are special and have to be written as “escape sequences” that begin with a backslash. For example, the newline character is written as `\n`. Try using `tr` to convert all commas in `basics.txt` to newlines with
cat basics.txt | tr ',' '\n'
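Here is a minimal sketch of the four `tr` operations above, run on a made-up line of text rather than `basics.txt` so the output is easy to predict. The `sample` function is hypothetical.

```shell
# Hypothetical stand-in for `cat basics.txt`
sample() { printf 'Red, Green, Blue\n'; }

sample | tr '[:upper:]' '[:lower:]'   # red, green, blue
sample | tr ',' '_'                   # Red_ Green_ Blue
sample | tr -d ','                    # Red Green Blue
sample | tr ',' '\n'                  # three lines: "Red", " Green", " Blue"
```

The quotes around `'[:upper:]'` and `'\n'` matter: without them, the shell may treat the brackets as a filename pattern and will strip the backslash, so `tr` would receive a plain `n` instead of the newline escape.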
Besides `[:upper:]` and `[:lower:]`, `tr` recognizes other classes of characters. For example, `[:space:]` represents all whitespace characters, such as spaces, tabs, and newlines. Try using `tr` to convert all whitespace characters in `basics.txt` to newlines with
cat basics.txt | tr '[:space:]' '\n'
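One thing worth noticing when you run that command: each whitespace character becomes its own newline, so two spaces in a row (or a space just before the end of a line) produce an empty line. A small made-up example shows this; the `sample` function is hypothetical.

```shell
# Hypothetical stand-in for `cat basics.txt`; note the two spaces after the comma
sample() { printf 'Hello,  world\n'; }

sample | tr '[:space:]' '\n'                  # "Hello,", then an empty line, then "world"
sample | tr '[:space:]' '\n' | wc -l          # 3 lines in total
sample | tr '[:space:]' '\n' | grep -c '^$'   # 1 of them is blank
```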
Q4.
- A. Now, using `cat`, `tr`, and the pipe, put together a sequence of commands that displays the contents of `basics.txt` with each word on its own line. Make sure that the output contains no punctuation and no uppercase letters.
- B. To remove any blank lines, add `| grep -v '^$'` to the end of your sequence of piped commands. (The `-v` option tells `grep` to return all lines except those that match the pattern, and `^$` is a regular expression that matches blank lines. We will learn more about regular expressions later in the course.)
- C. Finally, count the number of unique words in `basics.txt`.
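As a minimal sketch of what `grep -v '^$'` does on its own, here it is applied to made-up input that contains a blank line (the `sample` function is hypothetical). It is the same filter part B asks you to add to the end of your pipeline.

```shell
# Hypothetical input: "red", a blank line, then "blue"
sample() { printf 'red\n\nblue\n'; }

sample | grep -c '^$'     # 1: only the blank line matches ^$
sample | grep -v '^$'     # red, blue: -v keeps every line that does NOT match
```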