echo I am sending this sentence to the clipboard | pbcopy
Modern Plain Text Social Science: Week 3
September 18, 2023
#! or “shebang” line saying where the interpreter is
chmod 755 script.sh or chmod +x script.sh to make executable
Or, Waiter, there appears to be a language inside my language
☜ This book is a thing of beauty.
Pattern-matching functions in stringr have the same basic form: a string to search, then the pattern to look for.
If “.” matches any character, how do you specifically match the character “.”?
Escape it with a backslash: \.
But \ is also used as an escape symbol in strings. So to create the regular expression \. we need the string “\\.”
And how do you match a literal \?
Well that’s ugly
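A minimal sketch of the same idea at the command line, assuming grep with extended regular expressions is available. The sample strings are made up for illustration. In the shell, single quotes pass the backslash through untouched, so the regex \. is typed once here rather than doubled as it is inside an R string.

```shell
# Three made-up sample strings, one per line.
samples='a.c
abc
a-c'

# Unescaped dot: any single character may sit between a and c,
# so all three lines match.
any_char=$(printf '%s\n' "$samples" | grep -E 'a.c')

# Escaped dot: only a literal period matches.
literal_dot=$(printf '%s\n' "$samples" | grep -E 'a\.c')

# A literal backslash needs its own escape: the regex \\ matches one backslash.
literal_backslash=$(printf '%s\n' 'a\b' 'ab' | grep -E '\\')

printf 'unescaped:\n%s\n' "$any_char"
printf 'escaped:\n%s\n' "$literal_dot"
printf 'backslash:\n%s\n' "$literal_backslash"
```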
This is the price we pay for having to express searches for patterns using a language containing these same characters, which we may also want to search for.
I promise this will pay off!
^ to match the start of a string.
$ to match the end of a string.
Use ^ and $ together to match a complete string.
[1] │ <apple> pie
[2] │ <apple>
[3] │ <apple> cake
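The anchors can be tried out with grep as well. This is a sketch under the assumption that each string sits on its own line, so that ^ and $ refer to the start and end of that line. The three strings mirror the apple examples above.

```shell
desserts='apple pie
apple
apple cake'

# ^apple: every line starts with "apple", so the count is 3.
starts_with=$(printf '%s\n' "$desserts" | grep -cE '^apple')

# pie$: only the line that ends in "pie" matches.
ends_with=$(printf '%s\n' "$desserts" | grep -E 'pie$')

# ^apple$: anchored at both ends, so only the bare "apple" matches.
whole=$(printf '%s\n' "$desserts" | grep -E '^apple$')

printf 'starts with apple: %s lines\n' "$starts_with"
printf 'ends with pie: %s\n' "$ends_with"
printf 'exactly apple: %s\n' "$whole"
```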
\d matches any digit.
\s matches any whitespace (e.g. space, tab, newline).
[abc] matches a, b, or c.
[^abc] matches anything except a, b, or c.
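A rough command-line counterpart, with one caveat: the extended regular expressions used by grep -E do not define \d, so this sketch swaps in [0-9] for digits and the POSIX class [[:space:]] for whitespace. The sample lines are invented.

```shell
items='room 101
abc
xyz'

# [0-9] stands in for \d: lines containing at least one digit.
with_digit=$(printf '%s\n' "$items" | grep -E '[0-9]')

# [[:space:]] stands in for \s: lines containing whitespace.
with_space=$(printf '%s\n' "$items" | grep -E '[[:space:]]')

# [abc]: lines containing an a, b, or c.
with_abc=$(printf '%s\n' "$items" | grep -E '[abc]')

# [^abc]: lines containing at least one character that is NOT a, b, or c.
# "abc" drops out because every character in it is a, b, or c.
not_abc=$(printf '%s\n' "$items" | grep -E '[^abc]')

printf 'digit: %s\nspace: %s\nabc: %s\n' "$with_digit" "$with_space" "$with_abc"
printf 'not only abc:\n%s\n' "$not_abc"
```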
Look for a literal character that normally has special meaning in a regex:
Use parentheses to make the precedence of | clear:
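One way to see why the parentheses matter, sketched with grep -E on a made-up word list. Without parentheses the | splits the entire pattern in two; with them it only chooses between the two letters.

```shell
words='grey
gray
great
ray'

# gre|ay means "gre" OR "ay", so all four words match.
loose=$(printf '%s\n' "$words" | grep -E 'gre|ay')

# gr(e|a)y means "gr", then e or a, then "y": only grey and gray match.
tight=$(printf '%s\n' "$words" | grep -E 'gr(e|a)y')

printf 'without parentheses:\n%s\n' "$loose"
printf 'with parentheses:\n%s\n' "$tight"
```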
? is 0 or 1
+ is 1 or more
* is 0 or more
{n} is exactly n
{n,} is n or more
{,m} is at most m
{n,m} is between n and m
By default these are greedy matches. You can make them “lazy”, matching the shortest string possible, by putting a ? after them. This is often very useful!
[1] "apple" "apricot" "avocado"
[4] "banana" "bell pepper" "bilberry"
[7] "blackberry" "blackcurrant" "blood orange"
[10] "blueberry" "boysenberry" "breadfruit"
[13] "canary melon" "cantaloupe" "cherimoya"
[16] "cherry" "chili pepper" "clementine"
[19] "cloudberry" "coconut" "cranberry"
[22] "cucumber" "currant" "damson"
[25] "date" "dragonfruit" "durian"
[28] "eggplant" "elderberry" "feijoa"
[31] "fig" "goji berry" "gooseberry"
[34] "grape" "grapefruit" "guava"
[37] "honeydew" "huckleberry" "jackfruit"
[40] "jambul" "jujube" "kiwi fruit"
[43] "kumquat" "lemon" "lime"
[46] "loquat" "lychee" "mandarine"
[49] "mango" "mulberry" "nectarine"
[52] "nut" "olive" "orange"
[55] "pamelo" "papaya" "passionfruit"
[58] "peach" "pear" "persimmon"
[61] "physalis" "pineapple" "plum"
[64] "pomegranate" "pomelo" "purple mangosteen"
[67] "quince" "raisin" "rambutan"
[70] "raspberry" "redcurrant" "rock melon"
[73] "salal berry" "satsuma" "star fruit"
[76] "strawberry" "tamarillo" "tangerine"
[79] "ugli fruit" "watermelon"
Find all fruits that have a repeated pair of letters:
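As a command-line sketch of this search: the pattern (..)\1 captures any two characters and then requires the same two again. Backreferences are not part of the POSIX definition of extended regular expressions, but GNU and BSD grep both accept them with -E, and that is the assumption here. Only a handful of the fruits are used.

```shell
fruit='apple
banana
coconut
cucumber
pear'

# (..) captures any two characters; \1 demands that same pair immediately after.
# banana has "anan", coconut has "coco", cucumber has "cucu".
repeated=$(printf '%s\n' "$fruit" | grep -E '(..)\1')

printf '%s\n' "$repeated"
```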
Backreferences and grouping are very useful for string replacements.
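For instance, a replacement can reuse captured groups in a different order. The sketch below uses sed -E and an invented "Surname, Forename" string; \1 and \2 refer to the first and second parenthesized groups.

```shell
# Capture the surname and the forename, then emit them swapped.
swapped=$(printf '%s\n' 'Lovelace, Ada' | sed -E 's/^([A-Za-z]+), ([A-Za-z]+)$/\2 \1/')

printf '%s\n' "$swapped"
```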
stringr is Perl- or PCRE-like.
rg is quicker than grep and has some nice features.
sed, awk, and grep can all use some version of regular expressions.
option create backup files of everything it touches:# Find every Rmarkdown file beneath the current directory
# Then edit each one in place to replace every instance of
# `percent_format` with `label_percent`
find . -name "*.Rmd" | xargs perl -p -i.orig -e "s/percent_format/label_percent/g"
-i.orig
flag will back up e.g. analysis.Rmd
to analysis.Rmd.orig
One view of things