some BASH topics

7 minute read

Regular expressions(REGEX) are sets of characters and/or metacharacters that match patterns —- REGEX intro. I created this article thanks to the doc. They are usefull for data manipulation, especially in Excel and Python. You can create a REGEX for any particular pattern expression of a text to find it.

**Quick links:

abcdefgh Hey doctor

CTRL + f in a word document and you can find easily abc* . Now, if you want to find fgh ..? This is what REGEX is built to handle : specific patterns at any place in any convoluted forms.

Online references:

Regular Expressions

Video tutorial

####Escapes: characters that have special meanning, to be escaped

.[{()\^$|?*+

Match Pattern

. - any character except new line
\d - Digit (0-9)
\D - Not a Digit (0-9)
\w - word Character (a-z, A-Z, 0-9, _)
\W - Not work character
\s - white spaces
\S - not white space

# anchors, don't match any characters
# match invisible positions
\b - Word Boundary
\B - Not word Boundary
^  - beginning of a string
$ - end of string

# character set
[...]  # match any one character in set
-  # specify range when used between number/letters
[^]  # not in the set 
|    # either or 
( )  # Group
 
# quantifier
* - match 0 or more
+ - match 1 or more
? - match 0 or One
{3} - match exact number
{3,4} - match a range of numbers (Minimum, Maximum)

lookaround: lookahead and lookbehind

Lookaround is an assertion (like line start or end point). It matches with characters and only returns match or no match. It does not consume characters in the string.

Basic syntax:

  • lookahead:
    • positive : (?=(regex))
    • different from : (?!(regex))
  • lookbehind:
    • positive: (?<=(regex))
    • different from : (?<!(regex))

Example:

// match "book" with "here" after it
book(?=.*here)

// match "page" with "book" before it
(?<=book.*)page

More Examples:

  • word boundary: \bHe, # He eHe
  • character set: a[fi]c, afc, aic
  • dash for range: [a-z0-9A-Z]
  • not in set: [^1-3]
  • quantifier examples
    • \d{3}: 123
    • Mr\.?\s[A-Z]\w*: Mr. Zeng, Mr Zeng
  • Group examples:
    • (Mr|Mrs)\.?\s\w+: Mr. Zeng, Mrs Zeng …

File Globbing

File Globbing and REGEX can be confusing. REGEX is used in functions for matching text in files, while globbing is used by shells to match file/directory names using wildcards.

Wildcards (some in REGEX may also apply):

  • *: match any string
  • {} is often used to extend list, eg:
    ls {a*,b*} lists files starting with either a or b.
  • []: same as in REGEX

Bash Arrays

  • Arrays can be constructed using round brackets:
    var=(item0 item1 item2) or
    var=($(ls -d ./))
  • To access items or change item values, we can use var[index]. Eg:
    var[index]=new_value
    echo ${var[index]}
    Note that when var is an array, the name var actually only refers to var[0]. To refer to the whole array, need to use var[@] or var[*].
  • sub-array expansion:
    • ${var[*]:s_ind} gives the subarray starting from index s_ind.
    • ${var[@]:s_ind:l} gives you the length l sub-array starting at index s_ind.
    • Can also replace @ with *.

vim

  • In normal mode: all keys are functional keys. Examples are: -p: paste
    • yy: copy current row to clip board
    • dd: copy row to clip board and delete
    • u (ctrl+R): undo (redo) changes
    • hjkl: left, down, right, up
    • :help <command>: get help on a command — vim open the command txt file
    • :wq or :x: w for save; q for quit
    • :q!: quit without saving
  • Insertion:
    • o: insert a new row after current row
    • O: insert a new row before current row
    • a: insert after cursor
    • i: insert at cursor
  • Cursor movement:
    • 0, :0: beginning of row, page
    • $, :$: end of row, page
    • ^: to first non-blank character
    • /pattern: search for pattern (press n to go to next)
    • H,M,L: move cursor to top, middle and bottom of page
    • Ctrl + E,Y: scroll up, down
    • Ctrl + u,d: half page up, down
    • w,W,e,E,b,B: jump cursor by words
  • tabs:
    • :tabedit file, :tabfind file: open new tab
    • gt, gT: next, previous tab
    • :tabonly: close all other tabs
    • :tabnew: open empty new tab
    • can use abreviations, such as :tabe, :tabf, …
    • :Explorer: explore folder with vim
  • string substitution:
    • %s/pattern/replacement/g: replace all occurrences
    • s/pattern/replacement/g: replace in current line
    • flags:
      • g for global
      • c for confirmation
      • i for case-insensitive
  • Visual Mode
    • type v to enter visual mode
    • move cursor to select text
    • y: copy
  • Others: -:syntax on/off : turn on/off text-highlighting colorscheme -:Explore . or :e .: explore current folder

find

General syntax: find path -name **** -mtime +1 -newer 20160621 -size +23M ... We will introduce each of above parameters and some more in this section:

  • find ./ -name "*.txt" : searching by name
  • find ./ -type d -name "*LZ*": specify target type, d for directory, f for file.

  • find ./ -newerct 20130323 (or a file) : file created ct after the date (also could be a file). can also use newer just for modified time

  • find ./ -mtime (-ctime, -atime) +n :
    • m for modified time
    • c for creation time
    • a for access time
    • +n for greater than n days, similarly -n for within n days. Can also change measures
    • can also use amin, cmin, mmin for minutes
  • find ./ -name "PowerGod*" -maxdepth 3:
    set maximum searching depth in this directory; similarly use mindepth to set minimum searching depth

  • -iname : to ignore case

  • piping results found: -exec cp {} ~/LZfolder/ \;: this command will copy the finded files to path ~/LZfolder/
    • finded file will be placed in the position of {} and execute the command

grep

grep is used for searching lines in a file with certain pattern strings. General formula: grep pattern filename
There are rich parameters you can specify:

  • grep abc$ file: match the end of a string

  • grep ^F file: match the beginning of string

  • grep -w over file: grep for words. In this example, words such as overdue, moreover would be skipped.

  • -A3: also show 3 lines after the lines found

  • -B3: show 3 lines before found lines

  • -C3: show 3 lines before and after

  • logical grep:

    • OR grep: grep pattern1|pattern2 filename
    • AND grep: grep pattern1.*pattern2 filename
    • NOT grep: grep -v pattern filename
      where -v stands for invert match

sed

sed is short for Stream EDitor General formula: sed 's/RegEx/replacement/g' file which will do the work of replacing RegEx with replacement.

  • the separator / could be replaced by something like _, |
    • eg: sed 's | age | year | ' file, and would still work.
  • simple back referencing, eg:

      $echo what | sed 's/wha/&&&/'  # input
      whawhawhat # output
    
  • more on back referencing, eg:

      echo 2014-04-01 | sed 's/\(....\)-\(..\)-\(..\)/\1+	\2+\3/'
      2014+04+01
    

Things in \(...\) are referred. A dot $\cdot$ in Regex can signify any character. Useful to use dots to describe patterns.

  • you can also sed multiple patterns separated by ;, eg: sed s/pattern1/replace1/;s/pattern2/replace2/g < file

head, tail

Shell scripting

Debugging:

  • bash (or sh) -v script.sh : displays each command as the program proceeds

  • bash (or sh) -x script.sh : displays values of variables as program runs

Others

Boolean value

You can try : false; echo $? The output is 1, which means in bash shell:
1 for false
0 for true

Different parenthesis and brackets

See Parenthesis difference.

  • Double parenthesis (arithmetic operator) :
    • (( expr )) : enables the usage of things like <, >, <= etc.
    • echo $(( 5 <= 3 )), and we get 0
    • arithmetic operator interprets 1 as true, and 0 as false, which is different from the test command

Braces

  • Used for parameter expansion. Can create lists which are often used in loops, eg:
$ echo {00..8..2} 
00 02 04 06 08 

Single and double square brackets

Much of below is from bash brackets, and bash test functions.

  • [ expression ] is the same as test expression. eg:
    test -e "$HOME" same as [ -e "$HOME" ]
    and both of them requires careful handling of escaping characters.

  • use -a, -o or ||, && for group testing. eg:

# the following are the same
$ test -e "file1" -a -d "file2"
$ test -e "file1" && test -d "file2"
$ [ -e "file1" ] && [ -d "file2" ]
$ [ -e "file1" -a "file2" ]

Note that [ expr1 ] -a [ expr2 ], [ expr1 && expr2 ] results in error.

  • [[ expression ]] allows you to use more natural syntax for file and string comparisons. If you want to compare number, it’s more common to use double brackets (( )).
    eg. [[ -e "file1" && -e "file2" ]].
    [[ ]] doesn’t support -a, -o inside.

Quotes

Things inside the same quote are considered as one variable.

  • Single quotes: preserves whatever inside
  • Double quotes: do not preserve words involving $ or \ and etc.

    See Quotes difference for more.

Environment Variables

  • $PS1: controls shell prompt
  • $PATH: when shell receives non-builtin command, it goes into $PATH to look for it.
  • $HOME: home directory

Easy command substitute

Say my previous command is vim project.txt. Now I want to open this file instead of using vim. Then I can simply input:

$ !vim:s/vim/open   

where $ is the shell prompt. Basically this is performing sed on whatever the results are from !vim.

Redirection

Bash shell has 3 basic streams: input(0), output(1), and error(2). We can use #number> to redirect them to somewhere else, eg:

  • Input redirection:
    • < or 0<
    • command << EOF, and then manually input argument file, using EOF to end inputting (or use ctrl + D). << is here document symbol.
    • command <<< string : it’s here string symbol. Can input a one row string argument.
  • output redirection:
    • > or 1>: redirect output
    • 2>: redirect error log
    • 2>1& : direct stderr to stdout stream, copy where stdout goes. And 1>2& means vice versa. Here the >& is a syntax to pipe one stream to another.
    • &> filename: join stdout and stderr in one stream, and put in a file.
# input redirection
$ ./myprog < file1.txt
# output and err redirection
$ ./myprog arg1 > file.out
$ ./myprog arg1 2> file.err
$ ./myprog arg1 &> out_and_err
  • Want no output
    Use command > /dev/null 2>&1

~/.bash_profile, ~/.profile and ~/.bashrc

These are files where you can personalize commands to be executed upon shell login.

A bash shell would look for ~/.bash_profile first. If it does not exist, it executes ~/.profile.

When you start a shell in an existing session (such as screen), you get an interactive, non-login shell. That shell may read configurations in ~/.bashrc.

See discussions:
login, non-login
different startup files

Command substitution

If we want to use the output of command 1 in a sentence, we can do it in the following two ways:

... ...  `command 1` ... ... #  method 1   
... ... $(command 1)  ... ... # method 2   

Say courses is a symbolic link I created. If I cd this link, and then print working directory:

$ cd courses 
$ pwd
/Users/lizeng/paths/courses

It’s showing the symbolic path, not the absolute path. To get the absolute path, we can resolve the link through:

$ pwd -P 
/Users/lizeng/Google Drive/Yale/courses

# same also works for many other commands
$ cd courses; cd ..; pwd
/Users/lizeng/paths 
$ cd courses; cd -P ..;pwd 
/Users/lizeng/Google Drive/Yale

The -P here stands for physical (directory)

Credits

  • Li Zeng documentation

Leave a Comment