The Advent of CLI - Day 7

Of course I’m late for this advent and a ton of other things
(gimme a break, December/January is my busiest time of the year)

Today, let’s add some command-line options to our Bash scripts
and so make our little programs more useful :)


Parsing Command-line Options in Bash

For this we are going to use getopts, which is a Bash builtin command,
and not getopt (without the s at the end), which is an external system tool.

getopts helps your program parse short options made of a single letter or digit,
like -a, -B, -3, etc.

It follows this syntax

getopts optstring optname [ arg ]

By default it processes the positional parameters of the caller (eg. the script itself, or the function it runs in),
which are available in $@.

for ex:
$ ./myscript -a arg1 -B arg2 -3

The var $@ will hold the words -a, arg1, -B, arg2 and -3, and you use getopts to parse that list.

Here is how it works

  • every time you run getopts
  • it looks for one of the options defined in optstring
    and moves the index $OPTIND to the next argument to process
    • if it finds the option letter or digit
      it then places it into the variable optname
      • if the option is expecting an argument
        getopts gets the argument and places it in $OPTARG
      • if an expected argument is not found
        it sets the variable optname to "?" (question mark) and prints an error,
        or, in silent mode (optstring starting with :), to ":" (colon)
        with the option letter in $OPTARG
    • if it does not find the option letter
      (eg. it does not match any letter defined in optstring)
      it sets the variable optname to "?" (question mark)
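
To make those steps visible, here is a small sketch that feeds a fixed list of arguments to getopts and prints opt, OPTARG and OPTIND after each call. The function name trace_getopts and the sample arguments are made up for the illustration.

```bash
#!/bin/bash

# hypothetical helper: print what getopts sets on each call
trace_getopts() {
    # local copies, so the trace does not disturb the caller's parsing state
    local OPTIND=1 OPTARG opt
    while getopts "a:B:3" opt; do
        echo "opt=${opt} OPTARG=${OPTARG:-<none>} OPTIND=${OPTIND}"
    done
    # getopts returned non-zero: no more options to read
    echo "stopped with OPTIND=${OPTIND}"
}

trace_getopts -a arg1 -B arg2 -3 foobar
```

When the loop stops, OPTIND points at foobar, the first argument that is not an option, which is why scripts usually follow the loop with shift $((OPTIND-1)).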

A few special cases

  • you cannot use : or ? as options in optstring
    those are reserved characters
  • there is a special option -- (two dashes)
    that is interpreted by getopts as the end of all options
  • by default getopts uses verbose error checking:
    it prints a message if it finds an unknown option or if an option argument is missing.
    If you start the optstring with : then you set
    the “silent error checking mode”: getopts prints nothing itself
    and leaves the reporting to your script (through "?", ":" and $OPTARG)
  • you can manually pass custom arguments with [ arg ]
    eg. getopts optstring optname [ arg ]
    you could define an array myargs=(arg1 arg2 arg3)
    and do getopts optstring optname "${myargs[@]}"
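
As an illustration of the silent mode and of the -- marker, here is a minimal sketch. The function parse_silently is hypothetical, and it gets its arguments passed explicitly so it is easy to try with different inputs.

```bash
#!/bin/bash

# hypothetical demo: the optstring starts with ":" so getopts stays silent
parse_silently() {
    local OPTIND=1 OPTARG opt
    while getopts ":a:B:3d" opt; do
        case "$opt" in
            a|B)
                echo "option ${opt} with value=${OPTARG}"
                ;;
            3|d)
                echo "option ${opt}"
                ;;
            :)
                # missing argument: OPTARG holds the option letter
                echo "option -${OPTARG} requires an argument" >&2
                return 1
                ;;
            '?')
                # unknown option: OPTARG holds the offending letter
                echo "unknown option -${OPTARG}" >&2
                return 1
                ;;
        esac
    done
}

parse_silently -a hello -3
parse_silently -a hello -- -3                  # "--" ends the options, -3 is left alone
parse_silently -c || echo "exit status: $?"
parse_silently -a || echo "exit status: $?"
```

The messages are now entirely yours, so you can word them the way you want or print the usage instead.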

getopts is meant to run multiple times in a loop,
so it can process command-line options one by one.

Here is a little example that we will name aa-bash-opt1

#!/bin/bash

while getopts "a:B:3d" opt; do
    case "$opt" in
        a)
            echo "option a with value=${OPTARG}"
            ;;
        B)
            echo "option B with value=${OPTARG}"
            ;;
        3)
            echo "option 3"
            ;;
        d)
            echo "option d"
            ;;
        '?')
            echo "unknown option" >&2
            exit 1
            ;;
    esac
done

shift $((OPTIND-1));

MYARG="$1"

if [[ ! -z "$MYARG" ]]; then
  echo "positional parameter 1 = ${MYARG}"
fi

Here are some examples

$ ./aa-bash-opt1

(nothing to display)

$ ./aa-bash-opt1 -c

./aa-bash-opt1: illegal option -- c
unknown option

$ ./aa-bash-opt1 -a

./aa-bash-opt1: option requires an argument -- a
unknown option

$ ./aa-bash-opt1 -a hello

option a with value=hello

$ ./aa-bash-opt1 -a hello -B world

option a with value=hello
option B with value=world

$ ./aa-bash-opt1 -a hello -Bworld

option a with value=hello
option B with value=world

Note:
you can stick the value right after the single letter eg. -Bworld

$ ./aa-bash-opt1 -a hello -Bworld -3 foobar

option a with value=hello
option B with value=world
option 3
positional parameter 1 = foobar

foobar is seen as a positional parameter, not as the value of the option -3

$ ./aa-bash-opt1 -a hello -Bworld -3

option a with value=hello
option B with value=world
option 3

Now it is a bit of a pain to have this loop sitting at the top of the script,
so you can neatly move it into a function.

This one we will name aa-bash-opt2

#!/bin/bash

show_help() {
cat << EOF
Usage: ${0##*/} [-a arg1] [-B arg2] [-3] [-d] myarg
Do something

options:
  -a arg1       Option 'a' with argument
  -B arg2       Option 'B' with argument
  -3            Option '3' without argument
  -d            Option 'd' without argument
  myarg         An arg not associated to any options

EOF
}

# options
MYARG=""

cli_options() {
    local OPTIND;
    while getopts "a:B:3d" opt; do
        case "$opt" in
            a)
                echo "option a with value=${OPTARG}"
                ;;
            B)
                echo "option B with value=${OPTARG}"
                ;;
            3)
                echo "option 3"
                ;;
            d)
                echo "option d"
                ;;
            '?')
                echo "unknown option" >&2
                show_help >&2
                exit 1
                ;;
        esac
    done
    shift $((OPTIND-1));

    MYARG="$1";
}

# --- main ---

# check cli options
cli_options "${@}";

if [[ ! -z "$MYARG" ]]; then
  echo "positional parameter 1 = ${MYARG}"
fi

A couple of examples

$ ./aa-bash-opt2 foobar

positional parameter 1 = foobar

$ ./aa-bash-opt2 -f

./aa-bash-opt2: illegal option -- f
unknown option
Usage: aa-bash-opt2 [-a arg1] [-B arg2] [-3] [-d] myarg
Do something

options:
  -a arg1       Option 'a' with argument
  -B arg2       Option 'B' with argument
  -3            Option '3' without argument
  -d            Option 'd' without argument
  myarg         An arg not associated to any options


Practical Example with Github API

Let’s write a little script that lists the Github download counts of a project's releases.

Here is the script, which I named aa-github-downloads

#!/bin/bash
# Show Github downloads

function show_help() {
cat << EOF
Usage: ${0##*/} [-h] [-D] [-V vendor] [-A] project
Show Github downloads

options:
  -h           Show this help
  -D           Debug mode
  -V vendor    The Github vendor
  -A           Show all tags
  project      The Github project

EOF
}

function split_with() {
    local string="$1";
    local sep="$2";

    SAVE_IFS=$IFS
    IFS=$sep
    result=($string);
    IFS=$SAVE_IFS

    echo ${result[@]};
}

function get_web_content() {
    local url="$1"
    local content=$(curl -s -L "$url" 2>&1)
    echo "$content"
}

function check_tool() {
    if (command -v "$1" >/dev/null 2>&1); then
        return 0
    else
        echo "You need to install '$1'"
        exit 1
    fi
}


# options
GITHUB_API_URL="https://api.github.com/repos/"
GITHUB_PROJECT=""
GITHUB_VENDOR=""
DEBUG=false
SHOW_ALL_TAG=false

function cli_options() {
    local OPTIND;
    while getopts "hDV:A" opt; do
        case "$opt" in
            h)
                show_help
                exit 0
                ;;
            D)
                # debug
                DEBUG=true
                ;;
            V)
                # vendor
                GITHUB_VENDOR="${OPTARG}"
                ;;
            A)
                # show all tags
                SHOW_ALL_TAG=true
                ;;
            '?')
                show_help >&2
                exit 1
                ;;
        esac
    done
    shift $((OPTIND-1));

    GITHUB_PROJECT="$1";
}

# --- main ---

# check cli options
cli_options "${@}";

# check project
if [[ -z "$GITHUB_PROJECT" ]]; then
    echo "The Github repository project is missing"
    echo "you have to provide a project argument"
    echo "eg. $ ./${0##*/} project"
    exit 1
fi

# check vendor
if [[ -z "$GITHUB_VENDOR" ]]; then
    echo "The Github repository vendor is missing"
    echo "you have to provide a vendor using the option -V"
    echo "eg. $ ./${0##*/} -V vendor project"
    exit 1
fi

GITHUB_URL="${GITHUB_API_URL}${GITHUB_VENDOR}/${GITHUB_PROJECT}/releases"

# debug output
if $DEBUG; then
    echo "debug:"
    echo "GITHUB_VENDOR=$GITHUB_VENDOR"
    echo "GITHUB_PROJECT=$GITHUB_PROJECT"
    echo "GITHUB_URL=$GITHUB_URL"
fi

# check curl is installed
check_tool curl
# check jq is installed
check_tool jq

# fetch data from Github API
GITHUB_DATA="$(get_web_content "${GITHUB_URL}")"
SIMULATE_ERROR_1=false
SIMULATE_ERROR_2=false

if $SIMULATE_ERROR_1; then
    # simulate Github API error
    GITHUB_DATA=$(cat << EOF
{
  "message": "API rate limit exceeded for 127.0.0.1. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)",
  "documentation_url": "https://developer.github.com/v3/#rate-limiting"
}
EOF
)
fi

if $SIMULATE_ERROR_2; then
    # simulate Github API error
    GITHUB_DATA=$(cat << EOF
{
  "message": "Not Found",
  "documentation_url": "https://developer.github.com/v3/repos/releases/#list-releases-for-a-repository"
}
EOF
)
fi

# debug output
if $DEBUG; then
    echo "Github data:"
    echo "--------"
    echo -e "$GITHUB_DATA"
    echo "--------"
fi

# test for { "message": "..." }
FOUND_MESSAGE="$(echo -e "$GITHUB_DATA" | jq -r '.message' 2> /dev/null)"

if [[ ! -z "${FOUND_MESSAGE}" ]]; then
    ERROR_MESSAGE="$(echo -e "$GITHUB_DATA" | jq -r '.message')"
    echo "Github API Error:"
    echo "${ERROR_MESSAGE}"
    exit 1
fi

PROJECT_TAGS=($(echo "$GITHUB_DATA" | jq -r .[].tag_name))
TOTAL_TAGS=${#PROJECT_TAGS[@]}

if $SHOW_ALL_TAG; then
    MAX_TAGS=${TOTAL_TAGS}
else
    MAX_TAGS=1
fi

if $DEBUG; then
    echo "PROJECT_TAGS"
    echo "${PROJECT_TAGS[@]}"
    echo "TOTAL_TAGS = $TOTAL_TAGS"
    echo "  MAX_TAGS = $MAX_TAGS"
fi

echo "${GITHUB_VENDOR}/${GITHUB_PROJECT}"

if (($TOTAL_TAGS == 0)); then
    echo "No tags found"
    exit 0
fi

for (( i=0; i<${MAX_TAGS}; i+=1 )); do
    TAG=${PROJECT_TAGS[i]};
    echo "   ${TAG}"
    TAG_TOTAL_ASSETS=$(echo "$GITHUB_DATA" | jq -r ".[$i].assets | length")
    #echo "TAG_TOTAL_ASSETS = $TAG_TOTAL_ASSETS"
    for (( j=0; j<${TAG_TOTAL_ASSETS}; j+=1 )); do
        ASSET_NAME=$(echo "$GITHUB_DATA" | jq -r ".[$i].assets[$j].name")
        ASSET_DOWNLOAD_COUNT=$(echo "$GITHUB_DATA" | jq -r ".[$i].assets[$j].download_count")
        printf "     |_ %3dx\t%s\n" "${ASSET_DOWNLOAD_COUNT}" "${ASSET_NAME}"
    done
done

and here are some examples

$ ./aa-github-downloads -h

Usage: aa-github-downloads [-h] [-D] [-V vendor] [-A] project
Show Github downloads

options:
  -h           Show this help
  -D           Debug mode
  -V vendor    The Github vendor
  -A           Show all tags
  project      The Github project

$ ./aa-github-downloads -V Corsaair as3shebang

Corsaair/as3shebang
   1.0.0
     |_ 618x    as3shebang_1.0.0_amd64.deb
     |_  27x    as3shebang_1.0.0_darwin-amd64.deb
     |_   3x    as3shebang_1.0.0_darwin-i386.deb
     |_   3x    as3shebang_1.0.0_i386.deb
     |_   6x    as3shebang_1.0.0_win32.deb
     |_  20x    as3shebang_1.0.0_win64.deb
     |_  10x    redtamarin-setup.bat

$ ./aa-github-downloads -A -V Corsaair as3shebang

Corsaair/as3shebang
   1.0.0
     |_ 618x    as3shebang_1.0.0_amd64.deb
     |_  27x    as3shebang_1.0.0_darwin-amd64.deb
     |_   3x    as3shebang_1.0.0_darwin-i386.deb
     |_   3x    as3shebang_1.0.0_i386.deb
     |_   6x    as3shebang_1.0.0_win32.deb
     |_  20x    as3shebang_1.0.0_win64.deb
     |_  10x    redtamarin-setup.bat
   v0.9-1
     |_  12x    as3shebang_0.9-1.pkg
     |_ 138x    as3shebang_0.9-1_amd64.deb

Let’s look at a few things in detail

function show_help() {
cat << EOF
Usage: ${0##*/} [-h] [-D] [-V vendor] [-A] project
Show Github downloads

options:
  -h           Show this help
  -D           Debug mode
  -V vendor    The Github vendor
  -A           Show all tags
  project      The Github project

EOF
}

Yes, you can use a heredoc in Bash. Here we use it to format the “usage” or “help” of the program,
but it can be used for many other things, like creating files from a template,
assigning different variables line by line, etc.

See Bash Heredoc and BASH Heredoc Tutorial

The basic format of a heredoc looks like this

[COMMAND] <<[-] 'DELIMITER'
  HERE-DOCUMENT
DELIMITER

and it works like this

  • The first line starts with an optional command followed by the special redirection operator << and the delimiting identifier.
    • You can use any string as a delimiting identifier, the most commonly used are EOF or END.
    • If the delimiting identifier is unquoted, the shell will substitute all variables, commands and special characters before passing the here-document lines to the command.
    • Appending a minus sign to the redirection operator <<-, will cause all leading tab characters to be ignored. This allows you to use indentation when writing here-documents in shell scripts. Only tabs are stripped this way, leading spaces are kept.
  • The here-document block can contain strings, variables, commands and any other type of input.
  • The last line ends with the delimiting identifier. White space in front of the delimiter is not allowed.
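
The difference between a quoted and an unquoted delimiter can be checked with a small sketch like this one (the variable NAME is just an example):

```bash
#!/bin/bash

NAME="world"

# unquoted delimiter: variables and $(( )) are expanded by the shell
unquoted=$(cat << EOF
hello $NAME, 1+1=$((1 + 1))
EOF
)

# quoted delimiter: the lines are passed through untouched
quoted=$(cat << 'EOF'
hello $NAME, 1+1=$((1 + 1))
EOF
)

echo "unquoted: ${unquoted}"
echo "quoted:   ${quoted}"
```

The first line comes out with the value of NAME and the result of the arithmetic, the second one keeps the text exactly as written.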

I will add that if you want variable substitution but also need a literal $ character, you can escape it with \

cat << EOF
    this $LOGIN will be replaced by its value
    this \$SOMEVAR will not
EOF

but be careful with this when the heredoc feeds commands to a remote host via SSH

Using Heredoc is one of the most convenient and easiest ways to execute
multiple commands on a remote system over SSH.

When using unquoted delimiter make sure you escape all variables, commands
and special characters otherwise they will be interpolated locally:

ssh -T user@host.com << EOF
echo "The current local working directory is: $PWD"
echo "The current remote working directory is: \$PWD"
EOF

Output

The current local working directory is: /home/linuxize
The current remote working directory is: /home/user
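
If you have no remote host at hand, the same behaviour can be observed locally. This is only a sketch: it assumes that bash -s, which reads its commands from stdin, is a fair stand-in for the remote shell (ssh itself is not involved), and the variable WHERE is made up for the demo.

```bash
#!/bin/bash

WHERE="outer shell"

# the outer shell expands the heredoc first, then the inner bash runs the result
result=$(bash -s << EOF
WHERE="inner shell"
echo "unescaped: $WHERE"
echo "escaped:   \$WHERE"
EOF
)

echo "$result"
```

The unescaped $WHERE is replaced before the inner shell even starts, so it shows the outer value; the escaped one reaches the inner shell as $WHERE and shows the inner value.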

Getting data from the web is almost dead simple

function get_web_content() {
    local url="$1"
    local content=$(curl -s -L "$url" 2>&1)
    echo "$content"
}

We simply use curl,
with the option -s to keep the output quiet (no progress meter, no error messages)
and the option -L to follow redirects,
and finally we redirect stderr to stdout using 2>&1
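
If you also want to know whether the download worked, a variant along these lines is possible. It is a sketch, not the function used in the script above: fetch_url is a hypothetical name, and it relies on the curl options -f (fail on HTTP errors) and -S (still show errors when -s is set).

```bash
#!/bin/bash

# hypothetical variant of get_web_content that reports failures
fetch_url() {
    local url="$1"
    local content
    # declared first, assigned after: "local content=$(...)" would hide curl's exit status
    # -f: fail on HTTP errors, -sS: quiet but still print errors, -L: follow redirects
    if ! content=$(curl -fsSL "$url"); then
        echo "could not fetch ${url}" >&2
        return 1
    fi
    printf '%s\n' "$content"
}
```

You would call it like DATA="$(fetch_url "$url")" || exit 1. There is a trade-off: with -f curl normally drops the body of an HTTP error response, so a script that wants to read the JSON error message sent by the API is better off without it.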

we could have used wget too

function get_web_page() {
    local url="$1"
    local content=$(wget "$url" -q -O -)
    echo "$content"
}

but I find it less convenient for some very specific cases

The point here is to show how you can save the content of a web page, or of an API endpoint, into a variable, so that instead of calling the same endpoint multiple times you just reuse the content of the variable.

For example, with the Github API you might hit the "API rate limit exceeded" error
if you call the API too many times in too short a period.


Now, if your script depends on other tools that are not necessarily installed on the system, you should check for them. Here is one way to do that

function check_tool() {
    if (command -v "$1" >/dev/null 2>&1); then
        return 0
    else
        echo "You need to install '$1'"
        exit 1
    fi
}

we use it to test for curl and jq

# check curl is installed
check_tool curl
# check jq is installed
check_tool jq

if one of those tools is not installed you will get a little message and the program will exit

You need to install 'curl'
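
A possible variation, if you would rather list every missing tool in one go instead of stopping at the first one. The name check_tools is made up, and it returns a status instead of calling exit so the caller decides what to do.

```bash
#!/bin/bash

# hypothetical helper: report every missing tool, return 1 if any is missing
check_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "You need to install '${tool}'" >&2
            missing=1
        fi
    done
    return "$missing"
}

check_tools bash cat && echo "all tools found"
```

In the script above that would be check_tools curl jq || exit 1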

We reuse our cli_options function from the main entry point

# --- main ---

# check cli options
cli_options "${@}";

As you can see, we declare the options before the function, and we can use true and false as booleans (they are really the commands true and false, which is why a test like if $DEBUG works)

# options
GITHUB_API_URL="https://api.github.com/repos/"
GITHUB_PROJECT=""
GITHUB_VENDOR=""
DEBUG=false
SHOW_ALL_TAG=false

function cli_options() {
    local OPTIND;
    while getopts "hDV:A" opt; do
        case "$opt" in
            h)
                show_help
                exit 0
                ;;
            D)
                # debug
                DEBUG=true
                ;;
            V)
                # vendor
                GITHUB_VENDOR="${OPTARG}"
                ;;
            A)
                # show all tags
                SHOW_ALL_TAG=true
                ;;
            '?')
                show_help >&2
                exit 1
                ;;
        esac
    done
    shift $((OPTIND-1));

    GITHUB_PROJECT="$1";
}
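
About those booleans, here is a short sketch of what happens behind the test, plus a stricter form based on string comparison. The variables first and second only exist for the demo.

```bash
#!/bin/bash

DEBUG=false

# "if $DEBUG" expands to "if false": the command false runs and returns 1
if $DEBUG; then
    first="on"
else
    first="off"
fi
echo "debug is ${first}"

# stricter form: compare strings, so the value is never executed as a command
DEBUG=true
if [[ "$DEBUG" == true ]]; then
    second="on"
else
    second="off"
fi
echo "debug is ${second}"
```

The first form is shorter; the second one is a bit safer when the value can come from outside the script.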

You will also see how I used some variables to simulate errors and so test the script

# fetch data from Github API
GITHUB_DATA="$(get_web_content "${GITHUB_URL}")"
SIMULATE_ERROR_1=false
SIMULATE_ERROR_2=false

if $SIMULATE_ERROR_1; then
    # simulate Github API error
    GITHUB_DATA=$(cat << EOF
{
  "message": "API rate limit exceeded for 127.0.0.1. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)",
  "documentation_url": "https://developer.github.com/v3/#rate-limiting"
}
EOF
)
fi

if $SIMULATE_ERROR_2; then
    # simulate Github API error
    GITHUB_DATA=$(cat << EOF
{
  "message": "Not Found",
  "documentation_url": "https://developer.github.com/v3/repos/releases/#list-releases-for-a-repository"
}
EOF
)
fi

In the gritty details, here is why we save the web data into a variable.

In short, we get some fragments of JSON by doing $ echo data | jq "some options"

jq is very powerful but requires you to pick which data you want to keep,
so here are all the different jq calls

FOUND_MESSAGE="$(echo -e "$GITHUB_DATA" | jq -r '.message' 2> /dev/null)"
ERROR_MESSAGE="$(echo -e "$GITHUB_DATA" | jq -r '.message')"
PROJECT_TAGS=($(echo "$GITHUB_DATA" | jq -r .[].tag_name))
TAG_TOTAL_ASSETS=$(echo "$GITHUB_DATA" | jq -r ".[$i].assets | length")
ASSET_NAME=$(echo "$GITHUB_DATA" | jq -r ".[$i].assets[$j].name")
ASSET_DOWNLOAD_COUNT=$(echo "$GITHUB_DATA" | jq -r ".[$i].assets[$j].download_count")

That’s 6 calls (and the last three run inside loops). Imagine we had used $ curl -s -L "some url" | jq "some options" instead?

Yeah, that would have made at least 6 calls to the Github API instead of 1, and you really don’t want to do that.
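
The same idea can be pushed one step further on the jq side: a single jq pass can return the tag, the asset name and the download count together. This is a sketch and not what the script above does; SAMPLE_DATA is invented and only mimics the fields used here, and list_assets is a hypothetical name.

```bash
#!/bin/bash

# made-up sample, shaped like the releases data used above
SAMPLE_DATA='[
  {"tag_name": "1.0.0", "assets": [
    {"name": "tool_amd64.deb", "download_count": 618},
    {"name": "tool_i386.deb", "download_count": 3}
  ]},
  {"tag_name": "v0.9-1", "assets": []}
]'

# one jq pass: a tab-separated line "tag, name, count" per asset
list_assets() {
    jq -r '.[] | .tag_name as $tag | .assets[] | [$tag, .name, .download_count] | @tsv'
}

if command -v jq >/dev/null 2>&1; then
    echo "$SAMPLE_DATA" | list_assets |
        while IFS=$'\t' read -r tag name count; do
            printf "%s |_ %3dx\t%s\n" "$tag" "$count" "$name"
        done
else
    echo "You need to install 'jq'" >&2
fi
```

Note that a release without any asset produces no line at all with this filter, so the loops of the original script remain the better fit if you want to print every tag.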


In all this I did not cover parsing long options. It is possible:
check out this tutorial, Optimizing images with Bash script,

where you will also find many other useful functions like human_readable_filesize.
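
getopts itself only handles single-character options, so long options are usually parsed by hand. To give a rough idea, a while/case/shift loop can look like the sketch below. It is not taken from the tutorial mentioned above, and the option names --vendor and --all, like the function name parse_long_options, are made up.

```bash
#!/bin/bash

# hypothetical settings, only for the demo
VENDOR=""
SHOW_ALL=false
PROJECT=""

parse_long_options() {
    while (( $# > 0 )); do
        case "$1" in
            --vendor)
                # form "--vendor value": the value is the next argument
                if (( $# < 2 )); then
                    echo "--vendor requires an argument" >&2
                    return 1
                fi
                VENDOR="$2"
                shift 2
                ;;
            --vendor=*)
                # form "--vendor=value": keep what follows the first "="
                VENDOR="${1#*=}"
                shift
                ;;
            --all)
                SHOW_ALL=true
                shift
                ;;
            --)
                # end of options
                shift
                break
                ;;
            -*)
                echo "unknown option $1" >&2
                return 1
                ;;
            *)
                # first positional parameter: stop parsing
                break
                ;;
        esac
    done
    PROJECT="${1:-}"
}

parse_long_options --vendor Corsaair --all as3shebang
echo "vendor=${VENDOR} all=${SHOW_ALL} project=${PROJECT}"
```

It is more code than the getopts loop, and you have to handle the missing arguments yourself, which is the price for the nicer option names.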

All in all, we finally reached the point where Bash programming is really useful:
when you reuse external tools here and there and add your own command-line options
to finally shape your own command-line utility.

Should I stop these?