Unix command line 101: How much do you know?

Arguments and options on the half-shell

Summary
Arguments and options are those mysterious little nuggets (options preceded by minus signs, file names, and other Unix arcana) that appear on a command line after the command to be executed. This month Mo unveils their power, but first he breaks down the Unix command line itself. (2,600 words)

A Unix command line is a sequence of characters in the syntax of the target shell language. Of the characters found there, some are known as metacharacters, which have a special meaning to the shell. The metacharacters in the Korn shell are:

; Separates multiple commands on a command line
& Runs the preceding command asynchronously, in the background, so the shell goes on to the next command without waiting for it to finish
() Launches commands enclosed in parentheses in a separate shell
| Pipes the output of the command to the left of the pipe to the input of the command on the right of the pipe
> Redirects output to a file or device
< Redirects input from a file or device
Newline Ends a command or set of commands
Space Separator between command words
Tab Separator between command words

(Note: Some of these metacharacters can be used in combinations, such as || and &&. Consult your manual for a complete description.)
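Here is a small sketch of several of these metacharacters at work, including the && and || combinations. It is written for a POSIX-style shell (the Korn shell accepts the same syntax) and assumes the mktemp utility, which most current systems provide; the file it creates is a throwaway temporary.

```shell
#!/bin/sh
# Each section exercises one metacharacter from the list above.

# ; separates two commands on one line.
echo one; echo two

# & starts sleep in the background; the echo runs at once, and wait
# then pauses until the background job finishes.
sleep 1 & echo "printed while sleep is still running"
wait

# ( ) runs commands in a subshell, so the cd does not change this shell.
start=$(pwd)
(cd / && pwd)
echo "back in $(pwd)"

# | sends printf's output into sort; > writes a file and < reads it back.
tmp=$(mktemp)
printf 'b\na\n' | sort > "$tmp"
first=$(head -n 1 < "$tmp")
echo "first sorted line: $first"
rm -f "$tmp"

# && runs the next command only on success; || only on failure.
true && echo "ran after success"
false || echo "ran after failure"
```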

With these metacharacters in mind, you can define a command line word: a sequence of characters separated from other words by one or more unquoted metacharacters. In the following example, the passwd file is piped through the cut program, which outputs fields 1 and 3 using a colon as the field delimiter.

In this command line,

cat /etc/passwd|cut -d ":" -f 1,3 >usruid.txt

the words are:

cat
/etc/passwd
cut
-d
":"
-f
1,3
usruid.txt

Note that the metacharacters |, >, and space do not appear among the words. The metacharacters &, |, (), ;, and newline also serve to separate or terminate the individual commands within a command line.
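One way to see the words for yourself is a tiny helper that prints each argument it receives in brackets. The function name show_args is made up for illustration. Notice that the shell strips the quotes before the program sees the word, so cut actually receives a bare colon, and that quoted or escaped metacharacters stay inside a single word.

```shell
#!/bin/sh
# show_args (hypothetical helper): print each argument in brackets,
# one per line, to show where the shell split the command line into words.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# The words of the cut command from the example (pipe and redirection omitted).
show_args cut -d ":" -f 1,3
# [cut]
# [-d]
# [:]
# [-f]
# [1,3]

# Quoted or backslash-escaped metacharacters stay inside a single word.
show_args "a b" 'c;d' e\|f
# [a b]
# [c;d]
# [e|f]
```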

In our example command line, there are two commands separated by the pipe (|) symbol:

cat /etc/passwd

and

cut -d ":" -f 1,3

The final portion of the command line -- >usruid.txt -- could be thought of as the command "and output the result to usruid.txt," although redirection is not usually considered part of a command.
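The same two-command pipeline can be tried on sample data. This sketch uses a made-up passwd.sample file in a scratch directory (created with mktemp -d, assumed to be available) rather than the real /etc/passwd, whose contents vary from system to system.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Hypothetical sample data in /etc/passwd format:
# name:password:uid:gid:comment:home:shell
cat > passwd.sample <<'EOF'
root:x:0:0:root:/root:/bin/sh
mo:x:1001:100:Mo:/home/mo:/bin/ksh
EOF

# Same shape as the article's command line: two commands joined by |,
# with the output of the last one redirected into usruid.txt.
cat passwd.sample|cut -d ":" -f 1,3 >usruid.txt

result=$(cat usruid.txt)
printf '%s\n' "$result"
# root:0
# mo:1001

cd / && rm -rf "$dir"
```

cut keeps the delimiter between the fields it selects, which is why each output line reads name:uid.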

When a command executes a Unix program, utility, or shell script, it's usual for the command to include arguments. In the example above, the argument to cat is /etc/passwd. The arguments to cut are -d, ":", -f, and 1,3.

In general, arguments are all the words (note the definition of word above) that follow an executable program name in a command. Arguments within a command are separated from one another by spaces or tabs (metacharacters). Most Unix programs follow a standard arrangement of arguments and options.

Options are the letters or numbers that follow a minus sign. A simple example of arguments and options would be the use of the cat command:

cat -v -e -t doodah.txt

In the above example, the arguments are -v, -e, -t, and doodah.txt, and of those, -v, -e, and -t are options. The -v option asks cat to display nonprinting characters in a visible form (Ctrl-A appears as ^A, for example); the -e option displays the end of each line as $; and the -t option displays a tab as ^I instead of expanding it into spaces on the screen.
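To see what the three options do, the sketch below writes a one-line doodah.txt containing a tab and a Ctrl-A character (octal 001) and runs the article's cat command on it. The file contents and the scratch directory are illustrative only.

```shell
#!/bin/sh
dir=$(mktemp -d)
# doodah.txt holds one line with a tab and a Ctrl-A (octal 001) in it.
printf 'a\tb\001\n' > "$dir/doodah.txt"

# -v shows the Ctrl-A as ^A, -t shows the tab as ^I,
# and -e marks the end of the line with $.
shown=$(cat -v -e -t "$dir/doodah.txt")
printf '%s\n' "$shown"    # prints a^Ib^A$
rm -rf "$dir"
```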

Unfortunately, no standard terminology has been developed to distinguish an option from a nonoption argument, which is all the more confusing because an option can itself have an argument. In the first example using cut, the -d option has an option argument of ":", and the -f option has an option argument of 1,3. To clear this up, various manuals have adopted naming conventions for the parts of a Unix command. The following examples illustrate the parts:

cat -v -e -t doodah.txt

cat /etc/passwd|cut -d ":" -f 1,3 >usruid.txt

The program name itself (cat in the first example, cat and cut in the second) is variously called the name, progname, executable, or program-name. The nonoption arguments to a command (doodah.txt in the first example and /etc/passwd in the second) are called operands or cmdargs.

The options -v, -e, and -t in the first example, and -d and -f in the second, are called options, opts, or switches. The arguments to options, ":" and 1,3 in the second example, are called option-arguments or optargs.

The standards used in creating Unix executables are:

1. Command names must be between two and nine characters long
2. Command names must include only lowercase letters and digits
3. Option names (options above) must be only one character long
4. All options must be preceded by -
5. Options with no arguments may be grouped after a single - (e.g., -v -e -t could also be written as -vet)
6. The first option-argument following an option must be preceded by white space
7. Option-arguments cannot be optional
8. Groups of option-arguments following an option must be separated either by commas or by white space and quoted (e.g., -f 1,3 or -o "xxx z yy")
9. All options must precede operands on the command line
10. -- may be used to indicate the end of the options
11. The order of the options relative to one another should not matter
12. The relative order of the operands (cmdargs) may affect their significance in ways determined by the command with which they appear
13. A - preceded and followed by white space should be used only to mean standard input
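Three of these rules are easy to try with cat: grouping options after a single -, using -- to end the options, and using a lone - for standard input. The file deliberately named -n is a contrived example; -n is also a cat option on common systems, so without -- cat would read it as an option rather than as a file name.

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'x\ty\n' > doodah.txt

# Options without arguments may be grouped: -vet means -v -e -t.
separate=$(cat -v -e -t doodah.txt)
grouped=$(cat -vet doodah.txt)
[ "$separate" = "$grouped" ] && echo "grouped options give the same output"

# -- ends the options, so the file literally named -n is an operand.
echo "I am a file called -n" > ./-n
dashfile=$(cat -- -n)
echo "$dashfile"

# A lone - stands for standard input.
stdin_text=$(echo "from stdin" | cat -)
echo "$stdin_text"

cd / && rm -rf "$dir"
```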

Not all Unix commands follow these rules, although newer ones generally do. Older executables were written before the standard was established, but they are in such regular use that it was decided not to change them. For example, cut works whether or not you follow rule number six, which requires white space before an option-argument. Both of the following commands will work on most systems:

cat /etc/passwd|cut -d ":" -f 1,3 >usruid.txt
cat /etc/passwd|cut -d: -f1,3 >usruid.txt
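A quick way to confirm that the two spellings are equivalent is to feed both the same record. The sample line below is made up, in /etc/passwd format.

```shell
#!/bin/sh
# Hypothetical sample record in /etc/passwd format.
line='mo:x:1001:100:Mo:/home/mo:/bin/ksh'

# Rule six followed: white space between each option and its argument.
with_space=$(printf '%s\n' "$line" | cut -d ":" -f 1,3)
# Rule six ignored: each option-argument runs straight into its option.
no_space=$(printf '%s\n' "$line" | cut -d: -f1,3)

echo "$with_space"    # mo:1001
echo "$no_space"      # mo:1001
```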

The find command is another example of an older program still in everyday use. It uses options longer than a single character, which violates rule number three, and it allows options to appear after the operand, which violates rule number nine. In the following example, dot (.) is the operand, -name and -print are options, and data.txt is the option-argument for -name.

find . -name data.txt -print
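The sketch below builds a small throwaway directory tree and runs the same find command in it. (POSIX documentation calls -name and -print "primaries" rather than options.) The directory layout is hypothetical.

```shell
#!/bin/sh
# Build a throwaway tree: one data.txt two levels down, plus a decoy file.
dir=$(mktemp -d)
mkdir -p "$dir/reports/old"
: > "$dir/reports/old/data.txt"
: > "$dir/notes.txt"

cd "$dir" || exit 1
# . is the operand; -name takes data.txt as its argument; -print prints matches.
found=$(find . -name data.txt -print)
echo "$found"         # ./reports/old/data.txt

cd / && rm -rf "$dir"
```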

The getopts function
You're probably wondering what all this blather is leading up to. Well, Unix provides a handy tool for separating options and option-arguments from operands: getopts, a command built into the shell. You call getopts with a string that lists the valid option characters and the name of a shell variable that receives each result. Each time getopts is called, it steps forward through the command's arguments and picks up the next option, storing any option-argument in the variable OPTARG and the index of the next argument to be processed in OPTIND.

To illustrate this, imagine a shell script that will archive a file by copying it to an archive directory. The default directory is /u/arch, but the path of the archive directory can be changed on the command line. The archive program will also stop and ask you if it is about to overwrite an earlier archive, but an option can be set to overwrite without warning. A sample command line for this archive program would be:

arch [-r] [-a /new/archive/path] filename

In this example, the -r option will automatically replace an existing archive file without warning, though the default is to warn. The -a option is followed by an alternative archive directory to use instead of /u/arch. Finally, filename is the name of the file to archive.

The following is a listing for arch that covers the processing of the option arguments. It does not include the logic for doing the actual archive operation. Below the complete listing is a step-by-step analysis of how the program works.

#! /bin/sh
#:--------------------------------------------------------------
usage () {
 echo "Usage:"
 echo "arch - archives a file to /u/arch directory"
 echo "syntax:"
 echo "    arch [-r] [-a /new/archive/path] filename"
 echo "where"
 echo "    -r will automatically replace an existing archive file"
 echo "       (default is to warn)"
 echo "    -a specifies an alternative archive directory"
 echo "    filename is the name of the file to archive"
 exit 1
}
#:--------------------------------------------------------------

replace="w"
arch="/u/arch"
filename=""

optstr=":ra:"

while getopts $optstr opt
do

    case $opt in
        r) replace="r";;
        a) arch=$OPTARG;;
        *) usage;;
    esac
done

shift `expr $OPTIND - 1`

filename=$1

echo "Archiving $filename to $arch with $replace replace option"

# rest of the code goes here

The getopts function does not always work correctly with the Korn shell, so line 1 forces the script to run in the Bourne shell. Lines 2 through 15 define a usage function, framed by comment lines, that describes what the script does; it is called whenever the user makes a mistake.

 1   #! /bin/sh
 2   #:--------------------------------------------------------------
 3   usage () {
 4    echo "Usage:"
 5    echo "arch - archives a file to /u/arch directory"
 6    echo "syntax:"
 7    echo "    arch [-r] [-a /new/archive/path] filename"
 8    echo "where"
 9    echo "    -r will automatically replace an existing archive file"
10    echo "       (default is to warn)"
11    echo "    -a specifies an alternative archive directory"
12    echo "    filename is the name of the file to archive"
13    exit 1
14   }
15   #:--------------------------------------------------------------

Lines 17 and 18 set up shell variables that hold the default values: replace is set to w, which makes a warning before replacement the default behavior, and arch holds the default archive directory. Line 19 sets up an empty variable for the name of the file to be archived:

17   replace="w"
18   arch="/u/arch"
19   filename=""

This program has two possible options: -r and -a. The -a option requires an option-argument that names the directory to use. An option string lists the single-character identifiers to be used as options, in this case ra. In addition, if an option must be followed by an option-argument, a colon should immediately follow that option's letter, giving ra:. Finally, getopts prints its own error message if an invalid option appears on the command line. To suppress that message, start the option string with a colon, giving ":ra:". That string is set up at line 21 of the script.

21   optstr=":ra:"
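The effect of the leading colon shows up in a short sketch. With ":ra:", getopts stays quiet and reports problems through the variable instead: an unknown option sets it to ? with the offending letter in OPTARG, and a missing option-argument sets it to : with the option letter in OPTARG. The helper name parse is hypothetical.

```shell
#!/bin/sh
# parse (hypothetical helper): report what getopts returns for ":ra:".
parse() {
    OPTIND=1                          # reset before parsing a fresh list
    while getopts ":ra:" opt; do
        case $opt in
            r)  echo "r" ;;
            a)  echo "a=$OPTARG" ;;
            \?) echo "invalid: -$OPTARG" ;;            # unknown option letter
            :)  echo "missing argument: -$OPTARG" ;;   # -a with nothing after it
        esac
    done
}

parse -r -a /tmp/arch     # prints r, then a=/tmp/arch
parse -x                  # prints invalid: -x
parse -a                  # prints missing argument: -a
```

In the arch script both error cases fall through to the *) branch and call usage.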

Whenever getopts is called, it locates the next available option, retrieves its letter, and places it in the variable whose name it was passed. At line 23 that name, opt, is passed as the second argument to getopts, after $optstr. getopts returns true as long as it continues to find arguments that start with a leading -. When it finds -r on the command line, it places r in $opt; when it finds -a, it places a in $opt. Whenever getopts finds an option that expects an option-argument, it retrieves the argument and places it in the variable OPTARG. The loop at lines 23 through 31 processes all options and option-arguments by calling getopts repeatedly, and a case statement inside the loop handles each result.

If -r was encountered, r will appear in $opt and $replace will be set to r. If -a was encountered, a will appear in $opt, and the value in $OPTARG will be used to set $arch. Anything else means the user has entered an invalid option or left out the directory after -a. Either way, the usage function is called, which displays a usage message and exits the program.

23   while getopts $optstr opt
24   do
25
26       case $opt in
27           r) replace="r";;
28           a) arch=$OPTARG;;
29           *) usage;;
30       esac
31   done

The getopts function also retains one other variable, $OPTIND, which contains the index of the next argument to be processed. When the shell script is first started, $OPTIND is set to 1. If -r is processed as the first argument, $OPTIND will contain 2. If -a is processed as the second argument (and the name of an archive directory as the third argument), $OPTIND will contain 4. On the next call to getopts, getopts returns false, and the loop at lines 23 through 31 ends.

At this point, $OPTIND still contains the value 4, which can be used as the index of the next argument -- the first argument not beginning with a hyphen (-). This should be the name of the file to archive. At line 33, the shift command is used to shift all arguments by $OPTIND - 1; this causes the argument that was at position 4 ($4) to be shifted left three positions, making it argument $1. At line 35 this value is picked up and stored in $filename.

33   shift `expr $OPTIND - 1`
34   
35   filename=$1
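To watch OPTIND change, the following sketch replays the arch option loop inside a function and prints OPTIND after each option. The helper name trace and the sample arguments are hypothetical; OPTIND is set to 1 first because the shell does not automatically reset it between separate uses of getopts in the same script.

```shell
#!/bin/sh
# trace (hypothetical helper): run the arch getopts loop and report OPTIND.
trace() {
    OPTIND=1                          # start from the first argument
    while getopts ":ra:" opt; do
        echo "opt=$opt OPTIND=$OPTIND"
    done
    echo "loop ended with OPTIND=$OPTIND"
    shift `expr $OPTIND - 1`          # same shift as line 33 of arch
    echo "filename=$1"
}

trace -r -a /tmp/newarch report.txt
# opt=r OPTIND=2
# opt=a OPTIND=4
# loop ended with OPTIND=4
# filename=report.txt
```

With no options at all, OPTIND stays at 1, the shift moves nothing, and the file name is already in $1.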

At this point, a good script would do further error checking, such as making sure that the file named in $filename and the archive directory in $arch both exist. Here the script simply displays the extracted values:

37   echo "Archiving $filename to $arch with $replace replace option"
38   
39   # rest of the code goes here
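The error checking and archiving that the listing leaves out might look something like the following. This is only a sketch under assumptions: archive_file is a hypothetical function, the archive keeps the file's base name, and the overwrite warning is a simple y/n prompt read from standard input.

```shell
#!/bin/sh
# archive_file FILE DIR REPLACE   (hypothetical helper)
# Checks its inputs, then copies FILE into DIR. Unless REPLACE is "r",
# it asks before overwriting an earlier archive of the same name.
archive_file() {
    filename=$1 arch=$2 replace=$3
    if [ -z "$filename" ]; then
        echo "arch: no file named" >&2; return 1
    fi
    if [ ! -f "$filename" ]; then
        echo "arch: $filename: no such file" >&2; return 1
    fi
    if [ ! -d "$arch" ]; then
        echo "arch: $arch: no such directory" >&2; return 1
    fi
    target=$arch/$(basename "$filename")
    if [ -f "$target" ] && [ "$replace" != "r" ]; then
        printf '%s already exists; replace it? [y/n] ' "$target"
        read -r answer
        [ "$answer" = "y" ] || return 1
    fi
    cp "$filename" "$target"
}

# Demo in a scratch directory.
dir=$(mktemp -d)
mkdir "$dir/archive"
echo "draft" > "$dir/report.txt"
archive_file "$dir/report.txt" "$dir/archive" w     # no earlier copy, no prompt

echo "final" > "$dir/report.txt"
echo n | archive_file "$dir/report.txt" "$dir/archive" w || echo "(kept the old copy)"
kept=$(cat "$dir/archive/report.txt")               # still "draft"

archive_file "$dir/report.txt" "$dir/archive" r     # -r: replace without asking
archived=$(cat "$dir/archive/report.txt")           # now "final"
rm -rf "$dir"
```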

Using getopts is an excellent way to create scripts that comply with the Unix command standard. It also makes it fairly easy to add features to your scripts. For example, suppose you want to enhance your arch script to put a date and time stamp on an archive. Simply extend the $optstr variable to allow for a -d option, add a variable, and extend the case statement. Voilà! You've just added a -d option to the arch command. Of course, you still have to write the code that acts on $datestamp when it is set to Y, but the user interface is easily taken care of.

replace="w"
arch="/u/arch"
filename=""
datestamp="N"

optstr=":ra:d"

while getopts $optstr opt
do

    case $opt in
        r) replace="r";;
        a) arch=$OPTARG;;
        d) datestamp="Y";;
        *) usage;;
    esac
done
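Handling datestamp might be as simple as choosing a different name for the archived copy. The naming scheme here (a dot and a YYYYMMDDhhmmss stamp appended to the file name) and the helper archive_name are assumptions for illustration, not part of the original script.

```shell
#!/bin/sh
# archive_name PATH DATESTAMP   (hypothetical helper)
# Prints the name the archived copy would get; with DATESTAMP=Y it
# appends a dot and the current date and time as YYYYMMDDhhmmss.
archive_name() {
    base=$(basename "$1")
    if [ "$2" = "Y" ]; then
        echo "$base.$(date +%Y%m%d%H%M%S)"
    else
        echo "$base"
    fi
}

archive_name /home/mo/report.txt N    # prints report.txt
archive_name /home/mo/report.txt Y    # prints report.txt. followed by the current stamp
```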
