Command line psychology 101
How the shell program interprets commands
Summary
What actually goes on when you use the command line? This month we'll explain the nine steps, from history substitution to filename expansion, that your shell program goes through to interpret your command. (1,500 words)
One of the mysteries of Unix (aside from Unix itself) is the command line, filled as it is with strange and cryptic characters. Now let's see, do I need a dot between two backslashes, or should it be a backwards quote followed by a hyphen?
One thing that will help sort out what is actually happening in a command line, and maybe even help you construct one of your own, is an understanding of how the command line is interpreted.
The command line is actually the input to the shell program. The shell program (sh, ksh, csh, or any other variant) reads the input line and untangles it before it attempts to execute the command. The sequence of steps the program goes through to untangle a command provides interesting insight into shell programming. By studying it you're sure to learn some new tricks.
We will cover these pieces in more detail in a moment, but first let's take a look at the sequence of evaluation of a command line:
1. History substitution (except for the Bourne shell)
2. Splitting words, including special characters
3. Updating the history list (except for the Bourne shell)
4. Interpreting single and double quotes
5. Alias substitution (except for the Bourne shell)
6. Redirection of input and output (< > and |)
7. Variable substitution (variables starting with $)
8. Command substitution (commands inside back quotes)
9. File name expansion (file name wild cards)
Note that the Bourne shell skips the history and alias steps, because it supports neither feature.
History substitution
If you have history set up in the Korn shell (ksh), C shell (csh), or any similar shell, command lines are saved in a history file before they are executed. You can review your previous commands by typing:
$ history
Each command in the list is preceded by a number, as in:
13 ls *.txt
14 cd $HOME
15 ls *.log
In the Korn shell you can usually recover a command from the history list by typing r followed by its number. For example, typing r 13 in the example above would repeat the command ls *.txt .
In the C shell, use an exclamation point and no space instead of an r : !13 .
When processing a command line, the shell first checks for these history references, looks up each one in the history file, and builds a new command line with the recovered command in its place. There is much more to history than these simple steps, but we'll save that for a separate article.
Splitting words
The next step is to split the command line into words and special characters. A word is basically a token that the shell program recognizes as an element of a command. For example, the following command does a long listing of the current directory and searches for mjb in any line of the directory information.
ls -l|grep mjb
The words in the command are ls , -l , | , grep , and mjb . A word can also be a quoted string. In the following command, a long directory listing is searched for entries dated "Sep 07."
ls -l|grep "Sep 07"
In this case the words are ls , -l , | , grep , and Sep 07 . Note that Sep 07 is treated as one word because it was quoted in the command.
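To see the effect of quoting on word splitting for yourself, here is a minimal sketch. The count_args function is a hypothetical helper introduced only for this illustration; it simply reports how many words the shell handed to it.

```shell
# count_args is a hypothetical helper: it prints the number of words it received.
count_args() { echo "$#"; }

# Without quotes the shell splits Sep and 07 into two separate words.
unquoted=`count_args Sep 07`

# With quotes the shell keeps Sep 07 together as a single word.
quoted=`count_args "Sep 07"`

echo "unquoted: $unquoted word(s), quoted: $quoted word(s)"
```

The same splitting is what lets grep receive "Sep 07" as one search pattern rather than a pattern followed by a file name.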
Update the history list
Once the words are identified, the command is written to the end of the history file (assuming that you're using history).
Single and double quotes
Where a word has been surrounded by double or single quotes, the word is tagged so that variable expansion either does or does not occur within the quotes. Variables that are surrounded by single quotes will be left as is, and variables in double quotes will be expanded. To see this difference for yourself, enter the following commands with no quotes, single quotes, and double quotes.
echo $PATH
echo '$PATH'
echo "$PATH"
The first will display the value of the $PATH variable. The second will display the word $PATH , and the third will again display the value of the $PATH variable, as in the following examples:
echo $PATH
/bin:/usr/bin:/my/bin:.
echo '$PATH'
$PATH
echo "$PATH"
/bin:/usr/bin:/my/bin:.
As an exercise, also try the following commands using double quotes around single quotes, and single quotes around double quotes. When in doubt about the effect of something on the command line, experiment.
echo $PATH
/bin:/usr/bin:/my/bin:.
echo '"$PATH"'
"$PATH"
echo "'$PATH'"
'/bin:/usr/bin:/my/bin:.'
Alias substitution
The first word of each command is checked against the alias list. In the example we have been following, ls and grep are checked in the alias list, and any alias substitution is performed. I haven't discussed aliases much in previous articles so I will elaborate a bit here. An alias is a method of substituting one command for another. For example, in the C shell the following command:
alias ll 'ls -l'
creates a new command, so that if you type:
ll *.txt
it is the equivalent of typing ls -l *.txt .
Alias substitution is done at this stage of the shell processing.
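The alias ll 'ls -l' form shown above is C shell syntax. As a rough sketch for the Korn shell and other Bourne-style shells that support aliases, the same alias is written with an equals sign. The variable name ll_def is only an illustration.

```shell
# Bourne-style alias syntax (ksh, bash, POSIX sh): name=value, no spaces around =
alias ll='ls -l'

# Ask the shell to print the definition it stored for ll.
ll_def=`alias ll`
echo "$ll_def"

# Remove the alias again so the demonstration leaves nothing behind.
unalias ll
```

Some shells expand aliases only in interactive sessions, which is why this sketch inspects the stored definition instead of running ll from a script.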
Pipes and redirection
At this point the shell program looks through the words of the command for | , > , < , >> , and other redirection commands. When it finds one it creates the pipe or establishes the redirection.
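As an illustration of the operators just listed, the following sketch exercises each one against a scratch file. The file name and the variable names are assumptions made up for this example.

```shell
# Scratch file for the demonstration (hypothetical name under the temp directory).
demo_file="${TMPDIR:-/tmp}/redirect_demo.$$"

echo "first line"   > "$demo_file"    # >  creates or overwrites the file
echo "second line" >> "$demo_file"    # >> appends to the file

# <  feeds the file to wc as standard input; | passes wc's output on to tr,
# which strips the padding some versions of wc add.
line_count=`wc -l < "$demo_file" | tr -d ' '`

# | connects the output of cat to the input of grep.
match_count=`cat "$demo_file" | grep -c "line"`

echo "lines: $line_count, matches: $match_count"
rm -f "$demo_file"
```

In each case the shell sets up the file or pipe itself; the commands simply read standard input and write standard output.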
Variable expansion
Now at last the variables in the command line are expanded. Assuming the variable is not surrounded by single quotes, $PATH (or any other variable in the command) is replaced with its value.
Command substitution
Command substitution involves looking for backward quotes, executing the command and arguments between the backward quotes, and then using the results of that execution as arguments within the command that is being executed.
Try this example:
ls -l `ls -p|grep /`|more
The command ls -p will produce a directory listing in which any directories are marked with a trailing slash. The sample listing shown below includes two directories, mystuff and xdir, each indicated by the trailing slash.
ls -p
STARTUP
file.txt
file2.txt
mystuff/
xdir/
Adding the grep / to the command line selects only those lines containing the trailing slash.
ls -p|grep /
mystuff/
xdir/
By enclosing the whole of ls -p|grep / in back quotes, the command is executed and its results are handed to the preceding command as arguments. In the example shown,
ls -l `ls -p|grep /`|more
is the equivalent of
ls -l mystuff/ xdir/|more
which produces a page-by-page long listing of the contents of each subdirectory of the current directory. This is how the command substitution phase of command processing works.
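If you would rather not experiment in a real directory, the following sketch rebuilds the sample listing in a scratch directory and captures the result of the back-quoted pipeline. The directory path and the variable names are assumptions for this example only.

```shell
# Build a scratch directory that mirrors the sample listing (hypothetical path).
demo_dir="${TMPDIR:-/tmp}/cmdsub_demo.$$"
mkdir -p "$demo_dir/mystuff" "$demo_dir/xdir"
touch "$demo_dir/STARTUP" "$demo_dir/file.txt" "$demo_dir/file2.txt"

# The back-quoted pipeline runs first; its output becomes the value of subdirs.
subdirs=`cd "$demo_dir" && ls -p | grep /`

# Left unquoted, that output is split into separate arguments,
# so echo shows the command line the shell would finally build.
echo ls -l $subdirs

rm -rf "$demo_dir"
```

The echo stands in for the real ls -l so that the substituted command line is visible rather than executed.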
Wild cards
The command processor looks for wild cards used in file names and expands them. These are the standard wild cards: * and ? , as well as the bracket wild card,
ls -l [abc]*
which provides a listing of any files or directories that start with a, b or c.
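A small sketch of all three wild cards, run against a scratch directory so that the matches are predictable. The directory path, the file names, and the variable names are all made up for this illustration.

```shell
# Scratch directory with a known set of files (hypothetical names).
demo_dir="${TMPDIR:-/tmp}/wildcard_demo.$$"
mkdir -p "$demo_dir"
touch "$demo_dir/apple.txt" "$demo_dir/banana.txt" "$demo_dir/cherry.txt" "$demo_dir/date.log"

# * matches any run of characters: every file ending in .txt
star=`cd "$demo_dir" && echo *.txt`

# ? matches exactly one character: date.log
question=`cd "$demo_dir" && echo dat?.log`

# [abc] matches one character from the set: files starting with a, b, or c
bracket=`cd "$demo_dir" && echo [abc]*`

echo "star:     $star"
echo "question: $question"
echo "bracket:  $bracket"

rm -rf "$demo_dir"
```

Using echo instead of ls shows that the shell, not the command, does the expansion: echo knows nothing about files and simply prints the names it is handed.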
Execution
Finally the command is executed, and this completes the steps of the shell processing.
One additional note worth mentioning: When the command processor encounters a command between back quotes at step 8, it separates out the command between the quotes and runs steps 1 through 9 on that command.
This also happens to commands separated by semicolons. Steps 1 through 9 are performed on each separate command. You can test this yourself by selecting a directory with only a few files and then issuing the command
echo files in /chosen/directory are ; echo `cd /chosen/directory; ls *`
What will be echoed is the list of files in /chosen/directory . The asterisk argument to ls is obviously not expanded until after the cd command. If this were not the case, the asterisk would be expanded using the list of files in the current directory instead of the target directory.
So commands within back quotes are processed as if they were separate commands with a full set of steps 1 through 9. The same applies to each of the commands within a line of commands separated by semicolons.
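The test above can be made repeatable with a scratch directory standing in for /chosen/directory. This is only a sketch; the path, the file names, and the listing variable are assumptions.

```shell
# Scratch directory standing in for /chosen/directory (hypothetical path).
demo_dir="${TMPDIR:-/tmp}/steps_demo.$$"
mkdir -p "$demo_dir"
touch "$demo_dir/one.txt" "$demo_dir/two.txt"

# Inside the back quotes, cd and echo are separate commands.
# The asterisk belongs to the second command, so it is expanded
# only after cd has already changed directory.
listing=`cd "$demo_dir"; echo *`

echo "files in the demo directory are: $listing"

rm -rf "$demo_dir"
```

Had the asterisk been expanded before the cd ran, listing would hold the names of the files in whatever directory the script was started from.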