Processing files with awk, part two
In this second of two columns on the awk programming utility, we show you how
to print reports with awk's print and printf commands.
Summary
This is the second of two parts on awk, so if you missed the
first part in last month's issue, it's advisable to review it
(see Resources below). Awk is a text processing utility that runs
through a text file, reading and processing one record at a time.
This month we show you how to print and format a user list with awk.
(2,600 words)
One more piece of awk syntax will make it
an even more useful tool. I said in last month's column that awk
treats the spaces in a record as a field separator. It is
possible to change the field separator to another value.
Figure 1 is an example of a passwd file. The password itself
in this example is replaced with a single exclamation mark. This
file has several separate fields in it, but the field separator
is a colon (:) rather than spaces.
Figure 1
root:!:0:1:Super User:/:
daemon:!:1:1:System Daemons:/etc
lbw:!:209:200:Lavinia Bowder Washinton:/home/lbw:/bin/csh
bob:!:210:200:Robbie Cramer:/home/bob:/bin/ksh
joann:!:213:200:Jo Ann Batson:/home/joann:/bin/ksh
jlan:!:214:200:Jack Landon:/home/jlan:/bin/ksh
jank:!:215:200:Jan Kingly:/home/jank:/bin/ksh
ljn:!:216:200:Laura Nugent:/home/ljn:/bin/ksh
mjb:!:220:200:Mo Budlong:/home/mjb:/bin/ksh
bda:!:235:500:Basic Development Accnt:/home/bda:/bin/ksh
obrero:!:245:500::/home/obrero:/bin/ksh
guest1:!:501:500:Guest1 Account:/disk2/guest1:/bin/ksh
guest2:!:502:500:Guest2 Account:/disk2/guest2:/bin/ksh
guest3:!:503:500:Guest3 Account:/disk2/guest3:/bin/ksh
beb:!:248:202:Becky E Brown :/home/beb:/bin/ksh
A passwd file can be used as the input file for an awk report
by changing the field separator. Figure 2 is a short example.
There are two points to notice.
The first is the logic in BEGIN{FS=":"}. In awk, FS is a
pre-defined variable that contains the field separator. If you
make no changes to it, FS is a single space, which tells awk to
split fields on runs of spaces and tabs. In this listing, the
BEGIN logic sets FS to a colon (:), so the value of the field
separator is changed before the first record is read. This
allows the passwd records to be broken into fields at the
colons.
The second point to notice is on line 3 of Figure 2. In all
previous examples the file has been piped into awk using "ls
-l|awk etc." In this example, the file is specifically
named by placing it on the command line after the closing single
quote at the end of the awk commands. Awk can take its input
from a pipe as in previous examples, or from an explicitly named
file (or files) as in Figure 2. Remember that the closing quote
ends multiline input so be sure to type the closing quote, a
space and then the name of the file.
Remember to type a TAB wherever you see the ^ mark.
Figure 2
awk '
BEGIN{FS=":"}
{print $1 " ^" $5}' /etc/passwd
Unless you are in the C shell, the closing quote ends
multiline input so be sure to type the closing quote, followed
by a space and followed by the name of the file.
Figure 3 is an example using C shell continuation characters.
The example shown in Figure 2 works correctly. Figure 4 gives
you two further examples, one version that won't work and
another that will.
Figure 3
awk ' \
BEGIN{FS=":"} \
{print $1 " ^" $5}' /etc/passwd
Figure 4
awk '
BEGIN{FS=":"}
{print $1 " ^" $5}
' /etc/passwd        <-- this works, as multiline input is still active
awk '
BEGIN{FS=":"}
{print $1 " ^" $5}'  <-- multiline input ends here
/etc/passwd          <-- this won't work: multiline input ended on the previous line
Figure 5 is sample output from Figure 2 (or from Figure 3 in
the C shell). The awk script selects field $1, which is the user
id, and field $5, which is the user name, and prints them with a
tab between them.
Figure 5
root Super User
daemon System Daemons
lbw Lavinia Bowder Washinton
bob Robbie Cramer
joann Jo Ann Batson
jlan Jack Landon
jank Jan Kingly
ljn Laura Nugent
mjb Mo Budlong
bda Basic Development Accnt
obrero
guest1 Guest1 Account
guest2 Guest2 Account
guest3 Guest3 Account
beb Becky E Brown
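The BEGIN{FS=":"} assignment used in Figure 2 is not the only way to change the separator. Standard awk also accepts a -F option on the command line, which is handy for one-liners. The sketch below runs both forms on two sample records (the sample data and the shell variable names are made up for illustration, and a plain space stands in for the tab):

```shell
# Two sample passwd-style records (illustrative data, not a real /etc/passwd).
sample='bob:!:210:200:Robbie Cramer:/home/bob:/bin/ksh
mjb:!:220:200:Mo Budlong:/home/mjb:/bin/ksh'

# Set the field separator in the BEGIN logic, as in Figure 2.
with_begin=$(printf '%s\n' "$sample" | awk 'BEGIN{FS=":"} {print $1 " " $5}')

# Set the same separator with the -F command-line option instead.
with_flag=$(printf '%s\n' "$sample" | awk -F: '{print $1 " " $5}')

printf '%s\n' "$with_begin"
printf '%s\n' "$with_flag"
```

Both commands should print the same two lines, the user id followed by the user name.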
Awk has a number of pre-defined variables. You have already
seen FS. Another useful one is NR, which contains the number of
the current record. It is incremented by 1 as each record is
read. You may use it to number the output records as in Figure 6,
the output of which would look like Figure 7.
Figure 6
awk '
BEGIN{FS=":"}
{print NR ". ^" $1 " ^" $5}' /etc/passwd
Figure 7
1. root Super User
2. daemon System Daemons
3. lbw Lavinia Bowder Washinton
4. bob Robbie Cramer
5. joann Jo Ann Batson
6. jlan Jack Landon
7. jank Jan Kingly
8. ljn Laura Nugent
9. mjb Mo Budlong
10. bda Basic Development Accnt
11. obrero
12. guest1 Guest1 Account
13. guest2 Guest2 Account
14. guest3 Guest3 Account
15. beb Becky E Brown
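Because NR is an ordinary number, it can be tested as well as printed. The following is a rough sketch on three sample records (illustrative data, with a space in place of the tab): the first command numbers every record, and the second uses NR in a pattern to skip the first record.

```shell
# Three sample passwd-style records (illustrative data only).
sample='root:!:0:1:Super User:/:
daemon:!:1:1:System Daemons:/etc
lbw:!:209:200:Lavinia Bowder Washinton:/home/lbw:/bin/csh'

# Print the record number in front of each user id.
numbered=$(printf '%s\n' "$sample" | awk 'BEGIN{FS=":"} {print NR ". " $1}')

# Print only the records whose number is 2 or greater.
skipped=$(printf '%s\n' "$sample" | awk 'BEGIN{FS=":"} NR >= 2 {print NR ". " $1}')

printf '%s\n' "$numbered"
printf '%s\n' "$skipped"
```

Note that the second command still reports 2 and 3: NR counts records as they are read, not as they are printed.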
You may also use NR in the END logic. After the last record
is read, NR is left set to the number of the last record, which
is the total number of records read. Figure 8 would produce
output that looks like Figure 9.
Figure 8
awk '
BEGIN{FS=":"}
{print $1 " ^" $5}
END{print "Total users = " NR}' /etc/passwd
Figure 9
root Super User
daemon System Daemons
lbw Lavinia Bowder Washinton
bob Robbie Cramer
joann Jo Ann Batson
jlan Jack Landon
jank Jan Kingly
ljn Laura Nugent
mjb Mo Budlong
bda Basic Development Accnt
obrero
guest1 Guest1 Account
guest2 Guest2 Account
guest3 Guest3 Account
beb Becky E Brown
Total users = 15
Complex reporting: using printf to make it look right
The awk print command is good enough for a lot of reporting, but
when it comes to more complex or longer print layouts involving
tidy columns of information you need something more powerful.
The intent of Figure 10 is to print four columns of information
from the /etc/passwd file: user id, name, home path, and login
shell. The columns are separated by tabs. The actual output
looks something like Figure 11. A single tab is not enough to
produce decent alignment when the fields are of substantially
varying lengths.
Figure 10
awk '
BEGIN{FS=":";print "User ^Name ^Home ^Shell"}
{print $1 " ^" $5 " ^" $6 " ^" $7}
END{print "Total users = " NR}' /etc/passwd
Figure 11
User Name Home Shell
root Super User /
daemon System Daemons /etc
lbw Lavinia Bowder Washinton /home/lbw /bin/csh
bob Robbie Cramer /home/bob /bin/ksh
joann Jo Ann Batson /home/joann /bin/ksh
jlan Jack Landon /home/jlan /bin/ksh
jank Jan Kingly /home/jank /bin/ksh
ljn Laura Nugent /home/ljn /bin/ksh
mjb Mo Budlong /home/mjb /bin/ksh
bda Basic Development Accnt /home/bda /bin/ksh
obrero /home/obrero /bin/ksh
guest1 Guest1 Account /disk2/guest1 /bin/ksh
guest2 Guest2 Account /disk2/guest2 /bin/ksh
guest3 Guest3 Account /disk2/guest3 /bin/ksh
beb Becky E Brown /home/beb /bin/ksh
Total users = 15
To handle this, it is necessary to use the other awk print
command, printf (print formatted). The printf command is
similar to the printf function of the C programming language, but
a simplified explanation is in order for those who do not know C.
The printf command is executed by providing a format string
and a list of the values to be printed using the format string.
These are separated by commas as in:
printf "format_string", $1, $3, $6, $7
Some versions of awk require parentheses around the arguments
as in:
printf("format_string", $1, $3, $6, $7)
It is always safe to include the parentheses.
The values that can be used in a format string are very
extensive and can format data in all sorts of ways, but for
simple reports, the most useful format is the fixed width
string.
A fixed width string field starts with a percent sign (%). If
a minus sign (-) follows, then the printed data is
left-justified within the fixed width of the field. Most string
data is left-justified, so you should usually include the minus
sign. The next part of the format is the length of the field,
and finally an `s' ends the formatting. An example of this would
be "%-30s" which is a field containing 30
left-justified characters. Using this format string with printf
would look something like:
printf("%-30s",$1)
This would print field $1 in a left-justified, 30-character
field space.
If field $1 does not contain 30 characters, then the field is
padded with spaces until 30 character spaces are filled. One big
advantage of a format string is that you can force a field to
always print with a certain width by filling unused portions of
the field with spaces. You may combine multiple format fields in
a format string as in:
printf("%-20s%-30s", $1, $2)
This example will take field $1 and place it, left-justified
into the first printing position. The field will be padded until
it is 20 characters long. Then field $2 will be appended and
padded out to 30 characters. This guarantees that columns will
line up under one another. The format string for each field
should be long enough to accommodate the largest value that will
be placed in the field.
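To see the padding for yourself, it helps to put visible markers around each field. The sketch below is only an illustration (the brackets, the widths 10, 8, and 3, and the sample input are arbitrary choices, not part of the report being built). It also shows that the width is a minimum: a value longer than the field is printed in full rather than cut off, which is why the width must allow for the largest value.

```shell
# Brackets mark the edges of each field so the padding is visible.
padded=$(echo "bob Cramer" | awk '{printf("[%-10s][%-8s]\n", $1, $2)}')

# A value longer than the field width is not truncated; it pushes
# everything after it to the right.
overflow=$(echo "daemon" | awk '{printf("[%-3s]\n", $1)}')

printf '%s\n' "$padded"
printf '%s\n' "$overflow"
```

The first line printed is "[bob       ][Cramer  ]" and the second is "[daemon]".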
There is one small hitch in printf. The print command
automatically prints a newline at the end of each print
statement. The printf command does not, so you must explicitly
end the format string with a newline "\n".
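A quick way to convince yourself of the difference is to run the same printf with and without the newline. This is a minimal sketch using two made-up input lines:

```shell
# Two sample records, one word each (illustrative data).
sample='root
daemon'

# Without "\n" in the format, both records run together on one line.
joined=$(printf '%s\n' "$sample" | awk '{printf("%s", $1)}')

# With "\n" in the format, each record gets its own line.
separate=$(printf '%s\n' "$sample" | awk '{printf("%s\n", $1)}')

printf '%s\n' "$joined"
printf '%s\n' "$separate"
```

The first command produces "rootdaemon" on a single line, while the second prints root and daemon on separate lines.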
Using these rules, let's create a format string for the four
fields that we want to print from the /etc/passwd file. In
Figure 12 I have taken the four fields, found the longest
example, made a guess as to a safe width to use, and then
created a format string that is one character longer than the
safe width. This allows for a minimum of a single space between
fields.
Figure 12
Field   | Longest | Safe Width | Format
User id | 6       | 10         | "%-11s"
Name    | 24      | 30         | "%-31s"
Home    | 13      | 15         | "%-16s"
Shell   | 8       | 15         | "%-16s"
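Rather than measuring the fields by eye, you can let awk find the longest value in each field with its built-in length() function. The following is a sketch under stated assumptions: the three sample records stand in for the real file, and the variable names id, name, home, and shell are my own choices.

```shell
# Three sample passwd-style records (illustrative data only).
sample='root:!:0:1:Super User:/:
lbw:!:209:200:Lavinia Bowder Washinton:/home/lbw:/bin/csh
guest1:!:501:500:Guest1 Account:/disk2/guest1:/bin/ksh'

widths=$(printf '%s\n' "$sample" | awk '
BEGIN{FS=":"; id = name = home = shell = 0}
{
    # Remember the longest value seen so far in fields 1, 5, 6, and 7.
    if (length($1) > id)    id    = length($1)
    if (length($5) > name)  name  = length($5)
    if (length($6) > home)  home  = length($6)
    if (length($7) > shell) shell = length($7)
}
END{print id, name, home, shell}')

printf '%s\n' "$widths"
```

For these three records the result is "6 24 13 8". To measure a real file, name it after the closing quote in place of the piped sample.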
The next step is to combine all of the fields into one long
format string and append a newline.
printf("%-11s%-31s%-16s%-16s\n")
Finally, list the fields to be printed, separated by commas.
printf("%-11s%-31s%-16s%-16s\n",$1,$5,$6,$7)
For your version of awk the format string and list of values
after printf may not need to be enclosed in parentheses as in:
printf "%-11s%-31s%-16s%-16s\n",$1,$5,$6,$7
It is always safe to use the parentheses, but in many
versions of awk you do not need them.
Figure 13 is the first version of the awk script using printf.
It does not include column titles.
Figure 13
awk '
BEGIN{FS=":"}
{printf("%-11s%-31s%-16s%-16s\n",$1,$5,$6,$7)}
END{print "Total users = " NR}' /etc/passwd
Figure 14 is the C shell version of the same listing.
Figure 14
awk ' \
BEGIN{FS=":"} \
{printf("%-11s%-31s%-16s%-16s\n",$1,$5,$6,$7)} \
END{print "Total users = " NR}' /etc/passwd
Adding column titles involves ensuring that the column titles
actually line up with the fields in the format string. Figure 15
uses a simple trick to ensure that the column titles do align.
The values used by printf to fill a format string when printing
do not need to be variables. They can also be strings. The
header or title line can be created by using the same format
string that was used in the body of the report.
Figure 15
awk '
BEGIN{FS=":";
printf("%-11s%-31s%-16s%-16s\n","User","Name","Home","Shell")}
{printf("%-11s%-31s%-16s%-16s\n",$1,$5,$6,$7)}
END{print "Total users = " NR}' /etc/passwd
The output from Figure 15 is shown in Figure 16 -- it's a
much more readable and useful output.
Figure 16
User Name Home Shell
root Super User /
daemon System Daemons /etc
lbw Lavinia Bowder Washinton /home/lbw /bin/csh
bob Robbie Cramer /home/bob /bin/ksh
joann Jo Ann Batson /home/joann /bin/ksh
jlan Jack Landon /home/jlan /bin/ksh
jank Jan Kingly /home/jank /bin/ksh
ljn Laura Nugent /home/ljn /bin/ksh
mjb Mo Budlong /home/mjb /bin/ksh
bda Basic Development Accnt /home/bda /bin/ksh
obrero /home/obrero /bin/ksh
guest1 Guest1 Account /disk2/guest1 /bin/ksh
guest2 Guest2 Account /disk2/guest2 /bin/ksh
guest3 Guest3 Account /disk2/guest3 /bin/ksh
beb Becky E Brown /home/beb /bin/ksh
Total users = 15
In case you're offended by Figure 15
Before I put this article to bed, there is one thing in
Figure 15 that offends me as a programmer. The format string
appears twice, on lines 3 and 4. From a programming standpoint
this is not optimal. If you need to change the report layout you
have to modify the format string in both places, and that invites
typographical errors.
You will recall from one of the earlier examples that we used
a variable to save the total bytes for all files that were
listed. Why not create a variable that contains the format
string? In Figure 17 the format string has been assigned to a
variable named format as part of the BEGIN logic. In the printf
commands, the variable "format" is used as the format
string for both the title line and the individual record lines
instead of a literal format string. The output is exactly the
same as Figure 16. Figure 18 is the C shell version.
Figure 17
awk '
BEGIN{FS=":";
format = "%-11s%-31s%-16s%-16s\n";
printf(format,"User","Name","Home","Shell")}
{printf(format,$1,$5,$6,$7)}
END{print "Total users = " NR}' /etc/passwd
Figure 18
awk ' \
BEGIN{FS=":"; \
format = "%-11s%-31s%-16s%-16s\n"; \
printf(format,"User","Name","Home","Shell")} \
{printf(format,$1,$5,$6,$7)} \
END{print "Total users = " NR}' /etc/passwd
So far, all the examples I have given have been typed directly
at the command line. You may also open a file with vi and type
the lines exactly as given in Figure 17. Add an initial line
that forces a Bourne or Korn shell to execute the commands, as in
Figure 19, and save the file as userlist.
Figure 19
#!/bin/ksh
# (or /bin/sh)
awk '
BEGIN{FS=":";
format = "%-11s%-31s%-16s%-16s\n";
printf(format,"User","Name","Home","Shell")}
{printf(format,$1,$5,$6,$7)}
END{print "Total users = " NR}' /etc/passwd
Change the execution privileges using:
chmod a+x userlist
and you now have a script that will display a user list any
time you type "userlist". You may also send the output
to a file using redirection as in:
userlist >userlist.txt
or to a printer using one of the printer pipes such as:
userlist|lp
In Figure 19 I created a shell script that executes an awk
command on a specific file. This is not a true awk script, but a
shell script that runs awk. An awk script includes only the
awk commands. Assume for a moment that, for security reasons, a
copy of the /etc/passwd file is saved every week, giving a
running record of who had access to the system at any time in
the past. An awk script can be created by using only the awk
commands in Figure 19. This would look like Figure 20. Save this
file as userfmt.awk or some similar name to identify it as
containing awk commands.
Figure 20
BEGIN{FS=":";
format = "%-11s%-31s%-16s%-16s\n";
printf(format,"User","Name","Home","Shell")}
{printf(format,$1,$5,$6,$7)}
END{print "Total users = " NR}
To execute the awk script, use a -f switch to identify the
awk script as in:
awk -f userfmt.awk /etc/passwd
Using this awk script you can process any earlier saved
versions of the passwd file as in:
awk -f userfmt.awk /old/passwd.970404 >users_970404.txt
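If there are many saved copies, a short shell loop can run the same awk script over each one. The sketch below is self-contained so it can be tried safely: it builds a scratch directory with mktemp -d (widely available, though not on every old system), writes the Figure 20 script into it, and invents two small saved copies. The file names, dates, and contents are hypothetical.

```shell
# Work in a scratch directory so nothing real is touched.
work=$(mktemp -d)

# The awk script from Figure 20.
cat > "$work/userfmt.awk" <<'EOF'
BEGIN{FS=":";
format = "%-11s%-31s%-16s%-16s\n";
printf(format,"User","Name","Home","Shell")}
{printf(format,$1,$5,$6,$7)}
END{print "Total users = " NR}
EOF

# Two made-up weekly copies of a passwd file (hypothetical names and data).
printf 'bob:!:210:200:Robbie Cramer:/home/bob:/bin/ksh\n' > "$work/passwd.970404"
printf 'bob:!:210:200:Robbie Cramer:/home/bob:/bin/ksh\nmjb:!:220:200:Mo Budlong:/home/mjb:/bin/ksh\n' > "$work/passwd.970411"

# Run the same report over every saved copy, one output file per copy.
for copy in "$work"/passwd.*; do
    stamp=${copy##*.}          # the date suffix after the last dot
    awk -f "$work/userfmt.awk" "$copy" > "$work/users_$stamp.txt"
done

cat "$work/users_970411.txt"
```

Each saved copy ends up with its own report, named after the date suffix of the file it came from.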
Believe it or not, these two articles only scratch the
surface of awk. An excellent book on the subject is sed & awk,
published by O'Reilly & Associates, Inc. (see Resources
below). If you intend to pursue awk further, I strongly
recommend the book.