Learn About Awk: The Most Powerful Text-Processing Tool Ever?
Awk is the greatest text processing tool you didn’t know you needed. But if you work with a lot of data, you have probably thought things like, “It would be really nice to extract the second and fifth column of data from this table.” And this, in its simplest form, is what Awk does.
A Little History
In the days before most people knew what a relational database was (and almost two decades before the development of MySQL), a great deal of data was stored in text files. The truth is, a lot of data is still stored that way. That’s especially true on Unix operating systems. For example, the Unix /etc/passwd file is just a text file with one line for each user on the system, and the fields on each line separated by colons. Here are two sample lines:
admin:*:1001:2001:Administrator:/home/root:/home/sh
brian:*:1002:2002:Brian Kernighan:/home/brian:/home/bash
On big systems, such passwd files could contain thousands of lines. You can imagine that there might be times when you would like a complete list of the names of the people with accounts on your computer. In this case, that would be the 5th field. So in 1977, three programmers created a general program to do that. They were Alfred Aho, Peter Weinberger, and Brian Kernighan, and their initials, AWK, are how Awk got its name.
By default, Awk assumes that fields are separated by whitespace (runs of spaces or tabs). But you can tell Awk to use a different separator with the -F flag (Gawk also accepts the long form, --field-separator). In the case of /etc/passwd, we would want to use the “:” character.
Given the separator character, Awk assigns the first field to the variable $1, the second field to the variable $2, and so on. The entire line is assigned to $0. If this looks familiar, it may be because this is how Bourne and Bash shell scripts handle command-line parameters.
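As a small sketch of the field variables just described, the snippet below feeds one line to Awk with the default whitespace splitting, and another with a colon separator. The input lines and the function names are made up for illustration. NF is Awk's built-in count of fields on the current line.

```shell
# Default splitting: runs of spaces and tabs separate the fields.
default_split() {
  printf 'alpha   beta\tgamma\n' | awk '{ print $2; print NF }'
}

# Colon splitting with -F: $1 is the login name, $5 the real name, $0 the whole line.
colon_split() {
  printf 'brian:*:1002:2002:Brian Kernighan\n' | awk -F : '{ print $1; print $5; print $0 }'
}

default_split
colon_split
```

The first function prints the second field and the field count. The second prints the first field, the fifth field, and then the untouched input line.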
Although Awk scripts can be put into files, they are usually just placed on the command line as part of the Awk command. Here is a simple Awk command that will output the user name and real name of every account listed in the /etc/passwd file:
awk -F : '{ print $1, $5 }' /etc/passwd
This would produce the following output from our example /etc/passwd file above:
admin Administrator
brian Brian Kernighan
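A program that prints the first and fifth fields can also be kept in a file and run with the -f flag. The sketch below assumes a scratch directory and two hypothetical file names, sample_passwd and names.awk. The sample data is the two example lines from this article, not a real /etc/passwd.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)

# The two example lines, saved as a stand-in for /etc/passwd.
cat > "$workdir/sample_passwd" <<'EOF'
admin:*:1001:2001:Administrator:/home/root:/home/sh
brian:*:1002:2002:Brian Kernighan:/home/brian:/home/bash
EOF

# The Awk program in its own file; setting FS in a BEGIN block
# takes the place of -F on the command line.
cat > "$workdir/names.awk" <<'EOF'
BEGIN { FS = ":" }
{ print $1, $5 }
EOF

# Run the program file against the sample data.
list_names() {
  awk -f "$workdir/names.awk" "$workdir/sample_passwd"
}

list_names
```

Keeping the program in a file tends to pay off once it grows past a line or two, since it avoids shell quoting problems and can be reused.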
This is about as simple an Awk program as there is. But you can probably see that this alone is very powerful. Often, people will import such a file into a spreadsheet, delete the unneeded columns, and then save the result as a new text file. That’s cumbersome when you can do the same thing with Awk in a couple of seconds. And this is just the beginning. You can make output conditional; you can completely control output; if you are dealing with numerical data, you can do calculations on it; and so much more.
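To give a rough idea of conditional output, controlled output, and calculations, here is a minimal sketch that uses the two example passwd lines as input. The variable and function names are invented for illustration, and summing user IDs is only a stand-in for whatever numeric column you actually care about.

```shell
# Hypothetical sample data: the two example passwd lines, held in a shell variable.
sample='admin:*:1001:2001:Administrator:/home/root:/home/sh
brian:*:1002:2002:Brian Kernighan:/home/brian:/home/bash'

# Conditional output: print the login name only when field 3 (the user ID) is above 1001.
high_uids() {
  printf '%s\n' "$sample" | awk -F : '$3 > 1001 { print $1 }'
}

# Controlled output: printf pads the login name to 8 columns before the real name.
name_table() {
  printf '%s\n' "$sample" | awk -F : '{ printf "%-8s %s\n", $1, $5 }'
}

# Calculation: add up field 3 across all lines and print the total at the end.
uid_total() {
  printf '%s\n' "$sample" | awk -F : '{ sum += $3 } END { print sum }'
}

high_uids
name_table
uid_total
```

The pattern before the braces in the first function acts as a filter: the action runs only on lines where the condition holds. The END block in the last function runs once, after all input has been read.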
Awk is a very easy language to learn. And there are a lot of resources to do just that. We’ll go over a few below.
Below are a number of tutorials that start at the very beginning and take you through the most important aspects of the language. Which one you find most helpful will depend upon you.
- Grymoire Awk Tutorial: this is Bruce Barnett’s excellent introduction to Awk. Check out all his Unix tutorials.
- Common threads: Awk by Example: this is from IBM, and provides a painless way to learn Awk. Be sure to check out Part 2 after you are done with it.
- Awk Tutorial: this is the Tutorials Point introduction to Awk that even takes you through downloading and installing it on Linux machines.
- An Awk Primer: this tutorial goes pretty fast, but if you are comfortable with shell scripting or you’ve used Awk in the past, it’s a good choice.
There are a number of good books that provide a foundation for Awk.
- The Awk Programming Language by Aho, Kernighan, and Weinberger: this is the original book on Awk. But unlike most such books by the original developers, this one is really good and easy to understand.
- Sed & Awk by Dougherty and Robbins: this is a classic that deals with Awk as well as the stream editor (sed). The two are often used together. Also of interest is the Sed and Awk: Pocket Reference once you are comfortable with the systems.
- AWK Programming: Questions and Answers by George Duckett: this is an interesting Kindle book that is more or less a cookbook. It includes a lot of great questions that will expand the way you think of Awk and the ways that you think it can be used.
- Effective awk Programming: Universal Text Processing and Pattern Matching by Arnold Robbins: this is kind of like a continuation of The Awk Programming Language. It gets deeper into the language and focuses on the GNU version of Awk, Gawk.
There have been a number of Awk implementations since the first one in 1977. In fact, in 1985 (before The Awk Programming Language was published), Awk was greatly expanded. That version is often referred to as “new Awk” or nawk. These are some of the more popular versions currently available.
- Gawk: this is the GNU Project’s Awk implementation. It is extremely popular and has better support than other versions for languages other than English.
- BWK: this is “the one true Awk,” since it is the one used for The Awk Programming Language. It is widely used on FreeBSD.
- Mawk: this is a version originally written by Mike Brennan, but maintained and updated by Thomas Dickey since 2009. Its focus is on speed.
- BusyBox: this is a general tool that provides a number of simplified Unix tools, including Awk.
Sometimes, you just need to ask questions. And there are a lot of people on the internet who know Awk well. Here are some of the better places to go to get your questions answered.
- Comp.lang.awk Google Group: this is a relatively active forum well worth checking out.
- Stack Overflow Awk Questions: this is a page of the newest questions that were tagged as having to do with Awk. It’s a great reference and place to go to pose your own questions.
- Awk Reddit: this is the subreddit for Awk. It isn’t terribly active, but there are a lot of knowledgeable people around it, and it is a good place to get questions answered.
Awk is a great language for text processing. And it can do amazing things if you want to push the language far enough. At the same time, its syntax is simple enough that it can quickly become part of your working tool set. The resources presented here should provide you with all the help you will need.