This is a very brief introduction to the sed and awk text processing utilities. We will deal with only a few basic commands here, but that will suffice for understanding simple sed and awk constructs within shell scripts.
sed: a non-interactive text file editor
awk: a field-oriented pattern processing language with a C-like syntax
For all their differences, the two utilities share a similar invocation syntax, both use regular expressions, both read input by default from stdin, and both output to stdout. These are well-behaved UNIX tools, and they work together well. The output from one can be piped into the other, and their combined capabilities give shell scripts some of the power of Perl.
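Here is a minimal sketch of that piping: sed cleans up some records and awk sums one of their fields. The colon-separated sample data and the variable names are made up for illustration.

```bash
# Hypothetical "name:score" records, with a stray blank line.
input='alice:90
bob:75

carol:82'

# sed deletes the blank line and turns ':' into a space;
# awk then adds up the second whitespace-separated field.
total=$(printf '%s\n' "$input" \
  | sed -e '/^$/d' -e 's/:/ /' \
  | awk '{ sum += $2 } END { print sum }')

echo "$total"   # prints 247
```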
One important difference between the utilities is that while shell scripts can easily pass arguments to sed, it is more complicated for awk (see Example 34-3 and Example 9-22).
Sed is a non-interactive line editor. It receives text input, whether from stdin or from a file, performs certain operations on specified lines of the input, one line at a time, then outputs the result to stdout or to a file. Within a shell script, sed is usually one of several tool components in a pipe.
Sed determines which lines of its input to operate on from the address range passed to it. [1] Specify this address range either by line number or by a pattern to match. For example, 3d signals sed to delete line 3 of the input, and /windows/d tells sed to delete every line of the input that contains a match for "windows".
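The two addressing styles can be compared on a few lines of made-up input. This sketch is illustrative only; the sample text and variable names are assumptions.

```bash
# Five hypothetical input lines.
input='one
two
three
windows here
five'

# 3d: address by line number, deleting only the third line.
no_line3=$(printf '%s\n' "$input" | sed '3d')

# /windows/d: address by pattern, deleting every line that matches "windows".
no_windows=$(printf '%s\n' "$input" | sed '/windows/d')

printf '%s\n' "$no_line3"
printf '%s\n' "$no_windows"
```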
Of all the operations in the sed toolkit, we will focus primarily on the three most commonly used ones. These are printing (to stdout), deletion, and substitution.
Table B-1. Basic sed operators
Operator | Name | Effect |
---|---|---|
[address-range]p | print | Print [specified address range] |
[address-range]d | delete | Delete [specified address range] |
s/pattern1/pattern2/ | substitute | Substitute pattern2 for the first instance of pattern1 in a line |
[address-range]s/pattern1/pattern2/ | substitute | Substitute pattern2 for the first instance of pattern1 in a line, over address-range |
[address-range]y/pattern1/pattern2/ | transform | Replace each character in pattern1 with the corresponding character in pattern2, over address-range (equivalent of tr) |
g | global | Operate on every pattern match within each matched line of input |
Unless the g (global) operator is appended to a substitute command, the substitution operates only on the first instance of a pattern match within each line.
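Each operator in Table B-1 can be exercised on a few lines of made-up input. This sketch assumes a POSIX sed (GNU and BSD sed both accept these commands); the sample text and variable names are illustrative only.

```bash
# Hypothetical four-line input.
text=$(printf 'alpha\nbeta\ngamma\ndelta')

# print: -n suppresses the default output, so only lines 2-3 appear.
mid=$(printf '%s\n' "$text" | sed -n '2,3p')         # beta, gamma
# delete: lines 2-3 are dropped.
ends=$(printf '%s\n' "$text" | sed '2,3d')           # alpha, delta
# substitute over an address range: only lines 2-3 are changed.
ranged=$(printf '%s\n' "$text" | sed '2,3s/a$/A/')   # alpha, betA, gammA, delta

line='foo bar foo'
first=$(echo "$line" | sed 's/foo/baz/')   # baz bar foo (first match only)
all=$(echo "$line" | sed 's/foo/baz/g')    # baz bar baz (g: every match)
trans=$(echo "$line" | sed 'y/oa/04/')     # f00 b4r f00 (per character, like tr)
```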
From the command line and in a shell script, a sed operation may require quoting and certain options.
```bash
sed -e '/^$/d' "$filename"
# The -e option causes the next string to be interpreted as an editing instruction.
# (If passing only a single instruction to sed, the -e is optional.)
# The "strong" quotes ('') protect the RE characters in the instruction
#+ from reinterpretation as special characters by the shell.
# (This reserves RE expansion of the instruction for sed.)
#
# Operates on the text contained in file $filename.
```
In certain cases, a sed editing command will not work with single quotes.
```bash
filename=file1.txt
pattern=BEGIN

sed "/^$pattern/d" "$filename"     # Works as specified.
# sed '/^$pattern/d' "$filename"   has unexpected results.
#  In this instance, with strong quoting (' ... '),
#+ "$pattern" will not expand to "BEGIN".
```
Sed uses the -e option to specify that the following string is an instruction or set of instructions. If there is only a single instruction contained in the string, then this option may be omitted.
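With more than one instruction, each can be given its own -e, and sed applies them in order to every input line. A minimal sketch on made-up input:

```bash
# Hypothetical input containing a blank line.
input=$(printf 'first line\n\nWindows rules')

# First -e deletes empty lines; second -e substitutes on the lines that remain.
out=$(printf '%s\n' "$input" | sed -e '/^$/d' -e 's/Windows/Linux/')
printf '%s\n' "$out"   # first line, then Linux rules
```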
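```bash
sed -n '/xzy/p' "$filename"
# The -n option suppresses sed's default printing of every input line,
#+ so only the lines printed by the p command (those matching the pattern) appear.
# Otherwise all input lines would print, and matching lines would print twice.
# The -e option is not necessary here, since there is only a single editing instruction.
```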
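The effect of -n is easy to observe by running the same p command with and without it. The three-line input here is made up for illustration.

```bash
input=$(printf 'abc\nxzy here\ndef')

# With -n: only the line printed by p appears.
quiet=$(printf '%s\n' "$input" | sed -n '/xzy/p')
# Without -n: every line is printed by default, and p prints the match a second time.
noisy=$(printf '%s\n' "$input" | sed '/xzy/p')

printf '%s\n' "$quiet"
printf '%s\n' "$noisy"
```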
Table B-2. Examples
Notation | Effect |
---|---|
8d | Delete 8th line of input. |
/^$/d | Delete all blank lines. |
1,/^$/d | Delete from the beginning of input up to and including the first blank line. |
/Jones/p | Print only lines containing "Jones" (with -n option). |
s/Windows/Linux/ | Substitute "Linux" for first instance of "Windows" found in each input line. |
s/BSOD/stability/g | Substitute "stability" for every instance of "BSOD" found in each input line. |
s/ *$// | Delete all spaces at the end of every line. |
s/00*/0/g | Compress all consecutive sequences of zeroes into a single zero. |
/GUI/d | Delete all lines containing "GUI". |
s/GUI//g | Delete all instances of "GUI", leaving the remainder of each line intact. |
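Several notations from Table B-2 can be tried on one small input. The sample lines below are hypothetical, chosen so that each command has something to act on; printf makes the trailing spaces on the second line explicit.

```bash
# Hypothetical sample text: line 2 ends in three spaces, line 3 is blank.
sample() {
  printf '%s\n' 'Windows 1000 Windows' 'BSOD and BSOD   ' '' 'GUI here' 'Jones'
}

s1=$(sample | sed 's/Windows/Linux/')     # first "Windows" on each line only
s2=$(sample | sed 's/BSOD/stability/g')   # every "BSOD" on each line
s3=$(sample | sed 's/ *$//')              # strip trailing spaces
s4=$(sample | sed 's/00*/0/g')            # "1000" becomes "10"
s5=$(sample | sed '1,/^$/d')              # delete through the first blank line
s6=$(sample | sed -n '/Jones/p')          # print only lines containing "Jones"
```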
Substituting a zero-length string for another is equivalent to deleting that string within a line of input. This leaves the remainder of the line intact. Applying s/GUI// to the line

```
The most important parts of any application are its GUI and sound effects
```

results in

```
The most important parts of any application are its  and sound effects
```
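The difference between deleting a string and deleting a whole line can be checked directly with the same sentence. Note the doubled space that s/GUI// leaves behind.

```bash
line='The most important parts of any application are its GUI and sound effects'

# s/GUI//: remove only the matched string; the rest of the line survives.
trimmed=$(echo "$line" | sed 's/GUI//')
# /GUI/d: remove the entire matching line, so nothing is output.
gone=$(echo "$line" | sed '/GUI/d')

printf '[%s]\n' "$trimmed" "$gone"
```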
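A backslash at the end of the replacement text forces the substitution command to continue on the next line, so the newline that follows the backslash becomes the replacement string.

```bash
s/^  */\
/g
```

This substitution replaces one or more spaces at the beginning of a line with a newline. The net result is to replace paragraph indents with a blank line between paragraphs.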
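Here is a sketch of that substitution run on two made-up paragraphs whose first lines are indented with spaces. The input, and the assumption that indentation marks a paragraph, are illustrative only.

```bash
input=$(printf '  First paragraph starts here\nand continues.\n  Second paragraph starts here')

# The backslash ending the first script line makes the following newline the
# replacement text, so each run of leading spaces becomes a line break.
out=$(printf '%s\n' "$input" | sed 's/^  */\
/g')
printf '%s\n' "$out"
```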
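An address range followed by one or more operations may require opening and closing curly braces, with appropriate newlines.

```bash
/[0-9A-Za-z]/,/^$/{
/^$/d
}
```

The range opens at a line containing a letter or digit and closes at the next blank line, which is then deleted. The effect is to delete only the first blank line in each run of consecutive blank lines that follows text: the single blank lines of a double-spaced file disappear, while runs of two or more blank lines (such as those between paragraphs) still leave at least one blank line.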
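A sketch of the braced range on two hypothetical inputs, keeping the multi-line script in a shell variable so it can be reused:

```bash
script='/[0-9A-Za-z]/,/^$/{
/^$/d
}'

# Two blank lines between paragraphs: the first is deleted, the second survives.
out=$(printf 'Para one.\n\n\nPara two.\n' | sed "$script")
# A single blank line (double spacing): it is deleted entirely.
tight=$(printf 'line one\n\nline two\n' | sed "$script")

printf '%s\n---\n%s\n' "$out" "$tight"
```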
A quick way to double-space a text file is sed G filename.
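The G command appends a newline followed by the contents of the hold space to each line; since the hold space is empty here, every line gains a blank line after it. A quick check on made-up input:

```bash
# Each line is followed by a newline plus the (empty) hold space.
out=$(printf 'a\nb\n' | sed G)
printf '%s\n' "$out"
```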
For illustrative examples of sed within shell scripts, see:
For a more extensive treatment of sed, check the appropriate references in the Bibliography.
[1] If no address range is specified, the default is all lines.