awk [-F ERE] [-v assignment] ... program [argument ...]
awk [-F ERE] -f progfile ... [-v assignment] ... [argument ...]
The awk utility shall execute programs written in the awk programming language, which is specialized for textual data manipulation. An awk program is a sequence of patterns and corresponding actions. When input is read that matches a pattern, the action associated with that pattern is carried out.
Input shall be interpreted as a sequence of records. By default, a record is a line, less its terminating <newline>, but this can be changed by using the RS built-in variable. Each record of input shall be matched in turn against each pattern in the program. For each pattern matched, the associated action shall be executed.
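As an informal illustration (not part of the normative text), the following sketch changes the record separator by assigning RS in a BEGIN action. The sample input and the shell variable name out are assumptions made for the example.

```shell
# Records are separated by ";" instead of <newline>; NR counts the records read.
out=$(printf 'alpha;beta;gamma' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')
printf '%s\n' "$out"
```

Each of the three ";"-separated strings is treated as one record and matched against the (empty) pattern in turn.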
The awk utility shall interpret each input record as a sequence of fields where, by default, a field is a string of non-<blank>s. This default white-space field delimiter can be changed by using the FS built-in variable or -F ERE. The awk utility shall denote the first field in a record $1, the second $2, and so on. The symbol $0 shall refer to the entire record; setting any other field causes the re-evaluation of $0. Assigning to $0 shall reset the values of all other fields and the NF built-in variable.
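The field rules above can be sketched as follows. This is an illustrative example only; the input strings and the shell variable names out1, out2, and out3 are assumptions.

```shell
# Default splitting: leading and repeated <blank>s do not create empty fields.
out1=$(printf '  one   two three\n' | awk '{ print NF, $2 }')
# -F replaces the default white-space delimiter with ":".
out2=$(printf 'root:x:0\n' | awk -F: '{ print $1, NF }')
# Assigning to a field re-evaluates $0; assigning to $0 resets the fields and NF.
out3=$(printf 'a b c\n' | awk '{ $2 = "X"; print $0; $0 = "p q"; print NF, $1 }')
printf '%s\n' "$out1" "$out2" "$out3"
```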
The awk utility shall conform to the Base Definitions volume of IEEE Std 1003.1-2001, Section 12.2, Utility Syntax Guidelines.
The following options shall be supported:
The following operands shall be supported:
A pathname of a file that contains the input to be read, which is matched against the set of patterns in the program. If no file operands are specified, or if a file operand is ’-’, the standard input shall be used.
An operand that begins with an underscore or alphabetic character from the portable character set (see the table in the Base Definitions volume of IEEE Std 1003.1-2001, Section 6.1, Portable Character Set), followed by a sequence of underscores, digits, and alphabetics from the portable character set, followed by the ’=’ character, shall specify a variable assignment rather than a pathname. The characters before the ’=’ represent the name of an awk variable; if that name is an awk reserved word (see Grammar) the behavior is undefined. The characters following the equal sign shall be interpreted as if they appeared in the awk program preceded and followed by a double-quote ( ’"’ ) character, as a STRING token (see Grammar), except that if the last character is an unescaped backslash, it shall be interpreted as a literal backslash rather than as the first character of the sequence "\"". The variable shall be assigned the value of that STRING token; if that value is a numeric string (see Expressions in awk), the variable shall also be assigned its numeric value. Each such variable assignment shall occur just prior to the processing of the following file, if any. Thus, an assignment before the first file argument shall be executed after the BEGIN actions (if any), while an assignment after the last file argument shall occur before the END actions (if any). If there are no file arguments, assignments shall be executed before processing the standard input.
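The timing of assignment operands can be observed with a small experiment. The sketch below is illustrative only; the temporary files f1 and f2, the variable x, and the shell variable out are hypothetical names chosen for the example.

```shell
dir=$(mktemp -d)
printf 'first\n'  > "$dir/f1"
printf 'second\n' > "$dir/f2"
# x=1 takes effect after BEGIN, x=2 before the second file, x=3 before END.
out=$(awk 'BEGIN { print "BEGIN x=" x }
                 { print $0 " x=" x }
           END   { print "END x=" x }' x=1 "$dir/f1" x=2 "$dir/f2" x=3)
rm -rf "$dir"
printf '%s\n' "$out"
```

In the BEGIN action x is still uninitialized, because the first assignment is deferred until just before the first file is processed.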
The standard input shall be used only if no file operands are specified, or if a file operand is ’-’; see the INPUT FILES section. If the awk program contains no actions and no patterns, but is otherwise a valid awk program, standard input and any file operands shall not be read and awk shall exit with a return status of zero.
Input files to the awk program from any of the following sources shall be text files:
Whether the variable RS is set to a value other than a <newline> or not, for these files, implementations shall support records terminated with the specified separator up to {LINE_MAX} bytes and may support longer records.
If -f progfile is specified, the application shall ensure that the files named by each of the progfile option-arguments are text files and their concatenation, in the same order as they appear in the arguments, is an awk program.
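A minimal sketch of the concatenation rule, assuming two hypothetical program files named sum.awk and report.awk that together form one awk program:

```shell
dir=$(mktemp -d)
# First progfile: initialize and accumulate.
printf 'BEGIN { n = 0 }\n{ n += $1 }\n' > "$dir/sum.awk"
# Second progfile: report; it relies on n from the first file.
printf 'END { print "total", n }\n' > "$dir/report.awk"
out=$(printf '1\n2\n3\n' | awk -f "$dir/sum.awk" -f "$dir/report.awk")
rm -rf "$dir"
printf '%s\n' "$out"
```

The two files are read in the order given, so the END action in the second file sees the variable accumulated by the rules in the first.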
The following environment variables shall affect the execution of awk:
In addition, all environment variables shall be visible via the awk variable ENVIRON.
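For illustration, the following sketch reads an environment variable through ENVIRON. The variable name GREETING and its value are assumptions made for the example.

```shell
# GREETING is placed in the environment of this one awk invocation only.
out=$(GREETING=hello awk 'BEGIN { print ENVIRON["GREETING"] }')
printf '%s\n' "$out"
```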
Default.
The nature of the output files depends on the awk program.
The standard error shall be used only for diagnostic messages.
An awk program is composed of pairs of the form:
pattern { action }
Either the pattern or the action (including the enclosing brace characters) can be omitted.
A missing pattern shall match any record of input, and a missing action shall be equivalent to:
{ print }
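The two forms of omission can be sketched as follows; the sample input and the shell variable names are assumptions made for the example.

```shell
input=$(printf 'keep 1\ndrop 2\nkeep 3')
# Pattern only: the missing action is equivalent to { print }.
out1=$(printf '%s\n' "$input" | awk '/keep/')
# Action only: the missing pattern matches every record.
out2=$(printf '%s\n' "$input" | awk '{ print $2 }')
printf '%s\n' "$out1" "$out2"
```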
Execution of the awk program shall start by first executing the actions associated with all BEGIN patterns in the order they occur in the program. Then each file operand (or standard input if no files were specified) shall be processed in turn by reading data from the file until a record separator is seen ( <newline> by default). Before the first reference to a field in the record is evaluated, the record shall be split into fields, according to the rules in Regular Expressions, using the value of FS that was current at the time the record was read. Each pattern in the program then shall be evaluated in the order of occurrence, and the action associated with each pattern that matches the current record executed. The action for a matching pattern shall be executed before evaluating subsequent patterns. Finally, the actions associated with all END patterns shall be executed in the order they occur in the program.
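The order of execution described above can be made visible with a short program in which the BEGIN and END patterns are deliberately written out of their natural position. This is an informal sketch; the input and the message strings are assumptions.

```shell
out=$(printf 'x\ny\n' | awk '
  END   { print "end, NR=" NR }
  /x/   { print "rule 1: " $0 }
        { print "rule 2: " $0 }
  BEGIN { print "begin" }')
printf '%s\n' "$out"
```

The BEGIN action runs before any input is read and the END action after all of it, regardless of where they appear in the program text. For each record, the rules are evaluated in program order, so both rules fire for the record "x" before the record "y" is considered.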
Expressions describe computations used in patterns and actions. In the following table, valid expression operations are given in groups from highest precedence first to lowest precedence last, with equal-precedence operators grouped between horizontal lines. In expression evaluation, where the grammar is formally ambiguous, higher precedence operators shall be evaluated before lower precedence operators. In this table expr, expr1, expr2, and expr3 represent any expression, while lvalue represents any entity that can be assigned to (that is, on the left side of an assignment operator). The precise syntax of expressions is given in Grammar.
Each expression shall have either a string value, a numeric value, or both. Except as stated for specific contexts, the value of an expression shall be implicitly converted to the type needed for the context in which it is used. A string value shall be converted to a numeric value by the equivalent of the following calls to functions defined by the ISO C standard:
A numeric value that is exactly equal to the value of an integer (see Concepts Derived from the ISO C Standard) shall be converted to a string by the equivalent of a call to the sprintf function (see String Functions) with the string "%d" as the fmt argument and the numeric value being converted as the first and only expr argument. Any other numeric value shall be converted to a string by the equivalent of a call to the sprintf function with the value of the variable CONVFMT as the fmt argument and the numeric value being converted as the first and only expr argument. The result of the conversion is unspecified if the value of CONVFMT is not a floating-point format specification. This volume of IEEE Std 1003.1-2001 specifies no explicit conversions between numbers and strings. An application can force an expression to be treated as a number by adding zero to it, or can force it to be treated as a string by concatenating the null string ( "" ) to it.
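The conversion rules and the two forcing idioms can be sketched as follows. The example is illustrative only; the CONVFMT value "%.2f" and all variable names are assumptions.

```shell
out=$(awk 'BEGIN {
  CONVFMT = "%.2f"
  a = 12          # exactly an integer: converted as if by "%d"
  b = 3.14159     # any other number: converted using CONVFMT
  s = a ""        # concatenating "" forces string context
  t = b ""
  print s, t
  c1 = ("10" < "9")           # both operands are strings: string comparison
  c2 = ("10" + 0 < "9" + 0)   # adding zero forces a numeric comparison
  print c1, c2
}')
printf '%s\n' "$out"
```

The integer value is unaffected by CONVFMT, while the non-integer value is formatted with it. The same pair of operands compares differently depending on whether the comparison is made on strings or on numbers.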
A string value shall be considered a numeric string if it comes from one of the following:
Field variables
Input from the getline() function
FILENAME
ARGV array elements
ENVIRON array elements
Array elements created by the split() function
A command line variable assignment
Variable assignment from another numeric string variable
and after all the following conversions have been applied, the resulting
BEGIN
break
atan2
close
The following single characters shall be recognized as tokens whose names are the character:
There is a lexical ambiguity between the token ERE and the tokens ’/’ and DIV_ASSIGN. When an input sequence begins with a slash character in any syntactic context where the token ’/’ or DIV_ASSIGN could appear as