Command documentation sourced from the linux-command project. This command reference is part of the linux-command documentation project.

awk - Text Processing Programming Language

The awk command is a powerful programming language designed for text processing and data extraction. It reads input one record at a time (by default, one line at a time), splits each record into fields, and lets users search, extract, and manipulate text using a concise pattern-action syntax. Awk is particularly useful for report generation, data analysis, and complex text transformations.

Basic Syntax

awk [options] 'program' file1 file2 ...
awk [options] -f scriptfile file1 file2 ...
awk [options] 'BEGIN{program} pattern{action} END{program}' file
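
The third form above can be sketched as a small runnable script. The sample data (names and scores) and the variable names `sample`, `out`, and `passed` are made up for illustration; only standard awk features are used.

```shell
# Hypothetical sample data: name and score, whitespace-separated.
sample=$(printf 'alice 90\nbob 45\ncarol 72\n')

# BEGIN runs before any input is read, the pattern-action rule runs once
# per matching record, and END runs after the last record.
out=$(printf '%s\n' "$sample" | awk '
BEGIN     { print "name score" }
$2 >= 50  { print $1, $2; passed++ }
END       { print "passed:", passed + 0 }')

printf '%s\n' "$out"
```

Adding 0 to `passed` in the END block makes the script print 0 rather than an empty string when no record matches.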

Common Options

Basic Options

-F <fs> - Set field separator (default is whitespace)
-f <file> - Read awk program from file
-v <var>=<value> - Assign variable before processing begins
-W <option> - Pass a GNU awk specific option (equivalent to the long form --<option>)
-m[fr] <val> - Set memory limits (option from Bell Labs awk; gawk accepts it but ignores it)
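
As a minimal sketch of how -F and -v combine, the script below filters colon-separated records by a threshold passed in from the shell. The records and the variable name `min_uid` are hypothetical and are not read from a real system file.

```shell
# Hypothetical colon-separated records (user:uid:shell).
records=$(printf 'root:0:/bin/sh\ndaemon:1:/usr/sbin/nologin\napp:1000:/bin/bash\n')

# -F: splits each record on colons; -v assigns min_uid before the
# program starts, so it is already set when the first record is read.
out=$(printf '%s\n' "$records" | awk -F: -v min_uid=1000 '$2 >= min_uid { print $1 }')

printf '%s\n' "$out"
```

Passing values with -v avoids splicing shell variables into the quoted awk program text.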

GNU Awk Extensions

-W re-interval - Enable interval expressions in regex (enabled by default in current gawk releases)
-W source=program - Supply program source text on the command line (can be combined with -f)
-W posix - POSIX compatibility mode
-W traditional - Traditional awk behavior
-W non-decimal-data - Allow non-decimal data values
-W gen-po - Generate .po file for internationalization
-W use-lc-numeric - Use locale for decimal point

Formatting Options

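-W lint - Warn about dubious or non-portable constructs
-W lint-old - Warn about constructs not portable to the original Unix awk
-W compat - Compatibility mode (same as -W traditional)
-W copyleft - Print the short GNU copyright notice and exit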

Awk Program Structure

Basic Components

# Basic syntax
pattern { action }

# Simple one-liners
awk '{print $1}' file.txt                    # Print first field
awk '/pattern/ {print}' file.txt              # Print matching lines
awk '{print NR, $0}' file.txt                 # Prefix each line with its line number

# BEGIN and END blocks
awk 'BEGIN{print "Start"} {print $1} END{print "Done"}' file.txt

Patterns

# Regular expression patterns
/regex/           # Lines matching regex
!/regex/          # Lines NOT matching regex

# Relational expressions
$1 > 100          # First field greater than 100
$2 == "error"     # Second field equals "error"
NF >= 5           # Number of fields >= 5

# Range patterns
/start/,/stop/    # From line with "start" to line with "stop"

# Combination patterns
$1 > 100 && $2 < 200    # Both conditions true
$1 == "error" || NF == 0   # Either condition true
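
A range pattern can be tried on a few lines of input. The following sketch uses invented log lines and prints every record from the first match of /start/ through the next match of /stop/, both endpoints included.

```shell
# Hypothetical input: five short log lines.
log=$(printf 'boot\nstart job\nstep 1\nstop job\nidle\n')

# The range turns on at a line matching /start/ and turns off after
# the next line matching /stop/.
out=$(printf '%s\n' "$log" | awk '/start/,/stop/ { print NR ": " $0 }')

printf '%s\n' "$out"
```

If no later line matches /stop/, the range simply stays open until the end of input.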

Built-in Variables

# Field variables
$0                 # Entire line
$1, $2, $3...      # Individual fields
NF                 # Number of fields
NR                 # Current record number
FNR                # Record number in current file
FS                 # Field separator (input)
OFS                # Output field separator
RS                 # Record separator (input)
ORS                # Output record separator

# File and environment
FILENAME           # Current filename
ARGC               # Number of command line arguments
ARGV               # Array of command line arguments
ENVIRON            # Array of environment variables

# Format and numeric
OFMT               # Output format for numbers
CONVFMT            # Conversion format for numbers
RLENGTH            # Length matched by match function
RSTART             # Start position of match
SUBSEP             # Subscript separator for arrays
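
The difference between NR and FNR is easiest to see with more than one input file. This sketch creates two throwaway files (the names `one.txt` and `two.txt` are arbitrary) in a temporary directory and prints FILENAME, NR, FNR, and NF for each record.

```shell
# Create two small hypothetical input files in a temporary directory.
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/one.txt"
printf 'c\n'    > "$tmpdir/two.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
out=$(cd "$tmpdir" && awk '{ print FILENAME, NR, FNR, NF }' one.txt two.txt)

printf '%s\n' "$out"
rm -rf "$tmpdir"
```

In the output, the third record shows NR as 3 but FNR as 1, because it is the first record of the second file.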

Usage Examples

Basic Text Processing

# Print first column
awk '{print $1}' file.txt

# Print last column
awk '{print $NF}' file.txt

# Print specific columns
awk '{print $1, $3, $5}' file.txt

# Print with custom separator
awk 'BEGIN{OFS="|"} {print $1, $2, $3}' file.txt

# Print line numbers
awk '{print NR, $0}' file.txt

Pattern Matching

# Print lines containing pattern
awk '/error/ {print}' file.txt

# Print lines NOT containing pattern
awk '!/error/ {print}' file.txt

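# Case-insensitive matching (IGNORECASE is a gawk extension)
awk 'BEGIN{IGNORECASE=1} /error/ {print}' file.txt
# Portable alternative for other awk implementations
awk 'tolower($0) ~ /error/ {print}' file.txt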

# Multiple patterns
awk '/error/ || /warning/ {print}' file.txt

Field Processing

# Set custom field separator
awk -F: '{print $1, $6}' /etc/passwd        # Use colon as separator
awk -F'[ ,:]+' '{print $1, $2}' file.txt     # Multiple separators

# Conditional field processing
awk 'NF > 3 {print "Line", NR, "has", NF, "fields"}' file.txt
awk '$1 > 100 {print $1, "is greater than 100"}' file.txt

Calculations and Aggregations

# Sum columns
awk '{sum += $3} END {print "Total:", sum}' file.txt

# Average calculation
awk '{sum += $3; count++} END {if (count > 0) print "Average:", sum/count}' file.txt

# Min and Max values
awk 'NR==1 {min=max=$1} {if($1<min) min=$1; if($1>max) max=$1} END {print "Min:", min, "Max:", max}' file.txt

# Count occurrences
awk '/error/ {count++} END {print "Errors:", count + 0}' file.txt

String Operations

# String length
awk '{print $1, length($1)}' file.txt

# String substitution
awk '{gsub(/old/, "new"); print}' file.txt

# Case conversion
awk '{print toupper($1), tolower($2)}' file.txt

# Substring extraction
awk '{print substr($1, 1, 5)}' file.txt        # First 5 characters

Array Operations

# Count unique values
awk '{count[$1]++} END {for (key in count) print key, count[key]}' file.txt

# Group and aggregate
awk '{sum[$1] += $2} END {for (key in sum) print key, sum[key]}' file.txt

# Multiple aggregation
awk '{count[$1]++; sum[$1] += $2} END {for (key in count) print key, count[key], sum[key]/count[key]}' file.txt
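
The order in which `for (key in array)` visits keys is unspecified in awk, so aggregated output is often piped through sort when a stable order matters. A minimal sketch with invented region/amount pairs:

```shell
# Hypothetical input: region and amount.
sales=$(printf 'east 10\nwest 5\neast 7\nnorth 1\nwest 3\n')

# Sum the second field per key, then sort the result so the output
# order does not depend on the awk implementation.
out=$(printf '%s\n' "$sales" \
    | awk '{ sum[$1] += $2 } END { for (k in sum) print k, sum[k] }' \
    | LC_ALL=C sort)

printf '%s\n' "$out"
```

Setting LC_ALL=C for sort is a choice made here to keep the ordering independent of the user's locale.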

File Processing

# Multiple file processing
awk 'FNR==1 {print "Processing:", FILENAME} {print NR, $0}' file1.txt file2.txt

# File header/footer (strftime() is a gawk extension)
awk 'BEGIN{print "Report Generated:", strftime()} {print} END{print "End of Report"}' file.txt

Advanced Features

Built-in Functions

String Functions

# Length and substring
length(string)                 # String length
substr(string, start, len)     # Extract substring
index(string, find)           # Find position of substring

# Case conversion
toupper(string)               # Convert to uppercase
tolower(string)               # Convert to lowercase

# Split and join
split(string, array, sep)     # Split string into array
sprintf(format, args...)      # Formatted string

# Search and replace
sub(regex, replacement, target)    # Replace first occurrence (target defaults to $0)
gsub(regex, replacement, target)   # Replace all occurrences (target defaults to $0)
match(string, regex)               # Return match position (0 if none); sets RSTART, RLENGTH

# Trimming
sub(/^[ \t]+/, "")             # Trim leading whitespace
sub(/[ \t]+$/, "")             # Trim trailing whitespace
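
The functions listed above can be combined to pull a value out of a line. The sketch below uses a made-up key=value line; the shell variable names `line`, `id`, and `parts` are illustrative only.

```shell
# Hypothetical input line with key=value pairs.
line='user=alice id=4211 role=admin'

# match() returns the 1-based start of the first match (0 if none) and
# sets RSTART and RLENGTH, which substr() then uses to skip the "id="
# prefix (3 characters) and keep only the digits.
id=$(printf '%s\n' "$line" | awk '
match($0, /id=[0-9]+/) {
    print substr($0, RSTART + 3, RLENGTH - 3)
}')

# split() returns the number of pieces it stored in the array.
parts=$(printf '%s\n' "$line" | awk '{ n = split($3, kv, "="); print n, kv[1], kv[2] }')

printf '%s\n%s\n' "$id" "$parts"
```

Using match() as the pattern means the action only runs for lines that actually contain an id.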

Mathematical Functions

# Basic math
sqrt(x)                        # Square root
exp(x)                         # Exponential function
log(x)                         # Natural logarithm
sin(x), cos(x), atan2(y, x)    # Trigonometric functions (awk has no tan(); use sin(x)/cos(x))
int(x)                         # Integer part (truncates toward zero)
rand()                         # Random number in the range 0 <= n < 1
srand(seed)                    # Seed random generator

# Rounding
printf "%.2f", value           # Format with 2 decimal places
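
Because int() truncates rather than rounds, the two behave differently for negative and fractional values. A small sketch with two arbitrary sample numbers:

```shell
# Hypothetical input: one positive and one negative value.
out=$(printf '7.8 -7.8\n' | awk '{
    printf "%d %d\n", int($1), int($2)   # int() truncates toward zero
    printf "%.0f\n", $1                  # printf rounds to the given precision
    printf "%d\n", int($1 + 0.5)         # common idiom for rounding non-negative values
}')

printf '%s\n' "$out"
```

The `int(x + 0.5)` idiom is only appropriate for non-negative values; for negative numbers it would round toward zero in the wrong direction.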

Time Functions

# Current time (gawk extensions, not available in every awk)
strftime(format, timestamp)    # Format timestamp
systime()                      # Current Unix timestamp

Advanced Programming

User-defined Functions

awk '
function square(x) {
    return x * x
}

function max(a, b) {
    return (a > b) ? a : b
}

{
    print "Square of", $1, "is", square($1)
    print "Max of", $1, "and", $2, "is", max($1, $2)
}' file.txt

Conditional Statements

awk '
{
    if ($1 > 100) {
        print $1, "is large"
    } else if ($1 > 50) {
        print $1, "is medium"
    } else {
        print $1, "is small"
    }

    # Ternary operator
    result = ($1 > 50) ? "large" : "small"
    print "Result:", result
}' file.txt

Loops

awk '
{
    # For loop
    for (i = 1; i <= NF; i++) {
        print "Field", i, ":", $i
    }

    # While loop
    i = 1
    while (i <= NF) {
        print "Field", i, ":", $i
        i++
    }

    # Do-while loop (body runs at least once, even when NF is 0)
    i = 1
    do {
        print "Field", i, ":", $i
        i++
    } while (i <= NF)
}' file.txt

Practical Examples

Log File Analysis

# Analyze Apache access logs
awk '
{
    status = $9                          # Status code (common/combined log format)
    count[status]++
    if ($10 ~ /^[0-9]+$/) bytes += $10   # Size is "-" when no body was sent
}
END {
    for (status in count) {
        print "Status", status, ":", count[status], "requests"
    }
    print "Total bytes transferred:", bytes
}' /var/log/apache2/access.log

Data Validation

# Validate CSV format
awk -F, '
NF != 5 {
    print "Line", NR, "has", NF, "fields (expected 5)"
    next
}
{
    for (i = 1; i <= NF; i++) {
        if ($i == "") {
            print "Line", NR, "field", i, "is empty"
        }
    }
}' data.csv

File Processing

# Convert between formats
awk 'BEGIN{OFS=","}
{
    for (i = 1; i <= NF; i++) {
        if ($i ~ /[",]/) {           # Field needs quoting
            gsub(/"/, "\"\"", $i)    # Escape quotes by doubling them (CSV convention)
            $i = "\"" $i "\""
        }
    }
    print $1, $2, $3
}' space_separated.txt > csv_output.csv

System Monitoring

# Monitor system processes
ps aux | awk '
NR > 1 {
    mem[$1] += $4
    cpu[$1] += $3
    count[$1]++
}
END {
    print "USER\t\tCPU%\tMEM%\tPROCS"
    for (user in mem) {
        printf "%-15s\t%.1f\t%.1f\t%d\n", user, cpu[user], mem[user], count[user]
    }
}'

Configuration File Processing

# Parse configuration files
awk -F= '
/^[^#]/ && $1 != "" {
    sub(/^[ \t]+/, "", $1)  # Trim leading whitespace
    sub(/[ \t]+$/, "", $1)  # Trim trailing whitespace
    sub(/^[ \t]+/, "", $2)  # Trim leading whitespace
    sub(/[ \t]+$/, "", $2)  # Trim trailing whitespace
    config[$1] = $2
}
END {
    for (key in config) {
        print key "=" config[key]
    }
}' config.conf

Performance and Optimization

Efficient Processing

# Process large files efficiently (threshold=100 is an example value)
awk -v threshold=100 '
BEGIN {
    # Initialize variables in BEGIN block
    count = 0
    sum = 0
}
{
    # Accumulate only records whose first field exceeds the threshold
    if ($1 > threshold) {
        count++
        sum += $2
    }
}
END {
    # Final calculations
    print "Count:", count
    print "Sum:", sum
    if (count > 0) print "Average:", sum/count
}' large_file.txt

Memory Management

# Delete array elements when no longer needed
awk '
{
    data[$1] = $2
    if (NR % 1000000 == 0) {
        # Periodically clear data if possible
        for (key in data) delete data[key]
    }
}' very_large_file.txt

Related Commands

sed - Stream editor for text transformation
grep - Pattern searching tool
cut - Remove sections from lines
sort - Sort lines of text files
uniq - Remove adjacent duplicate lines
perl - Perl programming language
python - Python programming language
tr - Translate characters
wc - Word, line, and byte count
paste - Merge lines of files

Best Practices

Use appropriate field separators: Choose separators that match your data format
Leverage built-in variables: Use NF, NR, and other built-ins efficiently
Process in BEGIN/END blocks: Initialize variables and print summaries
Use arrays for aggregation: Group and summarize data efficiently
Apply patterns selectively: Only process lines that match your criteria
Use functions for complex logic: Improve code readability and reusability
Optimize for large files: Consider memory usage and performance
Validate input: Check data integrity before processing
Use gawk extensions: Take advantage of GNU awk features when available
Document complex scripts: Add comments for maintainability

Troubleshooting

Common Issues

# Field separator problems
awk -F: '{print $1}' file.txt           # Use explicit separator

# Numeric vs string comparison
awk '$1 == "123" {print}' file.txt        # String comparison
awk '$1 == 123 {print}' file.txt          # Numeric comparison

# File not found
ls -la file.txt                          # Check file exists

# Permission denied
sudo awk '{print}' protected_file.txt     # Use sudo if needed
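
The numeric versus string distinction above can be checked directly. In this sketch the input values are invented; the same field is compared once against a numeric constant and once against a string constant.

```shell
# Hypothetical input: the same number written two ways, plus a word.
values=$(printf '123\n123.0\nabc\n')

# A field that looks numeric is compared numerically with a numeric
# constant, so both 123 and 123.0 match.
numeric=$(printf '%s\n' "$values" | awk '$1 == 123 { n++ } END { print n + 0 }')

# A string constant forces a string comparison, so only the exact
# text "123" matches.
string=$(printf '%s\n' "$values" | awk '$1 == "123" { n++ } END { print n + 0 }')

printf 'numeric matches: %s, string matches: %s\n' "$numeric" "$string"
```

When a comparison behaves unexpectedly, appending `+ 0` to a field forces a numeric interpretation and concatenating `""` forces a string one.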

Performance Issues

# Slow processing of large files
awk 'NR%1000==0 {print "Processed:", NR}' large_file.txt  # Progress indicator

# Memory issues
awk '{print $0}' large_file.txt > processed.txt          # Redirect output

The awk command is a versatile and powerful text processing tool that combines programming capabilities with simplicity. Mastering its features enables efficient data analysis, report generation, and complex text transformations with minimal code.
