Learning AWK Programming: A fast, and simple cutting-edge utility for text-processing on the Unix-like environment
9781788391030, 1788391039
Text processing and pattern matching simplified. Key Features: Master the fastest and most elegant big data munging languag
English
Pages 416
Table of contents:
Cover
Title Page
Copyright and Credits
Dedication
Packt Upsell
Contributors
Table of Contents
Preface
Chapter 1: Getting Started with AWK Programming
AWK programming language overview
What is AWK?
Types of AWK
When and where to use AWK
Getting started with AWK
Installation on Linux
Using the package manager
Compiling from the source code
Workflow of AWK
Action and pattern structure of AWK
Example data file
Pattern-only statements
Action-only statements
Printing each input line/record
Using the BEGIN and END blocks construct
The BEGIN block
The body block
The END block
Patterns
Actions
Running AWK programs
AWK as a Unix command line
AWK as a filter (reading input from the Terminal)
Running AWK programs from the source file
AWK programs as executable script files
Extending the AWK command line on multiple lines
Comments in AWK
Shell quotes with AWK
Data files used as examples in this book
Some simple examples with default usage
Multiple rules with AWK
Using standard input with names in AWK
AWK standard options
Standard command-line options
The -F option – field separator
The -f option (read source file)
The -v option (assigning variables)
GAWK-only options
The --dump-variables option (AWK global variables)
The --profile option (profiling)
The --sandbox option
The -i option (including other files in your program)
Include other files in the GAWK program (using @include)
The -V option
Summary
Chapter 2: Working with Regular Expressions
Introduction to regular expressions
What is a regular expression?
Why use regular expressions?
Using regular expressions with AWK
Regular expressions as string-matching patterns with AWK
Basic regular expression construct
Understanding regular expression metacharacters
Quoted metacharacter
Anchors
Matching at the beginning of a string
Matching at the end of a string
Dot
Bracket expressions
Character classes
Named character classes (POSIX standard)
Complemented bracket expressions
Complemented character classes
Complemented named character classes
Alternation operator
Unary operator for repetition
Closure
Positive closure
Zero or one
Repetition ranges with interval expressions
A single number in brackets
A single number followed by a comma in brackets
Two numbers in brackets
Grouping using parentheses
Concatenation using alternation operator within parentheses
Backreferencing in regular expressions – sed and grep
Precedence in regular expressions
GAWK-specific regular expression operators
Matching whitespaces
Matching not whitespaces
Matching words (\w)
Matching non-words
Matching word boundaries
Matching at the beginning of a word
Matching at the end of a word
Matching not as a sub-string using
Matching a string as sub-string only using
Case-sensitive matching
Escape sequences
Summary
Chapter 3: AWK Variables and Constants
Built-in variables in AWK
Field separator
Using a single character or simple string as a value of the FS
Using regular expressions as values of the FS
Using each character as a separate field
Using the command line to set the FS as -F
Output field separator
Record separator
Outputting the record separator
NR and NF
FILENAME
Environment variables in AWK
ARGC and ARGV
CONVFMT and OFMT
RLENGTH and RSTART
FNR
ENVIRON and SUBSEP
Field (positional) variables ($0 and $n)
Environment variables in GAWK
ARGIND
ERRNO
FIELDWIDTHS
IGNORECASE
PROCINFO
String constants
Numeric constants
Conversion between strings and numbers
Summary
Chapter 4: Working with Arrays in AWK
One-dimensional arrays
Assignment in arrays
Accessing elements in arrays
Referring to members in arrays
Processing arrays using loops
Using the split() function to create arrays
Delete operation in arrays
Multidimensional arrays
Summary
Chapter 5: Printing Output in AWK
The print statement
Role of output separator in print statement
Pretty printing with the printf statement
Escape sequences for special character printing
Different format control characters in the format specifier
Format specification modifiers
Printing with fixed column width
Using the minus modifier (-) for left justification
Printing with fixed width – right justified
Using hash modifier (#)
Using plus modifier (+) for prefixing with sign/symbol
Printing with prefix sign/symbol
Dot precision as modifier
Positional modifier using integer constant followed by $ (N$)
Redirecting output to file
Redirecting output to a file (>)
Appending output to a file (>>)
Sending output on other commands using pipe (|)
Special file for redirecting output (/dev/null, stderr)
Closing files and pipes
Summary
Chapter 6: AWK Expressions
AWK variables and constants
Arithmetic expressions using binary operators
Assignment expressions
Increment and decrement expressions
Relational expressions
Logical or Boolean expressions
Ternary expressions
Unary expressions
Exponential expressions
String concatenation
Regular expression operators
Operator precedence
Summary
Chapter 7: AWK Control Flow Statements
Conditional statements
The if statement
if
if...else
The if...else...if statement
The switch statement (a GAWK-specific feature)
Looping statement
The while loop
do...while loop statement
The for loop statement
For each loop statement
Statements affecting flow control
Break usage
Usage of continue
Exit usage
Next usage
Summary
Chapter 8: AWK Functions
Built-in functions
Arithmetic functions
The sin (expr) function
The cos (expr) function
The atan2 (y, x) function
The int (expr) function
The exp (expr) function
The log (expr) function
The sqrt (expr) function
The rand() function
The srand ([expr]) function
Summary table of built-in arithmetic functions
String functions
The index (str, sub) function
The length ( string ) function
The split (str, arr, regex) function
The substr (str, start, [ length ]) function
The sub (regex, replacement, string) function
The gsub (regex, replacement, string) function
The gensub (regex, replacement, occurrence, [ string ]) function
The match (string, regex) function
The tolower (string) function
The toupper (string) function
The sprintf (format, expression) function
The strtonum (string) function
Summary table of built-in string functions
Input/output functions
The close (filename [to/from]) function
The fflush ([ filename ]) function
The system (command) function
The getline command
Simple getline
Getline into a variable
Getline from a file
Using getline to get a variable from a file
Using getline to output into a pipe
Using getline to change the output into a variable from a pipe
Using getline to change the output into a variable from a coprocess
The nextfile() function
The time functions
The systime() function
The mktime (datespec) function
The strftime (format, timestamp) function
Bit-manipulating functions
The and (num1, num2) function
The or (num1, num2) function
The xor (num1, num2) function
The lshift (val, count) function
The rshift (val, count) function
The compl (num) function
User-defined functions
Function definition and syntax
Calling user-defined functions
Controlling variable scope
Return statement
Making indirect function calls
Summary
Chapter 9: GNU's Implementation of AWK – GAWK (GNU AWK)
Things you don't know about GAWK
Reading non-decimal input
GAWK's built-in command line debugger
What is debugging?
Debugger concepts
Using GAWK as a debugger
Starting the debugger
Set breakpoint
Removing the breakpoint
Running the program
Looking inside the program
Displaying some variables and data
Setting watch and unwatch
Controlling the execution
Viewing environment information
Saving the commands in file
Exiting the debugger
Array sorting
Sort array by values using asort()
Sort array indexes using asorti()
Two-way inter-process communication
Using GAWK for network programming
TCP client and server (/inet/tcp)
UDP client and server (/inet/udp)
Reading a web page using HttpService
Profiling
Summary
Chapter 10: Practical Implementation of AWK
Working with one-liners for text processing and pattern matching with AWK
Selective printing of lines with AWK
Modifying line spacing in a file with AWK
Numbering and calculations with AWK
Selective deletion of certain lines in a file with AWK
String operation on selected lines with AWK
Array creation with AWK one-liner
Text conversion and substitution in files with AWK
One-liners for system administrators
Use case examples of pattern matching using AWK
Parsing web server (Apache/Nginx) log files
Understanding the Apache combined log format
Using AWK for processing different log fields
Identifying problems with the running website
Printing the top 10 request IP addresses with their GeoIP information
Counting and printing unique visits to a website
Real-time IP address lookup for requests
Converting text to HTML table
Converting decimal to binary
Renaming files in a directory with AWK
Printing a generated sequence of numbers in a specified columnate format
Transposing a matrix
Processing multiple files using AWK
Summary
Further reading
Index