Lua - Pattern Matching

Pattern Matching

Master Lua's powerful pattern matching system for text processing, data validation, and extraction. Learn to use patterns effectively for parsing, searching, and transforming text data.

Estimated time: 30-35 minutes

Learning Objectives

  • Understand Lua pattern syntax and character classes
  • Use string.match, string.find, and string.gsub effectively
  • Build complex patterns for data extraction
  • Implement text processing and validation systems
  • Apply pattern matching to real-world parsing tasks

Pattern Basics

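Lua patterns are simpler than regular expressions (there is no alternation with |, and repetition applies only to single character classes, not to groups), yet they cover most everyday text processing. Special characters such as %, ., and + describe classes and repetitions of characters rather than literal text.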

-- Basic pattern matching concepts
local function demonstrate_basic_patterns()
  local text = "Hello World 123! How are you today?"
  
  print("=== Basic Pattern Matching ===")
  print("Text:", text)
  print()
  
  -- Simple literal matching
  local match = string.match(text, "World")
  print("Match 'World':", match)
  
  -- Case matters
  match = string.match(text, "world")
  print("Match 'world' (case sensitive):", match or "nil")
  
  -- Find with position
  local start_pos, end_pos = string.find(text, "World")
  print("Position of 'World':", start_pos, end_pos)
  
  -- Extract matched text
  if start_pos then
    local found = text:sub(start_pos, end_pos)
    print("Extracted text:", found)
  end
  
  -- Multiple matches with string.gmatch
  print("\nAll words (simple approach):")
  for word in string.gmatch(text, "%w+") do
    print("  " .. word)
  end
  
  -- Pattern with character classes
  print("\nAll numbers:")
  for number in string.gmatch(text, "%d+") do
    print("  " .. number)
  end
  
  -- Pattern with captures
  local greeting, name = string.match("Hello Alice", "(%w+) (%w+)")
  print("\nCapture groups:")
  print("Greeting:", greeting)
  print("Name:", name)
end

demonstrate_basic_patterns()

Expected Output:

=== Basic Pattern Matching ===
Text: Hello World 123! How are you today?

Match 'World': World
Match 'world' (case sensitive): nil
Position of 'World': 7 11
Extracted text: World

All words (simple approach):
  Hello
  World
  123
  How
  are
  you
  today

All numbers:
  123

Capture groups:
Greeting: Hello
Name: Alice
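
Literal searches such as "World" work because the search text contains no pattern special characters. When it does, a minimal sketch like the one below shows two options: escaping with %, or passing true as the fourth argument of string.find for a plain search. The sample strings are made up for illustration.

```lua
local text = "Total: $19.99 (approx.)"

-- "." is a wildcard in a pattern, so "19.99" also matches "19x99".
print(string.find("19x99", "19.99"))          --> 1 5

-- Option 1: escape the magic characters with %.
print(string.find(text, "%$19%.99"))          --> 8 13

-- Option 2: a plain search (4th argument true) treats the pattern as literal text.
print(string.find(text, "$19.99", 1, true))   --> 8 13

-- An unescaped "(" opens a capture that never closes, which raises an error;
-- the plain search finds the same text without complaint.
print(pcall(string.find, text, "(approx"))    --> false, followed by an error message
print(string.find(text, "(approx", 1, true))  --> 15 21
```

The plain search is usually the safer choice when the search text comes from user input.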

Character Classes and Modifiers

Lua provides predefined character classes and modifiers for flexible pattern matching.

-- Character classes and modifiers
local function demonstrate_character_classes()
  print("=== Character Classes and Modifiers ===")
  
  local test_strings = {
    "abc123XYZ",
    "user@example.com",
    "Phone: (555) 123-4567",
    "Price: $19.99",
    "   whitespace   ",
    "Mixed123Symbols!@#"
  }
  
  -- Character class reference
  local patterns = {
    {"%a+", "Letters only"},
    {"%d+", "Digits only"},
    {"%w+", "Alphanumeric"},
    {"%s+", "Whitespace"},
    {"%p+", "Punctuation"},
    {"%x+", "Hexadecimal digits"},
    {"%c+", "Control characters"},
    {"%l+", "Lowercase letters"},
    {"%u+", "Uppercase letters"}
  }
  
  for _, test_string in ipairs(test_strings) do
    print(string.format("\nTesting: '%s'", test_string))
    
    for _, pattern_info in ipairs(patterns) do
      local pattern, description = pattern_info[1], pattern_info[2]
      local matches = {}
      
      for match in string.gmatch(test_string, pattern) do
        table.insert(matches, match)
      end
      
      if #matches > 0 then
        print(string.format("  %-20s: %s", description, table.concat(matches, ", ")))
      end
    end
  end
  
  -- Modifiers demonstration
  print("\n=== Pattern Modifiers ===")
  local sample = "aaabbbcccdddeee"
  
  print("Sample text:", sample)
  print("'a+' (one or more):", string.match(sample, "a+"))
  print("'a*' (zero or more):", string.match(sample, "xa*")) -- needs a literal x before zero or more a's; sample has no x, so the result is nil
  print("'a?' (zero or one):", string.match(sample, "ba?"))
  print("'a-' (lazy/non-greedy):", string.match(sample, "a-b"))
  
  -- Character sets
  print("\n=== Custom Character Sets ===")
  local email = "user.name+tag@domain.com"
  
  -- Match email parts
  local username, domain = string.match(email, "([%w%.%+%-]+)@([%w%.%-]+)")
  print("Email:", email)
  print("Username:", username)
  print("Domain:", domain)
  
  -- Negated character classes
  local mixed = "abc123def456"
  print("\nMixed string:", mixed)
  print("Non-digits:", string.match(mixed, "[^%d]+"))
  print("Non-letters:", string.match(mixed, "[^%a]+"))
end

demonstrate_character_classes()

Expected Output:

=== Character Classes and Modifiers ===

Testing: 'abc123XYZ'
  Letters only        : abc, XYZ
  Digits only         : 123
  Alphanumeric        : abc123XYZ
  Hexadecimal digits  : abc123
  Lowercase letters   : abc
  Uppercase letters   : XYZ

Testing: 'user@example.com'
  Letters only        : user, example, com
  Alphanumeric        : user, example, com
  Punctuation         : @, .
  Hexadecimal digits  : e, e, a, e, c
  Lowercase letters   : user, example, com

...

=== Pattern Modifiers ===
Sample text: aaabbbcccdddeee
'a+' (one or more): aaa
'a*' (zero or more): nil
'a?' (zero or one): b
'a-' (lazy/non-greedy): aaab

=== Custom Character Sets ===
Email: user.name+tag@domain.com
Username: user.name+tag
Domain: domain.com

Mixed string: abc123def456
Non-digits: abc
Non-letters: 123
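
The difference between greedy and lazy repetition is easiest to see when a wildcard sits between two delimiters. The sketch below uses a made-up HTML-like string; it also shows that laziness shortens a match but does not move its starting point.

```lua
local html = "<b>bold</b> and <i>italic</i>"

-- Greedy: .* grabs as much as possible, so the capture runs to the last ">".
print(string.match(html, "<(.*)>"))   --> b>bold</b> and <i>italic</i

-- Lazy: .- grabs as little as possible, so the capture stops at the first ">".
print(string.match(html, "<(.-)>"))   --> b

-- Laziness only shortens the match. The match still begins at the first
-- position where the whole pattern can succeed, so both forms agree here.
print(string.match("aaabbbcccdddeee", "a-b"))  --> aaab
print(string.match("aaabbbcccdddeee", "a*b"))  --> aaab
```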

Advanced Pattern Techniques

Combine patterns with captures, anchors, and advanced techniques for complex text processing.

-- Advanced pattern techniques
local function demonstrate_advanced_patterns()
  print("=== Advanced Pattern Techniques ===")
  
  -- Anchored patterns
  local urls = {
    "https://www.example.com/page",
    "Visit https://site.org for more",
    "ftp://files.domain.net/path"
  }
  
  print("--- Anchored Patterns ---")
  for _, url in ipairs(urls) do
    -- Start anchor ^
    local starts_https = string.match(url, "^https://")
    print(string.format("'%s' starts with https: %s", url, starts_https and "yes" or "no"))
    
    -- End anchor $
    local ends_path = string.match(url, "/[%w]+$")
    print(string.format("'%s' ends with path: %s", url, ends_path or "no"))
  end
  
  -- Balanced patterns
  print("\n--- Balanced Patterns ---")
  local code_snippets = {
    "function test() print('hello') end",
    "if (x > 0) then print(x) end",
    "array[index[nested]]",
    "unbalanced ( bracket"
  }
  
  for _, snippet in ipairs(code_snippets) do
    -- Extract content between parentheses
    local content = string.match(snippet, "%b()")
    print(string.format("'%s' -> parentheses content: %s", 
                       snippet, content or "none"))
    
    -- Extract content between square brackets
    content = string.match(snippet, "%b[]")
    print(string.format("'%s' -> brackets content: %s", 
                       snippet, content or "none"))
  end
  
  -- Multiple captures and numbered captures
  print("\n--- Multiple Captures ---")
  local log_entries = {
    "2024-01-15 14:30:25 [INFO] User logged in: alice",
    "2024-01-15 14:31:02 [ERROR] Database connection failed",
    "2024-01-15 14:31:15 [WARN] High memory usage detected"
  }
  
  for _, entry in ipairs(log_entries) do
    local date, time, level, message = string.match(entry, 
      "(%d+%-%d+%-%d+) (%d+:%d+:%d+) %[(%w+)%] (.+)")
    
    if date then
      print(string.format("Date: %s, Time: %s, Level: %s", date, time, level))
      print(string.format("Message: %s", message))
      print()
    end
  end
  
  -- Frontier patterns
  print("--- Frontier Patterns ---")
  local text = "The price is $19.99 and tax is $2.50"
  
  -- Find word boundaries
  for word in string.gmatch(text, "%f[%a]%w+%f[%A]") do
    print("Word:", word)
  end
  
  -- Find numbers with word boundaries
  for number in string.gmatch(text, "%f[%d]%d+%.?%d*%f[%D]") do
    print("Number:", number)
  end
end

demonstrate_advanced_patterns()

Expected Output:

=== Advanced Pattern Techniques ===
--- Anchored Patterns ---
'https://www.example.com/page' starts with https: yes
'https://www.example.com/page' ends with path: /page
'Visit https://site.org for more' starts with https: no
'Visit https://site.org for more' ends with path: no
'ftp://files.domain.net/path' starts with https: no
'ftp://files.domain.net/path' ends with path: /path

--- Balanced Patterns ---
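'function test() print('hello') end' -> parentheses content: ()
'function test() print('hello') end' -> brackets content: none
'if (x > 0) then print(x) end' -> parentheses content: (x > 0)
'if (x > 0) then print(x) end' -> brackets content: none
'array[index[nested]]' -> parentheses content: none
'array[index[nested]]' -> brackets content: [index[nested]]
'unbalanced ( bracket' -> parentheses content: none
'unbalanced ( bracket' -> brackets content: none

--- Multiple Captures ---
Date: 2024-01-15, Time: 14:30:25, Level: INFO
Message: User logged in: alice

Date: 2024-01-15, Time: 14:31:02, Level: ERROR
Message: Database connection failed

Date: 2024-01-15, Time: 14:31:15, Level: WARN
Message: High memory usage detected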

--- Frontier Patterns ---
Word: The
Word: price
Word: is
Word: and
Word: tax
Word: is
Number: 19.99
Number: 2.50
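
A %b match includes the delimiters themselves and always starts at the first opening delimiter that has a balanced partner. To get only the inner text, one option is to strip the first and last character of the match. The helper name inner below is hypothetical, a sketch rather than a standard function.

```lua
-- Hypothetical helper: return the text inside the first balanced pair,
-- or nil when no balanced pair exists.
local function inner(s, open, close)
  local block = string.match(s, "%b" .. open .. close)
  if block then
    return block:sub(2, -2)   -- drop the opening and closing delimiter
  end
  return nil
end

print(inner("if (x > (y + 1)) then", "(", ")"))  --> x > (y + 1)
print(inner("array[index[nested]]", "[", "]"))   --> index[nested]
print(inner("unbalanced ( bracket", "(", ")"))   --> nil
```

Because %b takes the next two characters literally, the delimiters need no escaping even when they are magic characters such as ( or [.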

String Substitution with gsub

Use string.gsub for powerful text replacement and transformation operations.

-- String substitution with gsub
local function demonstrate_gsub()
  print("=== String Substitution with gsub ===")
  
  -- Basic substitution
  local text = "Hello World, Welcome to the World of Lua"
  local result = string.gsub(text, "World", "Universe")
  print("Original:", text)
  print("Replace 'World' with 'Universe':", result)
  
  -- Limited replacements
  result = string.gsub(text, "World", "Universe", 1) -- Replace only first occurrence
  print("Replace first 'World' only:", result)
  
  -- Pattern-based replacement
  local mixed_case = "hELLo WoRLD"
  result = string.gsub(mixed_case, "%u", string.lower) -- Convert uppercase to lowercase
  print("\nOriginal:", mixed_case)
  print("Uppercase to lowercase:", result)
  
  -- Function-based replacement
  local numbers = "The price is 15 dollars and 25 cents"
  result = string.gsub(numbers, "%d+", function(num)
    return tostring(tonumber(num) * 2)
  end)
  print("\nOriginal:", numbers)
  print("Double all numbers:", result)
  
  -- Capture-based replacement
  local names = "Smith, John; Doe, Jane; Brown, Bob"
  result = string.gsub(names, "(%w+), (%w+)", "%2 %1") -- Swap first and last names
  print("\nOriginal:", names)
  print("Swap names:", result)
  
  -- Complex transformation
  print("\n--- Complex Text Processing ---")
  
  -- CSV processing
  local csv_line = '"John Doe",25,"New York","Software Engineer"'
  local fields = {}
  
  -- Extract quoted fields only (the unquoted 25 is skipped)
  string.gsub(csv_line, '"([^"]*)"', function(field)
    table.insert(fields, field)
    return ""
  end)
  
  print("CSV line:", csv_line)
  print("Extracted fields:")
  for i, field in ipairs(fields) do
    print(string.format("  Field %d: %s", i, field))
  end
  
  -- URL processing
  local url = "https://api.example.com/users?name=john&age=25&city=newyork"
  local base_url, params = string.match(url, "([^%?]+)%?(.+)")
  
  print("\nURL:", url)
  print("Base URL:", base_url)
  print("Parameters:")
  
  string.gsub(params, "([^&=]+)=([^&]*)", function(key, value)
    print(string.format("  %s = %s", key, value))
  end)
  
  -- Template processing
  print("\n--- Template Processing ---")
  local template = "Hello {{name}}, you have {{count}} new messages in {{folder}}."
  local variables = {
    name = "Alice",
    count = "5",
    folder = "Inbox"
  }
  
  result = string.gsub(template, "{{(%w+)}}", function(var_name)
    return variables[var_name] or ("{{" .. var_name .. "}}")
  end)
  
  print("Template:", template)
  print("Result:", result)
end

demonstrate_gsub()

Expected Output:

=== String Substitution with gsub ===
Original: Hello World, Welcome to the World of Lua
Replace 'World' with 'Universe': Hello Universe, Welcome to the Universe of Lua
Replace first 'World' only: Hello Universe, Welcome to the World of Lua

Original: hELLo WoRLD
Uppercase to lowercase: hello world

Original: The price is 15 dollars and 25 cents
Double all numbers: The price is 30 dollars and 50 cents

Original: Smith, John; Doe, Jane; Brown, Bob
Swap names: John Smith; Jane Doe; Bob Brown

--- Complex Text Processing ---
CSV line: "John Doe",25,"New York","Software Engineer"
Extracted fields:
  Field 1: John Doe
  Field 2: New York
  Field 3: Software Engineer

URL: https://api.example.com/users?name=john&age=25&city=newyork
Base URL: https://api.example.com/users
Parameters:
  name = john
  age = 25
  city = newyork

--- Template Processing ---
Template: Hello {{name}}, you have {{count}} new messages in {{folder}}.
Result: Hello Alice, you have 5 new messages in Inbox.
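
Two details of string.gsub are worth a short sketch: the replacement can be a table instead of a function, and gsub returns the number of matches as a second value. The template and variable names below are made up for illustration.

```lua
local variables = { name = "Alice", count = "5" }
local template = "Hi {{name}}: {{count}} new, {{missing}} old"

-- A table replacement is indexed by the first capture. When the lookup
-- yields nil, gsub keeps the original match unchanged.
local result, matches = string.gsub(template, "{{(%w+)}}", variables)
print(result)    --> Hi Alice: 5 new, {{missing}} old
print(matches)   --> 3 (every match is counted, including the one left unchanged)

-- Passing gsub directly as the last call argument forwards both return values.
print(string.gsub("a b c", " ", "_"))     --> a_b_c 2

-- Wrapping the call in parentheses keeps only the first value.
print((string.gsub("a b c", " ", "_")))   --> a_b_c
```

The extra parentheses matter when the result feeds a function that treats a second argument differently, such as table.insert or tonumber.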

Real-World Applications

Apply pattern matching to practical problems like data validation, parsing, and text processing.

-- Real-world pattern matching applications
local Validator = {}

-- Email validation
function Validator.email(email)
  if not email or type(email) ~= "string" then
    return false, "Email must be a string"
  end
  
  -- Simplified email pattern
  local pattern = "^[%w%._%+-]+@[%w%.%-]+%.%a%a+$"
  local is_valid = string.match(email, pattern) ~= nil
  
  return is_valid, is_valid and "Valid email" or "Invalid email format"
end

-- Phone number validation (US format)
function Validator.phone(phone)
  if not phone or type(phone) ~= "string" then
    return false, "Phone must be a string"
  end
  
  -- Remove all non-digits
  local digits = string.gsub(phone, "%D", "")
  
  -- Check if it's 10 or 11 digits (with country code)
  if #digits == 10 then
    return true, "Valid phone number"
  elseif #digits == 11 and digits:sub(1,1) == "1" then
    return true, "Valid phone number with country code"
  else
    return false, "Phone number must be 10 or 11 digits"
  end
end

-- Credit card validation (basic format check)
function Validator.credit_card(card_number)
  if not card_number or type(card_number) ~= "string" then
    return false, "Card number must be a string"
  end
  
  -- Remove spaces and dashes
  local clean_number = string.gsub(card_number, "[%s%-]", "")
  
  -- Check if all digits and proper length
  if not string.match(clean_number, "^%d+$") then
    return false, "Card number must contain only digits"
  end
  
  local length = #clean_number
  if length < 13 or length > 19 then
    return false, "Card number must be 13-19 digits"
  end
  
  -- Identify card type by pattern
  local card_type = "Unknown"
  if string.match(clean_number, "^4") then
    card_type = "Visa"
  elseif string.match(clean_number, "^5[1-5]") then
    card_type = "MasterCard"
  elseif string.match(clean_number, "^3[47]") then
    card_type = "American Express"
  end
  
  return true, string.format("Valid %s card number", card_type)
end

-- Log parser
local LogParser = {}

function LogParser.parse_apache_log(log_line)
  -- Apache Common Log Format pattern
  local pattern = '([%d%.]+) %- %- %[([^%]]+)%] "(%w+) ([^"]+) HTTP/[%d%.]+" (%d+) (%d+)'
  
  local ip, timestamp, method, path, status, size = string.match(log_line, pattern)
  
  if not ip then
    return nil, "Invalid log format"
  end
  
  return {
    ip_address = ip,
    timestamp = timestamp,
    method = method,
    path = path,
    status_code = tonumber(status),
    response_size = tonumber(size)
  }
end

function LogParser.parse_error_log(log_line)
  -- Error log pattern: [timestamp] [level] message
  local pattern = '%[([^%]]+)%] %[(%w+)%] (.+)'
  
  local timestamp, level, message = string.match(log_line, pattern)
  
  if not timestamp then
    return nil, "Invalid error log format"
  end
  
  return {
    timestamp = timestamp,
    level = level,
    message = message
  }
end

-- Configuration parser
local ConfigParser = {}

function ConfigParser.parse_ini_file(content)
  local config = {}
  local current_section = "default"
  config[current_section] = {}
  
  for line in string.gmatch(content, "[^\r\n]+") do
    -- Remove leading/trailing whitespace
    line = string.match(line, "^%s*(.-)%s*$")
    
    -- Skip empty lines and comments
    if line ~= "" and not string.match(line, "^[#;]") then
      -- Section header
      local section = string.match(line, "^%[(.+)%]$")
      if section then
        current_section = section
        config[current_section] = {}
      else
        -- Key-value pair
        local key, value = string.match(line, "^([^=]+)=(.*)$")
        if key and value then
          key = string.match(key, "^%s*(.-)%s*$")     -- trim key
          value = string.match(value, "^%s*(.-)%s*$") -- trim value
          
          -- Handle quoted values
          local quoted_value = string.match(value, '^"(.*)"$')
          if quoted_value then
            value = quoted_value
          end
          
          config[current_section][key] = value
        end
      end
    end
  end
  
  return config
end

-- Demo real-world applications
local function demo_real_world_applications()
  print("=== Real-World Pattern Matching Applications ===")
  
  -- Validation demos
  print("--- Data Validation ---")
  local test_emails = {
    "user@example.com",
    "invalid.email",
    "user.name+tag@domain.com",
    "@invalid.com",
    "user@"
  }
  
  for _, email in ipairs(test_emails) do
    local is_valid, message = Validator.email(email)
    print(string.format("Email '%s': %s", email, message))
  end
  
  print("\n--- Phone Validation ---")
  local test_phones = {
    "(555) 123-4567",
    "555-123-4567",
    "15551234567",
    "555.123.4567",
    "123-45"
  }
  
  for _, phone in ipairs(test_phones) do
    local is_valid, message = Validator.phone(phone)
    print(string.format("Phone '%s': %s", phone, message))
  end
  
  -- Log parsing demo
  print("\n--- Log Parsing ---")
  local apache_log = '192.168.1.100 - - [15/Jan/2024:14:30:25 +0000] "GET /index.html HTTP/1.1" 200 1234'
  local log_entry, error_msg = LogParser.parse_apache_log(apache_log)
  
  if log_entry then
    print("Parsed Apache log:")
    for key, value in pairs(log_entry) do
      print(string.format("  %s: %s", key, value))
    end
  else
    print("Log parsing failed:", error_msg)
  end
  
  -- Configuration parsing demo
  print("\n--- Configuration Parsing ---")
  local ini_content = [[
# Database configuration
[database]
host = localhost
port = 5432
name = "my app database"
user = dbuser

[web]
port = 8080
debug = true
]]
  
  local config = ConfigParser.parse_ini_file(ini_content)
  print("Parsed configuration:")
  for section, values in pairs(config) do
    print(string.format("[%s]", section))
    for key, value in pairs(values) do
      print(string.format("  %s = %s", key, value))
    end
  end
end

demo_real_world_applications()

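Expected Output (pairs iterates in no guaranteed order, so the log fields and configuration sections may print in a different order):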

=== Real-World Pattern Matching Applications ===
--- Data Validation ---
Email 'user@example.com': Valid email
Email 'invalid.email': Invalid email format
Email 'user.name+tag@domain.com': Valid email
Email '@invalid.com': Invalid email format
Email 'user@': Invalid email format

--- Phone Validation ---
Phone '(555) 123-4567': Valid phone number
Phone '555-123-4567': Valid phone number
Phone '15551234567': Valid phone number with country code
Phone '555.123.4567': Valid phone number
Phone '123-45': Phone number must be 10 or 11 digits

--- Log Parsing ---
Parsed Apache log:
  ip_address: 192.168.1.100
  timestamp: 15/Jan/2024:14:30:25 +0000
  method: GET
  path: /index.html
  status_code: 200
  response_size: 1234

--- Configuration Parsing ---
Parsed configuration:
[default]
[database]
  host = localhost
  port = 5432
  name = my app database
  user = dbuser
[web]
  port = 8080
  debug = true
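
LogParser.parse_error_log is defined above but not exercised by the demo. The sketch below repeats its pattern in a standalone function so it can run on its own; the two sample log lines are invented for illustration.

```lua
-- Same pattern as LogParser.parse_error_log: [timestamp] [level] message
local function parse_error_log(log_line)
  local timestamp, level, message =
    string.match(log_line, "%[([^%]]+)%] %[(%w+)%] (.+)")
  if not timestamp then
    return nil, "Invalid error log format"
  end
  return { timestamp = timestamp, level = level, message = message }
end

-- Made-up sample line in the expected format.
local entry = parse_error_log("[2024-01-15 14:31:02] [ERROR] Database connection failed")
print(entry.timestamp)  --> 2024-01-15 14:31:02
print(entry.level)      --> ERROR
print(entry.message)    --> Database connection failed

-- A line without the bracketed prefix yields nil plus an error message.
print(parse_error_log("no brackets here"))  --> nil Invalid error log format
```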

Common Pitfalls

Pattern Matching Best Practices

  • Escape special characters: Use % to escape pattern special characters in literal text
  • Anchor patterns: Use ^ and $ for start/end matching to avoid partial matches
  • Greedy vs lazy: * and + are greedy (longest match); - is the lazy counterpart of * (zero or more, shortest match)
  • Character classes: Use appropriate character classes instead of overly broad patterns
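  • Performance: Patterns with several wildcards (.* or .-) can backtrack heavily on large texts; anchor them, use narrower character classes, or consider a parsing library such as LPeg for heavy processing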
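
The first two practices can be sketched in a few lines. The helper name escape_pattern is hypothetical (Lua has no built-in for this); it prefixes each magic character with % so arbitrary text can be used as a literal pattern.

```lua
-- Hypothetical helper: prefix every magic character with %.
-- The outer parentheses drop gsub's second return value (the match count).
local function escape_pattern(s)
  return (s:gsub("[%^%$%(%)%%%.%[%]%*%+%-%?]", "%%%0"))
end

local needle = "$19.99 (50% off)"
local escaped = escape_pattern(needle)
print(escaped)  --> %$19%.99 %(50%% off%)

local text = "Now $19.99 (50% off) today"
print(string.match(text, escaped))  --> $19.99 (50% off)

-- Anchors turn a "contains" test into a "whole string" test.
print(string.match("abc123", "%d+"))    --> 123
print(string.match("abc123", "^%d+$"))  --> nil
print(string.match("123", "^%d+$"))     --> 123
```

For a one-off literal search, string.find with its plain argument set to true avoids the need for escaping altogether.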

Checks for Understanding

  1. What's the difference between %w and [%w]?
  2. How do you match a literal % character in a pattern?
  3. What does the frontier pattern %f[%a] match?
  4. When should you use captures in string.gsub?

Answers

  1. No difference - both match alphanumeric characters
  2. Use %% to match a literal percent sign
  3. The empty position where a letter follows a non-letter or the start of the string (the beginning of a word)
  4. When you need to reference matched parts in the replacement string
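
Each answer can be checked with a one-line experiment. The strings below are made up for illustration, and %f is documented from Lua 5.2 onward.

```lua
-- 1. %w and [%w] match the same characters.
print(string.match("a1_b2", "%w+"), string.match("a1_b2", "[%w]+"))  --> a1 a1

-- 2. %% matches a literal percent sign.
print(string.match("Progress: 100% done", "%d+%%"))  --> 100%

-- 3. %f[%a] matches the empty position where a letter follows a non-letter
--    (or the start of the string); marking each position shows the word starts.
print((string.gsub("the quick fox", "%f[%a]", "|")))  --> |the |quick |fox

-- 4. Captures let the replacement string reuse parts of the match.
print((string.gsub("2024-01-15", "(%d+)%-(%d+)%-(%d+)", "%3/%2/%1")))  --> 15/01/2024
```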

Exercises

  1. Create a markdown parser using patterns
  2. Build a SQL query validator with pattern matching
  3. Implement a template engine with advanced substitution