Lua - Pattern Matching
Master Lua's powerful pattern matching system for text processing, data validation, and extraction. Learn to use patterns effectively for parsing, searching, and transforming text data.
Estimated time: 30-35 minutes
Learning Objectives
- Understand Lua pattern syntax and character classes
- Use string.match, string.find, and string.gsub effectively
- Build complex patterns for data extraction
- Implement text processing and validation systems
- Apply pattern matching to real-world parsing tasks
Pattern Basics
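Lua patterns are simpler than regular expressions (there is no alternation with |, and a modifier applies only to a single character class, not to a group), but they are still powerful enough for most text processing. A pattern combines literal text with special characters that match classes of characters.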
-- Basic pattern matching concepts
local function demonstrate_basic_patterns()
local text = "Hello World 123! How are you today?"
print("=== Basic Pattern Matching ===")
print("Text:", text)
print()
-- Simple literal matching
local match = string.match(text, "World")
print("Match 'World':", match)
-- Case matters
match = string.match(text, "world")
print("Match 'world' (case sensitive):", match or "nil")
-- Find with position
local start_pos, end_pos = string.find(text, "World")
print("Position of 'World':", start_pos, end_pos)
-- Extract matched text
if start_pos then
local found = text:sub(start_pos, end_pos)
print("Extracted text:", found)
end
-- Multiple matches with string.gmatch
print("\nAll words (simple approach):")
for word in string.gmatch(text, "%w+") do
print(" " .. word)
end
-- Pattern with character classes
print("\nAll numbers:")
for number in string.gmatch(text, "%d+") do
print(" " .. number)
end
-- Pattern with captures
local greeting, name = string.match("Hello Alice", "(%w+) (%w+)")
print("\nCapture groups:")
print("Greeting:", greeting)
print("Name:", name)
end
demonstrate_basic_patterns()
Expected Output:
=== Basic Pattern Matching ===
Text: Hello World 123! How are you today?
Match 'World': World
Match 'world' (case sensitive): nil
Position of 'World': 7 11
Extracted text: World
All words (simple approach):
Hello
World
123
How
are
you
today
All numbers:
123
Capture groups:
Greeting: Hello
Name: Alice
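Because characters such as . and % are special in patterns, a search for literal text can match more than intended. The following is a minimal sketch of the optional init and plain arguments of string.find; the sample string and variable names are made up for illustration.

```lua
-- Sketch: literal (plain) search and searching from a start position.
-- The sample text is illustrative only.
local text = "price: 3.50 (was 3x50)"

-- As a pattern, "." matches any character, so "3.50" also matches "3x50".
local matches = {}
for m in string.gmatch(text, "3.50") do
  matches[#matches + 1] = m
end
print(table.concat(matches, ", "))  --> 3.50, 3x50

-- With plain = true (fourth argument) the search text is taken literally.
local s, e = string.find(text, "3.50", 1, true)
print(s, e)  --> 8  11

-- The third argument (init) sets the position where the search starts.
local s2, e2 = string.find(text, "3", e + 1, true)
print(s2, e2)  --> 18  18
```

A plain search is usually the simpler choice when the text to find comes from user input and no pattern features are needed.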
Character Classes and Modifiers
Lua provides predefined character classes and modifiers for flexible pattern matching.
-- Character classes and modifiers
local function demonstrate_character_classes()
print("=== Character Classes and Modifiers ===")
local test_strings = {
"abc123XYZ",
"user@example.com",
"Phone: (555) 123-4567",
"Price: $19.99",
" whitespace ",
"Mixed123Symbols!@#"
}
-- Character class reference
local patterns = {
{"%a+", "Letters only"},
{"%d+", "Digits only"},
{"%w+", "Alphanumeric"},
{"%s+", "Whitespace"},
{"%p+", "Punctuation"},
{"%x+", "Hexadecimal digits"},
{"%c+", "Control characters"},
{"%l+", "Lowercase letters"},
{"%u+", "Uppercase letters"}
}
for _, test_string in ipairs(test_strings) do
print(string.format("\nTesting: '%s'", test_string))
for _, pattern_info in ipairs(patterns) do
local pattern, description = pattern_info[1], pattern_info[2]
local matches = {}
for match in string.gmatch(test_string, pattern) do
table.insert(matches, match)
end
if #matches > 0 then
print(string.format(" %-20s: %s", description, table.concat(matches, ", ")))
end
end
end
-- Modifiers demonstration
print("\n=== Pattern Modifiers ===")
local sample = "aaabbbcccdddeee"
print("Sample text:", sample)
print("'a+' (one or more):", string.match(sample, "a+"))
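print("'a*' (zero or more):", string.match(sample, "xa*")) -- nil: the pattern requires a literal x, which the sample lacks
print("'a?' (zero or one):", string.match(sample, "ba?")) -- "b": the first b is followed by another b, so a? matches nothing
print("'a-' (lazy/non-greedy):", string.match(sample, "a-b")) -- "aaab": a- expands only as far as needed to reach the first b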
-- Character sets
print("\n=== Custom Character Sets ===")
local email = "user.name+tag@domain.com"
-- Match email parts
local username, domain = string.match(email, "([%w%.%+%-]+)@([%w%.%-]+)")
print("Email:", email)
print("Username:", username)
print("Domain:", domain)
-- Negated character classes
local mixed = "abc123def456"
print("\nMixed string:", mixed)
print("Non-digits:", string.match(mixed, "[^%d]+"))
print("Non-letters:", string.match(mixed, "[^%a]+"))
end
demonstrate_character_classes()
Expected Output:
=== Character Classes and Modifiers ===
Testing: 'abc123XYZ'
Letters only : abc, XYZ
Digits only : 123
Alphanumeric : abc123XYZ
Hexadecimal digits : abc123
Lowercase letters : abc
Uppercase letters : XYZ
Testing: 'user@example.com'
Letters only : user, example, com
Alphanumeric : user, example, com
Punctuation : @, .
Hexadecimal digits : e, e, a, e, c
Lowercase letters : user, example, com
...
=== Pattern Modifiers ===
Sample text: aaabbbcccdddeee
'a+' (one or more): aaa
'a*' (zero or more): nil
'a?' (zero or one): b
'a-' (lazy/non-greedy): aaab
=== Custom Character Sets ===
Email: user.name+tag@domain.com
Username: user.name+tag
Domain: domain.com
Mixed string: abc123def456
Non-digits: abc
Non-letters: 123
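The difference between greedy and lazy repetition is easiest to see when both are applied to the same input. The sketch below uses a made-up HTML-like string; it is only an illustration of the modifiers, not a recommendation to parse HTML with patterns.

```lua
-- Sketch: greedy vs lazy repetition on the same input.
local html = "<b>bold</b> and <i>italic</i>"

-- ".*" is greedy: it runs to the LAST ">" in the string.
local greedy = string.match(html, "<(.*)>")
print(greedy)  --> b>bold</b> and <i>italic</i

-- ".-" is lazy: it stops at the FIRST ">" that lets the match succeed.
local lazy = string.match(html, "<(.-)>")
print(lazy)    --> b

-- "?" makes the preceding single character optional.
local american = string.match("color", "colou?r")
local british = string.match("colour", "colou?r")
print(american, british)  --> color  colour
```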
Advanced Pattern Techniques
Combine patterns with captures, anchors, and advanced techniques for complex text processing.
-- Advanced pattern techniques
local function demonstrate_advanced_patterns()
print("=== Advanced Pattern Techniques ===")
-- Anchored patterns
local urls = {
"https://www.example.com/page",
"Visit https://site.org for more",
"ftp://files.domain.net/path"
}
print("--- Anchored Patterns ---")
for _, url in ipairs(urls) do
-- Start anchor ^
local starts_https = string.match(url, "^https://")
print(string.format("'%s' starts with https: %s", url, starts_https and "yes" or "no"))
-- End anchor $
local ends_path = string.match(url, "/[%w]+$")
print(string.format("'%s' ends with path: %s", url, ends_path or "no"))
end
-- Balanced patterns
print("\n--- Balanced Patterns ---")
local code_snippets = {
"function test() print('hello') end",
"if (x > 0) then print(x) end",
"array[index[nested]]",
"unbalanced ( bracket"
}
for _, snippet in ipairs(code_snippets) do
-- Extract content between parentheses
local content = string.match(snippet, "%b()")
print(string.format("'%s' -> parentheses content: %s",
snippet, content or "none"))
-- Extract content between square brackets
content = string.match(snippet, "%b[]")
print(string.format("'%s' -> brackets content: %s",
snippet, content or "none"))
end
-- Multiple captures and numbered captures
print("\n--- Multiple Captures ---")
local log_entries = {
"2024-01-15 14:30:25 [INFO] User logged in: alice",
"2024-01-15 14:31:02 [ERROR] Database connection failed",
"2024-01-15 14:31:15 [WARN] High memory usage detected"
}
for _, entry in ipairs(log_entries) do
local date, time, level, message = string.match(entry,
"(%d+%-%d+%-%d+) (%d+:%d+:%d+) %[(%w+)%] (.+)")
if date then
print(string.format("Date: %s, Time: %s, Level: %s", date, time, level))
print(string.format("Message: %s", message))
print()
end
end
-- Frontier patterns
print("--- Frontier Patterns ---")
local text = "The price is $19.99 and tax is $2.50"
-- Find word boundaries
for word in string.gmatch(text, "%f[%a]%w+%f[%A]") do
print("Word:", word)
end
-- Find numbers with word boundaries
for number in string.gmatch(text, "%f[%d]%d+%.?%d*%f[%D]") do
print("Number:", number)
end
end
demonstrate_advanced_patterns()
Expected Output:
=== Advanced Pattern Techniques ===
--- Anchored Patterns ---
'https://www.example.com/page' starts with https: yes
'https://www.example.com/page' ends with path: /page
'Visit https://site.org for more' starts with https: no
'Visit https://site.org for more' ends with path: no
'ftp://files.domain.net/path' starts with https: no
'ftp://files.domain.net/path' ends with path: /path
--- Balanced Patterns ---
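'function test() print('hello') end' -> parentheses content: ()
'function test() print('hello') end' -> brackets content: none
'if (x > 0) then print(x) end' -> parentheses content: (x > 0)
'if (x > 0) then print(x) end' -> brackets content: none
'array[index[nested]]' -> parentheses content: none
'array[index[nested]]' -> brackets content: [index[nested]]
'unbalanced ( bracket' -> parentheses content: none
'unbalanced ( bracket' -> brackets content: none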
--- Multiple Captures ---
Date: 2024-01-15, Time: 14:30:25, Level: INFO
Message: User logged in: alice
Date: 2024-01-15, Time: 14:31:02, Level: ERROR
Message: Database connection failed
Date: 2024-01-15, Time: 14:31:15, Level: WARN
Message: High memory usage detected
--- Frontier Patterns ---
Word: The
Word: price
Word: is
Word: and
Word: tax
Word: is
Number: 19.99
Number: 2.50
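Captures can also be referred to inside the pattern itself, and an empty capture returns a position rather than text. The sketch below illustrates both features on invented sample strings; the variable names are assumptions made for this example.

```lua
-- Sketch: back-references and position captures.
local line = [[name = "O'Brien", nick = 'Bob']]

-- %1 must repeat whatever the first capture matched, so a string opened
-- with " only closes on ", and one opened with ' only closes on '.
local values = {}
for quote, body in string.gmatch(line, "([\"'])(.-)%1") do
  values[#values + 1] = body
end
print(table.concat(values, " | "))  --> O'Brien | Bob

-- An empty capture () returns a position instead of text.
local start_pos, word, after_pos = string.match("say hello", "()(hello)()")
print(start_pos, word, after_pos)  --> 5  hello  10
```

The second position is the index just past the match, which makes it convenient as the init argument of a follow-up search.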
String Substitution with gsub
Use string.gsub for powerful text replacement and transformation operations.
-- String substitution with gsub
local function demonstrate_gsub()
print("=== String Substitution with gsub ===")
-- Basic substitution
local text = "Hello World, Welcome to the World of Lua"
local result = string.gsub(text, "World", "Universe")
print("Original:", text)
print("Replace 'World' with 'Universe':", result)
-- Limited replacements
result = string.gsub(text, "World", "Universe", 1) -- Replace only first occurrence
print("Replace first 'World' only:", result)
-- Pattern-based replacement
local mixed_case = "hELLo WoRLD"
result = string.gsub(mixed_case, "%u", string.lower) -- Convert uppercase to lowercase
print("\nOriginal:", mixed_case)
print("Uppercase to lowercase:", result)
-- Function-based replacement
local numbers = "The price is 15 dollars and 25 cents"
result = string.gsub(numbers, "%d+", function(num)
return tostring(tonumber(num) * 2)
end)
print("\nOriginal:", numbers)
print("Double all numbers:", result)
-- Capture-based replacement
local names = "Smith, John; Doe, Jane; Brown, Bob"
result = string.gsub(names, "(%w+), (%w+)", "%2 %1") -- Swap first and last names
print("\nOriginal:", names)
print("Swap names:", result)
-- Complex transformation
print("\n--- Complex Text Processing ---")
-- CSV processing
local csv_line = '"John Doe",25,"New York","Software Engineer"'
local fields = {}
-- Extract quoted fields only (the unquoted 25 does not match and is skipped)
string.gsub(csv_line, '"([^"]*)"', function(field)
table.insert(fields, field)
return ""
end)
print("CSV line:", csv_line)
print("Extracted fields:")
for i, field in ipairs(fields) do
print(string.format(" Field %d: %s", i, field))
end
-- URL processing
local url = "https://api.example.com/users?name=john&age=25&city=newyork"
local base_url, params = string.match(url, "([^%?]+)%?(.+)")
print("\nURL:", url)
print("Base URL:", base_url)
print("Parameters:")
string.gsub(params, "([^&=]+)=([^&]*)", function(key, value)
print(string.format(" %s = %s", key, value))
end)
-- Template processing
print("\n--- Template Processing ---")
local template = "Hello {{name}}, you have {{count}} new messages in {{folder}}."
local variables = {
name = "Alice",
count = "5",
folder = "Inbox"
}
result = string.gsub(template, "{{(%w+)}}", function(var_name)
return variables[var_name] or ("{{" .. var_name .. "}}")
end)
print("Template:", template)
print("Result:", result)
end
demonstrate_gsub()
Expected Output:
=== String Substitution with gsub ===
Original: Hello World, Welcome to the World of Lua
Replace 'World' with 'Universe': Hello Universe, Welcome to the Universe of Lua
Replace first 'World' only: Hello Universe, Welcome to the World of Lua
Original: hELLo WoRLD
Uppercase to lowercase: hello world
Original: The price is 15 dollars and 25 cents
Double all numbers: The price is 30 dollars and 50 cents
Original: Smith, John; Doe, Jane; Brown, Bob
Swap names: John Smith; Jane Doe; Bob Brown
--- Complex Text Processing ---
CSV line: "John Doe",25,"New York","Software Engineer"
Extracted fields:
Field 1: John Doe
Field 2: New York
Field 3: Software Engineer
URL: https://api.example.com/users?name=john&age=25&city=newyork
Base URL: https://api.example.com/users
Parameters:
name = john
age = 25
city = newyork
--- Template Processing ---
Template: Hello {{name}}, you have {{count}} new messages in {{folder}}.
Result: Hello Alice, you have 5 new messages in Inbox.
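Besides a string or a function, string.gsub also accepts a table as the replacement, and it returns the number of matches as a second value. The following sketch reworks the template idea under those assumptions; the template text and the deliberately missing folder key are illustrative.

```lua
-- Sketch: gsub with a table as the replacement, plus its second return value.
local template = "Hello {{name}}, you have {{count}} messages in {{folder}}."
local variables = { name = "Alice", count = "5" }  -- "folder" is left out on purpose

-- Each capture is used as a key into the table. When the lookup yields nil,
-- gsub keeps the original match, so unknown placeholders stay visible.
local result, replaced = string.gsub(template, "{{(%w+)}}", variables)
print(result)    --> Hello Alice, you have 5 messages in {{folder}}.

-- The count covers every match, including the one left unchanged.
print(replaced)  --> 3
```

Because gsub returns two values, wrapping the call in an extra pair of parentheses, as in (string.gsub(...)), is a common way to keep only the resulting string.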
Real-World Applications
Apply pattern matching to practical problems like data validation, parsing, and text processing.
-- Real-world pattern matching applications
local Validator = {}
-- Email validation
function Validator.email(email)
if not email or type(email) ~= "string" then
return false, "Email must be a string"
end
-- Simplified email pattern
local pattern = "^[%w%._%+-]+@[%w%.%-]+%.%a%a+$"
local is_valid = string.match(email, pattern) ~= nil
return is_valid, is_valid and "Valid email" or "Invalid email format"
end
-- Phone number validation (US format)
function Validator.phone(phone)
if not phone or type(phone) ~= "string" then
return false, "Phone must be a string"
end
-- Remove all non-digits
local digits = string.gsub(phone, "%D", "")
-- Check if it's 10 or 11 digits (with country code)
if #digits == 10 then
return true, "Valid phone number"
elseif #digits == 11 and digits:sub(1,1) == "1" then
return true, "Valid phone number with country code"
else
return false, "Phone number must be 10 or 11 digits"
end
end
-- Credit card validation (basic format check)
function Validator.credit_card(card_number)
if not card_number or type(card_number) ~= "string" then
return false, "Card number must be a string"
end
-- Remove spaces and dashes
local clean_number = string.gsub(card_number, "[%s%-]", "")
-- Check if all digits and proper length
if not string.match(clean_number, "^%d+$") then
return false, "Card number must contain only digits"
end
local length = #clean_number
if length < 13 or length > 19 then
return false, "Card number must be 13-19 digits"
end
-- Identify card type by pattern
local card_type = "Unknown"
if string.match(clean_number, "^4") then
card_type = "Visa"
elseif string.match(clean_number, "^5[1-5]") then
card_type = "MasterCard"
elseif string.match(clean_number, "^3[47]") then
card_type = "American Express"
end
return true, string.format("Valid %s card number", card_type)
end
-- Log parser
local LogParser = {}
function LogParser.parse_apache_log(log_line)
-- Apache Common Log Format pattern
local pattern = '([%d%.]+) %- %- %[([^%]]+)%] "(%w+) ([^"]+) HTTP/[%d%.]+" (%d+) (%d+)'
local ip, timestamp, method, path, status, size = string.match(log_line, pattern)
if not ip then
return nil, "Invalid log format"
end
return {
ip_address = ip,
timestamp = timestamp,
method = method,
path = path,
status_code = tonumber(status),
response_size = tonumber(size)
}
end
function LogParser.parse_error_log(log_line)
-- Error log pattern: [timestamp] [level] message
local pattern = '%[([^%]]+)%] %[(%w+)%] (.+)'
local timestamp, level, message = string.match(log_line, pattern)
if not timestamp then
return nil, "Invalid error log format"
end
return {
timestamp = timestamp,
level = level,
message = message
}
end
-- Configuration parser
local ConfigParser = {}
function ConfigParser.parse_ini_file(content)
local config = {}
local current_section = "default"
config[current_section] = {}
for line in string.gmatch(content, "[^\r\n]+") do
-- Remove leading/trailing whitespace
line = string.match(line, "^%s*(.-)%s*$")
-- Skip empty lines and comments
if line ~= "" and not string.match(line, "^[#;]") then
-- Section header
local section = string.match(line, "^%[(.+)%]$")
if section then
current_section = section
config[current_section] = {}
else
-- Key-value pair
local key, value = string.match(line, "^([^=]+)=(.*)$")
if key and value then
key = string.match(key, "^%s*(.-)%s*$") -- trim key
value = string.match(value, "^%s*(.-)%s*$") -- trim value
-- Handle quoted values
local quoted_value = string.match(value, '^"(.*)"$')
if quoted_value then
value = quoted_value
end
config[current_section][key] = value
end
end
end
end
return config
end
-- Demo real-world applications
local function demo_real_world_applications()
print("=== Real-World Pattern Matching Applications ===")
-- Validation demos
print("--- Data Validation ---")
local test_emails = {
"user@example.com",
"invalid.email",
"user.name+tag@domain.com",
"@invalid.com",
"user@"
}
for _, email in ipairs(test_emails) do
local is_valid, message = Validator.email(email)
print(string.format("Email '%s': %s", email, message))
end
print("\n--- Phone Validation ---")
local test_phones = {
"(555) 123-4567",
"555-123-4567",
"15551234567",
"555.123.4567",
"123-45"
}
for _, phone in ipairs(test_phones) do
local is_valid, message = Validator.phone(phone)
print(string.format("Phone '%s': %s", phone, message))
end
-- Log parsing demo
print("\n--- Log Parsing ---")
local apache_log = '192.168.1.100 - - [15/Jan/2024:14:30:25 +0000] "GET /index.html HTTP/1.1" 200 1234'
local log_entry, error_msg = LogParser.parse_apache_log(apache_log)
if log_entry then
print("Parsed Apache log:")
for key, value in pairs(log_entry) do
print(string.format(" %s: %s", key, value))
end
else
print("Log parsing failed:", error_msg)
end
-- Configuration parsing demo
print("\n--- Configuration Parsing ---")
local ini_content = [[
# Database configuration
[database]
host = localhost
port = 5432
name = "my app database"
user = dbuser
[web]
port = 8080
debug = true
]]
local config = ConfigParser.parse_ini_file(ini_content)
print("Parsed configuration:")
for section, values in pairs(config) do
print(string.format("[%s]", section))
for key, value in pairs(values) do
print(string.format(" %s = %s", key, value))
end
end
end
demo_real_world_applications()
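Expected Output (pairs does not guarantee iteration order, so the order of keys and sections in the log and configuration output may differ):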
=== Real-World Pattern Matching Applications ===
--- Data Validation ---
Email 'user@example.com': Valid email
Email 'invalid.email': Invalid email format
Email 'user.name+tag@domain.com': Valid email
Email '@invalid.com': Invalid email format
Email 'user@': Invalid email format
--- Phone Validation ---
Phone '(555) 123-4567': Valid phone number
Phone '555-123-4567': Valid phone number
Phone '15551234567': Valid phone number with country code
Phone '555.123.4567': Valid phone number
Phone '123-45': Phone number must be 10 or 11 digits
--- Log Parsing ---
Parsed Apache log:
ip_address: 192.168.1.100
timestamp: 15/Jan/2024:14:30:25 +0000
method: GET
path: /index.html
status_code: 200
response_size: 1234
--- Configuration Parsing ---
Parsed configuration:
[default]
[database]
host = localhost
port = 5432
name = my app database
user = dbuser
[web]
port = 8080
debug = true
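The demo prints parsed tables with pairs, which does not guarantee any iteration order, so the printed order can change between runs or Lua versions. One possible way to get stable output is to sort the keys first. The helper name sorted_keys and the sample entry below are hypothetical and not part of the code above.

```lua
-- Sketch: print a table's fields in sorted key order.
-- sorted_keys is a hypothetical helper, not part of the tutorial code above.
local function sorted_keys(t)
  local keys = {}
  for key in pairs(t) do
    keys[#keys + 1] = key
  end
  table.sort(keys)  -- string keys are sorted alphabetically
  return keys
end

local entry = { method = "GET", path = "/index.html", status_code = 200 }
local lines = {}
for _, key in ipairs(sorted_keys(entry)) do
  lines[#lines + 1] = string.format("%s: %s", key, tostring(entry[key]))
end
print(table.concat(lines, "\n"))
--> method: GET
--> path: /index.html
--> status_code: 200
```

This sketch assumes all keys are strings; a table with mixed key types would need a custom comparison function for table.sort.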
Common Pitfalls
Pattern Matching Best Practices
- Escape special characters: Use % to escape pattern special characters (such as . or -) in literal text
- Anchor patterns: Use ^ and $ for start/end matching to avoid partial matches
- Greedy vs lazy: Understand the difference between the greedy modifiers (+ and *) and the lazy modifier (-)
- Character classes: Use appropriate character classes instead of overly broad patterns
- Performance: Complex patterns can be slow on large texts; consider alternatives for heavy processing
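The first two practices above can be combined in a small helper that escapes arbitrary text before it is embedded in an anchored pattern. This is a minimal sketch; escape_pattern is a hypothetical name, and the list of characters it escapes is the set of magic characters in Lua patterns.

```lua
-- Sketch: escaping literal text before using it inside a pattern.
-- escape_pattern is a hypothetical helper name.
local function escape_pattern(text)
  -- Prefix every magic character with "%". The outer parentheses drop
  -- gsub's second return value (the match count).
  return (string.gsub(text, "[%^%$%(%)%%%.%[%]%*%+%-%?]", "%%%0"))
end

local needle = "3.5%"
print(escape_pattern(needle))  --> 3%.5%%

-- Unescaped, "." would match any character and the trailing "%" would be
-- malformed, so the escaped form is the safe one to embed in a pattern.
local pattern = "^" .. escape_pattern(needle) .. "$"
print(string.match("3.5%", pattern))  --> 3.5%
print(string.match("3x5%", pattern))  --> nil
```

When no pattern features are needed at all, string.find with its plain argument set to true avoids the escaping step entirely.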
Checks for Understanding
- What's the difference between %w and [%w]?
- How do you match a literal % character in a pattern?
- What does the frontier pattern %f[%a] match?
- When should you use captures in string.gsub?
Answers
- No difference: both match a single alphanumeric character
- Use %% to match a literal percent sign
- The position where a letter follows a non-letter or the start of the string (the beginning of a word)
- When you need to reference matched parts in the replacement string (%1, %2, ...) or pass them to a replacement function
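The answers above can be checked with a few small experiments. The sample strings in this sketch are invented for illustration; the frontier pattern %f is documented from Lua 5.2 onward.

```lua
-- Sketch: checking the answers above with small experiments.

-- 1. %w and [%w] match the same characters; a set only matters once
--    something else is added to it, as in [%w_].
print(string.match("snake_case", "%w+"))     --> snake
print(string.match("snake_case", "[%w]+"))   --> snake
print(string.match("snake_case", "[%w_]+"))  --> snake_case

-- 2. %% matches a literal percent sign.
print(string.match("CPU at 85%", "(%d+)%%"))  --> 85

-- 3. %f[%a] matches the empty position where a letter follows a non-letter
--    (or the start of the string). Marking each such position with "|":
print((string.gsub("one two-three", "%f[%a]", "|")))  --> |one |two-|three

-- 4. Captures let the replacement string reuse matched parts via %1, %2, ...
print((string.gsub("2024-01-15", "(%d+)%-(%d+)%-(%d+)", "%3/%2/%1")))  --> 15/01/2024
```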
Exercises
- Create a markdown parser using patterns
- Build a SQL query validator with pattern matching
- Implement a template engine with advanced substitution