Lua - Performance & Optimization
Overview
Estimated time: 35–40 minutes
Writing efficient Lua code requires understanding how Lua works internally and applying optimization techniques. This tutorial covers performance analysis, optimization strategies, profiling tools, and the LuaJIT compiler for maximum performance.
Learning Objectives
- Understand Lua's performance characteristics and bottlenecks
- Apply optimization techniques for tables, functions, and loops
- Use profiling tools to identify performance issues
- Leverage LuaJIT for high-performance applications
- Implement best practices for memory and CPU efficiency
Prerequisites
- Strong understanding of Lua tables, functions, and control flow
- Knowledge of Lua's garbage collection concepts
- Basic understanding of algorithm complexity
Performance Fundamentals
Understanding Lua's execution model is key to optimization:
-- Local vs global variable access
-- Slow: Global variable access
function slow_global_access()
for i = 1, 1000000 do
math.sin(i) -- Global lookup for 'math' every time
end
end
-- Fast: Local variable access
function fast_local_access()
local sin = math.sin -- Cache the function locally
for i = 1, 1000000 do
sin(i) -- Direct local access
end
end
-- Timing test
local function time_function(func, name)
local start = os.clock()
func()
local duration = os.clock() - start
print(string.format("%s took %.4f seconds", name, duration))
end
time_function(slow_global_access, "Global access")
time_function(fast_local_access, "Local access")
Expected output (illustrative; timings vary by machine and Lua version):
Global access took 0.1234 seconds
Local access took 0.0678 seconds
Table Optimization
Tables are fundamental to Lua performance:
Array vs Hash Performance
-- Array part vs hash part performance
local function test_array_vs_hash()
local iterations = 1000000
-- Array part (sequential integer keys starting from 1)
local array = {}
local start_time = os.clock()
for i = 1, iterations do
array[i] = i * 2
end
local array_time = os.clock() - start_time
-- Hash part (non-sequential or non-integer keys)
-- (this test also pays for building the string keys each iteration,
-- so it overstates the pure lookup difference)
local hash = {}
start_time = os.clock()
for i = 1, iterations do
hash["key" .. i] = i * 2
end
local hash_time = os.clock() - start_time
print(string.format("Array insertion: %.4f seconds", array_time))
print(string.format("Hash insertion: %.4f seconds", hash_time))
print(string.format("Array is %.2fx faster", hash_time / array_time))
end
test_array_vs_hash()
Table Preallocation
-- Efficient table preallocation
local function compare_table_growth()
local size = 100000
-- Growing table (inefficient)
local start_time = os.clock()
local growing_table = {}
for i = 1, size do
growing_table[i] = i
end
local grow_time = os.clock() - start_time
-- Preallocated table: a constructor sizes its array part to the number
-- of values it receives, so unpacking `size` nils presizes the table
-- (table.unpack is Lua 5.2+; use unpack in 5.1; the trick is
-- stack-limited and can fail for very large sizes)
start_time = os.clock()
local preallocated = {table.unpack({}, 1, size)}
for i = 1, size do
preallocated[i] = i
end
local prealloc_time = os.clock() - start_time
print(string.format("Growing table: %.4f seconds", grow_time))
print(string.format("Preallocated: %.4f seconds", prealloc_time))
-- Note: assigning nil to indices is NOT a size hint -- a table never
-- grows to store nil, so a "fill with nil" loop preallocates nothing
end
compare_table_growth()
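In LuaJIT, the `table.new` extension gives true preallocation of both the array and hash parts. A minimal sketch, guarded with `pcall` so it also loads on standard Lua (where no public preallocation API exists); the `make_buffer` helper name is illustrative:

```lua
-- LuaJIT extension: table.new(narray, nhash) returns a table with
-- `narray` array slots and `nhash` hash slots already allocated
local ok, table_new = pcall(require, "table.new")

local function make_buffer(size)
if ok then
return table_new(size, 0) -- array part presized, no hash slots
else
return {} -- standard Lua fallback: the table grows as needed
end
end

local buf = make_buffer(100000)
for i = 1, 100000 do
buf[i] = i
end
print(#buf) --> 100000
```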
Function Call Optimization
Minimize function call overhead:
-- Function call overhead
local function compare_function_calls()
local iterations = 1000000
-- Regular function calls
local function add(a, b)
return a + b
end
local start_time = os.clock()
local result = 0
for i = 1, iterations do
result = add(result, i)
end
local func_time = os.clock() - start_time
-- Inlined operations
start_time = os.clock()
result = 0
for i = 1, iterations do
result = result + i -- Inlined
end
local inline_time = os.clock() - start_time
print(string.format("Function calls: %.4f seconds", func_time))
print(string.format("Inlined code: %.4f seconds", inline_time))
print(string.format("Inline is %.2fx faster", func_time / inline_time))
end
compare_function_calls()
-- Avoiding closures in loops
local function closure_performance()
local iterations = 100000
local functions = {}
-- Inefficient: Allocating a new closure on every iteration
local start_time = os.clock()
for i = 1, iterations do
functions[i] = function() return i * 2 end
end
local closure_time = os.clock() - start_time
-- Efficient: Reuse function with parameter
local function multiplier(x)
return x * 2
end
start_time = os.clock()
local values = {}
for i = 1, iterations do
values[i] = multiplier(i)
end
local reuse_time = os.clock() - start_time
print(string.format("Closures: %.4f seconds", closure_time))
print(string.format("Reused function: %.4f seconds", reuse_time))
end
closure_performance()
Loop Optimization
Optimize common loop patterns:
-- Loop optimization techniques
local function loop_optimizations()
local data = {}
for i = 1, 10000 do
data[i] = i
end
-- Inefficient: Recomputing the length in a while-loop condition
-- (note: in a numeric for, the limit expression is evaluated only once,
-- so "for i = 1, #data" does not have this problem)
local start_time = os.clock()
local sum = 0
local i = 1
while i <= #data do -- #data evaluated on every iteration
sum = sum + data[i]
i = i + 1
end
local slow_time = os.clock() - start_time
-- Efficient: Cache length
start_time = os.clock()
sum = 0
local n = #data -- Calculate once
for i = 1, n do
sum = sum + data[i]
end
local fast_time = os.clock() - start_time
-- Alternative: generic for with ipairs (comparable to a cached-length
-- numeric for; relative speed depends on the Lua implementation)
start_time = os.clock()
sum = 0
for i, value in ipairs(data) do
sum = sum + value
end
local ipairs_time = os.clock() - start_time
print(string.format("Length in loop: %.6f seconds", slow_time))
print(string.format("Cached length: %.6f seconds", fast_time))
print(string.format("Using ipairs: %.6f seconds", ipairs_time))
end
loop_optimizations()
-- String concatenation optimization
local function string_concat_optimization()
local pieces = {}
for i = 1, 1000 do
pieces[i] = "part" .. i
end
-- Inefficient: String concatenation in loop
local start_time = os.clock()
local result = ""
for i = 1, #pieces do
result = result .. pieces[i] -- Creates new string each time
end
local concat_time = os.clock() - start_time
-- Efficient: table.concat
start_time = os.clock()
result = table.concat(pieces) -- Single operation
local table_concat_time = os.clock() - start_time
print(string.format("String concat: %.6f seconds", concat_time))
print(string.format("table.concat: %.6f seconds", table_concat_time))
print(string.format("table.concat is %.0fx faster",
concat_time / table_concat_time))
end
string_concat_optimization()
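`table.concat` also accepts a separator and an inclusive index range, which is often cheaper and clearer than concatenating a separator into every piece yourself:

```lua
local pieces = {"alpha", "beta", "gamma", "delta"}

-- table.concat(list, sep, i, j): join list[i..j] with sep between items
print(table.concat(pieces, ", "))      --> alpha, beta, gamma, delta
print(table.concat(pieces, "-", 2, 3)) --> beta-gamma
```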
Memory Optimization
Manage memory efficiently:
-- Memory usage optimization
local function memory_optimization_demo()
-- Inefficient: Storing unnecessary data
local inefficient_data = {}
for i = 1, 1000 do
inefficient_data[i] = {
id = i,
name = "Item " .. i,
description = "This is item number " .. i,
timestamp = os.time(),
metadata = {
created_by = "system",
version = 1.0,
tags = {"tag1", "tag2", "tag3"}
}
}
end
-- Efficient: Store only necessary data
local efficient_data = {}
for i = 1, 1000 do
efficient_data[i] = {
i, -- id (position 1)
"Item " .. i, -- name (position 2)
os.time() -- timestamp (position 3)
-- Store metadata separately if needed
}
end
-- Even more efficient: cache built strings by their small key. Lua
-- already interns (short) strings, so equal strings share storage
-- either way -- the saving here is the repeated concatenation work
local name_cache = {}
local function item_name(k)
local s = name_cache[k]
if not s then
s = "Item " .. k
name_cache[k] = s
end
return s
end
local interned_data = {}
for i = 1, 1000 do
interned_data[i] = {
i,
item_name(i % 10), -- reuse 10 cached names instead of rebuilding
os.time()
}
end
print("Memory optimization examples created")
print("Inefficient data uses more memory per record")
print("Caching repeated strings avoids redundant concatenation work")
end
memory_optimization_demo()
-- Garbage collection hints
local function gc_optimization()
print("Before optimization:", collectgarbage("count"), "KB")
-- Create some temporary data
local temp_data = {}
for i = 1, 100000 do
temp_data[i] = "temporary data " .. i
end
print("After creating data:", collectgarbage("count"), "KB")
-- Clear references
temp_data = nil
-- Suggest garbage collection (don't force it frequently)
collectgarbage("collect")
print("After cleanup:", collectgarbage("count"), "KB")
end
gc_optimization()
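Rather than forcing full collections, the collector can be tuned. A hedged sketch using the standard `"setpause"`/`"setstepmul"` knobs (exact defaults and semantics vary across Lua versions; in 5.4 these are part of the incremental-mode options):

```lua
-- "setpause": how long the collector waits (as a % of memory in use)
-- before starting a new cycle; higher = less frequent, more memory
local old_pause = collectgarbage("setpause", 150)

-- "setstepmul": how much work each incremental step does relative to
-- allocation; higher = GC keeps up better at the cost of bigger hiccups
local old_stepmul = collectgarbage("setstepmul", 300)

-- "step" runs a single increment of collection; useful for spreading
-- GC work across frames in games or iterations of an event loop
collectgarbage("step", 10)

print(string.format("previous pause=%d, stepmul=%d", old_pause, old_stepmul))
```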
Profiling and Measurement
Tools and techniques for performance analysis:
-- Simple profiler
local Profiler = {}
Profiler.__index = Profiler
function Profiler:new()
local obj = {
times = {},
counts = {},
start_times = {}
}
setmetatable(obj, self)
return obj
end
function Profiler:start(name)
self.start_times[name] = os.clock()
end
function Profiler:stop(name)
if not self.start_times[name] then
error("Profiler: No start time for " .. name)
end
local duration = os.clock() - self.start_times[name]
self.times[name] = (self.times[name] or 0) + duration
self.counts[name] = (self.counts[name] or 0) + 1
self.start_times[name] = nil
end
function Profiler:report()
print("\n=== Profiler Report ===")
local sorted_names = {}
for name in pairs(self.times) do
table.insert(sorted_names, name)
end
table.sort(sorted_names, function(a, b)
return self.times[a] > self.times[b]
end)
for _, name in ipairs(sorted_names) do
local total_time = self.times[name]
local count = self.counts[name]
local avg_time = total_time / count
print(string.format("%-20s: %8.4fs total, %6d calls, %8.6fs avg",
name, total_time, count, avg_time))
end
print("========================\n")
end
-- Usage example
local profiler = Profiler:new()
local function expensive_operation(n)
profiler:start("expensive_operation")
local result = 0
for i = 1, n do
result = result + math.sin(i) * math.cos(i)
end
profiler:stop("expensive_operation")
return result
end
local function fast_operation(n)
profiler:start("fast_operation")
local result = n * (n + 1) / 2 -- Simple arithmetic
profiler:stop("fast_operation")
return result
end
-- Profile different operations
for i = 1, 5 do
expensive_operation(10000)
fast_operation(10000)
end
profiler:report()
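Explicit start/stop pairs are easy to forget. The same idea can be packaged as a decorator that times any function on every call; a self-contained sketch (the `timed`/`totals` names are illustrative, and `{fn(...)}` drops trailing nils, which is fine for a profiler):

```lua
-- Accumulated running time per name
local totals = {}

-- Wrap fn so each call adds its duration to totals[name]
local function timed(name, fn)
return function(...)
local t0 = os.clock()
local results = {fn(...)}
totals[name] = (totals[name] or 0) + (os.clock() - t0)
return table.unpack(results) -- Lua 5.2+; use unpack in 5.1
end
end

local slow_sqrt = timed("sqrt", math.sqrt)
local x = slow_sqrt(16) -- x == 4
print(x, totals["sqrt"])
```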
LuaJIT Optimization
Leverage LuaJIT's just-in-time compilation:
-- LuaJIT-specific optimizations
-- Guard the require so this file still loads under standard Lua,
-- where the ffi module does not exist
local ffi_ok, ffi = pcall(require, "ffi")
if not ffi_ok then ffi = nil end
if ffi then
-- Define a C structure for better performance
ffi.cdef[[
typedef struct {
double x, y, z;
} point3d_t;
]]
end
local function luajit_optimization_demo()
-- Regular Lua table approach
local function create_points_lua(n)
local points = {}
for i = 1, n do
points[i] = {x = i, y = i * 2, z = i * 3}
end
return points
end
-- LuaJIT FFI approach (much faster)
local function create_points_ffi(n)
local points = ffi.new("point3d_t[?]", n)
for i = 0, n - 1 do
points[i].x = i + 1
points[i].y = (i + 1) * 2
points[i].z = (i + 1) * 3
end
return points
end
-- Vector operations - LuaJIT optimizable
local function vector_operations(points, n)
local sum = 0
for i = 1, n do
local p = points[i]
sum = sum + p.x * p.x + p.y * p.y + p.z * p.z
end
return sum
end
local function vector_operations_ffi(points, n)
local sum = 0
for i = 0, n - 1 do
local p = points[i]
sum = sum + p.x * p.x + p.y * p.y + p.z * p.z
end
return sum
end
local n = 100000
-- Benchmark Lua tables
local start_time = os.clock()
local lua_points = create_points_lua(n)
local lua_result = vector_operations(lua_points, n)
local lua_time = os.clock() - start_time
-- Benchmark FFI structures (LuaJIT only)
if ffi then
start_time = os.clock()
local ffi_points = create_points_ffi(n)
local ffi_result = vector_operations_ffi(ffi_points, n)
local ffi_time = os.clock() - start_time
print(string.format("Lua tables: %.4f seconds", lua_time))
print(string.format("FFI structs: %.4f seconds", ffi_time))
print(string.format("FFI is %.2fx faster", lua_time / ffi_time))
end
end
-- Only run LuaJIT demo if FFI is available
if pcall(require, "ffi") then
luajit_optimization_demo()
else
print("FFI not available (not running LuaJIT)")
end
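Besides FFI, LuaJIT exposes a `jit` library for inspecting and steering the compiler. A small sketch, guarded so it also runs on standard Lua:

```lua
local ok, jit = pcall(require, "jit")
if ok and jit then
print(jit.version)  -- e.g. a "LuaJIT 2.x" version string
print(jit.status()) -- true when JIT compilation is enabled
-- jit.off(fn) / jit.on(fn) exclude or include specific functions
-- from compilation, useful when diagnosing trace aborts
else
print("jit library not available (standard Lua)")
end
```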
Best Practices Summary
Do's
- Cache frequently accessed globals: Store math.sin, table.insert in locals
- Use local variables: Much faster than global access
- Preallocate tables: When you know the approximate size
- Use table.concat: For string building instead of .. operator
- Profile your code: Measure before optimizing
- Use ipairs or a numeric for over arrays: both avoid the overhead of pairs on sequential data; their relative speed depends on the Lua implementation
- Minimize function call overhead: Inline simple operations when performance critical
Don'ts
- Don't optimize prematurely: Profile first, then fix the bottlenecks
- Don't call collectgarbage() frequently: Let Lua manage memory
- Don't create functions in tight loops: Reuse functions when possible
- Don't use string concatenation in loops: Use table.concat instead
- Don't ignore table array/hash distinction: Sequential integer keys are faster
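Several of the do's above combine naturally at the top of a module: hoist hot globals into locals once, then use them everywhere below. A minimal sketch (the `render` function and its names are illustrative):

```lua
-- Hoist frequently used library functions into locals once per module
local sin, floor = math.sin, math.floor
local concat = table.concat

-- Build output pieces in a table, then join once with table.concat
local function render(samples)
local out = {}
for i = 1, #samples do
out[i] = tostring(floor(sin(samples[i]) * 100))
end
return concat(out, ",")
end

print(render({0, math.pi / 2})) --> 0,100
```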
Performance Testing Framework
-- Micro-benchmark framework
local function benchmark(name, func, iterations)
iterations = iterations or 1000000
-- Warm up
for i = 1, 100 do
func()
end
-- Measure
local start_time = os.clock()
for i = 1, iterations do
func()
end
local duration = os.clock() - start_time
print(string.format("%-25s: %.6f seconds (%d iterations)",
name, duration, iterations))
return duration
end
-- Example usage
local data = {}
for i = 1, 1000 do
data[i] = i
end
benchmark("table access", function()
local sum = 0
for i = 1, #data do
sum = sum + data[i]
end
end)
benchmark("ipairs iteration", function()
local sum = 0
for i, v in ipairs(data) do
sum = sum + v
end
end)
Common Pitfalls
- Optimizing before profiling - measure first, optimize the bottlenecks
- Over-optimizing non-critical code - focus on hot paths
- Ignoring memory allocation patterns - preallocate when possible
- Using wrong table access patterns - arrays vs hash tables
- Not considering LuaJIT-specific optimizations if using LuaJIT
- Micro-optimizations that hurt readability without significant gain
Checks for Understanding
- Why are local variables faster than global variables in Lua?
- What's the difference between array part and hash part of a Lua table?
- When should you use table.concat instead of string concatenation?
- What are the benefits of preallocating tables?
- How does LuaJIT improve performance compared to standard Lua?
- What's the first step before optimizing any code?
Answers
- Local variables are stored in registers/stack, while globals require hash table lookup in the global environment.
- Array part uses direct indexing (faster) for sequential integer keys starting from 1. Hash part uses hash table lookup for other keys.
- When building strings in loops or concatenating many strings - table.concat is O(n) while repeated .. is O(n²).
- Avoids table resizing overhead during growth, reduces memory fragmentation, and improves cache locality.
- LuaJIT uses just-in-time compilation to machine code, FFI for C struct access, and specialized optimizations for number-heavy code.
- Profile the code to identify actual bottlenecks - don't optimize based on assumptions.
Next Steps
Performance optimization is an iterative process. Start with profiling to identify bottlenecks, apply appropriate optimizations, and measure results. Remember that readable, maintainable code is often more valuable than micro-optimizations. Next, learn debugging techniques to troubleshoot performance and correctness issues.