JavaScript - RegExp
Overview
Estimated time: 40–50 minutes
Regular expressions (RegExp) let you search, extract, and transform text with patterns. This lesson covers literal versus constructor syntax, flags, capturing groups, lookarounds, and Unicode-safe matching.
Learning Objectives
- Create regex patterns with literals and the RegExp constructor.
- Use test, exec, match, and matchAll effectively.
- Capture with numbered and named groups and use them in replace.
- Apply anchors, boundaries, and lookaheads/lookbehinds for precise matches.
- Work with Unicode: the u flag, property escapes, and pitfalls.
Prerequisites
- JavaScript - Strings
- JavaScript - Arrays (recommended)
Creating regexes and flags
// Literal syntax
const rx1 = /cat/; // simple pattern
const rx2 = /c.at/i; // dot matches any character except line terminators; i = case-insensitive
// Constructor (useful for dynamic patterns; note escaping of backslashes)
const word = "hello";
const rx3 = new RegExp(`^${word}\\d+$`, 'gm');
// Common flags:
// g - global (find all matches)
// i - ignore case
// m - multiline (^ and $ match per line)
// s - dotAll (dot matches newlines)
// u - unicode (matches by code point, enables \p{...} and \u{...}, rejects invalid escapes)
// y - sticky (match at lastIndex only)
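When the dynamic part of a pattern comes from user input, characters such as `.` or `*` keep their regex meaning unless they are escaped first. The following is a minimal sketch, assuming a hypothetical helper named `escapeRegExp` (it is not part of the lesson, and the variable names are illustrative):

```javascript
// Hypothetical helper (an assumption, not from the lesson): escape regex
// metacharacters so that the text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const userInput = 'a.b';
const unsafe = new RegExp(`^${userInput}$`);
const safe = new RegExp(`^${escapeRegExp(userInput)}$`);

console.log(unsafe.test('axb')); // true: the dot still matches any character
console.log(safe.test('axb'));   // false: the dot is now literal
console.log(safe.test('a.b'));   // true
```

Escaping before interpolation keeps the constructor useful for dynamic patterns without letting the input change the pattern's meaning.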
Testing and matching
const str = 'Cat catalog concatenation';
/cat/i.test(str); // true
// match vs matchAll
const res1 = str.match(/cat/gi); // ['Cat','cat','cat'] ("concatenation" also contains "cat")
// matchAll returns an iterator with groups; spread to array
const matches = [...str.matchAll(/c(at)/gi)];
// each item is an array such as ['Cat', 'at'] with extra properties: index, input, groups
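To make the shape of the `matchAll` results concrete, here is a small sketch that reuses the sample string above and collects each match with its first capture group and position. The names `text`, `found`, and `first` are illustrative.

```javascript
const text = 'Cat catalog concatenation';

// matchAll requires the g flag; each result carries its capture groups and index.
const found = [...text.matchAll(/c(at)/gi)].map(m => ({
  match: m[0],
  group1: m[1],
  index: m.index,
}));

console.log(found);
// [ { match: 'Cat', group1: 'at', index: 0 },
//   { match: 'cat', group1: 'at', index: 4 },
//   { match: 'cat', group1: 'at', index: 15 } ]

// Without g, match returns only the first match, including groups and index.
const first = text.match(/c(at)/i);
console.log(first[0], first[1], first.index); // Cat at 0
```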
exec loops and lastIndex
// exec with global or sticky keeps state via lastIndex
const rx = /a./g;
const s = 'a1 a2 a3';
let m;
while ((m = rx.exec(s)) !== null) {
  console.log(m[0], 'at', m.index);
}
// Beware: reusing a global regex across different strings can lead to surprises due to lastIndex.
// Prefer creating a fresh regex or reset lastIndex = 0.
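The surprise mentioned above is easiest to see with `test`, which also advances `lastIndex` on a global regex. A short sketch, with `hasDigit` as an illustrative name:

```javascript
const hasDigit = /\d/g; // the g flag makes test() stateful

console.log(hasDigit.test('a1')); // true
console.log(hasDigit.lastIndex);  // 2: the next search starts here
console.log(hasDigit.test('b2')); // false: the search started at index 2, past the digit
console.log(hasDigit.lastIndex);  // 0: a failed match resets lastIndex

// Two ways to avoid the surprise:
hasDigit.lastIndex = 0;           // 1. reset before reuse
console.log(hasDigit.test('b2')); // true
console.log(/\d/.test('b2'));     // 2. drop g when only a yes/no answer is needed: true
```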
Capturing groups and replace
// Reorder YYYY-MM-DD to DD/MM/YYYY
'2025-09-05'.replace(/(\d{4})-(\d{2})-(\d{2})/, '$3/$2/$1'); // '05/09/2025'
// Named groups (modern engines)
const m2 = '2025-09-05'.match(/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/);
// m2.groups.y is '2025', m2.groups.m is '09', m2.groups.d is '05'
// Access by name in replace
'2025-09-05'.replace(/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/, '$<d>/$<m>/$<y>'); // '05/09/2025'
// Replace with a function for flexible transformations
'foo-12 bar-34'.replace(/(\w+)-(\d+)/g, (m, name, num) => `${name}:${Number(num)*2}`);
// 'foo:24 bar:68'
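Named groups can also be read through destructuring, and a replacer function receives them as its final argument. The sketch below uses the longer group names `year`, `month`, and `day`, chosen here for illustration only.

```javascript
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

// Read named captures from the groups object.
const { year, month, day } = '2025-09-05'.match(datePattern).groups;
console.log(year, month, day); // 2025 09 05

// In a replacer function, the named groups object is the last argument.
const reordered = 'Due 2025-09-05'.replace(datePattern, (...args) => {
  const groups = args[args.length - 1];
  return `${groups.day}/${groups.month}/${groups.year}`;
});
console.log(reordered); // Due 05/09/2025
```

Names tend to survive pattern edits better than positions: adding another group earlier in the pattern shifts `$1`, `$2`, and so on, but leaves `$<day>` intact.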
Anchors, boundaries, and lookarounds
// Anchors and boundaries
/^\w+$/m.test('hello'); // true; ^ and $ anchor the start and end (of each line, with the m flag)
/\bcat\b/i.test('a cat!'); // true; \b is a word boundary, so this is false for 'concatenate'
// Lookaheads and lookbehinds
const s2 = 'Item: A-12, B-07';
// Match code letters followed by a hyphen and digits (only the letters are part of the match)
const ahead = /[A-Z]+(?=-\d+)/g; // positive lookahead
[...s2.matchAll(ahead)].map(m => m[0]); // ['A','B']
// Extract digits preceded by letters and hyphen (lookbehind)
const behind = /(?<=[A-Z]+-)\d+/g; // positive lookbehind
[...s2.matchAll(behind)].map(m => m[0]); // ['12','07']
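The examples above use positive lookarounds. Their negative counterparts, `(?!...)` and `(?<!...)`, assert that something does not follow or precede the match. A minimal sketch with made-up sample data:

```javascript
const prices = 'USD 40, EUR 25, USD 7, 99';

// Negative lookbehind: numbers NOT preceded by "USD ".
// The \b keeps a match from starting in the middle of a number.
const notUsd = /\b(?<!USD )\d+/g;
console.log(prices.match(notUsd)); // ['25', '99']

const files = ['report.txt', 'notes.bak', 'data.csv'];

// Negative lookahead: names that do not end in ".bak".
const notBackup = /^(?!.*\.bak$).+$/;
console.log(files.filter(f => notBackup.test(f))); // ['report.txt', 'data.csv']
```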
Unicode and property escapes
// Use the 'u' flag for full code point support and property escapes
const emoji = 'A😀B';
/^.{3}$/.test(emoji); // false: without 'u' the emoji counts as two UTF-16 code units (4 in total)
/^.{3}$/u.test(emoji); // true: with 'u' the dot matches whole code points (3 in total)
// Unicode properties (requires 'u')
const words = 'über España 東京 123';
const rxWords = /\p{L}+/gu; // one or more letters from any script
[...words.matchAll(rxWords)].map(m => m[0]);
// ['über','España','東京']
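The difference between code units and code points, and between `\w` and `\p{L}`, can be checked directly. In this sketch the strings are written with escapes so the exact code points are unambiguous; the variable names are illustrative.

```javascript
const word = '\u00FCber'; // "über" with a precomposed ü

// \w stays ASCII-only, even with the u flag, so it misses "ü".
console.log(/^\w+$/u.test(word));    // false
// \p{L} matches letters from any script.
console.log(/^\p{L}+$/u.test(word)); // true

// Counting by UTF-16 code unit vs by code point.
const emojiText = 'A\u{1F600}B'; // "A😀B"
console.log(emojiText.length);              // 4 code units
console.log([...emojiText].length);         // 3 code points
console.log(emojiText.match(/./g).length);  // 4: without u, each surrogate half matches
console.log(emojiText.match(/./gu).length); // 3: with u, the emoji is one match
```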
Common Pitfalls
- Escaping: When building patterns with RegExp, you must double-escape backslashes (e.g., "\\d").
- Global state: /g and /y modify lastIndex. Don’t reuse a stateful regex across unrelated strings.
- ASCII classes: \w, \d, \b are ASCII-centric. For international text use the u flag and Unicode properties.
- Serialization: Regexes aren’t JSON-serializable. Store the pattern and flags separately if needed.
- Lookbehind support: Older environments may lack lookbehind; feature-detect or provide fallbacks.
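Two of the pitfalls above lend themselves to a short sketch: storing a regex as its `source` and `flags`, and detecting lookbehind support. The function name `supportsLookbehind` is hypothetical, and this is only one way to approach either problem.

```javascript
// Serialization: store source and flags, then rebuild with the constructor.
const original = /(\d+)-(\d+)/gi;
const json = JSON.stringify({ source: original.source, flags: original.flags });
const parsed = JSON.parse(json);
const restored = new RegExp(parsed.source, parsed.flags);
console.log(restored.source === original.source, restored.flags); // true gi

// Stringifying the regex object directly loses the pattern.
console.log(JSON.stringify({ rx: original })); // {"rx":{}}

// Feature detection (hypothetical helper): an unsupported lookbehind throws a
// SyntaxError when the regex is constructed. Building it from a string keeps
// the error catchable instead of failing when the whole script is parsed.
function supportsLookbehind() {
  try {
    new RegExp('(?<=a)b');
    return true;
  } catch (err) {
    return false;
  }
}
console.log(supportsLookbehind());
```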
Try it
Run to extract codes using groups and lookarounds:
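One possible version of this exercise, as a sketch: it reuses the sample string from the lookaround section and extracts the codes first with named groups, then with lookarounds. The names `codePattern`, `codes`, `lettersOnly`, and `digitsOnly` are illustrative.

```javascript
const input = 'Item: A-12, B-07';

// Named groups: capture the letter part and the numeric part of each code.
const codePattern = /(?<letters>[A-Z]+)-(?<digits>\d+)/g;
const codes = [...input.matchAll(codePattern)].map(m => ({
  letters: m.groups.letters,
  number: Number(m.groups.digits),
}));
console.log(codes); // [ { letters: 'A', number: 12 }, { letters: 'B', number: 7 } ]

// Lookarounds: match one part while only asserting that the other is present.
const lettersOnly = input.match(/[A-Z]+(?=-\d)/g);
const digitsOnly = input.match(/(?<=[A-Z]-)\d+/g);
console.log(lettersOnly); // ['A', 'B']
console.log(digitsOnly);  // ['12', '07']
```

Try changing the input string or adding a code such as `AB-100` to see how each pattern responds.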