JavaScript - RegExp

Overview

Estimated time: 40–50 minutes

Regular expressions (RegExp) let you search, extract, and transform text with patterns. This lesson covers regex literals vs. the RegExp constructor, flags, capturing groups, lookarounds, and Unicode-safe matching.

Learning Objectives

  • Create regex patterns with literals and the RegExp constructor.
  • Use test, exec, match, and matchAll effectively.
  • Capture with numbered and named groups and use them in replace.
  • Apply anchors, boundaries, and lookaheads/lookbehinds for precise matches.
  • Work with Unicode: u flag, property escapes, and pitfalls.

Prerequisites

Creating regexes and flags

// Literal syntax
const rx1 = /cat/;              // simple pattern
const rx2 = /c.at/i;            // . matches any char except line terminators; i = case-insensitive

// Constructor (useful for dynamic patterns). Inside a string, write "\\d" to get \d,
// and escape interpolated text if it may contain metacharacters such as . or *
const word = "hello";
const rx3 = new RegExp(`^${word}\\d+$`, 'gm');

// Common flags:
// g - global (find all matches)
// i - ignore case
// m - multiline (^ and $ match per line)
// s - dotAll (dot matches newlines)
// u - unicode (code point matching, \p{...} property escapes, stricter escape syntax)
// y - sticky (match at lastIndex only)
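The constructor example interpolates word directly, so any metacharacter in it would change the meaning of the pattern. Below is a minimal sketch of escaping dynamic text first, followed by a quick check of the m and s flags. escapeRegExp is a hypothetical helper written for this example, not a built-in. It backslash-escapes every character that has special meaning in a pattern.

```javascript
// Hypothetical helper (not a built-in): escapes regex metacharacters in a string
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const userInput = 'a.b';
const naive = new RegExp(`^${userInput}$`);              // '.' stays a wildcard
const safe = new RegExp(`^${escapeRegExp(userInput)}$`); // '.' becomes a literal dot

console.log(naive.test('axb')); // true
console.log(safe.test('axb'));  // false
console.log(safe.source);       // ^a\.b$

// m and s change how ^, $ and . treat line breaks
const twoLines = 'first\nsecond';
console.log(/^second$/.test(twoLines));      // false: without m, ^ and $ only match at the input's edges
console.log(/^second$/m.test(twoLines));     // true: with m, ^ and $ also match at line breaks
console.log(/first.second/.test(twoLines));  // false: . does not match \n by default
console.log(/first.second/s.test(twoLines)); // true: s lets . match \n

// The flags of a regex are exposed as a string
console.log(new RegExp('x', 'gim').flags); // gim
```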

Testing and matching

const str = 'Cat catalog concatenation';
/cat/i.test(str);          // true

// match vs matchAll
const res1 = str.match(/cat/gi);     // ['Cat', 'cat', 'cat']

// matchAll requires the g flag and returns an iterator of match arrays; spread it into an array
const matches = [...str.matchAll(/c(at)/gi)];
// each item is an array like ['Cat', 'at'] with index, input, and groups properties
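To show the difference between match and matchAll concretely, this sketch prints what each one returns for the same string. It uses only the built-in String methods. The array formatting in the comments is Node's console output and may look slightly different in a browser.

```javascript
const phrase = 'Cat catalog concatenation';

// Without g, match returns the first match with its groups and index
const first = phrase.match(/c(at)/i);
console.log(first[0], first[1], first.index); // Cat at 0

// With g, match returns only the matched strings (no groups, no index)
console.log(phrase.match(/cat/gi)); // [ 'Cat', 'cat', 'cat' ]

// matchAll yields a full match array per hit; index is a property, not an element
for (const hit of phrase.matchAll(/c(at)/gi)) {
  console.log(hit[0], hit[1], hit.index);
}
// Cat at 0
// cat at 4
// cat at 15

// matchAll throws a TypeError when the regex lacks the g flag
try {
  phrase.matchAll(/cat/i);
} catch (err) {
  console.log(err instanceof TypeError); // true
}
```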

exec loops and lastIndex

// exec with global or sticky keeps state via lastIndex
const rx = /a./g;
const s = 'a1 a2 a3';
let m;
while ((m = rx.exec(s)) !== null) {
  console.log(m[0], 'at', m.index);
}
// Beware: reusing a global regex across different strings can cause surprises, because each search resumes at lastIndex.
// Prefer creating a fresh regex, or reset lastIndex to 0 before reuse.
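The lastIndex warning is easiest to see with test(), which also advances lastIndex on a global regex. This sketch shows the surprise and the reset. It also shows how the sticky flag differs from g: a sticky regex matches only at exactly lastIndex and never scans ahead.

```javascript
const rxA = /a\d/g;

console.log(rxA.test('a1')); // true
console.log(rxA.lastIndex);  // 2: the next search will start at index 2
console.log(rxA.test('a2')); // false: 'a2' is searched from index 2, past its only match
console.log(rxA.lastIndex);  // 0: a failed search resets lastIndex

// Resetting explicitly before switching strings avoids the surprise
rxA.test('a1');
rxA.lastIndex = 0;
console.log(rxA.test('a2')); // true

// Sticky (y) matches only at exactly lastIndex
const stickyDigits = /\d+/y;
const input = 'ab 42';
stickyDigits.lastIndex = 3;
console.log(stickyDigits.exec(input)[0]); // 42
stickyDigits.lastIndex = 2;
console.log(stickyDigits.exec(input));    // null: index 2 is a space
```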

Capturing groups and replace

// Reorder YYYY-MM-DD to DD/MM/YYYY
'2025-09-05'.replace(/(\d{4})-(\d{2})-(\d{2})/, '$3/$2/$1'); // '05/09/2025'

// Named groups (modern engines)
const m2 = '2025-09-05'.match(/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/);
// m2.groups.y === '2025', m2.groups.m === '09', m2.groups.d === '05'
// Refer to named groups in the replacement string with $<name>
'2025-09-05'.replace(/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/, '$<d>/$<m>/$<y>'); // '05/09/2025'

// Replace with a function for flexible transformations
'foo-12 bar-34'.replace(/(\w+)-(\d+)/g, (m, name, num) => `${name}:${Number(num)*2}`);
// 'foo:24 bar:68'
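Named groups can also be read from the match result and from inside a replacement function. The sketch below assumes an engine that supports named groups (ES2018 or later) and String.prototype.replaceAll (ES2021 or later).

```javascript
const isoDate = '2025-09-05';
const rxDate = /(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/;

// Named groups are available on the match's groups object
const { groups } = isoDate.match(rxDate);
console.log(groups.y, groups.m, groups.d); // 2025 09 05

// When the pattern has named groups, the replacer's last argument is that groups object
const reordered = isoDate.replace(rxDate, (...args) => {
  const g = args[args.length - 1];
  return `${g.d}/${g.m}/${g.y}`;
});
console.log(reordered); // 05/09/2025

// replaceAll with a regex requires the g flag (otherwise it throws a TypeError)
console.log('x=1, y=2'.replaceAll(/(\w)=(\d)/g, '$2=$1')); // 1=x, 2=y

// $& inserts the whole match; $$ inserts a literal dollar sign
console.log('cost 5'.replace(/\d+/, '$$$&')); // cost $5
```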

Anchors, boundaries, and lookarounds

// Anchors and boundaries
/^\w+$/m.test('hello');      // true; ^ and $ anchor start and end (with m, also at each line)
/\bcat\b/i.test('a cat!');  // true; word boundaries, so 'concatenate' would not match

// Lookaheads and lookbehinds
const s2 = 'Item: A-12, B-07';
// Match code letters followed by a hyphen and digits; the lookahead is not part of the match, so only the letters are returned
const ahead = /[A-Z]+(?=-\d+)/g; // positive lookahead
[...s2.matchAll(ahead)].map(m => m[0]); // ['A','B']

// Extract digits preceded by letters and a hyphen (lookbehind)
const behind = /(?<=[A-Z]+-)\d+/g;  // positive lookbehind
[...s2.matchAll(behind)].map(m => m[0]); // ['12','07']
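Here is a sketch of a few more boundary and lookaround cases. The negative forms assert that something is not adjacent. The negative lookahead must exclude digits as well as %. Otherwise \d+ backtracks to a shorter number that passes the check.

```javascript
// \b requires a word/non-word transition on each side of 'cat'
const wholeWord = /\bcat\b/i;
console.log(wholeWord.test('a cat!'));      // true
console.log(wholeWord.test('concatenate')); // false

// Negative lookahead: numbers NOT followed by '%'
const promo = '50% off, 20 left';
console.log(promo.match(/\d+(?!%)/g));     // [ '5', '20' ]: backtracking lets '5' slip through
console.log(promo.match(/\d+(?![\d%])/g)); // [ '20' ]: excluding \d blocks the backtrack

// Negative lookbehind: numbers NOT preceded by a letter or another digit
const prices = 'USD100 EUR200 300';
console.log(prices.match(/(?<![A-Z\d])\d+/g)); // [ '300' ]
```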

Unicode and property escapes

// Use the 'u' flag for full code point support and property escapes
const emoji = 'A😀B';
/^.{3}$/.test(emoji);        // false: without u, the emoji is two code units (4 in total)
/^.{3}$/u.test(emoji);       // true: with u, . matches a whole code point (3 in total)

// Unicode properties (requires 'u')
const words = 'über España 東京 123';
const rxWords = /\p{L}+/gu;   // one or more letters from any script
[...words.matchAll(rxWords)].map(m => m[0]);
// ['über','España','東京']
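The sketch below contrasts the ASCII class \w with property escapes and shows a character-class case where the u flag matters. It assumes an engine that supports Unicode property escapes (ES2018 or later). The sample word is written with an explicit \u00FC escape so the ü is guaranteed to be a single precomposed code point.

```javascript
const sample = '\u00FCber'; // 'über' with a precomposed ü (U+00FC)
console.log(/^\w+$/u.test(sample));    // false: \w stays [A-Za-z0-9_] even with u
console.log(/^\p{L}+$/u.test(sample)); // true: \p{L} covers letters in any script

// Script property escapes select characters from one writing system
console.log('東京 Tokyo'.match(/\p{Script=Han}+/gu)); // [ '東京' ]

// An astral character in a class: without u the class holds two separate code units
console.log(/^[😀]$/.test('😀'));  // false
console.log(/^[😀]$/u.test('😀')); // true
```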

Common Pitfalls

  • Escaping: When building patterns from strings with new RegExp(...), double every backslash (e.g., "\\d" for \d), and escape interpolated text that may contain metacharacters.
  • Global state: /g and /y modify lastIndex. Don’t reuse a stateful regex across unrelated strings.
  • ASCII classes: \w, \d, and \b are ASCII-based and stay that way even with the u flag. For international text, use the u flag with Unicode property escapes such as \p{L}.
  • Serialization: Regexes aren’t JSON-serializable (JSON.stringify(/x/) produces "{}"). Store the source and flags separately and rebuild with new RegExp(source, flags).
  • Lookbehind support: Older environments may lack lookbehind; feature-detect or provide fallbacks.
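Two of these pitfalls are easy to show in a few lines: rebuilding a regex from its stored source and flags, and detecting lookbehind support. supportsLookbehind is a hypothetical helper name. It relies on the RegExp constructor throwing a SyntaxError for syntax the engine doesn't understand. The pattern has to be built from a string, because a lookbehind written as a literal would fail when the whole script is parsed, before the try block ever runs.

```javascript
// Store a regex as plain data, then rebuild it
const original = /c(at)/gi;
const json = JSON.stringify({ source: original.source, flags: original.flags });
console.log(json); // {"source":"c(at)","flags":"gi"}

const parsed = JSON.parse(json);
const restored = new RegExp(parsed.source, parsed.flags);
console.log(restored.test('Cat')); // true

// A regex object itself serializes to an empty object
console.log(JSON.stringify(/x/)); // {}

// Hypothetical helper: feature-detect lookbehind support at runtime
function supportsLookbehind() {
  try {
    new RegExp('(?<=a)b'); // built from a string so older parsers can still load the script
    return true;
  } catch {
    return false;
  }
}
console.log(supportsLookbehind()); // true in engines that support lookbehind
```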

Try it

Run to extract codes using groups and lookarounds: