Groups and capturing

In this tutorial, you will learn how to use parentheses to group parts of a pattern and extract specific portions of matched text. Groups are one of the most useful features in regular expressions, allowing you to organise patterns and pull out the exact data you need.

Time commitment: 15–20 minutes

Prerequisites:

Completion of Character classes and quantifiers
Basic Python knowledge (strings, variables, and functions)

Learning objectives

By the end of this tutorial, you will be able to:

Use parentheses to create capturing groups
Extract matched text with .group(), .groups(), and .groupdict()
Use named groups with (?P<name>...)
Apply non-capturing groups with (?:...)
Use the alternation operator | with groups

import re

Capturing groups

Capturing groups are created by enclosing part of a pattern in parentheses (...). They serve two purposes:

They group elements together so that quantifiers apply to the whole group
They capture the matched text so you can extract it later

Let us start with a simple example: extracting the day, month, and year from a date.

text = 'The event is on 25/12/2026'
match = re.search(r'(\d{2})/(\d{2})/(\d{4})', text)

if match:
    print(f'Full match: {match.group()}')
    print(f'Day:   {match.group(1)}')
    print(f'Month: {match.group(2)}')
    print(f'Year:  {match.group(3)}')

Each pair of parentheses creates a numbered group, starting from 1 and counted by the position of the opening parenthesis. Calling .group(0) (or simply .group()) returns the entire match, whilst .group(1), .group(2), and so on return the text captured by each group.
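
As a small supplementary sketch (not part of the original notebook), .group() also accepts several group numbers at once and returns a tuple, and from Python 3.6 onwards a match object can be indexed directly. Requesting a group that the pattern does not define raises IndexError.

```python
import re

text = 'The event is on 25/12/2026'
match = re.search(r'(\d{2})/(\d{2})/(\d{4})', text)

if match:
    # Passing several group numbers returns a tuple of those groups
    day, year = match.group(1, 3)
    print(f'Day and year: {day}, {year}')

    # Subscript access is equivalent to .group(n) (Python 3.6+)
    print(f'Month via index: {match[2]}')

    # The pattern defines only three groups, so group 4 does not exist
    try:
        match.group(4)
    except IndexError as error:
        print(f'No such group: {error}')
```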

You can also use .groups() to get all captured groups as a tuple.

text = 'The event is on 25/12/2026'
match = re.search(r'(\d{2})/(\d{2})/(\d{4})', text)

if match:
    day, month, year = match.groups()
    print(f'Date: {day}/{month}/{year}')
    print(f'All groups: {match.groups()}')
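
One detail worth knowing before unpacking .groups(): an optional group that takes no part in the match is reported as None. The sketch below assumes a made-up date format in which the year is optional, and also shows the default argument of .groups(), which replaces None with a value of your choice.

```python
import re

# Hypothetical pattern for illustration: the '/YYYY' part is optional
pattern = re.compile(r'(\d{2})/(\d{2})(/\d{4})?')

with_year = pattern.search('Due 25/12/2026')
without_year = pattern.search('Due 25/12')

# All three groups took part in this match
print(with_year.groups())

# The optional third group did not take part, so it is None
print(without_year.groups())

# A default value can stand in for groups that did not participate
print(without_year.groups(default='unknown'))
```

Unpacking still works in both cases because .groups() always returns one item per group in the pattern, whether or not that group matched.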

Named groups

Numbered groups can be hard to read, especially with many groups. Named groups solve this by letting you assign a name to each group using the syntax (?P<name>...).

text = 'The event is on 25/12/2026'
match = re.search(r'(?P<day>\d{2})/(?P<month>\d{2})/(?P<year>\d{4})', text)

if match:
    print(f'Day:   {match.group("day")}')
    print(f'Month: {match.group("month")}')
    print(f'Year:  {match.group("year")}')

Named groups are especially useful when working with complex patterns. You can also access them through .groupdict(), which returns a dictionary of all named groups.

text = 'Contact: Alice Smith, alice@example.com'
pattern = re.compile(
    r'(?P<name>[A-Z][a-z]+ [A-Z][a-z]+), (?P<email>[\w.]+@[\w.]+)'
)
match = pattern.search(text)

if match:
    details = match.groupdict()
    print(f'Group dictionary: {details}')
    print(f'Name:  {details["name"]}')
    print(f'Email: {details["email"]}')
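
Named groups do not replace numbering: each one still receives a number based on its position. The following sketch reuses the contact pattern above to show this, using the compiled pattern's groupindex attribute, which maps names to numbers.

```python
import re

pattern = re.compile(
    r'(?P<name>[A-Z][a-z]+ [A-Z][a-z]+), (?P<email>[\w.]+@[\w.]+)'
)
match = pattern.search('Contact: Alice Smith, alice@example.com')

# groupindex maps each group name to its group number
print(dict(pattern.groupindex))

if match:
    # A named group can still be read by its number
    print(match.group(1) == match.group('name'))

    # .groups() includes named groups, in pattern order
    print(match.groups())
```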

Non-capturing groups

Sometimes you need parentheses for grouping (for example, to apply a quantifier to a group of characters) but you do not need to capture the matched text. Use (?:...) to create a non-capturing group.

# Capturing group: the prefix is captured
text = 'http://example.com'
match = re.search(r'(https?)://(.+)', text)
if match:
    print(f'Groups with capturing: {match.groups()}')

# Non-capturing group: the prefix is not captured
match = re.search(r'(?:https?)://(.+)', text)
if match:
    print(f'Groups with non-capturing: {match.groups()}')

In the first example, match.groups() returns two items (the protocol and the domain). In the second example, only the domain is captured because the protocol group uses (?:...). Non-capturing groups are useful for keeping your group numbering clean.
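
To illustrate what "clean numbering" means in practice, here is a minimal sketch with a made-up phone number format and an optional prefix. With a capturing prefix, every later group shifts by one. With a non-capturing prefix, the groups you care about keep their positions.

```python
import re

# Hypothetical phone format for illustration: optional '+44 ' prefix
text = 'Call +44 0161 496000 today'

capturing = re.search(r'(\+44 )?(\d{4}) (\d{6})', text)
non_capturing = re.search(r'(?:\+44 )?(\d{4}) (\d{6})', text)

# With a capturing prefix, the area code is pushed to group 2
print(capturing.groups())

# With a non-capturing prefix, the area code stays in group 1
print(non_capturing.groups())
print(non_capturing.group(1))
```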

Alternation with groups

The alternation operator | works like a logical OR. When combined with groups, it lets you match one of several alternatives.

# Match different file extensions
filenames = ['report.pdf', 'image.png', 'data.csv', 'script.py', 'notes.txt']

pattern = re.compile(r'\w+\.(?:pdf|png|csv)')

for name in filenames:
    if pattern.fullmatch(name):
        print(f'"{name}" → matched')
    else:
        print(f'"{name}" → not matched')

# Use a capturing group to also extract the extension
pattern = re.compile(r'(\w+)\.(pdf|png|csv)')

for name in filenames:
    match = pattern.fullmatch(name)
    if match:
        print(f'"{name}" → base: "{match.group(1)}", extension: "{match.group(2)}"')
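
The examples above put the alternatives inside a group for a reason: | has the lowest precedence of any regex operator, so without a group it splits the entire pattern. A minimal sketch with a made-up sample string:

```python
import re

text = 'grey and gray'

# Without a group, this means 'gra' OR 'ey'
unscoped = re.findall(r'gra|ey', text)
print(unscoped)

# With a group, the alternation is limited to the single vowel
scoped = re.findall(r'gr(?:a|e)y', text)
print(scoped)
```

A non-capturing group is used here so that re.findall() returns the full matches rather than only the captured vowel.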

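Groups with quantifiers

You can apply a quantifier to a group to repeat the entire group pattern. Quantifiers can also appear inside a group, where they affect only the element they follow, as in the IP address example that comes first.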

# Match an IP address (simplified)
text = 'Server IP: 192.168.1.100'
match = re.search(r'(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})', text)

if match:
    print(f'Full IP: {match.group()}')
    print(f'Octets: {match.groups()}')

# Use a non-capturing group with a quantifier to match repeated patterns
# Match a reference like "ABC-123-XYZ": three uppercase letters, then two
# hyphen-prefixed blocks of three uppercase letters or digits
text = 'Reference: ABC-123-XYZ'
match = re.search(r'[A-Z]{3}(?:-[A-Z0-9]{3}){2}', text)

if match:
    print(f'Reference found: {match.group()}')
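
If the repeated group is a capturing group, be aware that it keeps only the text from its last repetition. The sketch below rewrites the IP address pattern with a quantified capturing group to show this behaviour.

```python
import re

text = 'Server IP: 192.168.1.100'

# The capturing group is repeated three times by {3}
match = re.search(r'(\d{1,3}\.){3}\d{1,3}', text)

if match:
    print(f'Full IP: {match.group()}')
    # Only the final repetition is kept in group 1
    print(f'Group 1: {match.group(1)}')
    print(f'Groups: {match.groups()}')
```

This is why the earlier IP address example writes out four separate groups: that is the way to capture every octet. When you only need the repetition and not the captured text, a non-capturing group avoids the confusion.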

Groups with re.findall()

When you use re.findall() with a pattern that contains capturing groups, it returns the captured groups rather than the full matches. This is a common source of confusion, but it is also very useful for extracting data.

text = 'Dates: 25/12/2026, 01/01/2027, 14/02/2027'

# Without groups: returns full matches
print('Without groups:', re.findall(r'\d{2}/\d{2}/\d{4}', text))

# With one group: returns list of captured strings
print('One group (year):', re.findall(r'\d{2}/\d{2}/(\d{4})', text))

# With multiple groups: returns list of tuples
print('Multiple groups:', re.findall(r'(\d{2})/(\d{2})/(\d{4})', text))
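
If you need grouping and the full matches, two approaches are sketched below. The first uses non-capturing groups so that re.findall() has nothing to capture. The second uses finditer(), which yields match objects and therefore keeps both the full match and the groups available.

```python
import re

text = 'Dates: 25/12/2026, 01/01/2027, 14/02/2027'

# Option 1: non-capturing groups, so findall() returns full matches
no_capture = re.findall(r'(?:\d{2}/){2}\d{4}', text)
print(no_capture)

# Option 2: finditer() yields match objects with full match and groups
pattern = re.compile(r'(\d{2})/(\d{2})/(\d{4})')
for match in pattern.finditer(text):
    print(match.group(), match.groups())

full_matches = [m.group() for m in pattern.finditer(text)]
years = [m.group(3) for m in pattern.finditer(text)]
```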

A practical example: parsing log entries

Let us combine everything to parse a realistic log entry.

log_entry = '2026-02-09 14:30:45 [ERROR] Connection timeout after 30s'

pattern = re.compile(
    r'(?P<date>\d{4}-\d{2}-\d{2})\s'
    r'(?P<time>\d{2}:\d{2}:\d{2})\s'
    r'\[(?P<level>\w+)\]\s'
    r'(?P<message>.+)'
)

match = pattern.search(log_entry)
if match:
    info = match.groupdict()
    for key, value in info.items():
        print(f'{key:>10}: {value}')
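
In practice you would usually parse many lines, some of which may not fit the pattern. The following sketch applies the same pattern to a few hypothetical log lines (invented for illustration), collects one dictionary per valid line, and skips lines that do not match.

```python
import re

# Hypothetical log lines for illustration; the last one is malformed
log_lines = [
    '2026-02-09 14:30:45 [ERROR] Connection timeout after 30s',
    '2026-02-09 14:31:02 [INFO] Retrying connection',
    'not a log line',
]

pattern = re.compile(
    r'(?P<date>\d{4}-\d{2}-\d{2})\s'
    r'(?P<time>\d{2}:\d{2}:\d{2})\s'
    r'\[(?P<level>\w+)\]\s'
    r'(?P<message>.+)'
)

parsed = []
for line in log_lines:
    # fullmatch() requires the whole line to fit the pattern
    match = pattern.fullmatch(line)
    if match:
        parsed.append(match.groupdict())
    else:
        print(f'Skipped: {line!r}')

# Named groups make it easy to filter on a field afterwards
errors = [entry['message'] for entry in parsed if entry['level'] == 'ERROR']
print(errors)
```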

Exercises

Exercise 1

Write a pattern with named groups to extract the hours and minutes from a time string in HH:MM format. Test it on 'Meeting at 14:30'.

text = 'Meeting at 14:30'

# Your code here

Exercise 2

Use re.findall() with capturing groups to extract all the names and ages from the following text.

text = 'Alice is 30, Bob is 25, and Charlie is 35'

# Your code here

Exercise 3

Write a pattern using alternation to match both 'colour' and 'color'. Use a non-capturing group to avoid capturing the optional 'u'.

texts = ['I like this colour', 'I like this color', 'colourful display']

# Your code here

Solutions

# Exercise 1
text = 'Meeting at 14:30'
match = re.search(r'(?P<hours>\d{2}):(?P<minutes>\d{2})', text)

if match:
    print(f'Hours:   {match.group("hours")}')
    print(f'Minutes: {match.group("minutes")}')

# Exercise 2
text = 'Alice is 30, Bob is 25, and Charlie is 35'
results = re.findall(r'(\w+) is (\d+)', text)

for name, age in results:
    print(f'{name} is {age} years old')

# Exercise 3
texts = ['I like this colour', 'I like this color', 'colourful display']

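# Alternation inside a non-capturing group: 'ou' or 'o' between 'col' and 'r'
# (the shorter r'colou?r' matches the same text without alternation)
pattern = re.compile(r'col(?:ou|o)r')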

for text in texts:
    match = pattern.search(text)
    if match:
        print(f'"{text}" → found "{match.group()}"')

Summary

In this tutorial, you learned:

Capturing groups: Use (...) to capture matched text and access it with .group(n) or .groups()
Named groups: Use (?P<name>...) for readable patterns and access groups with .group("name") or .groupdict()
Non-capturing groups: Use (?:...) when you need grouping but do not need to capture the text
Alternation: Use | inside groups to match one of several alternatives
Groups with re.findall(): When groups are present, re.findall() returns captured groups rather than full matches

Next steps

In the next tutorial, Find and replace, you will learn how to use re.sub(), re.findall(), re.finditer(), and backreferences to search, extract, and transform text.
