Find and replace

In this tutorial, you will find, extract, and transform text with regular expressions. You will use re.findall(), re.finditer(), re.sub(), re.split(), and re.subn(), along with backreferences and replacement functions, to manipulate text.

Time commitment: 15–20 minutes

Prerequisites:

Completion of Groups and capturing
Basic Python knowledge (strings, variables, and functions)

Learning objectives

By the end of this tutorial, you will be able to:

Use re.findall() to extract all matches from a string
Use re.finditer() to iterate over matches with full match object details
Use re.sub() to replace matched text
Use backreferences in replacement strings
Use re.split() to split strings on patterns
Use functions as replacement arguments in re.sub()

import re

Finding all matches with `re.findall()`

You have already seen re.search(), which finds the first match. The re.findall() function scans the string from left to right, finds every non-overlapping match, and returns the matches as a list of strings (an empty list if there are none).

text = 'The prices are £5.99, £12.50, and £100.00'

# Find all prices
prices = re.findall(r'£\d+\.\d{2}', text)
print(f'Prices found: {prices}')

# Find all words that start with a capital letter
text = 'Alice and Bob visited London and Paris last Summer'
capitalised_words = re.findall(r'\b[A-Z][a-z]+\b', text)
print(f'Capitalised words: {capitalised_words}')
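
To make "non-overlapping" concrete, here is a minimal sketch with made-up input strings. Once a character has been used in one match, it cannot be part of the next one.

```python
import re

# Matches are consumed left to right, so each character belongs to at most one match
pairs = re.findall(r'\d\d', '12345')
print(pairs)  # ['12', '34']

# When nothing matches, the result is an empty list rather than None
nothing = re.findall(r'\d\d', 'no digits here')
print(nothing)  # []
```

The trailing 5 is left over because there is no second digit to pair it with.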

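Remember from the previous tutorial: when the pattern contains a capturing group, re.findall() returns the text captured by that group rather than the full match. When the pattern contains two or more groups, it returns a list of tuples, one tuple per match.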

text = 'Contact: alice@example.com, bob@test.co.uk'

# Without groups: returns full matches
print('Full matches:', re.findall(r'[\w.]+@[\w.]+', text))

# With groups: returns only the captured parts
print('Domains only:', re.findall(r'[\w.]+@([\w.]+)', text))
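
The example above uses a single group. As a small sketch of the multi-group case, the pattern below splits each address into two groups, and assumes the same simplified email pattern as before. The variable names are only for illustration.

```python
import re

text = 'Contact: alice@example.com, bob@test.co.uk'

# Two capturing groups: each match becomes a (user, domain) tuple
user_domain = re.findall(r'([\w.]+)@([\w.]+)', text)
print(user_domain)
# [('alice', 'example.com'), ('bob', 'test.co.uk')]

# Non-capturing groups (?:...) group without changing what findall() returns
full = re.findall(r'(?:[\w.]+)@(?:[\w.]+)', text)
print(full)
# ['alice@example.com', 'bob@test.co.uk']
```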

Iterating over matches with `re.finditer()`

The re.finditer() function returns an iterator of match objects, so each match gives you access to its position, its groups, and the other match details. Prefer it to re.findall() when you need more than just the matched text.

text = 'Order #101 placed on 15/01/2026, Order #102 placed on 20/01/2026'

for match in re.finditer(r'Order #(\d+)', text):
    print(f'Found "{match.group()}" at position {match.start()}-{match.end()}')
    print(f'  Order number: {match.group(1)}')

# Using finditer() with named groups for structured extraction
log_text = """2026-02-09 14:30:00 [INFO] Server started
2026-02-09 14:31:15 [WARNING] High memory usage
2026-02-09 14:32:00 [ERROR] Connection refused"""

pattern = re.compile(
    r'(?P<date>[\d-]+) (?P<time>[\d:]+) \[(?P<level>\w+)\] (?P<message>.+)'
)

for match in pattern.finditer(log_text):
    info = match.groupdict()
    print(f'[{info["level"]:>7}] {info["time"]} - {info["message"]}')
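
Because each match object carries both the groups and the position, you can collect the matches into plain dictionaries for later filtering. The following is a sketch under that assumption, not the only way to structure it. The shortened pattern and the 'span' key are choices made for this illustration.

```python
import re

log_text = """2026-02-09 14:30:00 [INFO] Server started
2026-02-09 14:31:15 [WARNING] High memory usage
2026-02-09 14:32:00 [ERROR] Connection refused"""

pattern = re.compile(r'\[(?P<level>\w+)\] (?P<message>.+)')

# Build one dictionary per match, storing the match position alongside the groups
records = [
    {**match.groupdict(), 'span': match.span()}
    for match in pattern.finditer(log_text)
]

# Keep only the messages logged as warnings or errors
problems = [r['message'] for r in records if r['level'] in {'WARNING', 'ERROR'}]
print(problems)  # ['High memory usage', 'Connection refused']
```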

Replacing text with `re.sub()`

The re.sub() function replaces every non-overlapping occurrence of a pattern with a replacement and returns the new string; the original string is left unchanged. Its basic syntax is:

re.sub(pattern, replacement, string)

text = 'The colour of the colour wheel is colourful'

# Replace 'colour' with 'color'
result = re.sub(r'colour', 'color', text)
print(result)
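
Note that the pattern above also rewrites the 'colour' inside 'colourful'. If you only want whole words, one option is to add word boundaries, as sketched below with a slightly different sample sentence chosen for this illustration. The flags argument is optional.

```python
import re

text = 'Colour me surprised: the colour wheel is colourful'

# \b restricts the match to the whole word, so 'colourful' is left alone
whole_words = re.sub(r'\bcolour\b', 'color', text)
print(whole_words)
# Colour me surprised: the color wheel is colourful

# re.IGNORECASE also matches 'Colour', but the replacement is inserted exactly as written
any_case = re.sub(r'\bcolour\b', 'color', text, flags=re.IGNORECASE)
print(any_case)
# color me surprised: the color wheel is colourful
```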

# Remove all digits from a string
text = 'Room 42, Floor 3, Building 7'
result = re.sub(r'\d+', '', text)
print(f'Without digits: "{result}"')

# Replace each run of whitespace with a single space
text = 'Too   many     spaces    here'
result = re.sub(r'\s+', ' ', text)
print(f'Cleaned: "{result}"')

Limiting replacements with `count`

By default, re.sub() replaces every match. Pass the count keyword argument to cap the number of replacements; matches are replaced from left to right.

text = 'one two three four five'

# Replace only the first two words with 'X'
result = re.sub(r'\w+', 'X', text, count=2)
print(result)
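
A common use of count is to change only the first occurrence. The sketch below uses a made-up key=value line where only the first '=' should be treated as the separator.

```python
import re

line = 'key=value=with=equals'

# Only the first '=' is the separator, so replace just that one
first_only = re.sub(r'=', ': ', line, count=1)
print(first_only)  # key: value=with=equals

# count=0 (the default) means "no limit"
all_of_them = re.sub(r'=', ': ', line, count=0)
print(all_of_them)  # key: value: with: equals
```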

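Backreferences in replacements

Backreferences let the replacement string refer to text captured by groups in the pattern. Use \1, \2, and so on for numbered groups, or \g<name> for named groups. Write the replacement as a raw string so that Python does not interpret the backslashes itself.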

# Swap first name and last name
text = 'Smith, Alice'
result = re.sub(r'(\w+), (\w+)', r'\2 \1', text)
print(result)

# Convert dates from DD/MM/YYYY to YYYY-MM-DD using named groups
text = 'Dates: 25/12/2026, 01/01/2027'
result = re.sub(
    r'(?P<day>\d{2})/(?P<month>\d{2})/(?P<year>\d{4})',
    r'\g<year>-\g<month>-\g<day>',
    text,
)
print(result)

# Wrap all email addresses in angle brackets
text = 'Contact alice@example.com or bob@test.co.uk'
result = re.sub(r'([\w.]+@[\w.]+)', r'<\1>', text)
print(result)
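
One edge case worth knowing: when a numbered backreference is followed directly by a digit, \1 becomes ambiguous. The \g<number> form avoids this. The sketch below uses a made-up string and appends a literal 0 to each number.

```python
import re

text = 'item7 item42'

# r'\10' is read as "group 10", which does not exist in this pattern,
# so re.sub() raises re.error instead of appending a zero
try:
    re.sub(r'(\d+)', r'\10', text)
except re.error as exc:
    print(f'Ambiguous reference failed: {exc}')

# \g<1> ends the group number explicitly, so the following 0 is literal
result = re.sub(r'(\d+)', r'\g<1>0', text)
print(result)  # item70 item420
```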

Using functions as replacements

For more complex replacements, you can pass a function as the replacement argument. re.sub() calls the function once for each match, passing it the match object, and inserts the string that the function returns.

def double_number(match: re.Match) -> str:
    """Double the matched number."""
    number = int(match.group())
    return str(number * 2)


text = 'I have 3 cats and 5 dogs'
result = re.sub(r'\d+', double_number, text)
print(result)

# Convert temperatures from Fahrenheit to Celsius
def fahrenheit_to_celsius(match: re.Match) -> str:
    """Convert a Fahrenheit temperature to Celsius."""
    fahrenheit = float(match.group(1))
    celsius = (fahrenheit - 32) * 5 / 9
    return f'{celsius:.1f}°C'


text = 'Today: 68°F, Tomorrow: 77°F, Next week: 50°F'
result = re.sub(r'(\d+)°F', fahrenheit_to_celsius, text)
print(result)
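
Replacement functions also work well with a lookup table. The following is a minimal sketch. The abbreviations dictionary and the expand() helper are hypothetical names invented for this illustration, and unknown abbreviations are deliberately left unchanged.

```python
import re

# Hypothetical lookup table, for illustration only
abbreviations = {'UK': 'United Kingdom', 'EU': 'European Union'}


def expand(match: re.Match) -> str:
    """Return the expansion for a known abbreviation, else the original text."""
    word = match.group()
    return abbreviations.get(word, word)


text = 'Trade between the UK and the EU, according to the BBC'
result = re.sub(r'\b[A-Z]{2,}\b', expand, text)
print(result)
# Trade between the United Kingdom and the European Union, according to the BBC
```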

Splitting strings with `re.split()`

The re.split() function splits a string at each occurrence of the pattern and returns a list of the pieces. Unlike the built-in str.split(), which splits on a fixed separator string or on runs of whitespace, it can split on any pattern.

# Split on any whitespace (similar to str.split())
text = 'one  two\tthree\nfour'
parts = re.split(r'\s+', text)
print(f'Split on whitespace: {parts}')

# Split on multiple delimiters (comma, semicolon, or pipe)
text = 'apple,banana;cherry|date'
parts = re.split(r'[,;|]', text)
print(f'Split on delimiters: {parts}')

# Split sentences on punctuation
text = 'First sentence. Second sentence! Third sentence? Fourth.'
sentences = re.split(r'[.!?]\s*', text)
# Filter out empty strings
sentences = [s for s in sentences if s]
print(f'Sentences: {sentences}')
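
The empty string filtered out above appears because re.split() keeps empty pieces at the edges of the string. The sketch below shows that behaviour, together with the optional maxsplit argument, using made-up input.

```python
import re

line = 'timeout = 30 = seconds'

# maxsplit=1 stops after the first separator; the rest stays in one piece
key, value = re.split(r'\s*=\s*', line, maxsplit=1)
print(key)    # timeout
print(value)  # 30 = seconds

# Unlike str.split() with no arguments, re.split() keeps empty strings at the edges
edges = re.split(r'\s+', '  padded  ')
print(edges)  # ['', 'padded', '']
```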

Keeping the delimiters

If you wrap the pattern in a capturing group, re.split() also returns the text matched by the group, so the delimiters appear in the result between the pieces.

text = '3+5-2*4'

# Without capturing group: delimiters are removed
print('Without delimiters:', re.split(r'[+\-*]', text))

# With capturing group: delimiters are kept
print('With delimiters:   ', re.split(r'([+\-*])', text))
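
When the delimiters are kept, pieces and delimiters alternate in the result, so slicing can separate them again. This is a small sketch that assumes the input starts and ends with a number, as in the example above.

```python
import re

tokens = re.split(r'([+\-*])', '3+5-2*4')
print(tokens)  # ['3', '+', '5', '-', '2', '*', '4']

operands = [int(t) for t in tokens[::2]]  # even indexes hold the numbers
operators = tokens[1::2]                  # odd indexes hold the delimiters
print(operands)   # [3, 5, 2, 4]
print(operators)  # ['+', '-', '*']
```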

Using `re.subn()` to count replacements

The re.subn() function works just like re.sub() but returns a tuple containing the new string and the number of replacements made.

text = 'foo bar foo baz foo'
result, count = re.subn(r'foo', 'qux', text)
print(f'Result: "{result}"')
print(f'Replacements made: {count}')
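
The count is handy when you want to know whether a clean-up step changed anything. The following is a sketch of that idea. The normalise_spaces() helper is a hypothetical name used only for this example.

```python
import re


def normalise_spaces(line: str) -> str:
    """Collapse runs of two or more whitespace characters and report any changes."""
    cleaned, changes = re.subn(r'\s{2,}', ' ', line)
    if changes:
        print(f'{changes} run(s) collapsed in: {cleaned!r}')
    return cleaned


a = normalise_spaces('Too   many     spaces')  # prints a report
b = normalise_spaces('Already tidy')            # prints nothing
```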

A practical example: cleaning messy data

Let us combine the techniques from this tutorial to clean up messy input data.

messy_data = """
  Name:   Alice   Smith  
  Email: alice@example.com   
  Phone:  01234  567890
  Date:  25/12/2026  
"""

# Step 1: Extract key-value pairs
pairs = re.findall(r'(\w+):\s*(.+?)\s*$', messy_data, re.MULTILINE)
print('Extracted pairs:')
for key, value in pairs:
    # Step 2: Clean up extra whitespace within values
    clean_value = re.sub(r'\s+', ' ', value.strip())
    print(f'  {key}: {clean_value}')
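
If you need the cleaned values for further processing rather than for printing, the same steps can be wrapped in a function that returns a dictionary. This is one possible sketch. The parse_record() name, the lower-cased keys, and the date conversion are choices made for this illustration.

```python
import re


def parse_record(raw: str) -> dict[str, str]:
    """Turn 'Key: value' lines into a dictionary with tidy values."""
    record = {}
    for key, value in re.findall(r'(\w+):\s*(.+?)\s*$', raw, re.MULTILINE):
        # Collapse runs of whitespace inside the value
        record[key.lower()] = re.sub(r'\s+', ' ', value)
    # Rewrite a DD/MM/YYYY date as YYYY-MM-DD, if a 'date' field is present
    if 'date' in record:
        record['date'] = re.sub(
            r'(\d{2})/(\d{2})/(\d{4})', r'\3-\2-\1', record['date']
        )
    return record


messy_data = """
  Name:   Alice   Smith
  Email: alice@example.com
  Phone:  01234  567890
  Date:  25/12/2026
"""
record = parse_record(messy_data)
print(record)
```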

Exercises

Exercise 1

Use re.findall() to extract all hashtags (words starting with #) from the following text.

text = 'Learning #Python and #regex is great! #coding #programming'

# Your code here

Exercise 2

Use re.sub() with a backreference to convert the names from "Last, First" format to "First Last" format.

names = 'Smith, Alice\nJones, Bob\nBrown, Charlie'

# Your code here

Exercise 3

Write a replacement function for re.sub() that censors any number in the text by replacing each digit with *.

text = 'Card number: 1234 5678 9012, PIN: 4321'

# Your code here

Solutions

# Exercise 1
text = 'Learning #Python and #regex is great! #coding #programming'
hashtags = re.findall(r'#\w+', text)
print(f'Hashtags: {hashtags}')

# Exercise 2
names = 'Smith, Alice\nJones, Bob\nBrown, Charlie'
result = re.sub(r'(\w+), (\w+)', r'\2 \1', names)
print(result)

# Exercise 3
def censor_number(match: re.Match) -> str:
    """Replace each digit in the matched text with an asterisk."""
    return '*' * len(match.group())


text = 'Card number: 1234 5678 9012, PIN: 4321'
result = re.sub(r'\d+', censor_number, text)
print(result)

Summary

In this tutorial, you learned:

re.findall(): Find all non-overlapping matches and return them as a list
re.finditer(): Iterate over matches with full match object details
re.sub(): Replace all occurrences of a pattern with a replacement string
Backreferences: Use \1, \2, or \g<name> in replacement strings to refer to captured groups
Function replacements: Pass a function to re.sub() for complex replacement logic
re.split(): Split strings on regex patterns, with optional delimiter retention
re.subn(): Replace and count the number of replacements made

Next steps

Congratulations: you have completed all four tutorials! You now have a solid foundation in Python regular expressions. From here, you can:

Explore the Recipes for practical, real-world applications
Consult the Reference documentation for detailed technical information
Read the Concepts articles to deepen your understanding
