{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Your first pattern\n",
    "\n",
    "In this tutorial, you will write your first regular expression pattern and learn how to use Python's `re` module to search for text. By the end, you will be comfortable using `re.search()`, `re.match()`, and `re.fullmatch()` to find patterns in strings.\n",
    "\n",
    "**Time commitment:** 15–20 minutes\n",
    "\n",
    "**Prerequisites:**\n",
    "\n",
    "- Basic Python knowledge (strings, variables, and functions)\n",
    "- Python 3.12 or later installed\n",
    "\n",
    "## Learning objectives\n",
    "\n",
    "By the end of this tutorial, you will be able to:\n",
    "\n",
    "- Import and use the `re` module\n",
    "- Search for literal text using `re.search()`\n",
    "- Understand the difference between `re.search()`, `re.match()`, and `re.fullmatch()`\n",
    "- Work with match objects to extract matched text\n",
    "- Use raw strings for regex patterns\n",
    "- Use `re.compile()` for reusable patterns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Getting started with the `re` module\n",
    "\n",
    "Python includes a powerful regular expression module called `re` in its standard library. You do not need to install anything — simply import it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Searching for literal text\n",
    "\n",
    "The simplest type of regular expression is a **literal pattern** — a pattern that matches exact text. The `re.search()` function scans through a string looking for the first location where the pattern matches."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result = re.search(r'hello', 'Say hello to the world')\n",
    "print(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `re.search()` function returns a **match object** when it finds a match, or `None` when it does not. The match object tells you where the match was found and what text was matched.\n",
    "\n",
    "Let us look at what happens when a pattern is not found."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result = re.search(r'goodbye', 'Say hello to the world')\n",
    "print(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When no match is found, `re.search()` returns `None`. This makes it easy to use in conditional statements."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "text = 'The quick brown fox jumps over the lazy dog'\n",
    "\n",
    "if re.search(r'fox', text):\n",
    "    print('Found \"fox\" in the text!')\n",
    "else:\n",
    "    print('\"fox\" was not found.')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Working with match objects\n",
    "\n",
    "When `re.search()` finds a match, it returns a match object that contains useful information. You can use the following methods:\n",
    "\n",
    "- `.group()` — returns the matched text\n",
    "- `.start()` — returns the start position of the match\n",
    "- `.end()` — returns the end position of the match\n",
    "- `.span()` — returns a tuple of (start, end) positions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "text = 'My postcode is SW1A 1AA'\n",
    "match = re.search(r'SW1A', text)\n",
    "\n",
    "if match:\n",
    "    print(f'Matched text: {match.group()}')\n",
    "    print(f'Start position: {match.start()}')\n",
    "    print(f'End position: {match.end()}')\n",
    "    print(f'Span: {match.span()}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that the start position is inclusive and the end position is exclusive, just like Python string slicing. You can verify this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "text = 'My postcode is SW1A 1AA'\n",
    "match = re.search(r'SW1A', text)\n",
    "\n",
    "if match:\n",
    "    start, end = match.span()\n",
    "    print(f'Using string slicing: \"{text[start:end]}\"')\n",
    "    print(f'Using .group():       \"{match.group()}\"')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Why use raw strings?\n",
    "\n",
    "You may have noticed that the patterns above use `r'...'` syntax. This creates a **raw string** in Python. Raw strings treat backslashes as literal characters rather than escape sequences.\n",
    "\n",
    "This matters because regular expressions use backslashes extensively. For example, `\\d` matches any digit. Without a raw string, Python would try to interpret `\\d` as an escape sequence before the `re` module ever sees it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# With a raw string (recommended)\n",
    "print(r'\\d+')  # Python sees: \\d+\n",
    "\n",
    "# Without a raw string (not recommended for regex)\n",
    "print('\\\\d+')  # You need to double the backslash"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Always use raw strings for regex patterns.** It makes your patterns easier to read and avoids subtle bugs.\n",
    "\n",
    "Let us see a practical example. The pattern `\\d` matches any digit character (0–9)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "text = 'Order number: 42'\n",
    "match = re.search(r'\\d', text)\n",
    "\n",
    "if match:\n",
    "    print(f'First digit found: {match.group()}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## `re.search()` versus `re.match()` versus `re.fullmatch()`\n",
    "\n",
    "The `re` module provides three functions for checking whether a pattern matches a string. Each behaves differently:\n",
    "\n",
    "| Function | Behaviour |\n",
    "|---|---|\n",
    "| `re.search()` | Scans the **entire string** for the first match anywhere |\n",
    "| `re.match()` | Checks for a match only at the **beginning** of the string |\n",
    "| `re.fullmatch()` | Checks whether the **entire string** matches the pattern |\n",
    "\n",
    "Let us see how they differ."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "text = 'hello world'\n",
    "\n",
    "# re.search() finds 'world' anywhere in the string\n",
    "print('search for \"world\":', re.search(r'world', text))\n",
    "\n",
    "# re.match() only checks the beginning, so 'world' is not found\n",
    "print('match for \"world\":', re.match(r'world', text))\n",
    "\n",
    "# re.match() finds 'hello' because it is at the beginning\n",
    "print('match for \"hello\":', re.match(r'hello', text))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "text = 'hello'\n",
    "\n",
    "# re.fullmatch() requires the entire string to match\n",
    "print('fullmatch for \"hello\":', re.fullmatch(r'hello', text))\n",
    "print('fullmatch for \"hell\":', re.fullmatch(r'hell', text))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A common mistake is using `re.match()` when you mean `re.search()`. Remember:\n",
    "\n",
    "- Use `re.search()` to find a pattern **anywhere** in a string\n",
    "- Use `re.match()` to check if a string **starts with** a pattern\n",
    "- Use `re.fullmatch()` to check if a string **is exactly** a pattern"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Case sensitivity\n",
    "\n",
    "By default, pattern matching is case-sensitive. The pattern `r'hello'` will not match `'Hello'`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print('Case-sensitive:', re.search(r'hello', 'Hello world'))\n",
    "\n",
    "# Use the re.IGNORECASE flag for case-insensitive matching\n",
    "print('Case-insensitive:', re.search(r'hello', 'Hello world', re.IGNORECASE))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `re.IGNORECASE` flag (often shortened to `re.I`) makes the pattern match regardless of case. You will learn more about flags in later tutorials."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Compiling patterns with `re.compile()`\n",
    "\n",
    "If you use the same pattern multiple times, you can compile it into a **pattern object** using `re.compile()`. This makes your code cleaner and can improve performance when the pattern is used repeatedly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compile the pattern once\n",
    "pattern = re.compile(r'python', re.IGNORECASE)\n",
    "\n",
    "# Use the compiled pattern multiple times\n",
    "texts = [\n",
    "    'I love Python programming',\n",
    "    'PYTHON is versatile',\n",
    "    'Java is also popular',\n",
    "    'python scripts are useful',\n",
    "]\n",
    "\n",
    "for text in texts:\n",
    "    match = pattern.search(text)\n",
    "    if match:\n",
    "        print(f'Found \"{match.group()}\" in: {text}')\n",
    "    else:\n",
    "        print(f'No match in: {text}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A compiled pattern object has the same methods as the `re` module — `.search()`, `.match()`, `.fullmatch()`, and more — but you do not need to pass the pattern string each time."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Introducing metacharacters: the dot\n",
    "\n",
    "So far, every character in our patterns has been a **literal character** that matches itself. Regular expressions become powerful when you use **metacharacters** — characters with special meanings.\n",
    "\n",
    "The most basic metacharacter is the **dot** (`.`), which matches any single character except a newline."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The dot matches any single character\n",
    "print(re.search(r'h.t', 'hat'))    # matches 'hat'\n",
    "print(re.search(r'h.t', 'hot'))    # matches 'hot'\n",
    "print(re.search(r'h.t', 'hit'))    # matches 'hit'\n",
    "print(re.search(r'h.t', 'hoot'))   # no match (two characters between h and t)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The pattern `r'h.t'` matches the letter `h`, followed by **any single character**, followed by the letter `t`.\n",
    "\n",
    "If you need to match a literal dot, you must **escape** it with a backslash: `r'\\.'`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Matching a literal dot\n",
    "print(re.search(r'example\\.com', 'Visit example.com'))      # matches\n",
    "print(re.search(r'example\\.com', 'Visit exampleXcom'))      # no match\n",
    "\n",
    "# Without escaping, the dot matches any character\n",
    "print(re.search(r'example.com', 'Visit exampleXcom'))        # matches (dot matches X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercises\n",
    "\n",
    "Try these exercises to practise what you have learned. Solutions are provided below.\n",
    "\n",
    "### Exercise 1\n",
    "\n",
    "Write a pattern that finds the word `\"regex\"` in the following text. Use `re.search()` and print the matched text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "text = 'Learning regex is rewarding'\n",
    "\n",
    "# Your code here\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 2\n",
    "\n",
    "Use `re.match()` to check whether the following string starts with `\"Error\"`. Print `\"Starts with Error\"` or `\"Does not start with Error\"` accordingly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "log_line = 'Error: file not found'\n",
    "\n",
    "# Your code here\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 3\n",
    "\n",
    "Use `re.fullmatch()` to check whether the string `\"yes\"` is an exact match. Test it against the strings `\"yes\"`, `\"yes please\"`, and `\"YES\"` (with `re.IGNORECASE`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "strings_to_test = ['yes', 'yes please', 'YES']\n",
    "\n",
    "# Your code here\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 4\n",
    "\n",
    "Write a compiled pattern that matches the word `\"python\"` (case-insensitive). Use it to search through the list of strings below and print which ones contain a match."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "texts = [\n",
    "    'Python is great',\n",
    "    'I prefer JavaScript',\n",
    "    'PYTHON is versatile',\n",
    "    'Learning python is fun',\n",
    "]\n",
    "\n",
    "# Your code here\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Solutions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Exercise 1\n",
    "text = 'Learning regex is rewarding'\n",
    "match = re.search(r'regex', text)\n",
    "if match:\n",
    "    print(f'Found: {match.group()}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Exercise 2\n",
    "log_line = 'Error: file not found'\n",
    "if re.match(r'Error', log_line):\n",
    "    print('Starts with Error')\n",
    "else:\n",
    "    print('Does not start with Error')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Exercise 3\n",
    "strings_to_test = ['yes', 'yes please', 'YES']\n",
    "\n",
    "for s in strings_to_test:\n",
    "    result = re.fullmatch(r'yes', s, re.IGNORECASE)\n",
    "    if result:\n",
    "        print(f'\"{s}\" is an exact match')\n",
    "    else:\n",
    "        print(f'\"{s}\" is not an exact match')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Exercise 4\n",
    "texts = [\n",
    "    'Python is great',\n",
    "    'I prefer JavaScript',\n",
    "    'PYTHON is versatile',\n",
    "    'Learning python is fun',\n",
    "]\n",
    "\n",
    "pattern = re.compile(r'python', re.IGNORECASE)\n",
    "\n",
    "for text in texts:\n",
    "    if pattern.search(text):\n",
    "        print(f'Match found in: \"{text}\"')\n",
    "    else:\n",
    "        print(f'No match in: \"{text}\"')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "In this tutorial, you learned:\n",
    "\n",
    "- **Importing `re`**: The `re` module is built into Python and requires no installation\n",
    "- **`re.search()`**: Finds the first match anywhere in a string\n",
    "- **`re.match()`**: Checks for a match at the beginning of a string\n",
    "- **`re.fullmatch()`**: Checks whether the entire string matches the pattern\n",
    "- **Match objects**: Use `.group()`, `.start()`, `.end()`, and `.span()` to inspect matches\n",
    "- **Raw strings**: Always use `r'...'` for regex patterns to avoid backslash issues\n",
    "- **`re.compile()`**: Compile frequently used patterns for cleaner, faster code\n",
    "- **The dot metacharacter**: `.` matches any single character (except newlines)\n",
    "\n",
    "## Next steps\n",
    "\n",
    "In the next tutorial, [Character classes and quantifiers](https://agilearn.co.uk/guides/regex/learn/02-character-classes-and-quantifiers), you will learn how to build more flexible patterns that match ranges of characters and control how many times a pattern repeats."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.12.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}