{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a1b2c3d4",
   "metadata": {},
   "source": [
    "# String methods\n",
    "\n",
    "In this tutorial, you will explore the most commonly used string methods in Python. These methods allow you to transform, split, join, and clean text with ease.\n",
    "\n",
    "**Time commitment:** 15&ndash;20 minutes\n",
    "\n",
    "**Prerequisites:**\n",
    "- Completion of [String basics](https://agilearn.co.uk/guides/string-processing/learn/01-string-basics)\n",
    "- Understanding of string indexing and slicing\n",
    "\n",
    "## Learning objectives\n",
    "\n",
    "By the end of this tutorial, you will be able to:\n",
    "\n",
    "- Use case conversion methods (`upper()`, `lower()`, `title()`, `capitalize()`, `swapcase()`, `casefold()`)\n",
    "- Split strings with `split()`, `rsplit()`, `splitlines()`, and `partition()`\n",
    "- Join lists into strings with `join()`\n",
    "- Remove whitespace with `strip()`, `lstrip()`, and `rstrip()`\n",
    "- Replace substrings with `replace()`\n",
    "- Chain multiple string methods together"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b2c3d4e5",
   "metadata": {},
   "source": [
    "## Case conversion methods\n",
    "\n",
    "Python provides several methods for changing the case of characters in a string. Because strings are immutable, each of these methods returns a **new** string rather than modifying the original."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c3d4e5f6",
   "metadata": {},
   "source": [
    "### `str.upper()` and `str.lower()`\n",
    "\n",
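    "The `str.upper()` method converts every cased character to uppercase, and `str.lower()` converts every cased character to lowercase. Characters without case, such as digits and punctuation, are left unchanged. These methods are useful when you need to normalise text before comparing it."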
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d4e5f6a7",
   "metadata": {},
   "outputs": [],
   "source": [
    "greeting = \"Hello, World!\"\n",
    "\n",
    "print(greeting.upper())\n",
    "print(greeting.lower())"
   ]
  },
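  {
   "cell_type": "markdown",
   "id": "sketch-lower-compare-md",
   "metadata": {},
   "source": [
    "A common use of `str.lower()` is normalising user input before comparing it. The sketch below assumes a hypothetical list of answers to a yes/no question and checks each one against `\"yes\"`, so it does not matter how each answer was capitalised."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-lower-compare-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical answers typed by different users\n",
    "answers = [\"YES\", \"yes\", \"Yes\", \"no\"]\n",
    "\n",
    "for answer in answers:\n",
    "    # Compare the lowercase form so the original case is ignored\n",
    "    print(answer, answer.lower() == \"yes\")"
   ]
  },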
  {
   "cell_type": "markdown",
   "id": "e5f6a7b8",
   "metadata": {},
   "source": [
    "### `str.title()` and `str.capitalize()`\n",
    "\n",
    "The `str.title()` method capitalises the first letter of every word and converts the remaining letters to lowercase. The `str.capitalize()` method capitalises only the first character of the entire string and converts the rest to lowercase."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f6a7b8c9",
   "metadata": {},
   "outputs": [],
   "source": [
    "message = \"string processing with python\"\n",
    "\n",
    "print(message.title())\n",
    "print(message.capitalize())"
   ]
  },
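  {
   "cell_type": "markdown",
   "id": "sketch-title-lowercase-md",
   "metadata": {},
   "source": [
    "Both methods also lowercase every letter they do not capitalise. The previous example does not show this because its input was already lowercase, so the sketch below uses an all-caps string instead."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-title-lowercase-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "shouty = \"HELLO WORLD\"\n",
    "\n",
    "# title() capitalises each word and lowercases the remaining letters\n",
    "print(shouty.title())       # Hello World\n",
    "\n",
    "# capitalize() capitalises only the first character and lowercases the rest\n",
    "print(shouty.capitalize())  # Hello world"
   ]
  },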
  {
   "cell_type": "markdown",
   "id": "a7b8c9d0",
   "metadata": {},
   "source": [
    "### `str.swapcase()`\n",
    "\n",
    "The `str.swapcase()` method swaps the case of every character &ndash; uppercase becomes lowercase, and lowercase becomes uppercase."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b8c9d0e1",
   "metadata": {},
   "outputs": [],
   "source": [
    "mixed = \"Hello, World!\"\n",
    "\n",
    "print(mixed.swapcase())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c9d0e1f2",
   "metadata": {},
   "source": [
    "### `str.casefold()` for case-insensitive comparisons\n",
    "\n",
    "The `str.casefold()` method is similar to `str.lower()`, but more aggressive: it applies conversions that `str.lower()` does not, such as turning the German letter `ß` into `ss`. The `str.lower()` method leaves `ß` unchanged. Use `str.casefold()` when you need reliable case-insensitive comparisons."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d0e1f2a3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# The German lowercase letter sharp s\n",
    "german_word = \"Straße\"\n",
    "\n",
    "print(german_word.lower())\n",
    "print(german_word.casefold())\n",
    "\n",
    "# casefold is better for case-insensitive comparison\n",
    "print(\"STRASSE\".casefold() == \"Straße\".casefold())"
   ]
  },
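  {
   "cell_type": "markdown",
   "id": "sketch-casefold-helper-md",
   "metadata": {},
   "source": [
    "If you compare strings case-insensitively in several places, you might wrap `str.casefold()` in a small helper. The function name `same_ignoring_case` below is hypothetical and is not part of Python. The sketch also shows that comparing the `str.lower()` forms gives a different answer for the same pair of words."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-casefold-helper-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical helper for case-insensitive comparison\n",
    "def same_ignoring_case(first, second):\n",
    "    # Compare the case-folded forms of both strings\n",
    "    return first.casefold() == second.casefold()\n",
    "\n",
    "\n",
    "# lower() keeps \"ß\", so the lowercase forms differ\n",
    "print(\"Straße\".lower() == \"STRASSE\".lower())  # False\n",
    "\n",
    "# casefold() turns \"ß\" into \"ss\", so the strings match\n",
    "print(same_ignoring_case(\"Straße\", \"STRASSE\"))  # True"
   ]
  },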
  {
   "cell_type": "markdown",
   "id": "e1f2a3b4",
   "metadata": {},
   "source": [
    "## Splitting strings\n",
    "\n",
    "Splitting is one of the most common string operations. It allows you to break a string into a list of smaller strings based on a delimiter."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f2a3b4c5",
   "metadata": {},
   "source": [
    "### `str.split()` with no arguments\n",
    "\n",
    "When you call `str.split()` without any arguments, it splits the string on whitespace (spaces, tabs, and newlines). Runs of consecutive whitespace count as a single separator, and leading and trailing whitespace is ignored, so the result never contains empty strings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a3b4c5d6",
   "metadata": {},
   "outputs": [],
   "source": [
    "sentence = \"  Python   is   great  \"\n",
    "\n",
    "words = sentence.split()\n",
    "print(words)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b4c5d6e7",
   "metadata": {},
   "source": [
    "### `str.split(sep)` with a delimiter\n",
    "\n",
    "You can pass a specific separator to `str.split()`. When you do, it splits on every occurrence of that exact separator and does **not** discard empty strings: two adjacent separators produce an empty string between them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c5d6e7f8",
   "metadata": {},
   "outputs": [],
   "source": [
    "csv_row = \"Alice,Bob,,Charlie\"\n",
    "\n",
    "fields = csv_row.split(\",\")\n",
    "print(fields)"
   ]
  },
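  {
   "cell_type": "markdown",
   "id": "sketch-split-compare-md",
   "metadata": {},
   "source": [
    "The difference between the two forms of `str.split()` is easiest to see on the same input. The sketch below splits a string containing extra spaces, first with no argument and then with `\" \"` as the separator."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-split-compare-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "spaced = \" a  b \"\n",
    "\n",
    "# No argument: runs of whitespace act as one separator and the ends are ignored\n",
    "print(spaced.split())     # ['a', 'b']\n",
    "\n",
    "# Explicit \" \": every single space is a separator, so empty strings appear\n",
    "print(spaced.split(\" \"))  # ['', 'a', '', 'b', '']"
   ]
  },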
  {
   "cell_type": "markdown",
   "id": "d6e7f8a9",
   "metadata": {},
   "source": [
    "### `str.split(sep, maxsplit)` -- limiting the number of splits\n",
    "\n",
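    "The `maxsplit` parameter limits how many splits are performed, so the result contains at most `maxsplit + 1` elements. Any remaining text, including further occurrences of the separator, stays in the last element of the list."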
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e7f8a9b0",
   "metadata": {},
   "outputs": [],
   "source": [
    "log_entry = \"ERROR:2026-02-09:Something went wrong\"\n",
    "\n",
    "# Split at most twice, giving at most 3 parts\n",
    "parts = log_entry.split(\":\", maxsplit=2)\n",
    "print(parts)"
   ]
  },
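  {
   "cell_type": "markdown",
   "id": "sketch-maxsplit-colon-md",
   "metadata": {},
   "source": [
    "`maxsplit` matters most when the last field might itself contain the separator. In the sketch below, the hypothetical log message includes a colon, and limiting the number of splits keeps the message in one piece."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-maxsplit-colon-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "detailed_entry = \"ERROR:2026-02-09:Disk full: /var\"\n",
    "\n",
    "# Without maxsplit, the colon inside the message also causes a split\n",
    "print(detailed_entry.split(\":\"))  # ['ERROR', '2026-02-09', 'Disk full', ' /var']\n",
    "\n",
    "# With maxsplit=2, everything after the second colon stays together\n",
    "level, date, message_text = detailed_entry.split(\":\", maxsplit=2)\n",
    "print(message_text)  # Disk full: /var"
   ]
  },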
  {
   "cell_type": "markdown",
   "id": "f8a9b0c1",
   "metadata": {},
   "source": [
    "### `str.rsplit()` -- splitting from the right\n",
    "\n",
    "The `str.rsplit()` method works like `str.split()`, but it splits from the right side of the string. This is particularly useful with `maxsplit` when you want to split off only the last part of a string and keep everything before it together."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a9b0c1d2",
   "metadata": {},
   "outputs": [],
   "source": [
    "file_path = \"home/user/documents/report.txt\"\n",
    "\n",
    "# Split from the right to separate the filename from the directory\n",
    "parts = file_path.rsplit(\"/\", maxsplit=1)\n",
    "print(parts)"
   ]
  },
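  {
   "cell_type": "markdown",
   "id": "sketch-split-vs-rsplit-md",
   "metadata": {},
   "source": [
    "To see why the direction matters, the sketch below compares `str.split()` and `str.rsplit()` on the same path, both with `maxsplit=1`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-split-vs-rsplit-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "file_path = \"home/user/documents/report.txt\"\n",
    "\n",
    "# split() makes its single split at the first \"/\"\n",
    "print(file_path.split(\"/\", maxsplit=1))   # ['home', 'user/documents/report.txt']\n",
    "\n",
    "# rsplit() makes its single split at the last \"/\"\n",
    "print(file_path.rsplit(\"/\", maxsplit=1))  # ['home/user/documents', 'report.txt']"
   ]
  },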
  {
   "cell_type": "markdown",
   "id": "b0c1d2e3",
   "metadata": {},
   "source": [
    "### `str.splitlines()` and `str.partition()`\n",
    "\n",
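    "The `str.splitlines()` method splits a string on line boundaries (including `\\n`, `\\r\\n`, and `\\r`) and returns a list of lines without the line-ending characters. The `str.partition()` method splits a string at the **first** occurrence of a separator and returns a tuple of three parts: everything before the separator, the separator itself, and everything after. If the separator is not found, the first part is the whole string and the other two are empty strings."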
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c1d2e3f4",
   "metadata": {},
   "outputs": [],
   "source": [
    "multiline = \"Line one\\nLine two\\nLine three\"\n",
    "\n",
    "lines = multiline.splitlines()\n",
    "print(lines)"
   ]
  },
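  {
   "cell_type": "markdown",
   "id": "sketch-splitlines-trailing-md",
   "metadata": {},
   "source": [
    "`str.splitlines()` also differs from `str.split(\"\\n\")` when the text ends with a newline, as lines read from a file often do. In the sketch below, `split(\"\\n\")` leaves an empty string at the end of the list, while `splitlines()` does not."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-splitlines-trailing-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "text_with_newline = \"first\\nsecond\\n\"\n",
    "\n",
    "print(text_with_newline.split(\"\\n\"))  # ['first', 'second', '']\n",
    "print(text_with_newline.splitlines())  # ['first', 'second']"
   ]
  },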
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d2e3f4a5",
   "metadata": {},
   "outputs": [],
   "source": [
    "email = \"user@example.com\"\n",
    "\n",
    "# partition splits into (before, separator, after)\n",
    "local, separator, domain = email.partition(\"@\")\n",
    "print(f\"Local part: {local}\")\n",
    "print(f\"Domain: {domain}\")"
   ]
  },
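  {
   "cell_type": "markdown",
   "id": "sketch-partition-missing-md",
   "metadata": {},
   "source": [
    "Unlike `str.split()`, `str.partition()` always returns exactly three items, even when the separator is missing. The sketch below uses a hypothetical string with no `@` in it. Checking whether the middle item is empty tells you whether the separator was found."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-partition-missing-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical input that contains no \"@\"\n",
    "not_an_email = \"user.example.com\"\n",
    "\n",
    "local, separator, domain = not_an_email.partition(\"@\")\n",
    "print((local, separator, domain))  # ('user.example.com', '', '')\n",
    "\n",
    "# An empty separator means \"@\" was not found\n",
    "if not separator:\n",
    "    print(\"No @ found\")"
   ]
  },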
  {
   "cell_type": "markdown",
   "id": "e3f4a5b6",
   "metadata": {},
   "source": [
    "## Joining strings\n",
    "\n",
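    "The `str.join()` method is the complement of `str.split()`. It takes a list (or any iterable) of strings and joins them together, inserting the string that `join()` is called on between each pair of items. Every item must be a string; otherwise Python raises a `TypeError`."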
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f4a5b6c7",
   "metadata": {},
   "outputs": [],
   "source": [
    "words = [\"Python\", \"is\", \"great\"]\n",
    "\n",
    "# Join with a space\n",
    "sentence = \" \".join(words)\n",
    "print(sentence)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a5b6c7d8",
   "metadata": {},
   "source": [
    "You can use any string as the separator, including an empty string, a comma, or a newline character."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b6c7d8e9",
   "metadata": {},
   "outputs": [],
   "source": [
    "letters = [\"P\", \"y\", \"t\", \"h\", \"o\", \"n\"]\n",
    "\n",
    "# Join with no separator\n",
    "print(\"\".join(letters))\n",
    "\n",
    "# Join with a comma and space\n",
    "fruits = [\"apple\", \"banana\", \"cherry\"]\n",
    "print(\", \".join(fruits))\n",
    "\n",
    "# Join with a newline\n",
    "lines = [\"First line\", \"Second line\", \"Third line\"]\n",
    "print(\"\\n\".join(lines))"
   ]
  },
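  {
   "cell_type": "markdown",
   "id": "sketch-join-numbers-md",
   "metadata": {},
   "source": [
    "Because `str.join()` only accepts strings, joining a list of numbers directly raises a `TypeError`. One common approach, sketched below, is to convert each item with `str()` first. This example does the conversion with a generator expression."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-join-numbers-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "numbers = [1, 2, 3]\n",
    "\n",
    "try:\n",
    "    \", \".join(numbers)\n",
    "except TypeError as error:\n",
    "    print(\"TypeError:\", error)\n",
    "\n",
    "# Convert each number to a string before joining\n",
    "print(\", \".join(str(number) for number in numbers))  # 1, 2, 3"
   ]
  },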
  {
   "cell_type": "markdown",
   "id": "c7d8e9f0",
   "metadata": {},
   "source": [
    "### Combining `split()` and `join()` for text transformation\n",
    "\n",
    "A common pattern is to split a string, process the parts, and then join them back together. For example, calling `split()` with no arguments discards all extra whitespace, and joining the words with `\" \"` puts back exactly one space between them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d8e9f0a1",
   "metadata": {},
   "outputs": [],
   "source": [
    "messy_text = \"  too   many    spaces   here  \"\n",
    "\n",
    "# Normalise whitespace by splitting and rejoining\n",
    "clean_text = \" \".join(messy_text.split())\n",
    "print(clean_text)"
   ]
  },
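  {
   "cell_type": "markdown",
   "id": "sketch-capitalise-words-md",
   "metadata": {},
   "source": [
    "The \"process the parts\" step can be any transformation you like. As one sketch, the code below capitalises each word separately with `str.capitalize()`. This avoids a quirk of `str.title()`, which treats the letter after an apostrophe as the start of a new word."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-capitalise-words-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "phrase = \"they're bill's friends\"\n",
    "\n",
    "# title() starts a new word after each apostrophe\n",
    "print(phrase.title())  # They'Re Bill'S Friends\n",
    "\n",
    "# Split into words, capitalise each one, then join them back together\n",
    "print(\" \".join(word.capitalize() for word in phrase.split()))  # They're Bill's Friends"
   ]
  },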
  {
   "cell_type": "markdown",
   "id": "e9f0a1b2",
   "metadata": {},
   "source": [
    "## Stripping whitespace\n",
    "\n",
    "User input and data from files often contain unwanted whitespace. Python provides three methods for removing it."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f0a1b2c3",
   "metadata": {},
   "source": [
    "### `str.strip()`, `str.lstrip()`, and `str.rstrip()`\n",
    "\n",
    "The `str.strip()` method removes whitespace from both ends of a string. The `str.lstrip()` method removes whitespace from the left (start) only, and `str.rstrip()` removes it from the right (end) only."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a1b2c3d5",
   "metadata": {},
   "outputs": [],
   "source": [
    "padded = \"   Hello, World!   \"\n",
    "\n",
    "print(repr(padded.strip()))\n",
    "print(repr(padded.lstrip()))\n",
    "print(repr(padded.rstrip()))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b2c3d4e6",
   "metadata": {},
   "source": [
    "The `repr()` function is used here to show the exact string, including any remaining whitespace, by wrapping it in quotes.\n",
    "\n",
    "### Stripping specific characters\n",
    "\n",
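    "You can pass a string of characters to `str.strip()`, `str.lstrip()`, or `str.rstrip()` to remove those characters instead of whitespace. The argument is treated as a set of characters, not as an exact prefix or suffix: Python keeps removing characters from the ends for as long as each one appears anywhere in the argument."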
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c3d4e5f7",
   "metadata": {},
   "outputs": [],
   "source": [
    "url = \"https://example.com/\"\n",
    "\n",
    "# Remove trailing slash\n",
    "print(url.rstrip(\"/\"))\n",
    "\n",
    "# Remove surrounding punctuation\n",
    "text = \"***Important***\"\n",
    "print(text.strip(\"*\"))"
   ]
  },
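  {
   "cell_type": "markdown",
   "id": "sketch-strip-char-set-md",
   "metadata": {},
   "source": [
    "Because the argument is a set of characters rather than an exact prefix or suffix, the result can be surprising. In the sketch below, `\"cmowz.\"` is neither a prefix nor a suffix of the hostname. Even so, `str.strip()` removes `www.` from the start and `.com` from the end, because each of those characters appears somewhere in the argument."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-strip-char-set-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "host = \"www.example.com\"\n",
    "\n",
    "# Every leading and trailing character found in \"cmowz.\" is removed\n",
    "print(host.strip(\"cmowz.\"))  # example\n",
    "\n",
    "# The order of characters in the argument does not matter\n",
    "print(host.strip(\".zwomc\"))  # example"
   ]
  },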
  {
   "cell_type": "markdown",
   "id": "d4e5f6a8",
   "metadata": {},
   "source": [
    "## Replacing substrings\n",
    "\n",
    "The `str.replace()` method returns a new string with all occurrences of a substring replaced by another substring."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e5f6a7b9",
   "metadata": {},
   "outputs": [],
   "source": [
    "message = \"I like cats. Cats are wonderful.\"\n",
    "\n",
    "# Replace every occurrence of the exact substring \"cats\"\n",
    "new_message = message.replace(\"cats\", \"dogs\")\n",
    "print(new_message)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f6a7b8d0",
   "metadata": {},
   "source": [
    "Notice that `str.replace()` is case-sensitive &ndash; it replaced \"cats\" but not \"Cats\". Keep this in mind when working with mixed-case text.\n",
    "\n",
    "### Limiting the number of replacements\n",
    "\n",
    "You can pass a third argument, `count`, to `str.replace()` to limit the number of replacements. Only the first `count` occurrences, counting from the left, are replaced."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a7b8c9d1",
   "metadata": {},
   "outputs": [],
   "source": [
    "text = \"one-two-three-four-five\"\n",
    "\n",
    "# Replace only the first two hyphens with spaces\n",
    "result = text.replace(\"-\", \" \", 2)\n",
    "print(result)"
   ]
  },
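  {
   "cell_type": "markdown",
   "id": "sketch-replace-delete-md",
   "metadata": {},
   "source": [
    "As with the other methods in this tutorial, `str.replace()` leaves the original string unchanged. The sketch below also shows that replacing a substring with an empty string is a simple way to delete it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-replace-delete-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "date_text = \"2026-02-09\"\n",
    "\n",
    "# Replacing with an empty string deletes every hyphen\n",
    "digits_only = date_text.replace(\"-\", \"\")\n",
    "print(digits_only)  # 20260209\n",
    "\n",
    "# The original string is unchanged\n",
    "print(date_text)    # 2026-02-09"
   ]
  },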
  {
   "cell_type": "markdown",
   "id": "b8c9d0e2",
   "metadata": {},
   "source": [
    "## Method chaining\n",
    "\n",
    "Because each string method returns a new string, you can **chain** multiple method calls together. This allows you to perform several transformations in a single expression."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c9d0e1f3",
   "metadata": {},
   "outputs": [],
   "source": [
    "raw_input = \"   Hello, World!   \"\n",
    "\n",
    "# Strip whitespace, convert to lowercase, and replace the comma\n",
    "cleaned = raw_input.strip().lower().replace(\",\", \"\")\n",
    "print(cleaned)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d0e1f2a4",
   "metadata": {},
   "source": [
    "Method chaining works because each method returns a new string, and the next method is called on that new string. You can read the chain from left to right:\n",
    "\n",
    "1. `strip()` removes the leading and trailing whitespace\n",
    "2. `lower()` converts the result to lowercase\n",
    "3. `replace(\",\", \"\")` removes the comma\n",
    "\n",
    "Here is a practical example &ndash; creating a URL-friendly slug from a title."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e1f2a3b5",
   "metadata": {},
   "outputs": [],
   "source": [
    "title = \"  String Processing with Python!  \"\n",
    "\n",
    "slug = title.strip().lower().replace(\" \", \"-\").replace(\"!\", \"\")\n",
    "print(slug)"
   ]
  },
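  {
   "cell_type": "markdown",
   "id": "sketch-make-slug-md",
   "metadata": {},
   "source": [
    "The slug example assumes single spaces between words. If a title might contain runs of spaces, `replace(\" \", \"-\")` would produce repeated hyphens. One alternative, sketched below as a hypothetical `make_slug()` function, combines chaining with the split-and-join pattern. Like the original, it removes only the `!` character and no other punctuation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-make-slug-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical helper combining method chaining with split() and join()\n",
    "def make_slug(text):\n",
    "    # Lowercase, remove \"!\", then split on whitespace and rejoin with hyphens\n",
    "    return \"-\".join(text.lower().replace(\"!\", \"\").split())\n",
    "\n",
    "\n",
    "print(make_slug(\"  String   Processing with Python!  \"))  # string-processing-with-python"
   ]
  },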
  {
   "cell_type": "markdown",
   "id": "f2a3b4c6",
   "metadata": {},
   "source": [
    "## Exercises\n",
    "\n",
    "Now it is time to practise what you have learned. Try to complete each exercise before looking at the solutions.\n",
    "\n",
    "### Exercise 1: Normalise a name\n",
    "\n",
    "Given a name with inconsistent casing and extra whitespace, clean it up so that each word is capitalised and there is no leading or trailing whitespace.\n",
    "\n",
    "For example, `\"  alice   SMITH  \"` should become `\"Alice Smith\"`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a3b4c5d7",
   "metadata": {},
   "outputs": [],
   "source": [
    "name = \"  alice   SMITH  \"\n",
    "\n",
    "# Your solution here\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b4c5d6e8",
   "metadata": {},
   "source": [
    "### Exercise 2: Build a CSV row\n",
    "\n",
    "Given a list of values, join them into a comma-separated string. Then split the result back into a list to verify it matches the original.\n",
    "\n",
    "For example, `[\"red\", \"green\", \"blue\"]` should produce `\"red,green,blue\"`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c5d6e7f9",
   "metadata": {},
   "outputs": [],
   "source": [
    "colours = [\"red\", \"green\", \"blue\"]\n",
    "\n",
    "# Your solution here\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d6e7f8b0",
   "metadata": {},
   "source": [
    "### Exercise 3: Clean up user input\n",
    "\n",
    "Given a messy string from user input, perform the following steps:\n",
    "\n",
    "1. Strip leading and trailing whitespace\n",
    "2. Replace multiple spaces with a single space\n",
    "3. Convert everything to lowercase\n",
    "\n",
    "For example, `\"   Hello    WORLD   \"` should become `\"hello world\"`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e7f8a9b1",
   "metadata": {},
   "outputs": [],
   "source": [
    "user_input = \"   Hello    WORLD   \"\n",
    "\n",
    "# Your solution here\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f8a9b0c2",
   "metadata": {},
   "source": [
    "### Exercise 4: Extract a file extension\n",
    "\n",
    "Given a filename, use `str.rsplit()` to extract the file extension. The extension should be in lowercase and should not include the dot.\n",
    "\n",
    "For example, `\"Report.PDF\"` should produce `\"pdf\"`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a9b0c1d3",
   "metadata": {},
   "outputs": [],
   "source": [
    "filename = \"Report.PDF\"\n",
    "\n",
    "# Your solution here\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b0c1d2e4",
   "metadata": {},
   "source": [
    "### Solutions\n",
    "\n",
    "Below are sample solutions for each exercise. Do not look at these until you have tried the exercises yourself!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c1d2e3f5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution 1: Normalise a name\n",
    "name = \"  alice   SMITH  \"\n",
    "normalised_name = \" \".join(name.split()).title()\n",
    "print(normalised_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d2e3f4a6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution 2: Build a CSV row\n",
    "colours = [\"red\", \"green\", \"blue\"]\n",
    "csv_row = \",\".join(colours)\n",
    "print(csv_row)\n",
    "\n",
    "# Verify by splitting it back\n",
    "print(csv_row.split(\",\") == colours)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e3f4a5b7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution 3: Clean up user input\n",
    "user_input = \"   Hello    WORLD   \"\n",
    "cleaned = \" \".join(user_input.split()).lower()\n",
    "print(cleaned)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f4a5b6c8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution 4: Extract a file extension\n",
    "filename = \"Report.PDF\"\n",
    "extension = filename.rsplit(\".\", maxsplit=1)[1].lower()\n",
    "print(extension)"
   ]
  },
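  {
   "cell_type": "markdown",
   "id": "sketch-get-extension-md",
   "metadata": {},
   "source": [
    "Solution 4 assumes that the filename contains a dot. For a name without one, `rsplit()` returns a one-element list, so indexing with `[1]` raises an `IndexError`. The sketch below is a more defensive variant, written as a hypothetical `get_extension()` function that returns an empty string when there is no extension."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-get-extension-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical helper that also handles names without a dot\n",
    "def get_extension(name):\n",
    "    # Split off whatever follows the last dot, if there is one\n",
    "    parts = name.rsplit(\".\", maxsplit=1)\n",
    "    if len(parts) < 2:\n",
    "        return \"\"\n",
    "    return parts[1].lower()\n",
    "\n",
    "\n",
    "print(get_extension(\"Report.PDF\"))      # pdf\n",
    "print(get_extension(\"archive.tar.GZ\"))  # gz\n",
    "print(repr(get_extension(\"README\")))    # ''"
   ]
  },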
  {
   "cell_type": "markdown",
   "id": "a5b6c7d9",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "In this tutorial, you have learned about the most commonly used string methods in Python:\n",
    "\n",
    "- **Case conversion** -- `upper()`, `lower()`, `title()`, `capitalize()`, `swapcase()`, and `casefold()` allow you to change the case of characters\n",
    "- **Splitting** -- `split()`, `rsplit()`, `splitlines()`, and `partition()` break strings into smaller parts\n",
    "- **Joining** -- `join()` combines a list of strings into a single string using a separator\n",
    "- **Stripping** -- `strip()`, `lstrip()`, and `rstrip()` remove unwanted characters from the ends of a string\n",
    "- **Replacing** -- `replace()` substitutes one substring for another\n",
    "- **Method chaining** -- you can call multiple methods in sequence because each method returns a new string\n",
    "\n",
    "These methods form the foundation of text processing in Python. You will use them frequently in real-world projects.\n",
    "\n",
    "In the next tutorial, [String formatting](https://agilearn.co.uk/guides/string-processing/learn/03-string-formatting), you will learn how to create well-formatted output using f-strings, `str.format()`, and the format specification mini-language."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbformat_minor": 5,
   "pygments_lexer": "ipython3",
   "version": "3.12.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}