{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a1b2c3d4",
   "metadata": {},
   "source": [
    "# String basics\n",
    "\n",
    "Welcome to your first tutorial on string processing with Python. In this tutorial, you will learn how Python represents text and discover the fundamental operations you can perform on strings.\n",
    "\n",
    "**Time commitment:** 15&ndash;20 minutes\n",
    "\n",
    "**Prerequisites:**\n",
    "- Python 3.12 or later installed\n",
    "- Basic familiarity with running Python code\n",
    "\n",
    "## Learning objectives\n",
    "\n",
    "By the end of this tutorial, you will be able to:\n",
    "\n",
    "- Create strings using single quotes, double quotes, and triple quotes\n",
    "- Access individual characters using indexing\n",
    "- Extract substrings using slicing\n",
    "- Understand why strings are immutable\n",
    "- Use basic string operators such as `+` and `*`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b2c3d4e5",
   "metadata": {},
   "source": [
    "## Creating strings\n",
    "\n",
    "A **string** is a sequence of characters used to represent text. In Python, you create a string by enclosing text in quotation marks. Python offers several ways to do this, and each has its own advantages."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c3d4e5f6",
   "metadata": {},
   "source": [
    "### Single quotes and double quotes\n",
    "\n",
    "The most common way to create a string is with single quotes (`'...'`) or double quotes (`\"...\"`). Both produce exactly the same string; Python does not distinguish between them once the string is created."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d4e5f6a7",
   "metadata": {},
   "outputs": [],
   "source": [
    "greeting = 'Hello, world!'\n",
    "print(greeting)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e5f6a7b8",
   "metadata": {},
   "outputs": [],
   "source": [
    "farewell = \"Goodbye, world!\"\n",
    "print(farewell)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f6a7b8c9",
   "metadata": {},
   "source": [
    "One practical reason to have both options is that a string delimited by one kind of quote can contain the other kind without escaping it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a7b8c9d0",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use double quotes when the string contains an apostrophe\n",
    "message = \"It is a lovely day, isn't it?\"\n",
    "print(message)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b8c9d0e1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use single quotes when the string contains double quotes\n",
    "dialogue = 'She said, \"Hello!\"'\n",
    "print(dialogue)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c9d0e1f2",
   "metadata": {},
   "source": [
    "### Triple quotes\n",
    "\n",
    "For strings that span multiple lines, use triple quotes (`'''...'''` or `\"\"\"...\"\"\"`). Triple-quoted strings preserve the line breaks exactly as you type them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d0e1f2a3",
   "metadata": {},
   "outputs": [],
   "source": [
    "poem = \"\"\"Roses are red,\n",
    "Violets are blue,\n",
    "Python is wonderful,\n",
    "And strings are too.\"\"\"\n",
    "print(poem)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e1f2a3b4",
   "metadata": {},
   "source": [
    "### Escape characters\n",
    "\n",
    "Sometimes you need to include special characters in a string that you cannot simply type. Python uses the backslash (`\\`) as an **escape character** to represent these. The following are the most common escape sequences:\n",
    "\n",
    "| Escape sequence | Meaning |\n",
    "|---|---|\n",
    "| `\\n` | Newline (line break) |\n",
    "| `\\t` | Tab |\n",
    "| `\\\\` | Literal backslash |\n",
    "| `\\'` | Single quote |\n",
    "| `\\\"` | Double quote |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f2a3b4c5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Using escape characters\n",
    "print(\"First line\\nSecond line\")\n",
    "print(\"Column1\\tColumn2\\tColumn3\")\n",
    "print(\"This is a backslash: \\\\\")"
   ]
  },
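  {
   "cell_type": "markdown",
   "id": "aa01bb01",
   "metadata": {},
   "source": [
    "The table above also lists `\\'` and `\\\"`, which the previous cell does not exercise. As a brief sketch (the sample sentences are made up for illustration), escaping a quote lets it appear inside a string delimited by the same kind of quote:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aa01bb02",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch: escaping a quote that matches the string's delimiters\n",
    "single = 'It\\'s a lovely day'\n",
    "double = \"She said, \\\"Hello!\\\"\"\n",
    "print(single)\n",
    "print(double)\n",
    "\n",
    "# The escaped and unescaped spellings produce equal strings\n",
    "print(single == \"It's a lovely day\")\n",
    "print(double == 'She said, \"Hello!\"')"
   ]
  },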
  {
   "cell_type": "markdown",
   "id": "a3b4c5d6",
   "metadata": {},
   "source": [
    "### Raw strings\n",
    "\n",
    "If you need a string where backslashes are treated as literal characters (for example, when writing Windows file paths), prefix the string with `r` to create a **raw string**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b4c5d6e7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Without the r prefix, \\n becomes a newline and \\t becomes a tab\n",
    "normal_string = \"C:\\new_folder\\test\"\n",
    "print(\"Normal string:\", normal_string)\n",
    "\n",
    "# With the r prefix, backslashes are treated literally\n",
    "raw_string = r\"C:\\new_folder\\test\"\n",
    "print(\"Raw string:   \", raw_string)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c5d6e7f8",
   "metadata": {},
   "source": [
    "## String length\n",
    "\n",
    "To find out how many characters a string contains, use the built-in `len()` function. Every character counts, including spaces, punctuation, and escape sequences (each escape sequence counts as one character)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d6e7f8a9",
   "metadata": {},
   "outputs": [],
   "source": [
    "text = \"Hello, world!\"\n",
    "print(len(text))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e7f8a9b0",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Spaces count as characters\n",
    "spaced = \"a b c\"\n",
    "print(len(spaced))\n",
    "\n",
    "# An empty string has length zero\n",
    "empty = \"\"\n",
    "print(len(empty))"
   ]
  },
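  {
   "cell_type": "markdown",
   "id": "aa01bb03",
   "metadata": {},
   "source": [
    "To see how escape sequences are counted, the short sketch below (using made-up sample strings) compares a normal string with its raw equivalent. In the normal string, `\\n` is a single newline character; in the raw string it stays as two characters, a backslash followed by `n`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aa01bb04",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch: an escape sequence is one character once Python has read it\n",
    "with_newline = \"a\\nb\"\n",
    "print(len(with_newline))  # 3: 'a', newline, 'b'\n",
    "\n",
    "# In a raw string the backslash and the 'n' are two separate characters\n",
    "raw_version = r\"a\\nb\"\n",
    "print(len(raw_version))   # 4: 'a', backslash, 'n', 'b'"
   ]
  },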
  {
   "cell_type": "markdown",
   "id": "f8a9b0c1",
   "metadata": {},
   "source": [
    "## Indexing\n",
    "\n",
    "You can access individual characters in a string using **indexing**. Python uses zero-based indexing, which means the first character is at position 0, the second at position 1, and so on.\n",
    "\n",
    "Use square brackets after the string (or variable name) with the index number inside: `text[0]`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a9b0c1d2",
   "metadata": {},
   "outputs": [],
   "source": [
    "word = \"Python\"\n",
    "\n",
    "print(word[0])  # First character\n",
    "print(word[1])  # Second character\n",
    "print(word[5])  # Sixth (last) character"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b0c1d2e3",
   "metadata": {},
   "source": [
    "### Negative indexing\n",
    "\n",
    "Python also supports **negative indexing**, which counts from the end of the string. The last character is at index `-1`, the second-to-last at `-2`, and so on. This is very convenient when you need to access characters near the end of a string without knowing its exact length."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c1d2e3f4",
   "metadata": {},
   "outputs": [],
   "source": [
    "word = \"Python\"\n",
    "\n",
    "print(word[-1])  # Last character\n",
    "print(word[-2])  # Second-to-last character\n",
    "print(word[-6])  # First character (same as word[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d2e3f4a5",
   "metadata": {},
   "source": [
    "### Index out of range\n",
    "\n",
    "If you try to access an index that does not exist, Python raises an `IndexError`. The following cell triggers the error on purpose and catches it with `try`/`except`, so the message is printed instead of stopping the notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e3f4a5b6",
   "metadata": {},
   "outputs": [],
   "source": [
    "word = \"Python\"\n",
    "\n",
    "try:\n",
    "    print(word[10])\n",
    "except IndexError as error:\n",
    "    print(f\"IndexError: {error}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f4a5b6c7",
   "metadata": {},
   "source": [
    "## Slicing\n",
    "\n",
    "**Slicing** allows you to extract a portion (a substring) of a string. The syntax is `text[start:stop]`, where:\n",
    "\n",
    "- `start` is the index where the slice begins (inclusive)\n",
    "- `stop` is the index where the slice ends (exclusive &ndash; the character at this position is not included)\n",
    "\n",
    "Think of the indices as pointing *between* the characters, with 0 before the first character and `len(text)` after the last."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a5b6c7d8",
   "metadata": {},
   "outputs": [],
   "source": [
    "text = \"Hello, world!\"\n",
    "\n",
    "print(text[0:5])   # Characters from index 0 up to (not including) index 5\n",
    "print(text[7:12])  # Characters from index 7 up to (not including) index 12"
   ]
  },
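  {
   "cell_type": "markdown",
   "id": "aa01bb05",
   "metadata": {},
   "source": [
    "The \"between the characters\" picture has two handy consequences, sketched below with split points chosen arbitrarily for illustration: a slice `text[start:stop]` with both indices inside the string contains `stop - start` characters, and cutting a string at any index gives two pieces that join back into the original."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aa01bb06",
   "metadata": {},
   "outputs": [],
   "source": [
    "text = \"Hello, world!\"\n",
    "\n",
    "# The length of a slice is stop - start\n",
    "print(len(text[7:12]))  # 5\n",
    "\n",
    "# Cutting at an index and rejoining the two pieces gives back the original\n",
    "print(text[:5] + text[5:] == text)  # True\n",
    "print(text[:0] + text[0:] == text)  # True (the first piece is empty)"
   ]
  },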
  {
   "cell_type": "markdown",
   "id": "b6c7d8e9",
   "metadata": {},
   "source": [
    "### Omitting start or stop\n",
    "\n",
    "You can omit the `start` or `stop` value. When you omit `start`, the slice starts at the first character. When you omit `stop`, the slice continues to the end of the string."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c7d8e9f0",
   "metadata": {},
   "outputs": [],
   "source": [
    "text = \"Hello, world!\"\n",
    "\n",
    "print(text[:5])   # From the beginning up to index 5\n",
    "print(text[7:])   # From index 7 to the end\n",
    "print(text[:])    # The entire string"
   ]
  },
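  {
   "cell_type": "markdown",
   "id": "aa01bb07",
   "metadata": {},
   "source": [
    "Unlike indexing, slicing does not raise an `IndexError` when a bound falls outside the string: out-of-range slice bounds are quietly clipped to the ends. The sketch below uses arbitrary out-of-range values to show this."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aa01bb08",
   "metadata": {},
   "outputs": [],
   "source": [
    "text = \"Hello, world!\"  # 13 characters\n",
    "\n",
    "print(text[7:100])        # stop is clipped to the end: world!\n",
    "print(text[-100:5])       # start is clipped to the beginning: Hello\n",
    "print(len(text[50:60]))   # nothing is in range, so the slice is empty: 0"
   ]
  },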
  {
   "cell_type": "markdown",
   "id": "d8e9f0a1",
   "metadata": {},
   "source": [
    "### Slicing with a step\n",
    "\n",
    "You can add a third value to the slice, the **step**, using the syntax `text[start:stop:step]`. The step is how far the index advances between selected characters: a step of 2 takes every second character, a step of 3 every third, and so on."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e9f0a1b2",
   "metadata": {},
   "outputs": [],
   "source": [
    "text = \"abcdefghij\"\n",
    "\n",
    "print(text[0:10:2])  # Every second character\n",
    "print(text[1:10:2])  # Every second character, starting from index 1\n",
    "print(text[::3])     # Every third character from the whole string"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f0a1b2c3",
   "metadata": {},
   "source": [
    "### Reversing a string\n",
    "\n",
    "A particularly useful trick is to reverse a string using a step of `-1`. With `start` and `stop` omitted, a negative step walks from the last character back to the first."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a1b2c3d5",
   "metadata": {},
   "outputs": [],
   "source": [
    "text = \"Python\"\n",
    "reversed_text = text[::-1]\n",
    "print(reversed_text)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b2c3d4e6",
   "metadata": {},
   "source": [
    "## String immutability\n",
    "\n",
    "An important characteristic of strings in Python is that they are **immutable** &ndash; once a string is created, you cannot change its individual characters. If you try to assign a new character to a specific index, Python raises a `TypeError`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c3d4e5f7",
   "metadata": {},
   "outputs": [],
   "source": [
    "word = \"Hello\"\n",
    "\n",
    "try:\n",
    "    word[0] = \"J\"\n",
    "except TypeError as error:\n",
    "    print(f\"TypeError: {error}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d4e5f6a8",
   "metadata": {},
   "source": [
    "Instead of modifying a string in place, you create a **new string** with the changes you want. There are several ways to do this &ndash; here is one using concatenation and slicing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e5f6a7b9",
   "metadata": {},
   "outputs": [],
   "source": [
    "word = \"Hello\"\n",
    "\n",
    "# Create a new string with 'J' in place of the first character\n",
    "new_word = \"J\" + word[1:]\n",
    "print(new_word)\n",
    "\n",
    "# The original string is unchanged\n",
    "print(word)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f6a7b8ca",
   "metadata": {},
   "source": [
    "Why are strings immutable? Immutability has several benefits:\n",
    "\n",
    "- **Safety:** You can pass strings to functions without worrying that the function will change them unexpectedly.\n",
    "- **Efficiency:** Python can optimise how it stores and reuses strings in memory.\n",
    "- **Hashing:** Because a string's contents never change, its hash value stays stable, so strings can be used as dictionary keys and as members of sets."
   ]
  },
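  {
   "cell_type": "markdown",
   "id": "aa01bb09",
   "metadata": {},
   "source": [
    "As a small illustration of the hashing point (the dictionary and set contents are invented for this sketch), strings work as dictionary keys and set members, whereas a mutable list is rejected as a key:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aa01bb0a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch: strings are hashable, so they can be dictionary keys\n",
    "stock = {\"apple\": 3, \"pear\": 5}\n",
    "print(stock[\"apple\"])\n",
    "\n",
    "# ...and members of a set\n",
    "fruits = {\"apple\", \"pear\"}\n",
    "print(\"apple\" in fruits)\n",
    "\n",
    "# A list is mutable and therefore unhashable, so it cannot be a key\n",
    "try:\n",
    "    lookup = {[\"apple\", \"pear\"]: 8}\n",
    "except TypeError as error:\n",
    "    print(f\"TypeError: {error}\")"
   ]
  },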
  {
   "cell_type": "markdown",
   "id": "a7b8c9d1",
   "metadata": {},
   "source": [
    "## String operators\n",
    "\n",
    "Python provides several operators that work with strings, making it easy to combine, repeat, and search text."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b8c9d0e2",
   "metadata": {},
   "source": [
    "### Concatenation with `+`\n",
    "\n",
    "The `+` operator joins two strings together into a new string. This is called **concatenation**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c9d0e1f3",
   "metadata": {},
   "outputs": [],
   "source": [
    "first_name = \"Ada\"\n",
    "last_name = \"Lovelace\"\n",
    "\n",
    "full_name = first_name + \" \" + last_name\n",
    "print(full_name)"
   ]
  },
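  {
   "cell_type": "markdown",
   "id": "aa01bb0b",
   "metadata": {},
   "source": [
    "The `+` operator only joins strings to other strings. As a hedged illustration (the variable names are invented for this sketch), adding a number to a string raises a `TypeError`, and converting the number with the built-in `str()` function first avoids it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aa01bb0c",
   "metadata": {},
   "outputs": [],
   "source": [
    "label = \"Count: \"\n",
    "count = 3\n",
    "\n",
    "# Joining a string and an integer directly is an error\n",
    "try:\n",
    "    print(label + count)\n",
    "except TypeError as error:\n",
    "    print(f\"TypeError: {error}\")\n",
    "\n",
    "# Convert the number to a string before concatenating\n",
    "print(label + str(count))"
   ]
  },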
  {
   "cell_type": "markdown",
   "id": "d0e1f2a4",
   "metadata": {},
   "source": [
    "### Repetition with `*`\n",
    "\n",
    "The `*` operator repeats a string a given number of times."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e1f2a3b5",
   "metadata": {},
   "outputs": [],
   "source": [
    "line = \"-\" * 40\n",
    "print(line)\n",
    "\n",
    "echo = \"ha\" * 3\n",
    "print(echo)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f2a3b4c6",
   "metadata": {},
   "source": [
    "### Membership testing with `in` and `not in`\n",
    "\n",
    "You can check whether a substring exists within a string using the `in` and `not in` operators. These return `True` or `False`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a3b4c5d7",
   "metadata": {},
   "outputs": [],
   "source": [
    "sentence = \"The quick brown fox jumps over the lazy dog\"\n",
    "\n",
    "print(\"fox\" in sentence)\n",
    "print(\"cat\" in sentence)\n",
    "print(\"cat\" not in sentence)"
   ]
  },
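  {
   "cell_type": "markdown",
   "id": "aa01bb0d",
   "metadata": {},
   "source": [
    "Membership tests compare characters exactly, so they are case-sensitive. The sketch below reuses the sentence from the previous cell to show this, along with one edge case: the empty string counts as a substring of every string."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aa01bb0e",
   "metadata": {},
   "outputs": [],
   "source": [
    "sentence = \"The quick brown fox jumps over the lazy dog\"\n",
    "\n",
    "print(\"The\" in sentence)  # True\n",
    "print(\"Fox\" in sentence)  # False: the sentence contains 'fox', not 'Fox'\n",
    "print(\"\" in sentence)     # True: the empty string is found in every string"
   ]
  },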
  {
   "cell_type": "markdown",
   "id": "b4c5d6e8",
   "metadata": {},
   "source": [
    "## Exercises\n",
    "\n",
    "Now it is time to practise what you have learned. Try each exercise in the empty code cell below it. If you get stuck, you can check the solutions at the end of this section.\n",
    "\n",
    "### Exercise 1\n",
    "\n",
    "Create a variable called `full_address` that contains the following text on three separate lines (use a single string):\n",
    "\n",
    "```\n",
    "221B Baker Street\n",
    "London\n",
    "United Kingdom\n",
    "```\n",
    "\n",
    "Print the result."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c5d6e7f9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Exercise 1: Your code here\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d6e7f8aa",
   "metadata": {},
   "source": [
    "### Exercise 2\n",
    "\n",
    "Given the string `\"abcdefghijklmnopqrstuvwxyz\"`, use slicing to extract:\n",
    "\n",
    "1. The first five letters\n",
    "2. The last five letters\n",
    "3. Every third letter\n",
    "\n",
    "Print each result."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e7f8a9b1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Exercise 2: Your code here\n",
    "alphabet = \"abcdefghijklmnopqrstuvwxyz\"\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f8a9b0c2",
   "metadata": {},
   "source": [
    "### Exercise 3\n",
    "\n",
    "Given the string `\"racecar\"`, write code to check whether it is a palindrome (a word that reads the same forwards and backwards). Print `True` if it is, or `False` if it is not."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a9b0c1d3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Exercise 3: Your code here\n",
    "word = \"racecar\"\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b0c1d2e4",
   "metadata": {},
   "source": [
    "### Exercise 4\n",
    "\n",
    "Given the string `\"Hello, World!\"`, create a new string where the comma is replaced with a semicolon. Do this using concatenation and slicing (not the `str.replace()` method). Print the result."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c1d2e3f5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Exercise 4: Your code here\n",
    "text = \"Hello, World!\"\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d2e3f4a6",
   "metadata": {},
   "source": [
    "### Solutions\n",
    "\n",
    "The cells below show one possible solution for each exercise. It is perfectly fine if your solution differs from the one shown &ndash; there is often more than one correct approach."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e3f4a5b7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution 1\n",
    "full_address = \"221B Baker Street\\nLondon\\nUnited Kingdom\"\n",
    "print(full_address)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f4a5b6c8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution 2\n",
    "alphabet = \"abcdefghijklmnopqrstuvwxyz\"\n",
    "\n",
    "print(alphabet[:5])   # First five letters\n",
    "print(alphabet[-5:])  # Last five letters\n",
    "print(alphabet[::3])  # Every third letter"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a5b6c7d9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution 3\n",
    "word = \"racecar\"\n",
    "print(word == word[::-1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b6c7d8ea",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Solution 4\n",
    "text = \"Hello, World!\"\n",
    "new_text = text[:5] + \";\" + text[6:]\n",
    "print(new_text)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c7d8e9f1",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "Well done &ndash; you have covered the fundamentals of working with strings in Python! Here is a recap of the key points:\n",
    "\n",
    "- Strings are sequences of characters, created with single quotes (`'...'`), double quotes (`\"...\"`), or triple quotes (`'''...'''` / `\"\"\"...\"\"\"`)\n",
    "- The `len()` function returns the number of characters in a string\n",
    "- **Indexing** lets you access individual characters using `text[index]`, with zero-based positive indexing and negative indexing from the end\n",
    "- **Slicing** lets you extract substrings using `text[start:stop:step]`\n",
    "- Strings are **immutable** &ndash; you cannot change them in place, but you can create new strings based on existing ones\n",
    "- The `+` operator concatenates strings, the `*` operator repeats them, and `in` / `not in` test for membership\n",
    "\n",
    "In the next tutorial, [String methods](https://agilearn.co.uk/guides/string-processing/learn/02-string-methods), you will learn about the powerful built-in methods that Python provides for transforming, searching, and manipulating strings."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
    "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}