{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "intro",
   "metadata": {},
   "source": [
    "# Reading files\n",
    "\n",
    "Reading files is one of the most fundamental and practical skills you will learn in Python. Whether you need to process data, read configuration, or analyse text, file reading is where it all begins.\n",
    "\n",
    "**Time commitment:** 15–20 minutes\n",
    "\n",
    "**Prerequisites:**\n",
    "\n",
    "- Basic Python knowledge (variables, strings, lists, the `print()` function)\n",
    "\n",
    "## Learning objectives\n",
    "\n",
    "By the end of this tutorial, you will be able to:\n",
    "\n",
    "- Open text files using the built-in `open()` function\n",
    "- Use `with` statements to manage files safely\n",
    "- Read entire file contents with the `read()` method\n",
    "- Read files line by line for memory efficiency\n",
    "- Use `pathlib.Path` for a modern approach to file reading"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "setup-heading",
   "metadata": {},
   "source": [
    "## Setting up a sample file\n",
    "\n",
    "Before you can read a file, you need a file to read. The following code creates a sample text file that you will use throughout this tutorial."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "setup-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "sample_content = \"\"\"Line one of the sample file.\n",
    "Line two has some different text.\n",
    "Line three is the final line.\"\"\"\n",
    "\n",
    "Path(\"sample.txt\").write_text(sample_content, encoding=\"utf-8\")\n",
    "print(\"Sample file created successfully.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "open-heading",
   "metadata": {},
   "source": [
    "## Opening a file with `open()`\n",
    "\n",
    "The built-in `open()` function is the primary way to open files in Python. It returns a file object that you can use to read or write data.\n",
    "\n",
    "The recommended way to open a file is with a `with` statement. This ensures the file is closed automatically when you are finished, even if an error occurs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "open-basic",
   "metadata": {},
   "outputs": [],
   "source": [
    "with open(\"sample.txt\", \"r\", encoding=\"utf-8\") as f:\n",
    "    content = f.read()\n",
    "\n",
    "print(content)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "open-explain",
   "metadata": {},
   "source": [
    "Here is what happened in that code:\n",
    "\n",
    "- The `with` statement opens the file and assigns the file object to `f`\n",
    "- `\"r\"` means read mode, which is the default mode\n",
    "- `encoding=\"utf-8\"` ensures consistent behaviour across different operating systems\n",
    "- The file is automatically closed when the `with` block ends\n",
    "\n",
    "You should always specify the encoding explicitly. Without it, Python falls back to the platform's default encoding, which is typically UTF-8 on Linux and macOS but is often a legacy code page such as cp1252 on Windows. A file that reads correctly on one machine can then produce garbled characters or a `UnicodeDecodeError` on another."
   ]
  },
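  {
   "cell_type": "markdown",
   "id": "open-closed-note",
   "metadata": {},
   "source": [
    "To check that the file really is closed, the sketch below inspects the file object's `closed` attribute after the `with` block ends, both in the normal case and when an error is raised inside the block. The file name `closed_demo.txt` and the deliberately raised `ValueError` are assumptions made for illustration; they are not part of the tutorial's sample data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "open-closed-demo",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "# Hypothetical demo file, created here so the cell runs on its own.\n",
    "demo_path = Path(\"closed_demo.txt\")\n",
    "demo_path.write_text(\"hello\\n\", encoding=\"utf-8\")\n",
    "\n",
    "with open(demo_path, \"r\", encoding=\"utf-8\") as f:\n",
    "    inside = f.closed  # False: the file is still open inside the block\n",
    "\n",
    "after_normal_exit = f.closed  # True: closed when the block ended\n",
    "\n",
    "try:\n",
    "    with open(demo_path, \"r\", encoding=\"utf-8\") as g:\n",
    "        raise ValueError(\"simulated problem while reading\")\n",
    "except ValueError:\n",
    "    pass\n",
    "\n",
    "after_error = g.closed  # True: closed even though an error was raised\n",
    "\n",
    "print(inside, after_normal_exit, after_error)\n",
    "\n",
    "demo_path.unlink()"
   ]
  },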
  {
   "cell_type": "markdown",
   "id": "read-heading",
   "metadata": {},
   "source": [
    "## Reading the entire file with `read()`\n",
    "\n",
    "The `read()` method reads the entire file content as a single string. This is convenient for small files, but it is not ideal for very large files because the entire content must fit in memory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "read-demo",
   "metadata": {},
   "outputs": [],
   "source": [
    "with open(\"sample.txt\", \"r\", encoding=\"utf-8\") as f:\n",
    "    content = f.read()\n",
    "\n",
    "print(type(content))\n",
    "print(repr(content))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "read-explain",
   "metadata": {},
   "source": [
    "Notice that `read()` returns a single string containing the entire file, including newline characters (`\\n`)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "readlines-heading",
   "metadata": {},
   "source": [
    "## Reading all lines with `readlines()`\n",
    "\n",
    "The `readlines()` method returns a list of strings, one for each line in the file. Each string keeps its trailing newline character `\\n`, except the last one if the file does not end with a newline."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "readlines-demo",
   "metadata": {},
   "outputs": [],
   "source": [
    "with open(\"sample.txt\", \"r\", encoding=\"utf-8\") as f:\n",
    "    lines = f.readlines()\n",
    "\n",
    "print(lines)\n",
    "print(f\"Number of lines: {len(lines)}\")"
   ]
  },
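  {
   "cell_type": "markdown",
   "id": "readlines-explain",
   "metadata": {},
   "source": [
    "Every line except the last still ends with a newline character, because the sample file does not end with one. You can remove the newlines with the `strip()` method in a list comprehension. Keep in mind that `strip()` also removes any other leading and trailing whitespace, such as indentation."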
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "readlines-strip",
   "metadata": {},
   "outputs": [],
   "source": [
    "with open(\"sample.txt\", \"r\", encoding=\"utf-8\") as f:\n",
    "    lines = [line.strip() for line in f.readlines()]\n",
    "\n",
    "print(lines)"
   ]
  },
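  {
   "cell_type": "markdown",
   "id": "readlines-rstrip-note",
   "metadata": {},
   "source": [
    "To see how `strip()` differs from removing only the newline, the sketch below compares it with `rstrip(\"\\n\")` on a file whose second line is indented. The file name `indented_demo.txt` and its contents are assumptions made for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "readlines-rstrip-demo",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "# Hypothetical demo file whose second line starts with four spaces.\n",
    "demo_path = Path(\"indented_demo.txt\")\n",
    "demo_path.write_text(\"top level\\n    indented\\n\", encoding=\"utf-8\")\n",
    "\n",
    "with open(demo_path, \"r\", encoding=\"utf-8\") as f:\n",
    "    raw_lines = f.readlines()\n",
    "\n",
    "# strip() removes whitespace from both ends, so the indentation is lost.\n",
    "stripped = [line.strip() for line in raw_lines]\n",
    "\n",
    "# rstrip(\"\\n\") removes only the trailing newline and keeps the indentation.\n",
    "newline_removed = [line.rstrip(\"\\n\") for line in raw_lines]\n",
    "\n",
    "print(stripped)\n",
    "print(newline_removed)\n",
    "\n",
    "demo_path.unlink()"
   ]
  },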
  {
   "cell_type": "markdown",
   "id": "lineby-heading",
   "metadata": {},
   "source": [
    "## Reading line by line\n",
    "\n",
    "You can iterate directly over a file object to read it line by line. This is the most memory-efficient approach because Python reads the file lazily, so your code holds only the current line rather than the whole file. It is the recommended approach for large files."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "lineby-demo",
   "metadata": {},
   "outputs": [],
   "source": [
    "with open(\"sample.txt\", \"r\", encoding=\"utf-8\") as f:\n",
    "    for line in f:\n",
    "        print(line.strip())"
   ]
  },
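  {
   "cell_type": "markdown",
   "id": "lineby-enumerate-note",
   "metadata": {},
   "source": [
    "A common variation is to number the lines as you read them. The sketch below wraps the file object in the built-in `enumerate()` function; the file name `numbered_demo.txt` and its contents are assumptions made for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "lineby-enumerate-demo",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "# Hypothetical demo file, created here so the cell runs on its own.\n",
    "demo_path = Path(\"numbered_demo.txt\")\n",
    "demo_path.write_text(\"red\\ngreen\\nblue\\n\", encoding=\"utf-8\")\n",
    "\n",
    "numbered = []\n",
    "with open(demo_path, \"r\", encoding=\"utf-8\") as f:\n",
    "    # enumerate() pairs each line with a counter, starting from 1 here.\n",
    "    for number, line in enumerate(f, start=1):\n",
    "        numbered.append(f\"{number}: {line.strip()}\")\n",
    "\n",
    "print(\"\\n\".join(numbered))\n",
    "\n",
    "demo_path.unlink()"
   ]
  },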
  {
   "cell_type": "markdown",
   "id": "readline-heading",
   "metadata": {},
   "source": [
    "## Using `readline()` for one line at a time\n",
    "\n",
    "The `readline()` method reads a single line each time it is called. This is useful when you only need the first few lines of a file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "readline-demo",
   "metadata": {},
   "outputs": [],
   "source": [
    "with open(\"sample.txt\", \"r\", encoding=\"utf-8\") as f:\n",
    "    first_line = f.readline()\n",
    "    second_line = f.readline()\n",
    "\n",
    "print(f\"First line: {first_line.strip()}\")\n",
    "print(f\"Second line: {second_line.strip()}\")"
   ]
  },
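  {
   "cell_type": "markdown",
   "id": "readline-eof-note",
   "metadata": {},
   "source": [
    "When there is nothing left to read, `readline()` returns an empty string rather than raising an error. The sketch below uses that to stop a loop; the file name `readline_demo.txt` and its contents are assumptions made for illustration. In everyday code, iterating over the file object directly is usually simpler."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "readline-eof-demo",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "# Hypothetical two-line demo file, created here so the cell runs on its own.\n",
    "demo_path = Path(\"readline_demo.txt\")\n",
    "demo_path.write_text(\"alpha\\nbeta\\n\", encoding=\"utf-8\")\n",
    "\n",
    "collected = []\n",
    "with open(demo_path, \"r\", encoding=\"utf-8\") as f:\n",
    "    while True:\n",
    "        line = f.readline()\n",
    "        if line == \"\":  # an empty string signals the end of the file\n",
    "            break\n",
    "        collected.append(line.strip())\n",
    "\n",
    "print(collected)\n",
    "\n",
    "demo_path.unlink()"
   ]
  },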
  {
   "cell_type": "markdown",
   "id": "pathlib-heading",
   "metadata": {},
   "source": [
    "## A modern approach with `pathlib`\n",
    "\n",
    "The `pathlib` module provides an object-oriented approach to file system paths. The `Path` class offers convenient methods for reading files.\n",
    "\n",
    "`Path.read_text()` is the simplest way to read an entire text file. It opens the file, reads it, and closes it, all in one step."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "pathlib-readtext",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "content = Path(\"sample.txt\").read_text(encoding=\"utf-8\")\n",
    "print(content)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "pathlib-open-explain",
   "metadata": {},
   "source": [
    "For more control, you can use `Path.open()`, which works like the built-in `open()` but is called on a `Path` object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "pathlib-open",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "path = Path(\"sample.txt\")\n",
    "with path.open(\"r\", encoding=\"utf-8\") as f:\n",
    "    for line in f:\n",
    "        print(line.strip())"
   ]
  },
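  {
   "cell_type": "markdown",
   "id": "pathlib-splitlines-note",
   "metadata": {},
   "source": [
    "If you want a list of lines without newline characters, one option is to combine `read_text()` with the string method `splitlines()`. Like `read()`, this loads the whole file into memory, so it suits small files. The file name `splitlines_demo.txt` and its contents are assumptions made for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "pathlib-splitlines-demo",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "# Hypothetical demo file, created here so the cell runs on its own.\n",
    "demo_path = Path(\"splitlines_demo.txt\")\n",
    "demo_path.write_text(\"first\\nsecond\\nthird\", encoding=\"utf-8\")\n",
    "\n",
    "# splitlines() splits on line boundaries and drops the newline characters.\n",
    "lines = demo_path.read_text(encoding=\"utf-8\").splitlines()\n",
    "\n",
    "print(lines)\n",
    "print(f\"Number of lines: {len(lines)}\")\n",
    "\n",
    "demo_path.unlink()"
   ]
  },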
  {
   "cell_type": "markdown",
   "id": "exercises-heading",
   "metadata": {},
   "source": [
    "## Exercises\n",
    "\n",
    "Try these exercises to practise what you have learned.\n",
    "\n",
    "**Exercise 1:** Create a text file called `greeting.txt` containing three lines (your name, your favourite colour, and your favourite food), then read the file and print each line.\n",
    "\n",
    "**Exercise 2:** Write a function called `count_lines` that takes a file path and returns the number of lines in the file. Use type hints.\n",
    "\n",
    "**Exercise 3:** Write a function called `find_longest_line` that takes a file path and returns the longest line (stripped of whitespace). Use `pathlib.Path`."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "solutions-heading",
   "metadata": {},
   "source": [
    "### Solutions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "solution-1",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "Path(\"greeting.txt\").write_text(\n",
    "    \"Alice\\nBlue\\nPasta\\n\", encoding=\"utf-8\"\n",
    ")\n",
    "\n",
    "with open(\"greeting.txt\", \"r\", encoding=\"utf-8\") as f:\n",
    "    for line in f:\n",
    "        print(line.strip())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "solution-2",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "\n",
    "def count_lines(filepath: str | Path) -> int:\n",
    "    \"\"\"Count the number of lines in a text file.\n",
    "\n",
    "    Args:\n",
    "        filepath: The path to the file to count lines in.\n",
    "\n",
    "    Returns:\n",
    "        The number of lines in the file.\n",
    "    \"\"\"\n",
    "    path = Path(filepath)\n",
    "    with path.open(\"r\", encoding=\"utf-8\") as f:\n",
    "        return sum(1 for _ in f)\n",
    "\n",
    "\n",
    "result = count_lines(\"sample.txt\")\n",
    "print(f\"The file has {result} lines.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "solution-3",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "\n",
    "def find_longest_line(filepath: str | Path) -> str:\n",
    "    \"\"\"Find the longest line in a text file.\n",
    "\n",
    "    Args:\n",
    "        filepath: The path to the file to search.\n",
    "\n",
    "    Returns:\n",
    "        The longest line with leading and trailing whitespace removed,\n",
    "        or an empty string if the file is empty.\n",
    "    \"\"\"\n",
    "    path = Path(filepath)\n",
    "    with path.open(\"r\", encoding=\"utf-8\") as f:\n",
    "        # default=\"\" avoids a ValueError when the file has no lines.\n",
    "        return max((line.strip() for line in f), key=len, default=\"\")\n",
    "\n",
    "\n",
    "longest = find_longest_line(\"sample.txt\")\n",
    "print(f\"Longest line: {longest}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cleanup",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "Path(\"sample.txt\").unlink(missing_ok=True)\n",
    "Path(\"greeting.txt\").unlink(missing_ok=True)\n",
    "print(\"Temporary files removed.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "summary",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "In this tutorial, you learned how to read text files in Python. Here are the key takeaways:\n",
    "\n",
    "- The `open()` function opens files for reading with the `\"r\"` mode\n",
    "- `with` statements ensure files are closed properly, even if an error occurs\n",
    "- `read()` reads the entire file as a single string\n",
    "- `readlines()` returns a list of lines (each with a trailing newline)\n",
    "- Iterating over a file object reads line by line, which is memory-efficient\n",
    "- `pathlib.Path.read_text()` is the simplest way to read a text file\n",
    "- Always specify `encoding=\"utf-8\"` for consistent behaviour across platforms\n",
    "\n",
    "In the [next tutorial](https://agilearn.co.uk/guides/file-handling/learn/02-writing-files), you will learn how to write and append content to files."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.12.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}