{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "intro",
   "metadata": {},
   "source": [
    "# CSV and JSON files\n",
    "\n",
    "CSV and JSON are two of the most common formats for storing and exchanging structured data. Python provides built-in modules for working with both formats, making it straightforward to read, write, and process structured data.\n",
    "\n",
    "**Time commitment:** 15–20 minutes\n",
    "\n",
    "**Prerequisites:**\n",
    "\n",
    "- Basic Python knowledge (dictionaries, lists)\n",
    "- Completion of [Reading files](https://agilearn.co.uk/guides/file-handling/learn/01-reading-files), [Writing files](https://agilearn.co.uk/guides/file-handling/learn/02-writing-files), and [Working with paths](https://agilearn.co.uk/guides/file-handling/learn/03-working-with-paths)\n",
    "\n",
    "## Learning objectives\n",
    "\n",
    "By the end of this tutorial, you will be able to:\n",
    "\n",
    "- Read CSV files using `csv.reader()`\n",
    "- Write CSV files using `csv.writer()`\n",
    "- Use `csv.DictReader()` and `csv.DictWriter()` for dictionary-based access\n",
    "- Read JSON files using `json.load()`\n",
    "- Write JSON files using `json.dump()`\n",
    "- Convert between Python objects and JSON strings"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "csv-intro",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## Part 1: CSV files\n",
    "\n",
    "CSV (Comma-Separated Values) files store tabular data as plain text. Each line represents a row, and values within a row are separated by commas."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "csv-read-heading",
   "metadata": {},
   "source": [
    "## Reading CSV files\n",
    "\n",
    "First, let us create a sample CSV file. Then you will read it using `csv.reader()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "csv-create",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "csv_content = \"\"\"name,age,city\n",
    "Alice,30,London\n",
    "Bob,25,Manchester\n",
    "Charlie,35,Edinburgh\"\"\"\n",
    "\n",
    "Path(\"people.csv\").write_text(csv_content, encoding=\"utf-8\")\n",
    "print(\"CSV file created.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "csv-read",
   "metadata": {},
   "outputs": [],
   "source": [
    "import csv\n",
    "\n",
    "with open(\"people.csv\", \"r\", encoding=\"utf-8\", newline=\"\") as f:\n",
    "    reader = csv.reader(f)\n",
    "    for row in reader:\n",
    "        print(row)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "csv-read-explain",
   "metadata": {},
   "source": [
    "Each row is returned as a list of strings. Notice the `newline=\"\"` parameter &ndash; this is required when opening CSV files to prevent issues with newline handling."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "csv-skip-heading",
   "metadata": {},
   "source": [
    "## Skipping the header row\n",
    "\n",
    "The first row of a CSV file typically contains column names. Use `next()` to read and skip the header."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "csv-skip",
   "metadata": {},
   "outputs": [],
   "source": [
    "import csv\n",
    "\n",
    "with open(\"people.csv\", \"r\", encoding=\"utf-8\", newline=\"\") as f:\n",
    "    reader = csv.reader(f)\n",
    "    header = next(reader)\n",
    "    print(f\"Columns: {header}\")\n",
    "    for row in reader:\n",
    "        print(f\"{row[0]} is {row[1]} years old and lives in {row[2]}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "csv-dict-heading",
   "metadata": {},
   "source": [
    "## Using `csv.DictReader()` for named access\n",
    "\n",
    "`csv.DictReader()` automatically uses the first row as field names. Each row is returned as a dictionary, which makes the code easier to read."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "csv-dict",
   "metadata": {},
   "outputs": [],
   "source": [
    "import csv\n",
    "\n",
    "with open(\"people.csv\", \"r\", encoding=\"utf-8\", newline=\"\") as f:\n",
    "    reader = csv.DictReader(f)\n",
    "    for row in reader:\n",
    "        print(f\"{row['name']} is {row['age']} years old\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "csv-write-heading",
   "metadata": {},
   "source": [
    "## Writing CSV files\n",
    "\n",
    "Use `csv.writer()` to write data to a CSV file. The `writerow()` method writes a single row, and `writerows()` writes multiple rows at once."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "csv-write",
   "metadata": {},
   "outputs": [],
   "source": [
    "import csv\n",
    "from pathlib import Path\n",
    "\n",
    "data = [\n",
    "    [\"product\", \"price\", \"quantity\"],\n",
    "    [\"Apples\", \"1.50\", \"10\"],\n",
    "    [\"Bread\", \"2.00\", \"5\"],\n",
    "    [\"Milk\", \"1.20\", \"8\"],\n",
    "]\n",
    "\n",
    "with open(\"products.csv\", \"w\", encoding=\"utf-8\", newline=\"\") as f:\n",
    "    writer = csv.writer(f)\n",
    "    writer.writerows(data)\n",
    "\n",
    "print(Path(\"products.csv\").read_text(encoding=\"utf-8\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "csv-dictwrite-heading",
   "metadata": {},
   "source": [
    "## Writing CSV files with `csv.DictWriter()`\n",
    "\n",
    "`csv.DictWriter()` writes dictionaries to a CSV file. You specify the field names, and the writer handles the ordering."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "csv-dictwrite",
   "metadata": {},
   "outputs": [],
   "source": [
    "import csv\n",
    "from pathlib import Path\n",
    "\n",
    "people = [\n",
    "    {\"name\": \"Diana\", \"age\": 28, \"city\": \"Bristol\"},\n",
    "    {\"name\": \"Edward\", \"age\": 42, \"city\": \"Leeds\"},\n",
    "]\n",
    "\n",
    "with open(\"new-people.csv\", \"w\", encoding=\"utf-8\", newline=\"\") as f:\n",
    "    fieldnames = [\"name\", \"age\", \"city\"]\n",
    "    writer = csv.DictWriter(f, fieldnames=fieldnames)\n",
    "    writer.writeheader()\n",
    "    writer.writerows(people)\n",
    "\n",
    "print(Path(\"new-people.csv\").read_text(encoding=\"utf-8\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "json-intro",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## Part 2: JSON files\n",
    "\n",
    "JSON (JavaScript Object Notation) is a lightweight data format that is easy for both humans and machines to read. It maps naturally to Python dictionaries and lists."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "json-read-heading",
   "metadata": {},
   "source": [
    "## Reading JSON files\n",
    "\n",
    "Use `json.load()` to read JSON data from a file. The data is automatically converted to Python objects (dictionaries, lists, strings, numbers, and booleans)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "json-create",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "json_content = \"\"\"{\n",
    "    \"name\": \"Alice\",\n",
    "    \"age\": 30,\n",
    "    \"city\": \"London\",\n",
    "    \"hobbies\": [\"reading\", \"cycling\", \"cooking\"]\n",
    "}\"\"\"\n",
    "\n",
    "Path(\"person.json\").write_text(json_content, encoding=\"utf-8\")\n",
    "print(\"JSON file created.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "json-read",
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "from pathlib import Path\n",
    "\n",
    "with Path(\"person.json\").open(\"r\", encoding=\"utf-8\") as f:\n",
    "    data = json.load(f)\n",
    "\n",
    "print(data)\n",
    "print(f\"Name: {data['name']}\")\n",
    "print(f\"Hobbies: {', '.join(data['hobbies'])}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "json-write-heading",
   "metadata": {},
   "source": [
    "## Writing JSON files\n",
    "\n",
    "Use `json.dump()` to write Python objects to a JSON file. The `indent` parameter produces human-readable output."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "json-write",
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "from pathlib import Path\n",
    "\n",
    "student = {\n",
    "    \"name\": \"Bob\",\n",
    "    \"age\": 25,\n",
    "    \"subjects\": [\"Mathematics\", \"Physics\", \"Chemistry\"],\n",
    "    \"graduated\": False,\n",
    "}\n",
    "\n",
    "with Path(\"student.json\").open(\"w\", encoding=\"utf-8\") as f:\n",
    "    json.dump(student, f, indent=4)\n",
    "\n",
    "print(Path(\"student.json\").read_text(encoding=\"utf-8\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "json-strings-heading",
   "metadata": {},
   "source": [
    "## Working with JSON strings\n",
    "\n",
    "The `json` module also provides functions for working with JSON strings directly, without files:\n",
    "\n",
    "- `json.loads()` -- parse a JSON string into Python objects\n",
    "- `json.dumps()` -- convert Python objects to a JSON string"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "json-strings",
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "json_string = '{\"temperature\": 21.5, \"unit\": \"Celsius\"}'\n",
    "data = json.loads(json_string)\n",
    "print(data)\n",
    "print(type(data))\n",
    "\n",
    "back_to_string = json.dumps(data, indent=2)\n",
    "print(back_to_string)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "json-lists-heading",
   "metadata": {},
   "source": [
    "## Working with lists of objects\n",
    "\n",
    "JSON files often contain lists of objects. Reading and writing these works the same way."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "json-lists",
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "from pathlib import Path\n",
    "\n",
    "books = [\n",
    "    {\"title\": \"Pride and Prejudice\", \"author\": \"Jane Austen\", \"year\": 1813},\n",
    "    {\"title\": \"1984\", \"author\": \"George Orwell\", \"year\": 1949},\n",
    "    {\"title\": \"Great Expectations\", \"author\": \"Charles Dickens\", \"year\": 1861},\n",
    "]\n",
    "\n",
    "with Path(\"books.json\").open(\"w\", encoding=\"utf-8\") as f:\n",
    "    json.dump(books, f, indent=4)\n",
    "\n",
    "with Path(\"books.json\").open(\"r\", encoding=\"utf-8\") as f:\n",
    "    loaded_books = json.load(f)\n",
    "\n",
    "for book in loaded_books:\n",
    "    print(f\"{book['title']} by {book['author']} ({book['year']})\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "exercises-heading",
   "metadata": {},
   "source": [
    "## Exercises\n",
    "\n",
    "Try these exercises to practise what you have learned.\n",
    "\n",
    "**Exercise 1:** Create a CSV file with a list of five books (title, author, year), then read it back using `csv.DictReader()` and print each book.\n",
    "\n",
    "**Exercise 2:** Write a function that reads a JSON file containing a list of dictionaries and returns only items where a given key matches a given value.\n",
    "\n",
    "**Exercise 3:** Write a function that converts a CSV file to a JSON file."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "solutions-heading",
   "metadata": {},
   "source": [
    "### Solutions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "solution-1",
   "metadata": {},
   "outputs": [],
   "source": [
    "import csv\n",
    "\n",
    "books_data = [\n",
    "    {\"title\": \"Emma\", \"author\": \"Jane Austen\", \"year\": \"1815\"},\n",
    "    {\"title\": \"Dracula\", \"author\": \"Bram Stoker\", \"year\": \"1897\"},\n",
    "    {\"title\": \"Frankenstein\", \"author\": \"Mary Shelley\", \"year\": \"1818\"},\n",
    "    {\"title\": \"Jane Eyre\", \"author\": \"Charlotte Bront\\u00eb\", \"year\": \"1847\"},\n",
    "    {\"title\": \"Wuthering Heights\", \"author\": \"Emily Bront\\u00eb\", \"year\": \"1847\"},\n",
    "]\n",
    "\n",
    "with open(\"books.csv\", \"w\", encoding=\"utf-8\", newline=\"\") as f:\n",
    "    writer = csv.DictWriter(f, fieldnames=[\"title\", \"author\", \"year\"])\n",
    "    writer.writeheader()\n",
    "    writer.writerows(books_data)\n",
    "\n",
    "with open(\"books.csv\", \"r\", encoding=\"utf-8\", newline=\"\") as f:\n",
    "    reader = csv.DictReader(f)\n",
    "    for row in reader:\n",
    "        print(f\"{row['title']} by {row['author']} ({row['year']})\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "solution-2",
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "def filter_json(\n",
    "    filepath: str | Path, key: str, value: object\n",
    ") -> list[dict]:\n",
    "    \"\"\"Read a JSON file and return items matching a key-value pair.\n",
    "\n",
    "    Args:\n",
    "        filepath: The path to the JSON file.\n",
    "        key: The key to filter on.\n",
    "        value: The value to match.\n",
    "\n",
    "    Returns:\n",
    "        A list of dictionaries where the key matches the value.\n",
    "    \"\"\"\n",
    "    path = Path(filepath)\n",
    "    with path.open(\"r\", encoding=\"utf-8\") as f:\n",
    "        data = json.load(f)\n",
    "    return [item for item in data if item.get(key) == value]\n",
    "\n",
    "\n",
    "results = filter_json(\"books.json\", \"author\", \"Jane Austen\")\n",
    "for book in results:\n",
    "    print(book)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "solution-3",
   "metadata": {},
   "outputs": [],
   "source": [
    "import csv\n",
    "import json\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "def csv_to_json(csv_path: str | Path, json_path: str | Path) -> None:\n",
    "    \"\"\"Convert a CSV file to a JSON file.\n",
    "\n",
    "    Each row in the CSV becomes a dictionary in a JSON array.\n",
    "\n",
    "    Args:\n",
    "        csv_path: The path to the input CSV file.\n",
    "        json_path: The path to the output JSON file.\n",
    "    \"\"\"\n",
    "    with open(csv_path, \"r\", encoding=\"utf-8\", newline=\"\") as f:\n",
    "        reader = csv.DictReader(f)\n",
    "        rows = list(reader)\n",
    "\n",
    "    with Path(json_path).open(\"w\", encoding=\"utf-8\") as f:\n",
    "        json.dump(rows, f, indent=4)\n",
    "\n",
    "\n",
    "csv_to_json(\"books.csv\", \"books-converted.json\")\n",
    "print(Path(\"books-converted.json\").read_text(encoding=\"utf-8\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cleanup",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "for filename in [\"people.csv\", \"products.csv\", \"new-people.csv\",\n",
    "                 \"person.json\", \"student.json\", \"books.json\",\n",
    "                 \"books.csv\", \"books-converted.json\"]:\n",
    "    Path(filename).unlink(missing_ok=True)\n",
    "\n",
    "print(\"Temporary files removed.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "summary",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "In this tutorial, you learned how to work with two of the most common structured data formats. Here are the key takeaways:\n",
    "\n",
    "- `csv.reader()` and `csv.writer()` handle CSV data as lists of strings\n",
    "- `csv.DictReader()` and `csv.DictWriter()` provide dictionary-based access for cleaner code\n",
    "- Always use `newline=\"\"` when opening CSV files\n",
    "- `json.load()` and `json.dump()` read and write JSON files\n",
    "- `json.loads()` and `json.dumps()` work with JSON strings\n",
    "- The `indent` parameter makes JSON output human-readable\n",
    "- Always specify `encoding=\"utf-8\"` for consistent behaviour\n",
    "\n",
    "Congratulations! You have completed all four tutorials. You now have a solid foundation in file handling with Python. Explore the [Recipes](https://agilearn.co.uk/guides/file-handling/recipes/index) for more advanced techniques, or consult the [Reference](https://agilearn.co.uk/guides/file-handling/reference/index) section for detailed technical documentation."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.12.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}