Lesson 4

Databases & Where Data Lives

Explore how databases store persistent data, the difference between SQL and NoSQL, and how to choose the right one.

28 min read · Beginner

Data outlives your server

Here is a fact that surprises many new developers: when your server restarts, everything in its memory disappears. Variables, in-memory caches, session data — gone. If you want information to survive restarts, deployments, and crashes, you need persistent storage. That is what databases are for.

Databases are specialized systems designed to store, retrieve, and manage data reliably. They handle the hard problems — concurrent access, crash recovery, efficient lookups — so your application code can focus on business logic.

SQL vs NoSQL comparison

Feature	SQL (PostgreSQL, MySQL)	NoSQL (MongoDB, DynamoDB)
Data model	Tables with rows and columns	Documents, key-value, graphs
Schema	Fixed schema (with migrations)	Flexible, schema-on-read
Relationships	Foreign keys, JOINs	Embedded docs or manual references
Queries	Powerful SQL with JOINs	Simpler query languages
Scaling	Vertical + read replicas	Horizontal sharding built-in
Best for	Structured, relational data	Flexible schemas, high write volume
Examples	User accounts, orders, inventory	Activity feeds, IoT sensor data

Neither is universally better. PostgreSQL handles most startup use cases excellently. Choose based on your data shape and query patterns, not hype.

E-commerce schema: the full picture

A typical online store needs several related tables. Here is how they connect:

E-commerce table relationships

Foreign keys link orders to users and order_items to products. Each table owns one kind of data.

In ASCII, the same relationships look like this:

  users                orders              order_items           products
+---------+         +-----------+         +-------------+       +----------+
| id (PK) |<--------| user_id   |         | order_id    |------>| id (PK)  |
| name    |    1:N  | id (PK)   |<--------| product_id  |--1:N--| name     |
| email   |         | total     |    1:N  | quantity    |       | price    |
+---------+         | created_at|         | unit_price  |       +----------+
                    +-----------+         +-------------+

users — one row per customer
orders — one row per purchase, linked to a user via user_id
order_items — line items (product + quantity) for each order
products — catalog of things you sell

This is normalized design: each fact lives in exactly one place.

Read path: how data flows out

Database read path

The server translates API requests into database queries and shapes the response for the client.

On a read request, the server often checks a cache first. If the data is cached (a hit), it returns immediately without touching the database. On a cache miss, it queries the database, stores the result in cache, and returns it.

Write path: how data flows in

Database write path

Writes go to the database first, then invalidate stale cache entries.

Writes are more careful. The server validates input, writes to the database, then invalidates any cached copies of that data so the next read gets fresh information.

Interactive: database components

Database architecture components

App Server

Your application code sends queries through a connection pool — a set of reusable database connections. Without pooling, opening a new connection per request is slow and can exhaust the database's connection limit.

ACID: the bank transfer example

ACID properties guarantee reliable transactions:

Property	Meaning	Bank transfer example
Atomicity	All or nothing	Debit AND credit both happen, or neither does
Consistency	Rules always hold	Account balance never goes negative
Isolation	Concurrent txs don’t interfere	Two transfers don’t corrupt each other’s state
Durability	Committed data survives crashes	After “transfer complete,” a power outage won’t undo it

BEGIN;
  UPDATE accounts SET balance = balance - 100 WHERE id = 'alice';
  UPDATE accounts SET balance = balance + 100 WHERE id = 'bob';
COMMIT;

If the server crashes between the two UPDATEs, the transaction rolls back — Alice is not charged without Bob receiving the money.

Normalization: why split tables?

Normalization organizes data to reduce redundancy. The goal is simple: each fact should be stored exactly once.

Before normalization: the problems

Here is a denormalized “everything in one table” design:

Denormalized orders tablesql

-- ONE giant table (bad)
orders_denormalized:
| order_id | customer_name | customer_email      | product    | price |
|----------|---------------|---------------------|------------|-------|
| 1        | Alice Chen    | alice@example.com   | Keyboard   | 79.99 |
| 2        | Alice Chen    | alice@example.com   | Mouse      | 29.99 |
| 3        | Bob Smith     | bob@example.com     | Monitor    | 349.00 |

Customer info is duplicated on every order row. This causes three classic anomalies.

Three problems emerge:

Update anomaly — Alice changes her email to alice.chen@work.com. You must find and update every row where she appears. Miss one row and your data is inconsistent.

Insert anomaly — You want to add a new customer who has not ordered yet. There is nowhere to put them without creating a fake order row.

Delete anomaly — You delete Bob’s only order. You accidentally lose his name and email too, even though he is still a customer.

After normalization: splitting tables

Split the data so each entity has its own table, linked by foreign keys:

Normalized schemasql

CREATE TABLE customers (
id    SERIAL PRIMARY KEY,
name  TEXT NOT NULL,
email TEXT UNIQUE NOT NULL
);

CREATE TABLE orders (
id          SERIAL PRIMARY KEY,
customer_id INT NOT NULL REFERENCES customers(id),
total       DECIMAL(10,2) NOT NULL,
created_at  TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE products (
id    SERIAL PRIMARY KEY,
name  TEXT NOT NULL,
price DECIMAL(10,2) NOT NULL
);

CREATE TABLE order_items (
id         SERIAL PRIMARY KEY,
order_id   INT NOT NULL REFERENCES orders(id),
product_id INT NOT NULL REFERENCES products(id),
quantity   INT NOT NULL DEFAULT 1,
unit_price DECIMAL(10,2) NOT NULL
);

Foreign keys (REFERENCES) enforce that every order belongs to a real user.

Now Alice’s email lives in one row. Change it once, every future query sees the update.

Querying normalized data with JOINs

Normalized data lives in separate tables — you combine it at query time with JOINs:

JOIN query for order detailssql

SELECT
o.id AS order_id,
c.name AS customer_name,
p.name AS product_name,
oi.quantity,
oi.unit_price
FROM orders o
JOIN customers c ON c.id = o.customer_id
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON p.id = oi.product_id
WHERE o.id = 42;

JOINs are the payoff of normalization: flexible queries without duplicated storage.

This single query assembles a complete order view from four tables. Yes, JOINs add query complexity — but they eliminate the update/insert/delete anomalies that plague denormalized designs.

Normalization journey

Each normal form removes a specific kind of redundancy. Most apps aim for 3NF.

Normal form	Rule	Example fix
1NF	No repeating groups	One product per `order_items` row, not a comma-separated list
2NF	No partial dependencies	`order_date` depends on `order_id`, not on `product_id`
3NF	No transitive dependencies	Store `city` on `addresses`, not derived from `zip` on `customers`

When denormalization makes sense

Normalization is the default, but sometimes you intentionally duplicate data for performance:

Read-heavy analytics — a dashboard that counts daily orders should not JOIN four tables on every page load
Caching aggregated counts — store total_orders on the customer row, updated by a background job
Materialized views — pre-computed query results refreshed periodically

Denormalized summary tablesql

CREATE TABLE customer_stats (
customer_id   INT PRIMARY KEY REFERENCES customers(id),
total_orders  INT NOT NULL DEFAULT 0,
total_spent   DECIMAL(12,2) NOT NULL DEFAULT 0,
last_order_at TIMESTAMPTZ
);

-- Refreshed by a nightly job or trigger after each order

Trade storage for read speed. Update this table when orders change, not on every dashboard view.

	Normalized	Denormalized
Writes	Simple, no duplicates	Must update multiple places
Reads	JOINs required	Fast single-table lookups
Consistency	Easy to maintain	Risk of stale copies
Best for	OLTP (transactions)	Analytics, dashboards

For most apps, aim for 3NF in your core schema, then denormalize specific hot paths only after you measure a performance problem.

Core database concepts

Indexes speed up lookups. Without an index, finding a user by email requires scanning every row. With an index, the database jumps directly to the right row.

Replication copies data to multiple machines for reliability and read performance.

Sharding splits data across machines when a single server cannot hold it all.

In practice

Start with PostgreSQL for most web apps. It handles relational data, JSON documents, full-text search, and scales further than most startups need. Add read replicas when query latency becomes a problem, not before.

Test your understanding

Database Basics Quiz

Question 1 of 6

Why should clients never connect to the database directly?

Key takeaways

Databases provide persistent storage that survives server restarts and crashes
Only servers should talk to databases — never expose database access to clients
SQL for structured relationships; NoSQL for flexible schemas
ACID transactions protect data integrity for critical operations
Normalization reduces redundancy — aim for 3NF unless you have a reason not to
Indexes, transactions, and replication are essential tools for performance and reliability

Common mistakes

Picking a database because it is trendy — PostgreSQL handles most use cases well
Skipping indexes — queries fine with 100 rows become painfully slow with 10 million
Storing everything in one giant table — related data belongs in related tables
Denormalizing too early — optimize for correctness first, performance when measured

Go deeper

Designing Data-Intensive Applications — Chapters 2-3 cover data models and storage engines
PostgreSQL Tutorial — hands-on introduction
Use The Index, Luke — practical guide to SQL indexing