Normalization is a design technique for relational databases that structures tables to reduce duplication and avoid data anomalies, usually by progressing through 1NF, 2NF, and 3NF. In OLTP systems, normalizing to 3NF is a strong default; you can selectively denormalize later for performance.


Why Normalize?

Normalization helps you:

  • Avoid storing the same fact in many rows, which reduces inconsistent data.
  • Prevent insert, update, and delete anomalies (for example, losing a customer when their last order is removed).
  • Align the schema with real business rules, making constraints and changes easier to reason about.

A classic outcome is breaking one big “Orders” sheet into Customer, OrderHeader, and OrderLine tables, each responsible for its own facts.


Functional Dependencies – The Simple View

A functional dependency simply states what decides what in a table.

  • Written as X → Y.
  • Read as: “If two rows have the same X, they must have the same Y.”

Examples:

  • CustomerId → CustomerName
    Knowing CustomerId uniquely fixes CustomerName.
  • OrderId → OrderDate, CustomerId
    Knowing OrderId uniquely fixes the date and customer for the order.

A key is just a special case where the determinant decides all other columns:

  • If OrderId is a primary key, OrderId → all other columns in Order.

First Normal Form (1NF)

1NF ensures your tables are truly relational: no lists or repeating groups.

A table is in 1NF if:

  • Each column holds atomic (indivisible) values, not sets or comma-separated lists.
  • There are no repeating groups like Phone1, Phone2, Phone3.
  • Each row is uniquely identifiable by a key.

Example before 1NF:

OrderId CustomerName Products
1001 Alice Mouse,Laptop

Products contains multiple values.

1NF version:

OrderId CustomerName Product
1001 Alice Mouse
1001 Alice Laptop

Each cell is atomic, with one product per row.


Second Normal Form (2NF)

2NF applies when you have composite keys, and it removes dependencies on just part of the key.

A table is in 2NF if:

  • It is already in 1NF.
  • Every non-key column depends on the whole key, not just part.

Suppose:

OrderLine(
  OrderId,
  ProductId,
  OrderDate,
  CustomerId,
  ProductName,
  UnitPrice,
  Quantity
)

Primary key: (OrderId, ProductId)

Typical “decides” rules:

  • (OrderId, ProductId) → Quantity
  • OrderId → OrderDate, CustomerId
  • ProductId → ProductName, UnitPrice

Here:

  • OrderDate and CustomerId depend only on OrderId.
  • ProductName and UnitPrice depend only on ProductId.

These are partial dependencies, so the table is not in 2NF.

We decompose:

OrderHeader(
  OrderId    PK,
  OrderDate,
  CustomerId
)

Product(
  ProductId  PK,
  ProductName,
  UnitPrice
)

OrderLine(
  OrderId    FK,
  ProductId  FK,
  Quantity,
  PRIMARY KEY (OrderId, ProductId)
)

Now each non-key column in OrderLine depends on the full key (OrderId, ProductId).


Third Normal Form (3NF)

3NF removes transitive dependencies, where a non-key column depends on the key through another non-key column.

A table is in 3NF if:

  • It is in 2NF.
  • No non-key column depends on another non-key column (no transitive dependency).

Using OrderHeader:

OrderHeader(
  OrderId      PK,
  OrderDate,
  CustomerId,
  CustomerName,
  CustomerCity
)

Assume:

  • OrderId → OrderDate, CustomerId
  • CustomerId → CustomerName, CustomerCity

Then:

  • OrderId → CustomerName, CustomerCity via CustomerId.

CustomerName and CustomerCity are transitively dependent on OrderId through CustomerId, so this violates 3NF.

We fix this by splitting customer details:

Customer(
  CustomerId   PK,
  CustomerName,
  CustomerCity
)

OrderHeader(
  OrderId      PK,
  OrderDate,
  CustomerId   FK
)

Now each table’s non-key columns depend directly on its key.


Summary Table: 1NF, 2NF, 3NF

The table below captures the core rule and main problem each normal form addresses.

Normal Form Rule Main problem removed
1NF Only atomic values; no repeating groups; unique rows. Multi-valued cells and spreadsheet-style repetition.
2NF Already in 1NF and no partial dependency on part of a composite key. Redundancy caused by attributes tied to only part of the key.
3NF Already in 2NF and no non-key depends on another non-key (no transitive dependency). Redundancy and anomalies from attributes that depend on other attributes, not directly on the key.