Normalization is a design technique for relational databases that structures tables to reduce duplication and avoid data anomalies, usually by progressing through 1NF, 2NF, and 3NF. In OLTP systems, normalizing to 3NF is a strong default; you can selectively denormalize later for performance.
Why Normalize?
Normalization helps you:
- Avoid storing the same fact in many rows, which reduces inconsistent data.
- Prevent insert, update, and delete anomalies (for example, losing a customer when their last order is removed).
- Align the schema with real business rules, making constraints and changes easier to reason about.
A classic outcome is breaking one big “Orders” sheet into Customer, OrderHeader, and OrderLine tables, each responsible for its own facts.
Functional Dependencies – The Simple View
A functional dependency simply states what decides what in a table.
- Written as
X → Y. - Read as: “If two rows have the same X, they must have the same Y.”
Examples:
CustomerId → CustomerName
KnowingCustomerIduniquely fixesCustomerName.OrderId → OrderDate, CustomerId
KnowingOrderIduniquely fixes the date and customer for the order.
A key is just a special case where the determinant decides all other columns:
- If
OrderIdis a primary key,OrderId → all other columns in Order.
First Normal Form (1NF)
1NF ensures your tables are truly relational: no lists or repeating groups.
A table is in 1NF if:
- Each column holds atomic (indivisible) values, not sets or comma-separated lists.
- There are no repeating groups like
Phone1,Phone2,Phone3. - Each row is uniquely identifiable by a key.
Example before 1NF:
| OrderId | CustomerName | Products |
|---|---|---|
| 1001 | Alice | Mouse,Laptop |
Products contains multiple values.
1NF version:
| OrderId | CustomerName | Product |
|---|---|---|
| 1001 | Alice | Mouse |
| 1001 | Alice | Laptop |
Each cell is atomic, with one product per row.
Second Normal Form (2NF)
2NF applies when you have composite keys, and it removes dependencies on just part of the key.
A table is in 2NF if:
- It is already in 1NF.
- Every non-key column depends on the whole key, not just part.
Suppose:
OrderLine(
OrderId,
ProductId,
OrderDate,
CustomerId,
ProductName,
UnitPrice,
Quantity
)
Primary key: (OrderId, ProductId)
Typical “decides” rules:
(OrderId, ProductId) → QuantityOrderId → OrderDate, CustomerIdProductId → ProductName, UnitPrice
Here:
OrderDateandCustomerIddepend only onOrderId.ProductNameandUnitPricedepend only onProductId.
These are partial dependencies, so the table is not in 2NF.
We decompose:
OrderHeader(
OrderId PK,
OrderDate,
CustomerId
)
Product(
ProductId PK,
ProductName,
UnitPrice
)
OrderLine(
OrderId FK,
ProductId FK,
Quantity,
PRIMARY KEY (OrderId, ProductId)
)
Now each non-key column in OrderLine depends on the full key (OrderId, ProductId).
Third Normal Form (3NF)
3NF removes transitive dependencies, where a non-key column depends on the key through another non-key column.
A table is in 3NF if:
- It is in 2NF.
- No non-key column depends on another non-key column (no transitive dependency).
Using OrderHeader:
OrderHeader(
OrderId PK,
OrderDate,
CustomerId,
CustomerName,
CustomerCity
)
Assume:
OrderId → OrderDate, CustomerIdCustomerId → CustomerName, CustomerCity
Then:
OrderId → CustomerName, CustomerCityviaCustomerId.
CustomerName and CustomerCity are transitively dependent on OrderId through CustomerId, so this violates 3NF.
We fix this by splitting customer details:
Customer(
CustomerId PK,
CustomerName,
CustomerCity
)
OrderHeader(
OrderId PK,
OrderDate,
CustomerId FK
)
Now each table’s non-key columns depend directly on its key.
Summary Table: 1NF, 2NF, 3NF
The table below captures the core rule and main problem each normal form addresses.
| Normal Form | Rule | Main problem removed |
|---|---|---|
| 1NF | Only atomic values; no repeating groups; unique rows. | Multi-valued cells and spreadsheet-style repetition. |
| 2NF | Already in 1NF and no partial dependency on part of a composite key. | Redundancy caused by attributes tied to only part of the key. |
| 3NF | Already in 2NF and no non-key depends on another non-key (no transitive dependency). | Redundancy and anomalies from attributes that depend on other attributes, not directly on the key. |
Leave a Reply