2 pages
5.1 Normal form
Redundancy and dependence
Redundancy is the repetition of related values in a table. Redundancy causes database management problems. When related values are updated or deleted, all copies must be changed, which makes queries slow and complex. If copies are not updated or deleted uniformly, the copies become inconsistent and the correct version is uncertain.
Column A depends on column B means each B value is related to at most one A value. Columns A and B may be simple or composite. ‘A depends on B’ is denoted ‘B → A’.

- The fact that passenger number 222 is named Elvira Yin is repeated.
- The fact that first class fares board first is repeated.
- Each number always has the same name, so PassengerName depends on PassengerNumber.
- Each fare class always has the same boarding zone, so BoardingZoneNumber depends on FareClass.
Redundancy occurs when a dependency is on a column that is not unique.
Ex: In the Booking table, PassengerName depends on PassengerNumber, which is not unique:
- Since PassengerNumber is not unique, 222 appears in multiple rows.
- Since PassengerName depends on PassengerNumber, passenger 222 is always named Elvira Yin.
- Therefore, (222, Elvira Yin) appears in multiple rows.
In other words, the Booking table is redundant.
Normal forms
Normal forms are rules for designing tables with less redundancy. Normal forms are numbered, first normal form through fifth normal form. An additional normal form, named Boyce-Codd normal form, is an improved version of third normal form. The six normal forms comprise a sequence, with each normal form allowing less redundancy than the prior normal form.
As a practical matter, Boyce-Codd normal form is most important. Boyce-Codd normal form eliminates all dependencies on columns that are not unique, and all associated redundancy.
Dependence of one column on another is formally called functional dependence. Functional dependence is not the only type of dependence. Multivalued dependence and join dependence entail dependencies between three or more columns. Multivalued and join dependencies are complex, uncommon, and primarily of theoretical interest.
Fourth and fifth normal forms eliminate multivalued and join dependencies, respectively, and associated redundancy. Fourth and fifth normal forms are not discussed in this material.
First normal form
A table is in first normal form when all non-key columns depend on the primary key.
If a table has no duplicate rows, the composite of all columns is unique, so either all columns or some subset comprise a primary key. A table with a primary key is in first normal form, as illustrated in the animation below. Thus, a table with no duplicate rows is in first normal form.

- PassengerNumber is the primary key of the Passenger table.
- Since primary keys are unique, the PassengerNumber value 222 cannot relate to multiple PassengerName values Elvira Yin and Gabriela Lopez.
- Every non-key column (PassengerName) depends on the key PassengerNumber, so Passenger is in first normal form.
The relational model, as defined in the original paper by E. F. Codd, stipulates that tables may not have duplicate rows and thus are always in first normal form. However, relational databases do allow duplicate rows, primarily for temporary tables that load data or display query results. In practice, all permanent tables should disallow duplicates, have a primary key, and be in first normal form.
Second normal form
A table is in second normal form when all non-key columns depend on the whole primary key. In other words, a non-key column cannot depend on part of a composite primary key. A table with a simple primary key is automatically in second normal form.

- PassengerName depends on PassengerNumber. Dependence on part of the primary key causes repetition of (222, Elvira Yin).
- Removing PassengerName from Booking eliminates redundancy. Booking is now in second normal form.
- PassengerName moves to the Passenger table. The Passenger table has no redundancies, since PassengerName depends on the whole primary key.
Third normal form
Redundancy can occur in a second normal form table when a non-key column depends on another non-key column. Informally, a table is in third normal form when all non-key columns depend on the key, the whole key, and nothing but the key.

- BoardingZoneNumber depends on FareClass. Dependence on a non-key column causes the redundancy (First, 1).
- Removing BoardingZoneNumber from Booking eliminates redundancy. Booking is now in third normal form.
- BoardingZoneNumber moves to the Fare table. The Fare table has no redundancies since BoardingZoneNumber depends on the primary key.
The informal definition of third normal form, above, is accurate when the primary key is the only unique column. The formal definition, below, accounts for tables with several unique columns.
A candidate key is a simple or composite column that is unique and minimal. Minimal means all columns are necessary for uniqueness. A table may have several candidate keys. The database designer designates one candidate key as the primary key.
A non-key column is a column that is not contained in a candidate key.
Ex: The Employee table contains columns ID, SocialNumber, and Name:
- ID and SocialNumber are candidate keys, since both are unique and minimal.
- (ID, Name) is unique but not minimal, since Name is not necessary for uniqueness. Therefore (ID, Name) is not a candidate key.
- Name is the only non-key column.
A table is in third normal form if, whenever a non-key column A depends on column B, then B is unique. Columns A and B may be simple or composite. Although B is unique, B is not necessarily minimal and therefore is not necessarily a candidate key.
Boyce-Codd normal form
The definition of third normal form applies to non-key columns only, which allows for occasional redundancy. Boyce-Codd normal form applies to all columns and eliminates this redundancy.
A table is in Boyce-Codd normal form if, whenever column A depends on column B, then B is unique. Columns A and B may be simple or composite. This definition is identical to the definition of third normal form with the term ‘non-key’ removed.
Boyce-Codd normal form is considered the gold standard of table design. Although fourth and fifth normal forms remove additional types of redundancy, these redundancies are uncommon and of little practical concern.

- EmployeeProjectTask assigns employees to projects and tasks. Each employee is assigned at most one task on each project.
- TaskName is the only non-key column. TaskName depends on the unique column (EmployeeID, ProjectName) but not on EmployeeID or ProjectName. EmployeeProjectTask is in third normal form.
- The company has a special rule: TaskNames are never repeated on different projects. In other words, ProjectName depends on TaskName, which is not unique.
- Since ProjectName depends on a non-unique column, EmployeeProjectTask is not in Boyce-Codd normal form and contains redundant facts.
- Decomposing to EmployeeTask and TaskProject eliminates the redundancy.
- EmployeeTask contains no dependencies. TaskProject contains one dependency, ProjectName on the unique column TaskName. Both tables are in Boyce-Codd normal form.
Trivial dependencies
When the columns of A are a subset of the columns of B, A always depends on B. Ex: FareClass depends on (FlightCode, FareClass). These dependencies are called trivial.
Technically, trivial dependencies must be excluded in definitions of normal form: A table is in Boyce-Codd normal form if, for all non-trivial dependencies B → A, B is unique.
5.2 Normalization
Normalization
Implementing entities, relationships, and attributes usually generates tables with no redundancy. Occasionally, however, implementation results in redundant tables. This redundancy is eliminated with normalization, the last step of logical design.
Normalization eliminates redundancy by decomposing a table into two or more tables in higher normal form. Ex: A table in first normal form might be replaced by two tables in third normal form. In principle, normalization decomposes tables to any higher normal form. Fourth and fifth normal form are complex, however, and have limited practical value. As a practical matter, database designers usually normalize tables to Boyce-Codd normal form.
Column A depends on column B when each B value is related to at most one A value. A and B may be simple or composite columns. In a Boyce-Codd normal form table, if column A depends on column B, then B must be unique. Normalizing a table to Boyce-Codd normal form involves three steps:
- List all unique columns. Unique columns may be simple or composite. In composite columns, remove any columns that are not necessary for uniqueness. The primary key is unique and therefore always on this list.
- Identify dependencies on non-unique columns. Non-unique columns are either external to all unique columns or contained within a composite unique column.
- Eliminate dependencies on non-unique columns. If column A depends on a non-unique column B, A is removed from the original table. A new table is created containing A and B. B is a primary key in the new table and a foreign key in the original table.
Normalization eliminates redundancy by removing A from the original table. Since the data relating A and B is recorded in a new table, no information is lost.
Terminology
In E. F. Codd’s original paper on the relational model, normalization meant achieving first normal form. Over time, however, normalization has come to mean achieving higher normal forms.

- The Registration table lists student registration for courses by term.
- RegistrationCode is a unique column.
- (StudentID, CourseNumber, Term) is a composite unique column.
- CourseName and Credit depend on CourseNumber, which is not unique. Registration is not in Boyce-Codd normal form.
- Redundancy is eliminated by removing CourseName and Credit. RegistrationNew is in Boyce-Codd normal form.
- CourseNumber, CourseName, and Credit are tracked in a new Course table.
- All dependencies in Course are on a unique column. Course is in Boyce-Codd normal form.
Denormalization
Boyce-Codd normal form is ideal for tables with frequent inserts, updates, and deletes. In a database used primarily for reporting, changes are infrequent and redundancy is acceptable. In fact, redundancy can be desirable in reporting databases, as processing is faster and queries are simpler. Therefore, reporting databases may contain tables that, by design, are not in third normal form.
Denormalization means intentionally introducing redundancy by merging tables. Denormalization eliminates join queries and therefore improves query performance. Denormalization results in first and second normal form tables and should be applied selectively and cautiously.
In the figure below, the Booking, Passenger, and Fare tables are denormalized into a single Booking table. The red highlight indicates redundancy in the denormalized table.

Database design
As tables and keys are specified, the database designer reviews each table for Boyce-Codd normal form. Dependencies and unique columns are identified. If any dependencies are not on unique columns, the table is decomposed into smaller tables in Boyce-Codd normal form. Tables that experience infrequent inserts, updates, and deletes may be denormalized to simplify and accelerate SELECT queries.
