Introduction
The process by which an exiting database schema is modified to organise the data in an efficient, non-redundant manner is called Normalization. The primary goal of the process is to reduce data redundancy across tables and ensure that the data dependencies are meaningful, so that the data is logically stored and consumes less space. The process was first put forward by Edgar F Codd in his paper A Relation Model of Data for Large Shared Data Banks. Describing in generic terms the process decomposes the original non normalized table into two or more relations such that they convey the same meaning as the original table.
Problems addressed by normalization
Just to highlight various problems that can be faced while designing a database let's presume that we are to maintain the online library for the Browser Library Store. We will be needed to have a track of information pertaining to various books, publishers and authors. Initially we consider a single table for the purpose with fields such as Title, ISBN, Author, AuthorBio, Subject, Publisher, Price, Pages. Now using this schema for our database, can lead to various inconsistencies and anomalies. Storing unrelated group of information in a single table results in multiple records having same information, as in having same author for multiple books in this case leads to repeating the author information over these records. An improper/missing updating of any of the records leaves the database in an inconsistent state and can lead to false query results. This phenomenon is known as updating anomaly.
We won't be able to enter any information pertaining to author or publisher without a book being housed in our library or even cannot have multiple entries per book as we are having ISBN as the primary key here. This is known as insertion anomaly.
Similarly we can't delete record for any of the books from our library without deleting the author information, the scenario better known as the deletion anomaly.
Ideally, we should design our database so as to avoid these inconsistencies and anomalies, by normalizing the tables and enforcing referential integrity between them.
What are normal forms
A set of guidelines laid down by the database community for ensuring that the databases are normalized. The normal forms describe the degree of strictness of a table. The lesser the degree of normalization for a table, the greater are the chances of it having logical inconsistencies and data anomalies. The process tends to slow down the data retrieval, since data which may have been retrievable from one record in an non normalized design may have to be retrieved from several records in the normalized form. Therefore the rule of the thumb says that normalize while it doesn't have a measurable effect on performance. A typical database is normalized up to the third normal form, where in some cases may require the schema to consider higher degree of normalization. Lets take a look at what each of these normal forms has to offer us.
First Normal Form
The first normal form should be the simplest one to conform to. This rule calls for eliminating repeating groups in individual tables and creation of separate table for each related set of data. Each set of related data is further identified with a primary key. This requires values in each table to be atomic. In the example alluded to above, we have non atomic column author, which could be further broken into fields like first name, last name for better retrieval. Similarly there could be a subject field which needs to be atomic. We could achieve 1NF conformance by breaking down the original table into separate tables with related data. So we end up creating new tables namely Book, Author, Publisher, Subject and some tables to establish relations between them, which are all in 1NF.
| ISBN Number | Title | Price | PublisherID |
|---|---|---|---|
| 0321194871 | Exceptional C++ | $79 | 1 |
| 2119487103 | Introduction to Database System | $126 | 2 |
| 2342395603 | Exceptional C++ Style | $89 | 1 |
| 4545323233 | Inside C++ Object Model | $66 | 3 |
The information for the author, subject and publisher are taken out of the original table and the entire related set of information is moved to separate tables. The relation between the Book table and the Publisher table is modelled using the primary key of 'many' as the foreign key of the 'one' in the relation. Since the books and authors, books and subjects have many to many relationship the same is to be modelled as a separate tables.
Author Table
| AuthorID | First Name | Last Name | AuthorInfo |
|---|---|---|---|
| 1 | Herb | Sutter | One of the most prominent C++ experts. |
| 2 | Stanley | Lippman | Architect with the Visual C++ development team at Microsoft |
| 3 | Chris | Date | Specializing in relational database technology |
The author table contains the information about the author with AuthorID as the primary key.
| PublisherID | Name | Address | City | State | Zip | Phone |
|---|---|---|---|---|---|---|
| 1 | Addison-Wesley | 75 Arlington Street, Suite 300 | Boston | MA | 02116 | 201-767-5021 |
| 2 | Prentice Hall | Upper Saddle River | New Jersey | New Jersey | 07458 | 800-677-7377 |
| 3 | Sams Publishing | 800 East 96th Street | Indianapolis | Indiana | 46240 | 800-889-3358 |
Subject Table
| SubjectID | Name |
|---|---|
| 1 | C++ |
| 2 | Database |
In order to establish the relationship between the books, authors and subject we'll consider two more tables viz. BookAuthor table and BookSubject table.
BookAuthor Table
| ISBN Number | AuthorID |
|---|---|
| 0321194871 | 1 |
| 2119487103 | 3 |
| 2342395603 | 1 |
| 4545323233 | 2 |
This table models the many to many relationship between books and authors. A book may have more than one authors, similarly an author may have penned a number of books.
BookSubject Table
| ISBN Number | SubjectID |
|---|---|
| 0321194871 | 1 |
| 2119487103 | 2 |
| 2342395603 | 1 |
| 4545323233 | 1 |
The relationship between books and subjects is again many to many as a book may be covering more than one subjects, similarly a subject may have a number of books on it.
The only drawback of such normalization process is that we might need to use joins for even a simple cross table query.
The article will be updated soon...

What people said... (0)
Post a Comment