NormaliZation in DBMS !

Suyash Thonte
5 min readNov 23, 2020

Hey Guys!! So , first of all we will see what exactly Normaliztion is

Normalization : Normalization is used to organize the data and remove redundancy(repetition) of data . Normalization actually decomposes (divides) larger table into smaller ones and ensure that the data is logically stored without any repetition.

So , why Normalization came into existance ??

The normalization process was created largely in order to reduce the negative effects of creating tables that will introduce anomalies into the database.

Now, what is anomalies??

Anomalies is occured when there is too much redundancy in data. or it can also occure when there is poor construction . what poor construction means here is, when the designer creates the database and doesn’t identify the entities that depend on each other for existence.

there are 3 type of anomalies

  1. Insertion anomalies.
  2. Deletion anomalies.
  3. Updation anomalies .

Update Anomalies happen when the person charged with the task of keeping all the records current and accurate, is asked, for example, to change an employee’s title due to a promotion. If the data is stored redundantly in the same table, and the person misses any of them, then there will be multiple titles associated with the employee. The end user has no way of knowing which is the correct title.

Insertion Anomalies happen when inserting vital data into the database is not possible because other data is not already there. For example, if a system is designed to require that a customer be on file before a sale can be made to that customer, but you cannot add a customer until they have bought something, then you have an insert anomaly. It is the classic “catch-22” situation.

Deletion Anomalies happen when the deletion of unwanted information causes desired information to be deleted as well. For example, if a single database record contains information about a particular product along with information about a salesperson for the company and the salesperson quits, then information about the product is deleted along with salesperson information

So far now we have disscused about the problem more, rather than disscussing on solution so, lets dive into solution more deeper.

There are basically 7 normalization form

1. 1NF (First Normal Form)

2. 2NF (Second Normal Form)

3. 3NF (Third Normal Form)

4. BCNF (Boyce-Codd Normal Form)

5. 4NF (Fourth Normal Form)

6. 5NF (Fifth Normal Form)

7. 6NF (Sixth Normal Form)

The Theory of Data Normalization in SQL is still being developed further. But in most practical applications, normalization achieves its best in 3rd Normal Form !!

1 . First Normal Form (1 NF) :

In this Each record needs to be unique i.e Each table cell should contain a single value.

after 1 NF →

2. Second Normal Form (2 NF) :

The very first condition for 2 NF is be in 1 NF and there should be single column Primary key .

after 2 NF →

The prime key attributes are Stu_ID and Proj_ID

Non-key attributes are Stu_Name and Proj_Name which should be depend on primary key attributes But we find that Stu_Name can be identified by Stu_ID and Proj_Name can be identified by Proj_ID independently. This is called partial dependency, which is not allowed in Second Normal Form.

We broke the relation in two as depicted in the above picture. So there exists no partial dependency.

3 . Third Normal Form :

For a relation to be in Third Normal Form, it must be in Second Normal form and should not have any transitive functional dependencies .

after 3 NF →

Here, Stu_ID is the key and only prime key attribute and City can be identified by Stu_ID as well as Zip itself. zip is not superkey and city is not prime attribute Additionally, Stu_ID → Zip → City, so there exists transitive dependency.

To bring this relation into third normal form, we break the relation into two relations

4 . Boyce-Codd Normal Form

Boyce-Codd Normal Form (BCNF) is an extension of Third Normal Form on strict terms. BCNF is also referred as 3.5 Normal Form.

So it states that, For any non-trivial functional dependency, X → A, X must be a super-key.

In the above image, Stu_ID is the super-key in the relation Student_Detail and Zip is the super-key in the relation ZipCodes. So,

Stu_ID → Stu_Name, Zip and

Zip → City So, both the relations are in BCNF.

5 . Fourth Normal Form (4 NF) :

4 NF came into existance when there is Multi-valued Dependency occur in any relation

s_id 1 has opted for two courses, Science and Maths, and has two hobbies, Cricket and Hockey.And in the table below, there is no relationship between the columns course and hobby. They are independent of each other.

So there is multi-value dependency, which leads to un-necessary repetition of data and other anomalies as well.

After 4 NF →

The functionally dependent columns are moved in a separate table and the multi-valued dependent columns are moved to separate tables.

6 . Fifth Normal Form (5 NF) :

  • A relation is in 5NF if it is in 4NF and not contains any join dependency and joining should be lossless.
  • 5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid redundancy.
  • 5NF is also known as Project-join normal form (PJ/NF).

7 . Sixth Normal Form (6 NF) :

In 6NF, the relation variable is decomposed into irreducible components. A relation is in 6NF, only if, It is in 5NF, and every join dependency on the relation is trivial.

So this was all about Normalization and its various forms and how it is used in Database Management System. Will See you guys in next blog!

--

--

Suyash Thonte

Senior Software Engineer @ Bounteous | Full Stack Developer