In this blog we are going to see about Normalization and its basic types in a simple and clear manner. Questions on this topic is common in many Sql interviews. Since Sql is base for any Data related work, we can expect this questions for some of the Data Engineer interview as well. Lets get started.
"We are going to Normalization to avoid Redundancy and Anomalies"
The above line should be your base answer for the Normalization Explanation.
Now let us see about the Anomalies, We are going to give scenario occurence example for each Anomalies.
Insertion Anomalies:
When a new employee enters into a company but his Department is not added initially, So it will be Null.
Updation Anomalies:
If the employee leave the Department, then all the dates must be updated for him. If we miss anything it will leads to error.
Deletion Anomalies:
Let us consider a table with column names as Id, Name, Price . If we delete a record with price=300 then the entire record will with the price =300 will get deleted, that is we lost the data of Id and Name of the particular product as well. So, we lose records.
Types of Normalization:
Now let us see the types of Normalization with examples one by one
i) 1st Normal Form:
The Table must have Atomicity. That is No cells have multiple values.
Roll No |
Name |
Subject |
1 |
A |
Tamil, English |
2 |
B |
Java |
From the Above table we can see the Roll No 1 has subject values as Tamil, English. This must not be there for 1st Nomal Form.
Roll No |
Name |
Subject |
1 |
A |
Tamil |
1 |
A |
English |
2 |
B |
Java |
Now, from the table we can see it Tamil, English were split into separate rows. Thus we achieve Atomicity.
ii) 2nd Normal Form:
The table must satisfy 1st Normal Form and it should not be Partial Dependency
St id |
Sub id |
marks |
Teacher |
1 |
10 |
90 |
Ram |
1 |
20 |
100 |
Raj |
2 |
10 |
72 |
Ram |
The Teacher column is dependent on one of the primary key . Therefore it has Partial Dependency.
St id |
Sub id |
marks |
1 |
10 |
90 |
1 |
20 |
100 |
2 |
10 |
72 |
Split Table 2:
Sub id |
Teacher |
10 |
Ram |
20 |
Raj |
iii) 3rd Normal Form:
Table must be satisfy 2nd Normal Form and it should not have Transitive dependency.
Emp id |
Emp Name |
Emp |
Emp state |
Emp city |
10 |
Raju |
201010 |
UP |
Noida |
20 |
ravi |
0228 |
US |
Boston |
The Above table is already in Second Normal Form. But we can see "Emp state" and "Emp city" depends on "Emp Zipcode" but "Emp Zipcode" depends on "Emp id" which is the Primary Key. Hence it is in Transitive Dependency.
Emp id |
Emp Name |
Emp |
10 |
Raju |
201010 |
20 |
ravi |
0228 |
Split Table 2:
Emp |
Emp State |
Emp |
201010 |
UP |
Noida |
0228 |
US |
Boston |
To Over come the Transitive Dependency, now we have split the table into two tables. One with "Emp id" , "Emp Name" and "Emp ZipCode". Another with "Emp ZipCode", "Emp State", "Emp City".
iv) BCNF:
BCNF stands for Boyce Codd Normal Form. It should be satisfy 3rd Normal Form and it should not have Reverse dependency
Std id |
Subject |
Professor |
101 |
Java |
Jack |
101 |
C++ |
Raj |
102 |
Java |
Jack |
103 |
Java |
Max |
The above table is Already in 3rd Normal Form. The primary key is "Std id" and "Subject". Note that in this example the scenario is a professor will handle only 1 subject but one subject can have many professor.
Split table 1:
Std id |
Prof id |
101 |
1 |
101 |
2 |
102 |
1 |
103 |
3 |
Split Table 2:
Prof id |
Professor |
Subject |
1 |
Jack |
Java |
2 |
Raj |
C++ |
3 |
Max |
Java |
To Over come the Reverse Dependency, now we have split the table into two tables. One table contains "Std id" and "Prof id". the "Prof id" column is generated by us. And the another table contains "Prof id", "Professor", "Subject". Thus we overcome the scenario that we faces above and now reverse dependency is removed.
0 Comments