Database Normalization and De-normalization:
Database
normalization is the process of refining the data in accordance with a series
of normal forms. This is done to reduce data redundancy and improve data
integrity. This process divides large tables into small tables and links them
using relationships.
Normalization
gives data meaning by defining relationships with other data. The large table
will be divided into multiple small tables and have relationship between them
to retrieve the required data based on the need.
1NF- First Normal Form:
· Each table cell should contain a
single/atomic value
· Every record in the table must be
unique
· No cell have more than one value.
· No repeating groups - Each column
name should be unique, the repeating columns names should be removed separated
and a new table for these columns.
2NF - Second Normal Form:
· The rules of the second normal form
are
· -Table must be in 1NF
· -All data must be depend on primary
Key
· -Every non-key attribute must be
fully dependent on key attributes( relation using Primary -> Foreign Key
(Parent and child relation)
-
No
partial dependency
3NF - Third Normal Form
· Data must be in 2NF
· No transitive functional
dependencies
Benefits:
· Writes are faster as we have separate
tables
· Used for OLTP systems
· Required data can be selected
· Data integrity maintained
· Reduce redundancy
CONS:
Slower
reads
Heavy
querying not recommended.
Complex
joins required for the data retrieval
What is a transitive
dependency?
In simple
terms, if changing a non-key column causes any other non-key column to change,
then it's called a transitive dependency.
Composite Key
Sometimes it’s
hard to define unique records with one single column. In such cases, we can
have two or more columns that uniquely identify each record in a table. Such
columns are known as composite key.
De-normalization:
· Basically, the process of taking a
normalized schema and making it non-normalized is called de-normalization where
will be adding required duplicated columns to exiting tables for easy data
retrieval.
Pros:
· Used for data warehouse and OLAP
systems
· More disk space as will redundant data
in multiple tables
· Less no.of joins
· Makes reads faster
· Can we used analysis purpose
· Speed up select operation.
Cons:
· Slower writes
· Potential data inconstancy
· Requires more storage
Comments
Post a Comment