Performance Evaluation of 3rd Normal Form Decompositions
When a relational database is chosen, normalization theory provides a set of guidelines that can lead to efficient database designs. Normalizing tables is therefore a common step in the analysis of relational databases. Normalization aims to decompose existing relational tables so as to minimize redundancy while preserving the dependencies between attributes. It also facilitates correct insertion, deletion, and modification of data in the database. Given an unnormalized relational database, a redesign that has no data redundancy and is guaranteed to preserve dependencies is not always achievable. Previous studies have shown that it is not always possible to decompose a database into relations that comply with the Boyce-Codd Normal Form (which eliminates many of the simple redundancies) while also guaranteeing the preservation of functional dependencies. For this reason, the next weaker normal form that does guarantee the preservation of functional dependencies, the 3rd Normal Form, is generally regarded as an industry standard for many types of database applications. Normalization does not always reduce the size of a given database, and it frequently increases the modification and retrieval time of a given transaction. The decrease in performance stems from the fact that, after decomposition, some queries must reconstruct the original relations by joining multiple tables. The complexity of these joins depends on the types of the attributes involved and on the integrity constraints imposed on those attributes. The effect of normalization on the size of the database depends on the data stored in it. Since there is no guarantee about the effect on database size, database designers always need to predict the impact each normalization step may have on the overall database.
Thus, database designers always need to select good trade-offs between the memory consumed by the database, the presence of challenging database maintenance anomalies, and the speed at which data can be queried. For this reason, performance-minded database designers tend to denormalize relations that are accessed frequently. The table decomposition process sometimes involves reducing the set of functional dependencies to a minimal set known as a canonical cover. The main focus of this thesis is to examine the query performance of distinct versions of the same database, where all of these versions are in the 3rd Normal Form, as they are all derived using the same canonical cover. The databases represent the same relations with distinct schemas but are populated with the same data. This allows us to evaluate the impact of normalization approaches on the database space requirements. In order to evaluate the performance of each version of the database, we run the same queries on each of them. Here we define the performance of a database as the time it takes to execute a query.
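The reduction of a set of functional dependencies to a canonical cover, mentioned above, can be sketched as follows. This is a minimal illustration of the standard textbook procedure in Python, not the implementation used in this thesis. The function names `closure` and `canonical_cover`, the encoding of each attribute as a single character, and the example dependency set are all assumptions made for illustration. A canonical cover is in general not unique, so the result may depend on the order in which dependencies and attributes are examined.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of
    (lhs, rhs) pairs of frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def canonical_cover(fds):
    """Return one canonical cover of `fds` as a dict {lhs: rhs}.

    `fds` is a list of (lhs, rhs) string pairs; each character is
    treated as one attribute (an illustrative convention only).
    """
    # Step 1: split right-hand sides into single attributes, drop duplicates.
    work = [(frozenset(lhs), frozenset(a)) for lhs, rhs in fds for a in rhs]
    work = list(dict.fromkeys(work))

    # Step 2: remove extraneous attributes from left-hand sides.
    for i in range(len(work)):
        lhs, rhs = work[i]
        for a in sorted(lhs):
            reduced = lhs - {a}
            if reduced and rhs <= closure(reduced, work):
                lhs = reduced
                work[i] = (lhs, rhs)
    work = list(dict.fromkeys(work))

    # Step 3: drop dependencies implied by the remaining ones.
    kept = list(work)
    for fd in work:
        others = [g for g in kept if g != fd]
        if fd[1] <= closure(fd[0], others):
            kept = others

    # Step 4: merge dependencies that share a left-hand side.
    cover = {}
    for lhs, rhs in kept:
        cover[lhs] = cover.get(lhs, frozenset()) | rhs
    return cover


# Hypothetical example on R(A, B, C): A -> BC, B -> C, A -> B, AB -> C.
example_fds = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
cover = canonical_cover(example_fds)
for lhs, rhs in cover.items():
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```

For this example the sketch reduces the four dependencies to A -> B and B -> C. A 3rd Normal Form synthesis would then typically create one relation per dependency in the cover, which is why different decompositions derived from the same cover can all be in the 3rd Normal Form.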