|
Published in The Navigator, Summer 1998 Copyright © 1998, Larissa T. Moss, Method Focus Inc. The most compelling reason for creating an Entity-Relationship model is to understand the data content and the data relationships of a business. Without a thorough understanding of the business data one cannot construct a fully integrated and flexible database which can support access to all the defined data through all the defined relationships. Traditionally we used process modeling to design operational systems. Why? Because an operational system is a stand-alone system which needs to perform a specific business function. That business function needs to manipulate a certain subset of business data. Therefore, we collect and store that data in the most efficient way for the processes (programs) to perform their functions. This is different in a data warehouse. In a data warehouse we want to provide integrated data across all business functions. Data warehouses do not have a specialized set of processes (programs) to perform specific business functions; instead a data warehouse must be designed in such a way that processes not yet conceived can be performed against it in the future. This means that all the data relationships pertinent to the business must be known and understood, because they become the access paths to the data. To achieve this we use the data modeling technique instead of the process modeling technique. The major difference is the collection and storage of data. In process modeling, data from different business objects is combined and stored in data stores around the processes, whereas in data modeling, data is collected and stored separately for each business object (or entity). Storing the data separated by business objects and connecting these business objects according to the business rules (or policies), allows for maximum flexibility to access all the data in any way. The process of data modeling involves the following steps:
The last step in an Entity-Relationship data model is to ensure that each data element is either a unique identifier or a descriptive attribute of the business object it belongs to. To do this we use a technique called Normalization. The result is a data model in 3rd Normal Form. This technique is a check (or audit) of the attributes. We compare each non-key attribute to it's key and determine if:
If any of these conditions exist, as illustrated by the examples below, the entity should be normalized; i.e. split into two entities.
Repeating group
Example: Crs-Nr. Course Title Year Semester(1) #Students(1) Sem.(2) #Stud.(2)... A333 DW modeling 1997 Spring 22 Summer 18If each semester was stored on a separate row it would produce this redundancy:
Crs Nr. Course Title Year Semester #Students
A333 DW modeling 1997 Spring 22
A333 DW modeling 1997 Summer 18
A333 DW modeling 1997 Fall 26
A333 DW modeling 1997 Winter 33
Normalize into:
Entity COURSE: Course Nr. Course Title
Entity ATTENDANCE: Course Nr. Year Semester # Students
Fully dependent on all components of a composite key
In that case all non-key attributes must depend on the entire primary key (all of its attributes). Otherwise, like in the previous case, we may be repeating the values of those attributes that do depend on the entire key for each attribute that doesn't. Example:
Crs Nr. Student ID Grade # Units (#Units is not dependent on
A333 12345 A 3 Student ID; creates redundancy)
A333 34567 B 3
A333 56789 C 3
Normalize into:
Entity COURSE: Course Nr. # Units
Entity GRADE: Course Nr. Student ID Grade
Fully dependent on the key and not on a non-key attribute
Example:
Crs Nr. # Units Tuition (Tuition$ is dependent on #Units,
A333 3 $995.00 regardless of Course Nr.)
A111 3 $995.00
A444 3 $995.00
A567 2 $795.00
Normalize into:
Entity COURSE: Course Nr. # Units
Entity TUITION: # Units Tuition $
The most flexible relational design is a data model in 3rd normal form. However, because of performance considerations and its complexity of access, it is rarely possible or desirable to implement such a model without some denormalization. Regrettably, and much too frequently, developers and managers use this fact as an excuse to skip the process of Entity-Relationship modeling. This often results in major database redesigns, or worse, in stand-alone systems, because some important business relationships were misunderstood or missed entirely.
|