When you are physical data modeling the following tasks are performed in an iterative manner:
- Identify tables. Tables are the database equivalent of classes; data is stored in physical tables. As you can see in Figure 1 the university has a Student table to store student data, a Course table to store course data, and so on. Figure 1uses a UML-based notation (this is a publicly defined profile which anyone can provide input into). If you have a class model in place a good start is to do a one-to-one mapping of your classes to data tables, an approach that works well in “greenfield” environments where you have the luxury of designing your database schema from scratch. Because this rarely happens in practice you need to be prepared to be constrained by one or more legacy database schemas which you will then need to map your classes to. In these situations it is unlikely that you will need to do much data modeling, you will simply need to learn to live with the existing data sources, but you will need to be able to read and understand existing models. In some cases you may need to perform legacy data analysis and model the existing schema before you can start working with it.
- Normalize tables. Data normalization is a process in which data attributes within a data model are organized to increase the cohesion of tables and to reduce the coupling between tables. The fundamental goal is to ensure that data is stored in one and only one place. This is an important consideration for application developers because it is incredibly difficult to stores objects in a relational database if a data attribute is stored in several places. The tables in Figure 1 are in third normal form (3NF).
- Identify columns. A column is the database equivalent of an attribute, and each table will have one or more columns. For example, the Student table has attributes such as FirstName and StudentNumber. Unlike attributes in classes, which can either be primitive types or other objects, a column may only be a primitive type such as a char (a string), an int (integer), or a float.
- Identify stored procedures. A stored procedure is conceptually similar to a global method implemented by the database. In Figure 1 you see that stored procedures such as averageMark() and studentsEnrolled() are modeled as operations of the class UniversityDB. These stored procedures implement code that work with data stored in the database, in this case they calculate the average mark of a student and count the number of students enrolled in a given seminar respectively. Although some of these stored procedures clearly act on data contained in a single table they are not modeled as part of the table (along the lines of methods being part of classes). Instead, because stored procedures are a part of the overall database and not a single table, they are modeled as part of a class with the name of the database.
- Apply naming conventions. Your organization should have standards and guidelines applicable to data modeling, and if not you should lobby to have some put in place. As always, you should follow AM’s practice of Apply Modeling Standards.
- Identify relationships. There are relationships between tables just like there are relationships between classes. The advice presented relationships in UML class diagramsapplies.
- Apply data model patterns. Some data modelers will apply common data model patterns, David Hay’s (1996) book Data Model Patterns is the best reference on the subject. Data model patterns are conceptually closest to analysis patterns because they describe solutions to common domain issues. Hay’s book is a very good reference for anyone involved in analysis-level modeling, even when you’re taking an object approach instead of a data approach because his patterns model business structures from a wide variety of business domains.
- Assign keys. A key is one or more data attributes that uniquely identify a row in a table. A key that is two or more attributes is called a composite key. A primary key is the preferred key for an entity type whereas an alternate key (also known as a secondary key) is an alternative way to access rows within a table. In a physical database a key would be formed of one or more table columns whose value(s) uniquely identifies a row within a relational table. Primary keys are indicated using the <> stereotype and foreign keys via <>. Read here for more about keys.
Although similar notation is used it is interesting to note the differences between the PDM of Figure 21 and the UML class diagram from which is ti based:
- Keys. Where it is common practice to not model scaffolding properties on class models it is common to model keys (the data equivalent of scaffolding).
- Visibility. Visibility isn’t modeled for columns because they’re all public. However, because most databases support access control rights you may want to model them using UML constraints, UML notes, or as business rules. Similarly stored procedures are also public so they aren’t modeled either.
- No many-to-many associations. Relational databases are unable to natively support many-to-many associations, unlike objects, and as a result you need to resolve them via the addition of an associative table. The closest thing to an associative table in is WaitList which was introduced to resolve the on waiting list many-to-many association depicted in the class diagram . A pure associative table is comprised of the primary key columns of the two tables which it maintains the relationship between, in this case StudentNumber from Student and SeminarOID from Seminar. Notice how in WaitList these columns have both a PK and an FK stereotype because they make up the primary key of WaitList while at the same time are foreign keys to the other two tables. WaitList isn’t truly an associative table because it contains non-key columns, in this case the Added column which is used to ensure that the first people on the waiting list are the ones that are given the opportunity to enroll if a seat becomes available. Had WaitList been a pure associative table I would have applied the associative table stereotype to it.
I will often use a CASE tool to create physical data models. The two features I require of a data modeling tool are the ability to generate data definition language (DDL) code required to create the database schema and the ability to reverse engineer a data model from an existing database schema. Virtually all data modeling tools still on the market today support these features.
This artifact description is excerpted from Chapter 12 of The Object Primer 3rd Edition: Agile Model Driven Development with UML 2.