Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

ERD and RS

Learning Goals

By the end of this section you will:

  • be able to develop Entity Relationship Diagrams by:

    • identifying the entities in a data scenario

    • categorizing attributes under the entities

    • establishing the relationships between the entities

    • resolving many-to-many relationships

    • identifying foreign keys

  • be able to develop a Relational Schema from the Entity Relationship Diagram

  • use sample table to check the Relational Schema

The purpose of the ERD and Relational Schemas are to design and represent the database that forms your application’s datastore (model).

The first step in designing a database is developing an understanding of the scenario. For learning about designing databases we will use the following scenario:

EDR — Entity Relationship Diagram

An Entity Relationship Diagram (ERD) is a type of flowchart that illustrates how “entities” such as people, objects or concepts relate to each other within a system. lucidchart_2017_er

In following the process of creating a ERD, you will create a relational database that complies with normalisation rules to the level of 3NF.

Watch the videos below to understand designing an ERD.

We will use the following steps to create our Entity Relationship Diagrams, using the Student Subject Database as an example:

Step 1: Identify the entities

Look at the scenario and the data needs and identify all the entities that the database needs to store. Entities can be people, places, things or processes.

In the Student Subject Database we have the following entities.

ERD Step 1

Step 2: Add attributes

Refer back to your data needs and requirements and add the relevant attributes to each entity.

The Student Subject Database ERD after stage 2:

ERD Step 2

Step 3: Establish relationships and cardinality

The next step is to create the relationship between the entities:

  1. Connect the related entities using lines.

  2. Establish the cardinality by working out the range of connections each entity can have to the second entity.

Cardinality options

ERD cardinality options

For example in the Student Subject Database:

You will end up with the following ERD

ERD Step 3

Step 4: Resolve many-to-many relationships

In this course we will use bridging entities to deal with many-to-many relationships.

To do this:

  1. find a many-to-many relationship

  2. place a bridging entity between the two entities

  1. reconnect the relationships

  2. establish the new cardinality

  1. enter the primary keys from the original entities as the two parts of the composite key for the bridging entity

  2. add any other relevant attributes

For our Student Subject Database:

At the end of step 4 our ERD looks like this:

ERD Step 4

Step 5: Identify the foreign keys

Identifying the foreign keys establishes how the different entities are connected together. If there is a relationship drawn between entities, then there must be a primary key—foreign key connection.

Steps to establish foreign keys

Our Student Subject Database example at the end of step 5:

ERD Step 5

Relational Schema

To convert your final ERD to a Relational Schema (RS) you will need to identify the datatype of each field.

We will be using SQLite which provides the following data type options:

Since Diagram.net does not provide a three column entity table, we will separate the attribute name and data type using the | symbol.

The RS for the Student Subject Database will look like:

ERD Step 6

Sample Tables

Sample tables are used to ensure that the database structure that the ERD produced can effectively store the required data adequately.

You should create tables that reflects the entities identified and attributes identified by the ERD. Then fill the table with sample data. This data may be given to you, or you may produce mock data that is representative of data that will be stored in the database. A good resource for creating mock data is mockaroo.

Below are the sample tables for the Student Subject Database.

Teacher tableStudent tableSubject tableEnrolments table

ERD from Data Files

For Unit 3 you will need to analyse data files to create a relational schema. We will now look at how to analyse a data file and create a relational schema from it.

We will be using cvs files as our datafile. It is importnat to note that csv file can only represnet flat files, while we will be working with relational databases. Therefore, we will need to decompose the data in the csv file into multiple tables.

Before we do this it will be useful to remember the rules of Normalization.

A table is in 3NF if:

  1. Each column contains atomic values

  2. Each column contains the same type of data

  3. Each column and row has a unique identifier

  4. The order in which the data is saved does not matter

  5. There are no partial dependencies

  6. There are no transitive dependencies

While we can look at the raw csv file as a datafile, it is easier to analyse if we open it in Excel or similar software.

View csv file in Excel

Navigate to the data folder in the unit repository and open the file F1Driver_Dataset.csv in Excel.

The first thing Excel will want to do is convert the data in the file. Do want to see the data as it is, so select Don’t Convert, then get rid of the yellow banner warning of dataloss, by click on the x on the right.

Since the table is almost 900 records long, we want a way to easily look over all the data. Fortunately Excel has a feature called Filters that will help us with this.

To turn filters on:

  1. Go to the Data tab in the ribbon

  2. Click on Filter

You should now see little arrows next to each column header. Clicking on these will allow you to see all the unique values in that column, and filter the data based on these values.

Non-atomic values

The first thing we will do is to check that each column only contains atomic values.

You will notice that both the Seasons and the Champioship Years columns contain multiple values in a single cell, therefore non-atomic values.

non-atomic values

When you find non-atomic values in a column, this means that this column will need to be split into a separate table. Note, that once you have found one non-atomic value in a column, you don’t need to check the rest of that column.

So we can see that we will need to create two tables in addition to the main table, one for Seasons and one for Championship Years.

Open Draw.io and use the Entity Relation symbols to create these three tables.

f1_driver_1  

Relationships and Cardinalities

Now add the relationships and cardinalities between the tables.

f1_driver_2

Remove the many-to-many relationship by a bridging entity.

f1_driver_3

Notice the one-to-one relationship between Seasons and Champions. This means that we can merge these two tables into one.

f1_driver_4

Partial Dependencies

Next we need to check for partial dependencies. Particial dependencies can only occur in tables with composite primary keys. We only have one table with a composite primary key, the Raced table. Since there are no columns in this table that are not part of the primary key, there are no partial dependencies.

If there were partial dependencies, we would need to create new tables to remove them.

Transitive Dependencies

Finally we need to check for transitive dependencies. A transitive dependency occurs when a non-key attribute depends on another non-key attribute.

If we look back at our csv files we will see many columns that can be inferred from other columns. For example, the Pole Rate can be inferred from the Pole_Positions column and the Race_Entries column. This means that the Pole Rate column depends on the Pole_Positions and Race_Entries columns. Therefore it is transitive.

There are many other columns that are also transitive, so we will not include them in our schema.

transitive_dependencies

Final ERD

We will now include the remaining columns in our schema and end up with the final ERD below.

final_schema

From here, you need to follow the Relational Schema and Sample Tables as above.

final_rs