Where To Start with an Entity Relationship Diagram

Written by Colin Savage on June 30, 2021. Posted in Articles.

In my previous article An Entity Relationship Diagram Isn’t Just For Database Design I talked about the usefulness of Entity Relationship Diagrams.

Richard Larson did an excellent overview of some of the more technical considerations when putting together an ERD For The Love Of Data

But one of my colleagues recently asked me for help, as they hadn’t done one before and weren’t even sure where to start. If you’ve no experience of putting together an entity relationship diagram, it can be quite daunting to take that first step.

As a consequence, I’ve put this together for the benefit of those who fall more into that “where do I start?” category, and to explain how you might start to document your ERD.

Write a ‘user story’ or summary to describe what you’re trying to do.

If you’re looking at a specific process or application, write down what it is you’re trying to do. Those who have worked in an Agile environment will be familiar with user stories but for those that aren’t, they’re a simple way of describing a product feature, or features, from the point of view of a user of that product or feature.

For example, if you were building a job application system, your first story might be something like “As a job searcher, I want to see jobs that match my profile by company and location”. Another might be “As a recruiter, I want to see details of candidates for roles we’ve advertised and the status of various applications”.

Writing down a succinct overview or summary of what you’re trying to do will usually suggest a number of entities. Having done this, go through the summary and highlight or underline the nouns.

As an example, my colleague was looking at designing a User Access Management system, and the story for that could be:

“I want to be able to assign access rights to users to allow them access to specified third-party products or in-house applications. I need to be able to know what their job role is, what team & department they work for, what location they work in, what level of access they want, and details about the hardware they use. I also need to see who their line manager is and who the authoriser of the request is.”

The underlined words will be the initial prospective entities.

Access right
User
Third-party product
In-house application
Job role
Organisation (may require multiple entities to represent a structure hierarchy or matrix e.g., Division, Department, Team, etc.)
Location
Hardware
Line manager
Authorising manager

Details about the user, along with the types of access level, would probably be attributes (of User and Access Right respectively) rather than entities.

The story approach is a great way to start when you’re faced with a blank sheet of paper.

Analyse your physical data.

If you’re trying to create an ERD from a spreadsheet consisting of lots of rows and columns of data, then you could start by looking at each data item – probably the columns – and decide what the key to that data item might be.

An example might be a spreadsheet showing sales orders over the month, listing:

Order number
Customer
Customer location
Salesperson
PO number
Product line
Product ordered
Quantity ordered
Unit price
Discount given
Order date
Shipping date
Shipping company
Date delivered
Goods Received Note number
Invoice number
Invoice amount

Entities that could be derived from this would include:

Product Line, with a one-to-many relationship to Product. Unit Price might be an attribute of Product, or both Product & Customer if negotiated individually with each Customer.
Customer Entity, with a one-to-many relationship to Customer Location.
Discount – might be related to Customer, Product, or be a factor of both if negotiated individually.
Customer Order, with relationships to both Customer & Product, and including attributes such PO number, Product ordered, Quantity Ordered, Order Date, Shipping Consignment (which would have Shipping Details).
Shipper would be a separate entity but could be related to Customer (if they always used a specific shipper), Product (if the product required specialised handling) or just Customer Order (if the shipper used just depended on who could deliver it quickest).
Shipping Consignment, which could involve multiple orders are shipped to multiple customers in one consignment.
Customer Invoice, which might combine multiple orders for a single customer.

If you’re ‘reverse engineering’ a physical database with defined tables into a logical model, be wary of simply translating those physical tables directly into logical ones, however. Application databases are often implemented in a way that ignores the theories governing logical design. There are good reasons for this; it could be for performance reasons, as too many table joins when retrieving data tend to slow down database performance. It’s often faster to read a big, flatter table than to read a multitude of truly relational tables designed to textbook standards.

Another reason is that textbook database design advocates a technique called ‘Normalisation’ or ‘Third Normal Form’ to produce a logical database design. If you want a truly relational, theoretically ‘pure’ design then normalisation is a good technique to apply. But one of the outcomes of normalisation is that Foreign Keys are removed, whereas in a physical database you need these to be able to link tables efficiently.

Adopt a hybrid approach.

You could quite usefully combine these two approaches, by firstly drafting out your summary and creating a high-level model based on that. Then take your physical data and map that onto the model, assigning the data to attributes within entities. This validates your initial view, and you may find that some don’t fit and you need to define new entities but once you’ve done that initial mapping, you can then carry out the normalisation process to refine your model into a fully logical one.

This technique will be particularly useful if you’re dealing with a large database, with many tables and attributes, or are doing a top-down strategic transformation across an organisation or major functional area.

Summary

Putting together a data model can be a daunting task if you’re not experienced (and sometimes even if you are). By approaching the task in a structured way though, you can take a top-down view, leverage existing data assets or use a combination of the two to achieve the desired outcome.

Colin Savage

Colin is a Business Analyst with 35 years’ experience across various sectors, but mainly Financial Services. I've worked on a number of strategic transformation programmes and am currently working for the UK Government employee on a major Payments transformation programme in an Agile environment. LinkedIn profile at https://www.linkedin.com/in/colin-savage-7a94a410/

Where To Start with an Entity Relationship Diagram

Write a ‘user story’ or summary to describe what you’re trying to do.

Analyse your physical data.

Adopt a hybrid approach.

Colin Savage