Where To Start with an Entity Relationship Diagram
In my previous article An Entity Relationship Diagram Isn’t Just For Database Design I talked about the usefulness of Entity Relationship Diagrams.
Richard Larson did an excellent overview of some of the more technical considerations when putting together an ERD For The Love Of Data
But one of my colleagues recently asked me for help, as they hadn’t done one before and weren’t even sure where to start. If you’ve no experience of putting together an entity relationship diagram, it can be quite daunting to take that first step.
As a consequence, I’ve put this together for the benefit of those who fall more into that “where do I start?” category, and to explain how you might start to document your ERD.
Write a ‘user story’ or summary to describe what you’re trying to do.
If you’re looking at a specific process or application, write down what it is you’re trying to do. Those who have worked in an Agile environment will be familiar with user stories but for those that aren’t, they’re a simple way of describing a product feature, or features, from the point of view of a user of that product or feature.
For example, if you were building a job application system, your first story might be something like “As a job searcher, I want to see jobs that match my profile by company and location”. Another might be “As a recruiter, I want to see details of candidates for roles we’ve advertised and the status of various applications”.
Writing down a succinct overview or summary of what you’re trying to do will usually suggest a number of entities. Having done this, go through the summary and highlight or underline the nouns.
As an example, my colleague was looking at designing a User Access Management system, and the story for that could be:
“I want to be able to assign access rights to users to allow them access to specified third-party products or in-house applications. I need to be able to know what their job role is, what team & department they work for, what location they work in, what level of access they want, and details about the hardware they use. I also need to see who their line manager is and who the authoriser of the request is.”
The underlined words will be the initial prospective entities.
- Access right
- Third-party product
- In-house application
- Job role
- Organisation (may require multiple entities to represent a structure hierarchy or matrix e.g., Division, Department, Team, etc.)
- Line manager
- Authorising manager
Details about the user, along with the types of access level, would probably be attributes (of User and Access Right respectively) rather than entities.
The story approach is a great way to start when you’re faced with a blank sheet of paper.
Analyse your physical data.
If you’re trying to create an ERD from a spreadsheet consisting of lots of rows and columns of data, then you could start by looking at each data item – probably the columns – and decide what the key to that data item might be.
An example might be a spreadsheet showing sales orders over the month, listing:
- Order number
- Customer location
- PO number
- Product line
- Product ordered
- Quantity ordered
- Unit price
- Discount given
- Order date
- Shipping date
- Shipping company
- Date delivered
- Goods Received Note number
- Invoice number
- Invoice amount
Entities that could be derived from this would include:
- Product Line, with a one-to-many relationship to Product. Unit Price might be an attribute of Product, or both Product & Customer if negotiated individually with each Customer.
- Customer Entity, with a one-to-many relationship to Customer Location.
- Discount – might be related to Customer, Product, or be a factor of both if negotiated individually.
- Customer Order, with relationships to both Customer & Product, and including attributes such PO number, Product ordered, Quantity Ordered, Order Date, Shipping Consignment (which would have Shipping Details).
- Shipper would be a separate entity but could be related to Customer (if they always used a specific shipper), Product (if the product required specialised handling) or just Customer Order (if the shipper used just depended on who could deliver it quickest).
- Shipping Consignment, which could involve multiple orders are shipped to multiple customers in one consignment.
- Customer Invoice, which might combine multiple orders for a single customer.
If you’re ‘reverse engineering’ a physical database with defined tables into a logical model, be wary of simply translating those physical tables directly into logical ones, however. Application databases are often implemented in a way that ignores the theories governing logical design. There are good reasons for this; it could be for performance reasons, as too many table joins when retrieving data tend to slow down database performance. It’s often faster to read a big, flatter table than to read a multitude of truly relational tables designed to textbook standards.
Another reason is that textbook database design advocates a technique called ‘Normalisation’ or ‘Third Normal Form’ to produce a logical database design. If you want a truly relational, theoretically ‘pure’ design then normalisation is a good technique to apply. But one of the outcomes of normalisation is that Foreign Keys are removed, whereas in a physical database you need these to be able to link tables efficiently.
Adopt a hybrid approach.
You could quite usefully combine these two approaches, by firstly drafting out your summary and creating a high-level model based on that. Then take your physical data and map that onto the model, assigning the data to attributes within entities. This validates your initial view, and you may find that some don’t fit and you need to define new entities but once you’ve done that initial mapping, you can then carry out the normalisation process to refine your model into a fully logical one.
This technique will be particularly useful if you’re dealing with a large database, with many tables and attributes, or are doing a top-down strategic transformation across an organisation or major functional area.
Putting together a data model can be a daunting task if you’re not experienced (and sometimes even if you are). By approaching the task in a structured way though, you can take a top-down view, leverage existing data assets or use a combination of the two to achieve the desired outcome.