Six Things Every Business Analyst Should Know About Data

Written by Dan Tasker on July 16, 2013. Posted in Articles.

1) Just because data is less popular than process doesn’t mean it’s less important

Business users are happy to discuss workflow diagrams representing their processes but few if any appreciate a data model representing the information managed by those workflows. In spite of this lack of appreciation of the role of data in information systems, the reality is that the primary purpose of most business processes is the creating, updating and/or referencing of data. An Accounting system supports the creation of Accounting Transactions related to maintaining Accounts. An Inventory Management system supports maintaining Inventory Items as stock is manufactured or received and orders are fulfilled.

The trick is to talk process throughout requirements analysis, but at the same time think (and record) details about the data involved. When a data issue needs to be understood, I’ve learned to use screen mockups rather than a data model: for example, a quick sketch of an Order Entry screen with fields for a few Customer details, Order details and Line Item details. Users easily relate to this, and for the duration of the discussion data is the focus, not process.

2) Users operate in the real world and don’t have time to theorise about data

Business requirements are supposed to be implementation independent, and a data model is supposed to be logical or conceptual. Those terms may be meaningful to a Business Analyst but to a user they are theoretical, not real.

When data modelling was first proposed, new terms were used to distinguish logical from physical. The terms Entity, Attribute and Relationship were introduced. Process modelling has never required special ‘logical’ terms for Function, Process or Activity. So why can’t we use the original business friendly data terms when doing logical data modelling? For example, saying to a group of business users, “Here is the Customer record. It contains Name and Contact Detail fields and there is a link so you can find the Order records for that Customer.”

Business users don’t care about data models and don’t want ‘theory’. So speak their language using the data terms they are comfortable with.

NOTE: Because the primary audience of this blog post is Business Analysts, I will continue to use the terms Entity, Attribute and Relationship in this article.

3) The majority of entities belong only in detailed requirements

I like to divide entities into three categories:

Big “E” Entities – the primary data concepts within an organisation. One or more databases containing records will exist and those records will have an ‘identifier’ assigned. Examples include Customer, Invoice, Account, Purchase Order, Inventory Item and Asset. NOTE: Big E Entities should be used when defining scope or high level business needs. Specifying their Attributes and Relationships should be left to detailed requirements.
Small “e” entities – containers for attributes that support Big E Entities. For example, if a Customer has multiple delivery addresses then Delivery Address can be treated as a Small e entity containing address details with a relationship back to the Customer. Identifying Small e entities, their attributes and relationships should be part of detailed requirements.
Micro Entities – simple sets of values, very often applying to a single attribute. Examples include Industry Type and Credit Status. Enumerating value sets is definitely a part of detailed requirements.

NOTE: One organisation’s Small e entity can be another organisation’s Big E Entity. In the example above, Delivery Address would contain a simple set of address details related to a single customer. There is no business advantage in ‘reusing’ a given address if two customers live at the same location. However, in organisations that provide services to addresses (e.g. utility companies) Service Address is a Big E Entity. It is reused as customers move in and out and it has associations to other entities (e.g. plant or equipment installed at that location).
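For Business Analysts who like to see the three categories side by side, here is a minimal sketch in Python. It is only an illustration: the field names, the Credit Status values and the sample customer are assumptions I have made up, not part of any real system.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class CreditStatus(Enum):
    """Micro Entity: a simple value set backing a single attribute.
    The three values here are hypothetical."""
    GOOD = "Good"
    ON_HOLD = "On Hold"
    SUSPENDED = "Suspended"


@dataclass
class DeliveryAddress:
    """Small e entity: a container of attributes that supports a Customer.
    It has no identifier of its own that the business would recognise."""
    street: str
    city: str
    postcode: str


@dataclass
class Customer:
    """Big E Entity: has an assigned identifier and is named when scoping."""
    customer_id: int  # the 'identifier' assigned to each record
    name: str
    credit_status: CreditStatus = CreditStatus.GOOD
    delivery_addresses: List[DeliveryAddress] = field(default_factory=list)


# Hypothetical sample data: one Customer with two delivery addresses.
acme = Customer(customer_id=1001, name="Acme Ltd")
acme.delivery_addresses.append(DeliveryAddress("1 High Street", "Leeds", "LS1 1AA"))
acme.delivery_addresses.append(DeliveryAddress("9 Dock Road", "Hull", "HU1 2BB"))
```

Only Customer would appear in a scope statement. Delivery Address and Credit Status surface later, when the detail of Customer is worked through. For a utility company the sketch would look different, with Service Address promoted to a class carrying its own identifier.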

4) Relationships and attributes are both detail level

It is easy to believe that a data model that only involves relationships is ‘higher level’ than one that includes attributes. This may be true if by relationship you mean just a simple line connecting two entities, indicating that there is some relationship between the entities (to be defined in detail later on).

Just as attributes are not fully defined by their name alone, relationships have properties that must be defined as part of detailed requirements. Attributes need a data type specified and, depending on the type, other details. Relationships need cardinality defined (e.g. how many of entity A can or must relate to entity B and vice versa).
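To show how much detail hides behind a single line on a diagram, here is a rough sketch of what has to be recorded for one attribute and one relationship. The class names, the `allows` helper and the Customer/Order cardinalities are my own assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Attribute:
    name: str
    data_type: str                    # e.g. "text", "date", "decimal"
    max_length: Optional[int] = None  # a type-dependent detail (text only)


@dataclass(frozen=True)
class Relationship:
    from_entity: str
    to_entity: str
    min_count: int            # 0 means "can relate", 1 or more means "must"
    max_count: Optional[int]  # None means no upper limit

    def allows(self, count: int) -> bool:
        """Return True if `count` related instances satisfies the cardinality."""
        if count < self.min_count:
            return False
        return self.max_count is None or count <= self.max_count


customer_name = Attribute("Name", "text", max_length=80)

# Assumed rules: a Customer can have zero or more Orders,
# while an Order must belong to exactly one Customer.
customer_orders = Relationship("Customer", "Order", min_count=0, max_count=None)
order_customer = Relationship("Order", "Customer", min_count=1, max_count=1)
```

Note that the one line between Customer and Order needed two sets of answers, one for each direction. Those answers come from the business, which is why they belong in detailed requirements.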

Including relationships between entities (even Big E Entities) can be a slippery slope prior to detailed requirements analysis. You need to be aware that such a data model is not ‘high level’: it contains some, but not all, of the detail. And getting business users to sign off incomplete work is never an easy task.

5) Real keys are meaningless numbers, facts help users identify instances

As mentioned in point #3, many Big E Entities will exist currently along with an identifier that the business has grown accustomed to (rightly or wrongly). Often they want any new system that will be replacing the current system to carry on using those ‘keys’ (i.e. newly created instances getting assigned keys that preserve whatever scheme the old system was using).

What those users really want is to be able to identify instances using something they know. If the old keys are a composite of one or more facts (like an Employee ID being made up of the first four characters of a person’s surname plus their year of hire plus a couple of digits thrown in for uniqueness) then the users are saying they want to be able to identify an employee by knowing a surname. If that isn’t enough, the system had better present some additional details so it’s possible to distinguish between two people with similar surnames.

NOTE: Users will usually accept an improved (meaningless) key if their current key values are migrated and available as an alternate identifier for locating the instances they know and love.
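The idea can be sketched in a few lines of Python. This is an assumption-laden illustration rather than a design: the `Employee` fields, the two lookup functions and the sample legacy IDs (surname prefix, two-digit year, two-digit sequence) are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List, Optional

_next_key = count(1)  # source of meaningless keys: 1, 2, 3, ...


@dataclass
class Employee:
    employee_key: int = field(init=False)  # the real key, assigned by the system
    surname: str
    first_name: str
    year_of_hire: int
    legacy_id: Optional[str] = None        # migrated old key, an alternate identifier

    def __post_init__(self) -> None:
        self.employee_key = next(_next_key)


def find_by_surname(employees: List[Employee], prefix: str) -> List[Employee]:
    """Find by a fact users know; may return several candidates to choose from."""
    prefix = prefix.lower()
    return [e for e in employees if e.surname.lower().startswith(prefix)]


def find_by_legacy_id(employees: List[Employee], legacy_id: str) -> Optional[Employee]:
    """Find by the old key for instances migrated from the previous system."""
    for e in employees:
        if e.legacy_id == legacy_id:
            return e
    return None


staff = [
    Employee("Taskerman", "Ann", 2009, legacy_id="TASK0901"),
    Employee("Tasker", "Bob", 2011, legacy_id="TASK1101"),
    Employee("Nguyen", "Chi", 2013),  # created in the new system: no legacy key
]

# Searching on "task" finds both Ann and Bob; first name and year of hire
# are the additional details a screen would show to tell them apart.
candidates = find_by_surname(staff, "task")
```

The new key carries no meaning, so nothing breaks when someone changes their surname, yet users can still reach the records they know through either the surname or the old ID.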

6) Truly mandatory data or nice to have mandatory?

Truly mandatory data relates to the very definition of the entity involved. For example, an Order is only truly an order if it has at least one item being requested. Another example is a Car Rental requiring a value for Drop-off Location, but only in cases where the customer is not returning the car to the same location (i.e. a one-way rental).

Business users will often ask for something to be considered mandatory because the information has added business value. An example is the business wanting every cancelled Customer Order to include a Cancellation Reason (possibly selected from a value list or else free-form text).

The first two examples involve data that is critical to the correct functioning of any system that maintains those entities. Such attributes or relationships should be identified as mandatory as part of documenting detailed requirements. With the third example, Cancellation Reason data will require extra effort on the part of the business to ensure data quality. Otherwise busy users will simply pick any reason, or enter just enough text to satisfy the system, and move on.

Users appreciate being reminded that they have neglected to provide truly necessary information. Conversely they get annoyed when not allowed to proceed until they have ‘completed the form’. When business users want ‘nice to have’ data made mandatory they need to understand there is a risk regarding the quality of the results.
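One way to picture the difference is a validation routine that returns two lists: errors for truly mandatory data and warnings for ‘nice to have’ data. The sketch below uses the three examples from this section. Treating Cancellation Reason as a warning is just one possible treatment, and the class names, function names and messages are my own assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Order:
    line_items: List[str] = field(default_factory=list)
    cancelled: bool = False
    cancellation_reason: Optional[str] = None


@dataclass
class CarRental:
    pickup_location: str
    one_way: bool = False
    dropoff_location: Optional[str] = None


def check_order(order: Order) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings). Errors block saving; warnings only remind."""
    errors: List[str] = []
    warnings: List[str] = []
    if not order.line_items:
        # Truly mandatory: without an item it is not an order at all.
        errors.append("An Order needs at least one item.")
    if order.cancelled and not order.cancellation_reason:
        # Nice to have: remind the user, but let them proceed.
        warnings.append("No Cancellation Reason given.")
    return errors, warnings


def check_rental(rental: CarRental) -> List[str]:
    """Drop-off Location is truly mandatory only for a one-way rental."""
    errors: List[str] = []
    if rental.one_way and not rental.dropoff_location:
        errors.append("A one-way rental needs a Drop-off Location.")
    return errors
```

If the business insists that Cancellation Reason must block the user, the change is one line (append to `errors` instead of `warnings`). The conversation worth having is whether the resulting data will be any more trustworthy.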

In conclusion, to be effective as a Business Analyst eliciting business requirements, you should recognise the importance of data but at the same time understand that users want to talk about processes. When you do speak about data, use screen mockups rather than data models and stick to business-friendly terms such as Record, Field and Link. Recognise the most critical [Big E] Entities at the start and use them for scoping, while leaving [Small e] entities and Micro Entities for detailed requirements. Avoid dealing with Relationship details until detailed requirements. Analyse what facts users want to use to identify instances but don’t string those facts together in an attempt to define a key. And when it comes time to identify mandatory attributes and relationships, ensure that users understand the impact of ‘nice to have’ mandatory fields on the system users and the quality of the data.

What are your thoughts and personal experience?
