
Well-defined Data Part 1 – Series Introduction

The objective of this series is to take an in-depth look at the data required for an IT-based business information system.

The series presents techniques and concepts that business analysts can use when thinking about and documenting entities, attributes, and relationships. This introduction defines what is meant by well-defined data and explains the rationale for it.

What Is Well-defined Data?

Data, to be well-defined, should be both well organized and well specified. Well-organized data follows the old proverb, “A place for everything and everything in its place.” In business analysis terms, this translates to “An entity for everything and every attribute in its appropriate entity.”

Well-defined data begins with determining the best business name for an entity or attribute. From there, the definition is built up by capturing a business definition plus the details needed to get that data up and running in an IT-based system. Ideally, some form of data dictionary is used to record these details. One example data dictionary template is used throughout this series; it includes entity properties such as current volumes and growth rate, and attribute properties such as data formats and precision.
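As a rough illustration of the kind of record meant here, the sketch below captures the entity and attribute properties just mentioned as Python dataclasses. The class and field names (`EntityDefinition`, `AttributeDefinition`, `annual_growth_rate`, `projected_volume`) and the sample Customer entry are hypothetical and are not the template used in this series; the volume projection assumes growth compounds annually.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AttributeDefinition:
    """One attribute entry in a hypothetical data dictionary template."""
    name: str                        # best business name, e.g. "Credit Limit"
    definition: str                  # plain-language business definition
    data_format: str                 # e.g. "numeric(8)", "YYYY-MM-DD"
    precision: Optional[int] = None  # decimal places for numeric values, if any


@dataclass
class EntityDefinition:
    """One entity entry, including the volume properties used for sizing."""
    name: str
    definition: str
    current_volume: int        # number of instances today
    annual_growth_rate: float  # e.g. 0.10 for 10% per year
    attributes: List[AttributeDefinition] = field(default_factory=list)

    def projected_volume(self, years: int) -> int:
        """Estimate the instance count after `years` of annually compounded growth."""
        return round(self.current_volume * (1 + self.annual_growth_rate) ** years)


customer = EntityDefinition(
    name="Customer",
    definition="A person or organization that has bought, or may buy, our products.",
    current_volume=50_000,
    annual_growth_rate=0.10,
    attributes=[
        AttributeDefinition("Customer Number", "Unique business identifier of a customer.", "numeric(8)"),
        AttributeDefinition("Credit Limit", "Maximum outstanding balance allowed.", "currency", precision=2),
    ],
)
print(customer.projected_volume(3))  # 66550
```

Recording volume and growth rate alongside each entity is what makes a sizing estimate like this possible early, before any physical database design exists.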

Who Needs Well-defined Data? 

Outdated IT System Replacement — Any organization that wants to build or acquire a new system to replace an outdated one needs the best possible definition of the data in the current system. There will be fields that turned out to have no business use. There will be fields originally intended for one purpose that ended up being used for something else. And there will be data the business needs that the current system does not support; some or all of this unsupported data may be managed outside the current system in spreadsheets and similar tools. The best possible definition of all this data is needed to support designing or acquiring a replacement system, migrating data to it, and training current users on where to find the data they need in the new one.

IT System Vendor — Any vendor of commercial off-the-shelf (COTS) software needs the data underlying its software to be well defined. This information is used to convince prospective customers of the software’s capabilities and to respond accurately to requests for quotes (RFQs). Once the software is sold, well-defined data is needed to support system configuration, data migration, training, and the development of any bespoke reports or interfaces required.

Requirements Documenter — A business analyst responsible for producing requirements documents should include well-defined data in those documents. A high-level requirements document (Stakeholder requirements in IIBA BABOK® terminology) typically has a glossary rather than a fully detailed data dictionary. The glossary names and definitions are useful input to the data dictionary developed later in the project. A detailed requirements document (Solution and Transition requirements in IIBA BABOK® terminology) ideally includes a full data dictionary as the central point of definition for the entities and attributes referenced in detailed specifications for screens, reports, interfaces, and batch processes.

Other Waterfall SDLC Team Members — Any member of a team involved in waterfall development based on signed-off requirements needs well-defined data. This includes:

  • Designers
  • Developers
  • Testers involved in integration, end-to-end, and user acceptance testing
  • Data migration team members
  • Trainers, whether of end users or of other trainers (train-the-trainer)
  • Technical writers of user manuals

People in all of these roles rely on requirements documents to support their deliverables. A single place where data is defined, either within each document or centrally for the project, is of great benefit. NOTE: If available, an organization-wide data dictionary should be referenced for existing business data definitions, and any additional project-defined terms and their definitions should be added to it.

Agile Scrum Team Members — As user stories are written and refined, they reference entities and attributes that need to be delivered. Maintaining these in a shared data dictionary helps ensure that the data component is delivered consistently across different epics or features.

What’s Ahead in this Series

Entities — The next three articles focus on entities. The first discusses generic business entities: those whose business names and definitions are common across the IT systems supporting functions that every organization performs, such as accounting or human resources. Examples include the entities General Ledger Account and Journal Entry in accounting, and Staff Member and Position in human resources.

The following two articles focus on line-of-business-specific entities. The line(s) of business an organization is in influence the entity names applicable to its products, customers, sale-related business events, and locations. For example, an airline sells a Ticket to a Passenger on a specific Flight. A public library acquires a Book Copy, allowing a registered Patron to Borrow it from a Branch.

The first of these two articles discusses five generic line-of-business functions: Marketing, Product Development, Sales, Customer Care, and Product Decommissioning. Together these represent a product lifecycle common to all organizations. The second focuses on each of the four primary business entity concepts a given line of business deals with: products, customers, sales, and locations.

Attributes — An attribute’s properties vary based on its intended purpose within its entity. Articles will be dedicated to each of the following purposes:

  • Being the entity’s Primary Business Identifier — E.g. Customer Number, Employee Number, Account Number. 
  • Naming — E.g. codes, abbreviations, people, products, buildings.
  • Quantifying — E.g. Currency amounts, product dimensions.
  • Point-In-Time Happening — E.g. Date of Birth, Purchase Date/Time.
  • Describing — E.g. in sentences, photographically, graphically.
  • Classifying — selecting from a pre-defined list. E.g. Customer Type, Skill, Gender.
  • Identifying an external entity instance — E.g. customer’s driver’s licence or credit card details.
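One way to picture these purposes is to tag each attribute in a dictionary entry with the purpose it serves. The sketch below does that in Python for a hypothetical Customer entity. The `Purpose` enum, the rule that an entity has exactly one primary business identifier, and the `CUSTOMER_TYPES` list are illustrative assumptions, not rules stated in this series.

```python
from enum import Enum
from typing import Dict, List, Set


class Purpose(Enum):
    """The attribute purposes listed above."""
    PRIMARY_BUSINESS_IDENTIFIER = "primary business identifier"
    NAMING = "naming"
    QUANTIFYING = "quantifying"
    POINT_IN_TIME = "point-in-time happening"
    DESCRIBING = "describing"
    CLASSIFYING = "classifying"
    EXTERNAL_IDENTIFIER = "identifying an external entity instance"


def check_entity(attributes: Dict[str, Purpose]) -> List[str]:
    """Return problems found in one entity's attributes (hypothetical rule)."""
    problems = []
    identifiers = [name for name, purpose in attributes.items()
                   if purpose is Purpose.PRIMARY_BUSINESS_IDENTIFIER]
    if len(identifiers) != 1:
        problems.append(
            f"expected exactly one primary business identifier, found {len(identifiers)}")
    return problems


# Hypothetical pre-defined list for the classifying attribute Customer Type.
CUSTOMER_TYPES: Set[str] = {"Retail", "Wholesale", "Government"}


def is_valid_classification(value: str, allowed: Set[str]) -> bool:
    """A classifying attribute may only take a value from its pre-defined list."""
    return value in allowed


customer = {
    "Customer Number": Purpose.PRIMARY_BUSINESS_IDENTIFIER,
    "Customer Name": Purpose.NAMING,
    "Credit Limit": Purpose.QUANTIFYING,
    "Date of Birth": Purpose.POINT_IN_TIME,
    "Notes": Purpose.DESCRIBING,
    "Customer Type": Purpose.CLASSIFYING,
    "Driver's Licence Number": Purpose.EXTERNAL_IDENTIFIER,
}
print(check_entity(customer))                          # []
print(is_valid_classification("VIP", CUSTOMER_TYPES))  # False
```

Tagging purposes explicitly means property rules (precision for quantifying attributes, a value list for classifying ones) can be checked per purpose rather than attribute by attribute.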

Attribute History — Two kinds of attribute history will be discussed: business-meaningful history and audit history. Business-meaningful history means that users need visibility of changes to an attribute’s value over time as part of normal business processes, for example Account Status or Discount Rate. Audit history is needed only in exceptional cases, where an attribute’s value is incorrect and the business wants to know the source of the incorrect value and what the previous value was.
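The difference between the two kinds of history can be sketched in Python: business-meaningful history as effective-dated values that users can query as of any date, and audit history as a log of individual changes recording who made them. The names (`EffectiveValue`, `value_as_of`, `AuditEntry`), the sample data, and the choice to store values as strings are hypothetical simplifications.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import List, Optional


@dataclass
class EffectiveValue:
    """Business-meaningful history: a value plus the date it took effect."""
    value: str
    effective_from: date


def value_as_of(history: List[EffectiveValue], as_of: date) -> Optional[str]:
    """Return the value in effect on `as_of`, or None if no value had started yet."""
    current = None
    for entry in sorted(history, key=lambda e: e.effective_from):
        if entry.effective_from > as_of:
            break
        current = entry.value
    return current


@dataclass
class AuditEntry:
    """Audit history: what a value was, what it became, who changed it, and when."""
    attribute: str
    old_value: Optional[str]
    new_value: str
    changed_by: str
    changed_at: datetime


account_status = [
    EffectiveValue("Open", date(2022, 1, 10)),
    EffectiveValue("Suspended", date(2023, 6, 1)),
    EffectiveValue("Open", date(2023, 9, 15)),
]
print(value_as_of(account_status, date(2023, 7, 4)))  # Suspended

# An incorrect Credit Limit traced back through the audit log (hypothetical data).
audit_log = [
    AuditEntry("Credit Limit", "5000", "50000", "jsmith", datetime(2024, 2, 3, 14, 5)),
]
print(audit_log[0].old_value)  # 5000
```

Effective-dated history is something users query routinely; the audit log is consulted only when a value is found to be wrong.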

Relationships — The three classic relationship types — one to many, many to many, and one to one — will be discussed. The use of screen mock-ups as a mechanism for defining these will be compared to how they are normally defined using entity/relationship diagrams.
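Using the public library example from earlier, the three relationship types might be sketched in Python as follows. Each Book Copy refers to one Branch (one to many), a Loan entity resolves the many-to-many relationship between Patron and Book Copy, and a Library Card is assumed to belong to exactly one Patron, with no Patron holding more than one card (one to one). Library Card, Loan, and all field names are hypothetical additions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Branch:
    name: str


@dataclass
class BookCopy:
    copy_number: str
    branch: Branch  # one to many: each copy belongs to one branch; a branch has many copies


@dataclass
class Patron:
    patron_number: str


@dataclass
class Loan:
    """Associative entity resolving the many-to-many Patron / Book Copy relationship."""
    patron: Patron
    book_copy: BookCopy
    borrowed_on: date


@dataclass
class LibraryCard:
    card_number: str
    patron: Patron  # one to one: each card belongs to exactly one patron


def at_most_one_card_each(cards: List[LibraryCard]) -> bool:
    """Check the other side of the one-to-one: no patron holds more than one card."""
    holders = [card.patron.patron_number for card in cards]
    return len(holders) == len(set(holders))


central = Branch("Central")
copy_a, copy_b = BookCopy("C-0001", central), BookCopy("C-0002", central)
alice, bob = Patron("P-100"), Patron("P-200")

loans = [
    Loan(alice, copy_a, date(2024, 3, 1)),
    Loan(bob, copy_a, date(2024, 4, 2)),    # same copy, different patron
    Loan(alice, copy_b, date(2024, 4, 9)),  # same patron, different copy
]
copies_at_central = [c for c in (copy_a, copy_b) if c.branch is central]
copies_borrowed_by_alice = {loan.book_copy.copy_number for loan in loans if loan.patron is alice}
print(sorted(copies_borrowed_by_alice))  # ['C-0001', 'C-0002']
print(at_most_one_card_each([LibraryCard("L-1", alice), LibraryCard("L-2", bob)]))  # True
```

Resolving a many-to-many relationship into an associative entity such as Loan also gives a natural home for attributes of the relationship itself, like the borrowing date.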

Derivable Data — Different levels of complexity of data derivation will be discussed, including simple totalling, point-in-time summations (such as year-to-date figures), and complex rule-based derivations (such as discount amounts based on historical purchases).
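The three levels of derivation might look like this in Python. The `Purchase` record, the use of the previous calendar year as the purchase history window, and the `DISCOUNT_TIERS` thresholds and rates are all hypothetical; a real discount rule would come from the business. `Decimal` is used so that currency amounts are not subject to binary floating-point rounding.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import List

ZERO = Decimal("0.00")


@dataclass
class Purchase:
    purchase_date: date
    amount: Decimal


def order_total(line_amounts: List[Decimal]) -> Decimal:
    """Simple totalling: the sum of an order's line amounts."""
    return sum(line_amounts, ZERO)


def year_to_date(purchases: List[Purchase], as_of: date) -> Decimal:
    """Point-in-time summation: purchases from 1 January of as_of's year up to as_of."""
    start = date(as_of.year, 1, 1)
    return sum((p.amount for p in purchases if start <= p.purchase_date <= as_of), ZERO)


# Hypothetical tiers, highest first: (minimum prior-year spend, discount rate).
DISCOUNT_TIERS = [(Decimal("10000"), Decimal("0.10")), (Decimal("5000"), Decimal("0.05"))]


def discount_amount(order_value: Decimal, purchases: List[Purchase], as_of: date) -> Decimal:
    """Rule-based derivation: the discount depends on spend in the previous calendar year."""
    prior_year = as_of.year - 1
    spend = sum((p.amount for p in purchases if p.purchase_date.year == prior_year), ZERO)
    for threshold, rate in DISCOUNT_TIERS:
        if spend >= threshold:
            return (order_value * rate).quantize(Decimal("0.01"))
    return ZERO


purchases = [
    Purchase(date(2023, 3, 5), Decimal("4000.00")),
    Purchase(date(2023, 11, 20), Decimal("2500.00")),
    Purchase(date(2024, 1, 15), Decimal("300.00")),
    Purchase(date(2024, 2, 10), Decimal("450.00")),
]
print(order_total([Decimal("19.99"), Decimal("5.01")]))                  # 25.00
print(year_to_date(purchases, date(2024, 2, 1)))                         # 300.00
print(discount_amount(Decimal("200.00"), purchases, date(2024, 2, 10)))  # 10.00
```

Each level depends on more inputs than the last: a total needs only the current record, a year-to-date figure needs a date window, and a rule-based derivation needs both history and business rules that must themselves be documented.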

Additional Topics — As this series evolves, additional topics worth presenting may emerge. One that I know is lurking in the background is mandatory data (attributes or relationships). Stay tuned.

Next in this series: Part 2 – Generic Business Entities