Well-defined Data Part 8 – Attributes That Classify

Written by Dan Tasker on October 4, 2018.

A classification attribute allows the recording of a meaningful fact about an entity instance, with that fact drawn from a pre-established set of values.

Common forms for presenting a set of such values to users include drop-down lists, checkboxes and radio buttons.

This article will discuss three levels of complexity of attribute-based classification:

Self-defining — where the attribute represents something that is either true or not.
Value-only — where any one of the predefined values may be applicable to a given entity instance.
Complex — where one value that is applicable to an entity instance impacts other values that can be applicable to that instance.

Naming and maintaining value sets will also be discussed.

Self-defining Classification Attributes

A self-defining classification attribute is one where the fact it represents either applies or doesn’t apply to a given instance. Examples include a person holding a valid passport (or not) or eggs being from free-range chickens (or not). Because the valid set of values is a simple yes/no (or true/false) pair, it’s up to the attribute name and definition to provide the business meaning of a positive or negative value. The name of the attribute need only be descriptive enough to allow people to understand the general nature of the classification — e.g. ‘Has Valid Passport’, ‘Is Free-Range’. The attribute’s definition should provide further business details related to an appropriate choice in a given instance.

NOTE: It’s recommended that self-defining classification attributes represent the active (or positive) condition. This avoids responses that involve a double negative — e.g. ‘No Valid Passport’ requires a response of ‘No’ to indicate that the person actually has a valid passport.

Also worth noting is that there is a difference between a self-defining classification attribute and a classification attribute that has only two possible business values. For example, in accounting systems, a journal entry is classified as being either a ‘debit’ or a ‘credit’. It would be possible to define a classification attribute named ‘Is a Debit’, where a value of ‘false’ implies that the entry is a credit. However, from a well-defined data perspective, the value set should be composed of the valid business values, in this case ‘debit’ and ‘credit’.
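The distinction can be sketched in code. The following is a minimal illustration in Python, not a prescribed implementation; the class and field names (EntryType, JournalEntry, Person) are assumptions made for the example.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum


class EntryType(Enum):
    """Value set composed of the valid business values."""
    DEBIT = "debit"
    CREDIT = "credit"


@dataclass
class JournalEntry:
    amount: Decimal
    entry_type: EntryType      # two business values, so not a yes/no flag


@dataclass
class Person:
    name: str
    has_valid_passport: bool   # self-defining: the name states the positive condition


entry = JournalEntry(amount=Decimal("125.00"), entry_type=EntryType.CREDIT)
traveller = Person(name="Pat", has_valid_passport=True)

# The stored value reads as a business term rather than "is_debit = False".
print(entry.entry_type.value)
```

A boolean suits ‘Has Valid Passport’ because the attribute name carries the meaning. For the journal entry, the enumeration keeps both business terms visible wherever the value is stored or displayed.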

Value-only Classification Attributes

A value-only classification attribute is one where the only thing the organization cares about is the business-meaningful values in the classification scheme. These values may be made available for selection or be derived based on defined business rules. For example, a car dealership will classify each car by color, from a fixed value set relevant to their business, including ‘black’, ‘white’, ‘red’, etc. This set of values is sufficient for sales staff and car buyers to find all of the cars they are looking for in a specific color.

Conversely, to an organization that manufactures cars, paint color is a critical component in the manufacturing process. In this context, ‘paint color’ would be a full-fledged business entity with its own business entity identifier, plus naming, quantifying, and classification attributes of its own.

Complex Classification Attributes

A complex classification attribute is one that not only has a set of valid values, but those values involve relationships to other classification values. Continuing with the car business example, the dealership will deal with cars from different manufacturers, which will call for one value set for a car manufacturer and a second value set for the car model. There is a parent/child-type relationship between these two attributes, with a manufacturer being the parent of multiple car models. Implementation of a parent/child relationship in a user interface might involve the user initially selecting a parent value from one combo box and then selecting a child value from a second combo box, which would list only those child values relevant to the selected parent.
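As a rough sketch of the parent/child relationship, the child value set can be keyed by the parent value. The manufacturer and model names below are illustrative only, and the function names are assumptions made for this example.

```python
# Illustrative value sets: each manufacturer (parent) maps to its models (children).
MODELS_BY_MANUFACTURER = {
    "Toyota": ["Corolla", "Camry", "RAV4"],
    "Ford": ["Fiesta", "Focus", "Mustang"],
}


def manufacturers() -> list[str]:
    """Values for the first (parent) selection list."""
    return sorted(MODELS_BY_MANUFACTURER)


def models_for(manufacturer: str) -> list[str]:
    """Values for the second (child) list, limited to the selected parent."""
    return list(MODELS_BY_MANUFACTURER.get(manufacturer, []))


def is_valid_combination(manufacturer: str, model: str) -> bool:
    """Check that a model belongs to the manufacturer it is paired with."""
    return model in MODELS_BY_MANUFACTURER.get(manufacturer, [])


print(manufacturers())
print(models_for("Ford"))
print(is_valid_combination("Toyota", "Mustang"))
```

The same lookup serves both purposes: populating the second list in the user interface and validating a manufacturer/model pair that arrives by some other route, such as an interface to another system.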

Another type of relationship between classification attribute values involves allowable transitions within the same value set. For example, consider the case of a business entity that has a defined set of status values, but the business wants to ensure that an instance is only allowed to transition to a selected subset of other values based on its current ‘status’ value. In addition to the value set, each value needs to identify the other values that it can transition to. State transition diagrams are good for graphically representing valid value transitions.
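One way to record allowable transitions is to store, for each value, the set of values it may move to. The sketch below assumes a hypothetical set of status values (‘draft’, ‘submitted’ and so on); the article does not name a specific entity or status set.

```python
# Hypothetical status value set: each value lists the values it may transition to.
ALLOWED_TRANSITIONS = {
    "draft": {"submitted", "cancelled"},
    "submitted": {"approved", "rejected", "cancelled"},
    "approved": {"closed"},
    "rejected": set(),
    "cancelled": set(),
    "closed": set(),
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the target is an allowed next value for the current status."""
    return target in ALLOWED_TRANSITIONS.get(current, set())


def change_status(current: str, target: str) -> str:
    """Return the new status, or raise if the transition is not allowed."""
    if not can_transition(current, target):
        raise ValueError(f"Transition from {current!r} to {target!r} is not allowed")
    return target


status = change_status("draft", "submitted")
status = change_status(status, "approved")
print(status)
```

This table is the same information a state transition diagram conveys: each key is a state and each entry in its set is an outgoing arrow.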

Classification Attribute Naming

The terms type, class, and category are often used when naming a classification attribute (e.g. ‘Customer Type’, ‘Product Category’, ‘Class of Service’). An organization’s users — familiar with existing business processes that involve one of these generically-named attributes — will also be familiar with their value sets. A classification attribute having a name that provides no clue regarding the classification scheme is only ‘well defined’ when examples of its values are available. For example, the name ‘Customer Type’ is, by itself, meaningless. Add example values ‘new’ and ‘existing’ and all becomes clear.

Value Set Change Process

The maintenance of value sets within an IT-based system typically takes place under the ‘Administration’ functionality. The process should ensure that all change requests originate from an authorized source. Users of the classification scheme ideally are given advance notice of any changes.

A value that is no longer applicable to the organization, and should therefore no longer be available for selection, should be end-dated rather than deleted. Similarly, a newly added value should have an effective date associated with it, unless all new values for a given classification scheme take effect immediately.
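Effective and end dates might be modeled along the following lines. This is a sketch under stated assumptions: the class name, the sample colors and dates, and the choice to treat the end date as the last selectable day are all illustrative rather than taken from the article.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ClassificationValue:
    name: str
    effective_date: date
    end_date: Optional[date] = None   # None means the value has not been end-dated

    def is_selectable(self, on: date) -> bool:
        """Selectable from the effective date through the end date (assumed inclusive)."""
        if on < self.effective_date:
            return False
        return self.end_date is None or on <= self.end_date


# Hypothetical color value set with one retired value and one future value.
COLORS = [
    ClassificationValue("black", date(2015, 1, 1)),
    ClassificationValue("beige", date(2015, 1, 1), end_date=date(2018, 6, 30)),
    ClassificationValue("teal", date(2019, 1, 1)),
]


def selectable_values(values: list[ClassificationValue], on: date) -> list[str]:
    """Names of the values that may be offered for selection on the given date."""
    return [v.name for v in values if v.is_selectable(on)]


print(selectable_values(COLORS, date(2018, 10, 4)))
```

Because the retired value remains in the set, existing records that carry it can still be interpreted and reported on, even though it is no longer offered for new selections.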

Additional complexity in changing a value set arises when the classification values are involved in business rules, process flows, or interfaces to other systems. Any added value needs to be accounted for in the rule specification, in existing or added process flow decision points, and/or interfaces. Both the value change and the associated system changes will need to be tested before they are put into production.

Well-defined Classification Attributes

The attribute should have the best name that the organization can provide. Similarly, the values, when textual, should also be as meaningful as possible. As with attributes that name (discussed in Part 6 of this series), the organization may want an abbreviated and/or code value in addition to the full ‘name’ of each value. When this is the case, these should be included as part of the definition.

At attribute definition time, the full value set may or may not be known or available. If not, or if it’s a large value set, enough examples should be included to make the classification scheme understandable. The definition should indicate whether the values listed are examples or represent the complete value set.
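The elements of such a definition could be captured in a simple record. In the sketch below, the structure, the field names, the definition text, and the abbreviations and codes are all hypothetical; only the attribute name ‘Customer Type’ and its values ‘new’ and ‘existing’ come from the earlier example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ValueDefinition:
    name: str                 # full, business-meaningful name of the value
    abbreviation: str = ""    # optional abbreviated form
    code: str = ""            # optional code value


@dataclass
class ClassificationAttributeDefinition:
    name: str
    definition: str
    values: list[ValueDefinition] = field(default_factory=list)
    values_are_complete: bool = False   # False: the listed values are examples only


customer_type = ClassificationAttributeDefinition(
    name="Customer Type",
    definition="Whether the customer has purchased from the organization before.",
    values=[
        ValueDefinition("new", abbreviation="NEW", code="N"),
        ValueDefinition("existing", abbreviation="EXG", code="E"),
    ],
    values_are_complete=True,
)

print([v.name for v in customer_type.values])
```

The explicit values_are_complete flag is one way to tell a reader of the definition whether the list can be relied on as the full value set.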

Where classification values are derivable, that derivation should be identified, either as part of the definition or by referencing the derivation business rules (maintained separately).

The example values provided should be sufficient for designers to know what is needed from a database-definition perspective. The examples should indicate field size, data type and, for numeric value sets, precision, so these properties need not be specified separately.

NOTE: As mentioned at the beginning of this article, there are a number of ways value sets can be presented in user interfaces (e.g. drop-down lists, checkboxes, radio buttons). Usability is a design issue and, as such, is outside the scope of this series.

Coming in Part 9 — Point in Time Attributes
