Data Migration – The Journey of a Thousand Miles
You’ve held workshops, you’ve consulted your stakeholders, you’ve written requirements, you’ve produced use cases, you’ve collaboratively designed the UI and everyone is happy. And then seemingly out of nowhere you find that the data from the old system refuses to fit neatly into your new system. Suddenly the project that has been going so well is plunged into chaos.
How the heck did it happen?
Data migration is typically the most overlooked component of a project that involves moving from an old system to a new system. (Note: I’m using the generic term system to cover everything from applications to websites.) While there can be many people involved in discovering the new business requirements or in designing a new UI, the data migration task itself tends to be either forgotten about or delegated to one person (typically a more junior member of the team). On the surface data migration appears to be one of the easier tasks to complete. After all it’s just transferring the data from one system to another. This perceived simplicity leads many project managers (and sometimes the business analyst) to think that data migration can be separated from the main body of tasks needed to deliver the system. Even the BABOK talks vaguely about this area in section 7.4 Define Transition Requirements. The task of data migration isn’t specifically mentioned. It’s framed as, “move information between the new and old solution” or in the case of the data itself, “Rules for conversion of this information will need to be developed, and business rules may need to be defined to ensure that the new solution interprets the converted data correctly.”
This impression that the task is small in scale typically leads to scheduling the data migration near the end of the project rather than at the beginning. Unfortunately, leaving the migration analysis until later or not understanding the full implications can have fairly devastating results. The project can wind up running late and the budget is blown. Even worse, the new system starts with bad data from the old system or no data at all. If a decision is made not to move the data to ensure the delivery date doesn’t shift then the data ends up split between two systems. The old system needs to keep going for longer than intended and costs balloon as two systems are maintained to do the same job.
Data migration typically goes wrong because of a misunderstanding of what it means to collect data. If you reduce a computer system to its most basic components a computer is merely a way to collect data, store data, perform an operation on the data, get a result from the data and then use that result to generate an outcome or more data. For example, in a billing system you collect data (the name of the person being provided with a service, the contact details for the person and the service the person is being billed for), you perform an operation on it (calculating if the person owes any money for the billing period) and then you generate an outcome and/or more data (you send a bill to the person and then receive a payment from the person or the person is still in debt).
How that data is collected and stored in the old system and how it’s collected and stored in the new system determines whether your data migration will be straight forward or difficult. In data warehousing the process of moving data from a source system to a target system is known as ETL (Extract, Transform, Load).
The key to understanding the difficulties you may experience with data migration is the ‘transform’ part of ETL. It’s highly unlikely that you are going to move from your old system to a new system and not have to transform your data in some way. For example, a typical problem is that the old system may store the address details in one field. The values are comma separated. This means the street number, street name, suburb and city values are contained in one field. However the new system now has a separate field for each of these values. You now have to figure out how to move the values that are sitting in one single field in your old system to the new system with multiple fields. If you’re very lucky the users have consistently separated each value in the field with a comma. If you’re unlucky then there haven’t been any rules. Or you have several users who have made up their own rules – instead of separating each element with a comma they have separated each element with a pipe (|).
To the inexperienced and non-technical project members on the team this type of problem can seem to be a mere annoyance and Project Managers can sometimes dismiss this as a technical person over stating their case.
However even the smallest data problem can start to quickly add costs and require the business to make some tough choices. For example, if the business wants to solve the address value problem and move the values to the new system correctly then this would require someone to write an ETL script to transform that data. And before the script can be run the data is going to have to be analyzed and cleaned to ensure that the ETL can execute without failure. If there are thousands of records (or millions) then cleaning the data to be consistent enough to transform and load to the new system may require hiring temporary personnel to manually correct the records depending on the state of the data. For example, if the address values have been entered in an unstructured manner and no transform rules can be applied then it can only be corrected using human intervention and judgement.
The seemingly minor technical issue of transferring data suddenly becomes a costly and time consuming task requiring temp workers and a developer to write the ETL scripts. Faced with rising costs and having to extend the completion date for delivery the business can start to panic and the subsequent decisions can result in the data in the new system being poorly structured before it goes live. For example, the offending values are simply moved en masse into a comment field in the new system with the intent that the users will correct the problem during their normal working day.
Other issues with the data result in the entire process being deemed, “too hard” and only the data that can be transferred on a one-to-one basis is moved. For example, only a person’s first name, middle name, last name, gender and date of birth go into the new system. Everything else is archived. Archiving data is perfectly fine if you never have to look at the data again. However this will be highly unlikely and having to search between two systems creates a less than optimal user experience.
Data migration tends to have five consistent factors that contribute to issues during delivery of a project.
- The person responsible for performing the gap analysis may not have a data background or has ignored the significance of redesigning the business processes in terms of data collection and storage.
- The data migration itself is left to the last minute and is assigned to a different business analyst or a tester. They are typically isolated from the business because the migration is seen as separate technical task. The task might also be assigned to the least experienced member of the team such as a junior business analyst.
- The business has decided not to collect certain types of data any more or they are unsure as to why they collected the data in the first place. The initial analysis fails to identify the other units or departments in the organization that may still have a use for the data.
- The migration takes far longer than anticipated. A data migration can turn into a considerable intellectual challenge that requires months of analysis. This is especially true for payroll projects that have to migrate an organization’s entire employee history including leave, over time, allowances and pay rates.
- No one has factored in the defect rate for the migration. Even successful migrations can have a small defect rate that needs to be addressed once the records are moved. Knowing whether the migration has to be started again or whether the defect can be manually corrected can make all the difference between a delay of days, weeks or months. It’s very rare for a data migration to achieve 100% perfection when moving data from one system to another. You should always allow additional time to review and clean data in the new system if it’s needed.
Considering all of the above points, are data migration issues solvable? The key is to start as early as possible, and make sure you consult with your development team.
Your first step is to talk to your Data Warehouse team or find a developer who knows ETL.
Depending on the complexity of your data migration you’re going to need help. You need to get that help from someone who knows ETL.
You should also have a clear understanding of what it means to Extract, Transform and Load.
Your second step is to construct a data model for both systems.
What does your current system do? With any luck there is already an existing model. What does your new system do? With any luck someone has already completed a data model in your team.
If you don’t have a model for one system (or it’s missing for both systems) then you need to construct one. It’s the only way you can compare the state of the data in both systems and check for gaps.
Your third step is to identify odd fields or problems with the way the data is stored.
When you look at your model do you have a one-to-one match between both systems? Or are there strangely labelled fields that seem to make no sense? Is it as per the example at the start of the article – you have values concatenated into a single field that must be split into separate fields in the new system?
Depending on how rushed your developers or vendors were when they developed the old system they may have cut some corners in terms of how fields were named ‘under the hood’. I recently reviewed a system that had their date fields labelled, “Date1”, “Date2”, “Date3” and “Date4”. The UI had date fields on different pages that had slightly different meanings (one was a create date, one was a modify date, one was a delete date and one was a create date for a separate record). However, when these dates were stored into fields in the database they didn’t have any context. For example, is “Date1” the date the record was updated or the date the record was created?
You need to have a good understanding of what each field means before you can decide how (or if) you can move the data to the new system.
Your fourth step is to look at the specifications for the new system.
If you can’t get a data model for the new system because it’s still being designed see if you can spot any obvious problems from the specification (if the project has one).
Look for things that everyone will have presumed is covered off but wasn’t. Is data missing? Is the cardinality (the relationship between the data) incorrect?
Any or all of these things could indicate that you need to re-do the gap analysis or begin a new gap analysis.
Your fifth step is to find all the consumers of the data.
Other units or business areas in your organization may consume the data. You need to find all the consumers of this data because even if the business no longer wants to collect the data the data may be very important to other areas.
For better or for worse someone asked for the data. It’s entirely possible that it was simply added for one particular person’s reporting needs at the time. However, you need to dig these facts out and make sure that if the decision is to ignore the data going forward, it won’t result in problems later on.
This seems another obvious step but can be accidentally lost if the new system is complex or there are many people involved in the project.
Your sixth step is to check your findings with the business.
After your research is completed you should be able to explain any anomalies with the business but more importantly you should identity the possible outcomes of any migration problems. Typically the lack of data in the new system may interfere with the user’s ability to complete their tasks. The business may not have been aware of this problem when specifying the requirements for the new system.
You should be prepared for your data migration to be more difficult than originally assessed and to be forced to make decisions in which the outcome is not always ideal. You should be prepared for the business to misunderstand the implications of what they’ve asked for or when they do understand they become overwhelmed by the decisions that need to be made.
Project pressures may result in solutions that are not only less than ideal, they also create problems from a BAU (Business as Usual) perspective.
Attempting your data migration too late in a project may mean that your journey doesn’t even begin.
A successful project will realize they must start their data migration analysis as soon as possible and use experienced analysts to give themselves any chance of completing a successful transition to a new system.
Don’t forget to leave your comments below.