Author: Yvonne Harrison

Data Migration – The Journey of a Thousand Miles

You’ve held workshops, you’ve consulted your stakeholders, you’ve written requirements, you’ve produced use cases, you’ve collaboratively designed the UI and everyone is happy. And then seemingly out of nowhere you find that the data from the old system refuses to fit neatly into your new system. Suddenly the project that has been going so well is plunged into chaos.

How the heck did it happen?

Data migration is typically the most overlooked component of a project that involves moving from an old system to a new system. (Note: I’m using the generic term system to cover everything from applications to websites.) While there can be many people involved in discovering the new business requirements or in designing a new UI, the data migration task itself tends to be either forgotten about or delegated to one person (typically a more junior member of the team). On the surface data migration appears to be one of the easier tasks to complete. After all it’s just transferring the data from one system to another. This perceived simplicity leads many project managers (and sometimes the business analyst) to think that data migration can be separated from the main body of tasks needed to deliver the system. Even the BABOK talks vaguely about this area in section 7.4 Define Transition Requirements. The task of data migration isn’t specifically mentioned. It’s framed as, “move information between the new and old solution” or in the case of the data itself, “Rules for conversion of this information will need to be developed, and business rules may need to be defined to ensure that the new solution interprets the converted data correctly.”

This impression that the task is small in scale typically leads to scheduling the data migration near the end of the project rather than at the beginning. Unfortunately, leaving the migration analysis until later or not understanding the full implications can have fairly devastating results. The project can wind up running late and the budget is blown. Even worse, the new system starts with bad data from the old system or no data at all. If a decision is made not to move the data to ensure the delivery date doesn’t shift then the data ends up split between two systems. The old system needs to keep going for longer than intended and costs balloon as two systems are maintained to do the same job.

Data migration typically goes wrong because of a misunderstanding of what it means to collect data. If you reduce a computer system to its most basic components a computer is merely a way to collect data, store data, perform an operation on the data, get a result from the data and then use that result to generate an outcome or more data. For example, in a billing system you collect data (the name of the person being provided with a service, the contact details for the person and the service the person is being billed for), you perform an operation on it (calculating if the person owes any money for the billing period) and then you generate an outcome and/or more data (you send a bill to the person and then receive a payment from the person or the person is still in debt).

How that data is collected and stored in the old system and how it’s collected and stored in the new system determines whether your data migration will be straightforward or difficult. In data warehousing the process of moving data from a source system to a target system is known as ETL (Extract, Transform, Load).

The key to understanding the difficulties you may experience with data migration is the ‘transform’ part of ETL. It’s highly unlikely that you are going to move from your old system to a new system and not have to transform your data in some way. For example, a typical problem is that the old system may store the address details in one field. The values are comma separated. This means the street number, street name, suburb and city values are contained in one field. However the new system now has a separate field for each of these values. You now have to figure out how to move the values that are sitting in one single field in your old system to the new system with multiple fields. If you’re very lucky the users have consistently separated each value in the field with a comma. If you’re unlucky then there haven’t been any rules. Or you have several users who have made up their own rules – instead of separating each element with a comma they have separated each element with a pipe (|).
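To make the address example concrete, here is a minimal Python sketch of the transform, assuming the old field holds four values in a fixed order. The field names and the sample addresses are hypothetical, not taken from any real system. The sketch accepts either a comma or a pipe as the separator and refuses to guess when a value doesn't fit the expected shape.

```python
import re

# Hypothetical target field names; the real new system will define its own.
ADDRESS_FIELDS = ("street_number", "street_name", "suburb", "city")


def split_address(raw):
    """Split a single-field address into separate fields.

    Returns a dict of fields, or None when the value does not match the
    expected four-part layout and needs a person to look at it.
    """
    # Accept either a comma or a pipe as the separator.
    parts = [p.strip() for p in re.split(r"[,|]", raw)]
    if len(parts) != len(ADDRESS_FIELDS) or not all(parts):
        return None
    return dict(zip(ADDRESS_FIELDS, parts))


print(split_address("12, High Street, Newtown, Wellington"))
print(split_address("12 | High Street | Newtown | Wellington"))
print(split_address("12 High Street Newtown"))  # no separators, so None
```

Returning None rather than a best guess is deliberate: a record that can't be split reliably is better routed to a person than loaded into the wrong fields.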

To the inexperienced and non-technical project members on the team this type of problem can seem to be a mere annoyance and Project Managers can sometimes dismiss this as a technical person overstating their case. However, even the smallest data problem can quickly start to add costs and require the business to make some tough choices. For example, if the business wants to solve the address value problem and move the values to the new system correctly then this would require someone to write an ETL script to transform that data. And before the script can be run the data is going to have to be analyzed and cleaned to ensure that the ETL can execute without failure. If there are thousands of records (or millions) then cleaning the data to be consistent enough to transform and load to the new system may require hiring temporary personnel to manually correct the records depending on the state of the data. For example, if the address values have been entered in an unstructured manner and no transform rules can be applied then it can only be corrected using human intervention and judgement.
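One way to size that cleaning effort before anyone writes the ETL script is to profile the field first. The sketch below is an illustration only: the categories, the four-part rule and the sample values are assumptions. It counts how many address values fall into each kind of problem, so the business can see how many records a script could handle and how many would need a person.

```python
from collections import Counter


def classify(raw, expected_parts=4):
    """Put one raw address value into a rough quality category."""
    has_comma = "," in raw
    has_pipe = "|" in raw
    if has_comma and has_pipe:
        return "mixed separators"
    if not has_comma and not has_pipe:
        return "no separator"
    separator = "," if has_comma else "|"
    parts = [p.strip() for p in raw.split(separator)]
    if len(parts) == expected_parts and all(parts):
        return "clean"
    return "wrong number of parts"


def profile(addresses):
    """Count how many values fall into each category."""
    return Counter(classify(a) for a in addresses)


# Hypothetical sample; a real profile would read from the old system.
sample = [
    "12, High Street, Newtown, Wellington",
    "7 | Mill Road | Karori | Wellington",
    "3 Beach Rd Lyall Bay",
    "9, Hill St | Thorndon, Wellington",
    "4, Kent Terrace, Wellington",
]

for category, count in profile(sample).most_common():
    print(f"{category}: {count}")
```

Everything outside the "clean" category is a candidate for manual correction, which gives a rough basis for estimating temp hours.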

The seemingly minor technical issue of transferring data suddenly becomes a costly and time-consuming task requiring temp workers and a developer to write the ETL scripts. Faced with rising costs and having to extend the completion date for delivery the business can start to panic and the subsequent decisions can result in the data in the new system being poorly structured before it goes live. For example, the offending values are simply moved en masse into a comment field in the new system with the intent that the users will correct the problem during their normal working day.

Other issues with the data result in the entire process being deemed “too hard” and only the data that can be transferred on a one-to-one basis is moved. For example, only a person’s first name, middle name, last name, gender and date of birth go into the new system. Everything else is archived. Archiving data is perfectly fine if you never have to look at the data again. However, that is highly unlikely, and having to search between two systems creates a less than optimal user experience.

Data migration tends to have five consistent factors that contribute to issues during delivery of a project.

  1. The person responsible for performing the gap analysis may not have a data background or has ignored the significance of redesigning the business processes in terms of data collection and storage.
  2. The data migration itself is left to the last minute and is assigned to a different business analyst or a tester. They are typically isolated from the business because the migration is seen as a separate technical task. The task might also be assigned to the least experienced member of the team such as a junior business analyst.
  3. The business has decided not to collect certain types of data any more or they are unsure as to why they collected the data in the first place. The initial analysis fails to identify the other units or departments in the organization that may still have a use for the data.
  4. The migration takes far longer than anticipated. A data migration can turn into a considerable intellectual challenge that requires months of analysis. This is especially true for payroll projects that have to migrate an organization’s entire employee history including leave, overtime, allowances and pay rates.
  5. No one has factored in the defect rate for the migration. Even successful migrations can have a small defect rate that needs to be addressed once the records are moved. Knowing whether the migration has to be started again or whether the defect can be manually corrected can make all the difference between a delay of days, weeks or months. It’s very rare for a data migration to achieve 100% perfection when moving data from one system to another. You should always allow additional time to review and clean data in the new system if it’s needed.
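The defect rate in the last point can be measured rather than guessed. Below is a minimal sketch, assuming you can export both the records you expected to load and the records that actually landed in the new system, keyed by a shared record id. The function name and the sample records are hypothetical.

```python
def reconcile(expected, loaded):
    """Compare expected records with what was actually loaded.

    Both arguments are dicts of record id -> record dict.
    """
    # Records that never arrived in the new system.
    missing = sorted(set(expected) - set(loaded))
    # Records that arrived but do not match what was expected.
    mismatched = sorted(
        key for key in set(expected) & set(loaded)
        if expected[key] != loaded[key]
    )
    defects = len(missing) + len(mismatched)
    rate = defects / len(expected) if expected else 0.0
    return {"missing": missing, "mismatched": mismatched, "defect_rate": rate}


# Hypothetical records for illustration only.
expected = {
    1: {"first_name": "Ana", "city": "Wellington"},
    2: {"first_name": "Ben", "city": "Auckland"},
    3: {"first_name": "Cat", "city": "Dunedin"},
    4: {"first_name": "Dev", "city": "Nelson"},
}
loaded = {
    1: {"first_name": "Ana", "city": "Wellington"},
    2: {"first_name": "Ben", "city": ""},  # value lost in transit
    4: {"first_name": "Dev", "city": "Nelson"},
}

report = reconcile(expected, loaded)
print(report)
```

Whether a given rate means a manual fix or a full re-run is a decision for the business; the list of affected ids simply makes that conversation concrete.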

Considering all of the above points, are data migration issues solvable? The key is to start as early as possible, and make sure you consult with your development team.

Your first step is to talk to your Data Warehouse team or find a developer who knows ETL.

Depending on the complexity of your data migration you’re going to need help. You need to get that help from someone who knows ETL.

You should also have a clear understanding of what it means to Extract, Transform and Load.

Your second step is to construct a data model for both systems.

What does your current system do? With any luck there is already an existing model. What does your new system do? With any luck someone has already completed a data model in your team.

If you don’t have a model for one system (or it’s missing for both systems) then you need to construct one. It’s the only way you can compare the state of the data in both systems and check for gaps.
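Once both models exist, even a very simple comparison will surface gaps. The sketch below treats each model as nothing more than a set of field names plus an agreed mapping between them. All field names here are made up for illustration; a real model would also carry types, constraints and relationships.

```python
# Hypothetical field lists; real ones come from each system's data model.
old_model = {"full_name", "address", "date_of_birth", "Date1", "fax_number"}
new_model = {
    "first_name", "last_name",
    "street_number", "street_name", "suburb", "city",
    "date_of_birth", "created_on", "email",
}

# Agreed mapping from each old field to the new field(s) it feeds.
mapping = {
    "full_name": ["first_name", "last_name"],
    "address": ["street_number", "street_name", "suburb", "city"],
    "date_of_birth": ["date_of_birth"],
}


def find_gaps(old_model, new_model, mapping):
    """List the fields that the mapping does not yet account for."""
    targets = {t for new_fields in mapping.values() for t in new_fields}
    return {
        # Old data with nowhere to go in the new system.
        "old_fields_with_no_home": sorted(old_model - set(mapping)),
        # New fields that nothing in the old system will populate.
        "new_fields_with_no_source": sorted(new_model - targets),
        # One old field feeding several new ones needs a transform.
        "fields_needing_transform": sorted(
            f for f, new_fields in mapping.items() if len(new_fields) > 1
        ),
    }


gaps = find_gaps(old_model, new_model, mapping)
for heading, fields in gaps.items():
    print(heading, fields)
```

Each of the three lists is a question to take back to the business or the developers, not an answer in itself.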

Your third step is to identify odd fields or problems with the way the data is stored.

When you look at your model do you have a one-to-one match between both systems? Or are there strangely labelled fields that seem to make no sense? Is it as per the example at the start of the article – you have values concatenated into a single field that must be split into separate fields in the new system?

Depending on how rushed your developers or vendors were when they developed the old system they may have cut some corners in terms of how fields were named ‘under the hood’. I recently reviewed a system that had its date fields labelled “Date1”, “Date2”, “Date3” and “Date4”. The UI had date fields on different pages that had slightly different meanings (one was a create date, one was a modify date, one was a delete date and one was a create date for a separate record). However, when these dates were stored in fields in the database they didn’t have any context. For example, is “Date1” the date the record was updated or the date the record was created?

You need to have a good understanding of what each field means before you can decide how (or if) you can move the data to the new system.

Your fourth step is to look at the specifications for the new system.

If you can’t get a data model for the new system because it’s still being designed see if you can spot any obvious problems from the specification (if the project has one).

Look for things that everyone presumed were covered but weren’t. Is data missing? Is the cardinality (the relationship between the data, such as one-to-one or one-to-many) incorrect?

Any or all of these things could indicate that you need to re-do the gap analysis or begin a new gap analysis.

Your fifth step is to find all the consumers of the data.

Other units or business areas in your organization may consume the data. You need to find all the consumers of this data because even if the business no longer wants to collect the data the data may be very important to other areas.

For better or for worse someone asked for the data. It’s entirely possible that it was simply added for one particular person’s reporting needs at the time. However, you need to dig these facts out and make sure that if the decision is to ignore the data going forward, it won’t result in problems later on.

This seems another obvious step but can be accidentally lost if the new system is complex or there are many people involved in the project.

Your sixth step is to check your findings with the business.

After your research is completed you should be able to explain any anomalies to the business but, more importantly, you should identify the possible outcomes of any migration problems. Typically the lack of data in the new system may interfere with the user’s ability to complete their tasks. The business may not have been aware of this problem when specifying the requirements for the new system.


You should be prepared for your data migration to be more difficult than originally assessed and to be forced to make decisions in which the outcome is not always ideal. You should also be prepared for the business to misunderstand the implications of what they’ve asked for or, once they do understand, to become overwhelmed by the decisions that need to be made.

Project pressures may result in solutions that are not only less than ideal, they also create problems from a BAU (Business as Usual) perspective.

Attempting your data migration too late in a project may mean that your journey doesn’t even begin.

A successful project team realizes it must start its data migration analysis as soon as possible and use experienced analysts to give itself any chance of completing a successful transition to a new system.


Agifalling – verb

  1. A description of a hybrid waterfall/Agile project in the throes of failing. “Oh no, we’re agifalling,” or “I’ve agifallen and I can’t get up.”

Many of you may have encountered what’s become known as a ‘hybrid’ project. It combines waterfall and Agile techniques to deliver software. The principle of combining the two approaches seems to have evolved from dissatisfaction with both techniques: waterfall is perceived as too rigid and Agile is perceived as not rigid enough. Although many people have had success with combining the two practices, a merely bad project with either technique can turn into a disaster when waterfall and Agile are combined without sufficient understanding or caution. If delivery is late using waterfall, delivery will completely fail on a bad hybrid project. If developers find it hard to code from traditional requirement specifications, they’ll be completely confused on a bad hybrid project. If the customer balked at the costs of doing waterfall, they’re in for a nasty surprise on a bad hybrid project.

So how does a bad hybrid project start and what does it look like? Bad hybrid projects usually begin with a poor grasp of both waterfall and Agile, and to add insult to injury, the waterfall components are included because no one trusts anyone enough to let the teams manage themselves, and Agile gets adopted because someone read a blog that said Agile would make developers code faster.

The project is consequently split into different practices depending on your role. The developers get the most Agile work. This usually involves two-week sprints and daily stand-ups but there’s no product owner and zero contact with the customer. The project manager’s job doesn’t change – they move around the tasks on their Gantt chart until they have everything in sequential order in much the same way as they did with waterfall, only it’s relabelled “Agile” (or “iterative”).

And what becomes of the BA? The BA is caught in the middle between delivering requirements on a waterfall-type schedule while trying to keep up with an Agile development team. In an effort to cover the bases, a BA is usually faced with producing more documentation than they would have delivered on a waterfall project.

A bad hybrid project can result in a BA writing use cases and then extrapolating user stories from the use cases for the developers. Or, distrustful of user stories (not to mention Kanban boards) and thinking use cases are too hard, the project decides that all of their requirement elicitation woes will be cured by producing extremely detailed screen prototypes that take weeks to produce (not to mention 10 to 30 pages of documentation per screen) but contain zero business context.

Although most BAs understand that the job is all about communicating the requirements in a way that allows the requirements to be understood, BAs on hybrid projects can find that they can’t keep up with delivering requirements in a way that will keep pace with a sprint. Usually a BA on these types of projects is pushed to “just make a start” without any idea of the overall features required. This is based on the assumption that Agile will help the overall goals and features naturally make themselves known as an outcome of the process. If the team has been tasked with delivering hi-fi screen prototypes as their sole source of requirements then the opposite is true. As super-detailed prototypes are developed and people are sidetracked by fonts and radio buttons, the overall design and purpose of the application seems to become more obscure.

The ensuing pressure and panic to define requirements using nothing but screen prototypes coupled with trying to keep pace with the development team usually results in a project that is in a constant state of anxiety. Keeping to schedule somehow turns into doing whatever the customer wants without question, even if what the customer wants is a very bad idea, because it’s far easier just to say yes to everything and shove the wants down the line. Any problems are pushed to the development team, which ends up with an ever-increasing product backlog and sprints that never get out of testing.

Without the additional Agile practices of regularly reflecting on the work done to date and adjusting the approach as needed, all that happens is everyone wearily traipses off to the daily stand-up so they can be harangued by the project manager about why the schedule is slipping.

What can you do if you’re a BA on a bad hybrid project?

Firstly, it’s going to depend on where you are on the project. If it’s already agifalling and the developers are drowning in sprints that never seem to deliver then it may already be too late. You can try suggesting to the project manager that it’s time to limit the requests from the customer and prioritise the product backlog. It’s also probably time to get an idea of the overall feature set to help sort out the backlog. This won’t make you popular since people cling to the notion that Agile is about accepting any and all changes at any stage, but there’s no way you’re ever going to end at the rate you’re going.

You could try talking to the project manager about Agile being a collaboration between the customer and the development team, which means that it would be desirable if the customer could be on site with the developers and the business analyst(s) for a number of hours per week to allow direct and unhindered communication. And if that’s not possible, at least arrange regular conference calls and/or Skype sessions. You can also look at going back to the prototypes giving the development team the most trouble and ensure that you do a quick user story or state a business goal for each screen, showing how the various screens relate to each other. This will at least give the developers some context.

If the idea of running a hybrid project is being discussed but has yet to start then you may have an opportunity to help the project manager avoid the problems that will rapidly appear. This will depend on how comfortable the project manager is with trying new techniques. However, if a hybrid project is pitched from the perspective of making their lives easier, a BA may be able to offer tips and tricks on how to keep the requirements process running in a way that actually focuses on delivering value for the customer (identifying business goals/needs, etc.) while tracking with the developers’ fortnightly sprint cycle. The project manager may be deeply uncomfortable with most Agile techniques (except for the ones that purportedly make developers code faster) but it may be an opportunity for a BA to design a sneaky template and wedge in a user story or two anyway. No one ever said that a user story had to be on a sticky note.

For example, you could add a small paragraph at the top of the screen prototype (if you’ve been forced down that path) that outlines the goal and how the customer envisions using the screen from a business perspective.

If you’re going down the path of writing use cases first and then extrapolating user stories from the use cases, you can definitely save time by factoring in the user stories at the start. Make them part of the use case as an additional section. This is a little sacrilegious but you’re on a bad hybrid project so any thoughts of using best practice or standard techniques have already gone out the window. Do whatever you can to save everyone time further down the track.

My final thought on hybrid projects is that they’ve emerged as yet another ICT attempt to find the silver bullet that will make the extremely hard software development stuff happen as if by a miracle. All of the techniques, whether plan driven or change driven, tend to be successful with the right team of people, the right attitude and a huge dose of pragmatism.

A bad project is a bad project no matter what technique, practice or methodology is used. But that’s a different article.
