Replacing Software: Common Risks
A while back I wrote a post on the hidden costs of replacing existing software systems. A related topic is the set of risks that come with replacing software.
Every software project faces its own risks. In this post I want to talk about common ones I have seen on replacement projects over the years.
Overview
- Missing critical features
- Organizational and/or user resistance
- Losing data
- Underestimating integrations
- Lack of visible progress
- Schedule slippage
- Performance regressions
- Incomplete contingency plans
- Technical debt transfer
Missing Critical Features
There are the obvious features, and then there are the hidden ones. It is easy to overlook functionality such as background jobs, rare (but important) workflows, third-party integrations, etc.
I once worked on a project where we migrated an existing system to the cloud. One of the restrictions the cloud platform provider informed us of was that long-running processes would not be supported.
The migration project timeline was estimated by a different development team than the one that originally built the system. The new team did not know about a data import job that ran once a day. This job took several hours to complete, and would thus violate the cloud provider's restrictions.
Unfortunately, the code for this job made up 1/3 of the entire codebase. It was, by far, the most complex part of the system.
Oops.
Organizational and/or User Resistance
People do not like change.
Users may have been working with the existing system for decades. They have gotten used to the old, flawed processes and the outdated UI. They know the quirks and workarounds.
Telling users to change the way they are doing things is a hard sell.
A lot of developers are resistant to changing operating systems, IDEs, or programming languages. I have seen people quit over such changes.
Users are also often not as tech-savvy as designers and developers expect. Learning a new software system is difficult for many, and especially when you are already stretched thin in your daily work, not getting the new software to do what you want is nerve-wracking.
Organizational resistance might also be a problem. As an example, imagine a separate team whose software becomes obsolete with your changes. Or a manager who fears losing control over processes. Or employees who fear their jobs being automated away by the new system.
Losing Data
There is a lot that can go wrong when migrating data from one system to another:
- Data formats may not match
- It might not be possible to retain history or metadata
- Sometimes data is simply forgotten or deemed irrelevant during migration until it is too late
As long as you keep backups of everything, there should be a way to migrate missing data later on. But that is a time-consuming and error-prone process.
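One cheap safeguard is to compare source and target after every migration run. Below is a minimal sketch of what such a check can look like; it assumes both systems happen to be reachable as SQLite files, and the table and column names are made up for illustration:

```python
import sqlite3


def row_count(conn, table):
    """Return the number of rows in the given table."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def key_fingerprint(conn, table, key_column):
    """Build a cheap fingerprint of a table from its sorted key column."""
    rows = conn.execute(
        f"SELECT {key_column} FROM {table} ORDER BY {key_column}"
    ).fetchall()
    return hash(tuple(rows))


def verify_migration(source_db, target_db, table, key_column):
    """Compare row counts and key fingerprints after a migration run."""
    with sqlite3.connect(source_db) as src, sqlite3.connect(target_db) as dst:
        if row_count(src, table) != row_count(dst, table):
            raise RuntimeError(f"{table}: row counts differ")
        if key_fingerprint(src, table, key_column) != key_fingerprint(dst, table, key_column):
            raise RuntimeError(f"{table}: key sets differ")


# verify_migration("old_system.db", "new_system.db", "invoices", "invoice_id")
```

Counts and key fingerprints will not catch every discrepancy, but they do catch the embarrassing ones: whole tables or ranges of rows that simply never arrived.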
Underestimating Integrations
This one is a favorite of mine.
On past projects I have worked on integrations a lot, and I continue to be surprised by how often system integrations are underestimated - even on greenfield projects.
When working on legacy systems, integration points are often outright forgotten.
Tell me if you have heard one of these before:
- "Oh, we don't actually know if anyone is still using that API."
- "Other applications use the same database, I think..."
- "The system writes files over FTP somewhere. Not sure why though."
Another story from a past project: we were replacing cost calculation software for printing jobs. In the old system, everything had to be entered manually; the replacement was required to fetch data from and write back to the ERP system.
It turned out the ERP had an API for just that. The API offered reading and writing free-text fields, so we had to write custom parsers for those fields and rely on users entering everything correctly, and on the ERP not changing anything during software updates.
Ah, yes, and creating and deleting entries was not possible. Only reading and updating fields.
Fun times.
You can imagine that we underestimated that particular integration point quite a bit.
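To give you an idea of what such free-text parsers look like in practice, here is a rough sketch; the field layout and names are invented, the real format was messier:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrderInfo:
    customer: Optional[str] = None
    quantity: Optional[int] = None


def parse_free_text(field: str) -> OrderInfo:
    """Pick order data out of a semicolon-separated key=value blob typed in by users.

    Anything that does not parse is skipped, because the ERP gives no
    guarantees about what actually ends up in the field.
    """
    info = OrderInfo()
    for part in field.split(";"):
        key, _, value = part.partition("=")
        key, value = key.strip().upper(), value.strip()
        if key == "CUST" and value:
            info.customer = value
        elif key == "QTY":
            try:
                info.quantity = int(value)
            except ValueError:
                pass  # users do not always enter numbers where numbers belong
    return info


print(parse_free_text("Cust = ACME ; qty=500 ; note: urgent!!"))
```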
Lack of Visible Progress
I could have named this one "Management gets tired of hearing 'Nothing to show yet' week after week". When replacing software, especially large systems, it can take a long time until there is something visible to show stakeholders.
Even when you do show them "progress", paper mockups, wireframes, or prototypes are not what management wants to see. Few teams can build working software quickly.
Even then, the software
- might not be something you want to show yet
- is certainly not ready for production
Stakeholder management is difficult to begin with. And, in my experience, when working on modernization projects, it is even more so.
Expectations are often high, because there is already the old, working system. Also, more often than not, those modernization projects have been in the pipeline for a while. There is a certain impatience to get it over with.
Schedule Slippage
A consequence of the other difficulties mentioned here and in my previous post.
Unaccounted-for or underestimated risks lead to delays.
It is usually a mix of many different risk factors, paired with an optimistic schedule.
I have yet to see a replacement project that finished on time. Hell, I would be surprised to see one firsthand that does not run over schedule by at least 50%.
Performance Regressions
New software is not necessarily faster than old software.
The old system may have been optimized over years. Every database query, every caching layer, every load balancer, every CDN setting, every background job, may have been tweaked to squeeze better performance out of the thing.
Build a new system, and you start over.
"Never optimize prematurely" is a common saying in software development - and for good reason!
But the performance of the new system has to at least match the old one. So you will have to invest time and effort into performance testing and optimization.
If your schedule is already tight (see above), there may be no time for performance improvements, and performance might simply suck when the new system goes live.
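Even a crude side-by-side measurement helps catch the worst regressions before go-live. A minimal sketch, assuming the old and the new system expose a comparable HTTP endpoint; the URLs are placeholders:

```python
import statistics
import time
import urllib.request


def median_response_ms(url, samples=50):
    """Hit the URL repeatedly and return the median response time in milliseconds."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        urllib.request.urlopen(url).read()
        timings.append((time.perf_counter() - start) * 1000)
    return statistics.median(timings)


old_ms = median_response_ms("http://old-system.internal/quote/123")
new_ms = median_response_ms("http://new-system.internal/quote/123")
print(f"old: {old_ms:.0f} ms, new: {new_ms:.0f} ms")
if new_ms > old_ms * 1.2:
    print("warning: the new system is noticeably slower on this endpoint")
```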
I promise I'll stop with the "stories from past projects" soon, but:
I was a (very small) part of a huge effort to introduce SAP to the factory floor at a manufacturing company. My part was to write and test several HTML-based user interfaces that had to be operated by a hand-held scanner.
The problem was that my frontend had to do several round-trips to the SAP backend for every work item processed, and each response took half a second.
The people working on the floor had to process their items rapidly - they scanned several per second by hand.
The new system became a bottleneck and there was no good way to optimize it further without serious changes to the backend.
Incomplete Contingency Plans
What if things go wrong?
When modernizing software, you have to have mitigation plans and rollback mechanisms in place.
This is not a problem unique to replacement projects. Whenever you make significant changes to software, you have to be prepared for mistakes.
But when replacing software, the time until you get critical feedback is usually a lot longer - sometimes months.
Especially on "big bang releases" (where you switch from the old to the new system in one go), anything and everything can go wrong.
It is almost guaranteed that the first attempt to take the new system live will fail.
If you cannot switch back to the old system quickly, you are in big trouble.
One more for the road:
This is not a software replacement story (but the software in question was replaced after this incident).
On a legacy system I maintained at the time, my team and I tried to introduce an ORM to replace raw SQL queries. It took a few weeks to implement, but tests looked promising.
Then we took the updates live. Within minutes, the system slowed and crashed.
What happened? We did not know at first, but it turned out the ORM loaded the entire database into memory on startup. On the test system we had not noticed, because the database was small enough to be loaded easily.
Problem was: the introduction of the ORM needed irreversible database schema changes. We could not roll back. We needed to fix the system and deploy patches, urgently.
Every patch broke something else, and for a miserable two weeks our system simply did not work. Customers were not happy.
Neither was I, as I was on-call that week. I was screamed at a lot.
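The takeaway for me: make schema changes reversible wherever possible, or at least know up front which ones are not. Here is a minimal sketch of a paired up/down migration; the table and column names are made up:

```python
import sqlite3


def upgrade(conn):
    """Add the column the new code expects, keeping existing rows intact."""
    conn.execute("ALTER TABLE orders ADD COLUMN status TEXT DEFAULT 'open'")


def downgrade(conn):
    """Undo the change so the old code can run against the schema again."""
    # DROP COLUMN needs SQLite 3.35+; older engines require a table rebuild instead.
    conn.execute("ALTER TABLE orders DROP COLUMN status")


if __name__ == "__main__":
    with sqlite3.connect("app.db") as conn:
        upgrade(conn)
        # downgrade(conn)  # the escape hatch we did not have back then
```

A working downgrade path does not prevent the bad deploy, but it turns a two-week firefight into a quick rollback.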
Technical Debt Transfer
When replacing software, there is always the risk of building up new technical debt.
If you aim to replace the old system because it was a mess, be careful not to create another one.
That can happen for many reasons, but the most common is schedule pressure.
When the deadline looms, shortcuts are taken and hacks follow.
I'm sure you have seen newly developed software that already carries a long backlog of technical debt the moment it goes live.
Conclusion
I could list many more risks, but these are the ones I have seen come up again and again.
There are different modernization strategies; replacing software is not always the best option. I would argue it seldom is.
If you need help evaluating your options, feel free to reach out to us. You can find the contact information in the footer of this website.
And if you want to talk about your stories, let us know as well!
