Quality Testing Legacy Code – Challenges and Benefits

Bob Quillin

May 10, 2022

Many of the world’s businesses are running enterprise applications that were developed a decade ago or more. Companies built the apps using a monolithic application architecture and hosted them in private data centers. With time, these applications have become mission-critical for the business; however, they come with many challenges as they age. Testing legacy code uncovers some of these flaws.

In many cases, companies developed the apps without following commonly accepted best practices like TDD (Test Driven Development), unit tests, or automated testing. The testers usually created a test-plan document that listed all potential test cases. But as the developers added new features and changed old ones, testing use cases may not have kept up with the changes. As a result, tests were no longer in sync with the application functionality.

Thus, testing became a hit-or-miss approach, relying mainly on the domain knowledge of a few veteran employees. And when these employees left the organization, this knowledge departed with them. The product quality suffered. Customers became unhappy, and employees lost morale. This is especially salient these days, in what is being called The Great Resignation.

Poor Code Quality Affects Business: Prevent It By Testing Legacy Code

Poor code quality can lead to critical issues in the product’s functionality. In extreme cases, these issues can cause accidents or other disasters and even lead to deaths. The company’s reputation takes a hit as the quality of its products plummet.

Poorly written code results in new features taking longer to develop. The product does not scale as usage increases, leading to unpredictable performance. Product reliability is a big question mark. Security flaws make the product vulnerable, inviting the unwelcome attention of cyber-attackers.

Current users leave, and new prospects stay away. The company spends more on maintaining technical debt than on innovation in order to boost consumer and employee confidence.

Ultimately, the company’s standing suffers, as do its revenues. Thus, code quality directly affects a company’s reputation and financial performance.

How Do We Define Code Quality?

How do we go about testing legacy code quality, and what characteristics does good code have? There is no straightforward answer, as coding is part art and part science. Therefore, estimating code quality can be a subjective matter. Nevertheless, we can measure software quality in two dimensions: qualitatively and quantitatively.

Qualitative Measurement of Code Quality

We cannot conveniently or accurately assess code quality in this way with tools. Instead, we must measure them by other means, such as code reviews by experts, or indirectly, by observing the product’s performance. Here are some parameters that help us evaluate code quality.


Software applications must keep changing in response to market and competitor requirements. So, developers should be able to add new features and functionality without affecting other parts of the system. Extensibility is a measure of whether the design of the software easily allows this. 


Maintainability refers to the ease of making code changes and the associated risks. It depends on the size and complexity of the code. The Halstead complexity score is one measure of maintainability. (Note that extensibility refers to adding large chunks of code to implement brand new features, whereas maintainability refers to making comparatively minor changes).


Testability is a function of the number of test cases needed to test the system by covering all code paths. It measures how easy it is to verify all possible use cases. The cyclomatic complexity score is an indicator of how testable your app is.


Portability shows how easily the application can run on a different platform. You can plan for portability from the start of development. Keep compiling and testing on target operating systems, set compiler warning levels to the highest to flag compatibility issues, follow a coding standard, and perform frequent code reviews.


Sometimes developers use the same functionality in many places across the application. Reusability refers to the ease with which developers can share code instead of rewriting it many times. It is easier to reuse assets that are modular and loosely coupled. We estimate reusability by identifying the interdependencies in the system.


Reliability is the probability that the system will run without failing for a period of time. It is also called availability. A measure of reliability is Mean Time Between Failures (MTBF).

To summarize, these parameters are difficult to quantify, and we must determine them by observation over a period. If the application performs well on all these measures, it is likely to be high quality.

Related: How to Conduct an Application Assessment for Cloud Migration

Quantitative Measures of Code Quality

In addition, there are several quantitative metrics for measuring code quality.

Defect Metrics

Quality experts use historical data (pertaining to the organization) to predict how good or bad the software is. They use metrics like defects per hundred lines of code and escaped defects per hundred lines of code to quantify their findings.

Cyclomatic Complexity

The Cyclomatic Complexity metric describes the complexity of a method (or function) by a number. In simplistic terms, it is the number of unique execution paths in the code and hence the minimum number of test cases needed to test it. The higher the cyclomatic complexity, the lower the readability, and the higher the maintainability.

Halstead Metrics

The Halstead metrics comprise a set of several measurements. Their basis is the number of operators and operands in the application. The metrics represent the difficulty in understanding the program, the time required to code, the number of bugs testers should expect to find, and others.

Weighted Micro Function Points (WMFP)

The WMFP is a modern-day successor to classical code sizing methods like COCOMO. WMFP tools parse the entire source code to calculate several code complexity metrics. The metrics include code flow complexity, the intricacy of arithmetic calculations, overall code structure, the volume of comments, and much more.

There are many other quantitative measures that the industry uses in varying degrees. They include Depth of Inheritance, Class Coupling, Lines of Source Code, Lines of Executable Code, and other metrics.

The Attributes of Good Code Quality

We have seen that it is problematic to quantify code quality. However, there are some common-sense attributes of good quality:

  • The code should be functional. It should do what users expect it to do.
  • Every line of code plays a role. There is no bloating and no dead code.
  • Frequently run automated tests are available. They provide assurance that the code is working.
  • There is a reasonable amount of documentation.
  • The code is readable and has sufficient comments. It has well-chosen names for variables, methods, and classes. The design is modular.
  • Making changes, and adding new features, is easy.
  • The product is not vulnerable to cyber-attacks.
  • Its speed is acceptable.

What is Technical Debt?

Technical debt results from a software team prioritizing speedy delivery over perfect code. The team must correct or refactor the imperfect code later.

Technical debt, like the financial version, is not always bad. There are benefits to borrowing money to pay for things you cannot afford. Similarly, there is value to releasing code that is not perfect. You get experience, feedback, and in any case, you repay the debt later on, though at a higher cost. But, because technical debt is not as visible to business leaders, people often ignore it.

There are two types of technical debt. A team consciously takes on intentional debt as a strategic decision. It inadvertently incurs unintentional debt because of monolithic application code.

Again, like financial debt, technical debt is manageable to some extent. Once it grows beyond a point, it affects your business. Then you have no choice but to address it. Technical debt is difficult to measure directly. However, a host of issues inevitably accompany technical debt. You either observe them or find them while testing. Here are some of them:

The Pace of Releasing New Features Slows Down

At some point, teams start spending more time on reducing tech debt (refactoring the code to get it to a state where adding features is not very difficult) than on working on new features. As a result, the product lags behind the competition.

Releases Take Longer

Code suffering from tech debt is code difficult to read and understand. Developers who add new features to this codebase find it difficult and time-consuming. Release cycle times increase.

Poor Quality Releases

Thanks to technical debt, developers take longer than planned to deliver builds to the QA team. Testers have insufficient time to test thoroughly; therefore, they cut corners. The number of defects that escape to production increases.

Regression of Issues

As technical debt increases, the code base becomes unstable. Adding new code almost inevitably breaks some other functionality. Previously resolved defects resurface.

When you face these issues in your organization, you can be sure that you have incurred a significant amount of technical debt and must pay it off immediately.

How to Get Rid of Technical Debt

The best way of paying off technical debt is to stop adding new features and focus only on refactoring and improving the code. List out all your problems and resolve them one by one. Map sets of fixes to releases so that the team continues its cadence of rolling out regular updates.

When these issues get out of hand, focus exclusively on paying off the technical debt. It is time to stop maintaining and start modernizing.

Related: What is Refactoring in Cloud Migration? Judging Legacy Java Applications for Refactoring or Discarding

Differences in Testing Legacy Code vs. New Code: Best Practices for Testing Change Over Time

Often, the only way to pay off tech debt for enterprise applications is to modernize them. But then, how can the team make sure that tech debt does not accumulate again in the modernized apps? It is unlikely because testing a modern application differs from testing a legacy app. Let’s look at some of these differences.

Testing Legacy Code

  • Testers have difficulty understanding the complexity of large monolithic applications.
  • Fixing defects may have unintended consequences, so testers often expend a lot of effort to verify even minor code changes. The team must constantly test for regression.
  • Automated testing is beneficial but has to be done from scratch. Unit tests may not make sense. Instead, integration or end-to-end tests may be more suitable. The team should prioritize the areas to be automated.
  • Developers should add automated unit tests when they work on new features.

Testing Modern Applications: Challenges and Advantages

  • Modern applications are often developed as cloud-native microservices. Testing them requires special skills.
  • The software needs to run on several devices, operating systems, and browsers, so managers should plan for this.
  • Setting up a test environment with production-like test data is challenging. Testing must cover performance and scalability.
  • Test teams need to be agile. They must complete writing test plans, automating tests, running them, and generating bug reports within a sprint.
  • UI/UX matters a lot. Testers must pay a lot of attention to usability and look-and-feel.
  • Developers follow Test Driven Development (TDD). Also, Continuous Integration/Continuous Delivery pipelines support running automated test cases. Correspondingly, this improves and maintains quality and reduces the burden on test teams.

Determining Code Quality: the Easy Way

As we have seen, testing legacy code and assessing its quality is a complex undertaking. We have described some parameters and techniques for qualitatively and quantitatively appraising the quality of the code.

We must either use tools to measure these parameters or make manual judgments, say, by doing code reviews. But each tool only throws light on one parameter. So, we need to use several tools to get a complete picture of the quality of the legacy app.

So, to evaluate the quality of legacy code and decide if it is worth modernizing requires us to use several tools. And after we have partially or fully modernized the application, we want to calculate the ROI by measuring the quality of the modernized code. Again, this requires multiple tools and is an expensive and lengthy process.

vFunction offers an alternative approach: using a custom-built platform that provides tools to drive modernization assessment projects from a single pane of glass. vFunction can analyze architectural flows, classes, usage, memory, dead code, class linkages, and resources even in megaliths. They use this analysis to recommend whether modernization makes sense. Contact vFunction today to see how they can help you assess the feasibility of modernizing your legacy applications.

Bob Quillin

Bob Quillin is the Chief Ecosystem Officer for vFunction, responsible for developer advocacy, marketing, and cloud ecosystem engagement. Bob was previously Vice President of Developer Relations for Oracle Cloud Infrastructure (OCI). Bob joined Oracle as part of the StackEngine acquisition by Oracle in December 2015, where he was co-founder and CEO. StackEngine was an early cloud native pioneer with a platform for developers and devops teams to build, orchestrate, and scale enterprise-grade container apps. Bob is a serial entrepreneur and was previously CEO of Austin-based cloud monitoring SaaS startup CopperEgg (acquired by IDERA in 2013) and held executive and startup leadership roles at Hyper9, nLayers, EMC, and VMware.

Get started with vFunction

See how vFunction can accelerate engineering velocity and increase application resiliency and scalability at your organization.