ON-DEMAND PODCAST

What is ‘Zombie Code’ and Why Should Architects Care?

Featuring Amir Rapson, CTO and Co-Founder of vFunction

In this conversation, Amir Rapson (CTO and Co-founder of vFunction) and Oliver White (Director of Communities at vFunction) discuss Zombie Code, also referred to as “dead code”, and why architects and developers should be aware of it.

If you aren’t aware of “dead code”, you’re not alone. This is code that isn’t accessed, monitored, or observed by development teams, and it frequently appears in legacy systems. We have an enlightening discussion of how dead code accumulates, why it’s bad from a technical debt and security perspective, and what teams can do about it.


Video Transcript

Editor’s note: the transcription below has been edited for clarity and readability.

Oliver White (OW): Good day everyone, and welcome to this vFunction podcast. I’m Oliver White, and joining me is our CTO and co-founder, Amir Rapson. Today, we are talking about zombie code, also referred to as “dead code”, which is slightly misleading because this code is still very much alive when called into action or potentially accessed by cyber attackers. This is what we’re talking about today. Amir, it’s great to speak with you.

Amir Rapson (AR): Great to speak with you also, Oliver. I’m very excited about this first podcast with vFunction.

OW: Yeah, likewise! So until very recently, honestly, I’d never heard of dead code or zombie code before, and it became immediately appealing to me. And I bet that there’s a lot of people out there who aren’t aware of this either. Maybe you could just give us a quick overview of what zombie code is and why we should maybe worry about it.

AR: So dead code is maybe a term that people are a little bit more familiar with. Dead code usually refers to unreachable code: there’s certain code in an application that’s never reached. 

Even theoretically, you never get to that point in the code, so your IDE points out to you that this is unreachable code or dead code, and then you can just delete it. But there are actually a lot more cases where you should be very mindful about code that doesn’t run. And you have to look at it in the context in which it’s supposed to run, and I’ll elaborate on that.

So one easy use case to realize is that, okay, you have a certain application, but that application changes over time, and the way your users are using that application also changes over time, and they might not use a certain functionality for a long time. And you as developers or architects, you rarely have that insight into just what functionality and code isn’t used anymore.

OW: So what you’re talking about is, for example, you’ve added some new functionality to a service to make the user experience better, ideally. And that is using new code, and therefore, not calling on older code that isn’t part of this upgrade, but that old code hasn’t been removed and it’s still around.

AR: That is an example, but I want to also explain why it’s so difficult to find, because it’s not completely separate and independent paths. It’s not that, okay, you added some functionality and it’s going to this route, and now you added a new functionality through this route and you can then just delete the whole thing. We’re talking about code and classes and they maybe start differently, but they call the same classes, and then they kind of go and do their own separate thing.

Once you put some classes or code into a project, you don’t really know who’s going to use it, and you don’t know if that class is maybe used through other use cases as well. Maybe there’s a different API that gets to that class and calls it eventually, or maybe there isn’t. 

So it’s not that it’s unreachable. Your IDE has no idea if that class is being used, and you have to profile your code somehow to really understand if it’s used or not. So this is dead code, really dead code. In the context of your production application it’s dead, it never runs. It might run in your tests, by the way, and be covered by your tests, but it doesn’t run in production, and that’s dead.

OW: So we talked about why we wanted to call it “zombie code” versus “dead code”, and I think I suggested that, well, if it can be reactivated again somehow or touched by a production system, then it’s not exactly dead. It’s kind of more like a zombie that’s lurking.

AR: Yeah, that’s true. But it even gets more complicated than that. And also, I don’t think we talked about why it’s so risky to have these pieces of code in your application. So maybe we’ll touch on that a bit later.

OW: Let me play devil’s advocate for a minute. Okay, we’ve got a few classes floating around. Alright, they’re not being called, you said. They’re being run through tests even though nothing’s really accessing them. It’s a few lines of code, let’s say, why is it actually such a problem? Why can’t we just ignore it?

AR: The answer here is two-fold. One is that you may continue to, in a good case, maintain those pieces of code that never run. So you’re actually wasting resources on code that doesn’t need to run. That’s the best case. 

The worst case is that you stopped maintaining those classes a while ago. Now, a developer comes to add a new functionality. They come across a certain class that they think is doing something. Through some behavior, that dead code starts to run again. So that code is now revived, but now it’s code that wasn’t maintained for a while, it wasn’t used by anyone.

OW: Now, it’s zombie code…

AR: And then it becomes zombie code. So you might awaken the dead there a bit.

OW: There’s no way of preventing anyone from stumbling across this code? It’s hidden, it’s not monitored, it’s not traced, and it’s floating out there. And when a developer is writing new code, they have no idea if they might actually touch that zombie code out there?

AR: No they don’t. If you’re a developer, sometimes you stumble upon those pieces of code and then you say, “Wait, I have no idea how this class works. It says that it’s doing something, but we don’t do that anymore.” 

So that’s what it’s like when you stumble upon that dead code. But again, what happens is that in the best case you fix it to work somewhere, you see that it’s not covered by tests, so you add some tests and you spend a lot of time on doing something that won’t add any value. And in the worst case, you just say, “Okay, well, if it works, it works, and don’t touch it.”

OW: So as long as it’s not breaking anything, it’s okay, don’t touch it.

AR: But then you maybe kind of revive that code, yeah.

OW: The “don’t touch it because you might break something” mentality is definitely not where the industry is moving these days, so it sounds like zombie code accumulates over time to really impact technical debt?

AR: Yes, that’s true. It’s exactly that. So it grows over time, and if your engineering velocity and rate of innovation are actually higher, then you’re also probably accumulating dead code at a faster pace. 

You’re adding new functionality, your users are still doing things in your system probably through new paths of the code, and the old ones are still there. You slowly transition users because that’s what you do nowadays: you don’t switch everything at once, and you do some A/B testing where you switch half the people to the new flow. 

And then these classes kind of stay behind, and they’re kind of still being used through some other flows. It’s not easy. By the way, there’s another class of dead code that is even more complex than that.

OW: Oh, tell us.

AR: This is the interesting piece because this is very hard to do even with existing tools other than vFunction, really. So classes can be used for several domains, and there are several services that might use the same class, but maybe not exactly the same way. So a certain service can use a certain class one way and a different service will use that same class a different way. And through that class, it may call different code paths. 

If you think about that specific class only in the context of a specific service, then everything that’s not called from that specific service is actually dead code, because if I’m looking at it as a class that might run in a separate service, then half of it is useless, so half of it is going to be dead code.

If I take the same class and think about it in the other service, then the other half is going to be dead code. Now, when I look at my standard code coverage tests and my code coverage tools, everything is covered. No dead code, right? If I look to profile it through an APM or something like that, I’ll see those classes running, so I won’t see any dead code. 

Only if I look at it in the context of where that class was called from, based on a service, based on a domain, based on an endpoint, do I start to really understand very deeply which paths in the code shouldn’t be there. And these are exactly the points in the code that cause clutter and complexity and break the modularity of your code.
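
Editor’s note: the idea of code that is dead only in the context of one service can be illustrated with a minimal Python sketch. It is not vFunction’s method. The class, method, and service names, and the shape of the recorded calls, are all hypothetical, chosen only to show why an overall coverage view reports nothing while a per-service view does.

```python
from collections import defaultdict

# Hypothetical runtime observations: (entry service, class, method invoked).
observed_calls = [
    ("OrderService", "PriceCalculator", "applyDiscount"),
    ("OrderService", "PriceCalculator", "roundTotal"),
    ("InvoiceService", "PriceCalculator", "addTax"),
    ("InvoiceService", "PriceCalculator", "roundTotal"),
]

# Hypothetical static view: every method declared on each class.
declared_methods = {
    "PriceCalculator": {"applyDiscount", "addTax", "roundTotal"},
}


def dead_overall(observed, declared):
    """Methods never invoked by anyone: what a global coverage view shows."""
    used = defaultdict(set)
    for _service, cls, method in observed:
        used[cls].add(method)
    return {cls: methods - used[cls] for cls, methods in declared.items()}


def dead_per_context(observed, declared):
    """Methods never invoked from a given service, per (service, class) pair."""
    used = defaultdict(set)
    for service, cls, method in observed:
        used[(service, cls)].add(method)
    return {key: declared[key[1]] - methods for key, methods in used.items()}


overall = dead_overall(observed_calls, declared_methods)
per_context = dead_per_context(observed_calls, declared_methods)
```

In this made-up data, `overall` reports no dead methods for `PriceCalculator`, while `per_context` shows that `addTax` is never reached from `OrderService` and `applyDiscount` is never reached from `InvoiceService`.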

OW: So there’s dead-dead and slightly dead code.

AR: So dead, mostly dead and haha…

OW: So the first kind of dead code or zombie code we were talking about is essentially stuff that’s never accessed, and it’s literally floating along as extra baggage. But what you just referred to, I think, is code that is dead, let’s say, for half of its life cycle for no good reason and then active, right? And this is something that I would presume leads to accumulating technical debt and other problems. And in fact, that’s harder to find because you need to kind of see it at runtime, right?

AR: Yes, and understand that you’re looking at runtime, not at a specific class, but the whole call tree and the whole stack of classes being called one after the other.

OW: We’ve talked about why this is just generally annoying; it’s annoying for developers, it adds technical debt, and it accumulates not only over time but also faster the faster you innovate, which was not wonderful to hear. 

But let’s talk about another aspect, which is security. And recently we’ve seen this group called Elephant Beetle that has figured out a way to exploit legacy Java apps to the tune of millions of dollars. Is zombie code, let’s say, neutral, a negative or somehow a protector, in terms of security from cyber-attacks that go after legacy Java applications?

AR: Well, yes and no. No, because it doesn’t affect your regular users. And because it’s within your code, it’s still scanned with the same tools; however, it’s not maintained the same. 

When developers look at code, they don’t look at that code with the same frequency as they do with classes that they’re actually working on. So it’s not really maintained. The best practices that were there five years ago when that code was written are not the same best practices that you have at the moment. Maybe it wasn’t reviewed properly. So those are pieces of code that are simply rusting.

OW: From what I understand, this code, unless something happens at compile time, most developers working on a project would have no clue that this code even exists. They wouldn’t be able to see it. It’s not being tracked. It’s not being monitored. There’s no observability into this code, either, unless you’re specifically going in and searching for it. Is that correct?

AR: Yes, that’s correct. That’s exactly it.

OW: So the security threat is more like this is just code that nobody knows is out there. And if somebody really tries to look for it, they might find it and exploit that?

AR: Yes, yes.

OW: Okay, thank you. That makes sense. Let’s talk a little bit about what we are supposed to do about this. Obviously, at vFunction, we have automated analysis methods and patents that use AI and data science to find zombie code, among other things. But what would it be like for a developer to try to identify and destroy zombie code in their own legacy monolith? What would the manual do-it-yourself (DIY) process be? How would that start? What would that look like?

AR: That’s a good question. I think that, as with anything else around good software engineering, it’s about being aware. So whenever you spot a piece of code that you remember did something, but you don’t know what it’s still used for, don’t just pass it by, but rather mark it as something that needs more exploring, because you don’t want to leave that dead code behind. 

Look at your code coverage tools. In your code coverage tools, you do see which classes are covered and which are not. If a certain class is not covered at all, look at it as dead code, right? 

Or add specific tests and ask yourself why it’s not covered. If you have an APM running in your production systems, see if you can get that data and compare that with your coverage reports to see if certain paths are covered and certain paths are not. 

If you do have test coverage, try every now and then to run a simple subset of your tests: just the tests that are related to a specific service, let’s say a specific module. See the coverage of that module from those tests alone, in that context, rather than from all the tests of the entire system together, and start to explore those things. 

So this is the way you’ll do it manually. It will take time, but you’ll find it eventually, especially in places where you suspect that classes are much bigger than they should be.
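
Editor’s note: the comparison of coverage reports against production APM data described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names are hypothetical, and the two input sets stand in for whatever your coverage tool and APM actually export.

```python
def classify(all_classes, covered_by_tests, seen_in_production):
    """Label each class by whether tests cover it and whether production ran it."""
    report = {}
    for cls in sorted(all_classes):
        tested = cls in covered_by_tests
        live = cls in seen_in_production
        if live and tested:
            report[cls] = "live and tested"
        elif live:
            report[cls] = "live but untested"
        elif tested:
            report[cls] = "tested but never runs in production"
        else:
            report[cls] = "no coverage and no production traffic"
    return report


# Hypothetical inputs, e.g. exported from a coverage report and an APM tool.
all_classes = {"CartService", "PriceCalculator", "LegacyCouponEngine", "FaxNotifier"}
covered_by_tests = {"CartService", "LegacyCouponEngine"}
seen_in_production = {"CartService", "PriceCalculator"}

report = classify(all_classes, covered_by_tests, seen_in_production)
```

The last two labels are the candidates worth exploring: a class that only your tests exercise is still being maintained for no production benefit, and a class with neither coverage nor traffic is the plainest suspect.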

OW: Yeah. So it starts with a gut feeling. But then it sounds like we’re looking at static analysis tools, APM monitoring tools like New Relic, AppDynamics, etc, and then unit tests, and integration tests?

AR: Yes.

OW: So you run all these tests somehow and then what’s the output? Imagine I do those four things, what am I left with to try to see the big picture? How would I put all that together?

AR: So, the dynamic flows are the base, that’s what’s needed for something to run. And that dynamic analysis is done either in production, which gives you the real production flows, or during tests that give you the test flows. Both of them are dynamic analysis based on running flows through the system. 

What’s actually dead is what isn’t running in those flows, and those are the static dependencies that don’t become dynamic dependencies. So you need to take your static analysis and carve out the pieces that didn’t appear in the dynamic analysis of those specific flows.
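
Editor’s note: carving the dynamic flows out of the static dependencies amounts to a set difference. The Python sketch below assumes you already have a static dependency map and a set of caller-to-callee pairs observed at runtime; how you obtain them is up to your tooling, and all names and data here are hypothetical.

```python
# Static dependencies: class -> classes it references in source (hypothetical).
static_deps = {
    "CheckoutController": {"CartService", "LegacyCouponEngine"},
    "CartService": {"PriceCalculator"},
    "LegacyCouponEngine": {"CouponDao"},
    "PriceCalculator": set(),
    "CouponDao": set(),
}

# Dynamic dependencies: (caller, callee) pairs actually observed (hypothetical).
production_calls = {
    ("CheckoutController", "CartService"),
    ("CartService", "PriceCalculator"),
}
test_calls = production_calls | {
    ("CheckoutController", "LegacyCouponEngine"),
    ("LegacyCouponEngine", "CouponDao"),
}


def static_edges(deps):
    """Flatten the dependency map into a set of (caller, callee) pairs."""
    return {(caller, callee) for caller, callees in deps.items() for callee in callees}


def dark_edges(deps, observed):
    """Static dependencies that never became dynamic dependencies."""
    return static_edges(deps) - observed


def dark_classes(deps, observed, entry_points):
    """Classes that appear statically but in no observed flow."""
    seen = set(entry_points)
    for caller, callee in observed:
        seen.update((caller, callee))
    all_classes = set(deps) | {c for callees in deps.values() for c in callees}
    return all_classes - seen


dark_in_production = dark_classes(static_deps, production_calls, {"CheckoutController"})
```

With this made-up data, the test flows light up every static edge, so a coverage report shows nothing dark, while the production flows leave `LegacyCouponEngine` and `CouponDao` unlit.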

OW: Is this like looking for a black hole?

AR: Like a dark piece, yes, that’s actually a good example. Look at the constellation of your classes, and look for the dark places in the middle. That’s exactly it.

OW: This sounds like a lot of work, and I would be really thrilled to have conversations with engineers who have actually done exactly this and have figured out how to identify zombie code manually. Is there a better way to do it? And let’s say, we know there is and it’s called automation, and [laughter] now we’re going to talk about how vFunction does this and kind of how… Maybe a little bit of estimating how much time it would take for one developer to embark on this for, let’s say, an application that has 500 Java classes versus doing it in an automated way. Could you shed some light on that?

AR: So 500 classes is not a big project…

OW: Oops, did I say hundred? I meant thousand–5000 classes.

AR: Five thousand classes is actually a lot to do it manually. I think it will take many, many weeks for someone. And I think it’s also a hard task to distribute between a team. So it’s not like take 10 people, give them a week, and it’ll happen.

OW: Mainly because of different levels of expertise, interest, and motivation?

AR: Yeah, interaction with the systems, carving out tests, running it in CI/CD, getting the results back, looking at the reports. I think it also takes time to get it right. So I would say many weeks. For just 500 classes, I would say two weeks probably for one person to review the code properly that way. 

Definitely automation is needed, and that’s what we do at vFunction. We have a patent that compares the dynamic analysis with the static analysis, and also does that in the context of domains. But you can also do that without compiling a map of your domains and your services. Even without that, it will find those places in the dependency graph, those black holes, and point them out to you, so you have somewhere to start. 

So for a developer with vFunction, it’s probably: have vFunction installed, let it run in a production environment for a while, look at the results, and 10 minutes later you have your list of usual suspects.

OW: Well, that sounds a lot better than doing it by hand. When you identify dead code, what are best practices for handling it? In some cases, can you literally just delete the code and that’ll be safe, but in other cases, I imagine you can’t do that?

AR: You must delete the code. It’s very hard to know where exactly… where you can really sever those interdependencies, because if there is one class and it ties into three other classes that are needed, or if you delete that class and two other classes are not needed anymore, it may be a little bit iterative. Again, with vFunction there’s automation for it, but otherwise it’s just going to be tedious, I think.
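
Editor’s note: the iterative nature of deleting dead code can be sketched as repeated pruning of a dependency map. This Python example is a simplified illustration, not vFunction’s automation. It assumes a static dependency map, a set of entry points known to be live, and a class already confirmed dead; all names are hypothetical.

```python
def prune(deps, entry_points, confirmed_dead):
    """Remove confirmed-dead classes, then repeatedly remove classes that
    nothing remaining references. Returns (removed, dangling), where dangling
    lists references from surviving classes into removed ones.
    Limitation: orphaned classes that only reference each other in a cycle
    are not detected by this simple loop."""
    remaining = {cls: set(callees) for cls, callees in deps.items()
                 if cls not in confirmed_dead}
    removed = set(confirmed_dead)
    while True:
        referenced = {callee for callees in remaining.values() for callee in callees}
        orphans = {cls for cls in remaining
                   if cls not in referenced and cls not in entry_points}
        if not orphans:
            break
        for cls in orphans:
            del remaining[cls]
        removed |= orphans
    dangling = {(caller, callee) for caller, callees in remaining.items()
                for callee in callees if callee in removed}
    return removed, dangling


# Hypothetical dependency map: class -> classes it references.
deps = {
    "CheckoutController": {"CartService", "LegacyCouponEngine"},
    "CartService": {"PriceCalculator"},
    "LegacyCouponEngine": {"CouponDao", "PriceCalculator"},
    "CouponDao": {"CouponCache"},
    "CouponCache": set(),
    "PriceCalculator": set(),
}

removed, dangling = prune(deps, {"CheckoutController"}, {"LegacyCouponEngine"})
```

Here deleting `LegacyCouponEngine` makes `CouponDao` unneeded, which in turn makes `CouponCache` unneeded, while `PriceCalculator` stays because `CartService` still needs it. The `dangling` set points at the surviving code that still has to be edited by hand, in this case the reference from `CheckoutController`.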

OW: And this is part of the complexity, mystery, and the tediousness of dealing with legacy applications that are 20 years old and 10 million lines of code, and tens of thousands of Java classes. This is what prevents a lot of people from even trying to do something about them. And I like that vFunction does a lot to just bring everything together into a big picture so that you can see what’s going on and then figure out how to take action.

AR: Yeah, that’s the challenge, and that’s also why I love what we do at vFunction, it’s a real-life problem that engineers want to deal with. It’s a big challenge to understand and find your way through these large applications, to find the right way around it, make it better, convert it, successfully, to a more modern architecture. Those are really big time challenges for engineers.

OW: Yeah, they’re very big challenges. They’re not as exciting or cutting edge as greenfield projects with Kubernetes, but these are the projects that are actually driving the business in most cases. So it’s good that we’re putting a laser focus on some of this old stuff. Back to the future. 

Well, Amir, thank you for your time today. This was a really fun conversation for me. I learned a lot. I hope our listeners and viewers learn something too. To our audience, thank you for listening. 

If you are sick of managing scary old systems like the ones we’ve been talking about, you can visit vfunction.com and check out our ROI calculator. It will give you an idea of how much legacy applications are costing you to maintain each year, so that you can show this to your executive team and potentially get some project backing for your own modernization initiatives.
